`segmenter`

Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter.

Install

npm install segmenter

Why

Intl.Segmenter is supported in all major browsers, and 94% of users have it available, so it's time for adoption.
If your use case is anything other than iterating over all graphemes, words, or sentences in a text, Intl.Segmenter can be awkward to work with.
In many cases, working with graphemes is preferable to characters. Graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is:
- a single grapheme
- '👨‍🔧️'.length returns 6
- a for...of loop over it makes 4 iterations
Before Intl.Segmenter, working with graphemes required libraries like graphemer, which is 94KB in size.
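The three counts in the list above can be checked with the built-in APIs alone. This is a minimal sketch that assumes the emoji is the sequence man, zero-width joiner, wrench, variation selector (written with escapes so the code points are explicit); the variable names are illustrative only.

```typescript
// Assumed code points for the emoji used in this readme:
// U+1F468 (man) + U+200D (zero-width joiner) + U+1F527 (wrench) + U+FE0F (variation selector)
const emoji = "\u{1F468}\u200D\u{1F527}\uFE0F";

// UTF-16 code units, which is what String.prototype.length counts
const codeUnits = emoji.length;

// Code points, which is what for...of (and Array.from) iterates over
const codePoints = Array.from(emoji).length;

// Grapheme clusters, as reported by Intl.Segmenter
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemeCount = Array.from(graphemeSegmenter.segment(emoji)).length;

console.log(codeUnits, codePoints, graphemeCount); // 6 4 1
```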

Usage

import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";

graphemeAt("👨‍🔧️ the fixer", 0); // 👨‍🔧️
graphemeAt("👨‍🔧️ the fixer", 5); // 👨‍🔧️

graphemeRangeAt("👨‍🔧️ the fixer", 0); // { start: 0, end: 6 }
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }

wordAt("hello-world", 0); // "hello"

wordRangeAt("hello-world", 0); // { start: 0, end: 5 }

API

Graphemes

`graphemeAt(string: string, position: number): string | undefined`

Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.

`graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined`

Get the start and end positions of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
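For readers curious how a position lookup like this maps onto the platform API, here is a rough sketch using only Intl.Segmenter and its `containing()` method. It is not taken from this package's source; `graphemeRangeSketch` is a hypothetical name, and the real implementation may differ.

```typescript
// Hypothetical helper: find the grapheme covering a UTF-16 position.
function graphemeRangeSketch(
  text: string,
  position: number,
): { start: number; end: number } | undefined {
  const segments = new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(text);

  // containing() yields undefined when the position is outside the string,
  // which also covers the empty-string case.
  const hit = segments.containing(position);
  if (hit === undefined) return undefined;

  return { start: hit.index, end: hit.index + hit.segment.length };
}

// Assumed code points for the emoji used in the Usage section
const fixer = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";

console.log(graphemeRangeSketch(fixer, 3)); // { start: 0, end: 6 }
console.log(graphemeRangeSketch(fixer, 100)); // undefined
```

Any position from 0 through 5 lands inside the same grapheme, so each of them gives the same range.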

`graphemes(string: string): string[]`

Get all graphemes in the string as an array.

Words

`wordAt(string: string, position: number): string | undefined`

Get the word at position in string. Returns undefined if position is out of bounds or string is empty.

`wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined`

Get the start and end positions of the word at position in string. Returns undefined if position is out of bounds or string is empty.

`words(string: string): string[]`

Get all words in the string as an array.
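With word granularity, Intl.Segmenter also reports the pieces between words (spaces, punctuation) and marks real words with `isWordLike`. The sketch below collects only the word-like segments. Whether this package filters in the same way is an assumption, and `wordsSketch` is a hypothetical name rather than part of its API.

```typescript
// Hypothetical helper: collect only the segments Intl.Segmenter flags as words.
function wordsSketch(text: string, locale: string = "en"): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const result: string[] = [];

  for (const part of Array.from(segmenter.segment(text))) {
    // isWordLike is false for whitespace and punctuation such as "-"
    if (part.isWordLike) result.push(part.segment);
  }

  return result;
}

console.log(wordsSketch("hello-world")); // [ 'hello', 'world' ]
```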

Sentences

Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office" is split into two sentences.
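The limitation in the note can be reproduced with the built-in API directly. This sketch assumes the "en" locale and a trailing period on the sample sentence; the variable names are illustrative.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });
const sample = "I went to Dr. Smith's office.";

// Collect the text of every sentence segment
const pieces = Array.from(sentenceSegmenter.segment(sample), (part) => part.segment);

// The abbreviation "Dr." followed by a capitalized word is treated as a sentence end
console.log(pieces.length); // 2
console.log(pieces);
```

The segments always concatenate back to the original string, so the split changes only where the boundaries fall, not the content.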

`sentenceAt(string: string, position: number): string | undefined`

Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.

`sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined`

Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.

`sentences(string: string): string[]`

Get all sentences in the string as an array.
