Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter.
npm install segmenter

- Intl.Segmenter is supported in all major browsers and 94% of users have it available, so it's time for adoption.
- If you have a use case other than iterating over all graphemes/words/sentences in a text, then Intl.Segmenter might be a little hard to work with.
- In many cases, working with graphemes is preferable to working with characters (JavaScript's UTF-16 code units or code points). Graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is:
  - a single grapheme
  - '👨‍🔧️'.length returns 6
  - for...of looping over 👨‍🔧️ makes 4 iterations
- Before Intl.Segmenter, working with graphemes required libraries like graphemer, which is 94KB in size.
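The three numbers above can be checked with nothing but the built-in Intl.Segmenter. In this small sketch, countGraphemes is a hypothetical helper name, and the emoji is spelled with escape sequences so the invisible zero-width joiner (U+200D) and variation selector (U+FE0F) between its parts are explicit.

```javascript
// The mechanic emoji: MAN + ZERO WIDTH JOINER + WRENCH + VARIATION SELECTOR-16.
const mechanic = "\u{1F468}\u200D\u{1F527}\uFE0F";

// Hypothetical helper: counts extended grapheme clusters (user-perceived characters).
const graphemeCounter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
function countGraphemes(text) {
  let count = 0;
  for (const _ of graphemeCounter.segment(text)) count++;
  return count;
}

console.log(mechanic.length);          // 6 (UTF-16 code units)
console.log([...mechanic].length);     // 4 (code points, which is what for...of iterates)
console.log(countGraphemes(mechanic)); // 1 (a single grapheme)
```

The count of 1 comes from Unicode's extended grapheme cluster rules, which keep an emoji ZWJ sequence and its trailing variation selector together as one cluster.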
```js
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";

graphemeAt("👨‍🔧️ the fixer", 0); // "👨‍🔧️"
graphemeAt("👨‍🔧️ the fixer", 5); // "👨‍🔧️" (position 5 is still inside the emoji)
graphemeRangeAt("👨‍🔧️ the fixer", 0); // { start: 0, end: 6 }
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 0); // "hello"
wordRangeAt("hello-world", 0); // { start: 0, end: 5 }
```

graphemeAt(string, position): Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
graphemeRangeAt(string, position): Get the start and end positions (end is exclusive) of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
Get all graphemes in string as an array.
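For readers curious how lookups like these can sit on top of the platform API, here is a minimal sketch, assuming the helpers simply wrap Segments.prototype.containing(), which returns the segment covering a UTF-16 index, or undefined when the index is out of range. It is not the segmenter package's source, and allGraphemes is a hypothetical name for the "all graphemes" function, whose real export name isn't given here.

```javascript
// Hypothetical re-implementation for illustration only, not the package's code.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Grapheme covering UTF-16 index `position`; containing() yields undefined
// for negative or too-large positions and for empty strings.
function graphemeAt(string, position) {
  return graphemeSegmenter.segment(string).containing(position)?.segment;
}

// { start, end } with `end` exclusive, or undefined when out of bounds.
function graphemeRangeAt(string, position) {
  const data = graphemeSegmenter.segment(string).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

// Every grapheme, in order (allGraphemes is a made-up name).
function allGraphemes(string) {
  return Array.from(graphemeSegmenter.segment(string), (data) => data.segment);
}

const text = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeAt(text, 5));      // the whole mechanic emoji
console.log(graphemeRangeAt(text, 3)); // { start: 0, end: 6 }
console.log(allGraphemes("a\u{1F468}\u200D\u{1F527}\uFE0F").length); // 2
```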
wordAt(string, position): Get the word at position in string. Returns undefined if position is out of bounds or string is empty.
wordRangeAt(string, position): Get the start and end positions (end is exclusive) of the word at position in string. Returns undefined if position is out of bounds or string is empty.
Get all words in string as an array.
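The word functions can be sketched the same way with granularity "word". This is a hypothetical illustration, not the package's source: allWords is a made-up name, and filtering by isWordLike (which drops spaces and punctuation) is an assumption about what "all words" should return.

```javascript
// Hypothetical sketch of the word helpers on top of Intl.Segmenter.
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

// Segment covering `position` (may be punctuation or whitespace), or undefined.
function wordAt(string, position) {
  return wordSegmenter.segment(string).containing(position)?.segment;
}

// { start, end } with `end` exclusive, or undefined when out of bounds.
function wordRangeAt(string, position) {
  const data = wordSegmenter.segment(string).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

// Only segments the segmenter marks as word-like (assumed behavior).
function allWords(string) {
  const result = [];
  for (const data of wordSegmenter.segment(string)) {
    if (data.isWordLike) result.push(data.segment);
  }
  return result;
}

console.log(wordAt("hello-world", 0));      // hello
console.log(wordRangeAt("hello-world", 0)); // { start: 0, end: 5 }
console.log(allWords("hello-world"));       // [ 'hello', 'world' ]
```

Because this sketch returns whatever segment covers the position, wordAt("hello-world", 5) yields "-"; how the package itself treats positions on punctuation is not stated here.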
Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office." will be split into two sentences, because the period after "Dr" is treated as the end of a sentence.
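The Dr. Smith case can be reproduced directly with the built-in API. A small sketch, using a hypothetical helper splitSentences and assuming the "en" locale:

```javascript
// Splits text into sentence segments with the built-in Intl.Segmenter.
// splitSentences is a hypothetical helper used only for this demonstration.
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function splitSentences(text) {
  return Array.from(sentenceSegmenter.segment(text), (data) => data.segment);
}

// Two segments: "I went to Dr. " and "Smith's office."
console.log(splitSentences("I went to Dr. Smith's office."));
```

The default Unicode sentence-break rules (UAX #29) end a sentence at a period followed by a space and an uppercase letter, and Intl.Segmenter's options (granularity and localeMatcher) include nothing for registering abbreviations.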
Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
Get all sentences in string as an array.
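Since the grapheme, word, and sentence functions differ only in granularity, one way to sketch the sentence trio is with a small factory. Every name here (makeHelpers, sentenceAt, sentenceRangeAt, allSentences) is hypothetical and chosen for illustration, not a confirmed export of the package; the "en" locale is also an assumption.

```javascript
// Hypothetical factory: builds at/rangeAt/all helpers for one granularity.
function makeHelpers(granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return {
    // Segment covering UTF-16 index `position`, or undefined if out of bounds.
    at(string, position) {
      return segmenter.segment(string).containing(position)?.segment;
    },
    // { start, end } with `end` exclusive, or undefined if out of bounds.
    rangeAt(string, position) {
      const data = segmenter.segment(string).containing(position);
      return data && { start: data.index, end: data.index + data.segment.length };
    },
    // Every segment, in order.
    all(string) {
      return Array.from(segmenter.segment(string), (data) => data.segment);
    },
  };
}

const sentenceHelpers = makeHelpers("sentence", "en");
const sentenceAt = sentenceHelpers.at;
const sentenceRangeAt = sentenceHelpers.rangeAt;
const allSentences = sentenceHelpers.all;

const story = "It was late. The shop was closed.";
console.log(sentenceAt(story, 15));      // The shop was closed.
console.log(sentenceRangeAt(story, 0));  // { start: 0, end: 13 }
console.log(allSentences(story).length); // 2
```

The helpers close over their segmenter instead of using this, so they keep working after being pulled off the returned object.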