ngram (Q91511)
From MaRDI portal
Fast n-Gram 'Tokenization'
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | ngram | Fast n-Gram 'Tokenization' | |
Statements
10 December 2023
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also includes a vignette with complete example 'workflows' and information about the utilities it provides.
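The ideas in the description (ordered n-grams, a frequency summary, and a Markov-chain babbler) can be illustrated with a short sketch. This is a pure-Python illustration only, not the package's R interface or its C implementation. It assumes whitespace tokenization, and the function names `ngrams`, `phrase_table`, `build_chain` and `babble` are hypothetical. It also assumes that the babbler's state is the previous n-1 words; the package may define its states differently.

```python
# Hypothetical illustration of n-gram tokenization and babbling;
# not the ngram package's API. Assumes whitespace-separated "words".
import random
from collections import Counter, defaultdict


def ngrams(text, n):
    """Split text on whitespace and return its n-grams, in order, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_table(text, n):
    """Summarize: count how often each n-gram occurs."""
    return Counter(ngrams(text, n))


def build_chain(text, n):
    """Markov chain: map each (n-1)-word prefix to the words seen right after it."""
    chain = defaultdict(list)
    for gram in ngrams(text, n):
        chain[gram[:-1]].append(gram[-1])
    return dict(chain)


def babble(chain, length, seed=None):
    """Generate `length` words by walking the chain; restart at random on a dead end."""
    if not chain:
        raise ValueError("chain is empty; the text has fewer than n words")
    rng = random.Random(seed)
    starts = list(chain)
    state = rng.choice(starts)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # No observed continuation (e.g. the final words of the text): jump to a new start.
            state = rng.choice(starts)
            out.extend(state)
            continue
        word = rng.choice(followers)  # repeated followers are picked more often
        out.append(word)
        state = (state + (word,))[1:]  # slide the window one word forward
    return " ".join(out[:length])


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran"
    print(ngrams(text, 2)[:3])  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on')]
    print(phrase_table(text, 2).most_common(1))  # [(('the', 'cat'), 2)]
    print(babble(build_chain(text, 2), 8, seed=42))
```

Because each prefix keeps every observed follower, including repeats, `rng.choice` selects the next word in proportion to how often it followed that prefix in the text. This makes the walk a Markov chain whose transition probabilities are estimated from the input.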