ngram (Q91511)

From MaRDI portal

Fast n-Gram 'Tokenization'
Language | Label | Description                | Also known as
English  | ngram | Fast n-Gram 'Tokenization' |

    Statements

    Versions (release dates):
    1.0 (23 June 2014)
    1.1 (25 June 2014)
    3.0.0 (10 May 2016)
    3.0.1 (13 July 2016)
    3.0.2 (17 January 2017)
    3.0.3 (24 March 2017)
    3.0.4 (21 November 2017)
    3.2.0 (31 October 2021)
    3.2.1 (14 March 2022)
    3.2.2 (31 October 2022)
    3.2.3 (10 December 2023)
    10 December 2023
    An n-gram is a sequence of n "words" taken, in order, from a body of text. This package is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by efficient C code, which can also be built as a standalone library. The babbler is a simple Markov chain. The package also includes a vignette with complete example 'workflows' and information about the utilities it offers.
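    The description names two techniques: extracting n-grams from text and "babbling" new text with a Markov chain. The Python sketch below illustrates both in general terms. It is not the package's C implementation or its R interface. The helper names `tokenize`, `ngrams`, `build_chain`, and `babble` are hypothetical, whitespace splitting stands in for whatever tokenizer the package actually uses, and the Markov state is assumed to be the most recent n words.

```python
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into "words" on whitespace (assumption: real tokenizers differ)."""
    return text.split()


def ngrams(words, n):
    """Return every run of n consecutive words, in order, as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(words, n):
    """Map each n-gram to the list of words observed immediately after it.

    Duplicates are kept, so a follower seen twice is twice as likely to be drawn.
    """
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def babble(chain, length, seed=None):
    """Generate `length` words by walking the Markov chain (hypothetical helper).

    The state is the last n words; the next word is drawn uniformly from the
    followers recorded for that state. If a state has no recorded follower,
    the walk restarts from a random n-gram. Raises IndexError if the chain is
    empty (text with fewer than n + 1 words).
    """
    rng = random.Random(seed)
    states = list(chain)
    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: jump to a fresh, randomly chosen n-gram.
            state = rng.choice(states)
            out.extend(state)
            continue
        word = rng.choice(followers)
        out.append(word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (word,)
    return out[:length]


if __name__ == "__main__":
    words = tokenize("a b a b c a b")
    print(Counter(ngrams(words, 2)).most_common(1))  # [(('a', 'b'), 3)]
    chain = build_chain(words, 2)
    print(" ".join(babble(chain, 10, seed=1)))
```

    Because each step appends a word that was actually observed after the current state, every consecutive pair in the babbled output is a bigram that occurs in the source text; only the ordering is new.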
