tok
From MaRDI portal
Software:5983037
Fast Text Tokenization
Last update: 17 August 2023
Copyright license: MIT license, File License
Software version identifier: 0.1.0, 0.1.1
Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' (BPE) algorithm <https://huggingface.co/docs/tokenizers/index>. It is designed to be fast both when training new vocabularies and when tokenizing text.
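To illustrate what the Byte-Pair Encoding algorithm does, here is a toy, pure-Python sketch of BPE training and tokenization. It is not the tok package's R interface and not the Hugging Face tokenizers API. The function names (`train_bpe`, `apply_merge`, `tokenize`, and so on) are hypothetical. Production implementations also add pre-tokenization, byte-level alphabets, and far faster data structures. Training repeatedly merges the most frequent adjacent symbol pair. Tokenization then replays the learned merges in the order they were learned.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def apply_merge(symbols, pair):
    """Greedily replace each left-to-right occurrence of `pair` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn an ordered list of merges; each word starts as a tuple of characters."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split a word into characters, then apply the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = train_bpe(["aab", "aab", "ab"], num_merges=5)
    print(merges)                     # [('a', 'b'), ('a', 'ab')]
    print(tokenize("abab", merges))   # ['ab', 'ab']
```

Replaying merges in learned order keeps the sketch short. Real tokenizers typically rank merges and use priority queues or caches to reach the speeds the description refers to.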