tiktoken

Tokeniser

A fast Byte Pair Encoding (BPE) tokeniser for natural language models.

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
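To make the term concrete, here is a minimal sketch of byte-level Byte Pair Encoding in plain Python. It is a toy illustration only, not tiktoken's implementation or API: the helper names (`most_frequent_pair`, `merge_pair`, `train_bpe`, `encode`, `decode`) are hypothetical, and it assumes merges are learned greedily from raw UTF-8 bytes, with new token ids starting at 256.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from raw bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in the order learned
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge_pair(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Turn text into token ids by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id to its byte string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    sample = "aaabdaaabac"
    rules = train_bpe(sample, num_merges=3)
    tokens = encode(sample, rules)
    print(tokens, decode(tokens, rules))
```

Because the base vocabulary covers every byte value, any UTF-8 string can be encoded and decoded losslessly, even text that was never seen while learning the merges. Production tokenisers add pre-tokenisation and far faster merge logic on top of this idea.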

GitHub

13k stars
169 watching
873 forks
Language: Python
Last commit: 4 months ago

Related projects:

Repository | Description | Stars
huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,156
zurawiki/tiktoken-rs | Provides a Rust library for tokenizing text with OpenAI models using tiktoken. | 266
openai/baselines | High-quality implementations of reinforcement learning algorithms for research and development purposes. | 15,885
karpathy/minbpe | An implementation of the Byte Pair Encoding algorithm used in language model tokenization. | 9,253
ricklamers/gpt-code-ui | An interactive code generation and execution tool using AI models. | 3,567
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning. | 22,644
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision. | 72,752
openbmb/bmtools | Tools and platform for building and extending large language models. | 2,907
openai/guided-diffusion | A software implementation of a diffusion model architecture, allowing users to generate synthetic images based on a learned distribution. | 6,366
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487
flagai-open/flagai | An open-source toolkit for training and deploying large-scale AI models on various downstream tasks with multi-modality. | 3,840
waylaidwanderer/node-chatgpt-api | Provides client-side access to ChatGPT and Bing AI APIs using Node.js. | 4,210
openai/transformer-debugger | An open-source tool that helps investigate specific behaviors of small language models by combining automated interpretability techniques with sparse autoencoders. | 4,047
minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,398
orhanerday/open-ai | A PHP SDK for accessing the OpenAI API and interacting with its GPT-3 and DALL-E services. | 2,277