tiktoken

Tokeniser

A fast and efficient tokeniser for natural language models based on Byte Pair Encoding (BPE)

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
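
For readers unfamiliar with Byte Pair Encoding, the idea can be sketched in a few lines of plain Python. This is only an illustrative toy, not tiktoken's implementation or API: the function names (`most_frequent_pair`, `merge`, `train_bpe`, `decode`) are hypothetical, and the sketch assumes the common byte-level variant in which text is first converted to UTF-8 bytes and the most frequent adjacent pair is repeatedly merged into a new token id.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if ids has fewer than two items."""
    counts = Counter(zip(ids, ids[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`, scanning left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges from `text` (hypothetical helper, for illustration only)."""
    ids = list(text.encode("utf-8"))  # start from raw bytes: ids 0..255
    merges = {}                       # (left_id, right_id) -> new_id, in merge order
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step           # new tokens are numbered after the 256 byte values
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges


def decode(ids, merges):
    """Expand token ids back into text using the learned merges."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts preserve insertion (merge) order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    tokens, learned = train_bpe("aaabdaaabac", 3)
    print(len(tokens), "tokens after merging")
    print(decode(tokens, learned))
```

Production tokenisers add pre-splitting rules, special tokens, and a fixed pretrained merge table on top of this loop, so the sketch should be read as a description of the core algorithm only.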

GitHub

12k stars
168 watching
852 forks
Language: Python
Last commit: about 2 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,051 |
| zurawiki/tiktoken-rs | Provides a Rust library for tokenizing text with OpenAI models using tiktoken. | 256 |
| openai/baselines | High-quality implementations of reinforcement learning algorithms for research and development purposes. | 15,810 |
| karpathy/minbpe | An implementation of the Byte Pair Encoding algorithm used in language model tokenization. | 9,185 |
| ricklamers/gpt-code-ui | An interactive code generation and execution tool using AI models. | 3,561 |
| openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning. | 22,516 |
| openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision. | 71,257 |
| openbmb/bmtools | Tools and platform for building and extending large language models. | 2,898 |
| openai/guided-diffusion | This project is a software implementation of a diffusion model architecture, allowing users to generate synthetic images based on a learned distribution. | 6,269 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440 |
| flagai-open/flagai | An open-source toolkit for training and deploying large-scale AI models on various downstream tasks with multi-modality. | 3,830 |
| waylaidwanderer/node-chatgpt-api | Provides client-side access to ChatGPT and Bing AI APIs using Node.js. | 4,204 |
| openai/transformer-debugger | An open-source tool that helps investigate specific behaviors of small language models by combining automated interpretability techniques with sparse autoencoders. | 4,035 |
| minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,397 |
| orhanerday/open-ai | A PHP SDK for accessing the OpenAI API and interacting with its GPT-3 and DALL-E services. | 2,268 |