tiktoken
Tokeniser
A fast, efficient tokeniser for natural language models, based on Byte Pair Encoding (BPE).
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
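For readers unfamiliar with BPE, the following is a minimal pure-Python sketch of the general technique: start from raw bytes, repeatedly merge the most frequent adjacent pair into a new token, then reuse the learned merges to encode and decode text. It is an illustration only, not tiktoken's implementation or API; the function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical, and real tokenisers add pre-tokenisation rules, special tokens, and heavy optimisation.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(data: bytes, num_merges: int):
    """Learn up to `num_merges` merge rules from raw bytes (toy sketch)."""
    ids = list(data)  # initial tokens are the byte values 0..255
    merges = {}       # (left_id, right_id) -> new token id, in learned order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging would not compress anything
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text: str, merges) -> list:
    """Encode text by applying the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges) -> str:
    """Invert `encode` by expanding each token id back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    merges = train_bpe(b"aaabdaaabac", num_merges=3)
    tokens = encode("aaabdaaabac", merges)
    print(tokens, decode(tokens, merges))
```

Because the base vocabulary covers all 256 byte values, any UTF-8 string can be encoded and decoded losslessly, even if it contains characters never seen while learning the merges.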
12k stars
168 watching
852 forks
Language: Python
Last commit: about 2 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,051 |
zurawiki/tiktoken-rs | Provides a Rust library for tokenizing text with OpenAI models using tiktoken. | 256 |
openai/baselines | High-quality implementations of reinforcement learning algorithms for research and development purposes | 15,810 |
karpathy/minbpe | An implementation of the Byte Pair Encoding algorithm used in language model tokenization. | 9,185 |
ricklamers/gpt-code-ui | An interactive code generation and execution tool using AI models | 3,561 |
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,516 |
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision | 71,257 |
openbmb/bmtools | Tools and platform for building and extending large language models | 2,898 |
openai/guided-diffusion | This project is a software implementation of a diffusion model architecture, allowing users to generate synthetic images based on a learned distribution. | 6,269 |
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440 |
flagai-open/flagai | An open-source toolkit for training and deploying large-scale AI models on various downstream tasks with multi-modality | 3,830 |
waylaidwanderer/node-chatgpt-api | Provides client-side access to ChatGPT and Bing AI APIs using Node.js | 4,204 |
openai/transformer-debugger | An open-source tool that helps investigate specific behaviors of small language models by combining automated interpretability techniques with sparse autoencoders. | 4,035 |
minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,397 |
orhanerday/open-ai | A PHP SDK for accessing the OpenAI API and interacting with its GPT-3 and DALL-E services. | 2,268 |