 tiktoken
 Tokeniser
 A fast, efficient Byte Pair Encoding (BPE) tokeniser for natural language models.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
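
For readers unfamiliar with BPE, the idea can be sketched in a few lines of plain Python. This is a toy illustration only, not tiktoken's implementation or API: the function names (`train_bpe`, `merge_pair`, `decode`) are hypothetical, and it learns merges from the input text itself, which is an assumption made to keep the example self-contained. It starts from raw UTF-8 bytes and repeatedly replaces the most frequent adjacent pair with a new token id.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges from `text` (hypothetical helper, toy scale)."""
    ids = list(text.encode("utf-8"))  # base vocabulary: the 256 byte values
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new ids are allocated after the byte range
        merges[pair] = new_id
        ids = merge_pair(ids, pair, new_id)
    return ids, merges


def decode(ids, merges):
    """Expand token ids back to text by replaying the merges in learned order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    text = "aaabdaaabac"
    ids, merges = train_bpe(text, 3)
    print(ids, merges)
    print(decode(ids, merges) == text)
```

Each merge shortens the token sequence while keeping it losslessly reversible, which is the property that makes BPE useful as a tokenisation scheme.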
13k stars
 169 watching
 873 forks
 
Language: Python 
last commit: about 1 year ago

Related projects:

| Repository | Description | Stars | 
|---|---|---|
|  | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,156 | 
|  | Provides a Rust library for tokenizing text with OpenAI models using tiktoken. | 266 | 
|  | High-quality implementations of reinforcement learning algorithms for research and development purposes | 15,885 | 
|  | An implementation of the Byte Pair Encoding algorithm used in language model tokenization. | 9,253 | 
|  | An interactive code generation and execution tool using AI models | 3,567 | 
|  | A repository providing code and models for research into language modeling and multitask learning | 22,644 | 
|  | A general-purpose speech recognition system trained on large-scale weak supervision | 72,752 | 
|  | Tools and platform for building and extending large language models | 2,907 | 
|  | This project is a software implementation of a diffusion model architecture, allowing users to generate synthetic images based on a learned distribution. | 6,366 | 
|  | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487 | 
|  | An open-source toolkit for training and deploying large-scale AI models on various downstream tasks with multi-modality | 3,840 | 
|  | Provides client-side access to ChatGPT and Bing AI APIs using Node.js | 4,210 | 
|  | An open-source tool that helps investigate specific behaviors of small language models by combining automated interpretability techniques with sparse autoencoders. | 4,047 | 
|  | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,398 | 
|  | A PHP SDK for accessing the OpenAI API and interacting with its GPT-3 and DALL-E services. | 2,277 |