BIG-bench
Language model benchmark
A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks.
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
3k stars
51 watching
593 forks
Language: Python
Last commit: 8 months ago
Linked from 2 awesome lists
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks | 2,718 |
| | Guides software developers on how to effectively use and build systems around large language models like GPT-4 | 8,487 |
| | A collection of benchmarking tests for various programming languages | 2,825 |
| | Generates large language model outputs in high-throughput mode on a single GPU | 9,236 |
| | A unified framework for evaluating large language models' performance and robustness in various scenarios | 2,487 |
| | Tools and platform for building and extending large language models | 2,907 |
| | A toolkit for deploying and serving large language models (LLMs) for high-performance text generation | 9,456 |
| | A microbenchmarking library that allows users to measure the execution time of specific code snippets | 9,113 |
| | A toolkit for fine-tuning and running inference on large machine learning models | 8,312 |
| | A platform for training, serving, and evaluating large language models to enable tool-use capabilities | 4,888 |
| | An NLP project offering various text classification models and techniques for deep learning exploration | 7,881 |
| | Measures large language models' understanding across massive multitask Chinese datasets | 87 |
| | A high-performance mixture-of-experts language model with efficient inference capabilities | 3,758 |
| | A framework for evaluating large language models | 4,003 |
| | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |