BIG-bench

Language model benchmark

A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks.

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
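BIG-bench tasks are commonly distributed as JSON files of input/target examples. As a rough illustration (not the project's actual evaluation harness), here is a minimal sketch of scoring a model by exact match on a simplified task of that shape; `toy_model` and the inline task are hypothetical stand-ins:

```python
import json

# A minimal BIG-bench-style JSON task: each example pairs an input
# prompt with a target answer (simplified from the real schema).
task_json = """
{
  "name": "example_arithmetic",
  "examples": [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "What is 3 * 5?", "target": "15"}
  ]
}
"""

def toy_model(prompt: str) -> str:
    # Placeholder "model": evaluates the arithmetic expression in the prompt.
    expr = prompt.removeprefix("What is ").rstrip("?")
    return str(eval(expr))

def exact_match_accuracy(task: dict, model) -> float:
    # Fraction of examples where the model's output exactly matches the target.
    examples = task["examples"]
    hits = sum(model(ex["input"]).strip() == ex["target"] for ex in examples)
    return hits / len(examples)

task = json.loads(task_json)
print(exact_match_accuracy(task, toy_model))  # 1.0 on this toy task
```

Exact match is only one of the metrics used in practice; multiple-choice tasks are typically scored by comparing model log-probabilities across the listed choices instead.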

GitHub

3k stars
51 watching
593 forks
Language: Python
last commit: 6 months ago
Linked from 2 awesome lists


Related projects:

| Repository | Description | Stars |
|---|---|---|
| bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts that help large language models generalize to new tasks | 2,718 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around large language models like GPT-4 | 8,487 |
| kostya/benchmarks | A collection of benchmarking tests for various programming languages | 2,825 |
| fminference/flexllmgen | Runs large language models in high-throughput generation mode on a single GPU | 9,236 |
| microsoft/promptbench | A unified framework for evaluating the performance and robustness of large language models across scenarios | 2,487 |
| openbmb/bmtools | Tools and a platform for building and extending large language models | 2,907 |
| huggingface/text-generation-inference | A toolkit for deploying and serving large language models (LLMs) for high-performance text generation | 9,456 |
| google/benchmark | A microbenchmarking library for measuring the execution time of specific code snippets | 9,113 |
| optimalscale/lmflow | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capability | 4,888 |
| brightmart/text_classification | An NLP project offering various deep learning models and techniques for text classification | 7,881 |
| felixgithub2017/mmcu | A massive multitask Chinese benchmark measuring the understanding abilities of large language models | 87 |
| deepseek-ai/deepseek-v2 | A mixture-of-experts language model with strong performance and efficient inference | 3,758 |
| confident-ai/deepeval | A framework for evaluating large language models | 4,003 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |