lm-evaluation-harness
Evaluation framework
A unified framework for few-shot evaluation of generative language models on a wide range of benchmark tasks.
7k stars
38 watching
2k forks
Language: Python
Last commit: about 1 month ago
Linked from 4 awesome lists
Topics: evaluation-framework, language-model, transformer
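As a quick illustration of what "few-shot evaluation" means in practice, here is a minimal sketch using the harness's Python API (`lm_eval.simple_evaluate`, available in v0.4+); the model checkpoint and task names are illustrative placeholders, not taken from this list.

```python
# Minimal sketch, assuming lm-evaluation-harness v0.4+ (pip install lm-eval).
# The model checkpoint and task names below are placeholder assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF causal LM checkpoint
    tasks=["hellaswag", "arc_easy"],                 # registered benchmark task names
    num_fewshot=5,                                   # in-context examples per prompt
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are keyed under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be expressed on the command line (e.g. `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag,arc_easy --num_fewshot 5`), which is the entry point the project's own documentation favors.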
Related projects:
| Repository | Description | Stars |
|---|---|---|
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing large language model applications with objective metrics, test-data generation, and seamless integrations. | 7,598 |
| microsoft/promptbench | A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios. | 2,487 |
| microsoft/lmops | A research initiative focused on fundamental technology for improving the performance and efficiency of large language models. | 3,747 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models. | 8,312 |
| parisneo/lollms-webui | An all-encompassing tool providing a web interface to access various AI models for tasks such as text generation, image analysis, and music generation. | 4,394 |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) providing observability, analytics, and management tools. | 7,123 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,683 |
| brexhq/prompt-engineering | Guides software developers on effectively using and building systems around large language models such as GPT-4. | 8,487 |
| ianarawjo/chainforge | An environment for battle-testing prompts to large language models (LLMs) and evaluating response quality and performance. | 2,413 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP. | 9,551 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capability. | 4,888 |
| openai/evals | A framework for evaluating large language models and systems, with a registry of benchmarks. | 15,168 |
| young-geng/easylm | A framework for training and serving large language models with JAX/Flax. | 2,428 |