jury
NLP evaluator
A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation.
Comprehensive NLP Evaluation System
187 stars
5 watching
20 forks
Language: Python
Last commit: 7 months ago
Linked from 1 awesome list
Topics: datasets, evaluate, evaluation, huggingface, machine-learning, metrics, natural-language-processing, nlp, nlp-evaluation, python, pytorch, transformers
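For context, a minimal usage sketch of the toolkit (the `Jury` scorer class, the metric names, and the call signature below follow the pattern shown in the project's README, but are illustrative assumptions; check the repository for the current API):

```python
# pip install jury  (assumed package name)
from jury import Jury

# Build a scorer with a chosen set of metrics; jury wraps
# HuggingFace evaluate-style metrics behind one interface.
scorer = Jury(metrics=["bleu", "rouge"])

# Each item may have multiple candidate predictions and references.
predictions = [
    ["the cat is on the mat"],
    ["Look, a wonderful day."],
]
references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day."],
]

# Compute all configured metrics in one call; returns a dict of scores.
scores = scorer(predictions=predictions, references=references)
print(scores)
```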
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| A toolset for evaluating and comparing natural language generation models | 1,350 |
| An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance. | 2,063 |
| A framework for evaluating language models on NLP tasks | 326 |
| An expression evaluator library written in Go. | 41 |
| Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| An interactive environment for evaluating code within a running program. | 1,806 |
| Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273 |
| An automatic evaluation tool for large language models | 1,568 |
| An evaluation suite for assessing chart understanding in multimodal large language models. | 85 |
| An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks. | 4 |
| An all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends. | 879 |
| A comprehensive Python toolbox for evaluating salient object detection and camouflaged object detection tasks | 168 |
| A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
| An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance. | 1,650 |
| An API for evaluating the quality of text prompts used in Large Language Models (LLMs) based on perplexity estimation | 43 |