jury
NLP evaluator
A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation.
Comprehensive NLP Evaluation System
187 stars
5 watching
20 forks
Language: Python
last commit: over 1 year ago
Linked from 1 awesome list
Topics: datasets, evaluate, evaluation, huggingface, machine-learning, metrics, natural-language-processing, nlp, nlp-evaluation, python, pytorch, transformers
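The listing describes jury as a metric toolkit for NLP experiments, so a minimal usage sketch may help; the `Jury` class, its callable interface, and the `metrics` keyword below follow the pattern shown in the project's upstream README and are assumptions here, not verified against the current release.

```python
from jury import Jury  # assumes the package exposes a Jury class, per upstream docs

# Multiple candidates per sample and multiple references per sample are
# supported by the library's documented input format (assumption).
predictions = [
    ["the cat is on the mat", "There is a cat playing on the mat"],
    ["Look! What a wonderful day."],
]
references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day", "The weather outside is wonderful."],
]

# Request a specific metric set; `metrics` is the assumed keyword name.
scorer = Jury(metrics=["bleu", "meteor", "rouge"])
scores = scorer(predictions=predictions, references=references)
print(scores)
```

Under these assumptions, the nested lists illustrate multi-reference scoring, and the returned `scores` is a mapping from metric names to their computed values.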
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A toolset for evaluating and comparing natural language generation models. | 1,350 |
| | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance. | 2,063 |
| | A framework for evaluating language models on NLP tasks. | 326 |
| | An expression evaluator library written in Go. | 41 |
| | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| | An interactive environment for evaluating code within a running program. | 1,806 |
| | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273 |
| | An automatic evaluation tool for large language models. | 1,568 |
| | An evaluation suite for assessing chart understanding in multimodal large language models. | 85 |
| | An evaluation framework, based on analogy tasks, for Polish word embeddings prepared by various research groups. | 4 |
| | An all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends. | 879 |
| | A comprehensive Python toolbox for evaluating salient object detection and camouflaged object detection tasks. | 168 |
| | A collection of tools and utilities for evaluating the performance and quality of OCR output. | 57 |
| | An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance. | 1,650 |
| | An API for evaluating the quality of text prompts used in Large Language Models (LLMs) based on perplexity estimation. | 43 |