tonic_validate
Quality checker
A framework for evaluating and monitoring the quality of large language model outputs in Retrieval Augmented Generation (RAG) applications, with metrics that score the responses your RAG pipeline produces.
271 stars
14 watching
27 forks
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list
evaluation-framework, evaluation-metrics, large-language-models, llm, llmops, llms, rag, retrieval-augmented-generation
Related projects:
Repository | Description | Stars |
---|---|---|
parsifal-47/muterl | A mutation-testing tool that verifies test quality by introducing small changes to code and checking whether the tests catch them. | 15 |
testdriverai/goodlooks | A tool to visually validate web pages using natural language prompts instead of traditional selectors. | 38 |
raphaelstolt/lean-package-validator-action | Tools to validate the size and contents of software packages during continuous integration. | 0 |
angelognazzo/reliable-trustworthy-ai | An implementation of a DeepPoly-based verifier for robustness analysis in deep neural networks. | 2 |
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models. | 230 |
krrishdholakia/betterprompt | An API for evaluating the quality of text prompts used in Large Language Models (LLMs) based on perplexity estimation. | 43 |
gomate-community/rageval | An evaluation tool for Retrieval-Augmented Generation methods. | 141 |
cmader/qskos | A tool for identifying quality issues in SKOS vocabularies, integrating with online services and development workflows. | 65 |
orsinium-labs/flake8-pylint | An extension for flake8 that integrates PyLint to check Python code quality and detect potential errors. | 8 |
qcri/llmebench | A benchmarking framework for large language models. | 81 |
alecthomas/voluptuous | A Python data validation library providing simple and flexible ways to validate complex data structures. | 1,823 |
brettz9/eslint-config-ash-nazg | A comprehensive configuration for JavaScript projects with enhanced error checking and code quality control. | 6 |
whyhow-ai/rule-based-retrieval | A Python package that enables the creation and management of Retrieval Augmented Generation applications with filtering capabilities. | 229 |
s-weigand/flake8-nb | A tool to check Python code quality in Jupyter notebooks. | 28 |
allenai/olmo-eval | A framework for evaluating language models on NLP tasks. | 326 |