tonic_validate

Quality checker

A framework for evaluating and monitoring the quality of large language model outputs in Retrieval Augmented Generation applications.

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
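
As a rough illustration of what a response-quality metric for a RAG application can look like, the sketch below scores a generated answer against a reference answer and against the retrieved context using plain token overlap. It is not the tonic_validate API: the function names `tokenize`, `answer_overlap_f1`, and `context_support` are hypothetical, and token overlap is an assumed stand-in for whatever scoring the framework actually uses.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_overlap_f1(answer: str, reference: str) -> float:
    """Token-level F1 between a generated answer and a reference answer.

    Hypothetical helper for illustration only; not part of tonic_validate.
    """
    answer_counts = Counter(tokenize(answer))
    reference_counts = Counter(tokenize(reference))
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    shared = sum((answer_counts & reference_counts).values())
    if shared == 0:
        return 0.0
    precision = shared / sum(answer_counts.values())
    recall = shared / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


def context_support(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that also occur in the retrieved contexts.

    Hypothetical helper for illustration only; not part of tonic_validate.
    """
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    context_vocab = {tok for ctx in contexts for tok in tokenize(ctx)}
    supported = sum(1 for tok in answer_tokens if tok in context_vocab)
    return supported / len(answer_tokens)
```

A low `context_support` value would suggest that the answer contains wording not found in the retrieved passages, while `answer_overlap_f1` compares the answer with a known-good reference. Both are crude lexical proxies meant only to make the idea of a RAG response metric concrete.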

GitHub

258 stars
14 watching
29 forks
Language: Python
Last commit: 7 days ago
Linked from 1 awesome list

evaluation-framework, evaluation-metrics, large-language-models, llm, llmops, llms, rag, retrieval-augmented-generation


Related projects:

Repository | Description | Stars
parsifal-47/muterl | A tool for verifying test quality by introducing small changes to code and checking if tests pass. | 15
testdriverai/goodlooks | A tool to visually validate web pages using natural language prompts instead of traditional selectors. | 37
raphaelstolt/lean-package-validator-action | Tools to validate the size and contents of software packages during continuous integration | 0
angelognazzo/reliable-trustworthy-ai | An implementation of a DeepPoly-based verifier for robustness analysis in deep neural networks | 1
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224
krrishdholakia/betterprompt | An API for evaluating the quality of text prompts used in Large Language Models (LLMs) based on perplexity estimation | 38
gomate-community/rageval | An evaluation tool for Retrieval-augmented Generation methods | 132
cmader/qskos | A tool for identifying quality issues in SKOS vocabularies, integrating with online services and development workflows. | 65
orsinium-labs/flake8-pylint | An extension for flake8 that integrates PyLint to check Python code quality and detect potential errors. | 8
qcri/llmebench | A benchmarking framework for large language models | 80
alecthomas/voluptuous | A Python data validation library that provides simple and expressive validation of complex data structures. | 1,819
brettz9/eslint-config-ash-nazg | A comprehensive configuration for JavaScript projects with enhanced error checking and code quality control. | 6
whyhow-ai/rule-based-retrieval | A Python package for creating and managing RAG applications with advanced filtering capabilities | 222
s-weigand/flake8-nb | A tool to check Python code quality in Jupyter notebooks. | 28
allenai/olmo-eval | An evaluation framework for large language models. | 310