fiddler-auditor

LLM auditor

Fiddler Auditor is an auditing tool for evaluating language models and identifying their weaknesses before deployment.

GitHub

173 stars
8 watching
20 forks
Language: Python
Last commit: 10 months ago
Linked from 1 awesome list

Tags: ai-observability, evaluation, generative-ai, langchain, llms, nlp, robustness

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mlabonne/llm-autoeval | Automates the evaluation of large language models in Google Colab using various benchmarks and custom parameters. | 566 |
| dperfly/fiddler2jmeter | Converts Fiddler/Charles requests to JMeter scripts, with filtering support. | 47 |
| iamgroot42/mimir | A Python package for measuring memorization in large language models. | 126 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities. | 50 |
| qcri/llmebench | A benchmarking framework for large language models. | 81 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 56 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods. | 535 |
| howiehwong/trustllm | A toolkit for assessing the trustworthiness of large language models. | 491 |
| hardlycodeman/audit_helper | Automates Foundry boilerplate setup for smart contract audits. | 20 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics. | 455 |
| thmsmlr/instructor_ex | A library providing structured outputs for Large Language Models (LLMs) in Elixir. | 587 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks. | 326 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,450 |
| open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273 |
| adebayoj/fairml | An auditing toolbox for assessing the fairness of black-box predictive models. | 361 |