fiddler-auditor

LLM auditor

An auditing tool to evaluate large language models and identify weaknesses before deployment.

GitHub

171 stars
8 watching
20 forks
Language: Python
last commit: 9 months ago
Linked from 1 awesome list

Tags: ai-observability, evaluation, generative-ai, langchain, llms, nlp, robustness

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 558 |
| dperfly/fiddler2jmeter | Tools to convert Fiddler/Charles requests into JMeter scripts, with filtering support | 47 |
| iamgroot42/mimir | Measures memorization in large language models (LLMs) to detect potential privacy issues | 121 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| declare-lab/instruct-eval | An evaluation framework for instruction-tuned large language models | 528 |
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 466 |
| hardlycodeman/audit_helper | Automates Foundry boilerplate setup for smart contract audits | 20 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating LLM applications and pipelines with customizable metrics | 446 |
| thmsmlr/instructor_ex | A library providing structured outputs for large language models (LLMs) in Elixir | 558 |
| allenai/olmo-eval | An evaluation framework for large language models | 310 |
| mlgroupjlu/llm-eval-survey | A collection of papers and resources on evaluating large language models | 1,433 |
| open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmark | 267 |
| adebayoj/fairml | An auditing toolbox to assess the fairness of black-box predictive models | 360 |