fiddler-auditor

LLM auditor

Fiddler Auditor is an auditing tool for evaluating language models and identifying their weaknesses before deployment.

GitHub

173 stars
8 watching
20 forks
Language: Python
Last commit: 10 months ago
Linked from 1 awesome list

Tags: ai-observability, evaluation, generative-ai, langchain, llms, nlp, robustness

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mlabonne/llm-autoeval | Automates the evaluation of large language models in Google Colab using various benchmarks and custom parameters. | 566 |
| dperfly/fiddler2jmeter | Converts Fiddler/Charles requests to JMeter scripts, with filtering support. | 47 |
| iamgroot42/mimir | A Python package for measuring memorization in large language models. | 126 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities. | 50 |
| qcri/llmebench | A benchmarking framework for large language models. | 81 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 56 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods. | 535 |
| howiehwong/trustllm | A toolkit for assessing the trustworthiness of large language models. | 491 |
| hardlycodeman/audit_helper | Automates Foundry boilerplate setup for smart contract audits. | 20 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics. | 455 |
| thmsmlr/instructor_ex | A library providing structured outputs for Large Language Models (LLMs) in Elixir. | 587 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks. | 326 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,450 |
| open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273 |
| adebayoj/fairml | An auditing toolbox for assessing the fairness of black-box predictive models. | 361 |