fiddler-auditor
LLM auditor
Fiddler Auditor is a tool for evaluating language models and identifying weaknesses in LLMs before deployment.
171 stars
8 watching
20 forks
Language: Python
last commit: 9 months ago
Linked from 1 awesome list
Topics: ai-observability, evaluation, generative-ai, langchain, llms, nlp, robustness
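
To illustrate the kind of check an LLM auditor performs, the sketch below perturbs a prompt and flags responses that drift semantically from the original answer. This is a minimal, hypothetical example, not Fiddler Auditor's actual API: the `get_completion` and `paraphrase` helpers are stand-ins for an LLM client and a perturbation step, and the 0.8 similarity threshold is an arbitrary choice.

```python
# Minimal sketch of a prompt-robustness audit in the spirit of Fiddler Auditor.
# `get_completion` and `paraphrase` are hypothetical stand-ins; replace them
# with a real LLM client and a real perturbation strategy.
from sentence_transformers import SentenceTransformer, util


def get_completion(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client."""
    raise NotImplementedError


def paraphrase(prompt: str, n: int = 5) -> list[str]:
    """Hypothetical perturbation step, e.g. n paraphrases of the prompt."""
    raise NotImplementedError


def audit_prompt(prompt: str, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Flag perturbed prompts whose responses drift from the reference answer."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    reference = model.encode(get_completion(prompt))
    failures = []
    for variant in paraphrase(prompt):
        response = model.encode(get_completion(variant))
        score = util.cos_sim(reference, response).item()
        if score < threshold:  # response is semantically inconsistent
            failures.append((variant, score))
    return failures
```
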
Related projects:
Repository | Description | Stars
---|---|---
mlabonne/llm-autoeval | A tool that automates the evaluation of large language models in Google Colab using various benchmarks and custom parameters. | 558
dperfly/fiddler2jmeter | Converts Fiddler/Charles requests to JMeter scripts, with support for filtering. | 47
iamgroot42/mimir | Measures memorization in large language models (LLMs) to detect potential privacy issues. | 121
h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities. | 50
qcri/llmebench | A benchmarking framework for large language models. | 80
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction-tuning methods. | 528
howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models. | 466
hardlycodeman/audit_helper | Automates Foundry boilerplate setup for smart contract audits. | 20
relari-ai/continuous-eval | A comprehensive framework for evaluating large language model (LLM) applications and pipelines with customizable metrics. | 446
thmsmlr/instructor_ex | A library that provides structured outputs for large language models (LLMs) in Elixir. | 558
allenai/olmo-eval | An evaluation framework for large language models. | 310
mlgroupjlu/llm-eval-survey | A collection of papers and resources for evaluating large language models. | 1,433
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 267
adebayoj/fairml | An auditing toolbox for assessing the fairness of black-box predictive models. | 360