giskard

AI auditor

Automates detection and evaluation of performance, bias, and security issues in AI applications

🐢 Open-Source Evaluation & Testing for ML & LLM systems
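For context, a minimal sketch of how such an automated scan might be run with giskard's Python API (assuming the `giskard.Model`, `giskard.Dataset`, and `giskard.scan` entry points of the 2.x releases; the synthetic dataset and scikit-learn classifier below are purely illustrative):

```python
import giskard
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Build a small synthetic tabular dataset (placeholder for a real one).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = [f"f{i}" for i in range(4)]
df = pd.DataFrame(X, columns=feature_names)
df["label"] = y

# Train any model; here a plain scikit-learn classifier stands in.
clf = LogisticRegression().fit(df[feature_names], df["label"])

# Wrap the model and data so giskard can probe them.
giskard_model = giskard.Model(
    model=lambda batch: clf.predict_proba(batch[feature_names]),
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=feature_names,
)
giskard_dataset = giskard.Dataset(df, target="label")

# Run the automated scan (performance, robustness, bias probes) and export a report.
report = giskard.scan(giskard_model, giskard_dataset)
report.to_html("scan_report.html")
```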

GitHub

4k stars
33 watching
267 forks
Language: Python
Last commit: 6 days ago
Linked from 3 awesome lists

Topics: ai-red-team, ai-safety, ai-security, ai-testing, ethical-artificial-intelligence, evaluation-framework, fairness-ai, llm, llm-eval, llm-evaluation, llm-security, llmops, ml-safety, ml-testing, ml-validation, mlops, rag-evaluation, red-team-tools, responsible-ai, trustworthy-ai

Related projects:

| Repository | Description | Stars |
|---|---|---|
| confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,391 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights | 7,233 |
| princeton-nlp/swe-agent | A tool that uses AI to automatically fix issues in software repositories | 13,714 |
| deepset-ai/haystack | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods | 17,691 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
| trusted-ai/aif360 | A comprehensive toolkit for detecting and mitigating bias in machine learning models and datasets | 2,457 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,015 |
| langgenius/dify | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently | 51,873 |
| trusted-ai/adversarial-robustness-toolbox | A Python library that provides tools and techniques to defend against various attacks on machine learning models and applications | 4,878 |
| iamgroot42/mimir | Measures memorization in Large Language Models (LLMs) to detect potential privacy issues | 121 |
| i-gallegos/fair-llm-benchmark | Compiles bias evaluation datasets and provides access to original data sources for large language models | 110 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure | 4,754 |
| assafelovic/gpt-researcher | An autonomous research agent that gathers and summarizes information from various sources to generate comprehensive reports with citations | 14,904 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,334 |