giskard
AI auditor
Automates the detection and evaluation of performance, bias, and security issues in AI applications (see the usage sketch below)
🐢 Open-Source Evaluation & Testing for ML & LLM systems
4k stars
33 watching
267 forks
Language: Python
Last commit: 6 days ago
Linked from 3 awesome lists
Topics: ai-red-team, ai-safety, ai-security, ai-testing, ethical-artificial-intelligence, evaluation-framework, fairness-ai, llm, llm-eval, llm-evaluation, llm-security, llmops, ml-safety, ml-testing, ml-validation, mlops, rag-evaluation, red-team-tools, responsible-ai, trustworthy-ai
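
To illustrate the automated scan mentioned in the description, here is a minimal sketch that wraps a toy scikit-learn classifier and runs Giskard's vulnerability scan. The toy dataset, column names, and the HTML export are illustrative assumptions, and argument names may differ between giskard versions; treat this as a sketch of the documented workflow, not a definitive example.

```python
# Sketch: wrap a toy scikit-learn model and run giskard's automated scan.
# Assumes a giskard 2.x-style scan API; argument names may vary by version.
import giskard
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy tabular dataset standing in for a real application.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, 36],
    "income": [30, 45, 80, 62, 90, 28, 70, 52],
    "approved": [0, 0, 1, 1, 1, 0, 1, 1],
})
clf = LogisticRegression().fit(df[["age", "income"]], df["approved"])

# Wrap the prediction function and data so giskard can probe them.
wrapped_model = giskard.Model(
    model=lambda d: clf.predict_proba(d[["age", "income"]]),
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=["age", "income"],
)
wrapped_data = giskard.Dataset(df, target="approved")

# Run the automated scan for performance, robustness, and bias issues,
# then export the findings as an HTML report.
report = giskard.scan(wrapped_model, wrapped_data)
report.to_html("giskard_scan_report.html")
```

The scan report lists detected issues (e.g. performance degradation on data slices, robustness or bias findings), which can then be turned into regression tests.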
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,391 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights | 7,233 |
| princeton-nlp/swe-agent | A tool that uses AI to automatically fix issues in software repositories | 13,714 |
| deepset-ai/haystack | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods | 17,691 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
| trusted-ai/aif360 | A comprehensive toolkit for detecting and mitigating bias in machine learning models and datasets | 2,457 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,015 |
| langgenius/dify | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently | 51,873 |
| trusted-ai/adversarial-robustness-toolbox | A Python library that provides tools and techniques to defend against various attacks on machine learning models and applications | 4,878 |
| iamgroot42/mimir | Measures memorization in Large Language Models (LLMs) to detect potential privacy issues | 121 |
| i-gallegos/fair-llm-benchmark | Compiles bias evaluation datasets and provides access to original data sources for large language models | 110 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure | 4,754 |
| assafelovic/gpt-researcher | An autonomous research agent that gathers and summarizes information from various sources to generate comprehensive reports with citations | 14,904 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,334 |