giskard

AI auditor

Automates the detection of performance, bias, and security issues in AI applications

🐢 Open-Source Evaluation & Testing for AI & LLM systems

GitHub

4k stars

33 watching

274 forks

Language: Python

last commit: 10 months ago

Linked from 3 awesome lists

ai-red-team, ai-safety, ai-security, ai-testing, ethical-artificial-intelligence, evaluation-framework, fairness-ai, llm, llm-eval, llm-evaluation, llm-security, llmops, ml-safety, ml-testing, ml-validation, mlops, rag-evaluation, red-team-tools, responsible-ai, trustworthy-ai

docs.giskard.ai

Related projects:

Repository	Description	Stars
confident-ai/deepeval	A framework for evaluating large language models	4,003
evidentlyai/evidently	An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines	5,519
explodinggradients/ragas	A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations.	7,598
deepset-ai/haystack	An AI orchestration framework to build customizable LLM applications with advanced retrieval methods.	18,094
llmware-ai/llmware	A framework for building enterprise LLM-based applications using small, specialized models	8,303
trusted-ai/aif360	A comprehensive toolkit for detecting and mitigating bias in machine learning models and datasets.	2,483
openai/evals	A framework for evaluating large language models and systems, providing a registry of benchmarks.	15,168
langgenius/dify	An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently.	54,931
trusted-ai/adversarial-robustness-toolbox	A Python library that provides tools and techniques to defend against various attacks on machine learning models and applications.	4,945
iamgroot42/mimir	A Python package for measuring memorization in Large Language Models.	126
i-gallegos/fair-llm-benchmark	Compiles bias evaluation datasets and provides access to original data sources for large language models	115
promptfoo/promptfoo	A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure	4,976
assafelovic/gpt-researcher	An autonomous research agent that gathers and summarizes information from various sources to generate comprehensive reports with citations.	15,255
ianarawjo/chainforge	An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance.	2,413