giskard
AI auditor
Automates the detection of performance, bias, and security issues in AI applications
🐢 Open-Source Evaluation & Testing for AI & LLM systems
4k stars
33 watching
274 forks
Language: Python
Last commit: 3 months ago
Linked from 3 awesome lists
Topics: ai-red-team, ai-safety, ai-security, ai-testing, ethical-artificial-intelligence, evaluation-framework, fairness-ai, llm, llm-eval, llm-evaluation, llm-security, llmops, ml-safety, ml-testing, ml-validation, mlops, rag-evaluation, red-team-tools, responsible-ai, trustworthy-ai
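For context on what the automated detection looks like in practice, here is a minimal sketch of a scan run, assuming the quickstart-style API the giskard package documents (`giskard.Model`, `giskard.Dataset`, `giskard.scan`); the synthetic data and dummy prediction function below are illustrative placeholders, not part of the project's examples.

```python
import numpy as np
import pandas as pd
import giskard

# Tiny synthetic tabular dataset; "label" is the target column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=200),
    "income": rng.normal(50_000, 15_000, size=200),
    "label": rng.integers(0, 2, size=200),
})

# Dummy prediction function returning class probabilities;
# in practice this would wrap a real model's predict_proba.
def predict_fn(data: pd.DataFrame) -> np.ndarray:
    p = (data["age"] / 100).clip(0, 1).to_numpy()
    return np.column_stack([1 - p, p])

model = giskard.Model(
    model=predict_fn,
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=["age", "income"],
)
dataset = giskard.Dataset(df=df, target="label")

# Run the automated scan for performance, robustness, and bias issues,
# then export the findings as an HTML report.
report = giskard.scan(model, dataset)
report.to_html("giskard_scan_report.html")
```

The resulting report groups detected issues by type (e.g., performance drops on data slices, robustness failures), which is what the "AI auditor" description above refers to.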
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A framework for evaluating large language models. | 4,003 |
| | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines. | 5,519 |
| | A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations. | 7,598 |
| | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods. | 18,094 |
| | A framework for building enterprise LLM-based applications using small, specialized models. | 8,303 |
| | A comprehensive toolkit for detecting and mitigating bias in machine learning models and datasets. | 2,483 |
| | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,168 |
| | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently. | 54,931 |
| | A Python library that provides tools and techniques to defend against various attacks on machine learning models and applications. | 4,945 |
| | A Python package for measuring memorization in Large Language Models. | 126 |
| | Compiles bias evaluation datasets and provides access to original data sources for large language models. | 115 |
| | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| | An autonomous research agent that gathers and summarizes information from various sources to generate comprehensive reports with citations. | 15,255 |
| | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |