TruthfulQA
Truthfulness checker
Evaluates whether language models answer questions truthfully or instead reproduce common human falsehoods
TruthfulQA: Measuring How Models Imitate Human Falsehoods
618 stars
8 watching
71 forks
Language: Jupyter Notebook
last commit: about 1 year ago

Related projects:
Repository | Description | Stars |
---|---|---|
rowanz/grover | A framework for defending against neural fake news through both generation and detection of fake news articles. | 917 |
findalexli/scigraphqa | A dataset and benchmarking framework for evaluating the performance of large language models on multi-turn question answering tasks for scientific graphs. | 37 |
nyu-mll/bbq | A dataset and benchmarking framework to evaluate the performance of question answering models on detecting and mitigating social biases. | 87 |
gair-nlp/factool | An open-source framework for detecting factual errors in AI-generated text | 825 |
0x4d31/deception-as-detection | Maps deception detection techniques to the ATT&CK framework and provides documentation for security professionals | 285 |
adoreste/truehunter | Detects encrypted files using a fast and memory efficient approach without external dependencies. | 30 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
strongqa/howitzer | A Ruby-based framework for acceptance testing with flexibility and scalability for different testing tools and cloud services. | 261 |
yosefk/checkedthreads | A parallelism framework with automatic load balancing and Valgrind-based instrumentation that detects race conditions in multithreaded code. | 290 |
truera/trulens | A tool to evaluate and track the performance of large language model (LLM) experiments | 2,163 |
masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models | 8 |
rifkybujana/fnd | A machine learning-based system to predict whether news articles are fake or not | 8 |
ai4risk/antifraud | Develops and evaluates machine learning models for detecting financial fraud | 174 |
jagilley/fact-checker | A tool for fact-checking LLM outputs with self-ask using prompt chaining | 286 |
yg-smile/rl_vvc_dataset | A collection of benchmarks and implementations for testing reinforcement learning-based Volt-VAR control algorithms | 20 |