TruthfulQA

Truth detector

Evaluating how often language models reproduce common human falsehoods

TruthfulQA: Measuring How Models Imitate Human Falsehoods

631 stars

8 watching

74 forks

Language: Jupyter Notebook

Last commit: over 1 year ago


arxiv.org/abs/2109.07958

Related projects:

Repository	Description	Stars
rowanz/grover	A framework for defending against neural fake news through both generation and detection of fake news articles.	918
findalexli/scigraphqa	A dataset and benchmarking framework for evaluating the performance of large language models on multi-turn question answering tasks for scientific graphs.	38
nyu-mll/bbq	A dataset and benchmarking framework to evaluate the performance of question answering models on detecting and mitigating social biases.	92
gair-nlp/factool	An open-source framework for detecting factual errors in AI-generated text.	839
0x4d31/deception-as-detection	Maps deception detection techniques to the ATT&CK framework and provides documentation for security professionals.	287
adoreste/truehunter	Detects encrypted files using a fast and memory efficient approach without external dependencies.	30
jiasenlu/hiecoattenvqa	A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.	349
strongqa/howitzer	A Ruby-based framework for acceptance testing with flexibility and scalability for different testing tools and cloud services.	261
yosefk/checkedthreads	A parallelism framework with automatic load balancing that detects race conditions in multithreaded code using Valgrind-based instrumentation.	290
truera/trulens	A tool to evaluate and track the performance of large language model (LLM) experiments.	2,233
masaiahhan/correlationqa	An investigation into the relationship between misleading images and hallucinations in large language models.	8
rifkybujana/fnd	An AI-powered tool that detects whether news articles are fake or not.	8
ai4risk/antifraud	Develops and evaluates machine learning models for detecting financial fraud.	195
jagilley/fact-checker	A tool for fact-checking LLM outputs with self-ask using prompt chaining.	289
yg-smile/rl_vvc_dataset	A collection of benchmarks and implementations for testing reinforcement learning-based Volt-VAR control algorithms.	20