TruthfulQA

Truthfulness checker

Evaluates whether language models answer questions truthfully instead of imitating common human falsehoods

TruthfulQA: Measuring How Models Imitate Human Falsehoods

GitHub

618 stars
8 watching
71 forks
Language: Jupyter Notebook
Last commit: about 1 year ago

Related projects:

Repository | Description | Stars
rowanz/grover | A framework for defending against neural fake news through both generation and detection of fake news articles. | 917
findalexli/scigraphqa | A dataset and benchmarking framework for evaluating the performance of large language models on multi-turn question answering tasks for scientific graphs. | 37
nyu-mll/bbq | A dataset and benchmarking framework to evaluate the performance of question answering models on detecting and mitigating social biases. | 87
gair-nlp/factool | An open-source framework for detecting factual errors in AI-generated text. | 825
0x4d31/deception-as-detection | Maps deception detection techniques to the ATT&CK framework and provides documentation for security professionals. | 285
adoreste/truehunter | Detects encrypted files using a fast and memory-efficient approach without external dependencies. | 30
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349
strongqa/howitzer | A Ruby-based framework for acceptance testing with flexibility and scalability for different testing tools and cloud services. | 261
yosefk/checkedthreads | A parallelism framework that detects and prevents race conditions in multithreaded code by automatically load balancing and using Valgrind-based instrumentation. | 290
truera/trulens | A tool to evaluate and track the performance of large language model (LLM) experiments. | 2,163
masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models. | 8
rifkybujana/fnd | A machine learning-based system to predict whether news articles are fake or not. | 8
ai4risk/antifraud | Develops and evaluates machine learning models for detecting financial fraud. | 174
jagilley/fact-checker | A tool for fact-checking LLM outputs with self-ask using prompt chaining. | 286
yg-smile/rl_vvc_dataset | A collection of benchmarks and implementations for testing reinforcement learning-based Volt-VAR control algorithms. | 20