FakeNewsCorpus

News corpus

A large dataset of news articles, labeled by category, for training fake news detection algorithms.

A dataset of millions of news articles scraped from a curated list of data sources.
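The corpus contains millions of articles, so loading it all into memory at once may be impractical. Below is a minimal sketch for counting articles per category by reading one row at a time. It makes several assumptions that the project description does not confirm: the data is a CSV file with a header row, and the label sits in a column named `type`. The helper `count_labels` is hypothetical and not part of the project.

```python
import csv
from collections import Counter
from typing import Iterable

# Full article bodies can exceed the csv module's default field limit
# (131072 characters), so raise it. 2**31 - 1 fits in a C long on every
# platform, unlike sys.maxsize on Windows.
csv.field_size_limit(2**31 - 1)


def count_labels(lines: Iterable[str], label_column: str = "type") -> Counter:
    """Tally rows per label, streaming one CSV row at a time.

    ASSUMPTION: the input has a header row and a label column named
    `label_column`; "type" is a guess, not a documented column name.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        # Short rows yield None for missing fields; treat None and "" alike.
        label = (row.get(label_column) or "").strip()
        counts[label or "<missing>"] += 1
    return counts
```

To run it on a local copy, pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your download. Opening with `newline=""` lets the csv module correctly handle line breaks inside quoted article text.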

385 stars

16 watching

97 forks

last commit: over 6 years ago

artificial-intelligence, corpus, database, dataset, fakenews, machine-learning, natural-language-processing, nlp

Related projects:

Repository	Description	Stars
cluebenchmark/cluecorpus2020	A large-scale Chinese corpus for pre-training language models.	927
christos-c/bible-corpus	A multilingual parallel corpus created from translations of the Bible.	177
chatopera/insuranceqa-corpus-zh	An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks.	1,019
nytud/hucopa	A dataset and annotation scheme for Hungarian causal reasoning tasks.	1
poltextlab/hunempoli_corpus	A manually annotated Hungarian corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA).	0
rifkybujana/fnd	An AI-powered tool that detects whether news articles are fake.	8
zake7749/gossiping-chinese-corpus	A collection of question-answer pairs extracted from online Chinese forums.	236
rowanz/grover	A framework for defending against neural fake news through both generation and detection of fake news articles.	918
certainlyio/corona_dataset	A collection of data for training chatbots to answer COVID-19-related questions.	11
blairconrad/selfinitializingfakes	A framework for creating reusable fake objects that keep persistent behavior after initial setup.	11
bertez/corpora	A collection of Galician language data in JSON format.	2
cyberboysumanjay/inshorts-news-api	An unofficial API to fetch news content from Inshorts using Flask and Python.	228
ibm/max-news-text-generator	Generates English-language text similar to news articles using machine learning and natural language processing techniques.	26
jbaiter/archiscribe-corpus	A repository of transcribed 19th-century German texts from various sources.	8
josecannete/spanish-corpora	A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks.	92