FakeNewsCorpus

News corpus

A large dataset of news articles with labeled categories, intended for training fake news recognition algorithms.

A dataset of millions of news articles scraped from a curated list of data sources.

GitHub

385 stars
16 watching
97 forks
last commit: almost 5 years ago
Topics: artificial-intelligence, corpus, database, dataset, fakenews, machine-learning, natural-language-processing, nlp

Related projects:

Repository | Description | Stars
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925
christos-c/bible-corpus | A multilingual parallel corpus created from translations of the Bible. | 176
chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,020
nytud/hucopa | A dataset of Hungarian translations of English 'cause-and-effect' questions with plausible alternative answers | 1
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models of Aspect Based Sentiment Analysis (ABSA) in Hungarian language. | 0
rifkybujana/fnd | A machine learning-based system to predict whether news articles are fake or not | 8
zake7749/gossiping-chinese-corpus | A collection of question-answer pairs extracted from online Chinese forums. | 238
rowanz/grover | A framework for defending against neural fake news through both generation and detection of fake news articles. | 917
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions | 11
blairconrad/selfinitializingfakes | A framework for creating reusable fake objects with persistent behavior after the initial setup | 11
bertez/corpora | A collection of Galician language data in JSON format. | 2
cyberboysumanjay/inshorts-news-api | An unofficial API to fetch news content from Inshorts using Flask and Python. | 226
ibm/max-news-text-generator | Generates English-language text similar to news articles using machine learning and natural language processing techniques. | 26
jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8
josecannete/spanish-corpora | A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks. | 92