StackOverflow-Question-Code-Dataset

Question dataset

A collection of question-code pairs mined from Stack Overflow, used to train and evaluate AI models

StaQC: a systematically mined dataset of around 148K Python and 120K SQL question-code pairs from Stack Overflow, as described in the paper "StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow" (WWW'18)

GitHub

166 stars

7 watching

28 forks

Language: Python

last commit: almost 5 years ago

Linked from 1 awesome list:

src-d/awesome-machine-learning-on-source-code

Related projects:

Repository	Description	Stars
crownpku/small-chinese-corpus	A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering.	529
ysu1989/graphquestions	A characteristic-rich dataset for factoid question answering with explicit specification of question characteristics and logical forms.	92
src-d/datasets	Provides datasets and tools for analyzing source code along several dimensions, such as programming languages and commits.	323
ymcui/cmrc2018	A collection of data for evaluating Chinese machine reading comprehension systems	419
pku-yuangroup/open-sora-dataset	A large video dataset collected from various open-source websites for use in computer vision and multimedia applications.	94
certainlyio/corona_dataset	A collection of data to train chatbots on COVID-19-related questions	11
maluuba/newsqa	Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research.	253
thu-coai/cdial-gpt	A large-scale Chinese conversation dataset and pre-trained dialog models for text generation	1,799
srush/minichain	A tiny library for using large language models in code generation and debugging	1,221
ujjwalkarn/datasciencepython	A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics.	5,301
websail-nu/codah	Releases an adversarially constructed commonsense question-answering dataset for testing common sense in natural language understanding	22
fido-ai/ua-datasets	Provides a collection of datasets for natural language processing in Ukrainian.	57
witiko/semeval-2016_2017-task3-subtaskb-english	Converts XML datasets to JSON for the SemEval community question answering Task 3, Subtask B	1
pratyushmaini/llm_dataset_inference	Detects whether a given text sequence is part of the training data used to train a large language model.	23
mikegu721/xiezhibenchmark	An evaluation suite for assessing language models' performance on multiple-choice questions	93