StackOverflow-Question-Code-Dataset

Question dataset

A collection of mined question-code pairs from Stack Overflow used for training and testing AI models

StaQC: a systematically mined dataset of around 148K Python and 120K SQL question-code pairs, as described in "StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow" (WWW'18).

GitHub

165 stars
7 watching
28 forks
Language: Python
Last commit: about 3 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
--- | --- | ---
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531
ysu1989/graphquestions | A characteristic-rich dataset for factoid question answering with explicit specification of question characteristics and logical forms. | 92
src-d/datasets | Provides datasets and tools for analyzing source code in various aspects such as programming languages, commits, and more. | 323
ymcui/cmrc2018 | A collection of data for evaluating Chinese machine reading comprehension systems. | 415
pku-yuangroup/open-sora-dataset | A large video dataset collected from various open-source websites for use in computer vision and multimedia applications. | 94
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions. | 11
maluuba/newsqa | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation. | 1,782
srush/minichain | A tiny library for using large language models in code generation and debugging. | 1,215
ujjwalkarn/datasciencepython | A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics. | 5,276
websail-nu/codah | Releases an adversarially constructed commonsense question-answering dataset for testing common sense in natural language understanding. | 22
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 56
witiko/semeval-2016_2017-task3-subtaskb-english | Converts XML datasets to JSON for community question answering task 3 subtask b. | 1
pratyushmaini/llm_dataset_inference | Detects whether a given text sequence is part of the training data used to train a large language model. | 23
mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance in multi-choice questions. | 91