StackOverflow-Question-Code-Dataset
Question dataset
A collection of mined question-code pairs from Stack Overflow used for training and testing AI models
StaQC: a systematically mined dataset of around 148K Python and 120K SQL question-code pairs, as described in "StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow" (WWW'18).
165 stars
7 watching
28 forks
Language: Python
Last commit: about 3 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531 |
ysu1989/graphquestions | A characteristic-rich dataset for factoid question answering with explicit specification of question characteristics and logical forms. | 92 |
src-d/datasets | Provides datasets and tools for analyzing source code in various aspects such as programming languages, commits, and more. | 323 |
ymcui/cmrc2018 | A collection of data for evaluating Chinese machine reading comprehension systems | 415 |
pku-yuangroup/open-sora-dataset | A large video dataset collected from various open-source websites for use in computer vision and multimedia applications. | 94 |
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions | 11 |
maluuba/newsqa | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253 |
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782 |
srush/minichain | A tiny library for using large language models in code generation and debugging | 1,215 |
ujjwalkarn/datasciencepython | A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics. | 5,276 |
websail-nu/codah | Releases an adversarially constructed commonsense question-answering dataset for testing common sense in natural language understanding | 22 |
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 56 |
witiko/semeval-2016_2017-task3-subtaskb-english | Converts the XML datasets for SemEval-2016/2017 Task 3, Subtask B (community question answering, English) to JSON. | 1 |
pratyushmaini/llm_dataset_inference | Detects whether a given text sequence is part of the training data used to train a large language model. | 23 |
mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance in multi-choice questions | 91 |