StackOverflow-Question-Code-Dataset
Question dataset
A collection of question-code pairs mined from Stack Overflow, used for training and testing AI models.
StaQC is a systematically mined dataset of around 148K Python and 120K SQL question-code pairs, as described in "StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow" (WWW'18).
166 stars
7 watching
28 forks
Language: Python
Last commit: over 3 years ago
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 529 |
| | A characteristic-rich dataset for factoid question answering with explicit specification of question characteristics and logical forms. | 92 |
| | Provides datasets and tools for analyzing source code in various aspects such as programming languages, commits, and more. | 323 |
| | A collection of data for evaluating Chinese machine reading comprehension systems. | 419 |
| | A large video dataset collected from various open-source websites for use in computer vision and multimedia applications. | 94 |
| | A collection of data to train chatbots on COVID-19-related questions. | 11 |
| | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253 |
| | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation. | 1,799 |
| | A tiny library for using large language models in code generation and debugging. | 1,221 |
| | A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics. | 5,301 |
| | Releases an adversarially constructed commonsense question-answering dataset for testing common sense in natural language understanding. | 22 |
| | Provides a collection of datasets for natural language processing in Ukrainian. | 57 |
| | Converts XML datasets to JSON for community question answering task 3 subtask b. | 1 |
| | Detects whether a given text sequence is part of the training data used to train a large language model. | 23 |
| | An evaluation suite to assess language models' performance in multi-choice questions. | 93 |