datasets
Code analysis datasets
Provides datasets and tools for analyzing various aspects of source code, such as programming languages and commits.
source{d} datasets ("big code") for source code analysis and machine learning on source code
323 stars
20 watching
82 forks
Language: Jupyter Notebook
last commit: almost 5 years ago
Linked from 2 awesome lists
Tags: dataset, datasets, git, github, machine-learning, mlosc
Related projects:
Repository | Description | Stars |
---|---|---|
src-d/gemini | A tool for searching and identifying similar code in large source code repositories | 54 |
haskell-suite/haskell-src-exts | A toolset for manipulating and analyzing Haskell source code | 193 |
src-d/apollo | A system to identify near-duplicate code projects and files by analyzing their similarities | 52 |
rhiokim/grunt-sloc | Analyzes the source lines of code in JavaScript projects to report code complexity and quality metrics | 23 |
asciidisco/sloccount | Analyzes code files and reports line counts of specific languages in a standard format | 6 |
littleyuyu/stackoverflow-question-code-dataset | A collection of mined question-code pairs from Stack Overflow used for training and testing AI models | 165 |
flosse/sloc | A tool to analyze and report on the size of source code in various programming languages | 945 |
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402 |
schochastics/networkdata | Provides access to a collection of network datasets in igraph format | 142 |
rucaibox/recsysdatasets | A repository of public data sources for recommender systems | 856 |
hernanmd/designinfo | Tools for analyzing and visualizing code metrics in Pharo Smalltalk projects | 4 |
sciruby/daru | A Ruby library for data analysis and manipulation, providing intuitive APIs and tools for data visualization, statistics, and more | 1,042 |
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 681 |
srush/minichain | A tiny library for using large language models in code generation and debugging | 1,215 |
guxd/deep-code-search | A tool for searching and analyzing code based on its structure and content | 279 |