datasets

Code analysis datasets

Provides datasets and tools for analyzing source code, covering aspects such as programming languages and commits.

source{d} datasets ("big code") for source code analysis and machine learning on source code

GitHub

323 stars
20 watching
82 forks
Language: Jupyter Notebook
Last commit: almost 5 years ago
Linked from 2 awesome lists

dataset, datasets, git, github, machine-learning, mlosc

Related projects:

Repository | Description | Stars
src-d/gemini | A tool for searching and identifying similar code in large source code repositories | 54
haskell-suite/haskell-src-exts | A toolset for manipulating and analyzing Haskell source code | 193
src-d/apollo | A system to identify near-duplicate code projects and files by analyzing their similarities | 52
rhiokim/grunt-sloc | Analyzes the source lines of code in JavaScript projects to report code complexity and quality metrics | 23
asciidisco/sloccount | Analyzes code files and reports line counts of specific languages in a standard format | 6
littleyuyu/stackoverflow-question-code-dataset | A collection of mined question-code pairs from Stack Overflow used for training and testing AI models | 165
flosse/sloc | A tool to analyze and report on the size of source code in various programming languages | 945
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402
schochastics/networkdata | Provides access to a collection of network datasets in igraph format | 142
rucaibox/recsysdatasets | A repository of public data sources for recommender systems | 856
hernanmd/designinfo | Tools for analyzing and visualizing code metrics in Pharo Smalltalk projects | 4
sciruby/daru | A Ruby library for data analysis and manipulation, providing intuitive APIs and tools for data visualization and statistics | 1,042
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 681
srush/minichain | A tiny library for using large language models in code generation and debugging | 1,215
guxd/deep-code-search | A tool for searching and analyzing code based on its structure and content | 279