datasets
Code analysis datasets
Provides datasets and tools for analyzing various aspects of source code, such as programming languages and commits.
source{d} datasets ("big code") for source code analysis and machine learning on source code
323 stars
20 watching
82 forks
Language: Jupyter Notebook
Last commit: over 5 years ago
Linked from 2 awesome lists
Tags: dataset, datasets, git, github, machine-learning, mlosc
Related projects:
Repository | Description | Stars |
---|---|---|
| A tool for searching and identifying similar code in large source code repositories. | 54 |
| A toolset for manipulating and analyzing Haskell source code | 194 |
| A system to identify near-duplicate code projects and files by analyzing their similarities | 52 |
| Analyzes the source lines of code in JavaScript projects to report code complexity and quality metrics | 23 |
| Analyzes code files and reports per-language line counts in a standard format | 6 |
| A collection of mined question-code pairs from Stack Overflow used for training and testing AI models | 166 |
| A tool to analyze and report on the size of source code in various programming languages | 951 |
| A tool to help data scientists manage and annotate natural language data for training AI models | 1,405 |
| Provides access to a collection of network datasets in igraph format | 144 |
| A repository of public data sources for recommender systems | 887 |
| Tools for analyzing and visualizing code metrics in Pharo Smalltalk projects | 4 |
| A Ruby library for data analysis and manipulation, with APIs and tools for data visualization and statistics | 1,044 |
| A Clojure library for efficient tabular data processing and analysis | 687 |
| A tiny library for using large language models in code generation and debugging | 1,221 |
| A tool for searching and analyzing code based on its structure and content | 279 |