Awesome-Bioinformatics

Bioinformatics toolkit

A curated list of software tools and resources for bioinformatics analysis and computational biology

A curated list of awesome Bioinformatics libraries and software.


Awesome Bioinformatics / Data Tools / Downloading

web Go Get Data; A command line interface for obtaining genomic data. [ ]
web Easily get SRA download links and other information. [ ]

Awesome Bioinformatics / Data Tools / Compressing

web A compressor of common genomic file formats (BAM, CRAM, FASTQ, VCF etc). [ | ]

Awesome Bioinformatics / Data Processing / Command Line Utilities

web Modular and universal bioinformatics. Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows. [ ]
paper-2018 Syntax highlighting for computational biology file formats (SAM, VCF, GTF, FASTA, PDB, etc.) in vim/less/gedit/sublime. [ | ]
web Utilities for working with CSV/Tab-delimited files. [ ]
web Another cross-platform, efficient, practical and pretty CSV/TSV toolkit. [ ]
web Data transformations and statistics. [ ]
General parallelizer that runs jobs in parallel on a single multi-core machine. Here are some example scripts using GNU Parallel. [ ]
paper-2011 Table file index. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Workflow Managers

paper-2014 A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [ | ]
web A small language for defining pipeline stages and linking them together to make pipelines. [ ]
web A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [ ]
web A Workflow Management System geared towards scientific workflows. [ ]
paper-2018 A popular open-source, web-based platform for data-intensive biomedical research. Has several features, from data analysis to workflow management to visualization tools. [ | ]
paper-2018 A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [ | ]
paper-2010 Computation pipeline library for Python widely used in science and bioinformatics. [ | ]
paper-2019 Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output. [ | ]
paper-2010 Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [ | ]
paper-2018 A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ | ]
web Workflow standard developed by the Broad. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Pipelines

web A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes. [ ]
web A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results. [ ]
web Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. [ ]
web Customizable pipeline for differential expression analysis with an intuitive GUI. [ ]
web A pipeline for preprocessing short and long sequencing reads, built with Nextflow. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Sequence Processing

paper-2017 Automatic filtering, trimming, error removal and quality control for FASTQ data. [ ]
web A quality control tool for high throughput sequence data. [ ]
web FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities. [ ]
paper-2016 Aggregate results from bioinformatics analyses across many samples into a single report. [ | ]
paper-2021 Sequence manipulation toolkit for FASTA/FASTQ files written in Nim. [ | ]
paper-2016 A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang. [ | ]
web Convenient file format conversion using Biopython. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Data Analysis

paper-2018 Scalable gVCF merging and joint variant calling for population sequencing projects. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Sequence Alignment

paper-2012 An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. [ | ]
paper-2020 The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment. [ ]
paper-2016 SIMD C library for global, semi-global, and local pairwise sequence alignments. [ ]
paper-1999 A system for rapidly aligning entire genomes, whether in complete or draft form. [ | | | ]
paper-2021 An ultrafast protein aligner for and like searches. [ ]
paper-2002 Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences. [ ]
paper-2017 Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets. [ | ]

Awesome Bioinformatics / Next Generation Sequencing / Quantification

paper-2010 Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. [ ]
paper-2011 A software package for estimating gene and isoform expression levels from RNA-Seq data. [ | ]

Awesome Bioinformatics / Next Generation Sequencing / Variant Calling

paper-2018 Deep learning-based variant caller. [ ]
web Bayesian haplotype-based polymorphism discovery and genotyping. [ ]
web Variant Discovery in High-Throughput Sequencing Data. [ ]
paper-2021 A polymorphic Bayesian genotyping model with wide applicability. [ ]
paper-2009 samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants. [ | ]
paper-2012 Structural variant discovery by integrated paired-end and split-read analysis. [ ]
paper-2014 lumpy: a general probabilistic framework for structural variant discovery. [ ]
paper-2015 Structural variant and indel caller for mapped sequencing data. [ ]
paper-2017 GRIDSS: the Genomic Rearrangement IDentification Software Suite. [ ]

Awesome Bioinformatics / Next Generation Sequencing / BAM File Utilities

paper-2011 Collection of tools for working with BAM files. [ ]
paper-2017 Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing. [ ]
paper-2010 Displaying sequence statistics for next-generation sequencing. [ | ]
paper-2020 Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs. [ ]
paper-2014 Telseq is a tool for estimating telomere length from whole genome sequence data. [ ]

Awesome Bioinformatics / Next Generation Sequencing / VCF File Utilities

paper-2016 Set of tools for manipulating VCF files. [ | | ]
paper-2016 Annotate a VCF with other VCFs/BEDs/tabixed files. [ ]
paper-2011 VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst). [ ]

Awesome Bioinformatics / Next Generation Sequencing / GFF BED File Utilities

web Suite of tools to handle gene annotations in any GTF/GFF format. [ ]
web GFF and GTF file manipulation and interconversion. [ ]
paper-2012 The fast, highly scalable and easily-parallelizable genome analysis toolkit. [ ]
paper-2010 A Swiss Army knife for genome arithmetic. [ | | ]

Awesome Bioinformatics / Next Generation Sequencing / Variant Simulation

web Tools for adding mutations to existing files, used for testing mutation callers. [ ]
web Read simulator. [ ]

Awesome Bioinformatics / Next Generation Sequencing / Variant Prediction/Annotation

paper-2003 Predicts whether an amino acid substitution affects protein function. [ | ]
paper-2012 Genetic variant annotation and effect prediction toolbox. [ | ]
paper-2016 The VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. [ | ]

Awesome Bioinformatics / Next Generation Sequencing / Python Modules

paper-2013 Pythonic access to the UCSC Genome database. [ ]
web Pythonic access to the Ensembl database. [ ]
paper-2013 Access to Biological Web Services from Python. [ | ]
A port of pyVCF using Cython for speed.
paper-2017 Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF. [ | ]
Python wrapper for bedtools. [ | ]
Python wrapper for samtools. [ ]
web A VCF Parser for Python. [ ]

Awesome Bioinformatics / Visualization / Genome Browsers / Gene Diagrams

paper-2018 Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations. [ | ]
paper-2011 Embeddable genome viewer. Integrates data from a wide variety of sources and can load data directly from popular genomics file formats including bigWig, BAM, and VCF. [ | ]
paper-2014 BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies. [ | ]
paper-2014 Flexible circular visualization of genome-associated data with BioPerl and SVG. [ ]
paper-2016 Horizon chart D3-based JavaScript library for DNA data. [ | ]
paper-2019 Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats. [ | ]
paper-2015 D3 JavaScript based genome viewer. Constructs SVGs. [ ]
paper-2016 JavaScript genome browser that is highly customizable via plugins and track customizations. [ | ]
paper-2018 Point and click, cross platform suite for analysing and visualizing next-generation sequencing datasets. [ | ]
paper-2016 JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers. [ ]
paper-2012 JavaScript library for drawing canvas-based gene diagrams. [ | ]
web A modern sequence alignment viewer. [ ]
paper-2009 Perl package for circular plots, which are well suited for genomic rearrangements. [ | ]
paper-2015 An interactive web-based service of Circos. [ ]
paper-2014 R package for circular plots for omics data. [ | ]
paper-2014 A Java application for doing interactive work with circos plots. [ | ]
paper-2013 R package for circular plots. [ | ]
paper-2018 A circos representation of multiple GWAS results. [ ]

Awesome Bioinformatics / Database Access

Entrez Direct: E-utilities on the UNIX command line - UNIX command line tools to access NCBI's databases programmatically. Instructions to install and examples are found in the link

Awesome Bioinformatics / Resources / Becoming a Bioinformatician

What is a bioinformatician
Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies
Top N Reasons To Do A Ph.D. or Post-Doc in Bioinformatics/Computational Biology
A 10-Step Guide to Party Conversation For Bioinformaticians - Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'
A History Of Bioinformatics (In The Year 2039) - A talk by C. Titus Brown on his take of looking back at bioinformatics from the year 2039. His notes for this talk can be found
A farewell to bioinformatics - A critical view of the state of bioinformatics
A Series of Interviews with Notable Bioinformaticians - Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."
Open Source Society University on Bioinformatics - Solid path for those of you who want to complete a Bioinformatics course on your own time, for free, with courses from the best universities in the world
Rosalind - Rosalind is a platform for learning bioinformatics through problem solving
A guide for the lonely bioinformatician - This guide is aimed at bioinformaticians, and is meant to guide them towards better career development
A brief history of bioinformatics

Awesome Bioinformatics / Resources / Bioinformatics on GitHub

Awesome-alternative-splicing - List of resources on alternative splicing including software, databases, and other tools
Awesome AI-based Protein Design - A collection of research papers for AI-based protein design

Awesome Bioinformatics / Resources / Sequencing

Next-Generation Sequencing Technologies - Elaine Mardis (2014) [1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research
Annotated bibliography of *Seq assays - List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery
For all you seq... (PDF) (3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included

Awesome Bioinformatics / Resources / RNA-Seq

Review papers on RNA-seq (Biostars) - Includes lots of seminal papers on RNA-seq and analysis methods
Informatics for RNA-seq: A web resource for analysis on the cloud - Educational resource on performing RNA-seq analysis in the cloud using Amazon AWS cloud services. Topics include preparing the data, preprocessing, differential expression, isoform discovery, data visualization, and interpretation
RNA-seqlopedia - RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment
A survey of best practices for RNA-seq data analysis - Gives an awesome roadmap for RNA-seq computational analyses, including challenges/obstacles and things to look out for, but also how you might integrate RNA-seq data with other data types
Stories from the Supplement [46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and and explains some of their methodologies
List of RNA-seq Bioinformatics Tools - Extensive list on Wikipedia of RNA-seq bioinformatics tools needed in analysis, covering all parts of an analysis pipeline from quality control and alignment to splice analysis and visualization
RNA-seq Analysis - Notes on various steps and considerations when doing RNA-seq analysis

Awesome Bioinformatics / Resources / ChIP-Seq

ChIP-seq analysis notes from Tommy Tang - Resources on ChIP-seq data which include papers, methods, links to software, and analysis

Awesome Bioinformatics / Resources / YouTube Channels and Playlists

Current Topics in Genome Analysis 2016 - Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine
GenomeTV - "GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."
Leading Strand - Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on
Genomics, Big Data and Medicine Seminar Series - "Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."
Rafael Irizarry's Channel - Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics
NIH VideoCasting and Podcasting - "NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but many great talks on domain-specific use of bioinformatics and genomics

Awesome Bioinformatics / Resources / Blogs

ACGT - Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."
Opiniomics - Dr. Mick Watson writes on bioinformatics, genomes, and biology
Bits of DNA - Dr. Lior Pachter writes reviews and commentary on computational biology
it is NOT junk - Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"
#!/perl/bioinfo - The Computational and Structural Biology group at EEAD-CSIC writes, in Spanish and English, about ideas and code for plant genomics, computational and structural biology problems

Awesome Bioinformatics / Resources / Miscellaneous

The Leek group guide to genomics papers - Expertly curated genomics papers to get up to speed on genomics, RNA-seq, statistics (used in genomics), software development, and more
A New Online Computational Biology Curriculum - "This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."
How Perl Saved the Human Genome Project - An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project
Educational Papers from Nature Biotechnology and PLoS Computational Biology - Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics
The PeerJ Bioinformatics Software Tools Collection - Collection of tools curated by Keith Crandall and Claus White, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ

Awesome Bioinformatics / Online networking groups

Bioinformatics (on Discord) - A Discord server for general bioinformatics
r-bioinformatics - The official Slack workspace of r/bioinformatics
BioinformaticsGRX - A community of bioinformaticians based in Granada, Spain
Comunidad de Desarrolladores de Software en Bioinformática - A community of bioinformaticians centered in Latin America
COMBINE - An Australian group for bioinformatics students
