CVMFS Modules

C3G, in partnership with Compute Canada, offers and maintains a large set of bioinformatics resources for the community.

Here you can find the list of software currently deployed at several HPC centers:

Anaconda

Anaconda is a freemium open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment.

Versions available: 2-4.0.0

ASCAT

A tool for accurate dissection of genome-wide allele-specific copy number in tumors.

Versions available: 2.3

Aspera Connect

High-performance transfer plug-in

Versions available: 3.3.3

ATLAS

A data warehouse for integrative bioinformatics

Versions available: 3.10.2

BCFtools

Utilities for variant calling and manipulating VCFs and BCFs

Versions available: 1.2, 1.3

Beagle

Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.

Versions available: 4.r1128

bedtools

A software suite for the comparison, manipulation and annotation of genomic features in browser extensible data (BED) and general feature format (GFF) format.

Versions available: 2.17.0, 2.22.1, 2.25.0

BisSNP

A bisulfite space genotyper & methylation caller

Versions available: 0.82.2

BLAST

Basic Local Alignment Search Tool

Versions available: 2.2.29+, 2.3.0+

Bowtie

An ultrafast, memory-efficient short read aligner.

Versions available: 1.0.0

Bowtie2

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Versions available: 2.2.3, 2.2.4

BreakDancer

A Perl/C++ package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads.

Versions available: 1.1_2011_02_21

bvatools

BVATools — Bam and Variant Analysis Tools

Versions available: 1.3, 1.4, 1.5, 1.6

BWA

A software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

Versions available: 0.7.10, 0.7.12

Bwakit

Bwakit is a self-consistent installation-free package of scripts and precompiled binaries, providing an end-to-end solution to read mapping.

Versions available: 0.7.12

CD-HIT

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

Versions available: 4.5.4-2011-03-07

Cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

Versions available: 2.2.1

ea-utils

Command-line tools for processing biological sequencing data: barcode demultiplexing, adapter trimming, etc.

Versions available: 1.1.2-537

EPACTS

A versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.

Versions available: 3.2.6

Exonerate

A generic tool for pairwise sequence comparison.

Versions available: 2.2.0

FastQC

A quality control tool for high throughput sequence data.

Versions available: 0.11.2

FastTree

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.

Versions available: 2.1.7

FASTX-Toolkit

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

Versions available: 0.0.13.2

FLASH

FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies.

Versions available: 1.2.11, 1.2.8
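
The merging idea described for FLASH can be illustrated with a toy sketch. This is not FLASH's actual algorithm, which scores overlaps and tolerates mismatches; the version below only looks for the longest exact overlap, and the function names and the min_overlap parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair using the longest exact suffix/prefix overlap.

    read2 is taken as sequenced (opposite strand), so it is
    reverse-complemented first. Returns the merged fragment, or None
    when no overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for size in range(longest, min_overlap - 1, -1):
        if read1[-size:] == mate[:size]:
            return read1 + mate[size:]
    return None


# A 37 bp fragment sequenced with 25 bp reads: the fragment is shorter
# than twice the read length, so the two reads overlap by 13 bases.
fragment = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGTA"
read_length = 25
read1 = fragment[:read_length]
read2 = reverse_complement(fragment[-read_length:])
merged = merge_pair(read1, read2)
print(merged == fragment)
```

When the fragment is at least twice the read length the reads do not overlap, and merge_pair returns None.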

gcc

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,…). GCC was originally written as the compiler for the GNU operating system.

Versions available: 4.9.3

GEMINI

Flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome.

Versions available: 0.18.0, 0.18.2, 0.18.3

GEM

The GEM library strives to be a true “next-generation” tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task.

Versions available: v1.315

Genome Analysis Toolkit

Versions available: 3.2-2, 3.3-0, 3.5

Ghostscript

An interpreter for the PostScript language and for PDF.

Versions available: 8.7

Gnuplot

Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.

Versions available: 4.6.4, 4.6.6

HMMER

HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

Versions available: 2.3.2, 3.1b1, 3.1b2

HOMER

HOMER offers tools and methods for interpreting Next-gen-Seq experiments.  In addition to Genome Browser/UCSC visualization support and peak finding [and motif finding of course], HOMER can help assemble data across multiple experiments and look at positional specific relationships between sequencing tags, motifs, and other features. You do not need to use the peak finding methods in this package to use motif finding.

Versions available: 4.7

HTSlib

A C library for reading/writing high-throughput sequencing data

Versions available: 1.2.1, 1.3

IGV

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

Versions available: 2.3.23

igvtools

The igvtools utility provides a set of tools for pre-processing data files. File names must contain an accepted file extension, e.g. test-xyz.bam.

Versions available: 2.3.14, 2.3.67

Java

Versions available: openjdk-jdk1.6.0_38, openjdk-jdk1.7.0_60, openjdk-jdk1.8.0_72

JELLYFISH

JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA.

Versions available: 2.1.3
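
As a rough illustration of what k-mer counting means, here is a minimal sketch. JELLYFISH itself relies on far more memory-efficient data structures; the function name count_kmers is hypothetical and the approach below is only suitable for small inputs.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGTAC", 3)
for kmer, count in sorted(counts.items()):
    print(kmer, count)
```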

KmerGenie

KmerGenie estimates the best k-mer length for genome de novo assembly.

Versions available: 1.5692

KronaTools

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Versions available: 2.6.1

LAPACK

LAPACK provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.

Versions available: 3.5.0

MACS

Model-based Analysis of ChIP-Seq (MACS) for short-read sequencers such as the Genome Analyzer (Illumina / Solexa)

Versions available: 2.0.10.09132012

MACS2

Novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites.

Versions available: 2.1.0.20140616, 2.1.0.20151222

miRDeep2

miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs. The tool reports known and hundreds of novel microRNAs with high accuracy in seven species representing the major animal clades. The low consumption of time and memory combined with user-friendly interactive graphic output makes miRDeep2 accessible for straightforward application in current research.

Versions available: 2_0_0_5

mpich

MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.

Versions available: 3.1.4

mugqic_pipelines

MUGQIC pipelines consist of Python scripts which create a list of jobs running Bash commands. Those scripts support dependencies between jobs and a smart restart mechanism if some jobs fail during pipeline execution. Jobs can be submitted in different ways: by being sent to a PBS scheduler like Torque, or by being run as a series of commands in batch through a Bash script.

Versions available: 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0
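
The idea of jobs with dependencies and a restart mechanism can be sketched as follows. This is not the mugqic_pipelines code: the job dictionary layout, the function names and the ".done" marker files are all hypothetical, chosen only to show how a Bash script could be generated that skips jobs completed in an earlier run.

```python
from pathlib import Path


def order_jobs(jobs):
    """Return job names so that every job comes after its dependencies.

    jobs maps a job name to a tuple (command, [dependency names]).
    """
    ordered, seen = [], set()

    def visit(name, stack=()):
        if name in seen:
            return
        if name in stack:
            raise ValueError(f"dependency cycle at {name}")
        for dep in jobs[name][1]:
            visit(dep, stack + (name,))
        seen.add(name)
        ordered.append(name)

    for name in jobs:
        visit(name)
    return ordered


def render_script(jobs, done_dir):
    """Build a Bash script that runs only the jobs without a .done marker."""
    lines = ["#!/bin/bash", "set -e"]
    for name in order_jobs(jobs):
        marker = Path(done_dir) / f"{name}.done"
        if marker.exists():
            continue  # finished in an earlier run, so skipped on restart
        command = jobs[name][0]
        lines.append(f"{command} && touch {marker}")
    return "\n".join(lines) + "\n"


# Hypothetical three-step pipeline; the commands are placeholders.
jobs = {
    "align": ("echo align", ["trim"]),
    "trim": ("echo trim", []),
    "call": ("echo call", ["align"]),
}
print(render_script(jobs, "job_output"), end="")
```

Re-running render_script after a failure would emit only the jobs whose marker file is still missing, which is one simple way to get restart behaviour.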

mugqic_R_packages

This library implements various -seq downstream analyses, as well as Nozzle-based reporting for mugqic_pipelines.

Versions available: 1.0.1, 1.0.2, 1.0.3, 1.0.4

mugqic_tools

Perl, Python, R, awk and sh scripts used in several bioinformatics pipelines of the MUGQIC pipelines.

Versions available: 2.0.2, 2.0.3, 2.1.0, 2.1.1, 2.1.3, 2.1.4, 2.1.5, 2.1.6

MUMmer

Ultra-fast alignment of large-scale DNA and protein sequences

Versions available: 3.23

MUSCLE

Program for creating multiple alignments of protein sequences.

Versions available: 3.8.31

MuTect

Reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes

Versions available: 1.1.6

NextClip

Tool for analysing reads from LMP libraries, generating a comprehensive quality report and extracting good quality trimmed and deduplicated reads

Versions available: b833dd9

OpenBLAS

Optimized BLAS library based on GotoBLAS2 1.13 BSD version

Versions available: 0.2.14, 0.2.17

Pandoc

Universal document converter

Versions available: 1.13.1, 1.15.2

parallel

Shell tool for executing jobs in parallel using one or more computers

Versions available: 20130822

pbs-drmaa

DRMAA for Torque/PBS Pro is an implementation of the Open Grid Forum DRMAA (Distributed Resource Management Application API) specification for submitting and controlling jobs on PBS systems

Versions available: 1.0.18

perl

Feature-rich programming language

Versions available: 5.18.2, 5.22.1

Picard

Set of tools (in Java) for working with next generation sequencing data in the BAM format

Versions available: 1.118, 1.123, 2.0.1

pigz

Replacement for gzip that exploits multiple processors and multiple cores when compressing data

Versions available: 2.3

PRINSEQ-lite

Used to filter, reformat, or trim your genomic and metagenomic sequence data

Versions available: 0.20.3, 0.20.4

Python

Programming language that lets you work quickly and integrate systems more effectively

Versions available: 2.7.10_qiime, 2.7.11, 2.7.8

R_Bioconductor

Tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language

Versions available: 3.1.2_3.0, 3.2.3_3.2

Ray

Parallel genome assemblies for parallel DNA sequencing

Versions available: 2.3.1

RNAmmer

Predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences.

Versions available: 1.2

RNA-SeQC

Java program which computes a series of quality control metrics for RNA-seq data

Versions available: 1.1.7, 1.1.8

RSEM

Accurate quantification of gene and isoform expression from RNA-Seq data

Versions available: 1.2.12

SAMtools

A suite of programs for interacting with high-throughput sequencing data.

Versions available: 0.1.19, 1.0, 1.2, 1.3

Scalpel

Software package for detecting INDELs (INsertions and DELetions) mutations in a reference genome

Versions available: 0.3.2, 0.5.2

ShortStack

Tool developed to process and analyze small RNA-seq data with respect to a reference genome, and output a comprehensive and informative annotation of all discovered small RNA genes

Versions available: 3.3

SignalP

Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms

Versions available: 4.1

SMRT-Analysis

PacBio secondary analysis through a graphical or command-line user interface.

Versions available: 2.3.0.140936.p1, 2.3.0.140936.p2, 2.3.0.140936.p4, 2.3.0.140936.p5

SNAP

General purpose gene finding program suitable for both eukaryotic and prokaryotic genomes

Versions available: 11/29/2013

SnpEff

Variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes

Versions available: 3.6, 4.0, 4.2

Sphinx

Sphinx is a tool that makes it easy to create intelligent and beautiful documentation of Python projects

Versions available: master

SplAdder

Splicing Adder, a toolbox for alternative splicing analysis based on RNA-Seq alignment data. Briefly, the software takes a given annotation and RNA-Seq read alignments, transforms the annotation into a splicing graph representation, augments the splicing graph with additional information extracted from the read data, extracts alternative splicing events from the graph and quantifies the events.

Versions available: 1.0.0

STAR

Spliced Transcripts Alignment to a Reference. Based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure.

Versions available: 2.4.0f1, 2.5.0c, 2.5.1b, 2.5.2a

Tabix

Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file (in.tab.bgz.tbi or in.tab.bgz.csi) when no region is given on the command line.

Versions available: 0.2.6

TMHMM

Predicting Transmembrane Protein Topology with a Hidden Markov Model

Versions available: 2.0c

tools

Perl, Python, R, awk and sh scripts used in several bioinformatics pipelines of the MUGQIC PIPELINES repo.

Versions available: 1.10.4

TopHat

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Versions available: 2.0.13, 2.0.14

TransDecoder

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks

Versions available: 2.0.1

Trimmomatic

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

Versions available: 0.32, 0.35

Trinity

Trinity assembles transcript sequences from Illumina RNA-Seq data

Versions available: 2.0.4, 2.1.1, 2.2.0, 20140413p1

Trinotate

A comprehensive annotation suite for functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of different well referenced methods for functional annotation including homology search to known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/PFAM), protein signal peptide and transmembrane domain prediction (signalP/tmHMM), and leveraging various annotation databases (eggNOG/GO/Kegg databases).

Versions available: 2.0.1, 2.0.2, 20131110

UCSC tools

UCSC genome browser ‘kent’ bioinformatic utilities

Versions available: 20140212, v326

USEARCH

Ultra-fast search for high-identity top hit or hits from sequence files

Versions available: 7.0.1090, 8.1.1861

VarScan

VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454 and similar instruments. It can be used to detect different types of variation: germline variants, multi-sample variants, somatic mutations and somatic copy number alterations.

Versions available: 2.3.9

VCFtools

A program package that can be used to perform the following operations on Variant Call Format (VCF) files: filter out specific variants, compare files, summarize variants, convert to different file types, validate and merge files, and create intersections and subsets of variants.

Versions available: 0.1.11, 0.1.14

VerifyBamID

Verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples. verifyBamID can detect sample contamination and swaps when external genotypes are available. When external genotypes are not available, verifyBamID still robustly detects sample swaps.

Versions available: devMaster_20151216

VSEARCH

VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering and conversion.

Versions available: 1.11.1
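
Of the operations listed for VSEARCH, full-length dereplication is the simplest to picture: identical sequences are collapsed into one record with an abundance count. The sketch below is a toy illustration only, not VSEARCH's implementation; the function name dereplicate and the choice to compare sequences case-insensitively are assumptions.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance) pairs.

    Pairs are returned most abundant first; sequences with equal
    abundance keep the order in which they were first seen.
    """
    counts = Counter(seq.upper() for seq in sequences)
    return counts.most_common()


reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
for sequence, abundance in dereplicate(reads):
    print(sequence, abundance)
```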

vt

A tool set for short variant discovery in genetic sequence data.

Versions available: 0.57

WebLogo

A tool for creating sequence logos from biological sequence alignments. It can be run on the command line, as a standalone webserver, as a CGI webapp, or as a Python library.

Versions available: 2.8.2, 3.3

Celera Assembler

A de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing

Versions available: 8.1