Infant Glioma: Characterizing the landscape of genetic drivers and their clinical impact
Recently published in Nature Communications, this paper presents work by members of the C3G Toronto node that integrates genomic and transcriptomic analyses to assess the molecular and clinical features of infant glioma patients. Analyses of single nucleotide variants, copy number changes, gene fusions and other transcriptomic features revealed three clinical subgroups of infant glioma, each with distinct genetic drivers, locations in the brain and responses to treatment. Gliomas in infants have substantially different treatment outcomes from those that occur in children and adults, yet little is understood about the molecular basis of these differences. This comprehensive molecular analysis of infant gliomas helps to clarify the biological mechanisms driving their oncogenesis and to guide future diagnostic and treatment approaches for these patients.
Methylation signature investigations
The C3G Toronto node has also been involved in two projects investigating the methylation signatures of specific conditions. We recently published a manuscript in BMC Medical Genomics describing specific DNA methylation signatures for Nicolaides-Baraitser syndrome (NBS), a rare childhood condition that affects physical features and intellectual ability (Chater-Diehl et al., 2019). We showed that specific methylation patterns are associated with pathogenic variants of the NBS causal gene, SMARCA2, which encodes the catalytic subunit of a chromatin remodeling complex. We have also identified DNA methylation signatures associated with autism spectrum disorder risk loci, in work recently published in Clinical Epigenetics (Siu et al., 2019). We showed that these signatures can identify and distinguish individuals with specific autism-associated mutations and can help determine whether specific gene variants are pathogenic or benign, which improves autism diagnostics.
Genetics & Genomics Analysis Platform: version 2
We are pleased to announce that the new version of GenAP is now available. This release offers a completely re-engineered platform that leverages Cloud resources at Compute Canada and will eventually be deployed on other HPC resources as well. It already offers two types of applications: Data Hubs (as in GenAP1) and a new graphical “File Browser” for transferring files to and from your workspaces. A new Galaxy application, including up-to-date tools and pipelines, will be added later.
ForCasT: a fully integrated and open source pipeline to design CRISPR mutagenesis experiments
ForCas Tool (ForCasT) is a comprehensive tool for the design, evaluation and collection of CRISPR/Cas9 guide RNAs and primers. Using robust parameters, it generates guide RNAs for target loci, assesses their quality (including potential off-target effects) and designs the associated primers. The results are then stored in a local database that serves as a shared resource for users within a research team. That database is continually updated to reflect the quality of guides and primers as additional computational and wet-lab results come in. ForCasT is a single tool that research teams from various fields of biology can use to build and maintain a collection of guide RNAs and primers for Cas-mediated genome editing that suits their specific needs. It is currently available as a web app and as a Dockerized version, and can be found at https://github.com/ccmbioinfo/CasCADe
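Here is a minimal Python sketch of the general idea behind guide selection. It is not ForCasT’s code. It assumes SpCas9 with 20-nt spacers and an NGG PAM, scans both strands of a sequence for candidate guides, and counts near-matching sites as a crude stand-in for off-target assessment. The names “find_guides” and “count_matching_sites” are hypothetical. Real tools also score on-target efficiency and search whole genomes through an index rather than a linear scan.

```python
from dataclasses import dataclass
from typing import List

# Complement table for A/C/G/T (N maps to N); sequences are upper-cased first.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Guide:
    spacer: str  # protospacer, 5'->3' on its own strand
    pam: str     # the 3-nt PAM immediately 3' of the spacer
    strand: str  # "+" or "-"
    start: int   # 0-based start of the spacer on the forward strand


def find_guides(seq: str, spacer_len: int = 20) -> List[Guide]:
    """Find every spacer followed by an NGG PAM (assumed SpCas9) on either strand."""
    seq = seq.upper()
    n = len(seq)
    site_len = spacer_len + 3
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - site_len + 1):
            pam = s[i + spacer_len : i + site_len]
            if pam[1:] != "GG":
                continue
            # Index i in the reverse complement maps to forward-strand
            # coordinates counted from the other end of the sequence.
            start = i if strand == "+" else n - i - spacer_len
            guides.append(Guide(s[i : i + spacer_len], pam, strand, start))
    return guides


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_matching_sites(spacer: str, genome: str, max_mismatches: int = 3) -> int:
    """Count NGG-adjacent sites in `genome` within `max_mismatches` of `spacer`.

    The intended on-target site is included, so a result of 1 for a guide
    designed against `genome` means no off-target candidates were found.
    """
    spacer = spacer.upper()
    return sum(
        1
        for site in find_guides(genome, len(spacer))
        if mismatches(site.spacer, spacer) <= max_mismatches
    )


if __name__ == "__main__":
    target = "ACGT" * 5 + "AGG"  # toy target: one 20-nt spacer + AGG PAM
    for g in find_guides(target):
        print(g.strand, g.start, g.spacer, g.pam, count_matching_sites(g.spacer, target))
```

On the toy target, the script reports a single plus-strand guide starting at position 0 with one matching site, which is the on-target itself. A linear scan like this is fine for a few kilobases, but genome-scale off-target searches need an aligner or index.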
GenPipes: an open-source framework for distributed and scalable genomic analyses
Our summer kicked off in June with the publication of our beloved GenPipes framework and set of NGS data analysis pipelines in GigaScience. We use these pipelines daily for data production and routine analysis and hope the community will find them useful. While GenPipes is the product of several years of teamwork, kudos to co-first authors Mathieu Bourgey and Rola Dali, who worked very hard to get this long-awaited paper out!
Altered microbiome composition in individuals with fibromyalgia
This summer has also been very special for C3G’s metagenome specialist Emmanuel Gonzalez, with a publication in Pain highlighting a strong potential link between the microbiome and fibromyalgia, a terrible and elusive disorder affecting a large fraction of the population. The study drew considerable media attention, notably from the Montreal Gazette and the CBC. We are very proud to say that, through Emmanuel, C3G provided first-rate analysis services for experimental design, species identification, statistical analysis and machine learning, and worked hand-in-hand with Drs. Minerbi and Brereton on biological interpretation. Importantly, Emmanuel applied ANCHOR here, a method he also published earlier this year, which identifies microbial species at a higher resolution than other common 16S sequencing data analysis methods.
Altered differentiation is central to HIV-specific CD4+ T cell dysfunction in progressive disease
Another noteworthy publication with C3G members among its authors appeared in Nature Immunology this summer. An important focus of the study was the comparison of HIV-specific CD4+ T-cell subpopulations from patients who have undergone antiretroviral therapy and from patients who spontaneously suppress HIV viral load below detectable limits (a.k.a. elite controllers). This comparison contributes to understanding why viral control is lost once antiretroviral therapy is interrupted.
GSoC 2019 is over!
Again this year, C3G was a Google Summer of Code organization. For those unfamiliar with it, GSoC is, in Google’s own words, “a global program that matches students up with open source, free software and technology-related organizations to write code and get paid to do it!”
We would like to thank this year’s participating students for their contributions:
Jiahuang Lin (TBD) – Human history and genome evolution
Konstantinos Kyriakidis (AUTh) – Batchtools for Compute Canada
Madhav Vats (IIIT Delhi) – Flowchart creator for GenPipes
Pranav Tharoor (MAHE) – MiCM Project Match
SriHarshitha Ayyalasomayajula (KMIT) – GenPipes single-cell pipeline
Tip of the Month
There is a huge library of common bioinformatics software available on Compute Canada resources via the modules maintained by C3G staff and distributed via the CernVM-File System (CVMFS). Despite the breadth of the C3G CVMFS library, there may be times when using the provided software isn’t ideal.
For example:
- you might want to use software that we haven’t yet made available via CVMFS, and you don’t want to install it repeatedly at each HPC facility;
- you might want to guarantee comparability of results by running exactly the same software stack on Compute Canada, your workstation, or infrastructure from a cloud provider such as Amazon Web Services or Google Cloud;
- a more recent version of the software may already be available in a container. Listings of existing images are available from community efforts such as biocontainers, and images can also be built directly from the source repository.
In circumstances such as these, containers offer an excellent solution by packaging your software and its dependencies into a single image that contains everything needed for a particular analysis or workflow.
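Because a container is a single image file, one simple way to confirm that Compute Canada, your workstation and a cloud instance are running the same stack is to compare a checksum of that file at each site. The sketch below computes a SHA-256 digest in Python, and matching digests mean byte-identical images. The helper names “sha256_of_file” and “same_image” are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks so large images need little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(path_a, path_b) -> bool:
    """True if two image files on the same machine are byte-for-byte identical."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Across sites, run print(sha256_of_file("genometools.sif")) on each machine and compare the printed digests. The file name here is only an example.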
The process for running containerized software on Compute Canada can be described in three steps:
- Ensure singularity is available
- Download a container
- Run your containerized software
Step 1: Ensure singularity is available
At all Compute Canada facilities, Singularity is available as a module. Loading it is as simple as running “module load singularity”.
If you’re running Singularity on your Linux laptop or workstation, installation instructions are available in the Singularity documentation.
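If you drive your analyses from Python scripts, a quick check that the module load (or local installation) worked can save you from a confusing failure later. This minimal sketch assumes only that the “singularity” executable ends up on your PATH. The helper name “singularity_version” is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def singularity_version() -> Optional[str]:
    """Return the output of `singularity --version`, or None if it is not on PATH."""
    exe = shutil.which("singularity")
    if exe is None:
        # Not found: on Compute Canada, load the singularity module first.
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(singularity_version() or "singularity not found on PATH")
```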
Step 2: Download a container
Many software stacks are already available as Docker images in repositories such as Docker Hub or Quay.io. Unfortunately, running Docker on shared clusters introduces potential security vulnerabilities. Fortunately, Singularity can use Docker images to build new Singularity containers. For example, let’s say that we want to run the genometools suite. The biocontainers repository shows that the latest version (1.5.10) is available at quay.io/repository/biocontainers/genometools-genometools as a Docker image. To download the image to my Compute Canada instance, I can run “singularity pull” on a docker:// address that points to that image.
This produces a Singularity image file (with a .sif extension) in the current directory.
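The same download can be scripted from Python. This sketch assembles the “singularity pull” command with an explicit output file name and runs it. The pull address uses the registry path quay.io/biocontainers/..., without the /repository/ segment that appears in the Quay web address. The tag is a placeholder, so copy the exact tag listed for the image you want. The output name “genometools.sif” and the helper names are hypothetical.

```python
import subprocess
from typing import List

# Registry path for the image mentioned above; TAG is a placeholder that must be
# replaced with the exact tag shown in the biocontainers/Quay listing.
IMAGE = "quay.io/biocontainers/genometools-genometools"
TAG = "REPLACE_WITH_TAG"


def pull_command(image: str, tag: str, output: str) -> List[str]:
    """Build `singularity pull <output> docker://<image>:<tag>`."""
    return ["singularity", "pull", output, f"docker://{image}:{tag}"]


def pull(image: str, tag: str, output: str) -> None:
    """Convert the Docker image into a local .sif file; raises on a non-zero exit."""
    subprocess.run(pull_command(image, tag, output), check=True)
```

Once TAG is filled in, calling pull(IMAGE, TAG, "genometools.sif") writes the image to genometools.sif. Naming the output explicitly keeps later commands independent of Singularity’s default file naming.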
Step 3: Run your containerized software
To run the genometools suite from inside the new container, prepend your command with “singularity exec” followed by the path to the image file.
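In a Python workflow, the same prefixing can be done programmatically. In this sketch, the image file “genometools.sif”, the input “annotation.gff3” and the helper names are hypothetical. “gt gff3validator” is one of the GenomeTools subcommands and is used only as an example.

```python
import subprocess
from typing import List, Sequence


def exec_command(image_file: str, argv: Sequence[str]) -> List[str]:
    """Prefix a command with `singularity exec <image_file>`."""
    return ["singularity", "exec", image_file, *argv]


def run_in_container(image_file: str, argv: Sequence[str]) -> str:
    """Run `argv` inside the container and return its standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        exec_command(image_file, argv), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, run_in_container("genometools.sif", ["gt", "gff3validator", "annotation.gff3"]) runs the validator inside the container and returns whatever it prints.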
That’s it! You have a perfectly reproducible software stack running without needing to worry about installation or dependencies.
Next Steps and Getting Help
As you might imagine, there are plenty of details we don’t have time to cover in this short blog post. If you’d like to learn more, or if you’re having trouble, there are plenty of ways to find help.
- The Compute Canada wiki has an excellent page on running containers on their infrastructure (en/fr)
- The Singularity docs are the definitive guide
- The C3G has a weekly open door session to which you are welcome to bring questions about containers and reproducible bioinformatics analyses.
Why better data sharing means better health care
The future of personalized medicine depends on data sharing, according to Yann Joly, Research Director of the Centre of Genomics and Policy, and Guillaume Bourque, Director of the Canadian Centre for Computational Genomics.
Using big data techniques to analyze the function of human genes is already helping to develop treatments tailored to individual patients. The more data researchers can access from across the world, the better the chances of treating even rare diseases. But privacy and consent regulations differ by country, making cross-border sharing of this information slow and frustrating.