Job posting

Scientific Data Curator


Position Summary

The Canadian Center for Computational Genomics (C3G) at McGill University offers a stimulating and rewarding working environment, where projects make concrete contributions to the advancement of research in fields such as human health and the environment. Our team is composed of people who are passionate about what they do and who work with cutting-edge sequencing technologies and high-performance computing equipment. Learn more about why McGill is an exciting place to work: https://www.mcgill.ca/careers/why-mcgill.

C3G develops customized, case-by-case bioinformatics solutions as well as an extensive suite of open-source software, including bioinformatics analysis pipelines (bitbucket.org/mugqic/genpipes) used by multiple academic institutions, and many data access and analysis portals. Since 2011, C3G has completed more than a thousand bioinformatics analysis projects for over 600 distinct groups of researchers across Canada.

We are seeking a Scientific Data Curator to take part in the development and maintenance of a number of internal and public databases, for projects such as the International Human Epigenome Consortium (IHEC) Data Portal (epigenomesportal.ca/ihec/), Terry Fox PROFYLE (www.terryfox.org/recent-posts/profyle/) and more.

Primary Responsibilities:

Under the supervision of a Bioinformatics Manager, the Scientific Data Curator will plan, implement and maintain local and public web-based databases to solve challenges arising from the management and analysis of scientific data, fulfilling C3G’s commitments to Open Science. Short development cycles will involve small but frequent data and application releases, and constant interaction with the bioinformatics platform team members.

  • Organize and maintain large sets of data and metadata for projects in genomics, epigenomics and other life science-related fields.
  • Participate in all key stages of the data handling process: ingestion, discovery and retrieval.
  • Maintain and improve existing data models to reflect recent advances in experimental technologies, following metadata organization best practices.
  • Set up and maintain automation tools for database sanity checks and testing.
  • Follow internal and community-established conventions for data storage and exchange.
  • Through the development and use of quality control scripts, software and pipelines, abstract user-submitted data and metadata, and oversee their quality.
  • Implement curation tools that are reusable by other curators and the rest of the C3G team.
  • Serve as a contact point for the research community, through helpdesks and mailing lists, to facilitate the deposition and retrieval of data from our databases and applications.
  • Create user documentation, reports and demonstration material on the use of tools and services.
  • Ensure software development follows best coding practices, including proper code commenting and unit testing.
  • Use an issue tracking system to document tasks, issues and bugs, and their resolution status.
  • Run existing bioinformatics analysis pipelines on submitted data, and contribute to the development of new pipelines for data curation purposes.

Qualifying Skills And/Or Abilities

Mandatory:

  • Experience with creating, querying and maintaining relational databases (RDBMS), such as MySQL and PostgreSQL.
  • Experience with server-side scripting languages, such as Python, Perl or Bash.
  • Exceptional attention to detail; the curator will be responsible for maintaining healthy datasets on which several research projects will depend.
  • Excellent communication and organizational skills, and the ability to work in a highly interactive group.
  • Open mind towards new technologies, with at least basic knowledge of the various layers involved in public databases and applications.
  • Ability to manage multiple assigned tasks at once and keep them all progressing steadily.
  • Undergraduate degree in computer science, engineering, bioinformatics or related field.

Strong Assets:

  • Experience with other types of database concepts, such as NoSQL.
  • Familiarity with the use of controlled vocabularies and ontologies to describe biological concepts.
  • Experience with front-end and back-end web application frameworks, such as Flask, React or Node.js.
  • Experience with application containerization (e.g. Docker, Singularity).
  • Knowledge of genetics, bioinformatics and high-throughput experimental methodologies, and familiarity with large genomic databases.
  • Prior experience generating and curating genomic datasets and/or performing bioinformatics analysis.
  • Experience with the Git version control system.
  • English and French (spoken and written).

HOW TO APPLY:
Internal candidates: Please provide your McGill ID number when applying.*
*Please submit your cover letter and resume as one PDF file.
Submit your application online by clicking on HERE, and indicate the McGill job number MT0604.