Scientific Data Architect
The Canadian Center for Computational Genomics (C3G) at McGill University offers a stimulating and rewarding working environment, where projects make concrete contributions to the advancement of research in fields such as human health and the environment. Our team is composed of people who are passionate about what they do and who work with cutting-edge sequencing technologies and high-performance computing equipment. Learn more about why McGill is an exciting place to work: https://www.mcgill.ca/careers/why-mcgill.
C3G develops customized, case-by-case bioinformatics solutions as well as an extensive suite of open-source software, including bioinformatics analysis pipelines used by multiple academic institutions and many data access and analysis portals. Since 2011, C3G has completed more than a thousand bioinformatics analysis projects for over 600 distinct groups of researchers across Canada. In parallel, the Montreal Neurological Institute develops brain imaging analysis pipelines and computational infrastructure for the neuroscience community.
We are seeking a Scientific Data Architect to contribute to the development of public and internal databases for projects in genomics, epigenomics, neuroimaging and other life-science-related fields. In this role, the successful candidate will also work closely with development teams at the Montreal Neurological Institute (MNI).
Under the supervision of a Bioinformatics Manager, the Data Architect will be responsible for data architecture across various projects within C3G and the MNI, solving challenges that arise from the management and analysis of scientific data. By creating blueprints for, implementing and maintaining data management systems, the candidate will develop novel methods to increase the interoperability and discoverability of data and metadata across life science research fields, fulfilling C3G's and the MNI's commitments to Open Science. This will involve ongoing interaction with members of the bioinformatics and neuroimaging platform teams.
- Plan the organization of large sets of data and metadata for projects in genomics, epigenomics and other life-science-related fields, at both the database and filesystem levels.
- Provide data architecture direction at all key stages of the data handling process, including ingestion, discovery and retrieval.
- Work with data providers and analysts to create data models that support reporting and analytics needs.
- Participate in the design of data exchange objects that model new types of information produced by the constantly evolving field of high-throughput sequencing technologies.
- Participate in the design of metadata mapping strategies that harmonize genomics and neuroinformatics data for combined access.
- Maintain and improve existing data models to reflect recent advances in experimental technologies, following metadata organization best practices.
- Follow community and internal conventions for data storage and exchange, such as the standards established by the Global Alliance for Genomics and Health (GA4GH).
- Design database partitions and other strategies to improve data access performance.
- Use an issue tracking system to document tasks, issues and bugs, along with their resolution status.
Qualifying Skills and/or abilities
- Solid experience in the use of controlled vocabularies and ontologies to describe life science research concepts.
- Experience with data exchange and linking concepts and technologies, such as RDF, JSON-LD and SPARQL.
- Experience with the Git version control system.
- Experience creating, querying and maintaining relational databases (RDBMS), such as MySQL and PostgreSQL.
- Knowledge of performance considerations for different database designs, based on environment requirements.
- Experience in the use of data modelling methods and tools.
- Experience in monitoring and enforcing data modelling/normalization standards.
- Excellent communication and organizational skills, and the ability to work in a highly interactive group.
- Open-minded towards new technologies, with at least basic knowledge of the various software layers involved in public databases and applications.
- Able to handle multiple tasks assigned at once and make steady progress on all of them.
- Undergraduate degree in computer science, engineering, bioinformatics or related field.
- 4 years of experience working with scientific research data in the field of genomics.
- Experience with other database concepts, such as NoSQL.
- Knowledge of genetics and bioinformatics, high-throughput experimental methodologies and large genomic databases.
- Experience with server-side scripting languages, such as Python, Perl and Bash.
- Prior experience generating and analyzing genomic datasets and/or bioinformatics analysis experience is a plus.
- Experience working in a cloud environment.
- Experience with git-annex.
- English and French (spoken and written).
HOW TO APPLY:
Internal candidates: please provide your McGill ID number when applying.
Please submit your cover letter and resume as a single PDF file.
Submit your application online and indicate the McGill job number MT0594.