MCB Data Science Option

The Molecular and Cellular Biology (MCB) graduate program is an interdisciplinary graduate program encompassing all aspects of molecular and cellular biology. The MCB Data Science Option will provide interested students within the program with specific training in the fundamentals of Data Science methods and applications. Specifically, this curriculum will enable students to become effective Tool Users, Tool Builders, or Tool Trainers according to their research and career goals. Tool Users are primarily interested in learning how to use computers, programming, and statistics to analyze and visualize their data so that they can answer their biological questions more effectively. Tool Builders are primarily interested in learning to develop computational methods or tools for biological analysis, both as an end in itself and as a means of answering their biological questions. Tool Trainers want to develop expertise in machine learning so that they can apply these advanced statistical methods to biological systems. In addition to coursework that builds these competencies, the MCB Data Science Option will include exposure to and engagement with the broader Data Science community at the University of Washington through the eScience Institute.

Requirements and how to apply

Students participating in the MCB Data Science Option must complete 9 credits covering 3 of the 4 topic areas shown below and participate in 2 quarters of eScience community seminars.

To officially participate, students must submit the following materials to the MCB co-directors for approval:

  1. A personal statement that:
    1. Describes why participation in the MCB Data Science Option would benefit their training and career goals.
    2. Affirms their commitment to completing the requirements of the MCB Data Science Option (9 credits covering 3 of the 4 topic areas and participation in 2 quarters of the required eScience community seminars).
  2. A statement from their rotation and/or thesis advisor acknowledging support for participation in the MCB Data Science Option.

Curriculum

Students participating in the MCB Data Science Option must complete 9 credits covering 3 of the 4 topic areas shown below and participate in 2 quarters of eScience community seminars. Aside from these distinct requirements, all other requirements for the MCB Data Science Option are the same as those for the existing Doctor of Philosophy in MCB. Students are required to complete a total of 90 graduate-level credits, including 18 graded elective credits. Courses within the MCB Data Science Option curriculum may count as elective credits, but none is a required class for an MCB doctoral degree.

The Computational Biology Faculty and Student Area Directors can provide guidance for individual curriculum design. Upon completion of all MCB Data Science Option requirements, please notify the MCB Graduate Program so that the MCB Data Science Option can be added to your transcript.

Data Science Topic Areas

The four MCB Data Science topic areas are:

  1. Software Development for Data Science
  2. Statistics and Machine Learning
  3. Data Management and Data Visualization
  4. Department Specific Courses for Data Science

Unless otherwise specified, the courses below do not have prerequisites.

Software Development for Data Science

MCB 517A – Tools for Computational Biology (3.0 credits, AUT)
CSE 583 – Software Development for Data Scientists (4.0 credits, AUT)
GENOME 559 – Introduction to Statistical and Computational Genomics with introductory Python (3.0 credits, WIN)
ChemE 546 – Software Engineering for Molecular Data Scientists (3.0 credits, WIN)
IMT 511 – Introduction to Programming for Information and Data Science (4.0 credits, SPR and SUM) *additional course fee required

Statistics and Machine Learning: Statistics

STAT 556 – Introduction to Statistics and Probability (5.0 credits, AUT, fee-based)
STAT 512 – Statistical Inference (4.0 credits, AUT); Prerequisite: STAT 395 and STAT 421, STAT 423, STAT 504, or BIOST 512
STAT 513 – Statistical Inference (4.0 credits, WIN); Prerequisite: STAT 512
BIOST 511 – Medical Biometry I (4.0 credits, AUT)
BIOST 514 – Biostatistics I (4.0 credits, AUT)
BIOST 517 – Applied Biostatistics I (4.0 credits, AUT)
STAT 557 – Applied Statistics and Experimental Design (5.0 credits, WIN); Prerequisite: STAT/BIOST/DATA 556 or instructor’s permission
BIOST 512 – Medical Biometry II (4.0 credits, WIN); Prerequisite: BIOST 511 or BIOST 517 or equivalent
BIOST 515 – Biostatistics II (4.0 credits, WIN); Prerequisite: BIOST 514
BIOST 518 – Applied Biostatistics II (4.0 credits, WIN); Prerequisite: BIOST 517
GENOME 560 – Introduction to Statistical Genomics (4.0 credits, SPR)
BIOST 526 – Bayesian Biostatistics (3.0 credits, SPR); Prerequisite: Any 400-level or higher statistics course or instructor’s permission
STAT 534 – Statistical Computing (3.0 credits, SPR); Prerequisite: experience with programming in a high-level language

Statistics and Machine Learning: Machine Learning

CSE 546 – Machine Learning (4.0 credits, AUT); Prerequisite: either CSE 312, STAT 341, STAT 391, or equivalent
STAT 535 – Statistical Learning: Modeling, Prediction, and Computing (3.0 credits, AUT); Prerequisite: experience with programming in a high-level language
BIOST 546 – Machine Learning for Biomedical and Public Health Big Data (3.0 credits, WIN); Prerequisite: BIOST 511 or BIOST 512 and familiarity with R
CSE 517 – Natural Language Processing (4.0 credits, WIN); Prerequisite: familiarity with probability and programming
STAT 548 – Machine Learning for Big Data (4.0 credits, SPR); Prerequisite: STAT 535 or CSE 546
STAT 558 – Statistical Machine Learning for Data Scientists (5.0 credits, SPR); Prerequisite: STAT/BIOST/DATA 557 or instructor’s permission
IMT 574 – Data Science II: Machine Learning and Econometrics (4.0 credits, SPR); Prerequisite: IMT 573 *additional course fee required

Data Management and Data Visualization

CSE 414 – Introduction to Database Systems (4.0 credits, SPR)
CSE 544 – Principles of Database Management Systems (4.0 credits, WIN)
IMT 562 – Interactive Information Visualization (4.0 credits) *additional course fee required
CSE 512 – Data Visualization (4.0 credits, SPR)
IMT 561 – Visualization Design (4.0 credits, SPR) *additional course fee required
HCDE 511 – Information Visualization (4.0 credits, SPR)

Department Specific Courses for Data Science

BIOL 519 – Data Science for Biologists (4.0 credits, WIN)
CSE 527 – Computational Biology (4.0 credits, AUT)
GENOME 540 – Introduction to Computational Molecular Biology I (4.0 credits, WIN)
GENOME 541 – Introduction to Computational Molecular Biology II (4.0 credits, SPR); Prerequisite: GENOME 540
BIOST 544 – Introduction to Biomedical Data Science (4.0 credits, AUT and WIN); Prerequisite: Either BIOST 511 or equivalent, BIOST 509 or equivalent, or permission of instructor
IMT 573 – Data Science I: Theoretical Foundations (4.0 credits, WIN); Prerequisite: Either Q METH 201, IMT 570, or equivalent college coursework, CSE 142 or equivalent college coursework (college-level statistics and programming experience with R or Python) *additional course fee required

Example curricula for MCB students

The following example course plans show how MCB students might design their data science curriculum if they seek to become more effective Tool Users, Tool Builders, or Tool Trainers. These example plans also include MCB requirements like first-year courses, rotations, and teaching responsibilities that need to be balanced with data science coursework.

Tool Users are primarily interested in learning how to use computers, programming, and statistics to analyze and visualize their data to more effectively answer their biological questions of interest. Research projects for Tool Users may be almost entirely experimental or a balance between experimental and computational experiments.

Tool Builders are primarily interested in learning to develop computational methods or tools for biological analysis both as an end unto itself and as a means to answering their biological questions. This group includes students interested in statistical models and software development. Research projects for Tool Builders are likely to be primarily computational.

Tool Trainers want to develop expertise in machine learning to apply these advanced statistical methods to biological systems. This option requires intensive coursework on programming, statistics, and machine learning and could easily require more than the three required classes and more than two years to complete. Research projects for Tool Trainers are likely to be entirely computational.

Year 1: Rotations

Autumn: Literature Review, Rotation, Software Development (MCB 517A (3.0 credits) or CSE 583 (4.0 credits)), Biology elective(s)
Winter: Literature Review, Rotation, Data Science (BIOL 519 (4.0 credits)), Biology elective(s), Weekly eScience Community Seminars
Spring: Grant Writing, Rotation, Data Management (CSE 414 (4.0 credits)), Biology elective(s)

Year 2: Teaching

Autumn: Statistics (STAT 556 (5.0 credits)), Weekly eScience Community Seminars
Winter: Teaching assistantship
Spring: Teaching assistantship

Year 1: Rotations

Autumn: Literature Review, Rotation, Software Development (MCB 517A (3.0 credits) or CSE 583 (4.0 credits))
Winter: Literature Review, Rotation, Data Science (BIOL 519 (4.0 credits)), Biology elective(s), Weekly eScience Community Seminars
Spring: Grant Writing, Rotation, Biology elective(s)

Year 2: Teaching

Autumn: Computational biology (CSE 527 (4.0 credits)), Weekly eScience Community Seminars
Winter: Data management (CSE 544 (4.0 credits)), Teaching assistantship
Spring: Statistics (GENOME 560 (4.0 credits))
Summer: Alternate teaching assistantship

Year 1: Rotations

Autumn: Literature Review, Rotation, Software Development (MCB 517A (3.0 credits))
Winter: Literature Review, Rotation, Data Science (BIOL 519 (4.0 credits)), Biology elective(s), Weekly eScience Community Seminars
Spring: Grant Writing, Rotation, Statistics (GENOME 560 (4.0 credits))
Summer: Alternate teaching assistantship (if doing bioquest)

Year 2: Teaching

Autumn: Computational biology (CSE 527 (4.0 credits)), Weekly eScience Community Seminars
Winter: Computational biology (GENOME 540 (4.0 credits))
Spring: Teaching assistantship OR GENOME 541 (4.0 credits)
Summer: Alternate teaching assistantship (if needed)

Year 1: Rotations

Autumn: Literature Review, Rotation, Software Development (CSE 583 (4.0 credits))
Winter: Literature Review, Rotation, Computational biology (GENOME 540 (4.0 credits) or STAT 534 (3.0 credits)), Weekly eScience Community Seminars
Spring: Grant Writing, Rotation, Statistics (GENOME 560 (4.0 credits))
Summer: Biology elective

Year 2: Teaching

Autumn: Machine learning (STAT 535 (3.0 credits)), Weekly eScience Community Seminars
Winter: Data management (CSE 544 (4.0 credits)), Teaching assistantship
Spring: Machine learning (STAT 548 (4.0 credits))
Summer: Alternate teaching assistantship (if needed)