MCB Data Science Option

Overview

The Molecular and Cellular Biology (MCB) graduate program is an interdisciplinary program encompassing all aspects of molecular and cellular biology. The MCB Data Science Option provides interested students in the program with training in the fundamentals of Data Science methods and applications. The curriculum enables students to become effective Tool Users, Tool Builders, or Tool Trainers, according to their research and career goals.

Tool Users are primarily interested in learning how to use computers, programming, and statistics to analyze and visualize their data so that they can answer their biological questions more effectively. Tool Builders are primarily interested in learning to develop computational methods or tools for biological analysis, both as an end in itself and as a means of answering their biological questions. Tool Trainers want to develop expertise in machine learning so that they can apply these advanced statistical methods to biological systems.

In addition to coursework that builds these competencies, the MCB Data Science Option includes exposure to and engagement with the broader Data Science community at the University of Washington through the eScience Institute.

Requirements and how to apply

Students participating in the MCB Data Science Option must complete 9 credits covering 3 of the 4 topic areas listed below and participate in 2 quarters of eScience community seminars.

To officially participate, students must submit the following materials to the MCB directors for approval:

  1. A personal statement that:
    1. describes why participation in the MCB Data Science Option would benefit their training and career goals, and
    2. affirms their commitment to completing the requirements of the MCB Data Science Option (9 credits covering 3 of the 4 topic areas and participation in 2 quarters of the required eScience community seminars).
  2. A statement from their rotation and/or thesis advisor acknowledging support for participation in the MCB Data Science Option.

The MCB Directors are Richard Gardner and Nina Salama.

Curriculum

Students participating in the MCB Data Science Option must complete 9 credits covering 3 of the 4 topic areas listed below and participate in 2 quarters of eScience community seminars. Apart from these additional requirements, the requirements for the MCB Data Science Option are the same as those for the existing Doctor of Philosophy in MCB. Students are required to complete a total of 90 graduate-level credits, including 18 graded elective credits. Courses in the MCB Data Science Option curriculum count as elective credits, but none of them is a required class for the MCB doctoral degree.

The Computational Biology Faculty and Student Area Directors can provide guidance on individual curriculum design; see also the example course plans below. Upon completing all MCB Data Science Option requirements, please notify the MCB Graduate Program so that the MCB Data Science Option can be added to your transcript.

Data Science Topic Areas

The four MCB Data Science topic areas are:

  1. Software Development for Data Science
  2. Statistics and Machine Learning
  3. Data Management and Data Visualization
  4. Department-specific Courses for Data Science
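
The completion rule stated above (9 credits covering 3 of the 4 topic areas, plus 2 quarters of eScience community seminars) can be sketched as a short check. This is only an illustrative sketch, not an official degree-audit tool: the function name and data layout are hypothetical, and it assumes the rule means at least 9 credits, at least 3 distinct topic areas, and at least 2 seminar quarters. The sample plan uses courses and credit values from the lists on this page.

```python
# Hypothetical helper for illustration only; not an official audit tool.
# Assumption: the rule is read as "at least 9 credits, at least 3 distinct
# topic areas, and at least 2 quarters of eScience community seminars".

def meets_option_requirements(courses, seminar_quarters):
    """Return True if a plan satisfies the assumed completion rule.

    courses: list of (course_name, credits, topic_area) tuples
    seminar_quarters: number of quarters of eScience community seminars
    """
    total_credits = sum(credits for _, credits, _ in courses)
    areas_covered = {area for _, _, area in courses}
    return (
        total_credits >= 9
        and len(areas_covered) >= 3
        and seminar_quarters >= 2
    )


# Sample plan: three courses from three different topic areas (11 credits).
plan = [
    ("MCB 517A", 3, "Software Development for Data Science"),
    ("BIOL 519", 4, "Department-specific Courses for Data Science"),
    ("CSE 414", 4, "Data Management and Data Visualization"),
]

print(meets_option_requirements(plan, seminar_quarters=2))  # prints True
```

Under the same assumptions, a plan with only one quarter of seminars, or with all courses drawn from two topic areas, would return False.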

Unless otherwise specified, the courses below do not have prerequisites.

Software Development for Data Science

  • MCB 517A – Tools for Computational Biology (3 credits, Fall)
  • CSE 583 – Software Development for Data Scientists (4 credits, Fall)
  • GENOME 559 – Introduction to Statistical and Computational Genomics with introductory Python (3 credits, Winter?)
  • ChemE 546 – Software Engineering for Molecular Data Scientists (3 credits, Winter)
  • IMT 511 – Introduction to Programming for Information and Data Science (4 credits, Spring and Summer)

Statistics and Machine Learning

Statistics

  • STAT 556 – Introduction to Statistics and Probability (5 credits, Fall) – Fee-based
  • STAT 512 – Statistical Inference (4 credits, Fall); prerequisite: STAT 395 and one of STAT 421, STAT 423, STAT 504, or BIOST 512
  • STAT 513 – Statistical Inference (4 credits, Winter); prerequisite: STAT 512
  • BIOST 511 – Medical Biometry I (4 credits, Fall)
  • BIOST 514 – Biostatistics I (4 credits, Fall)
  • BIOST 517 – Applied Biostatistics I (4 credits, Fall)
  • STAT 557 – Applied Statistics and Experimental Design (5 credits, Winter); prerequisite: STAT/BIOST/DATA 556 or instructor’s permission
  • BIOST 512 – Medical Biometry II (4 credits, Winter); prerequisite: BIOST 511, BIOST 517, or equivalent
  • BIOST 515 – Biostatistics II (4 credits, Winter); prerequisite: BIOST 514
  • BIOST 518 – Applied Biostatistics II (4 credits, Winter); prerequisite: BIOST 517
  • GENOME 560 – Introduction to Statistical Genomics (4 credits, Spring)
  • BIOST 526 – Bayesian Biostatistics (3 credits, Spring); prerequisite: any 400-level or higher statistics course or instructor’s permission
  • STAT 534 – Statistical Computing (3 credits, Spring); prerequisite: experience with programming in a high-level language

Machine Learning

  • CSE 546 – Machine Learning (4 credits, Fall); prerequisite: CSE 312, STAT 341, STAT 391, or equivalent
  • STAT 535 – Statistical Learning: Modeling, Prediction, and Computing (3 credits, Fall); prerequisite: experience with programming in a high-level language
  • BIOST 546 – Machine Learning for Biomedical and Public Health Big Data (3 credits, Winter); prerequisite: BIOST 511 or BIOST 512 and familiarity with R
  • CSE 517 – Natural Language Processing (4 credits, Winter); assumes familiarity with probability and programming
  • STAT 548 – Machine Learning for Big Data (4 credits, Spring); prerequisite: STAT 535 or CSE 546
  • STAT 558 – Statistical Machine Learning for Data Scientists (5 credits, Spring); prerequisite: STAT/BIOST/DATA 557 or instructor’s permission
  • IMT 574 – Data Science II: Machine Learning and Econometrics (4 credits, Spring); prerequisite: IMT 573

Data Management and Data Visualization

  • CSE 414 – Introduction to Database Systems (4 credits, Spring)
  • CSE 544 – Principles of Database Management Systems (4 credits, Winter?)
  • IMT 562 – Interactive Information Visualization (4 credits, ?)
  • CSE 512 – Data Visualization (4 credits, Spring)
  • IMT 561 – Visualization Design (4 credits, Spring)
  • HCDE 511 – Information Visualization (4 credits, Spring)

Department-specific Courses for Data Science

  • BIOL 519 – Data Science for Biologists (4 credits, Winter)
  • CSE 527 – Computational Biology (4 credits, Fall)
  • GENOME 540 – Introduction to Computational Molecular Biology I (4 credits, Winter)
  • GENOME 541 – Introduction to Computational Molecular Biology II (4 credits, Spring); prerequisite: GENOME 540
  • BIOST 544 – Introduction to Biomedical Data Science (4 credits, Fall and Winter); prerequisite: BIOST 511 or equivalent, and BIOST 509 or equivalent; or permission of instructor
  • IMT 573 – Data Science I: Theoretical Foundations (4 credits, Winter?); prerequisite: Q METH 201, IMT 570, or equivalent college coursework, and CSE 142 or equivalent college coursework (college-level statistics and programming experience with R or Python)

Example curricula for MCB students

The following example course plans show how MCB students might design their data science curriculum to become more effective Tool Users, Tool Builders, or Tool Trainers. The plans also include MCB requirements, such as first-year courses, rotations, and teaching responsibilities, that must be balanced with data science coursework.

Tool Users are primarily interested in learning how to use computers, programming, and statistics to analyze and visualize their data so that they can answer their biological questions more effectively. Research projects for Tool Users may be almost entirely experimental or a balance of experimental and computational work.

Tool Builders are primarily interested in learning to develop computational methods or tools for biological analysis, both as an end in itself and as a means of answering their biological questions. This group includes students interested in statistical models and software development. Research projects for Tool Builders are likely to be primarily computational.

Tool Trainers want to develop expertise in machine learning so that they can apply these advanced statistical methods to biological systems. This path requires intensive coursework in programming, statistics, and machine learning; it could easily require more coursework than the minimum for the option and more than two years to complete. Research projects for Tool Trainers are likely to be entirely computational.

Tool Users: apply computational tools to answer biological questions

  • Year 1: Rotations
    • Fall
      • Literature review
      • Rotation
      • Software development: MCB 517A (3) or CSE 583 (4)
      • Biology elective(s)
    • Winter
      • Literature review
      • Rotation
      • Data science: BIOL 519 (4)
      • Biology elective(s)
      • Weekly eScience Community Seminars
    • Spring
      • Grant writing
      • Rotation
      • Data management: CSE 414 (4)
      • Biology elective(s)
  • Year 2: Teaching
    • Fall
      • Statistics: STAT 556 (5)
      • Weekly eScience Community Seminars
    • Winter
      • Teaching assistantship
    • Spring
      • Teaching assistantship

Tool Builders: develop computational methods or tools for biology

  • Year 1: Rotations
    • Fall
      • Literature review
      • Rotation
      • Software development: MCB 517A (3) or CSE 583 (4)
    • Winter
      • Literature review
      • Rotation
      • Data science: BIOL 519 (4)
      • Weekly eScience Community Seminars
    • Spring
      • Grant writing
      • Rotation
      • Biology elective
  • Year 2: Teaching (recommend alternate teaching assistantship in summer)
    • Fall
      • Computational biology: CSE 527 (4)
      • Weekly eScience Community Seminars
    • Winter
      • Data management: CSE 544 (4)
      • Teaching assistantship
    • Spring
      • Statistics: GENOME 560 (4)
    • Summer
      • Alternate teaching assistantship

Tool Builders in Genomics

  • Year 1: Rotations
    • Fall
      • Literature review
      • Rotation
      • Software development: MCB 517A (3)
    • Winter
      • Literature review
      • Rotation
      • Data science: BIOL 519 (4)
      • Weekly eScience Community Seminars
    • Spring
      • Grant writing
      • Rotation
      • Statistics: GENOME 560 (4)
    • Summer
      • Alternate teaching assistantship (if doing bioquest)
  • Year 2: Teaching
    • Fall
      • Computational biology: CSE 527 (4)
      • Weekly eScience Community Seminars
    • Winter
      • Computational biology: GENOME 540 (4)
    • Spring
      • Teaching assistantship, or Computational biology: GENOME 541 (4)
    • Summer
      • Alternate teaching assistantship (if needed)

Tool Trainers: apply machine learning to biological systems

  • Year 1: Rotations
    • Fall
      • Literature review
      • Rotation
      • Software development: CSE 583 (4)
    • Winter
      • Literature review
      • Rotation
      • Computational biology: GENOME 540 (4) or STAT 534 (3)
      • Weekly eScience Community Seminars
    • Spring
      • Grant writing
      • Rotation
      • Statistics: GENOME 560 (4)
    • Summer
      • Biology elective?
  • Year 2: Teaching (recommend alternate teaching assistantship in summer)
    • Fall
      • Machine learning: STAT 535 (3)
      • Weekly eScience Community Seminars
    • Winter
      • Data management: CSE 544 (4)
      • Teaching assistantship
    • Spring
      • Machine learning: STAT 548 (4)
    • Summer
      • Alternate teaching assistantship