An unprecedented wealth of data is being generated by large genome/metagenome/epigenetic projects and other efforts to determine the structure and function of molecular biological systems. This technical elective focuses on a selection of algorithms and data structures for the analysis of biomolecular data. In other words, CS 144 is a Data Science class oriented toward the analysis of biomolecular data.
Catalog Description
Introduces fundamental algorithms and data structures for solving analytical problems in molecular biology and genomics. Includes exact and approximate string matching; sequence alignment; genome assembly; and gene and regulatory motif recognition.
Note: Credit is awarded for only one of the following: CS 144, CS 234, or CS 238.
Faisal Bin Ashraf, email, office MRB 3rd floor (cubicles)
Class Meeting
TR 12:30pm-1:50pm, Physics, Room 2104
Discussions
W 8:00pm-8:50pm, Sproul Hall, Room 2340
Office hours
Stefano: Fridays 1-2pm (or by appointment), Zoom meeting ID 976 1037 9494, check Canvas for password
Faisal: Fridays 12noon-1pm (or by appointment), Zoom meeting ID 997 8978 5948, check Canvas for password
Discussion Forum
We will use a Discord server for discussion and questions about CS 144 (and beyond -- religion and politics excluded). The forum will be moderated by the instructor and the TA, who will respond to questions, but students are encouraged to help each other via discussion. However, assignment specifics should not be discussed. Please check Canvas for details about Discord. Please be respectful.
Intro to molecular and computational biology, including biotech tools
Space-efficient data structures for sequences
Short read mapping (suffix tries/trees, suffix arrays, B-W transform)
Sequence alignment (global and local), linear space, multiple
Genome assembly, overlap graphs, de Bruijn graphs
Hidden Markov models, Profile HMM, Viterbi and Baum-Welch learning
Motif finding and Gibbs sampling
Construction of evolutionary trees (phylogeny)
Course Format
Seven individual homework assignments to be developed on JupyterLab (50% of the grade)
One programming project (50% of the grade)
Cheating
We will not tolerate any kind of cheating in this course. Homework and final project are to be completed on your own. The only external sources allowed are those mentioned above or by the instructor throughout the course. If you have a doubt or question, please just ASK. As per standard UCR policy, you may not submit answers (written or programming) to problem sets that contain material you did not produce yourself for the express purpose of this offering of this course. If I find that you have submitted work that is not your own or is work you submitted in a different course, I will assign you a zero on that assignment (and possibly a zero on the entire course, depending on the severity), and I will forward the case to Student Conduct and Academic Integrity Programs for campus-level consideration.
Late work
Each student is granted five "late days" which can be used (in integer units) on any of the homework assignments. If a more dire situation arises, please contact the instructor.
Homework (in the form of Python notebooks) will be released on Sundays on Canvas (go to Assignments) and will be due the following Sunday at 11:59pm. Download these Python notebooks to your computer, then upload them into JupyterLab. Homework must be completed using the CS department's JupyterHub server at https://locus.cs.ucr.edu/. Submit your Python notebook on Canvas by the due date. Solutions will be posted on Canvas.
Calendar
Week 1
Tuesday, Jan 9: Intro, Molecular Biology
Thursday, Jan 11: Molecular Biology
Sunday, Jan 14: [hw1 posted]
Week 2
Tuesday, Jan 16: Molecular Biology
Thursday, Jan 18: Read Mapping
Sunday, Jan 21: [hw1 due], [hw2 posted]
Week 3
Tuesday, Jan 23: Read Mapping
Thursday, Jan 25: Read Mapping
Sunday, Jan 28: [hw2 due], [hw3 posted]
Week 4
Tuesday, Jan 30: Discussion of projects
Thursday, Feb 1: Sequence Alignment
Sunday, Feb 4: [hw3 due], [hw4 posted]
Week 5
Tuesday, Feb 6: Sequence Alignment
Thursday, Feb 8: Genome Assembly
Sunday, Feb 11: [hw4 due]
Week 6
Tuesday, Feb 13: Genome Assembly
Thursday, Feb 15: HMM
Sunday, Feb 18: [hw5 posted]
Week 7
Tuesday, Feb 20: HMM
Thursday, Feb 22: HMM
Sunday, Feb 25: [hw5 due], [hw6 posted]
Week 8
Tuesday, Feb 27: Motif finding
Thursday, Feb 29: Motif finding
Sunday, Mar 3: [hw6 due], [hw7 posted]
Week 9
Tuesday, Mar 5: Evolutionary trees
Thursday, Mar 7: Evolutionary trees
Sunday, Mar 10: [hw7 due]
Week 10
Tuesday, Mar 12: Evolutionary trees, Concluding remarks
Thursday, Mar 14:
Sunday, Mar 17: [project due]
Finals Week
Project demo
Additional References
(HMMs) Richard Durbin, A. Krogh, G. Mitchison, and S. Eddy, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
(Suffix Trees) Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
(Algorithms) Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings, 2002.
(Algorithms) Neil C. Jones and Pavel Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
(Algorithms) Marketa Zvelebil and Jeremy O. Baum, Understanding Bioinformatics, Garland Science, 2007.
Additional resources
Learn how to Fold it! A great game about protein folding that can help the scientific community