Wang Lab Bootcamp
There is a lot to learn when you start doing research in a lab that works on so many different things, and it can be daunting. This page lists skills you may or may not already have and, more importantly, points to resources where you can gain them. The most important thing is for all of us to understand where you are technically and where your interests lie, so that we can find the right project where you can contribute and simultaneously grow both technically and as a scientist.
Computing Skills
Here is a list of high-level computing skills that are used in projects in the lab. Not every project requires all of these skills, but most require more than one.
It is good to have an assessment of your skills so that we can pair you with the right project and provide the right resources to get you up to speed when you need them.
- Python
- Linux Command Line
- Conda Environment Management
- Nextflow Workflow Proficiency
- CS Data Structures
- CS Algorithms
- Data Science - Data Manipulation
- Data Science - Data Visualization
- Containerization
- Web Tools
- Batch Computing
- Source Version Control
- Remote Computing
Python Language
You are familiar with the Python 3 programming language. This includes writing functions, installing dependencies (pip), creating modules, building command-line tools, and testing.
Here are a few resources to get you up to speed:
- https://wiki.python.org/moin/BeginnersGuide/Programmers
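As a rough self-check of the skills listed above, the sketch below puts a function, a command-line entry point, and a test in one file. The names `mean_intensity`, `build_parser`, `main`, and the `--values` flag are hypothetical and chosen only for illustration.

```python
import argparse


def mean_intensity(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def build_parser():
    """Create the command-line parser (the --values flag is illustrative)."""
    parser = argparse.ArgumentParser(description="Average a list of numbers.")
    parser.add_argument("--values", type=float, nargs="+", required=True)
    return parser


def main(argv=None):
    """Parse arguments, compute the mean, print it, and return it."""
    args = build_parser().parse_args(argv)
    result = mean_intensity(args.values)
    print(result)
    return result


def test_mean_intensity():
    """A pytest-style test: plain assert statements are enough."""
    assert mean_intensity([1.0, 2.0, 3.0]) == 2.0
```

Calling `main(["--values", "2", "4"])` from Python exercises the same path as the command line. To run the file as a script, call `main()` under an `if __name__ == "__main__":` guard.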
Linux Command Line
You are familiar with using the Linux command line to run tools, manipulate files, and install packages.
Here are a few resources to get you up to speed:
- https://ubuntu.com/tutorials/command-line-for-beginners
Conda Environment Management
You are familiar with using Conda to manage development environments and their dependencies.
Here are a few resources to get you up to speed:
- https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
Nextflow Workflow Development
You are familiar with the Nextflow workflow environment. You are able to:
- Run a Nextflow workflow
- Write a Nextflow workflow from scratch
Here are a few resources to get you up to speed:
- https://training.seqera.io/
Computer Science Data Structures
You are familiar with common computer science data structures, their properties, their advantages and disadvantages, and common algorithms that use them.
This is generally covered in CS 014 at UC Riverside.
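As a small illustration of how the choice of data structure changes what is cheap to do, the sketch below runs the same hypothetical list of peak masses through several structures from the Python standard library. The data values are made up for the example.

```python
from collections import deque
import heapq

# Hypothetical example data: peak masses observed in a spectrum.
masses = [445.12, 149.02, 279.16, 149.02, 391.28]

# list: ordered, allows duplicates; a membership test scans the elements (O(n)).
found_in_list = 279.16 in masses

# set: unordered, unique items; membership tests are O(1) on average.
unique_masses = set(masses)

# dict: maps keys to values with O(1) average lookup by key.
counts = {}
for m in masses:
    counts[m] = counts.get(m, 0) + 1

# deque: O(1) appends and pops at both ends, so it works well as a FIFO queue.
queue = deque(masses)
first = queue.popleft()

# heap (priority queue): O(log n) push and pop; pop always returns the smallest item.
heap = list(masses)
heapq.heapify(heap)
smallest = heapq.heappop(heap)
```

The trade-off to notice is that the set and dict give fast lookups but lose the original ordering guarantees of a sequence with duplicates, while the heap only gives cheap access to the minimum.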
Computer Science Algorithms
You are familiar with algorithms and their application.
This is generally covered in CS 141 at UC Riverside.
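Binary search is a typical example of the kind of algorithm this refers to: it finds a value in a sorted list in O(log n) steps by halving the search range each time. The function name `binary_search` and the sample inputs are only illustrative.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent.

    sorted_values must already be sorted in ascending order.
    """
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1
```

In everyday code the standard library's `bisect` module covers the same need, but being able to write and reason about the loop above is the skill in question.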
Data Science - Data Manipulation
You are familiar with how to read and parse data and manipulate it, e.g. filtering, sorting, grouping, pivoting, melting, joining, cleaning, etc.
You can learn how to do this with the following resources:
- https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
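The operations named above can be sketched with pandas as follows. The tables, column names, and the intensity threshold are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical measurements: one row per (sample, compound) pair.
df = pd.DataFrame({
    "sample": ["A", "A", "B", "B", "C"],
    "compound": ["caffeine", "glucose", "caffeine", "glucose", "caffeine"],
    "intensity": [120.0, 30.0, 80.0, 45.0, 10.0],
})
# Hypothetical per-sample metadata used for the join.
metadata = pd.DataFrame({
    "sample": ["A", "B", "C"],
    "group": ["control", "treated", "treated"],
})

# Filtering: keep rows above an arbitrary intensity threshold.
filtered = df[df["intensity"] > 20]

# Sorting: highest intensity first.
sorted_df = filtered.sort_values("intensity", ascending=False)

# Grouping: mean intensity per compound.
mean_by_compound = filtered.groupby("compound")["intensity"].mean()

# Pivoting: long format to wide format (one column per compound).
wide = df.pivot(index="sample", columns="compound", values="intensity")

# Melting: wide format back to long format.
long_df = wide.reset_index().melt(
    id_vars="sample", var_name="compound", value_name="intensity"
)

# Joining: attach the metadata to every measurement.
joined = df.merge(metadata, on="sample", how="left")

# Cleaning: drop rows where the intensity is missing.
cleaned = long_df.dropna(subset=["intensity"])
```

Sample C has no glucose measurement, so the pivot produces a missing value that survives the melt and is then removed by `dropna`.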
Data Science - Data Visualization
You are familiar with how to summarize and visualize data. This can include the following visualization techniques: histograms, 2D histograms, bar plots, scatter plots, box plots, etc.
You can learn how to get started with the following resources:
- https://seaborn.pydata.org/tutorial/introduction.html
- https://www.fireblazeaischool.in/blogs/data-visualization-using-plotly/
Machine Learning/Deep Learning
Containerization, e.g. Docker, Kubernetes
You are familiar with how to containerize your applications with Docker and Docker Compose.
The following resources are a decent place to start:
- https://stackify.com/docker-tutorial/
Web Tools/Services
You are familiar with how to build interactive web applications. In our lab, we recommend using Dash and Flask.
The following resources are a decent place to get started:
- https://dash.plotly.com/
Additionally, the Wang Lab has its own templates for building these applications; check them out here.
Batch Computing/Kubernetes
You are familiar with how to run large numbers of tasks in HPC environments like batch systems or Kubernetes.
Source Version Control
You are familiar with how to use source code version control, specifically Git and GitHub.
Please review the following topics if you are not familiar with them.
For a quick reference, if you are working with an existing repository and want to contribute your own code on a branch, you'll want to do the following:
# Create a new branch
git branch my-new-branch
# Check out (switch to) the new branch
git checkout my-new-branch
# Stage the files you want to include in the commit
git add new_file.txt
# Commit the staged changes
git commit -m "Adding new file"
# Push the branch to GitHub and set it to track the remote branch
git push --set-upstream origin my-new-branch
Remote Computing
You are familiar with how to set up a remote workstation over SSH.
Development Environment
We recommend using VS Code for software development, as it is a powerful platform for text editing, debugging, and running software.
Mass Spectrometry Background
We want you to have an understanding of mass spectrometry for your work in the lab. Resources to come!