Coding/Development Resources¶
This page collects little tidbits that our lab has found useful for getting work done: a lot of the random small things we always get hung up on, compiled in one place.
VM Linux Management¶
Allocating Additional Disk Space¶
- Allocate additional space in the VM hypervisor
- Follow this tutorial to repartition the VM disk
Changing Linux Hostname in Ubuntu¶
hostnamectl set-hostname new-hostname
Python¶
Profiling and creating a visualization¶
Link.
Worked well for me: specifically cProfile + gprof2dot + Graphviz.
python -m cProfile -o myLog.profile ./test.py
gprof2dot -f pstats myLog.profile -o callingGraph.dot
dot -Tsvg callingGraph.dot -o callingGraph.svg
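If you only need to profile one region of a script rather than the whole run, cProfile can also be driven from inside Python and dumped to the same pstats format that gprof2dot reads. A minimal sketch, where my_function is a hypothetical workload:
import cProfile
import pstats

def my_function():  # hypothetical workload to profile
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Write the same pstats format consumed by gprof2dot above
profiler.dump_stats('myLog.profile')

# Or print a quick summary sorted by cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)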
Creating chunks of a big list¶
# Yield successive n-sized chunks from list l.
def divide_chunks(l, n):
    # Step through l in strides of n, yielding one slice per chunk
    for i in range(0, len(l), n):
        yield l[i:i + n]
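For example:
my_list = list(range(10))
print(list(divide_chunks(my_list, 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]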
Building your Python package¶
Check the example pyproject.toml file (data/pyproject_template.toml) to configure the build process with Poetry.
python3 -m build
Uploading a package to PyPI¶
It is recommended to test everything on TestPyPI first.
- Create an account on PyPI and/or TestPyPI
- Create your API token on PyPI or TestPyPI (using a 2FA application). Save the token in a .pypirc file in your home directory (~); an example of this config file is at data/.pypirc
- Install twine:
python3 -m pip install --upgrade twine
- Upload to TestPyPI or PyPI:
python3 -m twine upload --repository testpypi dist/*
python3 -m twine upload --repository pypi dist/*
- Install from TestPyPI:
python3 -m pip install --index-url https://test.pypi.org/simple/ your-package
- Install from PyPI:
python3 -m pip install your-package
NextFlow¶
Flags for Command Line¶
script:
def flag = params.ms2_flag == true ? "--ms2_flag" : ''
"""
your_tool $flag  # hypothetical command; $flag expands to --ms2_flag or to nothing
"""
Enable docker in NextFlow¶
Add the following lines in the nextflow.config file:
process.container = 'nextflow/examples:latest'
docker.enabled = true
Different Docker image for each process:
process foo {
  container 'image_name_1'
  '''
  do this
  '''
}
process bar {
  container 'image_name_2'
  '''
  do that
  '''
}
Execute a task over n files in parallel¶
// The path is usually read from a command-line argument
Channel.fromPath( '<path>*.<file_format_to_process>' )
Each file emitted by the channel then runs as a separate parallel task when the channel is passed to a process.
Mitigate Input File Name Collisions¶
process create_file {
    input:
    each dummy
    output:
    path 'test.csv', emit: x
    """
    touch test.csv
    """
}
process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"
    input:
    path tests, stageAs: "./test/test*.csv" // the '*' numbers each staged file (test1.csv, test2.csv, ...) so identically named inputs do not collide
    output:
    path 'all_tests.csv'
    """
    cat test/*.csv > all_tests.csv
    """
}
workflow {
    // Run Create File a Bunch
    ch = Channel.from([1,2,3,4,5,6])
    create_file(ch) 
    mergeResults(create_file.out.x.collect())
}
Merge Large Number of CSV/TSV files¶
// Merge results in chunks of size params.merge_batch_size
process chunkResults {
    conda "$TOOL_FOLDER/conda_env.yml"
    input:
    path to_merge, stageAs: './results/*'  // A directory of files, "results/*"
    output:
    path "batched_results.tsv"
    """
    python $TOOL_FOLDER/tsv_merger.py  # your merge script here
    """
}
// Use a separate process to merge all the batched results
process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"
    input:
    path to_merge, stageAs: './results/batched_results*.tsv' // Will automatically number inputs to avoid name collisions
    // Note: to_merge can also be replaced by a single path (e.g., path 'test_results.csv', stageAs: './results/batched_results*.tsv')
    output:
    path 'merged_results.tsv'
    """
    python $TOOL_FOLDER/tsv_merger.py  # your merge script here
    """
}
workflow {
    // Perform Computation
    results = doSomething()
    // Split outputs into chunks and merge
    chunked_results = chunkResults(results.buffer(size: params.merge_batch_size, remainder: true))
    // Collect all the batched results and merge them at the end
    merged_results = mergeResults(chunked_results.collect())
}
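The tsv_merger.py script itself is not part of this snippet; a minimal sketch of what it might look like, assuming it simply concatenates every TSV staged under ./results and keeps a single header (the glob and output name below are assumptions based on the stageAs patterns above):
import csv
import glob

# Hypothetical sketch: concatenate all TSVs staged under ./results,
# writing the header row from the first file only.
def merge_tsvs(input_glob='results/*.tsv', output_path='batched_results.tsv'):
    with open(output_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter='\t')
        header_written = False
        for path in sorted(glob.glob(input_glob)):
            with open(path, newline='') as handle:
                reader = csv.reader(handle, delimiter='\t')
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)

if __name__ == '__main__':
    merge_tsvs()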
Git¶
Cleaning out already merged branches¶
git branch --merged | grep -v "\*" | xargs -n 1 git branch -d
Updating all submodules to the latest commit¶
git submodule update --remote --merge
Command Line¶
Stop all running python tasks¶
ps -u $(whoami) | grep '[p]ython' | awk '{print $1}' | xargs kill -9
For a dry run, use:
ps -u $(whoami) | grep '[p]ython' | awk '{print $1}'
Adding users¶
Add the user:
sudo useradd -m -s /bin/bash <username>
Create a password for the user:
sudo passwd <username>
Add the user to the sudo group (if applicable):
sudo usermod -aG sudo <username>
Add the user to any other group:
sudo usermod -aG <groupname> <username>
Make sure the SSH folder permissions are correct:
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
Docker¶
Start an interactive terminal in a container
docker exec -it <container_name_or_id> /bin/bash
Share a Docker volume between different services in a docker-compose cluster
services:
  service1:
    image: <image_name>
    volumes:
      - <name_of_docker_volume>:<path_inside_the_container>
...
# The named volume must be declared at the top level and added to every service that shares it
volumes:
  <name_of_docker_volume>:
Mount a host file system in different services in a docker-compose cluster
# In every service where the host path needs to be mounted
services:
  service1:
    image: <image_name>
    volumes:
      - <host_local_path>:<docker_path>
Killing all docker containers
docker kill $(docker ps -q)
Cleaning Up Images/Containers
docker system prune
SLURM¶
Since we utilize SLURM internally, here are some things that help us figure out what's going on with the clusters:
Show allocations
scontrol show node