Coding/Development Resources¶
This page collects little tidbits that our lab has found useful for getting work done: a lot of random small things we always get hung up on, compiled in one place.
VM Linux Management¶
Allocating Additional Disk Space¶
- Allocate additional space in the VM hypervisor
- Follow this tutorial to repartition the VM disk; the typical commands are sketched below
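A minimal sketch, assuming a non-LVM setup where the root filesystem is ext4 on /dev/sda1 (device names and partition numbers will differ on your VM):
sudo apt install cloud-guest-utils   # provides growpart on Ubuntu
sudo growpart /dev/sda 1             # grow partition 1 of /dev/sda into the newly allocated space
sudo resize2fs /dev/sda1             # grow the ext4 filesystem to fill the resized partition
df -h /                              # verify the new size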
Changing Linux Hostname in Ubuntu¶
hostnamectl set-hostname new-hostname
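If the old name also appears in /etc/hosts (it does on a default Ubuntu install), update it there too; a sketch with old-hostname/new-hostname as placeholders:
sudo sed -i 's/old-hostname/new-hostname/g' /etc/hosts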
Python¶
Profiling and creating a visualization¶
Link.
Worked well for me; specifically cProfile + gprof2dot + Graphviz.
python -m cProfile -o myLog.profile ./test.py
gprof2dot -f pstats myLog.profile -o callingGraph.dot
dot -Tsvg callingGraph.dot -o callingGraph.svg
Creating chunks of a big list¶
# Yield successive n-sized
# chunks from l.
def divide_chunks(l, n):
    # loop over l in steps of n
    for i in range(0, len(l), n):
        yield l[i:i + n]
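For example, splitting a list of ten numbers into chunks of three:
numbers = list(range(10))
print(list(divide_chunks(numbers, 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]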
Building your Python package¶
Check the pyproject.toml example in data/pyproject_template.toml to configure the build process with poetry, then build the distribution archives:
python3 -m build
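For reference, a minimal illustrative pyproject.toml for a poetry-managed package (all names and values below are placeholders; the lab's actual template is data/pyproject_template.toml):
[tool.poetry]
name = "your-package"
version = "0.1.0"
description = "Short description of the package"
authors = ["Your Name <you@example.com>"]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"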
Uploading a package to PyPI¶
It is recommended to test everything on TestPyPI first.
- Create an account on PyPI and/or TestPyPI
- Create your API token on PyPI or TestPyPI using a 2FA application. Save the token in a .pypirc file in your home directory (~); this is the config file used for PyPI: data/.pypirc (a sketch of its layout is shown after this list)
- Install twine:
python3 -m pip install --upgrade twine
- Upload to TestPyPI or PyPI:
python3 -m twine upload --repository testpypi dist/*
python3 -m twine upload --repository pypi dist/*
4.1. Install from TestPyPI:
python3 -m pip install --index-url https://test.pypi.org/simple/ your-package
4.2. Install from PyPI:
python3 -m pip install your-package
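A minimal sketch of the ~/.pypirc layout referenced in step 2 (replace the placeholder tokens with the API tokens you created):
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
password = <your-pypi-api-token>

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = <your-testpypi-api-token>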
NextFlow¶
Flags for Command Line¶
script:
def flag = params.ms2_flag == true ? "--ms2_flag" : ''
"""
$flag
"""
Enable docker in NextFlow¶
Add the following lines in the nextflow.config file:
process.container = 'nextflow/examples:latest'
docker.enabled = true
Different Docker image for each process:
process foo {
    container 'image_name_1'
    '''
    do this
    '''
}

process bar {
    container 'image_name_2'
    '''
    do that
    '''
}
Execute a task over n files in parallel¶
Channel.fromPath( '<path>*.<file_format_to_process>' ) // The path is usually read from a command line argument.
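A minimal sketch of feeding such a channel into a process so each file gets its own parallel task (processFile, params.input_folder, and the .csv glob are placeholder names):
params.input_folder = 'data'

process processFile {
    input:
    path input_file

    """
    echo "processing $input_file"
    """
}

workflow {
    // One channel element per matching file; adjust the glob to your file format
    files_ch = Channel.fromPath( params.input_folder + '/*.csv' )
    // The process is invoked once per file and the tasks run in parallel
    processFile(files_ch)
}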
Mitigate Input File Name Collisions¶
process create_file {
    input:
    each dummy

    output:
    path 'test.csv', emit: x

    """
    touch test.csv
    """
}

process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    // stageAs renames each incoming file (test1.csv, test2.csv, ...) so the identically
    // named outputs from different tasks do not collide in the work directory
    path tests, stageAs: "./test/test*.csv"

    output:
    path 'all_tests.csv'

    """
    cat test/*.csv > all_tests.csv
    """
}

workflow {
    // Run create_file a bunch of times; every task emits a file named test.csv
    ch = Channel.from([1,2,3,4,5,6])
    create_file(ch)
    mergeResults(create_file.out.x.collect())
}
Merge Large Number of CSV/TSV files¶
// Merge results in chunks of size params.merge_batch_size
process chunkResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/*' // A directory of files, "results/*"

    output:
    path "batched_results.tsv"

    """
    python $TOOL_FOLDER/tsv_merger.py # your merge script here
    """
}

// Use a separate process to merge all the batched results
process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/batched_results*.tsv' // Will automatically number inputs to avoid name collisions
    // Note: to_merge can also be replaced by a single path (e.g., path 'test_results.csv', stageAs: './results/batched_results*.tsv')

    output:
    path 'merged_results.tsv'

    """
    python $TOOL_FOLDER/tsv_merger.py # your merge script here
    """
}
workflow {
    // Perform computation
    results = doSomething()

    // Split outputs into chunks and merge
    chunked_results = chunkResults(results.buffer(size: params.merge_batch_size, remainder: true))

    // Collect all the batched results and merge them at the end
    merged_results = mergeResults(chunked_results.collect())
}
Git¶
Cleaning out already merged branches¶
git branch --merged | grep -v "\*" | xargs -n 1 git branch -d
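For a dry run that only lists the branches that would be deleted, drop the delete step:
git branch --merged | grep -v "\*"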
Updating all submodules to the latest commit¶
git submodule update --remote --merge
Command Line¶
Stop all running python tasks¶
ps -u $(whoami) | grep '[p]ython' | awk '{print $1}' | xargs kill -9
For a dry run, use:
ps -u $(whoami) | grep '[p]ython' | awk '{print $1}'
Adding users¶
Add the user:
sudo useradd -m -s /bin/bash <username>
Create a password for the user:
sudo passwd <username>
Add the user to the sudo group (if applicable)
sudo usermod -aG sudo <username>
Add the user to any other group:
sudo usermod -aG <groupname> <username>
Make sure the new user's ~/.ssh folder permissions are correct (run these as the new user, or adjust the paths accordingly):
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
Docker¶
Start an interactive terminal in a container
docker exec -it <container_name_or_id> /bin/bash
Share a Docker volume between different services in a docker-compose cluster
services:
  service1:
    image: <image_name>
    volumes:
      - <name_of_docker_volume>:<path_in_the_container>
  # Add the same volume entry to every service that should share it
  ...
# The volume must also be declared at the top level of the compose file
volumes:
  <name_of_docker_volume>:
Mount a host file system into different services in a docker-compose cluster
# Repeat the bind mount in every service that needs access to the host path
services:
  service1:
    image: <image_name>
    volumes:
      - <host_local_path>:<docker_path>
Killing all docker containers
docker kill $(docker ps -q)
Cleaning Up Images/Containers
docker system prune
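To also remove unused (not just dangling) images and unused volumes, the more aggressive variant is:
docker system prune -a --volumes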
SLURM¶
Since we utilize SLURM internally, here are some things that help us figure out what's going on with the clusters:
Show allocations
scontrol show node
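Two other standard SLURM commands we reach for:
sinfo                   # partition and node availability at a glance
squeue -u $(whoami)     # your own queued and running jobs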