IsoInfer is a C/C++ program that infers isoforms from short RNA-Seq reads (single-end and paired-end), exon-intron boundaries, and TSS/PAS information. This version of IsoInfer handles short reads of different types and lengths in a unified way. The source code is provided for non-commercial use. We apologize that Windows and Mac versions of this program are not available at present.
Install the following C/C++ libraries: glpk, gsl and QuadProg++. If you
cannot install these packages in the standard system directories and
instead install them at, for example, "/your/installed/path/", you have to
set the environment variables CXXFLAGS and LD_LIBRARY_PATH as follows:
$export CXXFLAGS="-I/your/installed/path/include -L/your/installed/path/lib"
$export LD_LIBRARY_PATH="/your/installed/path/lib:"$LD_LIBRARY_PATH
Compile the graphlib package in the source code. The compilation follows the standard
configure, make process; execute the following commands in sequence:
$./configure
$make
Compile the isoinfer package in the source code. The compilation follows the standard
configure, make process; execute the following commands in sequence:
$./configure
$make
Note that the graphlib and isoinfer directories must be kept under the same directory. The environment variable LD_LIBRARY_PATH must be exported as in the first step every time you open a new shell to run the program, so we suggest putting the export line in the .bashrc file in your home directory.
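The .bashrc suggestion can be sketched as a small helper that appends the export line only once. This is a minimal sketch, not part of the IsoInfer distribution: the function name add_lib_path is hypothetical, and "/your/installed/path/lib" is the placeholder path from the first step.

```shell
# Hypothetical helper (not shipped with IsoInfer): append an LD_LIBRARY_PATH
# export line to a shell startup file unless the same line is already there.
add_lib_path() {
    rc_file=$1
    lib_dir=$2
    # \$LD_LIBRARY_PATH stays unexpanded so it is evaluated at shell startup.
    line="export LD_LIBRARY_PATH=\"$lib_dir:\$LD_LIBRARY_PATH\""
    touch "$rc_file"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Demonstration on a scratch file; calling it twice still leaves one line.
demo_rc=$(mktemp)
add_lib_path "$demo_rc" "/your/installed/path/lib"
add_lib_path "$demo_rc" "/your/installed/path/lib"
cat "$demo_rc"
```

To apply it for real, pass "$HOME/.bashrc" as the first argument and your actual library directory as the second.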
-h | Print help information |
-ext_junc_ref | Extract junction ref sequence. -rstart, -bound, -grange, -tsspas, -ref, -read_info are required. |
-gen_instance | Generate instances of problem for IsoInfer. Expression level will be used to define expressed segments. A segment is expressed if the expression level on this segment is above the expression level specified by -noise. -bound, -grange, -tsspas and -read_info are required. |
-predict | Infer isoforms from the instances generated by -gen_instance. -ins, -conf_level, -minexp, -mindup, -ps, -bpe, -bse are required. |
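The expressed-segment rule used by -gen_instance can be illustrated with a small calculation. The sketch below assumes the standard RPKM definition (reads per kilobase of segment per million mapped reads); how IsoInfer itself computes the expression level is not described here. The function names rpkm and is_expressed are hypothetical, the numbers are made up for illustration, and treating a value exactly equal to the noise level as not expressed is also an assumption.

```shell
# rpkm READS SEGMENT_LENGTH TOTAL_READS
# Prints the standard RPKM value: reads * 10^9 / (length * total reads).
rpkm() {
    awk -v r="$1" -v len="$2" -v total="$3" \
        'BEGIN { printf "%.3f\n", (r * 1e9) / (len * total) }'
}

# is_expressed READS SEGMENT_LENGTH TOTAL_READS NOISE
# Succeeds (exit status 0) when the segment's RPKM is strictly above NOISE.
is_expressed() {
    awk -v r="$1" -v len="$2" -v total="$3" -v noise="$4" \
        'BEGIN { exit !((r * 1e9) / (len * total) > noise + 0) }'
}

# Illustrative numbers: 50 reads on a 500 bp segment, 10,000,000 reads in total.
rpkm 50 500 10000000        # prints 10.000
if is_expressed 50 500 10000000 5; then
    echo "expressed at noise level 5"
fi
```

With the same segment length and read total, 20 reads give an RPKM of 4, which falls below a noise level of 5.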
-rstart | number | For job -ext_junc_ref, this parameter specifies the start position of the first nucleotide of a chromosome. It ensures that the coordinates used in the program are consistent with the coordinates provided by -bound, -grange and -tsspas. Default 0. |
-bound | file | Boundary file. The format of the file is: chromosome strand position type |
-grange | file | Gene range file. The format of the file is: gene_name chromosome strand start_position end_position |
-tsspas | file | TSS and PAS file. The format of the file is: gene_name TSSs PASs |
-ref | file | Reference sequence in a single file. |
-m | file | A file containing the mapping information of short reads to the reference sequence. The format of this file is: chromosome strand start_positions end_positions |
-read_info | file | A file storing the basic read information. The format of this file is: mapping_file 0/1 [end_len] cross_strength noise_level total_read_cnt distribution_type "definition of a distribution". The format of the mapping_file is: chromosome strand start_positions end_positions
Example: my_map_file 1 20 3 5 10000000 1 300, 30
After one read info entry, another may follow in the same file. For job -ext_junc_ref, only the first read info entry in the file is used. If a job does not use all the items of a read info entry, the unused items can be set to any value. |
-s | T/F | Whether the operations are strand-specific. Default F. |
-ins | file | A file containing instances. |
-bse | T/F | Use the TSS/PAS information or not. |
-min_exp | number | The minimum expression level. Default 0. |
-min_dup | number | The minimum effective duplication of part comb. Default 1. This parameter is effective when paired-end reads are available. |
-ps | number | Partition size. Default 7. On the whole mouse genome, the isoform inference process (Step4 in the following example) takes about 10 minutes on a standard PC with this default value. A larger value is expected to give better results. |
-noise | number | The noise level in RPKM. For job -gen_instance, a segment with an expression level below this value is treated as an intron. Default 0. |
-conf_level | number in [0,1] | Set the confidence level. Default 0.05. |
-o | file | A file for output. |
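As an illustration of the -read_info format, the sketch below splits the example entry from the table into its fields. It is a minimal sketch that assumes the items are separated by whitespace and that the optional end_len item is present, as it is in the example entry; entries without end_len are not handled. The function name parse_read_info and the variable name flag01 (for the 0/1 item) are hypothetical.

```shell
# Hypothetical helper: split one read info entry into the items named in the
# -read_info format description and print a few of them.
parse_read_info() {
    # The last variable receives the rest of the line, so a distribution
    # definition that contains spaces (such as "300, 30") stays in one piece.
    read -r mapping_file flag01 end_len cross_strength noise_level \
        total_read_cnt distribution_type distribution_def <<EOF
$1
EOF
    printf 'mapping_file=%s\n' "$mapping_file"
    printf 'end_len=%s\n' "$end_len"
    printf 'noise_level=%s\n' "$noise_level"
    printf 'total_read_cnt=%s\n' "$total_read_cnt"
    printf 'distribution_def=%s\n' "$distribution_def"
}

# The example entry from the option table.
parse_read_info 'my_map_file 1 20 3 5 10000000 1 300, 30'
```

For the example entry this prints mapping_file=my_map_file, end_len=20, noise_level=5, total_read_cnt=10000000 and distribution_def=300, 30, one per line.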
The following example is based on single-end short reads. It comes with an example read_info file and several useful scripts. For security reasons, the ".pl" suffix has been removed from all the scripts. The scripts are straightforward to use; please read each script for its usage.
Jianxing Feng, Wei Li and Tao Jiang. Inference of isoforms from short sequence reads. 2010. Accepted by RECOMB 2010.
Please email to: jianxing TA cs.ucr.edu