Install the following C/C++ libraries: glpk, gsl and QuadProg++. If you cannot install these packages in the standard system directories and instead install them under, for example, "/your/installed/path/", set the environment variables CXXFLAGS and LD_LIBRARY_PATH as follows:
$ export CXXFLAGS="-I/your/installed/path/include -L/your/installed/path/lib"
$ export LD_LIBRARY_PATH="/your/installed/path/lib:"$LD_LIBRARY_PATH
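The second command above leaves a trailing colon when LD_LIBRARY_PATH was previously unset or empty. On glibc-based Linux systems the dynamic loader treats that empty entry as the current working directory, which is usually harmless but not intended. Below is a minimal sketch of a variant that avoids the empty entry, assuming a POSIX-compatible shell such as bash. prepend_lib_path is a hypothetical helper name and is not part of IsoInfer.

```shell
# Hypothetical helper: prepend a library directory to LD_LIBRARY_PATH.
# ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} expands to ":<old value>" only when
# LD_LIBRARY_PATH is set and non-empty, so no empty entry is ever created.
prepend_lib_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

prepend_lib_path /your/installed/path/lib
echo "$LD_LIBRARY_PATH"
```

Calling the helper again with another directory puts that directory first, followed by the previous value.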
Compile the graphlib package included in the source code. It follows the standard configure and make process; run the following commands in sequence:
$ ./configure
$ make
Compile the isoinfer package included in the source code. It follows the standard configure and make process; run the following commands in sequence:
$ ./configure
$ make
Note that the graphlib and isoinfer directories must be kept under the same parent directory. LD_LIBRARY_PATH must be exported as in the first step every time you open a new shell to run the program, so I suggest putting the export command in the .bashrc file in your home directory.
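If the export line is added to .bashrc by a setup script, it helps to avoid appending it twice. Below is a minimal sketch, assuming a POSIX shell and GNU or BSD grep. append_once is a hypothetical helper name and is not part of IsoInfer.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains
# exactly that line, so running the setup twice does not duplicate it.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    # A missing FILE makes grep fail quietly, so the line is appended.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, `append_once 'export LD_LIBRARY_PATH="/your/installed/path/lib:"$LD_LIBRARY_PATH' "$HOME/.bashrc"` writes the export command from the first step. The single quotes keep `$LD_LIBRARY_PATH` literal, so it is expanded each time .bashrc is read.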
Job | Description |
-h | Print help information. |
-ext_junc_ref | Extract junction reference sequences. Involved parameters are -rstart, -bound, -grange, -tsspas, -ref and -read_info. |
-gen_instance | Generate problem instances for IsoInfer. The expression level is used to define expressed segments: a segment is expressed if its expression level is above the noise level specified in the read_info file. Involved parameters are -bound, -grange, -tsspas and -read_info. |
-predict | Infer isoforms from the instances generated by -gen_instance. Involved parameters are -ins, -read_info, -conf_level, -min_exp, -ps, -min_dup, -bse and -o. |
Parameter | Value | Description |
-rstart | number | For job -ext_junc_ref, the coordinate of the first nucleotide of a chromosome. This parameter ensures that the coordinates used by the program are consistent with the coordinates in the files given by -bound, -grange and -tsspas. Default 0. |
-bound | file | Boundary file. The format of the file is: chromosome strand position type |
-grange | file | Gene range file. The format of the file is: gene_name chromosome strand start_position end_position |
-tsspas | file | TSS and PAS file. The format of the file is: gene_name TSSs PASs |
-ref | file | Reference sequence in a single file. |
-read_info | file | A file storing the basic read information. The format of this file is: mapping_file 0/1 [end_len] cross_strength noise_level total_read_cnt distribution_type "definition of a distribution". The format of the mapping_file is: chromosome strand start_positions end_positions. An example read info is "my_map_file 1 20 3 5 10000000 1 300, 30". Several read infos can follow one another in the same file. For job -ext_junc_ref, only the first read info in the file is used. If a job does not use all the items of a read info, the unused items can be set to any value. If the file contains more than one read info, the overall noise level is the maximum noise level among them. For job -gen_instance, a segment whose expression level is below the overall noise level is considered an intron. Choosing the noise level carefully is critical. If it is 0, all segments are considered expressed, which introduces noise segments into the subsequent isoform predictions. If it is too high, many expressed segments are considered introns, which lowers the sensitivity. In our tests, 3 to 5 is a reasonable value for this parameter. |
-update_read_info | T/F | Whether to update the read_info file by correcting its "total_read_cnt" field. This parameter is only effective when short reads are loaded. Default F. |
-s | T/F | Whether the operations are strand-specific. Default F. |
-ins | file | A file containing instances. |
-min_exp | number | The minimum expression level. Default 0. |
-min_dup | number | A junction is covered if at least "min_dup" reads cover it. Default 1. |
-bse | T/F | Whether to use the TSS/PAS information. |
-ps | number | Partition size. Default 7. With this default, the isoform inference process on the whole mouse genome (Step 4 in the following example) takes about 10 minutes on a standard PC. A larger value is expected to lead to better results. |
-conf_level | number in [0,1] | Set the confidence level. Default 0.05. |
-o | file | Output file. |
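The noise-level rules in the -read_info row (the overall noise level is the maximum over all read infos, and -gen_instance treats a segment below it as an intron) can be illustrated with a small awk sketch. This is not IsoInfer's code. It rests on assumptions that the format description does not state. First, each read info occupies one line. Second, the optional end_len field is present exactly when the second field is 1, as in the example read info. Under that assumption noise_level is field 5 when end_len is present and field 4 otherwise. The function names and the two-column segment input (segment name, expression level) are hypothetical. A segment exactly at the noise level is counted as expressed here, because the text leaves that case open.

```shell
# Overall noise level = maximum noise_level over all read infos in FILE.
# Assumptions: one read info per line; end_len is present iff field 2 is 1.
overall_noise_level() {
    awk 'NF > 0 {
             noise = ($2 == 1) ? $5 : $4
             if (!seen || noise + 0 > max) { max = noise + 0; seen = 1 }
         }
         END { if (seen) print max }' "$1"
}

# Label each line "name expression_level" of a hypothetical segment list:
# a segment whose expression level is below NOISE is treated as an intron.
classify_segments() {
    awk -v noise="$1" '{ print $1, (($2 + 0 < noise + 0) ? "intron" : "expressed") }' "$2"
}
```

For instance, `classify_segments "$(overall_noise_level read_info)" segments.txt` would label the segments in a hypothetical segments.txt using the overall noise level of the read_info file.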
The following example is based on single-end short reads. It comes with an example read_info file and several useful scripts. The usage of each script is straightforward; please read the scripts for details.