OMIGA
Optimized Maker-based Insect Genome Annotation
1. INTRODUCTION
OMIGA is a genome annotation pipeline that runs on Linux. It includes the following modules:
- Genome integrity (completeness) assessment
- Identification of repetitive genomic sequences
- Collection of transcriptome evidence
- Training of gene prediction software
- Ab initio gene prediction
- Integration of gene evidence (ab initio predictions, expression evidence and homology evidence)
2. INSTALL
2.1 Prerequisite packages required by OMIGA
- MySQL (version 5.1 or later)
- ABySS (version 1.3.5 or later)
- CEGMA (version 2.4.0 or later)
- RepeatModeler (version 1.0.5 or later)
- RepeatMasker (version 3.3.0 or later)
- TopHat (version 2.0.6 or later)
- Cufflinks (version 2.0.2 or later)
- Augustus (version 2.6.1 or later)
- SNAP (version 2006-07-28 or later)
- GeneMark-ES (version 2.3 or later)
- MAKER (version 2.26 or later)
- InterProScan (version 4.8 or later)
Each of the packages above may have its own third-party dependencies; for example, CEGMA requires geneid, NCBI-BLAST and other software. Follow each package's own installation instructions to install them.
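To see which of these tools are already reachable, a quick check like the following can be run before going further. It is a minimal POSIX shell sketch: check_tools and OMIGA_PREREQS are hypothetical names, and the executable names (for example abyss-pe for ABySS, gm_es.pl for GeneMark-ES, iprscan for InterProScan 4.x) are assumptions about each package's entry point that may differ on your system.

```shell
#!/bin/sh
# Minimal sketch: report prerequisite executables that are not on PATH.

# Assumed entry points for the packages in section 2.1 (edit to match your
# installation). GeneMark-ES is assumed to provide gm_es.pl (newer releases
# ship gmes_petap.pl); InterProScan 4.x provides iprscan (5.x uses
# interproscan.sh); TopHat may be installed as tophat2.
OMIGA_PREREQS="mysql abyss-pe cegma RepeatModeler RepeatMasker tophat
cufflinks augustus snap gm_es.pl maker iprscan"

# Print each missing command to stderr and return how many are missing
# (0 means every command was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, check_tools $OMIGA_PREREQS && echo "all prerequisites found". Leave $OMIGA_PREREQS unquoted so the shell splits it into separate command names.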
2.2 Unpack OMIGA.gz
> tar -xzvf OMIGA.gz
The unpacked OMIGA directory contains the following sub-directories:
Assemble the genome of interest.
- 02_CEGMA_assess_gene_space:
Assess the integrity (completeness) of the assembled genome.
- 03_mask_repeat_element:
Identify repetitive sequences in the genome.
- 04_cufflinks_annotation:
Collect transcription evidence with Cufflinks.
- 05_train_augustus_snap_genemark:
Train the gene prediction programs Augustus, SNAP and GeneMark. The training gene set comes from 04_cufflinks_annotation.
- 06_Augustus_workspace:
Run ab initio gene prediction with Augustus. InterProScan is then used to identify conserved protein domains in the genes predicted by Augustus.
- 07_snap_workspace:
Same as 06_Augustus_workspace, but using SNAP instead of Augustus.
- 08_genemarker_workspace:
Same as 06_Augustus_workspace, but using GeneMark instead of Augustus.
- 09_maker_workspace:
MAKER integrates the homology, ab initio prediction and expression evidence.
- 10_pick_up_ab-inito_genes_with_iprdomain:
Recover genes that contain conserved protein domains but were originally discarded by MAKER.
- 11_generate_final_OGS1_geneSet:
Merge the genes from 09 and 10 into the final gene set.
- database:
Databases used to annotate the sequences. The OMIGA pipeline uses the nr database for annotation.
- homology-protein:
Contains a protein sequence file named homology.fa. Ideally, the protein sequences should come from closely related species and from RefSeq.
- OMIGA_PACKAGE:
A few helper programs required by OMIGA.
- raw-data:
Contains two sub-directories: DNA-raw-data for the raw genome sequencing data and rna-raw-data for the raw transcriptome sequencing data.
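After unpacking, the layout above can be checked with a sketch like the one below. check_omiga_layout and OMIGA_DIRS are hypothetical names; the list only covers the directories named above, and it assumes the two raw-data sub-directories are shipped in the archive.

```shell
#!/bin/sh
# Minimal sketch: verify that an unpacked OMIGA tree has the expected
# sub-directories (only those named in section 2.2).
OMIGA_DIRS="02_CEGMA_assess_gene_space 03_mask_repeat_element
04_cufflinks_annotation 05_train_augustus_snap_genemark 06_Augustus_workspace
07_snap_workspace 08_genemarker_workspace 09_maker_workspace
10_pick_up_ab-inito_genes_with_iprdomain 11_generate_final_OGS1_geneSet
database homology-protein OMIGA_PACKAGE raw-data/DNA-raw-data
raw-data/rna-raw-data"

# Print every missing directory to stderr; return 0 only if none is missing.
check_omiga_layout() {
  local root=$1 d status=0
  for d in $OMIGA_DIRS; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_omiga_layout /my_path_to_OMIGA. The check reports all missing entries at once rather than stopping at the first one.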
2.3 Set environment variables
- The MySQL service must be running before OMIGA is used.
- Create a MySQL user named "OMIGA", then set the shell environment variables: export OMIGA_USER=OMIGA and export OMIGA_PWD="PASSWORD".
- Set the shell environment variable for OMIGA_PACKAGE: export OMIGA_PACKAGE=/my_path_to_OMIGA/OMIGA_PACKAGE
- Add $OMIGA_PACKAGE to the PATH variable: export PATH=$OMIGA_PACKAGE:$PATH
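Collected in one place, the settings above might look like this, for example in ~/.bashrc. /my_path_to_OMIGA and PASSWORD are the placeholders used in the text. The commented CREATE USER/GRANT statements and the check_omiga_mysql helper are assumptions, since the MySQL privileges OMIGA needs are not stated. Passing a password with -p on the command line is convenient but makes it visible to other local users.

```shell
#!/bin/sh
# Minimal sketch of the section 2.3 settings.
# One way to create the MySQL user (run once as a MySQL admin; the GRANT
# below is an assumption, since the required privileges are not stated):
#   mysql -u root -p -e "CREATE USER 'OMIGA'@'localhost' IDENTIFIED BY 'PASSWORD';
#                        GRANT ALL PRIVILEGES ON *.* TO 'OMIGA'@'localhost';"

export OMIGA_USER=OMIGA
export OMIGA_PWD="PASSWORD"
export OMIGA_PACKAGE=/my_path_to_OMIGA/OMIGA_PACKAGE
export PATH="$OMIGA_PACKAGE:$PATH"

# Hypothetical helper: succeeds if MySQL accepts the OMIGA credentials.
check_omiga_mysql() {
  mysql -u "$OMIGA_USER" -p"$OMIGA_PWD" -e 'SELECT 1;' >/dev/null
}
```

Running check_omiga_mysql after the MySQL service is started gives a quick sign that the user and password match.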
3. RUNNING OMIGA
3.1 Before running OMIGA, put the input data into the corresponding directories.
- Put the protein sequences used to obtain protein homology evidence into /my_path_to_OMIGA/homology-protein, saved in a file named homology.fa.
- Put the raw DNA sequencing data (used to assemble the genome) and the RNA-Seq data (used to obtain gene transcription evidence) into /my_path_to_OMIGA/raw-data.
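Staging the inputs can be scripted with two small helpers. stage_homology and stage_reads are hypothetical names; the sketch assumes the layout from section 2.2 (homology-protein/homology.fa, raw-data/DNA-raw-data and raw-data/rna-raw-data) and only performs a basic FASTA sanity check on the protein file.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): copy inputs into the OMIGA tree.

# Copy a protein FASTA file to <root>/homology-protein/homology.fa,
# refusing files whose first line is not a '>' header.
stage_homology() {
  local root=$1 proteins=$2 first=""
  IFS= read -r first < "$proteins" || true
  case $first in
    '>'*) ;;
    *) echo "not a FASTA file: $proteins" >&2; return 1 ;;
  esac
  mkdir -p "$root/homology-protein"
  cp "$proteins" "$root/homology-protein/homology.fa"
}

# Copy read files into raw-data/DNA-raw-data (kind "dna") or
# raw-data/rna-raw-data (kind "rna").
stage_reads() {
  local root=$1 kind=$2 dest
  shift 2
  case $kind in
    dna) dest=$root/raw-data/DNA-raw-data ;;
    rna) dest=$root/raw-data/rna-raw-data ;;
    *) echo "kind must be dna or rna" >&2; return 1 ;;
  esac
  mkdir -p "$dest"
  cp "$@" "$dest/"
}
```

For example, with hypothetical file names: stage_homology /my_path_to_OMIGA proteins.fa and stage_reads /my_path_to_OMIGA rna sample_1.fq.gz sample_2.fq.gz.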
3.2 Some parameters in the shell scripts must be adjusted to match your DNA sequencing and RNA-Seq data, for example in:
- /my_path_to_OMIGA/04_cufflinks_annotation/01_cufflinks_work/cmd.sh
3.3 Enter each directory in ascending order of its numeric prefix and run its cmd.sh script:
> ./cmd.sh
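If you prefer to script this step, the loop below is one possible sketch. run_omiga_steps is a hypothetical name, and it assumes that running every cmd.sh under the numbered directories in sorted path order (nested ones such as 04_cufflinks_annotation/01_cufflinks_work included) matches the intended order, and that each script must be started from its own directory. It stops at the first script that fails.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): run each cmd.sh under the numbered
# directories of an OMIGA root, in byte-sorted path order.
run_omiga_steps() {
  (
    find "$1"/[0-9][0-9]_* -type f -name cmd.sh | LC_ALL=C sort |
    while IFS= read -r script; do
      echo "running $script"
      # Run from the script's own directory; stdin comes from /dev/null so
      # the script cannot consume the remaining file list.
      (cd "$(dirname "$script")" && ./cmd.sh </dev/null) || exit 1
    done
  )
}
```

For example, run_omiga_steps /my_path_to_OMIGA. The outer subshell keeps exit 1 from terminating your interactive shell, whichever shell runs the last pipeline stage.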
When all cmd.sh scripts have finished, five files are created in 11_generate_final_OGS1_geneSet: OGS1.gff3, OGS1.cds.fasta, OGS1.pep.fasta, masked_genome.fa and unmasked_genome.fa.
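A quick sanity check of the final output might look like the sketch below. summarize_ogs1 is a hypothetical name; it assumes genes appear as features of type "gene" in column 3 of OGS1.gff3 (standard GFF3) and that OGS1.pep.fasta has one '>' header per protein.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): confirm the five OGS1 files exist and
# are non-empty, then print the number of genes and proteins.
summarize_ogs1() {
  local dir=$1 f status=0
  for f in OGS1.gff3 OGS1.cds.fasta OGS1.pep.fasta masked_genome.fa unmasked_genome.fa; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] || return 1
  # Count non-comment GFF3 lines whose feature type (column 3) is "gene".
  printf 'genes\t%s\n' "$(awk -F'\t' '!/^#/ && $3 == "gene" {n++} END {print n+0}' "$dir/OGS1.gff3")"
  # Count FASTA header lines in the protein file.
  printf 'proteins\t%s\n' "$(grep -c '^>' "$dir/OGS1.pep.fasta")"
}
```

For example, summarize_ogs1 /my_path_to_OMIGA/11_generate_final_OGS1_geneSet prints two tab-separated lines.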
4. DOWNLOAD
5. CITATION
Jinding Liu, Huamei Xiao, Shuiqing Huang, Fei Li. (2014) OMIGA: Optimized Maker-based Insect Genome Annotation. Molecular Genetics and Genomics. DOI 10.1007/s00438-014-0831-7.