OMIGA

Optimized Maker-based Insect Genome Annotation

1. INTRODUCTION

OMIGA is a genome annotation pipeline that runs on Linux. It consists of the following modules:

- Genome assembly,
- Genome integrity assessment,
- Genomic repetitive sequence identification,
- Collection of evidence from transcriptome data,
- Training of gene prediction software,
- Ab initio prediction,
- Gene evidence integration (evidence from gene prediction, gene expression, and homologous genes), etc.

2. INSTALL

2.1 Pre-installed packages needed for OMIGA.

- Perl (version 5.10 or later)
- MySQL (version 5.1 or later)
- ABySS (version 1.3.5 or later)
- CEGMA (version 2.4.0 or later)
- RepeatModeler (version 1.0.5 or later)
- RepeatMasker (version 3.3.0 or later)
- TopHat (version 2.0.6 or later)
- Cufflinks (version 2.0.2 or later)
- Augustus (version 2.6.1 or later)
- SNAP (version 2006-07-28 or later)
- GeneMark-ES (version 2.3 or later)
- Maker (version 2.26 or later)
- InterProScan (version 4.8 or later)

Each of the packages above may in turn require third-party software. For example, CEGMA requires geneid, NCBI-BLAST, and other packages. Please follow each package's own installation instructions.
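
Before going further, it can help to confirm that the prerequisite programs are reachable from the shell. The following is a minimal sketch, not part of OMIGA: the function name missing_tools is hypothetical, and the executable names in the example call are assumptions that may differ depending on how each package was installed.

```shell
#!/bin/sh
# missing_tools NAME...: print each command that cannot be found on PATH,
# one per line, and return non-zero if at least one is missing.
# (Hypothetical helper, not shipped with OMIGA.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call. The executable names are assumptions; adjust them to match
# your installation:
# missing_tools perl mysql tophat cufflinks augustus maker
```

This only checks that a command exists; it does not verify the version numbers listed in 2.1.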

2.2 Unpack OMIGA.gz

> tar -xzvf OMIGA.gz

The OMIGA directory contains the following sub-directories:

- 01_genome_assembly:
        Assemble the genome of interest.

- 02_CEGMA_assess_gene_space:
        Assess the completeness of the assembled genome.

- 03_mask_repeat_element:
        Identify the repetitive sequences in the genome.

- 04_cufflinks_annotation:
        Collect transcription evidence with Cufflinks.

- 05_train_augustus_snap_genemark:
        Train the gene prediction programs Augustus, SNAP and GeneMark. The training gene set comes from 04_cufflinks_annotation.

- 06_Augustus_workspace:
        Run Augustus for ab initio gene prediction. InterProScan is used to identify conserved protein domains in the genes predicted by Augustus.

- 07_snap_workspace:
        Same as 06_Augustus_workspace, but using SNAP instead of Augustus.

- 08_genemarker_workspace:
        Same as 06_Augustus_workspace, but using GeneMark instead of Augustus.

- 09_maker_workspace:
        Run Maker to integrate the homology, prediction and expression evidence.

- 10_pick_up_ab-inito_genes_with_iprdomain:
        Recover the genes that contain conserved protein domains but were originally discarded by Maker.

- 11_generate_final_OGS1_geneSet:
        Merge the genes from 09 and 10 into the final gene set.

- database:
        The database used to annotate nucleic acid sequences. The OMIGA pipeline uses the nr database.

- homology-protein:
        Contains a protein sequence file named homology.fa. The protein sequences should preferably come from closely related species and from RefSeq.

- OMIGA_PACKAGE:
        A few programs required by OMIGA.

- raw-data:
        Contains two sub-directories, DNA-raw-data and rna-raw-data. DNA-raw-data holds the raw genome sequencing data; rna-raw-data holds the raw transcriptome sequencing data.

2.3 Set environment variables

The MySQL service must be running before you use OMIGA.
You must create a MySQL user named "OMIGA".
Set the shell environment variables: export OMIGA_USER=OMIGA, export OMIGA_PWD="PASSWORD".
Set another shell environment variable for OMIGA_PACKAGE: export OMIGA_PACKAGE=/my_path_to_OMIGA/OMIGA_PACKAGE
Then add $OMIGA_PACKAGE to the system variable PATH: export PATH=$OMIGA_PACKAGE:$PATH
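
The settings above can be collected in one snippet, for example in a file that you source before each session. This is only a sketch: OMIGA_HOME is a hypothetical helper variable introduced here for readability, and both the path and the password are placeholders to be replaced with your own values.

```shell
# OMIGA_HOME is a hypothetical helper variable (not read by OMIGA itself);
# replace the placeholder path with your actual install location.
OMIGA_HOME=/my_path_to_OMIGA

# MySQL credentials used by OMIGA. "PASSWORD" is a placeholder.
export OMIGA_USER=OMIGA
export OMIGA_PWD="PASSWORD"

# Location of the helper programs, prepended to PATH so they are found first.
export OMIGA_PACKAGE="$OMIGA_HOME/OMIGA_PACKAGE"
export PATH="$OMIGA_PACKAGE:$PATH"
```

Note the leading `$` in `$OMIGA_PACKAGE` on the PATH line; without it the literal string OMIGA_PACKAGE would be added instead of the directory.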

3. RUNNING OMIGA

3.1 Before running OMIGA, put the input data into the corresponding directories.

- Put the NR database into /my_path_to_OMIGA/database/nrDB.
- Put the protein sequences used as protein homology evidence into /my_path_to_OMIGA/homology-protein. They should be saved in a file named homology.fa.
- Put the raw DNA sequencing data and the RNA-Seq data into /my_path_to_OMIGA/raw-data. They are used to assemble the genome sequence and to obtain gene transcription evidence.
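
The layout described above can be checked with a few file tests before the pipeline is started. The sketch below is not part of OMIGA; the function name check_inputs is hypothetical, and it assumes only what this document states: the nrDB location, the homology.fa file, and the two raw-data sub-directories.

```shell
# check_inputs ROOT: report expected input locations under ROOT that are
# missing, and return non-zero if any is. (Hypothetical helper.)
check_inputs() {
    root=$1
    rc=0
    # NR database location
    [ -e "$root/database/nrDB" ] || { echo "missing: $root/database/nrDB"; rc=1; }
    # Homology protein file must exist and be non-empty
    [ -s "$root/homology-protein/homology.fa" ] || {
        echo "missing or empty: $root/homology-protein/homology.fa"; rc=1; }
    # Raw sequencing data sub-directories
    for sub in DNA-raw-data rna-raw-data; do
        [ -d "$root/raw-data/$sub" ] || { echo "missing: $root/raw-data/$sub"; rc=1; }
    done
    return "$rc"
}

# Example call:
# check_inputs /my_path_to_OMIGA
```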

3.2 Some parameters in the shell scripts should be modified to match your DNA sequencing data and RNA-Seq data, for example in:

- /my_path_to_OMIGA/01_genome_assembly/01_assembly_genome_by_raw_data/cmd.sh
- /my_path_to_OMIGA/04_cufflinks_annotation/01_cufflinks_work/cmd.sh

3.3 Enter each directory in the order of its directory ID.

Run:
> ./cmd.sh

When all cmd.sh scripts have finished, five files will have been created in the directory 11_generate_final_OGS1_geneSet: OGS1.gff3, OGS1.cds.fasta, OGS1.pep.fasta, masked_genome.fa and unmasked_genome.fa.
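
Running the steps by hand can also be scripted. The sketch below is an assumption about how one might automate it, not something OMIGA provides: the function names run_steps and check_outputs are hypothetical, and it assumes that every cmd.sh below a numbered directory should be run once, in sorted path order, from its own directory. Check that this matches your copy of the pipeline before relying on it.

```shell
# run_steps ROOT: run each cmd.sh found under the numbered step directories
# (01_*, 02_*, ...) in sorted path order, stopping at the first failure.
# (Hypothetical helper; the ordering rule is an assumption.)
run_steps() {
    root=$1
    for step in "$root"/[0-9][0-9]_*/; do
        step=${step%/}
        [ -d "$step" ] || continue
        find "$step" -name cmd.sh | sort | while IFS= read -r script; do
            echo "running $script"
            # Run from the script's own directory; detach stdin so the
            # script cannot consume the list of remaining scripts.
            ( cd "$(dirname "$script")" && ./cmd.sh </dev/null ) || exit 1
        done || return 1
    done
}

# check_outputs ROOT: verify that the five final files exist and are non-empty.
check_outputs() {
    out=$1/11_generate_final_OGS1_geneSet
    rc=0
    for f in OGS1.gff3 OGS1.cds.fasta OGS1.pep.fasta masked_genome.fa unmasked_genome.fa; do
        [ -s "$out/$f" ] || { echo "missing or empty: $out/$f"; rc=1; }
    done
    return "$rc"
}

# Example call:
# run_steps /my_path_to_OMIGA && check_outputs /my_path_to_OMIGA
```

Stopping at the first failure matters here because later steps consume the output of earlier ones, as described in 2.2.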

4. DOWNLOAD

OMIGA-1.0

5. CITATION

Jinding Liu, Huamei Xiao, Shuiqing Huang, Fei Li. (2014) OMIGA: Optimized Maker-based Insect Genome Annotation. Molecular Genetics and Genomics. DOI 10.1007/s00438-014-0831-7.

Please contact me with any questions.
