OMIGA

Optimized Maker-based Insect Genome Annotation

1. INTRODUCTION


OMIGA is a genome annotation pipeline that runs on Linux. It consists of the following modules:

- Genome assembly
- Genome integrity assessment
- Genomic repeat sequence identification
- Collection of transcription evidence from transcriptome data
- Training of gene prediction software
- Ab initio gene prediction
- Gene evidence integration (gene prediction, gene expression, and homologous gene evidence), etc.

2. INSTALL


2.1 Packages that must be installed before OMIGA

- Perl (version 5.10 or later)
- MySQL (version 5.1 or later)
- ABySS (version 1.3.5 or later)
- CEGMA (version 2.4.0 or later)
- RepeatModeler (version 1.0.5 or later)
- RepeatMasker (version 3.3.0 or later)
- TopHat (version 2.0.6 or later)
- Cufflinks (version 2.0.2 or later)
- Augustus (version 2.6.1 or later)
- SNAP (version 2006-07-28 or later)
- GeneMark-ES (version 2.3 or later)
- Maker (version 2.26 or later)
- InterProScan (version 4.8 or later)

Each package above may in turn require third-party software. For example, CEGMA requires geneid, NCBI-BLAST and other packages. Please follow each package's own installation instructions.
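
Before continuing, it can help to confirm that the required programs are reachable from the shell. The following is a minimal sketch, not part of OMIGA: check_tools is a hypothetical helper, and the executable names you pass to it depend on how each package was installed.

```shell
# check_tools: hypothetical helper (not shipped with OMIGA).
# Prints one "missing: NAME" line for every command that is not on PATH
# and returns non-zero if at least one command is missing.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}
```

A call might look like `check_tools perl mysql tophat cufflinks`. These executable names are assumptions; substitute the names used by your own installation.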

2.2 Unpack OMIGA.gz

> tar -xzvf OMIGA.gz

The OMIGA directory contains the following sub-directories:

- 01_genome_assembly:
        Assembles the genome you are interested in.

- 02_CEGMA_assess_gene_space:
        Assesses the integrity of the assembled genome.

- 03_mask_repeat_element:
        Identifies the repeat sequences of the genome.

- 04_cufflinks_annotation:
        Collects transcription evidence with Cufflinks.

- 05_train_augustus_snap_genemark:
        Trains the gene prediction software (Augustus, SNAP and GeneMark-ES). The training gene set comes from 04_cufflinks_annotation.

- 06_Augustus_workspace:
        Uses Augustus for ab initio gene prediction. InterProScan is used to identify conserved protein domains in the genes predicted by Augustus.

- 07_snap_workspace:
        Same as 06_Augustus_workspace, but uses SNAP instead of Augustus.

- 08_genemarker_workspace:
        Same as 06_Augustus_workspace, but uses GeneMark-ES instead of Augustus.

- 09_maker_workspace:
        Uses Maker to integrate the homology, prediction and expression evidence.

- 10_pick_up_ab-inito_genes_with_iprdomain:
        Keeps the genes that contain conserved protein domains but were originally discarded by Maker.

- 11_generate_final_OGS1_geneSet:
        Gathers the genes from 09 and 10 into the final gene set.

- database:
        Holds the database used to annotate nucleic acid sequences. The OMIGA pipeline uses the nr database.

- homology-protein:
        Holds a protein sequence file named homology.fa. The protein sequences should preferably come from closely related species and from RefSeq.

- OMIGA_PACKAGE:
        A few programs required by OMIGA.

- raw-data:
        Contains two sub-directories, DNA-raw-data and rna-raw-data. DNA-raw-data holds the raw genome sequencing data; rna-raw-data holds the raw transcriptome sequencing data.

2.3 Set environment variables

  • The MySQL service must be running before you use OMIGA.
    You must create a MySQL user named "OMIGA".
  • Set the shell environment variables: export OMIGA_USER=OMIGA, export OMIGA_PWD="PASSWORD".
  • Set another shell environment variable for OMIGA_PACKAGE: export OMIGA_PACKAGE=/my_path_to_OMIGA/OMIGA_PACKAGE
  • Add $OMIGA_PACKAGE to the system variable PATH: export PATH=$OMIGA_PACKAGE:$PATH.
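
The settings above can be collected into one snippet, for example in a shell startup file. This is a sketch only: the password and /my_path_to_OMIGA are placeholders to replace with your own values, and the directory check at the end is our addition rather than something OMIGA requires.

```shell
# Placeholder values: replace the password and the install path with your own.
export OMIGA_USER=OMIGA
export OMIGA_PWD="PASSWORD"
export OMIGA_PACKAGE=/my_path_to_OMIGA/OMIGA_PACKAGE

# Note the "$" before OMIGA_PACKAGE: without it the literal word would be
# prepended to PATH instead of the directory.
export PATH="$OMIGA_PACKAGE:$PATH"

# Optional sanity check (our addition): warn if the directory does not exist.
if [ ! -d "$OMIGA_PACKAGE" ]; then
    echo "warning: $OMIGA_PACKAGE is not a directory" >&2
fi
```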

3. RUNNING OMIGA


3.1 Before running OMIGA, put the input data into the corresponding directories.

- Put the NR database into /my_path_to_OMIGA/database/nrDB.
- Put the protein sequences used to obtain protein homology evidence into /my_path_to_OMIGA/homology-protein, saved in a file named homology.fa.
- Put the raw DNA sequencing data and RNA-Seq data into /my_path_to_OMIGA/raw-data. They are used to assemble the genome sequence and to obtain gene transcription evidence.
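
The layout described above can be checked before any step is run. The sketch below assumes the paths named in this document (database/nrDB, homology-protein/homology.fa, and the two raw-data sub-directories); check_inputs is a hypothetical helper, not an OMIGA program, and it only tests that the paths exist, not that their contents are valid.

```shell
# check_inputs: hypothetical helper (not shipped with OMIGA).
# Usage: check_inputs /my_path_to_OMIGA
# Prints one "not found: PATH" line per missing input and returns
# non-zero if anything is missing.
check_inputs() {
    ci_root=$1
    ci_status=0
    for ci_path in \
        "$ci_root/database/nrDB" \
        "$ci_root/homology-protein/homology.fa" \
        "$ci_root/raw-data/DNA-raw-data" \
        "$ci_root/raw-data/rna-raw-data"
    do
        if [ ! -e "$ci_path" ]; then
            echo "not found: $ci_path"
            ci_status=1
        fi
    done
    return "$ci_status"
}
```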

3.2 Some parameters in the shell scripts should be modified to suit your DNA sequencing data and RNA-Seq data, for example in:

- /my_path_to_OMIGA/01_genome_assembly/01_assembly_genome_by_raw_data/cmd.sh
- /my_path_to_OMIGA/04_cufflinks_annotation/01_cufflinks_work/cmd.sh
3.3 Enter each directory in the order of its numeric prefix.

In each one, run:
> ./cmd.sh

When every cmd.sh has finished, five files will have been created in the directory 11_generate_final_OGS1_geneSet: OGS1.gff3, OGS1.cds.fasta, OGS1.pep.fasta, masked_genome.fa, and unmasked_genome.fa.
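
Running the steps by hand can be replaced by a small driver. This is a sketch under assumptions, not an OMIGA script: run_steps is a hypothetical helper, and it assumes that every cmd.sh is executable and that the lexicographic order of the numbered directories (and of any numbered sub-directories) is the intended run order. Check that assumption against your copy before relying on it.

```shell
# run_steps: hypothetical driver (not shipped with OMIGA).
# Usage: run_steps /my_path_to_OMIGA
# Runs every cmd.sh found under the two-digit step directories, in
# lexicographic path order, and stops at the first script that fails.
# The body runs in a subshell, so "exit" only ends this function.
run_steps() (
    root=$1
    find "$root"/[0-9][0-9]_* -type f -name cmd.sh | LC_ALL=C sort |
    while IFS= read -r script; do
        echo "running: $script"
        # Run from the script's own directory, as "./cmd.sh" would.
        # stdin is redirected so the script cannot consume the file list.
        if ! (cd "$(dirname "$script")" && ./cmd.sh </dev/null); then
            echo "failed: $script" >&2
            exit 1
        fi
    done
)
```

Stopping at the first failure matters here because later steps read the output of earlier ones, for example the training step uses the gene set from 04_cufflinks_annotation.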

4. DOWNLOAD


5. CITATION


Jinding Liu, Huamei Xiao, Shuiqing Huang, Fei Li. (2014) OMIGA: Optimized Maker-based Insect Genome Annotation. Molecular Genetics and Genomics. DOI 10.1007/s00438-014-0831-7.

Please contact me with any questions.