BUSCO Genome Quality Assessment
BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool for evaluating genome assembly completeness by searching for conserved orthologs across different lineages. This guide outlines the steps for using BUSCO to evaluate insect genome quality.
This is the methodology adopted by the InsectBase database team for assessing insect genome quality.
Workflow Steps
Install BUSCO
Ensure that BUSCO is installed on your system.
Installation with Conda
conda install -c bioconda busco
Download the Appropriate Insect Lineage Dataset
For insect genomes, download the lineage dataset that matches your species. Among the pre-built datasets that BUSCO provides, insecta_odb10 covers insects.
Download Insect Lineage Dataset
busco --download insecta_odb10
This downloads the ortholog dataset for insect species.
Run BUSCO to Assess Genome Quality
Run BUSCO with the downloaded lineage dataset to evaluate your insect genome. The command requires your genome file in FASTA format.
Run BUSCO Command
busco -i your_insect_genome.fasta -l insecta_odb10 -o busco_results -m genome
- -i specifies the input genome file (in FASTA format)
- -l specifies the lineage dataset (here, insecta_odb10)
- -o specifies the output directory for results
- -m genome indicates you're analyzing a genome assembly (as opposed to a transcriptome)
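When many assemblies need to be assessed, it can help to build the command programmatically. The following is a minimal Python sketch, not part of BUSCO itself: the helper names `build_busco_command` and `run_busco` and their parameters are hypothetical. It uses only the flags described above plus `-c`, BUSCO's option for the number of CPUs.

```python
import subprocess
from typing import List, Optional

# Analysis modes accepted by BUSCO's -m option.
VALID_MODES = {"genome", "transcriptome", "proteins"}


def build_busco_command(genome: str,
                        lineage: str = "insecta_odb10",
                        out_name: str = "busco_results",
                        mode: str = "genome",
                        cpus: Optional[int] = None) -> List[str]:
    """Return the BUSCO argument list for one assembly (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported BUSCO mode: {mode}")
    cmd = ["busco", "-i", genome, "-l", lineage, "-o", out_name, "-m", mode]
    if cpus is not None:
        cmd += ["-c", str(cpus)]
    return cmd


def run_busco(genome: str, **kwargs) -> None:
    """Run BUSCO; raises CalledProcessError if BUSCO exits with an error."""
    subprocess.run(build_busco_command(genome, **kwargs), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.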
Interpret BUSCO Results
Once BUSCO completes, it will generate a report with the following categories:
- Complete (single-copy): The number of BUSCOs found complete in exactly one copy
- Complete (duplicated): The number of BUSCOs found complete in more than one copy
- Fragmented: BUSCOs that are only partially recovered
- Missing: BUSCOs that are not found in the genome at all
 
For high-quality genomes, you should expect a high percentage of complete BUSCOs.
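BUSCO's short summary file also condenses these categories into a one-line notation such as `C:..%[S:..%,D:..%],F:..%,M:..%,n:...`. The sketch below shows one way to read that line in Python. The function name `parse_busco_summary` is hypothetical, and the regular expression assumes the notation shown here; the exact layout may differ between BUSCO versions, so check it against your own output.

```python
import re
from typing import Dict

# Assumed one-line notation: C = complete, S = single-copy, D = duplicated,
# F = fragmented, M = missing, n = total number of BUSCOs searched.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text: str) -> Dict[str, float]:
    """Extract the category percentages and total count from summary text."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result: Dict[str, float] = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


# Illustrative input, not taken from a real run.
line = "C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:1000"
print(parse_busco_summary(line))
# → {'C': 85.0, 'S': 80.0, 'D': 5.0, 'F': 10.0, 'M': 5.0, 'n': 1000}
```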
Example Output
# BUSCO analysis summary
Total BUSCOs: 1,000
Complete: 850 (85%)
Complete (single-copy): 800 (80%)
Complete (duplicated): 50 (5%)
Fragmented: 100 (10%)
Missing: 50 (5%)
- 85% complete (80% single-copy plus 5% duplicated) indicates that most of the expected orthologs were recovered in full
- 10% fragmented means some orthologs were only partially recovered
- 5% missing means some BUSCOs were not found, which could indicate gaps in the assembly
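The percentages in the example follow directly from the counts: complete is the sum of single-copy and duplicated, and the four base categories add up to the total. A small Python sketch of that arithmetic (the function name `busco_percentages` is hypothetical):

```python
from typing import Dict


def busco_percentages(single: int, duplicated: int,
                      fragmented: int, missing: int) -> Dict[str, float]:
    """Convert BUSCO category counts into percentages of the total."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("total number of BUSCOs is zero")
    counts = {
        "complete": single + duplicated,  # complete = single-copy + duplicated
        "single_copy": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    return {name: 100 * count / total for name, count in counts.items()}


# Counts from the example output above.
print(busco_percentages(single=800, duplicated=50, fragmented=100, missing=50))
# → {'complete': 85.0, 'single_copy': 80.0, 'duplicated': 5.0,
#    'fragmented': 10.0, 'missing': 5.0}
```

Checking that the counts sum to the reported total is a quick way to catch a mis-copied summary before comparing assemblies.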
 
Alternative with compleasm
compleasm is a miniprot-based reimplementation of BUSCO, described by its authors as faster and more accurate, that can be used as an alternative.
Install compleasm
pip install compleasm
Run compleasm
compleasm.py run -t16 -l insecta -L /data/ -a genome.fa -o busco
Note: compleasm downloads lineage files organized differently than BUSCO.
References
- BUSCO Official Website
 - BUSCO GitHub Repository
 - compleasm GitHub Repository