BLAST Module

Fast protein sequence similarity search using Diamond BLAST

Module Overview

The BLAST module runs fast protein sequence similarity searches with Diamond BLAST. Use it to compare sequences, detect homologs, and support functional annotation.

Key Features

  1. Select Diamond BLAST program type and database category for optimal sequence comparison
  2. Input query sequences through paste or file upload methods with specific sequence requirements
  3. Configure advanced parameters for more precise and detailed sequence alignment analysis

Program and Database Selection

Diamond Program Types

Choose the appropriate Diamond BLAST program based on your query sequence type and analysis goals.

  • Diamond BLASTP: Protein query vs protein database
  • Diamond BLASTX: Nucleotide query translated vs protein database

Database Categories

Select the appropriate database type for your sequence comparison analysis.

  • Nucleotide Database: DNA/RNA sequences for nucleotide comparisons
  • Protein Database: Amino acid sequences for protein comparisons

Sequence Input Methods

Input Options

Choose between pasting sequences directly or uploading sequence files for analysis.

Paste Sequence

  • Direct text input in FASTA format
  • Supports both single and multiple sequences
  • Real-time sequence validation
  • Immediate format checking

Upload File

  • Support for .fasta, .fa, .txt files
  • Drag and drop interface
  • Automatic file parsing
  • Batch sequence processing
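
To make the FASTA input concrete, here is a minimal Python sketch of splitting pasted or uploaded text into records. It is not the module's own parser; the function name `parse_fasta` and the placeholder header `query` for a bare sequence are assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                # Assumption: a bare sequence without a header gets a placeholder name
                header = "query"
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Sequence lines that belong to one record are joined, so a multi-line entry becomes a single string that can be checked for length and allowed characters.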

Sequence Requirements

Allowed Characters:

  • Amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
  • Nucleotides: A, T, C, G, U, N
  • Ambiguity codes: B, Z, X
  • Gaps: Hyphens (-) for sequence gaps

Length Requirements:

  • Minimum: 3 characters
  • Maximum: 100,000 characters
  • Format: FASTA format recommended

Invalid Characters:

  • Asterisks (*), dots (.), numbers, or special symbols
  • Spaces within sequence data
  • Non-standard amino acid codes
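
The requirements above can be checked before submission. The following Python sketch encodes the allowed characters and length limits as listed; the function name `validate_sequence` is hypothetical, and accepting lowercase letters is an assumption, since the page does not say whether input is case-sensitive.

```python
# Character sets taken from the "Allowed Characters" list above
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | set("ATCGUN") | set("BZX") | {"-"}
MIN_LEN, MAX_LEN = 3, 100_000


def validate_sequence(seq):
    """Return a list of problems; an empty list means the sequence passes."""
    problems = []
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        problems.append(f"length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
    # Assumption: lowercase input is treated the same as uppercase
    bad = sorted(set(seq.upper()) - ALLOWED)
    if bad:
        problems.append("invalid characters: " + " ".join(repr(c) for c in bad))
    return problems
```

For example, `validate_sequence("MKT*A Y")` reports the asterisk and the space, both of which appear in the invalid character list.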

Advanced Parameters

Parameter Configuration

Fine-tune your BLAST search with advanced parameters for more precise and detailed analysis.

E-value Threshold

The E-value is the number of hits with an equal or better score that would be expected by chance in a database of the searched size. Lower thresholds make the search more stringent.

Common Values:

  • 0.001: Very strict (high confidence)
  • 0.01: Strict (default for most searches)
  • 0.1: Moderate
  • 1.0: Relaxed
  • 10.0: Very relaxed

Other Parameters:

  • Matrix: Optimized for Diamond
  • Gap penalties: Default values
  • Max hits: Configurable
  • Word size: Auto-optimized

Search Tips for Optimal Results

Sequence Preparation

  • Clean sequences by removing non-standard characters
  • Use FASTA format for best compatibility
  • Check sequence length before submission
  • Verify sequence type matches program selection
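
One way to check that the sequence type matches the program is a simple letter-composition heuristic, sketched below in Python. This is an illustration only: the function name `suggest_program` and the 0.9 threshold are assumptions, and short or low-complexity protein sequences made mostly of A, C, G, T, or N can be misclassified.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def suggest_program(seq, threshold=0.9):
    """Suggest a program name from the fraction of nucleotide letters."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    frac = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    # Mostly nucleotide letters: translated search; otherwise protein search
    return "Diamond BLASTX" if frac >= threshold else "Diamond BLASTP"
```

A sequence that is almost entirely A, C, G, T, U, or N is treated as nucleotide and routed to Diamond BLASTX; anything else is treated as protein.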

Parameter Optimization

  • Start with the default E-value (0.01) for most searches
  • Use stricter E-values for high-confidence matches
  • Adjust parameters based on sequence length
  • Consider database size when setting thresholds
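
Database size matters because, in standard BLAST statistics, the E-value of a hit scales with the size of the search space. The Python sketch below uses the textbook relation E = m × n × 2^(−S'), where m is the query length, n is the total database length, and S' is the bit score. It is an approximation for intuition; Diamond's own calculation may apply length corrections that are not modeled here, and the function name `expected_hits` is hypothetical.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from the relation E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment (bit score 50, 300-residue query) against two database sizes
small = expected_hits(50, 300, 1_000_000)
large = expected_hits(50, 300, 100_000_000)
print(large / small)  # the ratio equals the ratio of database sizes
```

Under this approximation, the same alignment receives a 100-fold larger E-value in a database that is 100 times larger, so a threshold that works for a small database may need to be tightened for a large one.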

Usage Examples

Example 1: Protein Homology Search

  1. Select the “Diamond BLASTP” program for protein-protein comparison
  2. Choose “Protein Database” as the target database
  3. Paste your protein sequence in FASTA format
  4. Set the E-value to 0.001 for high-confidence matches
  5. Click “Run Diamond Search” to start the analysis
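
For readers who also run Diamond locally, the steps in Example 1 correspond roughly to a command-line search. The Python sketch below only assembles such a command as an argument list; it does not describe how this module invokes Diamond internally. The function name `build_diamond_command` and the file names are hypothetical, and the default of 25 for the maximum number of hits is an assumption.

```python
def build_diamond_command(program, db, query, out, evalue=0.01, max_hits=25):
    """Build an argument list for a local DIAMOND search (tabular output)."""
    if program not in ("blastp", "blastx"):
        raise ValueError("program must be 'blastp' or 'blastx'")
    return [
        "diamond", program,
        "--db", db,            # DIAMOND-formatted protein database
        "--query", query,      # FASTA file with the query sequences
        "--out", out,          # output file
        "--evalue", str(evalue),
        "--max-target-seqs", str(max_hits),
        "--outfmt", "6",       # BLAST tabular format
    ]


# Hypothetical file names, mirroring Example 1 (protein query, E-value 0.001)
cmd = build_diamond_command("blastp", "proteins.dmnd", "query.fasta",
                            "hits.tsv", evalue=0.001)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `diamond` executable is installed.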

Example 2: Nucleotide Translation Search

  1. Select “Diamond BLASTX” for nucleotide-to-protein search
  2. Choose “Protein Database” as the target database
  3. Upload a FASTA file containing nucleotide sequences
  4. Use the default E-value (0.01) for balanced results
  5. Run the search, then download the results in the format you need for analysis

Available Data Types

  • BLAST Results: Hit alignments and scores
  • TSV Format: Raw Diamond output
  • JSON Format: Structured results
  • FASTA Summary: Query and hit sequences
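
A downloaded TSV file can be post-processed with a few lines of Python. The sketch below assumes the raw Diamond output uses the 12 default columns of BLAST tabular format 6; the page does not state the column layout, so verify it against an actual download first. The function name `read_hits` is hypothetical.

```python
import csv
import io

# Assumed column order: the default fields of BLAST tabular format 6
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(tsv_text, max_evalue=0.01):
    """Parse tabular hits and keep those at or below an E-value cutoff."""
    hits = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits
```

Each returned hit is a dictionary keyed by column name, so results can be sorted by `bitscore` or filtered further without re-running the search.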