bio-mcp-seqkit

MCP (Model Context Protocol) server for SeqKit, a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.

Overview

This MCP server exposes eight SeqKit operations as tools, so an AI assistant can compute statistics, extract subsequences, search, transform, sort, deduplicate, sample, and convert sequence files.

Features

  • seqkit_stats: Get basic statistics of FASTA/FASTQ files.
  • seqkit_subseq: Extract subsequences by region or BED file.
  • seqkit_grep: Search sequences by pattern or ID.
  • seqkit_seq: Transform sequences (reverse, complement, translate, etc.) and filter by length.
  • seqkit_sort: Sort sequences by different criteria (ID, name, sequence, length).
  • seqkit_rmdup: Remove duplicate sequences.
  • seqkit_sample: Sample sequences randomly by number or proportion.
  • seqkit_convert: Convert between FASTA and FASTQ formats.

Installation

Prerequisites

  • Python 3.9+
  • SeqKit installed, with the seqkit executable on your PATH (or its location set in BIO_MCP_SEQKIT_PATH)

Install SeqKit

# Download from GitHub releases (example for Linux AMD64)
wget https://github.com/shenwei356/seqkit/releases/latest/download/seqkit_linux_amd64.tar.gz
tar -xzf seqkit_linux_amd64.tar.gz
sudo mv seqkit /usr/local/bin/

# Alternatively, install with conda from the Bioconda channel
conda install -c bioconda seqkit

Install the MCP server

git clone https://github.com/bio-mcp/bio-mcp-seqkit
cd bio-mcp-seqkit
pip install -e .

Configuration

Add the server to your MCP client configuration. For Claude Desktop on macOS, the file is ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "bio-seqkit": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/bio-mcp-seqkit"
    }
  }
}
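If the client configuration file already lists other servers, the entry above has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge; the helper names add_seqkit_server and update_config_file are hypothetical and not part of this project.

```python
import json
from pathlib import Path


def add_seqkit_server(config: dict, repo_path: str) -> dict:
    """Return a copy of an MCP client config with the bio-seqkit entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["bio-seqkit"] = {
        "command": "python",
        "args": ["-m", "src.server"],
        "cwd": repo_path,
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, repo_path: str) -> None:
    """Read the config file if it exists, add the entry, and write it back."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    updated = add_seqkit_server(existing, repo_path)
    config_path.write_text(json.dumps(updated, indent=2))
```

Calling update_config_file with the path of your client configuration and the path of your clone should leave other server entries untouched.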

Environment Variables

  • BIO_MCP_MAX_FILE_SIZE: Maximum input file size in bytes (default: 10GB)
  • BIO_MCP_TIMEOUT: Command timeout in seconds (default: 600)
  • BIO_MCP_SEQKIT_PATH: Path to the SeqKit executable (default: the seqkit found on PATH)
  • BIO_MCP_TEMP_DIR: Temporary directory for processing
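
To make the defaults concrete, here is a sketch of how a server could read these four variables. It is an illustration, not the project's actual settings code: the Settings class and load_settings function are hypothetical, 10GB is assumed to mean 10 * 1024**3 bytes, and the system temporary directory is an assumed fallback for BIO_MCP_TEMP_DIR (the list above states no default for it).

```python
import os
import shutil
import tempfile
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_file_size: int          # bytes
    timeout: int                # seconds
    seqkit_path: Optional[str]  # None if seqkit cannot be located
    temp_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the BIO_MCP_* variables, falling back to the documented defaults."""
    return Settings(
        # Assumption: "10GB" is interpreted as 10 GiB.
        max_file_size=int(env.get("BIO_MCP_MAX_FILE_SIZE", str(10 * 1024**3))),
        timeout=int(env.get("BIO_MCP_TIMEOUT", "600")),
        seqkit_path=env.get("BIO_MCP_SEQKIT_PATH") or shutil.which("seqkit"),
        # Assumption: fall back to the system temporary directory.
        temp_dir=env.get("BIO_MCP_TEMP_DIR") or tempfile.gettempdir(),
    )
```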

Usage

Once configured, the AI assistant can use the following tools:

seqkit_stats - Get Sequence Statistics

Get basic statistics of FASTA/FASTQ files.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • all_stats: Show all statistics including N50 (boolean).
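
For orientation, this is roughly how the two parameters could map onto the SeqKit command line. The sketch assumes the tool wraps seqkit stats with its --all and --tabular flags; the function names are hypothetical and the server's real implementation may differ.

```python
def stats_command(input_file: str, all_stats: bool = False) -> list[str]:
    """Build a `seqkit stats` invocation with tab-separated output."""
    cmd = ["seqkit", "stats", "--tabular"]
    if all_stats:
        cmd.append("--all")  # adds extended statistics such as N50
    cmd.append(input_file)
    return cmd


def parse_stats(tsv: str) -> list[dict]:
    """Turn tabular output (header row plus one row per file) into dicts."""
    lines = tsv.strip().splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, row.split("\t"))) for row in lines[1:]]
```

The argument list can be passed to subprocess.run, and the captured standard output to parse_stats.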

seqkit_subseq - Extract Subsequences

Extract subsequences by region or from a BED file.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • region: Region to extract (e.g., 1:100-200 or chr1:1000-2000).
  • bed_file: BED file with regions to extract.
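
On the SeqKit command line, subseq --region takes start:end coordinates and applies them to every record, so a region such as chr1:1000-2000 has to be split into a sequence name and a coordinate range first. The sketch below shows one way to do that, assuming the part before the colon is a sequence ID and the coordinates are 1-based and inclusive. How the server actually interprets the region string is not documented here, and the function names are hypothetical.

```python
import re

REGION = re.compile(r"^(?P<name>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'name:start-end' into its three parts and validate the range."""
    match = REGION.match(region)
    if match is None:
        raise ValueError(f"expected name:start-end, got {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return match["name"], start, end


def subseq_commands(input_file: str, region: str) -> list[list[str]]:
    """Two-stage pipeline: select the record by ID, then cut the range."""
    name, start, end = parse_region(region)
    return [
        ["seqkit", "grep", "--pattern", name, input_file],
        # Reads the selected record from standard input.
        ["seqkit", "subseq", "--region", f"{start}:{end}"],
    ]
```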

seqkit_grep - Search Sequences

Search sequences by pattern or ID.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • pattern: Search pattern (regex supported).
  • pattern_file: File with list of patterns/IDs.
  • search_sequence: Search in sequence instead of header (boolean).
  • invert_match: Invert match (exclude matching sequences) (boolean).
  • ignore_case: Ignore case (boolean).
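
The parameters above line up closely with flags of seqkit grep. The following sketch shows one plausible mapping; it assumes regular-expression matching is always enabled for pattern and that at least one of pattern or pattern_file must be given. The function name is hypothetical.

```python
from typing import Optional


def grep_command(
    input_file: str,
    pattern: Optional[str] = None,
    pattern_file: Optional[str] = None,
    search_sequence: bool = False,
    invert_match: bool = False,
    ignore_case: bool = False,
) -> list[str]:
    """Build a `seqkit grep` invocation from the tool parameters."""
    if pattern is None and pattern_file is None:
        raise ValueError("provide pattern or pattern_file")
    cmd = ["seqkit", "grep"]
    if pattern is not None:
        cmd += ["--use-regexp", "--pattern", pattern]
    if pattern_file is not None:
        cmd += ["--pattern-file", pattern_file]
    if search_sequence:
        cmd.append("--by-seq")        # match the sequence, not the header
    if invert_match:
        cmd.append("--invert-match")  # keep records that do NOT match
    if ignore_case:
        cmd.append("--ignore-case")
    cmd.append(input_file)
    return cmd
```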

seqkit_seq - Transform Sequences

Transform sequences (reverse, complement, translate, etc.) and filter by length.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • reverse: Reverse sequence (boolean).
  • complement: Complement sequence (boolean).
  • reverse_complement: Reverse complement sequence (boolean).
  • rna2dna: Convert RNA to DNA (boolean).
  • dna2rna: Convert DNA to RNA (boolean).
  • translate: Translate to protein (boolean).
  • min_length: Minimum sequence length filter (integer).
  • max_length: Maximum sequence length filter (integer).

seqkit_sort - Sort Sequences

Sort sequences by different criteria.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • sort_by: Sort criterion (id, name, seq, or length). Default: id.
  • reverse: Reverse sort order (boolean).
  • by_length: Sort by sequence length (boolean).
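
Because sort_by and by_length overlap, a wrapper has to decide which one wins. The sketch below assumes by_length takes precedence and maps each criterion to the matching seqkit sort flag (sorting by ID is the CLI default and needs no flag). This precedence is an assumption made for illustration, and the function name is hypothetical.

```python
SORT_FLAGS = {
    "id": [],                  # seqkit sort orders by ID by default
    "name": ["--by-name"],     # full header line instead of the ID
    "seq": ["--by-seq"],
    "length": ["--by-length"],
}


def sort_command(
    input_file: str,
    sort_by: str = "id",
    reverse: bool = False,
    by_length: bool = False,
) -> list[str]:
    """Build a `seqkit sort` invocation from the tool parameters."""
    if by_length:
        sort_by = "length"  # assumption: the boolean overrides sort_by
    if sort_by not in SORT_FLAGS:
        raise ValueError(f"unknown sort criterion: {sort_by!r}")
    cmd = ["seqkit", "sort", *SORT_FLAGS[sort_by]]
    if reverse:
        cmd.append("--reverse")
    cmd.append(input_file)
    return cmd
```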

seqkit_rmdup - Remove Duplicate Sequences

Remove duplicate sequences.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • by_name: Remove duplicates by sequence name (boolean).
  • by_seq: Remove duplicates by sequence (boolean). Default: True.
  • ignore_case: Ignore case when comparing (boolean).
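
To illustrate what these options mean, here is a small pure-Python model of duplicate removal. It is not the server's code and does not call SeqKit. It assumes records are (name, sequence) pairs and that the first occurrence of each duplicate is the one kept.

```python
from typing import Iterable

Record = tuple[str, str]  # (name, sequence)


def remove_duplicates(
    records: Iterable[Record],
    by_name: bool = False,
    ignore_case: bool = False,
) -> list[Record]:
    """Keep the first record for each distinct name or sequence."""
    seen: set[str] = set()
    kept: list[Record] = []
    for name, seq in records:
        key = name if by_name else seq  # default: compare sequences
        if ignore_case:
            key = key.lower()
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept
```

With ignore_case set, ACGT and acgt count as the same sequence, so only the first of the two records survives.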

seqkit_sample - Sample Sequences

Sample sequences randomly by number or proportion.

Parameters:

  • input_file (required): Path to FASTA/FASTQ file.
  • number: Number of sequences to sample (integer).
  • proportion: Proportion of sequences to sample (0-1) (float).
  • seed: Random seed for reproducible sampling (integer).
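
Since number and proportion are two ways of asking for the same thing, a wrapper would typically validate them before calling SeqKit. The sketch below assumes exactly one of the two must be set and maps the parameters to the --number, --proportion, and --rand-seed flags of seqkit sample. The function name and the validation rules are assumptions.

```python
from typing import Optional


def sample_command(
    input_file: str,
    number: Optional[int] = None,
    proportion: Optional[float] = None,
    seed: Optional[int] = None,
) -> list[str]:
    """Build a `seqkit sample` invocation from the tool parameters."""
    if (number is None) == (proportion is None):
        raise ValueError("set exactly one of number or proportion")
    cmd = ["seqkit", "sample"]
    if number is not None:
        if number < 1:
            raise ValueError("number must be at least 1")
        cmd += ["--number", str(number)]
    else:
        if not 0 < proportion <= 1:
            raise ValueError("proportion must be in (0, 1]")
        cmd += ["--proportion", str(proportion)]
    if seed is not None:
        cmd += ["--rand-seed", str(seed)]  # fixed seed gives repeatable samples
    cmd.append(input_file)
    return cmd
```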

seqkit_convert - Convert Formats

Convert between FASTA and FASTQ formats.

Parameters:

  • input_file (required): Path to input file.
  • output_format (required): Output format (fasta or fastq).
  • line_width: Line width for FASTA output (0 for no wrapping) (integer). Default: 0.
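
The effect of output_format and line_width can be seen in a small pure-Python model of the FASTQ to FASTA direction (the SeqKit CLI offers this as seqkit fq2fa). This sketch is not the server's implementation. It assumes four-line FASTQ records and simply drops the quality lines.

```python
def fastq_to_fasta(fastq_text: str, line_width: int = 0) -> str:
    """Convert four-line FASTQ records to FASTA; 0 means no line wrapping."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out: list[str] = []
    for i in range(0, len(lines), 4):
        header, seq, plus = lines[i], lines[i + 1], lines[i + 2]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        out.append(">" + header[1:])
        if line_width > 0:
            # Wrap the sequence into chunks of line_width characters.
            out.extend(seq[j:j + line_width] for j in range(0, len(seq), line_width))
        else:
            out.append(seq)
    return "\n".join(out) + "\n"
```

The reverse direction needs quality scores that a FASTA file does not contain, so a converter has to invent placeholder qualities or obtain them from elsewhere.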

Examples

Get statistics for a FASTQ file

Get detailed statistics for my_reads.fastq, including N50.

Extract a subsequence

Extract the subsequence from chr1:100-200 in reference.fasta.

Translate DNA to protein

Translate the DNA sequences in coding_sequences.fasta to protein.
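
Behind the scenes, an MCP client turns prompts like these into tools/call requests. The sketch below shows what the three requests might look like as JSON-RPC messages. The argument values are one plausible reading of each prompt and the tool_call helper is hypothetical; the model decides the actual arguments at run time.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


example_requests = [
    tool_call(1, "seqkit_stats", {"input_file": "my_reads.fastq", "all_stats": True}),
    tool_call(2, "seqkit_subseq", {"input_file": "reference.fasta", "region": "chr1:100-200"}),
    tool_call(3, "seqkit_seq", {"input_file": "coding_sequences.fasta", "translate": True}),
]

if __name__ == "__main__":
    for request in example_requests:
        print(json.dumps(request))
```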

Development

Running tests

pytest tests/

Building Docker image

docker build -t bio-mcp-seqkit .

License

MIT License