cellxgene-mcp
The CELLxGENE Census MCP Server provides access to a comprehensive collection of single-cell RNA sequencing data, enabling researchers to query and analyze cell and gene metadata efficiently.
CELLxGENE Census MCP Server
A Model Context Protocol (MCP) server that provides access to the CZ CELLxGENE Discover Census - a comprehensive collection of single-cell RNA sequencing data.
Features
- Query cell metadata: Explore cell types, tissues, diseases, and other cell annotations
- Query gene metadata: Search genes and their annotations across the Census
- Data slice summaries: Get overview statistics of data slices without downloading full matrices
- Flexible filtering: Use pandas-style queries to filter cells and genes
- Multi-organism support: Access data from human and mouse
- Memory-efficient: Built-in limits to prevent memory overload
Installation
Prerequisites
- Python 3.10+
- uv (recommended) or pip
Install with uv (recommended)
# Clone the repository
git clone <repository-url>
cd cellxgene-mcp
# Install dependencies
uv sync
# Install the package in development mode
uv pip install -e .
Install with pip
pip install -r requirements.txt
pip install -e .
Quick Start
Running the MCP Server
Stdio Transport (for MCP clients)
uv run cellxgene-mcp
HTTP Transport (for web access)
uv run server --host 0.0.0.0 --port 3001
SSE Transport
uv run sse --host 0.0.0.0 --port 3001
MCP Client Configuration
Add this to your MCP client configuration:
For stdio transport:
{
  "mcpServers": {
    "cellxgene-mcp": {
      "command": "uv",
      "args": ["run", "cellxgene-mcp"]
    }
  }
}
For HTTP transport:
{
  "mcpServers": {
    "cellxgene-mcp": {
      "url": "http://localhost:3001/mcp"
    }
  }
}
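The two configurations above differ only in how the client reaches the server. As a minimal sketch, the snippet below builds both entries in Python and prints the stdio one as JSON. The helper names (`stdio_config`, `http_config`) are hypothetical, and where the resulting file must be saved depends on your MCP client.

```python
import json

SERVER_NAME = "cellxgene-mcp"


def stdio_config() -> dict:
    """Client entry that launches the server as a subprocess through uv."""
    return {
        "mcpServers": {
            SERVER_NAME: {"command": "uv", "args": ["run", SERVER_NAME]}
        }
    }


def http_config(host: str = "localhost", port: int = 3001) -> dict:
    """Client entry that points at an already running HTTP server."""
    return {"mcpServers": {SERVER_NAME: {"url": f"http://{host}:{port}/mcp"}}}


# Print the stdio variant; swap in http_config() for the HTTP transport.
print(json.dumps(stdio_config(), indent=2))
```

The default host and port mirror the HTTP transport command shown earlier; adjust them if the server was started with different values.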
Available Tools
cellxgene_get_census_info
Get information about available Census versions and supported organisms.
Returns:
- Available Census versions
- Latest stable version
- Supported organisms
- Data types
cellxgene_get_obs_metadata
Query cell (observation) metadata from the Census.
Parameters:
- organism (str): "Homo sapiens" or "Mus musculus" (default: "Homo sapiens")
- value_filter (str, optional): Pandas-style filter expression
- column_names (str, optional): Comma-separated list of columns to return
- limit (int): Maximum number of rows to return (default: 1000)
Example filters:
cell_type == 'T cell'
tissue == 'lung' and disease == 'COVID-19'
sex == 'female' and cell_type in ['T cell', 'B cell']
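Filter strings like the ones above are easy to get wrong when assembled by hand from user input. The following is a small sketch of helpers that compose them; the function names (`quote`, `eq`, `is_in`, `all_of`) are hypothetical and not part of the server. Because the escaping rules of the filter syntax are not documented here, the sketch simply rejects values that contain a single quote instead of guessing how to escape them.

```python
from typing import Iterable


def quote(value: str) -> str:
    """Wrap a string value in single quotes, refusing embedded quotes."""
    if "'" in value:
        raise ValueError(f"value may not contain a single quote: {value!r}")
    return f"'{value}'"


def eq(field: str, value: str) -> str:
    """Build an equality clause such as: tissue == 'lung'."""
    return f"{field} == {quote(value)}"


def is_in(field: str, values: Iterable[str]) -> str:
    """Build a membership clause such as: cell_type in ['T cell', 'B cell']."""
    return f"{field} in [{', '.join(quote(v) for v in values)}]"


def all_of(*clauses: str) -> str:
    """Join clauses with 'and'."""
    return " and ".join(clauses)


print(all_of(eq("sex", "female"), is_in("cell_type", ["T cell", "B cell"])))
```

The resulting string can be passed as the `value_filter` parameter.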
cellxgene_get_var_metadata
Query gene (variable) metadata from the Census.
Parameters:
- organism (str): "Homo sapiens" or "Mus musculus" (default: "Homo sapiens")
- value_filter (str, optional): Pandas-style filter expression
- column_names (str, optional): Comma-separated list of columns to return
- limit (int): Maximum number of rows to return (default: 1000)
Example filters:
feature_name in ['CD4', 'CD8A', 'CD3E']
feature_id == 'ENSG00000010610'
cellxgene_get_data_slice
Get a summary of a data slice from the Census.
Parameters:
- organism (str): "Homo sapiens" or "Mus musculus"
- obs_value_filter (str, optional): Filter for cells
- var_value_filter (str, optional): Filter for genes
- obs_column_names (str, optional): Cell metadata columns to include
- var_column_names (str, optional): Gene metadata columns to include
- max_cells (int): Maximum number of cells (default: 10000)
- max_genes (int): Maximum number of genes (default: 2000)
Returns:
- Number of cells and genes
- Sample metadata
- Column information
- Query information
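To make the parameter list above concrete, here is a sketch of a request object that carries the documented parameters and defaults and turns them into a tool-argument dictionary. The `DataSliceRequest` class and its `to_arguments` method are hypothetical; only the field names and the defaults for `max_cells` and `max_genes` come from this document.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataSliceRequest:
    organism: str                            # "Homo sapiens" or "Mus musculus"
    obs_value_filter: Optional[str] = None   # filter for cells
    var_value_filter: Optional[str] = None   # filter for genes
    obs_column_names: Optional[str] = None   # comma-separated cell columns
    var_column_names: Optional[str] = None   # comma-separated gene columns
    max_cells: int = 10000                   # documented default
    max_genes: int = 2000                    # documented default

    def __post_init__(self) -> None:
        # Catch obviously invalid limits before sending a request.
        if self.max_cells <= 0 or self.max_genes <= 0:
            raise ValueError("max_cells and max_genes must be positive")

    def to_arguments(self) -> dict:
        """Return only the parameters that were set, dropping None values."""
        return {k: v for k, v in asdict(self).items() if v is not None}


request = DataSliceRequest(
    organism="Homo sapiens",
    obs_value_filter="cell_type == 'T cell'",
    max_cells=5000,
)
print(request.to_arguments())
```

Omitting unset optional parameters lets the server apply its own defaults.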
Available Resources
resource://cellxgene_census-info
Comprehensive information about the CELLxGENE Census database, including:
- Available organisms
- Key metadata fields
- Common query patterns
- Usage guidelines
Data Schema
Cell (Observation) Metadata Fields
- cell_type: Cell type annotation
- tissue: Tissue of origin
- disease: Disease state
- sex: Biological sex
- organism: Species
- assay: Sequencing assay used
- suspension_type: Cell or nucleus
- ethnicity: Self-reported ethnicity (human only)
- development_stage: Developmental stage
Gene (Variable) Metadata Fields
- feature_id: Ensembl gene ID
- feature_name: Gene symbol
- feature_length: Gene length
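Since `column_names` is passed as a comma-separated string, a typo in a field name is easy to miss. The sketch below splits such a string and separates names that appear in the field lists above from any others. The helper names are hypothetical, and the sets contain only the fields listed in this section, so a name landing in the "other" group is not necessarily invalid in the Census.

```python
# Fields listed in this README; the Census may expose more columns than these.
OBS_FIELDS = {
    "cell_type", "tissue", "disease", "sex", "organism", "assay",
    "suspension_type", "ethnicity", "development_stage",
}
VAR_FIELDS = {"feature_id", "feature_name", "feature_length"}


def split_columns(column_names: str) -> list[str]:
    """Split a comma-separated string, trimming whitespace and empty items."""
    return [name.strip() for name in column_names.split(",") if name.strip()]


def partition(column_names: str, documented: set[str]) -> tuple[list[str], list[str]]:
    """Return (names listed in this README, all other names), keeping order."""
    names = split_columns(column_names)
    known = [n for n in names if n in documented]
    other = [n for n in names if n not in documented]
    return known, other


known, other = partition("cell_type, tissue, celltype", OBS_FIELDS)
print("documented:", known)
print("check spelling or schema:", other)
```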
Example Queries
Explore T cells in lung tissue
# Get T cells from lung tissue
obs_data = await get_obs_metadata(
    organism="Homo sapiens",
    value_filter="cell_type == 'T cell' and tissue == 'lung'",
    column_names="cell_type,tissue,disease,sex,assay"
)
Find COVID-19 related data
# Get cells from COVID-19 studies
obs_data = await get_obs_metadata(
    organism="Homo sapiens",
    value_filter="disease == 'COVID-19'",
    column_names="cell_type,tissue,disease,assay"
)
Search for specific genes
# Get information about immune genes
var_data = await get_var_metadata(
    organism="Homo sapiens",
    value_filter="feature_name in ['CD4', 'CD8A', 'CD3E', 'IL2']",
    column_names="feature_id,feature_name,feature_length"
)
Get data slice summary
# Get summary of T cell data slice
data_summary = await get_data_slice(
    organism="Homo sapiens",
    obs_value_filter="cell_type == 'T cell'",
    var_value_filter="feature_name in ['CD4', 'CD8A', 'CD3E']",
    max_cells=5000,
    max_genes=100
)
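The examples above call the query functions directly. From an external program, the same query would go through an MCP client. The following is a sketch that assumes the official MCP Python SDK (the `mcp` package) is installed and that the server starts with the stdio command shown under Quick Start. The tool name comes from the Available Tools section; the `limit` value and the way results are printed are illustrative assumptions, since the response format is not described here.

```python
def obs_arguments() -> dict:
    """Arguments for cellxgene_get_obs_metadata, matching the first example."""
    return {
        "organism": "Homo sapiens",
        "value_filter": "cell_type == 'T cell' and tissue == 'lung'",
        "column_names": "cell_type,tissue,disease,sex,assay",
        "limit": 100,  # illustrative; the documented default is 1000
    }


async def main() -> None:
    # Imported lazily so this module loads even without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    # Launch the server as a subprocess, as in the stdio client configuration.
    params = StdioServerParameters(command="uv", args=["run", "cellxgene-mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "cellxgene_get_obs_metadata", arguments=obs_arguments()
            )
            for item in result.content:
                # Print text items; other content types are skipped.
                text = getattr(item, "text", None)
                if text is not None:
                    print(text)


# To run against a local checkout: import asyncio; asyncio.run(main())
```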
Development
Project Structure
cellxgene-mcp/
├── src/cellxgene_mcp/
│   ├── __init__.py
│   └── server.py
├── test/
├── mcp-config*.json
├── pyproject.toml
└── README.md
Running Tests
uv run pytest test/
Code Formatting
uv run ruff format .
uv run ruff check .
Limitations
- Memory limits: Built-in limits prevent downloading extremely large datasets
- Query complexity: Very complex queries may be slow
- Rate limiting: Respect Census service rate limits
- Network dependency: Requires internet connection to access Census data
License
MIT License - see LICENSE file for details.
Support
- File issues on the GitHub repository
- Check the CELLxGENE Census documentation for data questions
- Review the MCP specification for protocol questions