statsource-mcp

jamie7893/statsource-mcp

A Model Context Protocol server that provides statistical analysis capabilities, enabling LLMs to analyze data, calculate statistics, and generate predictions.

Tools

Functions exposed to the LLM to take actions

suggest_feature

Suggest a new feature or improvement for the StatSource analytics platform.

What this tool does:

This tool allows you to submit feature suggestions or enhancement requests for the StatSource platform. Suggestions are logged and reviewed by the development team.

When to use this tool:

  • When a user asks for functionality that doesn't currently exist
  • When you identify gaps or limitations in the current analytics capabilities
  • When a user expresses frustration about missing capabilities
  • When you think of enhancements that would improve the user experience

Required inputs:

  • description: A clear, detailed description of the suggested feature
  • use_case: Explanation of how and why users would use this feature

Optional inputs:

  • priority: Suggested priority level ("low", "medium", "high")

Returns:

A confirmation message and reference ID for the feature suggestion.
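
To make the expected shape concrete, here is a minimal sketch of the arguments an agent might pass to this tool. The example values are invented for illustration; only the field names come from the inputs listed above.

    # Illustrative suggest_feature arguments (values are made up for the example).
    suggest_feature_args = {
        "description": "Support exporting calculated statistics as downloadable CSV files.",
        "use_case": "Analysts want to share computed summaries with teammates "
                    "who do not use the MCP client.",
        "priority": "medium",  # optional: "low", "medium", or "high"
    }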

calculate_statistics

Calculate statistical measures on specified data columns from CSV files, databases, or external APIs.

What this tool does:

This tool connects to our analytics API to compute various statistical measures (like mean, median, standard deviation, correlation, etc.) on your data.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user specifies a database source, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Do not proceed without obtaining the table name for database sources.
    • Tool calls without table_name will FAIL for database sources.
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • statistics must be provided as a proper list, not a string-encoded list.
    • CORRECT: statistics=["mean", "sum", "min", "max"]
    • INCORRECT: statistics='["mean", "sum", "min", "max"]'
  • columns must be provided as a proper list, not a string-encoded list.
    • CORRECT: columns=["revenue", "quantity"]
    • INCORRECT: columns='["revenue", "quantity"]'
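
As a hedged Python sketch of the rule above, the first dictionary shows arguments the API accepts, while the second shows the string-encoded form that will be rejected:

    # Proper list parameters (what the API expects).
    good_args = {
        "columns": ["revenue", "quantity"],
        "statistics": ["mean", "sum", "min", "max"],
    }

    # String-encoded lists (look similar but are invalid).
    bad_args = {
        "columns": '["revenue", "quantity"]',
        "statistics": '["mean", "sum", "min", "max"]',
    }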

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively (e.g., filter {"status": "completed"} will match "Completed" in the data).
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence in the data source.
    2. Parameter format (proper lists vs string-encoded lists).
    3. Correct data_source provided (filename, connection string, or API URL).
    4. table_name provided if source_type is "database".
    5. API URL is correct and returns valid JSON if source_type is "api".

When to use this tool:

  • When a user needs statistical analysis of their data (means, medians, correlations, distributions, etc.).
  • When analyzing patterns or summarizing datasets from files, databases, or APIs.

Required inputs:

  • columns: List of column names to analyze (ask user for exact column names in their data).
  • statistics: List of statistics to calculate.

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
    • For Database: Full connection string (ask user for exact string).
    • For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
    • If not provided, will use the connection string from MCP config if available (defaults to database type).
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
  • table_name: Name of the database table to use (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Ask user for the exact table name in their database.
    • Always explicitly ask for table name when data source is a database.
  • filters: Dictionary of column-value pairs to filter data before analysis.
    • Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
    • API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
  • groupby: List of column names to group data by before calculating statistics.
  • options: Dictionary of additional options for specific operations (rarely needed at present).
  • date_column: Column name containing date/timestamp information for filtering. Matched case-insensitively.
  • start_date: Inclusive start date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
  • end_date: Inclusive end date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
    • API Source Behavior: For 'api' sources, date filtering happens after data is fetched.

Valid statistics options:

  • 'mean', 'median', 'std', 'sum', 'count', 'min', 'max', 'describe', 'correlation', 'missing', 'unique', 'boxplot'

Returns:

A JSON string containing the results and metadata.

  • result: Dictionary with statistical measures for each requested column and statistic. Structure varies by statistic (e.g., describe, correlation).
  • metadata: Includes execution_time, query_type ('statistics'), source_type.
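
Putting the inputs above together, a hypothetical call against a database source might look like the following sketch. The table name, column names, filters, and dates are invented for illustration, and the connection string is a placeholder the user would supply.

    # Illustrative calculate_statistics arguments for a database source.
    calculate_statistics_args = {
        "columns": ["revenue", "quantity"],               # exact column names from the user
        "statistics": ["mean", "median", "std"],          # see valid statistics options above
        "data_source": "postgresql://user:pass@host/db",  # placeholder connection string
        "source_type": "database",
        "table_name": "orders",                           # required for database sources
        "filters": {"status": "completed"},               # matched case-insensitively
        "groupby": ["region"],
        "date_column": "order_date",
        "start_date": "2024-01-01",
        "end_date": "2024-12-31",
    }

Because the tool returns a JSON string, the caller would parse it (for example with json.loads) to reach the result and metadata keys described above.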

predict_trends

Generate ML time-series forecasts for future periods based on historical data.

What this tool does:

This tool connects to our analytics API to generate time-series forecasts (predictions) for a specified number of future periods based on historical data in a specified column. It analyzes trends and provides metrics on the prediction quality.

Note: Currently, the API typically uses the first column provided in the columns list for ML prediction.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • When users ask about "trends" or "forecasts", use this tool.
  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Tool calls without table_name will FAIL for database sources.
    • The table_name question should be asked together with other required information (column names, periods).
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • columns must be provided as a proper list, typically containing the single numeric column to predict.
    • CORRECT: columns=["sales_amount"]
    • INCORRECT: columns='["sales_amount"]'
  • periods must be an integer between 1 and 12. The API has a MAXIMUM LIMIT OF 12 PERIODS for predictions. Any request with periods > 12 will fail. Always inform users of this limitation if they request more periods.
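
A brief client-side check reflecting the two rules above, written as a hedged Python sketch (the helper name is invented for illustration):

    # Hypothetical pre-flight validation for predict_trends arguments.
    def validate_prediction_args(columns, periods):
        if not isinstance(columns, list) or not columns:
            raise ValueError("columns must be a non-empty list, e.g. ['sales_amount']")
        if not isinstance(periods, int) or not 1 <= periods <= 12:
            raise ValueError("periods must be an integer between 1 and 12 (API maximum)")

    validate_prediction_args(["sales_amount"], 6)    # passes
    # validate_prediction_args(["sales_amount"], 24) # would raise: exceeds the 12-period limit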

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively (e.g., filter {"status": "completed"} will match "Completed" in the data).
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
  • Date Column: If using time-based prediction, ensure date_column correctly identifies the date/timestamp column. Matched case-insensitively.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence in the data source (should be numeric for prediction).
    2. Parameter format (proper lists vs string-encoded lists).
    3. Correct data_source provided (filename, connection string, or API URL).
    4. table_name provided if source_type is "database".
    5. API URL is correct and returns valid JSON if source_type is "api".
    6. periods parameter is provided and is a positive integer not exceeding 12.
    7. date_column is specified if required for the underlying model.

When to use this tool:

  • When a user wants to predict future values based on historical trends (forecasting).
  • When generating forecasts for business planning or decision-making.
  • When analyzing the likely future direction of a time-series metric.

Required inputs:

  • columns: List containing the name of the (usually single) numeric column to predict trends for.
  • periods: Number of future periods to predict (maximum: 12).

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
    • For Database: Full connection string (ask user for exact string).
    • For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
    • If not provided, will use the connection string from MCP config if available (defaults to database type).
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
  • table_name: Name of the database table to use (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Ask user for the exact table name in their database.
    • ALWAYS ask for table name when using database sources.
  • filters: Dictionary of column-value pairs to filter data before analysis.
    • Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
    • API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
  • options: Dictionary of additional options for specific operations (rarely needed at present; may include model-tuning parameters in the future).
  • date_column: Column name containing date/timestamp information.
    • Used for date filtering and essential for time-based trend analysis/predictions. Matched case-insensitively.
  • start_date: Inclusive start date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
  • end_date: Inclusive end date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
    • API Source Behavior: For 'api' sources, date filtering happens after data is fetched.
  • aggregation (str, Optional, default: "auto"): Specifies how time-series data should be aggregated before forecasting. Ask the user for their preference if unsure, or default to 'auto'/'monthly'.
    • 'auto': Automatically selects 'weekly' or 'monthly' based on data density and timeframe, falling back to 'monthly' when the data is ambiguous. A safe default choice.
    • 'weekly': Aggregates data by week. Use for forecasting short-term trends (e.g., predicting next few weeks/months) or when weekly patterns are important.
    • 'monthly': Aggregates data by month. Recommended for most business forecasting (e.g., predicting quarterly or annual trends) as it smooths out daily/weekly noise.
    • 'daily': Uses daily data. Choose only if the user needs very granular forecasts and understands the potential for noise. Requires sufficient daily data points.

ML Prediction features returned:

  • Time series forecasting with customizable prediction periods (up to 12 periods maximum).
  • Trend direction analysis ("increasing", "decreasing", "stable").
  • Model quality metrics (r-squared, slope).
  • Works with numeric data columns from any supported data source.
  • Can use a specific date_column for time-based regression.

Returns:

A JSON string containing the prediction results and metadata.

  • result: Dictionary containing prediction details per analyzed column (typically the first one specified): {"r_squared": ..., "slope": ..., "trend_direction": ..., "forecast_values": [...], ...}.
  • metadata: Includes execution_time, query_type ('ml_prediction'), source_type, periods.
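
An illustrative predict_trends payload for a previously uploaded CSV file follows; the filename, column names, and aggregation choice are assumptions made for the sketch rather than values from this documentation.

    # Illustrative predict_trends arguments for an uploaded CSV source.
    predict_trends_args = {
        "columns": ["sales_amount"],            # single numeric column to forecast
        "periods": 6,                           # must be between 1 and 12
        "data_source": "monthly_sales.csv",     # filename previously uploaded to statsource.me
        "source_type": "csv",
        "date_column": "order_date",            # needed for time-based forecasting
        "aggregation": "monthly",               # smooths out daily/weekly noise
        "start_date": "2023-01-01",
        "end_date": "2024-12-31",
    }

As with calculate_statistics, the response is a JSON string whose result section carries r_squared, slope, trend_direction, and forecast_values for the predicted column.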

anomaly_detection

Detect anomalies in time-series data from various sources.

What this tool does:

This tool connects to our analytics API to identify unusual data points (anomalies) in specified columns based on their time-series behavior. It requires a date/time column to understand the sequence of data.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • When users ask about "outliers", "unusual values", or "anomalies" in time-based data, use this tool.
  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Tool calls without table_name will FAIL for database sources.
    • ALWAYS include this question when gathering information from the user.
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • columns must be provided as a proper list, not a string-encoded list.
    • CORRECT: columns=["sensor_reading", "error_count"]
    • INCORRECT: columns='["sensor_reading", "error_count"]'
  • date_column must be a string identifying the time column.
  • anomaly_options is a dictionary for detection parameters (see below).

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively.
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
  • Date Column: The date_column is crucial and is matched case-insensitively.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence (should be numeric for anomaly detection).
    2. date_column spelling and existence.
    3. Parameter format (proper lists vs string-encoded lists).
    4. Correct data_source provided (filename, connection string, or API URL).
    5. table_name provided if source_type is "database".
    6. API URL is correct and returns valid JSON if source_type is "api".
    7. date_column parameter is provided.

When to use this tool:

  • When a user wants to identify outliers or unusual patterns in time-series data.
  • When monitoring metrics for unexpected spikes or drops.
  • When cleaning data by identifying potentially erroneous readings.

Required inputs:

  • columns: List of numeric column names to check for anomalies.
  • date_column: Name of the column containing date/timestamp information.

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me.
    • For Database: Full connection string.
    • For API: The exact URL of the API endpoint returning JSON data.
    • If not provided, uses the default connection from MCP config if available.
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • Defaults based on MCP config if available.
  • table_name: Name of the database table (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Always ask for table name when using database sources.
  • filters: Dictionary of column-value pairs to filter data before analysis.
  • options: Dictionary of additional options (rarely needed for anomaly detection at present).
  • start_date: Inclusive start date for filtering historical data (ISO 8601 string or datetime).
  • end_date: Inclusive end date for filtering historical data (ISO 8601 string or datetime).
  • anomaly_options: Dictionary to configure the detection method and parameters.
    • method (str, Optional, default: "iqr"): The anomaly detection method to use. Must be one of:
      • 'iqr': Interquartile Range - Identifies outliers based on distribution quartiles
      • 'zscore': Z-score - Identifies outliers based on standard deviations from the mean
      • 'isolation_forest': Machine learning approach that isolates anomalies using random forest
    • sensitivity (float, Optional, default: 1.5): For 'iqr' method, the multiplier for the IQR to define outlier bounds.
      • Higher values are less sensitive (1.5 is standard, 3.0 would detect only extreme outliers)
    • threshold (float, Optional, default: 3.0): For 'zscore' method, the threshold for Z-scores to define outliers.
      • Higher values are less sensitive (3.0 is standard, 2.0 would detect more outliers)
    • window_size (int, Optional, default: 20): Size of rolling window for detection methods.
      • If not provided, uses global statistics
      • Smaller windows (e.g., 7-14) detect local anomalies, larger windows detect global anomalies
    • contamination (float, Optional, default: 0.05): For 'isolation_forest' method, the expected proportion of anomalies.
      • Values typically range from 0.01 (1%) to 0.1 (10%)
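
The anomaly_options dictionary might be populated as in the hedged sketch below; the specific values are illustrative choices, not recommendations from this documentation.

    # Example anomaly_options for the 'zscore' method (values chosen for illustration).
    zscore_options = {
        "method": "zscore",
        "threshold": 2.5,       # lower than the 3.0 default, so more points are flagged
        "window_size": 14,      # smaller rolling window to surface local anomalies
    }

    # Example anomaly_options for the 'isolation_forest' method.
    isolation_forest_options = {
        "method": "isolation_forest",
        "contamination": 0.03,  # expect roughly 3% of points to be anomalous
    }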

Returns:

A JSON string containing the anomaly detection results and metadata.

  • result: Dictionary with structure for each analyzed column:
    {
      column_name: {
        "timestamps": [...],  # List of datetime values
        "values": [...],      # List of numeric values
        "is_anomaly": [...],  # Boolean flags indicating anomalies
        "anomaly_score": [...], # Scores indicating degree of deviation
        "summary": {
          "total_points": int,
          "anomaly_count": int,
          "percentage": float,
          "method": str      # Method used for detection
        }
      }
    }
    
  • metadata: Includes execution_time, query_type ('anomaly_detection'), source_type, anomaly_method.
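
Tying the pieces together, a complete hypothetical anomaly_detection payload might look like the sketch below. The column names, table name, and dates are invented, and the connection string is a placeholder the user would provide.

    # Illustrative anomaly_detection arguments for a database source.
    anomaly_detection_args = {
        "columns": ["sensor_reading"],                    # numeric column(s) to check
        "date_column": "recorded_at",                     # required time column
        "data_source": "postgresql://user:pass@host/db",  # placeholder connection string
        "source_type": "database",
        "table_name": "sensor_readings",                  # required for database sources
        "anomaly_options": {"method": "iqr", "sensitivity": 1.5},
        "start_date": "2024-06-01",
        "end_date": "2024-06-30",
    }

    # After parsing the returned JSON string, each column's summary can be read as:
    # summary = parsed["result"]["sensor_reading"]["summary"]
    # print(summary["anomaly_count"], "anomalies out of", summary["total_points"])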

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources