statsource-mcp

jamie7893/statsource-mcp

A Model Context Protocol server that provides statistical analysis capabilities, enabling LLMs to analyze data, calculate statistics, and generate predictions.

Tools

Functions exposed to the LLM to take actions

suggest_feature

Suggest a new feature or improvement for the StatSource analytics platform.

What this tool does:

This tool allows you to submit feature suggestions or enhancement requests for the StatSource platform. Suggestions are logged and reviewed by the development team.

When to use this tool:

  • When a user asks for functionality that doesn't currently exist
  • When you identify gaps or limitations in the current analytics capabilities
  • When a user expresses frustration about missing capabilities
  • When you think of enhancements that would improve the user experience

Required inputs:

  • description: A clear, detailed description of the suggested feature
  • use_case: Explanation of how and why users would use this feature

Optional inputs:

  • priority: Suggested priority level ("low", "medium", "high")

Returns:

A confirmation message and reference ID for the feature suggestion.
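
To make the expected shape concrete, here is a minimal sketch of the arguments an agent might pass to this tool. The example values are invented for illustration; only the field names come from the inputs listed above.

    # Illustrative suggest_feature arguments (values are made up for the example).
    suggest_feature_args = {
        "description": "Support exporting calculated statistics as downloadable CSV files.",
        "use_case": "Analysts want to share computed summaries with teammates "
                    "who do not use the MCP client.",
        "priority": "medium",  # optional: "low", "medium", or "high"
    }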

calculate_statistics

Calculate statistical measures on specified data columns from CSV files, databases, or external APIs.

What this tool does:

This tool connects to our analytics API to compute various statistical measures (like mean, median, standard deviation, correlation, etc.) on your data.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user specifies a database source, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Do not proceed without obtaining the table name for database sources.
    • Tool calls without table_name will FAIL for database sources.
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • statistics must be provided as a proper list, not a string-encoded list.
    • CORRECT: statistics=["mean", "sum", "min", "max"]
    • INCORRECT: statistics='["mean", "sum", "min", "max"]'
  • columns must be provided as a proper list, not a string-encoded list.
    • CORRECT: columns=["revenue", "quantity"]
    • INCORRECT: columns='["revenue", "quantity"]'
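
As a hedged Python sketch of the rule above, the first dictionary shows arguments the API accepts, while the second shows the string-encoded form that will be rejected:

    # Proper list parameters (what the API expects).
    good_args = {
        "columns": ["revenue", "quantity"],
        "statistics": ["mean", "sum", "min", "max"],
    }

    # String-encoded lists (look similar but are invalid).
    bad_args = {
        "columns": '["revenue", "quantity"]',
        "statistics": '["mean", "sum", "min", "max"]',
    }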

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively (e.g., filter {"status": "completed"} will match "Completed" in the data).
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence in the data source.
    2. Parameter format (proper lists vs string-encoded lists).
    3. Correct data_source provided (filename, connection string, or API URL).
    4. table_name provided if source_type is "database".
    5. API URL is correct and returns valid JSON if source_type is "api".

When to use this tool:

  • When a user needs statistical analysis of their data (means, medians, correlations, distributions, etc.).
  • When analyzing patterns or summarizing datasets from files, databases, or APIs.

Required inputs:

  • columns: List of column names to analyze (ask user for exact column names in their data).
  • statistics: List of statistics to calculate.

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
    • For Database: Full connection string (ask user for exact string).
    • For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
    • If not provided, will use the connection string from MCP config if available (defaults to database type).
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
  • table_name: Name of the database table to use (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Ask user for the exact table name in their database.
    • Always explicitly ask for table name when data source is a database.
  • filters: Dictionary of column-value pairs to filter data before analysis.
    • Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
    • API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
  • groupby: List of column names to group data by before calculating statistics.
  • options: Dictionary of additional options for specific operations (rarely needed at present).
  • date_column: Column name containing date/timestamp information for filtering. Matched case-insensitively.
  • start_date: Inclusive start date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
  • end_date: Inclusive end date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
    • API Source Behavior: For 'api' sources, date filtering happens after data is fetched.

Valid statistics options:

  • 'mean', 'median', 'std', 'sum', 'count', 'min', 'max', 'describe', 'correlation', 'missing', 'unique', 'boxplot'

Returns:

A JSON string containing the results and metadata.

  • result: Dictionary with statistical measures for each requested column and statistic. Structure varies by statistic (e.g., describe, correlation).
  • metadata: Includes execution_time, query_type ('statistics'), source_type.
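
Putting the inputs above together, a hypothetical call against a database source might look like the following sketch. The table name, column names, filters, and dates are invented for illustration, and the connection string is a placeholder the user would supply.

    # Illustrative calculate_statistics arguments for a database source.
    calculate_statistics_args = {
        "columns": ["revenue", "quantity"],               # exact column names from the user
        "statistics": ["mean", "median", "std"],          # see valid statistics options above
        "data_source": "postgresql://user:pass@host/db",  # placeholder connection string
        "source_type": "database",
        "table_name": "orders",                           # required for database sources
        "filters": {"status": "completed"},               # matched case-insensitively
        "groupby": ["region"],
        "date_column": "order_date",
        "start_date": "2024-01-01",
        "end_date": "2024-12-31",
    }

Because the tool returns a JSON string, the caller would parse it (for example with json.loads) to reach the result and metadata keys described above.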

predict_trends

Generate ML time-series forecasts for future periods based on historical data.

What this tool does:

This tool connects to our analytics API to generate time-series forecasts (predictions) for a specified number of future periods based on historical data in a specified column. It analyzes trends and provides metrics on the prediction quality.

Note: Currently, the API typically uses the first column provided in the columns list for ML prediction.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • When users ask about "trends" or "forecasts", use this tool.
  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Tool calls without table_name will FAIL for database sources.
    • The table_name question should be asked together with other required information (column names, periods).
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • columns must be provided as a proper list, typically containing the single numeric column to predict.
    • CORRECT: columns=["sales_amount"]
    • INCORRECT: columns='["sales_amount"]'
  • periods must be an integer between 1 and 12. The API has a MAXIMUM LIMIT OF 12 PERIODS for predictions. Any request with periods > 12 will fail. Always inform users of this limitation if they request more periods.
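
A brief client-side check reflecting the two rules above, written as a hedged Python sketch (the helper name is invented for illustration):

    # Hypothetical pre-flight validation for predict_trends arguments.
    def validate_prediction_args(columns, periods):
        if not isinstance(columns, list) or not columns:
            raise ValueError("columns must be a non-empty list, e.g. ['sales_amount']")
        if not isinstance(periods, int) or not 1 <= periods <= 12:
            raise ValueError("periods must be an integer between 1 and 12 (API maximum)")

    validate_prediction_args(["sales_amount"], 6)    # passes
    # validate_prediction_args(["sales_amount"], 24) # would raise: exceeds the 12-period limit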

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively (e.g., filter {"status": "completed"} will match "Completed" in the data).
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
  • Date Column: If using time-based prediction, ensure date_column correctly identifies the date/timestamp column. Matched case-insensitively.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence in the data source (should be numeric for prediction).
    2. Parameter format (proper lists vs string-encoded lists).
    3. Correct data_source provided (filename, connection string, or API URL).
    4. table_name provided if source_type is "database".
    5. API URL is correct and returns valid JSON if source_type is "api".
    6. periods parameter is provided and is a positive integer not exceeding 12.
    7. date_column is specified if required for the underlying model.

When to use this tool:

  • When a user wants to predict future values based on historical trends (forecasting).
  • When generating forecasts for business planning or decision-making.
  • When analyzing the likely future direction of a time-series metric.

Required inputs:

  • columns: List containing the name of the (usually single) numeric column to predict trends for.
  • periods: Number of future periods to predict (maximum: 12).

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
    • For Database: Full connection string (ask user for exact string).
    • For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
    • If not provided, will use the connection string from MCP config if available (defaults to database type).
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
  • table_name: Name of the database table to use (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Ask user for the exact table name in their database.
    • ALWAYS ask for table name when using database sources.
  • filters: Dictionary of column-value pairs to filter data before analysis.
    • Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
    • API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
  • options: Dictionary of additional options for specific operations (rarely needed at present; may include model-tuning parameters in the future).
  • date_column: Column name containing date/timestamp information.
    • Used for date filtering and essential for time-based trend analysis/predictions. Matched case-insensitively.
  • start_date: Inclusive start date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
  • end_date: Inclusive end date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
    • API Source Behavior: For 'api' sources, date filtering happens after data is fetched.
  • aggregation (str, Optional, default: "auto"): Specifies how time-series data should be aggregated before forecasting. Ask the user for their preference if unsure, or default to 'auto'/'monthly'.
    • 'auto': Automatically selects 'weekly' or 'monthly' based on data density and timeframe, falling back to 'monthly' when the data is ambiguous. A safe default choice.
    • 'weekly': Aggregates data by week. Use for forecasting short-term trends (e.g., predicting next few weeks/months) or when weekly patterns are important.
    • 'monthly': Aggregates data by month. Recommended for most business forecasting (e.g., predicting quarterly or annual trends) as it smooths out daily/weekly noise.
    • 'daily': Uses daily data. Choose only if the user needs very granular forecasts and understands the potential for noise. Requires sufficient daily data points.

ML Prediction features returned:

  • Time series forecasting with customizable prediction periods (up to 12 periods maximum).
  • Trend direction analysis ("increasing", "decreasing", "stable").
  • Model quality metrics (r-squared, slope).
  • Works with numeric data columns from any supported data source.
  • Can use a specific date_column for time-based regression.

Returns:

A JSON string containing the prediction results and metadata.

  • result: Dictionary containing prediction details per analyzed column (typically the first one specified): {"r_squared": ..., "slope": ..., "trend_direction": ..., "forecast_values": [...], ...}.
  • metadata: Includes execution_time, query_type ('ml_prediction'), source_type, periods.
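
An illustrative predict_trends payload for a previously uploaded CSV file follows; the filename, column names, and aggregation choice are assumptions made for the sketch rather than values from this documentation.

    # Illustrative predict_trends arguments for an uploaded CSV source.
    predict_trends_args = {
        "columns": ["sales_amount"],            # single numeric column to forecast
        "periods": 6,                           # must be between 1 and 12
        "data_source": "monthly_sales.csv",     # filename previously uploaded to statsource.me
        "source_type": "csv",
        "date_column": "order_date",            # needed for time-based forecasting
        "aggregation": "monthly",               # smooths out daily/weekly noise
        "start_date": "2023-01-01",
        "end_date": "2024-12-31",
    }

As with calculate_statistics, the response is a JSON string whose result section carries r_squared, slope, trend_direction, and forecast_values for the predicted column.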

anomaly_detection

Detect anomalies in time-series data from various sources.

What this tool does:

This tool connects to our analytics API to identify unusual data points (anomalies) in specified columns based on their time-series behavior. It requires a date/time column to understand the sequence of data.

It supports multiple data sources:

  • CSV files (previously uploaded to StatSource)
  • Databases (PostgreSQL, SQLite, etc.)
  • External APIs (returning JSON data)

IMPORTANT INSTRUCTIONS FOR AI AGENTS:

  • When users ask about "outliers", "unusual values", or "anomalies" in time-based data, use this tool.
  • DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
  • NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
  • ALWAYS ask the user explicitly for all required information.
  • For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
  • For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
  • For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
    • When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
    • Tool calls without table_name will FAIL for database sources.
    • ALWAYS include this question when gathering information from the user.
  • For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
  • Never suggest default values, sample data, or example parameters - request specific information from the user.
  • If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
  • If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.

IMPORTANT: Parameter Validation and Formatting

  • columns must be provided as a proper list, not a string-encoded list.
    • CORRECT: columns=["sensor_reading", "error_count"]
    • INCORRECT: columns='["sensor_reading", "error_count"]'
  • date_column must be a string identifying the time column.
  • anomaly_options is a dictionary for detection parameters (see below).

CRITICAL: Column Name Formatting & Case-Insensitivity

  • Column Matching: The API matches column names case-insensitively. Ask the user for the intended column names.
  • Filter Value Matching: String filter values are matched case-insensitively.
  • Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
  • Date Column: The date_column is crucial and is matched case-insensitively.

Error Response Handling

  • If you receive an "Invalid request" or similar error, check:
    1. Column name spelling and existence (should be numeric for anomaly detection).
    2. date_column spelling and existence.
    3. Parameter format (proper lists vs string-encoded lists).
    4. Correct data_source provided (filename, connection string, or API URL).
    5. table_name provided if source_type is "database".
    6. API URL is correct and returns valid JSON if source_type is "api".
    7. date_column parameter is provided.

When to use this tool:

  • When a user wants to identify outliers or unusual patterns in time-series data.
  • When monitoring metrics for unexpected spikes or drops.
  • When cleaning data by identifying potentially erroneous readings.

Required inputs:

  • columns: List of numeric column names to check for anomalies.
  • date_column: Name of the column containing date/timestamp information.

Optional inputs:

  • data_source: Identifier for the data source.
    • For CSV: Filename of a previously uploaded file on statsource.me.
    • For Database: Full connection string.
    • For API: The exact URL of the API endpoint returning JSON data.
    • If not provided, uses the default connection from MCP config if available.
  • source_type: Type of data source ('csv', 'database', or 'api').
    • Determines how data_source is interpreted.
    • Defaults based on MCP config if available.
  • table_name: Name of the database table (REQUIRED for database sources).
    • Must be provided when source_type is 'database'.
    • Always ask for table name when using database sources.
  • filters: Dictionary of column-value pairs to filter data before analysis.
  • options: Dictionary of additional options (rarely needed for anomaly detection at present).
  • start_date: Inclusive start date for filtering historical data (ISO 8601 string or datetime).
  • end_date: Inclusive end date for filtering historical data (ISO 8601 string or datetime).
  • anomaly_options: Dictionary to configure the detection method and parameters.
    • method (str, Optional, default: "iqr"): The anomaly detection method to use. Must be one of:
      • 'iqr': Interquartile Range - Identifies outliers based on distribution quartiles
      • 'zscore': Z-score - Identifies outliers based on standard deviations from the mean
      • 'isolation_forest': Machine learning approach that isolates anomalies using random forest
    • sensitivity (float, Optional, default: 1.5): For 'iqr' method, the multiplier for the IQR to define outlier bounds.
      • Higher values are less sensitive (1.5 is standard, 3.0 would detect only extreme outliers)
    • threshold (float, Optional, default: 3.0): For 'zscore' method, the threshold for Z-scores to define outliers.
      • Higher values are less sensitive (3.0 is standard, 2.0 would detect more outliers)
    • window_size (int, Optional, default: 20): Size of rolling window for detection methods.
      • If not provided, uses global statistics
      • Smaller windows (e.g., 7-14) detect local anomalies, larger windows detect global anomalies
    • contamination (float, Optional, default: 0.05): For 'isolation_forest' method, the expected proportion of anomalies.
      • Values typically range from 0.01 (1%) to 0.1 (10%)
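
The anomaly_options dictionary might be populated as in the hedged sketch below; the specific values are illustrative choices, not recommendations from this documentation.

    # Example anomaly_options for the 'zscore' method (values chosen for illustration).
    zscore_options = {
        "method": "zscore",
        "threshold": 2.5,       # lower than the 3.0 default, so more points are flagged
        "window_size": 14,      # smaller rolling window to surface local anomalies
    }

    # Example anomaly_options for the 'isolation_forest' method.
    isolation_forest_options = {
        "method": "isolation_forest",
        "contamination": 0.03,  # expect roughly 3% of points to be anomalous
    }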

Returns:

A JSON string containing the anomaly detection results and metadata.

  • result: Dictionary with structure for each analyzed column:
    {
      column_name: {
        "timestamps": [...],  # List of datetime values
        "values": [...],      # List of numeric values
        "is_anomaly": [...],  # Boolean flags indicating anomalies
        "anomaly_score": [...], # Scores indicating degree of deviation
        "summary": {
          "total_points": int,
          "anomaly_count": int,
          "percentage": float,
          "method": str      # Method used for detection
        }
      }
    }
    
  • metadata: Includes execution_time, query_type ('anomaly_detection'), source_type, anomaly_method.
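
Tying the pieces together, a complete hypothetical anomaly_detection payload might look like the sketch below. The column names, table name, and dates are invented, and the connection string is a placeholder the user would provide.

    # Illustrative anomaly_detection arguments for a database source.
    anomaly_detection_args = {
        "columns": ["sensor_reading"],                    # numeric column(s) to check
        "date_column": "recorded_at",                     # required time column
        "data_source": "postgresql://user:pass@host/db",  # placeholder connection string
        "source_type": "database",
        "table_name": "sensor_readings",                  # required for database sources
        "anomaly_options": {"method": "iqr", "sensitivity": 1.5},
        "start_date": "2024-06-01",
        "end_date": "2024-06-30",
    }

    # After parsing the returned JSON string, each column's summary can be read as:
    # summary = parsed["result"]["sensor_reading"]["summary"]
    # print(summary["anomaly_count"], "anomalies out of", summary["total_points"])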

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources