jamie7893/statsource-mcp
A Model Context Protocol server that provides statistical analysis capabilities, enabling LLMs to analyze data, calculate statistics, and generate predictions.
Tools
Functions exposed to the LLM to take actions
suggest_feature
Suggest a new feature or improvement for the StatSource analytics platform.
What this tool does:
This tool allows you to submit feature suggestions or enhancement requests for the StatSource platform. Suggestions are logged and reviewed by the development team.
When to use this tool:
- When a user asks for functionality that doesn't currently exist
- When you identify gaps or limitations in the current analytics capabilities
- When a user expresses frustration about missing capabilities
- When you think of enhancements that would improve the user experience
Required inputs:
- description: A clear, detailed description of the suggested feature
- use_case: Explanation of how and why users would use this feature
Optional inputs:
- priority: Suggested priority level ("low", "medium", "high")
Returns:
A confirmation message and reference ID for the feature suggestion.
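For illustration, a minimal sketch of the arguments an agent could pass when invoking suggest_feature; the description and use_case text below are hypothetical, and how the call is dispatched depends on the MCP client in use:

```python
import json

# Hypothetical suggest_feature arguments; the description and use_case values are
# illustrative only, and the exact transport depends on the MCP client.
arguments = {
    "description": "Support weighted means in calculate_statistics",
    "use_case": "Analysts aggregating survey responses need per-row weights applied",
    "priority": "medium",  # optional: "low", "medium", or "high"
}

print(json.dumps(arguments, indent=2))
```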
calculate_statistics
Calculate statistical measures on specified data columns from CSV files, databases, or external APIs.
What this tool does:
This tool connects to our analytics API to compute various statistical measures (like mean, median, standard deviation, correlation, etc.) on your data.
It supports multiple data sources:
- CSV files (previously uploaded to StatSource)
- Databases (PostgreSQL, SQLite, etc.)
- External APIs (returning JSON data)
IMPORTANT INSTRUCTIONS FOR AI AGENTS:
- DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
- NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
- ALWAYS ask the user explicitly for all required information.
- For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
- For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
- For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
- When a user specifies a database source, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
- Do not proceed without obtaining the table name for database sources.
- Tool calls without table_name will FAIL for database sources.
- For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
- Never suggest default values, sample data, or example parameters - request specific information from the user.
- If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
- If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
IMPORTANT: Parameter Validation and Formatting
- statistics must be provided as a proper list: CORRECT: statistics=["mean", "sum", "min", "max"] INCORRECT: statistics="["mean", "sum", "min", "max"]"
- columns must be provided as a proper list: CORRECT: columns=["revenue", "quantity"] INCORRECT: columns="["revenue", "quantity"]"
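As a minimal sketch of the rule above, here is the difference between a proper list and a string-encoded list, plus a simple pre-call check an agent could run (the lists_are_proper helper is hypothetical, not part of the tool):

```python
# Correct: pass real Python lists as tool arguments.
good_args = {
    "columns": ["revenue", "quantity"],
    "statistics": ["mean", "sum", "min", "max"],
}

# Incorrect: a string-encoded list will fail parameter validation.
bad_args = {
    "columns": '["revenue", "quantity"]',           # string, not a list
    "statistics": '["mean", "sum", "min", "max"]',  # string, not a list
}

# Hypothetical helper an agent could apply before calling the tool.
def lists_are_proper(args: dict) -> bool:
    return all(isinstance(args[key], list) for key in ("columns", "statistics"))

assert lists_are_proper(good_args)
assert not lists_are_proper(bad_args)
```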
CRITICAL: Column Name Formatting & Case-Insensitivity
- Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
- Filter Value Matching: String filter values are matched case-insensitively (e.g., the filter {"status": "completed"} will match "Completed" in the data).
- Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
Error Response Handling
- If you receive an "Invalid request" or similar error, check:
- Column name spelling and existence in the data source.
- Parameter format (proper lists vs string-encoded lists).
- Correct data_source provided (filename, connection string, or API URL).
- table_name provided if source_type is "database".
- API URL is correct and returns valid JSON if source_type is "api".
When to use this tool:
- When a user needs statistical analysis of their data (means, medians, correlations, distributions, etc.).
- When analyzing patterns or summarizing datasets from files, databases, or APIs.
Required inputs:
- columns: List of column names to analyze (ask user for exact column names in their data).
- statistics: List of statistics to calculate.
Optional inputs:
- data_source: Identifier for the data source.
- For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
- For Database: Full connection string (ask user for exact string).
- For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
- If not provided, will use the connection string from MCP config if available (defaults to database type).
- source_type: Type of data source ('csv', 'database', or 'api').
- Determines how data_source is interpreted.
- If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
- table_name: Name of the database table to use (REQUIRED for database sources).
- Must be provided when source_type is 'database'.
- Ask user for the exact table name in their database.
- Always explicitly ask for table name when data source is a database.
- filters: Dictionary of column-value pairs to filter data before analysis.
- Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
- API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
- groupby: List of column names to group data by before calculating statistics.
- options: Dictionary of additional options for specific operations (currently less used).
- date_column: Column name containing date/timestamp information for filtering. Matched case-insensitively.
- start_date: Inclusive start date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
- end_date: Inclusive end date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
- API Source Behavior: For 'api' sources, date filtering happens after data is fetched.
Valid statistics options:
- 'mean', 'median', 'std', 'sum', 'count', 'min', 'max', 'describe', 'correlation', 'missing', 'unique', 'boxplot'
Returns:
A JSON string containing the results and metadata.
- result: Dictionary with statistical measures for each requested column and statistic. Structure varies by statistic (e.g., describe, correlation).
- metadata: Includes execution_time, query_type ('statistics'), and source_type.
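As a rough sketch, here is what a calculate_statistics call against a database source might look like, followed by parsing of the returned JSON string. The connection string, table, and column names are hypothetical placeholders that must come from the user, and the response literal only illustrates the documented shape:

```python
import json

# Hypothetical user-supplied values -- an agent must never invent these.
arguments = {
    "columns": ["revenue", "quantity"],                # exact column names from the user
    "statistics": ["mean", "median", "std", "count"],
    "data_source": "postgresql://user:pass@host/db",   # user-provided connection string
    "source_type": "database",
    "table_name": "sales",                             # required for database sources
    "filters": {"status": "completed"},                # matched case-insensitively
    "date_column": "order_date",
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
}

# The tool returns a JSON string; parse it to reach result and metadata.
response = '{"result": {"revenue": {"mean": 125.4}}, "metadata": {"query_type": "statistics"}}'  # example shape
parsed = json.loads(response)
print(parsed["result"], parsed["metadata"])
```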
predict_trends
Generate ML time-series forecasts for future periods based on historical data.
What this tool does:
This tool connects to our analytics API to generate time-series forecasts (predictions) for a specified number of future periods based on historical data in a specified column. It analyzes trends and provides metrics on the prediction quality.
Note: Currently, the API typically uses the first column provided in the columns list for ML prediction.
It supports multiple data sources:
- CSV files (previously uploaded to StatSource)
- Databases (PostgreSQL, SQLite, etc.)
- External APIs (returning JSON data)
IMPORTANT INSTRUCTIONS FOR AI AGENTS:
- When users ask about "trends" or "forecasts", use this tool.
- DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
- NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
- ALWAYS ask the user explicitly for all required information.
- For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
- For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
- For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
- When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
- Tool calls without table_name will FAIL for database sources.
- The table_name question should be asked together with other required information (column names, periods).
- For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
- Never suggest default values, sample data, or example parameters - request specific information from the user.
- If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
- If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
IMPORTANT: Parameter Validation and Formatting
- columns must be provided as a proper list, typically containing the single numeric column to predict: CORRECT: columns=["sales_amount"] INCORRECT: columns="["sales_amount"]"
- periods must be an integer between 1 and 12. The API has a MAXIMUM LIMIT OF 12 PERIODS for predictions. Any request with periods > 12 will fail. Always inform users of this limitation if they request more periods.
CRITICAL: Column Name Formatting & Case-Insensitivity
- Column Matching: The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
- Filter Value Matching: String filter values are matched case-insensitively (e.g., the filter {"status": "completed"} will match "Completed" in the data).
- Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
- Date Column: If using time-based prediction, ensure date_column correctly identifies the date/timestamp column. Matched case-insensitively.
Error Response Handling
- If you receive an "Invalid request" or similar error, check:
- Column name spelling and existence in the data source (should be numeric for prediction).
- Parameter format (proper lists vs string-encoded lists).
- Correct data_source provided (filename, connection string, or API URL).
- table_name provided if source_type is "database".
- API URL is correct and returns valid JSON if source_type is "api".
- periods parameter is provided and is a positive integer not exceeding 12.
- date_column is specified if required for the underlying model.
When to use this tool:
- When a user wants to predict future values based on historical trends (forecasting).
- When generating forecasts for business planning or decision-making.
- When analyzing the likely future direction of a time-series metric.
Required inputs:
- columns: List containing the name of the (usually single) numeric column to predict trends for.
- periods: Number of future periods to predict (maximum: 12).
Optional inputs:
- data_source: Identifier for the data source.
- For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
- For Database: Full connection string (ask user for exact string).
- For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
- If not provided, will use the connection string from MCP config if available (defaults to database type).
- source_type: Type of data source ('csv', 'database', or 'api').
- Determines how data_source is interpreted.
- If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided data_source.
- table_name: Name of the database table to use (REQUIRED for database sources).
- Must be provided when source_type is 'database'.
- Ask user for the exact table name in their database.
- ALWAYS ask for table name when using database sources.
- filters: Dictionary of column-value pairs to filter data before analysis.
- Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
- API Source Behavior: For 'api' sources, data is fetched first, then filters are applied to the resulting data.
- options: Dictionary of additional options for specific operations (currently less used, might include model tuning params in future).
- date_column: Column name containing date/timestamp information.
- Used for date filtering and essential for time-based trend analysis/predictions. Matched case-insensitively.
- start_date: Inclusive start date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
- end_date: Inclusive end date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
- API Source Behavior: For 'api' sources, date filtering happens after data is fetched.
- aggregation (str, Optional, default: "auto"): Specifies how time-series data should be aggregated before forecasting. Ask the user for their preference if unsure, or default to 'auto'/'monthly'.
- 'auto': Automatically selects 'weekly' or 'monthly' based on data density and timeframe. Defaults to 'monthly' if unsure. A safe default choice.
- 'weekly': Aggregates data by week. Use for forecasting short-term trends (e.g., predicting next few weeks/months) or when weekly patterns are important.
- 'monthly': Aggregates data by month. Recommended for most business forecasting (e.g., predicting quarterly or annual trends) as it smooths out daily/weekly noise.
- 'daily': Uses daily data. Choose only if the user needs very granular forecasts and understands the potential for noise. Requires sufficient daily data points.
ML Prediction features returned:
- Time series forecasting with customizable prediction periods (up to 12 periods maximum).
- Trend direction analysis ("increasing", "decreasing", "stable").
- Model quality metrics (r-squared, slope).
- Works with numeric data columns from any supported data source.
- Can use a specific date_column for time-based regression.
Returns:
A JSON string containing the prediction results and metadata.
- result: Dictionary containing prediction details per analyzed column (typically the first one specified), e.g. {"r_squared": ..., "slope": ..., "trend_direction": ..., "forecast_values": [...], ...}.
- metadata: Includes execution_time, query_type ('ml_prediction'), source_type, and periods.
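As a rough sketch, a predict_trends call against a previously uploaded CSV might look like the following; the filename, column names, and the response literal are hypothetical placeholders illustrating the documented shape:

```python
import json

# Hypothetical user-supplied values -- an agent must never invent these.
arguments = {
    "columns": ["sales_amount"],          # single numeric column to forecast
    "periods": 6,                         # integer between 1 and 12
    "data_source": "monthly_sales.csv",   # filename previously uploaded to statsource.me
    "source_type": "csv",
    "date_column": "order_date",          # drives the time-based regression
    "aggregation": "monthly",             # 'auto', 'daily', 'weekly', or 'monthly'
}

# The tool returns a JSON string; forecasts and quality metrics sit under "result".
response = '{"result": {"sales_amount": {"trend_direction": "increasing", "r_squared": 0.82, "forecast_values": [110, 115, 118, 121, 124, 127]}}, "metadata": {"query_type": "ml_prediction", "periods": 6}}'  # example shape
prediction = json.loads(response)["result"]["sales_amount"]
print(prediction["trend_direction"], prediction["forecast_values"])
```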
anomaly_detection
Detect anomalies in time-series data from various sources.
What this tool does:
This tool connects to our analytics API to identify unusual data points (anomalies) in specified columns based on their time-series behavior. It requires a date/time column to understand the sequence of data.
It supports multiple data sources:
- CSV files (previously uploaded to StatSource)
- Databases (PostgreSQL, SQLite, etc.)
- External APIs (returning JSON data)
IMPORTANT INSTRUCTIONS FOR AI AGENTS:
- When users ask about "outliers", "unusual values", or "anomalies" in time-based data, use this tool.
- DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
- NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
- ALWAYS ask the user explicitly for all required information.
- For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
- For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
- For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
- When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
- Tool calls without table_name will FAIL for database sources.
- ALWAYS include this question when gathering information from the user.
- For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
- Never suggest default values, sample data, or example parameters - request specific information from the user.
- If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
- If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
IMPORTANT: Parameter Validation and Formatting
- columns must be provided as a proper list: CORRECT: columns=["sensor_reading", "error_count"] INCORRECT: columns="["sensor_reading", "error_count"]"
- date_column must be a string identifying the time column.
- anomaly_options is a dictionary for detection parameters (see below).
CRITICAL: Column Name Formatting & Case-Insensitivity
- Column Matching: The API matches column names case-insensitively. Ask the user for the intended column names.
- Filter Value Matching: String filter values are matched case-insensitively.
- Table Name Matching (Databases): The API attempts case-insensitive matching for database table names.
- Date Column: The date_column is crucial and is matched case-insensitively.
Error Response Handling
- If you receive an "Invalid request" or similar error, check:
- Column name spelling and existence (should be numeric for anomaly detection).
- date_column spelling and existence.
- Parameter format (proper lists vs string-encoded lists).
- Correct data_source provided (filename, connection string, or API URL).
- table_name provided if source_type is "database".
- API URL is correct and returns valid JSON if source_type is "api".
- date_column parameter is provided.
When to use this tool:
- When a user wants to identify outliers or unusual patterns in time-series data.
- When monitoring metrics for unexpected spikes or drops.
- When cleaning data by identifying potentially erroneous readings.
Required inputs:
- columns: List of numeric column names to check for anomalies.
- date_column: Name of the column containing date/timestamp information.
Optional inputs:
- data_source: Identifier for the data source.
- For CSV: Filename of a previously uploaded file on statsource.me.
- For Database: Full connection string.
- For API: The exact URL of the API endpoint returning JSON data.
- If not provided, uses the default connection from MCP config if available.
- source_type: Type of data source ('csv', 'database', or 'api').
- Determines how data_source is interpreted.
- Defaults based on MCP config if available.
- table_name: Name of the database table (REQUIRED for database sources).
- Must be provided when source_type is 'database'.
- Always ask for table name when using database sources.
- filters: Dictionary of column-value pairs to filter data before analysis.
- options: Dictionary of additional options (less common for anomaly detection currently).
- start_date: Inclusive start date for filtering historical data (ISO 8601 string or datetime).
- end_date: Inclusive end date for filtering historical data (ISO 8601 string or datetime).
- anomaly_options: Dictionary to configure the detection method and parameters.
- method (str, Optional, default: "iqr"): The anomaly detection method to use. Must be one of:
- 'iqr': Interquartile Range - identifies outliers based on distribution quartiles.
- 'zscore': Z-score - identifies outliers based on standard deviations from the mean.
- 'isolation_forest': Machine learning approach that isolates anomalies using an isolation forest (an ensemble of randomly built trees).
- sensitivity (float, Optional, default: 1.5): For the 'iqr' method, the multiplier applied to the IQR to define outlier bounds. Higher values are less sensitive (1.5 is standard; 3.0 would detect only extreme outliers).
- threshold (float, Optional, default: 3.0): For the 'zscore' method, the Z-score threshold that defines outliers. Higher values are less sensitive (3.0 is standard; 2.0 would detect more outliers).
- window_size (int, Optional, default: 20): Size of the rolling window used by the detection methods. If not provided, global statistics are used. Smaller windows (e.g., 7-14) detect local anomalies; larger windows detect global anomalies.
- contamination (float, Optional, default: 0.05): For the 'isolation_forest' method, the expected proportion of anomalies. Values typically range from 0.01 (1%) to 0.1 (10%).
Returns:
A JSON string containing the anomaly detection results and metadata.
- result: Dictionary with the following structure for each analyzed column:
{column_name: {"timestamps": [...], "values": [...], "is_anomaly": [...], "anomaly_score": [...], "summary": {"total_points": int, "anomaly_count": int, "percentage": float, "method": str}}}
where timestamps holds the datetime values, values the numeric values, is_anomaly boolean flags marking anomalies, anomaly_score the degree of deviation for each point, and summary the point counts, anomaly percentage, and the detection method used.
- metadata: Includes execution_time, query_type ('anomaly_detection'), source_type, and anomaly_method.
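As a rough sketch, an anomaly_detection call and the handling of its result might look like the following; the connection string, table, column names, and response literal are hypothetical placeholders illustrating the documented shape:

```python
import json

# Hypothetical user-supplied values -- an agent must never invent these.
arguments = {
    "columns": ["sensor_reading"],                     # numeric column(s) to scan for anomalies
    "date_column": "recorded_at",                      # required time column
    "data_source": "postgresql://user:pass@host/db",   # user-provided connection string
    "source_type": "database",
    "table_name": "sensor_logs",                       # required for database sources
    "anomaly_options": {
        "method": "zscore",     # 'iqr', 'zscore', or 'isolation_forest'
        "threshold": 3.0,       # higher = less sensitive (zscore only)
        "window_size": 14,      # smaller windows catch local anomalies
    },
}

# The tool returns a JSON string; pair timestamps with the is_anomaly flags.
response = '{"result": {"sensor_reading": {"timestamps": ["2024-05-01"], "values": [99.7], "is_anomaly": [true], "anomaly_score": [4.2], "summary": {"anomaly_count": 1}}}, "metadata": {"query_type": "anomaly_detection"}}'  # example shape
series = json.loads(response)["result"]["sensor_reading"]
anomalies = [ts for ts, flag in zip(series["timestamps"], series["is_anomaly"]) if flag]
print(anomalies)
```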
Prompts
Interactive templates invoked by user choice
No prompts
Resources
Contextual data attached and managed by the client