kevinbtalbert/senzing-mcp-server
If you are the rightful owner of senzing-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
The Senzing Model Context Protocol (MCP) Server is designed to facilitate the integration and management of entity resolution processes using the Senzing engine.
Senzing MCP Server for Cloudera Agent Studio
A Model Context Protocol (MCP) server that provides Cloudera Agent Studio with access to Senzing Entity Resolution capabilities using the Senzing v4 SDK.
Overview
This MCP server enables Cloudera Agent Studio to perform entity resolution operations:
- Entity Resolution: Add records and automatically resolve them against existing entities
- Entity Search: Find entities by attributes (name, address, phone, etc.)
- Relationship Discovery: Find paths and networks between entities
- Explainability: Understand why entities are resolved together
- Statistics & Monitoring: View repository statistics and performance metrics
Designed for use with CAI-Senzing-Custom-Runtime in Cloudera ML workspaces.
Architecture
The MCP server uses an embedded architecture - the Senzing SDK runs directly inside the MCP server's Python process as a native library (no separate server required).
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Cloudera Agent Studio Workflow ā
ā (Running in Cloudera ML) ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā MCP Protocol (stdio/JSON-RPC)
ā via mcpadapt
ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā senzing-mcp-server (Python Process) ā
ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā MCP Server Layer ā ā
ā ā - Tool handlers (add_record, search_entities, etc)ā ā
ā ā - Request/response marshaling ā ā
ā āāāāāāāāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā Senzing SDK v4 (Embedded Library) ā ā
ā ā - SzAbstractFactoryCore (Python bindings) ā ā
ā ā - SzEngine (entity resolution) ā ā
ā ā - Native C++ libraries (.so files) ā ā
ā āāāāāāāāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā Direct file I/O ā
āāāāāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā¼
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā SQLite Database ā
ā ~/senzing/var/sqlite/G2C.db ā
ā (Persistent Storage) ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Key Points:
- š§ Embedded SDK: No separate Senzing server process
- ā” In-Process: All operations are direct function calls
- š Direct Access: Database accessed via file I/O
- š Fast: No network overhead, minimal latency
Prerequisites
Before using this MCP server in Agent Studio, you must:
- Have CAI-Senzing-Custom-Runtime deployed in your Cloudera ML workspace
- Create a persistent Senzing project at
~/senzing
- Load data into your Senzing database
See the CAI-Senzing-Custom-Runtime setup guide for complete instructions.
Quick Setup
1. Verify Senzing is Ready
In your Cloudera ML terminal:
# Check that project exists
ls -la ~/senzing/var/sqlite/G2C.db
# Should show your database file (e.g., 1.1M)
# Test that Senzing SDK is accessible
cd ~/senzing && source setupEnv
python3 -c "from senzing_core import SzAbstractFactoryCore; print('ā Senzing SDK available')"
2. Configure Agent Studio
Add this MCP server configuration to your Agent Studio workflow:
{
"mcpServers": {
"senzing": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/kevinbtalbert/senzing-mcp-server@main",
"senzing-mcp"
],
"env": {
"SENZING_PROJECT_DIR": "/home/cdsw/senzing",
"PYTHONPATH": "/opt/senzing/er/sdk/python",
"LD_LIBRARY_PATH": "/home/cdsw/senzing/lib:/opt/senzing/er/lib"
}
}
}
}
3. Test the Configuration
Before running in Agent Studio, test manually:
cd ~/senzing && source setupEnv
export SENZING_PROJECT_DIR=~/senzing
uvx --from git+https://github.com/kevinbtalbert/senzing-mcp-server@main senzing-mcp
The server should start and wait for input. Press Ctrl+C
to stop.
Agent Configuration Best Practices
š Complete Configurations Available: For production-ready agent configurations with full safety controls, approval workflows, and detailed examples, see .
Recommended Agent Setup
For optimal entity resolution workflows in Agent Studio, configure two specialized agents:
1. Senzing Query Coordinator (Manager Agent)
This agent analyzes user questions and delegates to the specialist.
Name: Senzing Query Coordinator
Role: Senzing Query Coordinator & Intent Analyzer
Backstory:
You are a senior entity resolution coordinator with 15+ years of experience across
financial services, law enforcement, and compliance operations. You've successfully
managed thousands of entity resolution investigations and have deep expertise in
translating user questions into precise Senzing operations.
Your specialty is understanding the EXACT technical meaning behind user questions:
- "customer 1070" means DATA_SOURCE="CUSTOMERS", RECORD_ID="1070" (NOT a person's name!)
- "entity 55" means a resolved entity with numeric ID 55
- "watchlist" refers to the WATCHLIST data source
- Users want to RETRIEVE existing data, not create new records
You understand the complete entity resolution workflow: retrieve records ā find
entities ā explore relationships ā explain matches. You know which Senzing tools
handle each task and you delegate with crystal-clear, step-by-step instructions
that leave no room for misinterpretation.
CRITICAL RULES YOU NEVER BREAK:
1. Record IDs are strings like "1070", "2013", "5001" - use get_record()
2. Entity IDs are numbers like 55, 91, 400001 - use get_entity()
3. NEVER tell the specialist to use add_record unless user explicitly says "create" or "add new"
4. For add_record or delete_record, ALWAYS instruct specialist to get user approval BEFORE executing
5. Always specify EXACT tool names and parameters in your delegations
6. Break complex queries into explicit numbered steps
Goal:
Parse user questions to identify what entity resolution operations are needed, then
delegate to the Entity Resolution Specialist with explicit, step-by-step instructions
that specify: 1) EXACT tools to use (by name: get_record, get_entity, search_entities,
find_network, find_path, why_entities, get_stats, add_record, delete_record),
2) EXACT parameters (data_source, record_id, entity_id, search_attributes),
3) EXPECTED results from each step, 4) VALIDATION checkpoints to catch errors,
5) BUSINESS context explaining why this matters.
Your delegations must be so clear that the specialist can execute them mechanically
without guessing. For add_record or delete_record operations, your delegation MUST
include an explicit instruction to get user approval before executing.
Tools: None (coordinator only)
MCP Servers: None (coordinator only)
2. Entity Resolution Specialist (Worker Agent)
This agent executes the actual Senzing operations.
Name: Entity Resolution Specialist
Role: Senior Entity Resolution Analyst
Backstory:
You are a world-class entity resolution expert with 20+ years of experience using the
Senzing platform across domains including fraud detection, customer master data
management (MDM), sanctions screening, anti-money laundering (AML), and law enforcement
intelligence.
You have executed millions of entity resolution queries and deeply understand the
Senzing architecture:
- Records are stored by data source (CUSTOMERS, REFERENCE, WATCHLIST) with string IDs ("1070", "2013")
- Records are resolved into entities with numeric IDs (55, 91, 400001)
- A single entity can contain multiple records from different sources
- Relationships exist between entities (disclosed, possible, ambiguous)
You are proficient with all Senzing tools and know EXACTLY when to use each:
get_record (for "customer 1070"), get_entity (for entity IDs), search_entities
(for finding by attributes), find_network (for relationships), find_path (for
connections), why_entities (for explanations), get_stats (for metrics), add_record
(ONLY when user explicitly creates new data, REQUIRES APPROVAL), delete_record
(ONLY when user explicitly deletes, REQUIRES APPROVAL).
CRITICAL RULES:
1. Record IDs are STRINGS ("1070") - Entity IDs are INTEGERS (55)
2. ALWAYS start with get_record for "customer X" queries
3. NEVER use add_record unless explicitly creating NEW data
4. Extract REAL attributes from records for searches
5. Validate each step - if results look wrong, STOP
6. Provide business interpretation, not just raw JSON dumps
Goal:
Execute entity resolution operations with precision. FOR EVERY TASK: 1) Parse
instructions carefully and identify exact tools/parameters, 2) Execute step-by-step,
extracting data from each result for next steps, 3) Validate results look real
(not placeholder data), 4) Provide business context explaining what the data means,
5) Format final answer with direct response, supporting details, and significance.
ā ļø MANDATORY APPROVAL FOR DESTRUCTIVE OPERATIONS:
For add_record: Check if record exists first. Present for approval: "I need to add:
Data Source: [X], Record ID: [Y], Data: [show all fields]. ā ļø This will create a
new record and may trigger entity resolution. Do you want to proceed? (yes/no/modify)".
Only execute after explicit "yes". NEVER execute add_record without approval!
For delete_record: Get the record first to show what will be deleted. Present for
approval: "I need to delete: Data Source: [X], Record ID: [Y], Current Data: [show
record]. ā ļø This permanently removes the record. Do you want to proceed? (yes/no)".
Only execute after explicit "yes". NEVER execute delete_record without approval!
ERROR RECOVERY: Explain what failed and why. Provide information you DO have.
Suggest how to get missing information. DON'T make up data. DON'T use add_record
as a workaround. DON'T execute destructive operations without approval.
Tools: None (uses MCP only)
MCP Servers: senzing
(configured as shown above)
Query Patterns
The coordinator should recognize these patterns and delegate appropriately:
User Query Pattern | Senzing Tool to Use | Purpose |
---|---|---|
"Find", "search", "lookup" | search_entities | Discover entities by attributes |
"Show me", "get record" | get_record | Retrieve specific record |
"Get entity", "show entity" | get_entity | Get complete resolved entity |
"Related to", "connected" | find_network | Map relationships |
"Path between", "connection" | find_path | Find relationship paths |
"Why", "explain match" | why_entities | Explain resolution decisions |
"Statistics", "metrics" | get_stats | Get repository statistics |
"Add record", "create" | add_record | Insert new records |
Example Workflow
User Question: "Who is customer 1070 and are they connected to anyone on the watchlist?"
Manager Agent Delegates:
Task: Investigate customer 1070 for potential watchlist connections
Context: User needs identity verification and risk assessment
Steps needed:
1. Retrieve customer 1070 record details
2. Get the resolved entity (may include multiple records)
3. Search for networks/relationships
4. Check for any watchlist connections
5. Assess risk level based on findings
Specialist Agent Executes:
1. get_record("CUSTOMERS", "1070")
ā Found: Jie Wang, Hong Kong, DOB 9/14/93
2. search_entities({"NAME_FULL": "Jie Wang", "DATE_OF_BIRTH": "9/14/93"})
ā Found entity 55
3. get_entity(55)
ā Entity includes: CUSTOMERS 1069, 1070, REFERENCE 2013
4. find_network([55], max_degrees=2)
ā Connected to entity 91 (business relationship)
5. search_entities in WATCHLIST
ā No matches found
Final Answer: Customer 1070 is Jie Wang from Hong Kong. This record is part of
entity 55, which consolidates 3 records. Entity 55 has a business relationship
(60% ownership) with entity 91. No watchlist matches found. Risk: LOW.
Tips for Better Results
DO:
- ā Provide business context in responses (what does the data mean?)
- ā Explain match confidence and data quality
- ā Suggest relevant follow-up queries
- ā Interpret findings for non-technical users
- ā Highlight ambiguous matches or conflicts
- ā Get user approval before add_record or delete_record operations
DON'T:
- ā Return raw JSON without interpretation
- ā Make assumptions without verifying data
- ā Ignore relationship implications
- ā Skip data quality observations
- ā Execute destructive operations (add_record, delete_record) without explicit user approval
ā ļø Safety Controls for Destructive Operations
Critical: The add_record
and delete_record
tools can modify or corrupt data. Your specialist agent must:
-
For add_record:
- Check if record exists first (use
get_record
) - Show user ALL data to be added
- Explain entity resolution impact
- Wait for explicit "yes" approval
- Only then execute
- Check if record exists first (use
-
For delete_record:
- Retrieve and show the record being deleted
- Show entity impact (other records in that entity)
- Explain consequences
- Wait for explicit "yes" approval
- Only then execute
Never use add_record
to "fix" a failed get_record
- that means the record doesn't exist!
š Complete Agent Configurations: See for detailed, production-ready agent configurations with safety controls, validation checklists, and example workflows.
Available Tools
Once configured, your Agent Studio workflows can use these Senzing tools:
add_record
Add a record for entity resolution.
ā ļø Requires user approval - Agent must present data and get explicit "yes" before executing.
Example: "Add a customer record for Jane Smith, DOB 1985-03-15, living at 123 Main St"
get_entity
Get complete details about a resolved entity by entity ID.
Example: "Get entity 55"
search_entities
Search for entities by attributes.
Example: "Search for entities named Robert Smith born in 1978"
find_path
Find relationship paths between two entities.
Example: "Find the path between entity 55 and entity 91"
find_network
Discover networks of related entities.
Example: "Find the network around entity 55 within 2 degrees"
get_record
Retrieve a specific record by data source and record ID.
Example: "Get record 1070 from CUSTOMERS"
delete_record
Delete a record from Senzing.
ā ļø Requires user approval - Agent must show what will be deleted and get explicit "yes" before executing.
Example: "Delete record NEW_001 from CUSTOMERS"
why_entities
Explain why two entities are resolved together (or not).
Example: "Why are entity 55 and 91 related?"
get_stats
Get repository statistics.
Example: "Show me Senzing statistics"
Entity Attributes
When adding records, use Senzing's standardized attribute names:
Category | Attributes |
---|---|
Names | NAME_FULL , NAME_FIRST , NAME_LAST , NAME_MIDDLE |
Dates | DATE_OF_BIRTH |
Addresses | ADDR_FULL , ADDR_LINE1 , ADDR_CITY , ADDR_STATE , ADDR_POSTAL_CODE |
Contact | PHONE_NUMBER , EMAIL_ADDRESS |
IDs | SSN_NUMBER , NATIONAL_ID , PASSPORT_NUMBER , DRIVERS_LICENSE_NUMBER |
See the Senzing Entity Specification for complete details.
Troubleshooting
Timeout Error: "Couldn't connect to the MCP server after 60 seconds"
This usually means the Senzing SDK can't be imported. Follow these steps:
1. Verify Environment
cd ~/senzing && source setupEnv
echo $PYTHONPATH
# Should include: /opt/senzing/er/sdk/python
python3 -c "from senzing_core import SzAbstractFactoryCore; from senzing import SzError; print('ā OK')"
# Should print: ā OK
If the import fails, check:
- Senzing is installed:
ls /opt/senzing/er/sdk/python/senzing_core/
- setupEnv exists:
ls ~/senzing/setupEnv
- Database exists:
ls ~/senzing/var/sqlite/G2C.db
2. Test Manual Server Start
cd ~/senzing && source setupEnv
export SENZING_PROJECT_DIR=~/senzing
export PYTHONPATH=/opt/senzing/er/sdk/python
export LD_LIBRARY_PATH=/home/cdsw/senzing/lib:/opt/senzing/er/lib
python3 -m senzing_mcp_server.server
If this fails, fix the error before using with Agent Studio.
3. Check Database Permissions
ls -la ~/senzing/var/sqlite/G2C.db
# Should be readable: -rw-r--r--
# Fix if needed:
chmod 644 ~/senzing/var/sqlite/G2C.db
4. Verify Agent Studio Configuration
Double-check your MCP configuration includes all required environment variables:
SENZING_PROJECT_DIR
:/home/cdsw/senzing
PYTHONPATH
:/opt/senzing/er/sdk/python
LD_LIBRARY_PATH
:/home/cdsw/senzing/lib:/opt/senzing/er/lib
Common Errors
"Data source code [CUSTOMERS] does not exist"
- You need to configure data sources before loading data
- See: CAI-Senzing-Custom-Runtime setup guide
"Unknown resolved entity value 'X'"
- Entity ID doesn't exist in your database
- Use
sz_explorer
to find valid entity IDs - Load more data if database is empty
"engine object has been destroyed"
- The SDK factory went out of scope (shouldn't happen with current code)
- Report as a bug if you see this
Database Locked
If you see "database is locked":
# Stop any running Senzing processes
pkill -f senzing
# Remove lock files if present
rm -f ~/senzing/var/sqlite/*.lock
# Restart Agent Studio workflow
Testing Your Setup
Diagnostic Script
Download and run the diagnostic script:
cd ~/senzing && source setupEnv
wget https://raw.githubusercontent.com/kevinbtalbert/senzing-mcp-server/main/debug_mcp_startup.sh
chmod +x debug_mcp_startup.sh
./debug_mcp_startup.sh
This will check:
- ā Senzing project exists
- ā Database is accessible
- ā PYTHONPATH is configured
- ā Python can import Senzing modules
- ā MCP server can initialize
Test in Agent Studio
Create a simple workflow with a task that uses Senzing:
Task: "Search for entities named Robert Smith in Senzing"
If configured correctly, the agent will:
- Connect to the MCP server
- Call
search_entities
with appropriate parameters - Return matching entities
Performance Notes
SQLite Limitations
The setup uses SQLite for evaluation/development:
- ā Simple, no setup required
- ā File-based, easy persistence
- ā ļø Single-writer (concurrency limited)
- ā ļø Not suitable for production
For production workloads, migrate to PostgreSQL or MySQL for:
- Better concurrency
- Improved performance (sub-millisecond per record)
- Scalability to millions of records
Expected Performance
With SQLite:
- ~50ms per record insert
- Search queries: <100ms for small datasets
- Path finding: <500ms for most queries
With PostgreSQL:
- <1ms per record insert
- Search queries: <50ms
- Path finding: <200ms
Resources
- CAI-Senzing-Custom-Runtime - Docker runtime setup guide
- Senzing Documentation
- Senzing Python SDK v4 Reference
- Senzing Entity Specification
- Model Context Protocol
Support
If you encounter issues:
- Run the diagnostic script (see Testing Your Setup)
- Check Troubleshooting section
- Review CAI-Senzing-Custom-Runtime setup
- Open an issue on GitHub with diagnostic output
Version
Current version: 0.2.0 (Senzing SDK v4)
See for version history and migration guide.
License
Apache License 2.0
Author
Kevin Talbert (ktalbert@cloudera.com)