mudakara/excel-tags-parser-mongodb
If you are the rightful owner of excel-tags-parser-mongodb and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The Model Context Protocol (MCP) server facilitates the integration of AI/LLMs with databases, enabling natural language queries and data visualization.
Excel Tags Parser & Analytics Platform
A comprehensive solution for parsing Excel files with tagged resources, extracting all fields dynamically, storing data in MongoDB, and performing advanced analytics with powerful visualization capabilities.
🎯 Overview
This application provides a complete end-to-end workflow for:
- 📤 Uploading Excel files with tagged data
- 🔍 Parsing ALL fields from tags dynamically (no predefined schema)
- 💾 Storing data in MongoDB with optimized structure
- 📊 Analyzing data with custom queries and visualizations
- 🤖 Querying via MCP tools for AI-powered insights
✨ Key Features
Dynamic Tag Parsing
- Automatically extracts ALL fields from tags
- No need to predefine field names
- Supports unlimited custom fields
- Works with multiple tag formats (JSON, key-value, pipe-separated)
Multi-Page Web Interface
- 🤖 AI Assistant: Ask questions about Azure cost and Infrastructure related data
- 📤 Excel Upload: Upload and process large Excel files (100K+ rows)
- 🔎 Query Builder: Build custom queries with dynamic filters and field selection
- 💰 Cost Analysis: Analyze costs by Application, Environment, Owner, and Date range
- 📊 Drill Down Analysis: Hierarchical cost analysis (Application → Environment → Owner)
- 📈 Monthly Comparison: Compare monthly costs across applications
- ❓ Help: Interactive documentation viewer with search and navigation
MongoDB Integration
- Optimized document structure for analytics
- All dynamic fields stored at top level for easy querying
- Automatic indexing for performance
- Real-time statistics
MCP Server
- 10+ tools for advanced data analysis
- Query by any field combination
- Cost analysis by any dimension
- Cross-tabulation and aggregations
- AI-friendly API for Claude and other LLMs
🚀 Quick Start
Prerequisites
- Python 3.8+
- MongoDB 4.0+
- Excel files with tagged data
Installation
-
Clone the repository
git clone https://github.com/mudakara/excel-tags-parser-mongodb.git cd excel-tags-parser-mongodb -
Install dependencies
pip3 install -r requirements.txt -
Start MongoDB
# macOS (Homebrew) brew services start mongodb-community # Or run directly mongod -
Run the application
streamlit run src/ui/streamlit_app.py -
Open in browser
- The app will automatically open at
http://localhost:8501
- The app will automatically open at
📁 Project Structure
excel-tags-parser-mongodb/
├── README.md # This file - project overview
├── Documents/ # 📚 All documentation (26 files)
│ ├── INDEX.md # Documentation index with links
│ ├── 🚀 Getting Started/
│ ├── 🏗️ Core Features/
│ ├── 🔌 MCP Integration/
│ ├── 🎨 UI Components/
│ ├── ⚡ Performance Optimizations/
│ └── 🔧 Troubleshooting/
├── src/
│ ├── database/
│ │ ├── mongodb_client.py # MongoDB connection
│ │ └── mongodb_operations.py # CRUD operations with dynamic fields
│ ├── parser/
│ │ ├── excel_reader.py # Excel reading with chunking
│ │ ├── excel_writer.py # Excel writing with progress
│ │ └── tag_parser.py # Dynamic tag parsing engine
│ ├── ui/
│ │ ├── streamlit_app.py # Main app entry point
│ │ └── pages/
│ │ ├── 0_🏠_Home.py # AI Assistant page
│ │ ├── 1_📤_Excel_Upload.py # Upload page
│ │ ├── 2_🔎_Query_Builder.py # Query Builder page
│ │ ├── 3_💰_Cost_Analysis.py # Cost Analysis page
│ │ ├── 4_📊_Drill_Down_Analysis.py # Drill-down page
│ │ ├── 5_📈_Monthly_Comparison.py # Comparison page
│ │ └── 6_❓_Help.py # Help & Documentation page
│ └── utils/
│ └── validators.py # File and data validation
├── mcp_server/
│ ├── server.py # MCP server with 10+ tools
│ └── test_mcp.py # MCP server tests
├── config.py # Configuration settings
└── requirements.txt # Python dependencies
📖 Documentation
📚 - All documentation organized and searchable
Quick Links
Getting Started (Start Here!)
- - MongoDB installation and setup
- - Complete project overview
- - Implementation summary
User Guides
- - 🆕 AI assistant setup (5 min quick start)
- - 🆕 AI query guide with examples
- - Multi-page app guide
- - How dynamic tag parsing works
- - Quick start for MCP tools
Technical Documentation
- - MongoDB schema and dynamic fields
- - Complete MCP tools reference
- - Repository setup guide
Performance Optimizations ⚡
- - 10-100x faster (Nov 17, 2025)
- - 20-30x faster (Nov 18, 2025)
- - Query optimization
Troubleshooting
- - General troubleshooting guide
- - Find specific fix documents
Testing
- test_dynamic_parsing.py - Tag parsing validation
- test_dynamic_mongodb.py - MongoDB field insertion tests
- test_mcp.py - MCP server tests
💡 Tip: See for the complete organized documentation with 26 files categorized by topic.
🎨 Application Pages
🤖 AI Assistant
- Natural Language Queries: Ask questions about Azure cost and Infrastructure related data
- Multiple LLM Support: OpenRouter (20+ models), Claude, or custom LLMs
- Automatic Tool Use: AI intelligently uses MongoDB MCP tools
- Interactive Chat: ChatGPT-style interface with message history
- Transparent Operations: See which tools the AI uses
- Real-time Analysis: Get insights, aggregations, and cost breakdowns
- Persistent Settings: LLM configuration saved to MongoDB, survives page refreshes
Example Questions:
- "What's the total cost by department?"
- "Show me all IT resources in production"
- "Which cost center has the highest spend?"
- "Find resources without proper tags"
See and for details.
📤 Excel Upload Page
- File upload with validation
- Progress tracking during processing
- Extract ALL tag fields dynamically
- Download processed Excel file
- Push data to MongoDB with progress bar
🔎 Query Builder Page
- Build custom queries with any field combination
- Add multiple filters dynamically
- Dynamic field explorer in sidebar
- Performance optimization tools (indexing)
- Database statistics on demand
- Export results to CSV
- Cache management
💰 Cost Analysis Page
- Analyze costs by Application, Environment, Owner
- Single Month or Month Range selection
- Multi-select filters with $in operator support
- Total, average, min, max cost breakdown
- Monthly cost trend visualization
- Bar and pie chart visualizations
- Execution time tracking
- MongoDB query details display
📊 Drill Down Analysis Page
- Hierarchical Navigation: Application → Environment → Owner
- Interactive Charts: Click-based drill-down with Plotly
- Time Period Filter: Last 3/6/9/12 months
- Top N Filter: View All or Top 5/10 applications
- Lazy Loading: On-demand data loading (20-30x faster)
- Caching: 5-minute intelligent caching
- Download: Export owner cost data to CSV
📈 Monthly Comparison Page
- Multi-Application Analysis: Compare 1-5 applications
- Custom Date Range: Select any month range
- Form-Based Input: Optimized for no-lag configuration
- Line Chart: Monthly cost trends visualization
- Pivot Table: Monthly breakdown by application
- Summary Metrics: Total, average, and month count
- Download: Export comparison data to CSV
- Ultra-Fast: 10-100x faster with distinct() query optimization
❓ Help & Documentation Page
- Interactive Viewer: Read all 28 documentation files in-app
- Categorized Sidebar: Quick access to docs by category
- Search Functionality: Find specific topics and keywords
- Navigation: Back/Home buttons with history tracking
- Download: Export any document as .md file
- No External Links: Everything accessible within the app
See for details.
🤖 MCP Server Tools
The MCP server provides 10+ tools for advanced data analysis:
| Tool | Description |
|---|---|
get_available_fields | List all queryable fields |
advanced_query | Query by any field combination |
aggregate_by_any_field | Group and aggregate by any field |
cost_analysis_by_field | Cost breakdown by dimension |
multi_dimensional_analysis | Cross-tabulate two fields |
query_resources | Basic resource queries |
get_statistics | Database overview stats |
get_total_cost | Total cost with filters |
create_bar_chart | Generate bar charts |
create_pie_chart | Generate pie charts |
Start MCP Server:
cd mcp_server
python3 mongodb_mcp_server.py
See for usage examples.
💡 Usage Examples
Upload and Process Excel File
- Navigate to 📤 Excel Upload page
- Upload your Excel file
- The parser extracts ALL fields from tags automatically
- Download the processed file or push to MongoDB
Query Data
- Navigate to 🔎 Query Builder page
- Add filters (e.g.,
department = "IT",environment = "production") - Run query and export results
Cost Analysis
- Navigate to 💰 Cost Analysis page
- Select Application, Environment, Owner filters (multi-select supported)
- Choose Single Month or Month Range
- Click "Calculate Total Cost"
- View detailed breakdown with charts and metrics
Drill Down Analysis
- Navigate to 📊 Drill Down Analysis page
- Select time period (Last 3/6/9/12 months or All)
- Choose Top N applications or view all
- Click on application to drill into environments
- Click on environment to see owner breakdown
🔧 Configuration
Edit config.py to customize:
# MongoDB Settings
MONGODB_URI = "mongodb://localhost:27017/"
MONGODB_DATABASE = "azure"
MONGODB_COLLECTION = "resources"
# File Processing
CHUNK_SIZE = 10000
MAX_FILE_SIZE_MB = 100
ALLOWED_EXTENSIONS = ['.xlsx', '.xls']
# Tag Column
TAG_COLUMN = "Tags"
📊 Supported Tag Formats
1. Escaped JSON (Recommended)
"primarycontact":"john doe","usage":"databricks prod","department":"IT"
2. Key-Value Pairs
applicationname:myapp,environment:prod,owner:john,usage:databricks
3. JSON Format
{"owner": "john", "environment": "production", "department": "IT"}
4. Pipe-Separated (Limited)
myapp|production|john|1234.56
🗃️ MongoDB Document Structure
{
// Standard fields
"applicationName": "myapp",
"environment": "production",
"owner": "john",
"cost": 1234.56,
"date": "2025-11",
// ALL dynamic fields extracted from tags
"primaryContact": "jane doe",
"usage": "databricks prod",
"department": "IT",
"costCenter": "CC123",
"team": "analytics",
// ... unlimited custom fields
// Tags metadata
"tags": {
"raw": "original tag string",
"parsed": { /* all extracted fields */ }
},
// Original Excel data
"originalData": { /* complete row data */ },
// Import metadata
"metadata": {
"importDate": "2025-11-15T...",
"sourceFile": "filename.xlsx",
"dataDate": "2025-11"
}
}
🎯 Use Cases
IT Asset Management
- Track all infrastructure resources
- Analyze costs by department, team, or owner
- Identify unused resources
Cloud Cost Optimization
- Analyze cloud spending by dimension
- Identify cost drivers
- Track usage patterns
Resource Governance
- Ensure proper tagging compliance
- Identify untagged or mis-tagged resources
- Generate compliance reports
Data Analysis
- Slice and dice by any dimension
- Create custom reports
- Export data for further analysis
🐛 Troubleshooting
MongoDB Connection Error
# Make sure MongoDB is running
brew services start mongodb-community
# Or
mongod
Import Errors
# Reinstall dependencies
pip3 install -r requirements.txt
Large File Processing
- Increase
CHUNK_SIZEin config.py - Ensure sufficient RAM
- Process files in batches
Tag Parsing Issues
- Check tag format matches supported formats
- Enable debug logging in config.py
- Run test_dynamic_parsing.py to validate
🚀 Performance
- Large File Support: Handles 100K+ rows efficiently
- Chunked Processing: Memory-efficient streaming
- MongoDB Indexing: Optimized query performance
- Batch Insertion: Fast data loading
- Progress Tracking: Real-time updates
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
📜 License
This project is open source and available under the MIT License.
🔗 Links
- Repository: https://github.com/mudakara/excel-tags-parser-mongodb
- Issues: https://github.com/mudakara/excel-tags-parser-mongodb/issues
- Streamlit Docs: https://docs.streamlit.io
- MongoDB Docs: https://docs.mongodb.com
🎉 Success Stories
- ✅ Processed 200K+ rows in under 2 minutes
- ✅ Extracted 50+ unique dynamic fields automatically
- ✅ Reduced manual tagging analysis from hours to seconds
- ✅ Enabled AI-powered querying via MCP tools
📞 Support
For help:
- Check documentation in this README
- Review troubleshooting section
- Open an issue on GitHub
Built with ❤️ using Streamlit, MongoDB, and Python