# AnalysisAlpaca
A production-ready MCP (Model Context Protocol) server that enables comprehensive research and analysis capabilities for Claude and other MCP-compatible AI assistants. This server integrates web and academic search functionality with an optional web interface for interactive research and AI-powered report generation.
## Quick Start

    # 1. Clone and navigate to the project
    git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
    cd Analysis-Alpaca-Researcher

    # 2. Install dependencies (a virtual environment is recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -e .

    # 3. Start the MCP server
    python http_server.py
    # Server runs on http://localhost:8001
    # API documentation: http://localhost:8001/docs
## Table of Contents
- Features
- Architecture
- Installation
- Configuration
- Usage
- Web Interface
- API Reference
- Development
- Testing
- Deployment
- Troubleshooting
- Contributing
## Features

### Core Research Capabilities
- Multi-Source Search: Combines DuckDuckGo web search and Semantic Scholar academic research
- Content Extraction: Intelligent extraction of relevant information from web pages
- Academic Integration: Direct access to scholarly articles and research papers
- Smart Formatting: Properly formatted research with citations and structured output
- Rate Limiting: Built-in retry logic and graceful handling of API limits
### Web Interface Features
- Interactive Research: User-friendly web interface for conducting research
- Job Management: Track multiple research jobs with progress monitoring
- AI-Powered Reports: Generate comprehensive PDF reports using OpenAI, Anthropic, or Groq
- PDF Export: Download research results as properly named PDF files
- Real-time Updates: Live progress tracking with WebSocket-like polling
### Production Features
- Comprehensive Error Handling: Graceful degradation when services are unavailable
- Extensive Logging: Detailed logging for debugging and monitoring
- Configurable Settings: Environment-based configuration management
- Auto-Dependency Installation: Automatic installation of missing dependencies
- Modular Architecture: Easy to extend and customize
## Architecture

### Components Overview

    analysis_alpaca/
    ├── src/analysis_alpaca/   # Core MCP server implementation
    │   ├── core/              # Server and research orchestration
    │   ├── search/            # Search engine implementations
    │   ├── models/            # Data models and schemas
    │   ├── utils/             # Utility functions and helpers
    │   └── exceptions/        # Custom exception handling
    ├── web_ui/                # Optional web interface
    │   ├── frontend/          # React.js frontend application
    │   └── backend/           # FastAPI backend for web UI
    ├── tests/                 # Test suite
    ├── http_server.py         # HTTP API wrapper for MCP server
    └── requirements.txt       # Unified dependencies
### Core Components

1. **MCP Server** (`src/analysis_alpaca/core/server.py`)
   - FastMCP-based server exposing research tools to Claude
   - Main tool: `deep_research()` for comprehensive research
   - Built-in prompt templates for structured research methodology

2. **Research Service** (`src/analysis_alpaca/core/research_service.py`)
   - Orchestrates the entire research workflow
   - Coordinates web and academic searches
   - Manages content extraction and result formatting
   - Handles parallel execution and error recovery

3. **Search Implementations**
   - `WebSearcher`: DuckDuckGo web search with result parsing
   - `AcademicSearcher`: Semantic Scholar API integration with retry logic
   - `ContentExtractor`: Web page content extraction and processing

4. **HTTP Server** (`http_server.py`)
   - REST API wrapper for MCP functionality
   - Enables direct HTTP access to research capabilities
   - CORS-enabled for web interface integration

5. **Web Interface**
   - Frontend: React.js application with PDF generation
   - Backend: FastAPI server for job management and AI report generation
## Installation

### Prerequisites
- Python 3.8+ (recommended: Python 3.11+)
- Node.js 16+ (only if using web interface)
- npm or yarn (only if using web interface)
Basic Installation
# Clone the repository
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the package
pip install -e .
# Or install with all optional dependencies
pip install -e ".[dev,ai]"
Web Interface Setup
# Install frontend dependencies
cd web_ui/frontend
npm install
# Return to project root
cd ../..
### Dependencies Overview

**Core dependencies:**

- `httpx>=0.25.0` - HTTP client for API requests
- `beautifulsoup4>=4.12.0` - HTML parsing for content extraction
- `mcp>=0.1.0` - Model Context Protocol server framework
- `fastapi>=0.104.0` - Web framework for the HTTP API
- `uvicorn>=0.24.0` - ASGI server for FastAPI

**Optional AI dependencies:**

    pip install -e ".[ai]"   # Installs OpenAI, Anthropic, and Groq clients

**Development dependencies:**

    pip install -e ".[dev]"  # Installs testing and linting tools
## Configuration

### Environment Variables

Create a `.env` file in the project root:

    # Search Configuration
    AA_MAX_RESULTS=5                 # Maximum results per search
    AA_DEFAULT_NUM_RESULTS=3         # Default number of results
    AA_WEB_TIMEOUT=15.0              # Web search timeout (seconds)
    AA_USER_AGENT="AnalysisAlpaca 1.0"

    # Content Configuration
    AA_MAX_CONTENT_SIZE=10000        # Maximum response size
    AA_MAX_EXTRACTION_SIZE=150000    # Maximum content to extract

    # Server Configuration
    AA_LOG_LEVEL=INFO                # Logging level (DEBUG, INFO, WARNING, ERROR)
    AA_LOG_FILE="logs/research.log"  # Optional log file path
    AA_AUTO_INSTALL_DEPS=true        # Auto-install missing dependencies

    # AI Provider API Keys (optional - for the web interface)
    OPENAI_API_KEY=your_openai_key_here
    ANTHROPIC_API_KEY=your_anthropic_key_here
    GROQ_API_KEY=your_groq_key_here

    # Web UI Configuration
    MCP_SERVER_URL=http://localhost:8001  # URL of the MCP HTTP server
### Configuration Files

The system uses a hierarchical configuration approach:

1. Default values in `config.py`
2. Environment variables (override the defaults)
3. An optional `.env` file (overrides the environment)
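The override order above can be sketched as a small lookup helper. This is purely illustrative; the project's actual logic lives in `config.py` and may differ, and it assumes that `.env` values have already been loaded into the process environment:

```python
import os


def get_setting(name: str, default, cast=str):
    """Return an environment override for `name`, falling back to `default`.

    Values loaded from a .env file are assumed to already be in os.environ,
    so they take precedence over the built-in default.
    """
    raw = os.environ.get(name)
    return cast(raw) if raw is not None else default


# Example: AA_MAX_RESULTS falls back to 5 when the variable is unset.
max_results = get_setting("AA_MAX_RESULTS", 5, int)
```

The `cast` parameter handles the fact that environment variables are always strings, so numeric settings like `AA_MAX_RESULTS` need an explicit `int` conversion.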
## Usage

### MCP Server (for Claude Desktop)

Add to your Claude Desktop configuration:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`

&nbsp;

    {
      "mcpServers": {
        "analysis-alpaca": {
          "command": "/path/to/python",
          "args": ["/path/to/analysis_alpaca/http_server.py"],
          "env": {
            "AA_MAX_RESULTS": "5",
            "AA_LOG_LEVEL": "INFO"
          }
        }
      }
    }
### Standalone HTTP Server

    # Start the HTTP API server
    python http_server.py
    # Server runs on http://localhost:8001
    # API documentation available at http://localhost:8001/docs
### Web Interface (Optional)

The web interface provides a user-friendly way to interact with AnalysisAlpaca through a browser.

**Requirements:**

- Node.js 16+ and npm for the frontend
- The MCP HTTP server must be running (see above)

**Setup:**

    # Install frontend dependencies
    cd web_ui/frontend
    npm install
    cd ../..

**Manual startup (two terminals required):**

Terminal 1 - backend API server:

    cd web_ui/backend
    python main.py
    # Backend runs on http://localhost:8000
    # API documentation: http://localhost:8000/docs

Terminal 2 - frontend development server:

    cd web_ui/frontend
    npm start
    # Frontend runs on http://localhost:3000
    # Access the web interface at http://localhost:3000
**Complete setup (three servers total):**

1. MCP server (terminal 1): `python http_server.py` → http://localhost:8001
2. Backend API (terminal 2): `cd web_ui/backend && python main.py` → http://localhost:8000
3. Frontend UI (terminal 3): `cd web_ui/frontend && npm start` → http://localhost:3000
### Research Tool Usage

The main `deep_research` tool accepts these parameters:

- `query` (required): The research question or topic
- `sources` (optional): `"web"`, `"academic"`, or `"both"` (default: `"both"`)
- `num_results` (optional): Number of sources to examine (default: 2)
### Example Prompts for Claude

> Research the latest developments in quantum computing using both web and academic sources.

> Can you do comprehensive research on climate change mitigation strategies? Focus on academic sources and examine 3 results.

> I need detailed information about the impact of artificial intelligence on healthcare. Use the deep_research tool with web sources only.
### Direct API Usage

    # Research via HTTP API
    curl -X POST "http://localhost:8001/deep_research" \
      -H "Content-Type: application/json" \
      -d '{
        "query": "artificial intelligence in healthcare",
        "sources": "both",
        "num_results": 3
      }'
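The same request can be issued from Python. A minimal sketch using `httpx` (a core dependency of this project); the endpoint and field names come from the `curl` example above, while the validation helper is purely illustrative:

```python
VALID_SOURCES = {"web", "academic", "both"}


def build_research_payload(query: str, sources: str = "both",
                           num_results: int = 2) -> dict:
    """Build the JSON body for POST /deep_research, rejecting bad source types."""
    if sources not in VALID_SOURCES:
        raise ValueError(f"sources must be one of {sorted(VALID_SOURCES)}")
    return {"query": query, "sources": sources, "num_results": num_results}


def run_research(query: str, **kwargs) -> str:
    """POST a research request to a locally running AnalysisAlpaca server."""
    # Imported lazily so the payload helper above works without httpx installed.
    import httpx
    payload = build_research_payload(query, **kwargs)
    resp = httpx.post("http://localhost:8001/deep_research",
                      json=payload, timeout=60.0)
    resp.raise_for_status()
    return resp.text
```

With the MCP server running, `run_research("artificial intelligence in healthcare", num_results=3)` mirrors the `curl` call above.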
## Web Interface

### Features
- Research Form: Interactive form to submit research queries
- Progress Tracking: Real-time progress updates with detailed logs
- Job Management: View and manage multiple research jobs
- AI Report Generation: Generate comprehensive reports using various LLM providers
- PDF Export: Download reports as properly named PDF files
- History: Browse previous research jobs and results
### Supported LLM Providers
- OpenAI: GPT-4, GPT-3.5-turbo, and other models
- Anthropic: Claude 3 (Sonnet, Opus, Haiku)
- Groq: Fast inference with various open-source models
### File Naming Convention

Downloaded reports use the format `{sanitized_title}_{source_type}.pdf`.
Example: `artificial_intelligence_healthcare_web_academic.pdf`
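One way such sanitization is typically implemented is shown below. This is a sketch only; the web UI's actual rules (for instance, how it drops short words like "in" from the example title) may differ:

```python
import re


def report_filename(title: str, source_type: str) -> str:
    """Sanitize a report title into a filesystem-safe PDF name.

    Lowercases the title, collapses runs of non-alphanumeric characters
    into single underscores, and trims leading/trailing underscores.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{slug}_{source_type}.pdf"
```

For example, `report_filename("AI & Healthcare", "web_academic")` yields `ai_healthcare_web_academic.pdf`.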
## API Reference

### MCP Tools

#### `deep_research`

Perform comprehensive research on a topic.

Parameters:

- `query` (string, required): Research question or topic
- `sources` (string, optional): Source type (`"web"`, `"academic"`, or `"both"`)
- `num_results` (integer, optional): Number of sources to examine

Returns: formatted research results with sources and content.

#### `research_prompt`

Generate a structured research prompt for multi-stage research.

Parameters:

- `topic` (string, required): Topic to research

Returns: a comprehensive research prompt with methodology.
### HTTP API Endpoints

#### `POST /deep_research`

Execute a research query via HTTP:

    {
      "query": "string",
      "sources": "both",
      "num_results": 2
    }

#### `GET /health`

Health check endpoint.

#### `GET /docs`

Interactive API documentation (Swagger UI).
### Web UI API Endpoints

- `POST /research` - Start a new research job
- `GET /research/{job_id}` - Get research job status and results
- `GET /research/{job_id}/progress` - Get detailed progress for a research job
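A client typically starts a job and then polls the status endpoint until it finishes. The sketch below abstracts the HTTP call behind a `fetch` callable; the terminal status values (`"completed"`, `"failed"`) are assumptions about the backend's job model, not documented API:

```python
import time
from typing import Callable

# Assumed terminal states; the backend's actual status values may differ.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch: Callable[[str], dict], job_id: str,
             interval: float = 1.0, max_polls: int = 120) -> dict:
    """Poll GET /research/{job_id} (abstracted as `fetch`) until the job
    reaches a terminal status, then return the final job record."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `fetch` keeps the polling logic testable without a running server; in real use it would wrap an HTTP GET to the backend.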
## Development

### Project Structure

    analysis_alpaca/
    ├── src/analysis_alpaca/
    │   ├── __init__.py
    │   ├── config.py                # Configuration management
    │   ├── core/
    │   │   ├── __init__.py
    │   │   ├── server.py            # MCP server implementation
    │   │   └── research_service.py  # Research orchestration
    │   ├── search/
    │   │   ├── __init__.py
    │   │   ├── base.py              # Base searcher class
    │   │   ├── web_search.py        # DuckDuckGo implementation
    │   │   ├── academic_search.py   # Semantic Scholar implementation
    │   │   └── content_extractor.py # Content extraction
    │   ├── models/
    │   │   ├── __init__.py
    │   │   └── research.py          # Data models
    │   ├── utils/
    │   │   ├── __init__.py
    │   │   ├── logging.py           # Logging utilities
    │   │   └── text.py              # Text processing
    │   └── exceptions/
    │       ├── __init__.py
    │       └── base.py              # Custom exceptions
    ├── web_ui/
    │   ├── frontend/                # React.js application
    │   └── backend/                 # FastAPI backend
    ├── tests/                       # Test suite
    ├── http_server.py               # HTTP wrapper
    ├── requirements.txt             # Dependencies
    ├── pyproject.toml               # Package configuration
    └── Makefile                     # Development commands
### Development Setup

    # Install with development dependencies
    pip install -e ".[dev,ai]"

    # Set up pre-commit hooks (optional)
    pre-commit install

    # Run tests
    make test

    # Code formatting
    make format

    # Linting
    make lint

    # Type checking
    make type-check
### Adding New Search Providers

1. Create a new searcher class inheriting from `BaseSearcher`
2. Implement the `search()` method
3. Add the searcher to `ResearchService`
4. Update configuration and documentation

Example:

    from typing import List

    from .base import BaseSearcher
    from ..models.research import SearchResult  # adjust if SearchResult lives elsewhere


    class NewSearcher(BaseSearcher):
        async def search(self, query: str, num_results: int) -> List[SearchResult]:
            # Implement search logic here
            ...
## Testing

### Running Tests

    # Run all tests
    make test

    # Run with coverage
    make test-cov

    # Run a specific test file
    pytest tests/test_models.py

    # Run with verbose output
    pytest -v

### Test Structure

- `tests/test_models.py` - Data model tests
- `tests/test_utils.py` - Utility function tests
- `tests/conftest.py` - Test configuration and fixtures
### Writing Tests

Tests use pytest and pytest-asyncio for async testing:

    import pytest
    from analysis_alpaca.models.research import ResearchQuery


    @pytest.mark.asyncio
    async def test_research_query():
        query = ResearchQuery(query="test", sources="web", num_results=2)
        assert query.query == "test"
## Deployment

### Docker Deployment

    FROM python:3.11-slim

    WORKDIR /app
    COPY . .
    RUN pip install -e .

    EXPOSE 8001
    CMD ["python", "http_server.py"]
### Environment Configuration

For production, set these environment variables:

    AA_LOG_LEVEL=WARNING
    AA_LOG_FILE=/var/log/analysis-alpaca.log
    AA_AUTO_INSTALL_DEPS=false
    AA_MAX_RESULTS=3
    AA_WEB_TIMEOUT=20.0
### Reverse Proxy Setup (Nginx)

    server {
        listen 80;
        server_name your-domain.com;

        location / {
            proxy_pass http://127.0.0.1:8001;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
### Monitoring

The application provides comprehensive logging. Monitor these key metrics:
- Research request rates
- Search success/failure rates
- Content extraction success rates
- Response times
- Error patterns
### Scaling Considerations
- The application is stateless and can be horizontally scaled
- Consider implementing Redis for caching search results
- Use a proper message queue for background processing in high-traffic scenarios
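Caching search results hinges on a deterministic cache key per request. A minimal sketch; the `aa:research:` prefix and the key scheme are invented for illustration, not part of the project:

```python
import hashlib
import json


def cache_key(query: str, sources: str, num_results: int) -> str:
    """Deterministic Redis-style key for a research request.

    Serializing with sort_keys=True makes the key stable regardless of
    argument construction order, so identical requests hit the same entry.
    """
    payload = json.dumps(
        {"query": query, "sources": sources, "num_results": num_results},
        sort_keys=True,
    )
    return "aa:research:" + hashlib.sha256(payload.encode()).hexdigest()
```

A Redis layer would then do `GET`/`SETEX` on this key around the research call, with a TTL chosen to balance freshness against API load.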
## Troubleshooting

### Common Issues
#### Import Errors

    # Ensure proper installation
    pip install -e .

    # Check that the package is importable
    python -c "import analysis_alpaca; print('OK')"

#### Search Timeouts

    # Increase timeout values
    export AA_WEB_TIMEOUT=30.0
    export AA_ACADEMIC_TIMEOUT=30.0
#### Academic Search Rate Limiting

The system automatically handles Semantic Scholar rate limits with:

- Exponential backoff retry logic
- Graceful degradation (returns web results only)
- Request spacing
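The backoff pattern described above looks roughly like this. This is an illustrative sketch; the server's own retry logic may use different delays, retry counts, and exception handling:

```python
import time


def retry_with_backoff(fn, max_retries: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on failure with exponentially growing delays
    (base_delay, 2*base_delay, 4*base_delay, ...)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))
```

In practice the caller would catch only the rate-limit error (e.g. an HTTP 429 response) rather than every `Exception`, and fall back to web-only results when retries are exhausted.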
#### Content Extraction Failures

- Check network connectivity
- Verify target site availability
- Some sites may block automated requests

#### Large Response Truncation

    # Increase content size limits
    export AA_MAX_CONTENT_SIZE=15000
    export AA_MAX_EXTRACTION_SIZE=200000
### Debug Mode

Enable detailed logging:

    export AA_LOG_LEVEL=DEBUG
    export AA_LOG_FILE="debug.log"
    python http_server.py

View the logs:

    tail -f debug.log
### Getting Help
- Check the logs for detailed error messages
- Verify your configuration against the examples
- Test with simple queries first
- Ensure all dependencies are properly installed
## Contributing

### Development Workflow

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes, with tests
4. Run quality checks: `make check-all`
5. Submit a pull request
### Code Style
The project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
Run all checks:

    make check-all
### Commit Guidelines

Use conventional commits:

- `feat:` for new features
- `fix:` for bug fixes
- `docs:` for documentation
- `test:` for tests
- `refactor:` for refactoring
## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Semantic Scholar for academic search API
- DuckDuckGo for web search functionality
- Model Context Protocol for the integration framework
- FastMCP for the server implementation
- React.js and FastAPI for the web interface
## Roadmap

### Planned Features

- **Additional Search Providers**
  - Google Scholar integration
  - Bing Academic search
  - ArXiv direct integration
- **Enhanced Content Processing**
  - PDF content extraction
  - Image and chart analysis
  - Table data extraction
- **Performance Improvements**
  - Redis caching layer
  - Async processing optimization
  - Response streaming
- **Advanced Features**
  - Citation graph analysis
  - Research trend detection
  - Multi-language support
- **Enterprise Features**
  - User authentication
  - Usage analytics
  - API rate limiting
  - Custom search domains
### Version History

- **v1.0.0** - Initial release with core research functionality
- **v1.1.0** - Added web interface and PDF export
- **v1.2.0** - Enhanced error handling and rate limiting
- **Current** - Comprehensive cleanup and documentation
For the latest updates and detailed changelog, visit the GitHub repository.