
AnalysisAlpaca 🦙


A production-ready MCP (Model Context Protocol) server that enables comprehensive research and analysis capabilities for Claude and other MCP-compatible AI assistants. This server integrates web and academic search functionality with an optional web interface for interactive research and AI-powered report generation.

🚀 Quick Start

# 1. Clone and navigate to the project
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher

# 2. Install dependencies (a virtual environment is recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

# 3. Start the MCP server
python http_server.py

# Server runs on http://localhost:8001
# API documentation: http://localhost:8001/docs

✨ Features

Core Research Capabilities

  • Multi-Source Search: Combines DuckDuckGo web search and Semantic Scholar academic research
  • Content Extraction: Intelligent extraction of relevant information from web pages
  • Academic Integration: Direct access to scholarly articles and research papers
  • Smart Formatting: Properly formatted research with citations and structured output
  • Rate Limiting: Built-in retry logic and graceful handling of API limits
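
The rate-limit handling above boils down to retry-with-backoff. A minimal sketch of the idea, using httpx (a core dependency) with hypothetical names:

import asyncio
import httpx

async def get_with_retry(client: httpx.AsyncClient, url: str, max_retries: int = 3) -> httpx.Response:
    """Retry a request with exponential backoff when the API rate-limits us (HTTP 429)."""
    response = await client.get(url)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
        response = await client.get(url)
    return response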

Web Interface Features

  • Interactive Research: User-friendly web interface for conducting research
  • Job Management: Track multiple research jobs with progress monitoring
  • AI-Powered Reports: Generate comprehensive PDF reports using OpenAI, Anthropic, or Groq
  • PDF Export: Download research results as properly named PDF files
  • Real-time Updates: Live progress tracking via periodic polling

Production Features

  • Comprehensive Error Handling: Graceful degradation when services are unavailable
  • Extensive Logging: Detailed logging for debugging and monitoring
  • Configurable Settings: Environment-based configuration management
  • Auto-Dependency Installation: Automatic installation of missing dependencies
  • Modular Architecture: Easy to extend and customize

๐Ÿ— Architecture

Components Overview

analysis_alpaca/
├── src/analysis_alpaca/          # Core MCP server implementation
│   ├── core/                     # Server and research orchestration
│   ├── search/                   # Search engine implementations
│   ├── models/                   # Data models and schemas
│   ├── utils/                    # Utility functions and helpers
│   └── exceptions/               # Custom exception handling
├── web_ui/                       # Optional web interface
│   ├── frontend/                 # React.js frontend application
│   └── backend/                  # FastAPI backend for web UI
├── tests/                        # Test suite
├── http_server.py                # HTTP API wrapper for MCP server
└── requirements.txt              # Unified dependencies

Core Components

  1. MCP Server (src/analysis_alpaca/core/server.py)

    • FastMCP-based server exposing research tools to Claude
    • Main tool: deep_research() for comprehensive research (sketched after this list)
    • Built-in prompt templates for structured research methodology
  2. Research Service (src/analysis_alpaca/core/research_service.py)

    • Orchestrates the entire research workflow
    • Coordinates web and academic searches
    • Manages content extraction and result formatting
    • Handles parallel execution and error recovery
  3. Search Implementations

    • WebSearcher: DuckDuckGo web search with result parsing
    • AcademicSearcher: Semantic Scholar API integration with retry logic
    • ContentExtractor: Web page content extraction and processing
  4. HTTP Server (http_server.py)

    • REST API wrapper for MCP functionality
    • Enables direct HTTP access to research capabilities
    • CORS-enabled for web interface integration
  5. Web Interface

    • Frontend: React.js application with PDF generation
    • Backend: FastAPI server for job management and AI report generation
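
A minimal FastMCP sketch of how a tool like deep_research() is exposed (illustrative only; the real server.py delegates to ResearchService and differs in detail):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analysis-alpaca")

@mcp.tool()
async def deep_research(query: str, sources: str = "both", num_results: int = 2) -> str:
    """Perform comprehensive research and return formatted results with sources."""
    # The real implementation calls into ResearchService (research_service.py)
    return f"(placeholder) research results for {query!r}"

if __name__ == "__main__":
    mcp.run()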

🔧 Installation

Prerequisites

  • Python 3.8+ (recommended: Python 3.11+)
  • Node.js 16+ (only if using web interface)
  • npm or yarn (only if using web interface)

Basic Installation

# Clone the repository
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .

# Or install with all optional dependencies
pip install -e ".[dev,ai]"

Web Interface Setup

# Install frontend dependencies
cd web_ui/frontend
npm install

# Return to project root
cd ../..

Dependencies Overview

Core Dependencies:

  • httpx>=0.25.0 - HTTP client for API requests
  • beautifulsoup4>=4.12.0 - HTML parsing for content extraction
  • mcp>=0.1.0 - Model Context Protocol server framework
  • fastapi>=0.104.0 - Web framework for HTTP API
  • uvicorn>=0.24.0 - ASGI server for FastAPI

Optional AI Dependencies:

pip install -e ".[ai]"  # Installs OpenAI, Anthropic, and Groq clients

Development Dependencies:

pip install -e ".[dev]"  # Installs testing and linting tools

โš™๏ธ Configuration

Environment Variables

Create a .env file in the project root:

# Search Configuration
AA_MAX_RESULTS=5              # Maximum results per search
AA_DEFAULT_NUM_RESULTS=3      # Default number of results
AA_WEB_TIMEOUT=15.0          # Web search timeout (seconds)
AA_USER_AGENT="AnalysisAlpaca 1.0"

# Content Configuration
AA_MAX_CONTENT_SIZE=10000    # Maximum response size
AA_MAX_EXTRACTION_SIZE=150000 # Maximum content to extract

# Server Configuration
AA_LOG_LEVEL=INFO            # Logging level (DEBUG, INFO, WARNING, ERROR)
AA_LOG_FILE="logs/research.log"  # Optional log file path
AA_AUTO_INSTALL_DEPS=true    # Auto-install missing dependencies

# AI Provider API Keys (Optional - for web interface)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here

# Web UI Configuration
MCP_SERVER_URL=http://localhost:8001  # URL of the MCP HTTP server

Configuration Files

The system uses a hierarchical configuration approach:

  1. Default values in config.py
  2. Environment variables (override defaults)
  3. Optional .env file (override environment)
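
A minimal sketch of that layering, assuming config.py follows the usual dataclass-plus-os.getenv pattern (the actual field names may differ):

import os
from dataclasses import dataclass, field

@dataclass
class SearchConfig:
    # Hard-coded defaults; AA_-prefixed environment variables take precedence when set.
    max_results: int = field(default_factory=lambda: int(os.getenv("AA_MAX_RESULTS", "5")))
    web_timeout: float = field(default_factory=lambda: float(os.getenv("AA_WEB_TIMEOUT", "15.0")))
    log_level: str = field(default_factory=lambda: os.getenv("AA_LOG_LEVEL", "INFO"))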

🚀 Usage

MCP Server (for Claude Desktop)

Add to your Claude Desktop configuration:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "analysis-alpaca": {
      "command": "/path/to/python",
      "args": ["/path/to/analysis_alpaca/http_server.py"],
      "env": {
        "AA_MAX_RESULTS": "5",
        "AA_LOG_LEVEL": "INFO"
      }
    }
  }
}

Standalone HTTP Server

# Start the HTTP API server
python http_server.py

# Server runs on http://localhost:8001
# API documentation available at http://localhost:8001/docs

Web Interface (Optional)

The web interface provides a user-friendly way to interact with AnalysisAlpaca through a browser.

Requirements:

  • Node.js 16+ and npm for the frontend
  • The MCP HTTP server must be running (see above)

Setup:

# Install frontend dependencies
cd web_ui/frontend
npm install
cd ../..

Manual Startup (2 terminals required):

Terminal 1 - Backend API Server:

cd web_ui/backend
python main.py

# Backend runs on http://localhost:8000
# API documentation: http://localhost:8000/docs

Terminal 2 - Frontend Development Server:

cd web_ui/frontend
npm start

# Frontend runs on http://localhost:3000
# Access the web interface at http://localhost:3000

Complete Setup (3 servers total):

  1. MCP Server (Terminal 1): python http_server.py → http://localhost:8001
  2. Backend API (Terminal 2): cd web_ui/backend && python main.py → http://localhost:8000
  3. Frontend UI (Terminal 3): cd web_ui/frontend && npm start → http://localhost:3000

Research Tool Usage

The main deep_research tool accepts these parameters:

  • query (required): The research question or topic
  • sources (optional): "web", "academic", or "both" (default: "both")
  • num_results (optional): Number of sources to examine (default: 2)

Example Prompts for Claude

Research the latest developments in quantum computing using both web and academic sources.

Can you do comprehensive research on climate change mitigation strategies? Focus on academic sources and examine 3 results.

I need detailed information about the impact of artificial intelligence on healthcare. Use the deep_research tool with web sources only.

Direct API Usage

# Research via HTTP API
curl -X POST "http://localhost:8001/deep_research" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "artificial intelligence in healthcare",
    "sources": "both",
    "num_results": 3
  }'
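
The same request from Python with httpx (already a core dependency), assuming the endpoint returns JSON:

import httpx

response = httpx.post(
    "http://localhost:8001/deep_research",
    json={
        "query": "artificial intelligence in healthcare",
        "sources": "both",
        "num_results": 3,
    },
    timeout=60.0,  # research calls can take a while
)
print(response.json())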

๐ŸŒ Web Interface

Features

  • Research Form: Interactive form to submit research queries
  • Progress Tracking: Real-time progress updates with detailed logs
  • Job Management: View and manage multiple research jobs
  • AI Report Generation: Generate comprehensive reports using various LLM providers
  • PDF Export: Download reports as properly named PDF files
  • History: Browse previous research jobs and results

Supported LLM Providers

  • OpenAI: GPT-4, GPT-3.5-turbo, and other models
  • Anthropic: Claude 3 (Sonnet, Opus, Haiku)
  • Groq: Fast inference with various open-source models

File Naming Convention

Downloaded reports use the format: {sanitized_title}_{source_type}.pdf

Example: artificial_intelligence_healthcare_web_academic.pdf
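
A hypothetical sketch of that sanitization (the real logic lives in the web frontend and may differ):

import re

def report_filename(title: str, source_type: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics to "_", and trim stray underscores
    sanitized = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{sanitized}_{source_type}.pdf"

# report_filename("Artificial Intelligence Healthcare", "web_academic")
# -> "artificial_intelligence_healthcare_web_academic.pdf"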

📚 API Reference

MCP Tools

deep_research

Perform comprehensive research on a topic.

Parameters:

  • query (string, required): Research question or topic
  • sources (string, optional): Source type ("web", "academic", "both")
  • num_results (integer, optional): Number of sources to examine

Returns: Formatted research results with sources and content

research_prompt

Generate a structured research prompt for multi-stage research.

Parameters:

  • topic (string, required): Topic to research

Returns: Comprehensive research prompt with methodology
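
In FastMCP terms, such a prompt is typically registered along these lines (illustrative only; the real template in server.py is more detailed):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analysis-alpaca")

@mcp.prompt()
def research_prompt(topic: str) -> str:
    """Generate a structured, multi-stage research prompt for a topic."""
    return (
        f"Research '{topic}' in stages: gather background first, then examine "
        f"web and academic sources, and finish with a cited synthesis."
    )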

HTTP API Endpoints

POST /deep_research

Execute research query via HTTP.

{
  "query": "string",
  "sources": "both",
  "num_results": 2
}

GET /health

Health check endpoint.

GET /docs

Interactive API documentation (Swagger UI).

Web UI API Endpoints

POST /research

Start a new research job.

GET /research/{job_id}

Get research job status and results.

GET /research/{job_id}/progress

Get detailed progress for a research job.
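
Putting the three endpoints together, a script could drive a research job as follows; the response fields (job_id, status) are assumptions about the payload shape:

import time
import httpx

BASE = "http://localhost:8000"

job = httpx.post(f"{BASE}/research", json={"query": "quantum computing"}).json()
job_id = job["job_id"]  # assumed field name

while True:
    progress = httpx.get(f"{BASE}/research/{job_id}/progress").json()
    if progress.get("status") in ("completed", "failed"):  # assumed status values
        break
    time.sleep(2)  # mirrors the UI's periodic polling

result = httpx.get(f"{BASE}/research/{job_id}").json()
print(result)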

🛠 Development

Project Structure

analysis_alpaca/
├── src/analysis_alpaca/
│   ├── __init__.py
│   ├── config.py              # Configuration management
│   ├── core/
│   │   ├── __init__.py
│   │   ├── server.py          # MCP server implementation
│   │   └── research_service.py # Research orchestration
│   ├── search/
│   │   ├── __init__.py
│   │   ├── base.py           # Base searcher class
│   │   ├── web_search.py     # DuckDuckGo implementation
│   │   ├── academic_search.py # Semantic Scholar implementation
│   │   └── content_extractor.py # Content extraction
│   ├── models/
│   │   ├── __init__.py
│   │   └── research.py       # Data models
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── logging.py        # Logging utilities
│   │   └── text.py           # Text processing
│   └── exceptions/
│       ├── __init__.py
│       └── base.py           # Custom exceptions
├── web_ui/
│   ├── frontend/             # React.js application
│   └── backend/              # FastAPI backend
├── tests/                    # Test suite
├── http_server.py            # HTTP wrapper
├── requirements.txt          # Dependencies
├── pyproject.toml            # Package configuration
└── Makefile                  # Development commands

Development Setup

# Install with development dependencies
pip install -e ".[dev,ai]"

# Set up pre-commit hooks (optional)
pre-commit install

# Run tests
make test

# Code formatting
make format

# Linting
make lint

# Type checking
make type-check

Adding New Search Providers

  1. Create a new searcher class inheriting from BaseSearcher
  2. Implement the search() method
  3. Add the searcher to ResearchService
  4. Update configuration and documentation

Example:

from typing import List

from .base import BaseSearcher
from ..models.research import SearchResult  # assumed location of SearchResult

class NewSearcher(BaseSearcher):
    async def search(self, query: str, num_results: int) -> List[SearchResult]:
        # Implement provider-specific search logic and return SearchResult objects
        ...

🧪 Testing

Running Tests

# Run all tests
make test

# Run with coverage
make test-cov

# Run specific test file
pytest tests/test_models.py

# Run with verbose output
pytest -v

Test Structure

  • tests/test_models.py - Data model tests
  • tests/test_utils.py - Utility function tests
  • tests/conftest.py - Test configuration and fixtures

Writing Tests

Tests use pytest and pytest-asyncio for async testing:

import pytest
from analysis_alpaca.models.research import ResearchQuery

@pytest.mark.asyncio
async def test_research_query():
    query = ResearchQuery(query="test", sources="web", num_results=2)
    assert query.query == "test"

🚀 Deployment

Production Deployment

Docker Deployment

FROM python:3.11-slim

WORKDIR /app
COPY . .

RUN pip install -e .

EXPOSE 8001
CMD ["python", "http_server.py"]

Environment Configuration

For production, set these environment variables:

AA_LOG_LEVEL=WARNING
AA_LOG_FILE=/var/log/analysis-alpaca.log
AA_AUTO_INSTALL_DEPS=false
AA_MAX_RESULTS=3
AA_WEB_TIMEOUT=20.0

Reverse Proxy Setup (Nginx)

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Monitoring

The application provides comprehensive logging. Monitor these key metrics:

  • Research request rates
  • Search success/failure rates
  • Content extraction success rates
  • Response times
  • Error patterns

Scaling Considerations

  • The application is stateless and can be horizontally scaled
  • Consider implementing Redis for caching search results (a minimal sketch follows this list)
  • Use a proper message queue for background processing in high-traffic scenarios
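
A starting-point sketch of the Redis caching idea; redis-py is not a project dependency, so the client and key scheme here are assumptions:

import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_search(query: str, search_fn, ttl: int = 3600):
    """Return cached results for a query, computing and caching them on a miss."""
    key = "aa:search:" + hashlib.sha256(query.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = search_fn(query)
    cache.setex(key, ttl, json.dumps(results))
    return results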

๐Ÿ” Troubleshooting

Common Issues

Import Errors

# Ensure proper installation
pip install -e .

# Check Python path
python -c "import analysis_alpaca; print('OK')"

Search Timeouts

# Increase timeout values
export AA_WEB_TIMEOUT=30.0
export AA_ACADEMIC_TIMEOUT=30.0

Academic Search Rate Limiting

The system automatically handles Semantic Scholar rate limits with:

  • Exponential backoff retry logic
  • Graceful degradation (returns web results only)
  • Request spacing

Content Extraction Failures

  • Check network connectivity
  • Verify target site availability
  • Some sites may block automated requests

Large Response Truncation

# Increase content size limits
export AA_MAX_CONTENT_SIZE=15000
export AA_MAX_EXTRACTION_SIZE=200000

Debug Mode

Enable detailed logging:

export AA_LOG_LEVEL=DEBUG
export AA_LOG_FILE="debug.log"
python http_server.py

View logs:

tail -f debug.log

Getting Help

  1. Check the logs for detailed error messages
  2. Verify your configuration against the examples
  3. Test with simple queries first
  4. Ensure all dependencies are properly installed

๐Ÿค Contributing

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Run quality checks: make check-all
  5. Submit a pull request

Code Style

The project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting
  • mypy for type checking

Run all checks:

make check-all

Commit Guidelines

Use conventional commits:

  • feat: for new features
  • fix: for bug fixes
  • docs: for documentation
  • test: for tests
  • refactor: for refactoring

📄 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Semantic Scholar for academic search API
  • DuckDuckGo for web search functionality
  • Model Context Protocol for the integration framework
  • FastMCP for the server implementation
  • React.js and FastAPI for the web interface

📊 Roadmap

Planned Features

  • Additional Search Providers

    • Google Scholar integration
    • Bing Academic search
    • ArXiv direct integration
  • Enhanced Content Processing

    • PDF content extraction
    • Image and chart analysis
    • Table data extraction
  • Performance Improvements

    • Redis caching layer
    • Async processing optimization
    • Response streaming
  • Advanced Features

    • Citation graph analysis
    • Research trend detection
    • Multi-language support
  • Enterprise Features

    • User authentication
    • Usage analytics
    • API rate limiting
    • Custom search domains

Version History

  • v1.0.0 - Initial release with core research functionality
  • v1.1.0 - Added web interface and PDF export
  • v1.2.0 - Enhanced error handling and rate limiting
  • Current - Comprehensive cleanup and documentation

For the latest updates and detailed changelog, visit the GitHub repository.