PeMCP

JameZUK/PeMCP

The PeMCP Toolkit is a Python-based script designed for in-depth analysis of Portable Executable (PE) files, offering both command-line and Model-Context-Protocol (MCP) server modes for comprehensive analysis.

PeMCP Toolkit - Advanced PE Analysis & Decompilation Suite

The PeMCP Toolkit is a Python suite for in-depth static and dynamic analysis of Portable Executable (PE) files and raw shellcode. It works as a CLI tool for generating comprehensive reports, but its primary strength is its Model Context Protocol (MCP) server mode.
In MCP mode, PeMCP acts as a backend for LLMs (such as Claude or other AI agents), giving them a suite of 40+ specialized tools to interactively explore, decompile, and analyze binaries. It bridges high-level AI reasoning and low-level binary instrumentation.

Key Features

1. Advanced Binary Analysis (Powered by Angr)

Beyond standard static analysis, PeMCP now integrates the Angr binary analysis framework to provide capabilities typically reserved for dedicated reverse engineering platforms:

  • Decompilation: Convert assembly into human-readable C-like pseudocode on the fly.
  • Control Flow Graph (CFG): Generate and traverse function blocks and edges.
  • Symbolic Execution: Automatically find inputs to reach specific code paths (e.g., "Find an input that reaches the 'Access Granted' block").
  • Emulation: Execute functions with concrete arguments using the Unicorn engine to observe behavior safely.
  • Slicing & Dominators: Perform forward/backward slicing to track data flow and identify critical code dependencies.
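To make the dominator idea above concrete, here is a minimal pure-Python sketch of the classic iterative dominator computation on a toy control flow graph. It does not use Angr and is not PeMCP's implementation; the block names (`entry`, `check`, `granted`, `denied`, `exit`) and the function name `compute_dominators` are hypothetical.

```python
def compute_dominators(cfg, entry):
    """Return {node: set of nodes that dominate it} for a small CFG.

    `cfg` maps each block label to the list of its successor labels.
    A block D dominates N if every path from `entry` to N passes through D.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}

    # Build the predecessor map from the successor map.
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical toy graph: a check that branches to two outcomes and rejoins.
toy_cfg = {
    "entry": ["check"],
    "check": ["granted", "denied"],
    "granted": ["exit"],
    "denied": ["exit"],
    "exit": [],
}
dominators = compute_dominators(toy_cfg, "entry")
print(sorted(dominators["exit"]))  # ['check', 'entry', 'exit']
```

In this toy graph neither `granted` nor `denied` dominates `exit`, because each can be bypassed through the other branch. `check` does dominate it, which is the kind of "critical dependency" a dominator query surfaces.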

2. Comprehensive Static Analysis

  • PE Structure: Full parsing of DOS/NT Headers, Imports/Exports, Resources, TLS, Debug, and Load Config.
  • Signatures: Authenticode validation (Signify), certificate parsing (Cryptography), and Packer detection (PEiD).
  • Capabilities: Integrated Capa analysis to map binary behaviors to the MITRE ATT&CK framework.
  • Strings: FLOSS integration for extracting static, stack, tight, and decoded strings, ranked by relevance using StringSifter.
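As an illustration of the first step of PE structure parsing, the sketch below reads the DOS header and the start of the NT headers using only the standard library. PeMCP itself relies on pefile for this; the function names and the synthetic sample here are hypothetical and exist only to show the layout (the `MZ` magic, `e_lfanew` at offset 0x3C, the `PE\0\0` signature, and the first COFF header fields).

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Parse the minimal DOS/NT header fields from raw PE bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")

    # e_lfanew (offset 0x3C) holds the file offset of the NT headers.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < e_lfanew + 12:
        raise ValueError("file too short for NT headers")
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing NT 'PE\\0\\0' signature")

    # The COFF file header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    machine, num_sections, timestamp = struct.unpack_from(
        "<HHI", data, e_lfanew + 4
    )
    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "number_of_sections": num_sections,
        "timestamp": timestamp,
    }


def build_minimal_sample() -> bytes:
    """Build a synthetic header-only blob for demonstration (not a valid PE)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)    # NT headers directly after DOS header
    coff = struct.pack("<HHI", 0x8664, 3, 0)   # 0x8664 = x64 machine type
    return bytes(dos) + b"PE\x00\x00" + coff


info = parse_pe_headers(build_minimal_sample())
print(hex(info["machine"]), info["number_of_sections"])
```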

3. Robust Architecture

  • Docker-First Design: No interactive prompts. Dependencies are managed via environment or Docker, making it CI/CD and container-ready.
  • State Encapsulation: Uses a centralized AnalyzerState class to manage analysis context, ensuring thread safety and stability.
  • Background Task Management: Long-running operations (like symbolic execution) run asynchronously with a heartbeat monitor, preventing timeouts.
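The state encapsulation and background-task ideas above can be sketched with the standard library alone. This is an illustrative pattern, not PeMCP's actual code: the `AnalyzerState` below is a hypothetical stand-in that borrows only the class name from the description, and `run_in_background`, `update_task`, and `get_task` are assumed names.

```python
import threading
import time


class AnalyzerState:
    """Hypothetical stand-in: one lock-protected place for shared task state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}

    def update_task(self, task_id, **fields):
        with self._lock:
            self._tasks.setdefault(task_id, {}).update(fields)

    def get_task(self, task_id):
        with self._lock:
            return dict(self._tasks.get(task_id, {}))  # return a copy


def run_in_background(state, task_id, work, heartbeat_interval=0.05):
    """Run `work()` in a thread while a second thread records heartbeats."""
    done = threading.Event()

    def worker():
        try:
            state.update_task(task_id, status="done", result=work())
        except Exception as exc:  # record the failure instead of losing it
            state.update_task(task_id, status="failed", error=str(exc))
        finally:
            done.set()

    def heartbeat():
        # Event.wait returns False on timeout, so this loops until work ends.
        while not done.wait(heartbeat_interval):
            state.update_task(task_id, last_heartbeat=time.monotonic())

    state.update_task(task_id, status="running", last_heartbeat=time.monotonic())
    threads = [
        threading.Thread(target=worker, daemon=True),
        threading.Thread(target=heartbeat, daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads
```

A caller can return the task ID immediately and later poll `state.get_task(task_id)`. A fresh `last_heartbeat` shows the job is still alive, so a slow job need not be treated as a timeout.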

Prerequisites and Installation

Option A: Docker (Recommended)

The easiest way to run PeMCP is via Docker. This handles all complex dependencies (Angr, Unicorn, Vivisect) automatically.

  1. Build the Image:
    docker build -t pemcp-toolkit .

  2. Run as MCP Server:
    # Create a directory for your malware samples
    mkdir -p ./samples

    # Run the container
    docker run --rm -it \
    -p 8082:8082 \
    -v "$(pwd)/samples:/app/samples" \
    -e VT_API_KEY="your_virustotal_key" \
    pemcp-toolkit \
    --mcp-server \
    --input-file /app/samples/suspicious.exe \
    --mcp-host 0.0.0.0

Option B: Local Installation

If you prefer running locally, you must have Python 3.10+ and cmake installed (for building Unicorn/Angr bindings).

  1. Install System Dependencies (Ubuntu/Debian):
    sudo apt-get install build-essential libssl-dev cmake

  2. Install Python Packages:
    pip install -r requirements.txt

    Note: Ensure pefile, angr[unicorn], flare-floss, flare-capa, rapidfuzz, and mcp[cli] are installed.

Modes of Operation

1. CLI Mode (One-Shot Report)

Best for generating a massive, human-readable dump of all static data found in a file.

    python PeMCP.py --input-file malware.exe --verbose > analysis_report.txt

Capabilities in CLI Mode:

  • Full PE header dump.
  • Hashes (MD5, SHA256, SSDeep).
  • YARA & PEiD scans.
  • Capa capability report.
  • FLOSS string extraction.
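For reference, the MD5 and SHA256 values in such a report can be reproduced independently with Python's `hashlib`. The sketch below is a standalone helper, not PeMCP's code, and `file_hashes` is a hypothetical name. SSDeep is omitted because it needs a third-party library.

```python
import hashlib


def file_hashes(path, chunk_size=1 << 20):
    """Compute MD5 and SHA256 of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Reading in chunks keeps memory use flat for large samples. Comparing the result against the report is a quick check that the analyzed file is the one you intended.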

2. MCP Server Mode (Interactive Agent)

Best for use with AI coding assistants or MCP clients. The server pre-loads the binary and exposes tools to query it dynamically.

    python PeMCP.py --mcp-server --input-file malware.exe

Available Tools (Highlights)

🔍 Deep Binary Analysis (Angr)

  • decompile_function_with_angr: Returns C-like pseudocode for a specific address.
  • find_path_to_address: Uses symbolic execution to solve for inputs that reach a target instruction.
  • emulate_function_execution: Runs a function with specific arguments in a sandboxed emulator.
  • get_function_cfg: Returns the nodes and edges of a function's control flow graph.
  • get_backward_slice / get_forward_slice: Computes a backward or forward slice from a target address, showing which code can affect it or be affected by it.
  • analyze_binary_loops: Detects and characterizes loops in the binary.

🧪 Triage & Forensics

  • get_triage_report: Auto-generates a summary of high-value indicators (suspicious imports, high-score strings, severe capabilities).
  • get_virustotal_report_for_loaded_file: Queries VirusTotal for the file hash (requires VT_API_KEY).
  • reanalyze_loaded_pe_file: Triggers a re-scan (e.g., to enable Angr features if skipped initially).

📝 String & Data Analysis

  • get_top_sifted_strings: Returns strings ranked by "interestingness" using StringSifter's machine-learning model.
  • fuzzy_search_strings: Finds strings similar to a query (great for finding obfuscated keys).
  • find_and_decode_encoded_strings: Detects Base64/Hex/XOR patterns and attempts heuristic decoding.
  • search_floss_strings: Regex search over FLOSS-extracted strings (stack, tight, decoded).
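To show what heuristic decoding of encoded strings can look like, here is a small standard-library sketch covering the Base64 and hex cases. It is not the logic behind find_and_decode_encoded_strings; the regular expressions, the 90% printable-ASCII threshold, and the names `try_decode` and `is_mostly_printable` are assumptions chosen for illustration. XOR detection is left out for brevity.

```python
import base64
import binascii
import re

# Assumed patterns: at least 8 Base64 characters, or at least 4 hex byte pairs.
BASE64_RE = re.compile(r"^[A-Za-z0-9+/]{8,}={0,2}$")
HEX_RE = re.compile(r"^(?:[0-9a-fA-F]{2}){4,}$")


def is_mostly_printable(data: bytes, threshold: float = 0.9) -> bool:
    """Treat a decode as plausible only if most bytes are printable ASCII."""
    if not data:
        return False
    printable = sum(32 <= b < 127 for b in data)
    return printable / len(data) >= threshold


def try_decode(candidate: str) -> list:
    """Return (encoding, decoded_text) pairs for every plausible decoding."""
    results = []
    if HEX_RE.match(candidate):
        raw = bytes.fromhex(candidate)
        if is_mostly_printable(raw):
            results.append(("hex", raw.decode("ascii", "replace")))
    if BASE64_RE.match(candidate) and len(candidate) % 4 == 0:
        try:
            raw = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            raw = b""
        if is_mostly_printable(raw):
            results.append(("base64", raw.decode("ascii", "replace")))
    return results


encoded = base64.b64encode(b"cmd.exe /c whoami").decode()
print(try_decode(encoded))
```

The printable-ratio filter is what keeps false positives down. Many ordinary identifiers match the Base64 alphabet, but few of them decode to readable text.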

🧬 Context & Linking

  • get_string_usage_context: Shows the assembly instructions around where a string is used.
  • get_strings_for_function: Lists all strings referenced by a specific function.

Configuration

Environment Variables

  • VT_API_KEY: (Optional) Your VirusTotal API key. Required for the get_virustotal_report_for_loaded_file tool.

Shellcode Analysis

PeMCP supports raw shellcode analysis. When using raw binaries:

  1. Use --mode shellcode.
  2. Ideally provide an architecture hint to FLOSS/Angr using --floss-format sc64 (or sc32).

    python PeMCP.py --mcp-server --input-file shellcode.bin --mode shellcode --floss-format sc64

Architecture & Design

  • Single-File Analysis Context: The server holds one file in memory (AnalyzerState). All tools operate on this shared context, ensuring consistency.
  • Lazy Loading: Heavy analysis (like Angr CFG generation) can be triggered in the background or on-demand to allow for instant server startup.
  • Smart Truncation: MCP responses are automatically protected against token-limit overflows. If a tool returns 1 MB of JSON, the server truncates lists or strings to fit within a 64 KB limit while keeping the response structurally valid.

Contributing

Contributions are welcome!

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature/AngrEnhancement).
  3. Commit your changes.
  4. Push to the branch.
  5. Open a Pull Request.

License

Distributed under the MIT License. See LICENSE for more information.

Disclaimer

This toolkit is provided "as-is" for educational and research purposes only. It is capable of executing parts