Sourcebot

sourcebot-dev/Sourcebot

The Sourcebot MCP server allows LLM agents to fetch code context from various repositories hosted on platforms like GitHub, GitLab, and Bitbucket, enhancing the capabilities of LLMs in code-related tasks.

Sourcebot MCP - Fetch code context from GitHub, GitLab, Bitbucket, and more

The Sourcebot MCP server gives your LLM agents the ability to fetch code context across thousands of repos hosted on GitHub, GitLab, Bitbucket, and more. Ask your LLM a question, and the Sourcebot MCP server will fetch relevant context from its index and inject it into your chat session. Some use cases this unlocks include:

  • Enriching responses to user requests:

    • "What repositories are using internal library X?"
    • "Provide usage examples of the CodeMirror component"
    • "Where is the useCodeMirrorTheme hook defined?"
    • "Find all usages of deprecatedApi across all repos"
  • Improving the reasoning ability of existing horizontal agents (e.g., AI code review, docs generation):

    • "Find the definitions for all functions in this diff"
    • "Document what systems depend on this class"
  • Building custom LLM horizontal agents (e.g., compliance auditing agents, migration agents):

    • "Find all instances of hardcoded credentials"
    • "Identify repositories that depend on this deprecated API"

Getting Started

  1. Install Node.js >= v18.0.0.

  2. (optional) Spin up a Sourcebot instance by following this guide. The host URL of your instance (e.g., http://localhost:3000) is passed to the MCP server via the SOURCEBOT_HOST environment variable. This lets you control which repos Sourcebot MCP fetches context from (including private repos).

    If a host is not provided, the server falls back to the demo instance hosted at https://demo.sourcebot.dev. You can see the list of repositories indexed here. Add additional repositories by opening a PR.

  3. Install @sourcebot/mcp into your MCP client:

    Cursor

    Cursor MCP docs

    Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

    Paste the following into your ~/.cursor/mcp.json file to install Sourcebot globally within Cursor. The env block is optional; if SOURCEBOT_HOST is not specified, https://demo.sourcebot.dev is used:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    Windsurf

    Windsurf MCP docs

    Go to: Windsurf Settings -> Cascade -> Add Server -> Add Custom Server

    Paste the following into your mcp_config.json file. The env block is optional; if SOURCEBOT_HOST is not specified, https://demo.sourcebot.dev is used:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    VS Code

    VS Code MCP docs

    Add the following to your .vscode/mcp.json file:

    {
        "servers": {
            "sourcebot": {
                "type": "stdio",
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                // Optional - if not specified, https://demo.sourcebot.dev is used
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    Claude Code

    Claude Code MCP docs

    Run the following command:

    # SOURCEBOT_HOST env var is optional - if not specified,
    # https://demo.sourcebot.dev is used.
    claude mcp add sourcebot -e SOURCEBOT_HOST=http://localhost:3000 -- npx -y @sourcebot/mcp@latest
    
    Claude Desktop

    Claude Desktop MCP docs

    Add the following to your claude_desktop_config.json. The env block is optional; if SOURCEBOT_HOST is not specified, https://demo.sourcebot.dev is used:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    

    Alternatively, you can install via Smithery. For example:

    npx -y @smithery/cli install @sourcebot-dev/sourcebot --client claude
    

  4. Tell your LLM to use sourcebot when prompting.

For a more detailed guide, check out the docs.
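When SOURCEBOT_HOST is omitted, the server falls back to the demo instance (step 2 above). The TypeScript sketch below illustrates that selection logic. It is not the package's actual implementation: resolveSourcebotHost is a hypothetical helper, and treating a blank value as unset, rejecting non-http(s) URLs, and trimming trailing slashes are all assumptions.

```typescript
// Demo instance used when no host is configured.
const DEMO_HOST = "https://demo.sourcebot.dev";

// Hypothetical helper (not part of @sourcebot/mcp): choose the Sourcebot host
// from an environment map such as Node's process.env.
// Assumption: an unset or blank SOURCEBOT_HOST means "use the demo instance".
function resolveSourcebotHost(env: Record<string, string | undefined>): string {
  const raw = env["SOURCEBOT_HOST"]?.trim();
  if (!raw) {
    return DEMO_HOST;
  }
  // new URL() throws on malformed input; additionally require http(s).
  const url = new URL(raw);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`SOURCEBOT_HOST must be an http(s) URL, got: ${raw}`);
  }
  // Drop trailing slashes so API paths can be appended uniformly.
  return url.origin + url.pathname.replace(/\/+$/, "");
}

// Example: a local instance, as in the configs above.
const host = resolveSourcebotHost({ SOURCEBOT_HOST: "http://localhost:3000/" });
// host === "http://localhost:3000"
```

Passing the environment in as a parameter, rather than reading a global, keeps the fallback easy to check in isolation.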

Available Tools

search_code

Searches for code that matches the provided search query as a substring by default, or as a regular expression if useRegex is true.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| query | yes | The search pattern to match against code contents. Do not escape quotes in your query. |
| useRegex | no | Whether to use regular expression matching. When false, substring matching is used (default: false). |
| filterByRepos | no | Scope the search to specific repositories. |
| filterByLanguages | no | Scope the search to specific languages. |
| filterByFilepaths | no | Scope the search to specific file paths. |
| caseSensitive | no | Whether the search should be case sensitive (default: false). |
| includeCodeSnippets | no | Whether to include code snippets in the response (default: false). |
| ref | no | Commit SHA, branch, or tag name to search on. If not provided, defaults to the default branch. |
| maxTokens | no | The maximum number of tokens to return (default: 10000). Higher values provide more context but consume more tokens. |
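Under MCP, a client invokes search_code through a JSON-RPC 2.0 tools/call request whose params carry the tool name and its arguments. The TypeScript sketch below types the arguments from the table above and builds such a request. SearchCodeArgs and toolCall are illustrative names, not exports of @sourcebot/mcp; the array types for the filterBy* parameters, and the repository and language names used, are assumptions.

```typescript
// Arguments for search_code, transcribed from the table above.
// Assumption: the filterBy* parameters take arrays of strings.
interface SearchCodeArgs {
  query: string;
  useRegex?: boolean;
  filterByRepos?: string[];
  filterByLanguages?: string[];
  filterByFilepaths?: string[];
  caseSensitive?: boolean;
  includeCodeSnippets?: boolean;
  ref?: string;
  maxTokens?: number;
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Illustrative helper: wrap tool arguments in a tools/call request.
function toolCall<A>(id: number, name: string, args: A): ToolCallRequest<A> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "Where is the useCodeMirrorTheme hook defined?" as a regex search,
// scoped to TypeScript files in one hypothetical repository.
const request = toolCall<SearchCodeArgs>(1, "search_code", {
  query: "(function|const) useCodeMirrorTheme",
  useRegex: true,
  filterByRepos: ["github.com/example-org/example-repo"],
  filterByLanguages: ["TypeScript"],
  includeCodeSnippets: true,
  maxTokens: 4000,
});

// Serialized form that would be sent over the MCP transport.
const payload = JSON.stringify(request);
```

Leaving useRegex unset keeps the default substring matching, which is usually the simpler choice for literal identifiers such as deprecatedApi.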

list_repos

Lists repositories indexed by Sourcebot with optional filtering and pagination.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| query | no | Filter repositories by name (case-insensitive). |
| page | no | Page number for pagination (min 1, default: 1). |
| perPage | no | Results per page for pagination (min 1, max 100, default: 30). |
| sort | no | Sort repositories by 'name' or 'pushed' (most recent commit). Default: 'name'. |
| direction | no | Sort direction: 'asc' or 'desc' (default: 'asc'). |
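Because list_repos caps perPage at 100, listing a large index takes several calls. The sketch below walks pages until a short page comes back. fetchPage is a hypothetical stand-in for one MCP tools/call round trip, and it assumes each call resolves to the repository names on that page in a stable order; the tool's real response format is not described here.

```typescript
// Arguments for list_repos, per the table above.
interface ListReposArgs {
  query?: string;
  page?: number;
  perPage?: number;
  sort?: "name" | "pushed";
  direction?: "asc" | "desc";
}

// Hypothetical stand-in for one tools/call round trip to list_repos.
// Assumption: it resolves to the repository names on the requested page.
type FetchPage = (args: ListReposArgs) => Promise<string[]>;

// Walk pages until one comes back shorter than perPage, which is taken
// to mean there are no further results.
async function listAllRepos(fetchPage: FetchPage, query?: string, perPage = 100): Promise<string[]> {
  if (!Number.isInteger(perPage) || perPage < 1 || perPage > 100) {
    throw new RangeError("perPage must be a whole number from 1 to 100");
  }
  const all: string[] = [];
  for (let page = 1; ; page++) {
    // Sorting by name keeps page boundaries stable while walking.
    const batch = await fetchPage({ query, page, perPage, sort: "name", direction: "asc" });
    all.push(...batch);
    if (batch.length < perPage) {
      return all;
    }
  }
}
```

Sorting by 'pushed' instead could shift repositories between pages if new commits land while the walk is in progress, so 'name' is the safer order for exhaustive listing.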

read_file

Reads the source code for a given file.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| repo | yes | The repository name. |
| path | yes | The path to the file. |
| ref | no | Commit SHA, branch, or tag name to fetch the source code for. If not provided, uses the default branch. |

list_tree

Lists files and directories from a repository path. Can be used as a directory listing tool (depth: 1) or a repo-tree tool (depth > 1).

Parameters
| Name | Required | Description |
|------|----------|-------------|
| repo | yes | The name of the repository to list files from. |
| path | no | Directory path (relative to repo root). If omitted, the repo root is used. |
| ref | no | Commit SHA, branch, or tag name to list files from. If not provided, uses the default branch. |
| depth | no | Number of directory levels to traverse below path (min 1, max 10, default: 1). |
| includeFiles | no | Whether to include file entries in the output (default: true). |
| includeDirectories | no | Whether to include directory entries in the output (default: true). |
| maxEntries | no | Maximum number of entries to return before truncating (min 1, max 10000, default: 1000). |
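An agent can catch out-of-range list_tree arguments before making a call. The sketch below checks the documented bounds and shows the two usage modes (directory listing vs. tree). validateListTreeArgs is hypothetical, the server's own validation may differ, and requiring whole numbers for depth and maxEntries is an assumption.

```typescript
// Arguments for list_tree, per the table above.
interface ListTreeArgs {
  repo: string;
  path?: string;
  ref?: string;
  depth?: number;
  includeFiles?: boolean;
  includeDirectories?: boolean;
  maxEntries?: number;
}

// True when v is omitted (the server default applies) or is a whole number in [min, max].
function inRange(v: number | undefined, min: number, max: number): boolean {
  return v === undefined || (Number.isInteger(v) && v >= min && v <= max);
}

// Hypothetical client-side check; returns a list of problems (empty when valid).
function validateListTreeArgs(args: ListTreeArgs): string[] {
  const problems: string[] = [];
  if (args.repo.trim() === "") problems.push("repo is required");
  if (!inRange(args.depth, 1, 10)) problems.push("depth must be a whole number from 1 to 10");
  if (!inRange(args.maxEntries, 1, 10000)) problems.push("maxEntries must be a whole number from 1 to 10000");
  // Excluding both entry kinds would leave nothing to return.
  if (args.includeFiles === false && args.includeDirectories === false) {
    problems.push("includeFiles and includeDirectories are both false; no entries would be returned");
  }
  return problems;
}

// Directory listing of one folder vs. a directories-only tree three levels deep
// (repository name is a placeholder).
const listing: ListTreeArgs = { repo: "example-org/example-repo", path: "src", depth: 1 };
const skeleton: ListTreeArgs = { repo: "example-org/example-repo", depth: 3, includeFiles: false };
```

The directories-only tree is a cheap way to get a repository's layout before spending tokens on file entries.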

list_commits

Gets a list of commits for a given repository.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| repo | yes | The name of the repository to list commits for. |
| query | no | Search query to filter commits by message content (case-insensitive). |
| since | no | Show commits more recent than this date. Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago'). |
| until | no | Show commits older than this date. Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday'). |
| author | no | Filter commits by author name or email (case-insensitive). |
| ref | no | Commit SHA, branch, or tag name to list commits of. If not provided, uses the default branch. |
| page | no | Page number for pagination (min 1, default: 1). |
| perPage | no | Results per page for pagination (min 1, max 100, default: 50). |
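Relative phrases such as '30 days ago' depend on when the call is made. When an agent needs a request it can repeat later over the same window, it can pin since to an absolute ISO 8601 date instead. The sketch below does this; recentCommitsByAuthor is a hypothetical helper, and the repository and author values are placeholders.

```typescript
// Arguments for list_commits, per the table above.
interface ListCommitsArgs {
  repo: string;
  query?: string;
  since?: string;
  until?: string;
  author?: string;
  ref?: string;
  page?: number;
  perPage?: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Format a Date as an ISO 8601 calendar date (YYYY-MM-DD), in UTC.
function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Hypothetical helper: commits by `author` over the last `days` days,
// with `since` pinned to an absolute date computed from `now`.
function recentCommitsByAuthor(repo: string, author: string, days: number, now: Date = new Date()): ListCommitsArgs {
  const start = new Date(now.getTime() - days * MS_PER_DAY);
  return { repo, author, since: isoDate(start), perPage: 100 };
}

const args = recentCommitsByAuthor(
  "example-org/example-repo",
  "jane@example.com",
  30,
  new Date("2024-03-31T12:00:00Z"),
);
// args.since === "2024-03-01"
```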

list_language_models

Lists the available language models configured on the Sourcebot instance. Use this to discover which models can be specified when calling ask_codebase.

Parameters

This tool takes no parameters.

ask_codebase

Ask a natural language question about the codebase. This tool uses an AI agent to autonomously search code, read files, and find symbol references/definitions to answer your question. Returns a detailed answer in markdown format with code references, plus a link to view the full research session in the Sourcebot web UI.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| query | yes | The query to ask about the codebase. |
| repos | no | The repositories that are accessible to the agent during the chat. If not provided, all repositories are accessible. |
| languageModel | no | The language model to use for answering the question. Object with provider and model. If not provided, defaults to the first model in the config. Use list_language_models to see available options. |
| visibility | no | The visibility of the chat session ('PRIVATE' or 'PUBLIC'). Defaults to PRIVATE for authenticated users and PUBLIC for anonymous users. Set to PUBLIC to make the chat viewable by anyone with the link (useful in shared environments like Slack). |
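A typical flow is to call list_language_models, pick a model, and pass it to ask_codebase. The sketch below assumes list_language_models yields a list of { provider, model } objects matching the languageModel shape described above; its actual response format is not documented here. buildAskArgs is hypothetical: when no preferred model is found it omits languageModel so the server default applies, and it sets visibility only when a shareable link is wanted.

```typescript
// languageModel is described as an object with provider and model.
interface LanguageModel {
  provider: string;
  model: string;
}

// Arguments for ask_codebase, per the table above.
interface AskCodebaseArgs {
  query: string;
  repos?: string[];
  languageModel?: LanguageModel;
  visibility?: "PRIVATE" | "PUBLIC";
}

// Hypothetical helper. `available` is assumed to come from list_language_models.
function buildAskArgs(
  query: string,
  available: LanguageModel[],
  preferredProvider?: string,
  shareable = false,
): AskCodebaseArgs {
  const args: AskCodebaseArgs = { query };
  const chosen = available.find((m) => m.provider === preferredProvider);
  if (chosen) {
    args.languageModel = chosen; // otherwise the server uses its first configured model
  }
  if (shareable) {
    args.visibility = "PUBLIC"; // anyone with the link can view the session
  }
  return args;
}

// Placeholder provider and model names, not real configuration values.
const models: LanguageModel[] = [
  { provider: "provider-a", model: "model-1" },
  { provider: "provider-b", model: "model-2" },
];
const ask = buildAskArgs("Which services call the billing API?", models, "provider-b", true);
```

Omitting visibility when sharing is not needed leaves the documented per-user default (PRIVATE for authenticated users) in effect.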

Supported Code Hosts

Sourcebot supports the following code hosts:

Don't see your code host? Open a feature request.

Future Work

Semantic Search

Currently, Sourcebot only supports regex-based code search (powered by zoekt under the hood). This works well when the agent is searching for something precise and well represented in the source code (e.g., a specific function name or an error string). It works less well for fuzzy searches, where the goal is to find a loosely defined category or concept in the code (e.g., code that verifies JWT tokens). The LLM can approximate this by crafting regex searches that attempt to capture the concept (e.g., it might try a query like "jwt|token|(verify|validate).*(jwt|token)"), but the results are often noisy and include unrelated matches.

Tools like Cursor solve this with embedding models that capture the semantic meaning of code, allowing LLMs to search using natural language. We would like to extend Sourcebot to support semantic search and expose this capability over MCP as a tool (e.g., a semantic_search_code tool). See the GitHub Discussion.
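To see why regex only approximates concept search, the sketch below applies the query from the paragraph above to a few invented lines of code, case-insensitively to match search_code's default of caseSensitive: false. The sample lines are made up for illustration; the last one is meant as a hand-rolled JWT signature check.

```typescript
// The approximation an LLM might craft for "code that verifies JWT tokens".
const approx = /jwt|token|(verify|validate).*(jwt|token)/i;

// Invented sample lines. Only the first and last verify a JWT.
const samples: string[] = [
  "const claims = jwt.verify(authHeader, publicKey);",
  "const csrfToken = generateCsrfToken();",
  "const words = tokenizer.split(sourceText);",
  "if (!checkSignature(header, payload, sig)) throw new AuthError();",
];

const hits = samples.filter((line) => approx.test(line));
// hits: the first three lines. Two are false positives (CSRF token, tokenizer),
// and the signature check on the last line is missed entirely.
```

Note that the third alternative, (verify|validate).*(jwt|token), cannot add any matches: every line it matches already contains jwt or token, so the first two alternatives match it too.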