mcp-huggingface by NimbleBrainInc - MCP Server

Hugging Face MCP Server

MCP server for accessing the Hugging Face Inference API. Run 200,000+ machine learning models for text generation, image generation, text classification, embeddings, and more.

Features

Text Generation: LLMs like Llama-3, Mistral, Gemma
Image Generation: FLUX, Stable Diffusion XL, SD 2.1
Text Classification: Sentiment analysis, topic classification
Token Classification: Named entity recognition, POS tagging
Question Answering: Extract answers from context
Summarization: Condense long text
Translation: 200+ language pairs
Image-to-Text: Image captioning
Image Classification: Classify images into categories
Object Detection: Detect objects with bounding boxes
Text-to-Speech: Convert text to audio
Speech Recognition: Transcribe audio (Whisper)
Embeddings: Get text/sentence embeddings
And more: Fill-mask, sentence similarity

Setup

Prerequisites

Hugging Face account
API token (free or Pro)

Environment Variables

HUGGINGFACE_API_TOKEN (required): Your Hugging Face API token

How to get an API token:

Go to huggingface.co/settings/tokens
Click "New token"
Give it a name and select permissions (read is sufficient for inference)
Copy the token (starts with hf_)
Store as HUGGINGFACE_API_TOKEN
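
As a quick sanity check before starting the server, the token can be read and validated from the environment. This is a minimal sketch: `load_hf_token` is a hypothetical helper and not part of this server, and the `hf_` prefix check simply mirrors the note above.

```python
import os


def load_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if unusable.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUGGINGFACE_API_TOKEN is not set")
    if not token.startswith("hf_"):
        # Tokens created at huggingface.co/settings/tokens start with hf_
        raise RuntimeError("HUGGINGFACE_API_TOKEN does not start with hf_")
    return token
```

Failing fast here turns a later 401 Unauthorized into a clear startup error.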

Available Tools

Text Generation Tools

`text_generation`

Generate text using large language models.

Parameters:

prompt (string, required): Input text prompt
model_id (string, optional): Model ID (default: 'mistralai/Mistral-7B-Instruct-v0.3')
max_new_tokens (int, optional): Maximum tokens to generate
temperature (float, optional): Sampling temperature 0-2 (higher = more random)
top_p (float, optional): Nucleus sampling 0-1
top_k (int, optional): Top-k sampling
repetition_penalty (float, optional): Penalty for repetition
return_full_text (bool, optional): Return prompt + generation (default: False)

Popular models:

meta-llama/Llama-3.2-3B-Instruct - Meta's Llama 3.2
mistralai/Mistral-7B-Instruct-v0.3 - Mistral 7B
google/gemma-2-2b-it - Google Gemma 2
HuggingFaceH4/zephyr-7b-beta - Zephyr 7B
tiiuae/falcon-7b-instruct - Falcon 7B

Example:

result = await text_generation(
    prompt="Write a Python function to calculate fibonacci numbers:",
    model_id="mistralai/Mistral-7B-Instruct-v0.3",
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9
)
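
Since most parameters are optional, a caller may want to forward only the ones that were actually set. The sketch below shows one way to do that. The `build_generation_payload` helper is hypothetical, and the `inputs`/`parameters` body layout is an assumption about the request format, not a description of this server's internals.

```python
def build_generation_payload(prompt, **params):
    """Build a request body, dropping parameters that were left unset (None)."""
    parameters = {name: value for name, value in params.items() if value is not None}
    payload = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    return payload


payload = build_generation_payload(
    "Write a haiku about autumn:",
    max_new_tokens=50,
    temperature=0.7,
    top_k=None,  # unset, so it is omitted from the payload
)
```

Omitting unset values lets the model's own defaults apply instead of sending explicit nulls.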

Classification Tools

`text_classification`

Classify text into categories (sentiment, topics, etc.).

Parameters:

text (string, required): Text to classify
model_id (string, optional): Model ID (default: 'distilbert-base-uncased-finetuned-sst-2-english')

Popular models:

distilbert-base-uncased-finetuned-sst-2-english - Sentiment (positive/negative)
facebook/bart-large-mnli - Zero-shot classification
cardiffnlp/twitter-roberta-base-sentiment-latest - Twitter sentiment
finiteautomata/bertweet-base-sentiment-analysis - Tweet sentiment

Example:

result = await text_classification(
    text="I love this product! It exceeded my expectations.",
    model_id="distilbert-base-uncased-finetuned-sst-2-english"
)
# Returns: [{'label': 'POSITIVE', 'score': 0.9998}]
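
A result shaped like the one above can be reduced to a single decision. The following sketch assumes the result is a list of `{'label', 'score'}` dicts. `top_label` and its `min_score` threshold are hypothetical names chosen for illustration.

```python
def top_label(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if no score reaches min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


sample = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.0002},
]
label = top_label(sample)
```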

`token_classification`

Token-level classification for NER, POS tagging, etc.

Parameters:

text (string, required): Input text
model_id (string, optional): Model ID (default: 'dslim/bert-base-NER')

Popular models:

dslim/bert-base-NER - Named Entity Recognition
Jean-Baptiste/roberta-large-ner-english - Large NER model
dbmdz/bert-large-cased-finetuned-conll03-english - CoNLL-2003 NER

Example:

result = await token_classification(
    text="Apple Inc. is located in Cupertino, California.",
    model_id="dslim/bert-base-NER"
)
# Returns entities: ORG (Apple Inc.), LOC (Cupertino), LOC (California)
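
For document processing it is often convenient to group the detected entities by type. This sketch assumes each entity is a dict with `entity_group` and `word` keys. That shape is an assumption about the response, the sample data is made up, and `group_entities` is a hypothetical helper.

```python
from collections import defaultdict


def group_entities(entities):
    """Group entity texts by their type, preserving the order they appear in."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Illustrative data only, not real model output
sample = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "LOC", "word": "Cupertino", "score": 0.98},
    {"entity_group": "LOC", "word": "California", "score": 0.99},
]
grouped = group_entities(sample)
```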

Question Answering & Text Processing

`question_answering`

Answer questions based on provided context.

Parameters:

question (string, required): Question to answer
context (string, required): Context containing the answer
model_id (string, optional): Model ID (default: 'deepset/roberta-base-squad2')

Popular models:

deepset/roberta-base-squad2 - RoBERTa on SQuAD 2.0
distilbert-base-cased-distilled-squad - DistilBERT on SQuAD

Example:

result = await question_answering(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a landmark in Paris, France. It was built in 1889.",
    model_id="deepset/roberta-base-squad2"
)
# Returns: {'answer': 'Paris, France', 'score': 0.98, 'start': 34, 'end': 47}
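
The `start` and `end` fields can be used to locate the answer inside the original context, for example to highlight it. This sketch assumes they are zero-based character offsets with an exclusive end, as in Python slicing. `extract_span` is a hypothetical helper.

```python
def extract_span(context, result):
    """Slice the answer out of the context using the returned character offsets."""
    return context[result["start"]:result["end"]]


context = "The Eiffel Tower is a landmark in Paris, France. It was built in 1889."
answer = "Paris, France"

# Compute the offsets locally so the example does not depend on a model call
start = context.find(answer)
result = {"answer": answer, "start": start, "end": start + len(answer)}

span = extract_span(context, result)
```

If the sliced span does not match the `answer` field, the offsets probably refer to a different context string than the one being sliced.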

`summarization`

Summarize long text into a shorter version.

Parameters:

text (string, required): Text to summarize
model_id (string, optional): Model ID (default: 'facebook/bart-large-cnn')
max_length (int, optional): Maximum summary length, in tokens
min_length (int, optional): Minimum summary length, in tokens

Popular models:

facebook/bart-large-cnn - BART CNN summarization
google/pegasus-xsum - PEGASUS XSum
sshleifer/distilbart-cnn-12-6 - Distilled BART

Example:

result = await summarization(
    text="Long article text here...",
    model_id="facebook/bart-large-cnn",
    max_length=130,
    min_length=30
)

`translation`

Translate text between languages.

Parameters:

text (string, required): Text to translate
model_id (string, required): Model ID for language pair

Popular models:

Helsinki-NLP/opus-mt-en-es - English to Spanish
Helsinki-NLP/opus-mt-es-en - Spanish to English
Helsinki-NLP/opus-mt-en-fr - English to French
Helsinki-NLP/opus-mt-en-de - English to German
facebook/mbart-large-50-many-to-many-mmt - Multilingual (50 languages)

Example:

result = await translation(
    text="Hello, how are you?",
    model_id="Helsinki-NLP/opus-mt-en-es"
)
# Returns: "Hola, ¿cómo estás?"

Image Generation Tools

`text_to_image`

Generate images from text prompts.

Parameters:

prompt (string, required): Text description of desired image
model_id (string, optional): Model ID (default: 'black-forest-labs/FLUX.1-dev')
negative_prompt (string, optional): What to avoid in image
num_inference_steps (int, optional): Number of denoising steps
guidance_scale (float, optional): How closely to follow prompt

Popular models:

black-forest-labs/FLUX.1-dev - FLUX.1 (high quality)
stabilityai/stable-diffusion-xl-base-1.0 - SDXL
stabilityai/stable-diffusion-2-1 - SD 2.1
runwayml/stable-diffusion-v1-5 - SD 1.5

Example:

result = await text_to_image(
    prompt="A serene mountain landscape at sunset, photorealistic, 8k",
    model_id="black-forest-labs/FLUX.1-dev",
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=7.5
)
# Returns: {'image': 'base64_encoded_image', 'format': 'base64'}

Computer Vision Tools

`image_to_text`

Generate text descriptions from images (captioning).

Parameters:

image_base64 (string, required): Base64 encoded image
model_id (string, optional): Model ID (default: 'Salesforce/blip-image-captioning-large')

Popular models:

Salesforce/blip-image-captioning-large - BLIP large
nlpconnect/vit-gpt2-image-captioning - ViT-GPT2

Example:

result = await image_to_text(
    image_base64="base64_encoded_image_data",
    model_id="Salesforce/blip-image-captioning-large"
)
# Returns: [{'generated_text': 'a dog playing in the park'}]

`image_classification`

Classify images into categories.

Parameters:

image_base64 (string, required): Base64 encoded image
model_id (string, optional): Model ID (default: 'google/vit-base-patch16-224')

Popular models:

google/vit-base-patch16-224 - Vision Transformer
microsoft/resnet-50 - ResNet-50

Example:

result = await image_classification(
    image_base64="base64_encoded_image_data",
    model_id="google/vit-base-patch16-224"
)
# Returns: [{'label': 'golden retriever', 'score': 0.95}, ...]

`object_detection`

Detect objects in images with bounding boxes.

Parameters:

image_base64 (string, required): Base64 encoded image
model_id (string, optional): Model ID (default: 'facebook/detr-resnet-50')

Popular models:

facebook/detr-resnet-50 - DETR with ResNet-50
hustvl/yolos-tiny - YOLOS tiny

Example:

result = await object_detection(
    image_base64="base64_encoded_image_data",
    model_id="facebook/detr-resnet-50"
)
# Returns: [{'label': 'dog', 'score': 0.98, 'box': {...}}, ...]
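
Detection results usually need filtering before use. The sketch below keeps confident detections and computes each box's area. It assumes `box` holds `xmin`, `ymin`, `xmax`, and `ymax` in pixels, which is an assumption about the elided `{...}` above, and `confident_boxes` is a hypothetical helper.

```python
def confident_boxes(detections, min_score=0.9):
    """Keep detections scoring at least min_score and attach the box area in pixels."""
    kept = []
    for det in detections:
        if det["score"] < min_score:
            continue
        box = det["box"]
        width = box["xmax"] - box["xmin"]
        height = box["ymax"] - box["ymin"]
        kept.append({"label": det["label"], "area": width * height})
    return kept


# Illustrative data only, not real model output
sample = [
    {"label": "dog", "score": 0.98, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 70}},
    {"label": "ball", "score": 0.40, "box": {"xmin": 0, "ymin": 0, "xmax": 5, "ymax": 5}},
]
boxes = confident_boxes(sample)
```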

Audio Tools

`text_to_speech`

Convert text to speech audio.

Parameters:

text (string, required): Text to synthesize
model_id (string, optional): Model ID (default: 'facebook/mms-tts-eng')

Popular models:

facebook/mms-tts-eng - MMS TTS English
espnet/kan-bayashi_ljspeech_vits - VITS LJSpeech

Example:

result = await text_to_speech(
    text="Hello, this is a test of text to speech.",
    model_id="facebook/mms-tts-eng"
)
# Returns: {'audio': 'base64_encoded_audio', 'format': 'base64'}

`automatic_speech_recognition`

Transcribe audio to text (speech recognition).

Parameters:

audio_base64 (string, required): Base64 encoded audio
model_id (string, optional): Model ID (default: 'openai/whisper-large-v3')

Popular models:

openai/whisper-large-v3 - Whisper large v3 (best quality)
openai/whisper-medium - Whisper medium (faster)
facebook/wav2vec2-base-960h - Wav2Vec 2.0

Example:

result = await automatic_speech_recognition(
    audio_base64="base64_encoded_audio_data",
    model_id="openai/whisper-large-v3"
)
# Returns: {'text': 'transcribed audio text here'}

Embedding & Similarity Tools

`sentence_similarity`

Compute similarity between sentences.

Parameters:

source_sentence (string, required): Reference sentence
sentences (list, required): List of sentences to compare
model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

sentence-transformers/all-MiniLM-L6-v2 - Fast, good quality
sentence-transformers/all-mpnet-base-v2 - Best quality
BAAI/bge-small-en-v1.5 - BGE small

Example:

result = await sentence_similarity(
    source_sentence="The cat sits on the mat",
    sentences=[
        "A cat is sitting on a mat",
        "The dog runs in the park",
        "Cats are great pets"
    ],
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [0.95, 0.23, 0.65]
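
The returned scores are in the same order as the input sentences, so pairing them back up gives a ranking. A minimal sketch, using the illustrative scores from the example above (`rank_by_similarity` is a hypothetical helper):

```python
def rank_by_similarity(sentences, scores):
    """Pair each sentence with its score and sort from most to least similar."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


sentences = [
    "A cat is sitting on a mat",
    "The dog runs in the park",
    "Cats are great pets",
]
ranked = rank_by_similarity(sentences, [0.95, 0.23, 0.65])
```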

`feature_extraction`

Get embeddings (feature vectors) for text.

Parameters:

text (string, required): Input text
model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

sentence-transformers/all-MiniLM-L6-v2 - 384 dimensions
sentence-transformers/all-mpnet-base-v2 - 768 dimensions
BAAI/bge-small-en-v1.5 - 384 dimensions

Example:

result = await feature_extraction(
    text="This is a sample sentence.",
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [[0.012, -0.034, 0.056, ...]] (384-dimensional vector)
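
Embeddings are typically compared with cosine similarity, which is the basis of embedding-based search. The sketch below is a plain-Python version for two vectors of equal length. `cosine_similarity` is a hypothetical helper, and the short vectors stand in for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

For large collections, a vectorized library or a vector index would be a better fit than this loop.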

`fill_mask`

Fill in masked words in text.

Parameters:

text (string, required): Text containing the model's mask token ([MASK] for BERT-style models; roberta-base uses <mask>)
model_id (string, optional): Model ID (default: 'bert-base-uncased')

Popular models:

bert-base-uncased - BERT base
roberta-base - RoBERTa base
distilbert-base-uncased - DistilBERT

Example:

result = await fill_mask(
    text="Paris is the [MASK] of France.",
    model_id="bert-base-uncased"
)
# Returns: [{'token_str': 'capital', 'score': 0.95}, ...]
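
Mask tokens differ between model families, so a prompt written for BERT will not work unchanged with RoBERTa. One way to handle this is to write prompts with a neutral placeholder and substitute the right token per model. In this sketch, `with_mask`, the `{mask}` placeholder, and the lookup table are all hypothetical and cover only the three models listed above.

```python
# Mask token used by each of the models listed above
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "distilbert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
}


def with_mask(template, model_id, placeholder="{mask}"):
    """Replace the placeholder in template with the mask token for model_id."""
    try:
        token = MASK_TOKENS[model_id]
    except KeyError:
        raise KeyError(f"no known mask token for {model_id}; check its model card") from None
    return template.replace(placeholder, token)


bert_text = with_mask("Paris is the {mask} of France.", "bert-base-uncased")
roberta_text = with_mask("Paris is the {mask} of France.", "roberta-base")
```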

Model Loading & Cold Starts

Important: Models may take 20-60 seconds to load on first request (cold start). Subsequent requests are faster.

Tips:

Use popular models for faster loading
Implement retry logic for timeouts
Consider caching model responses
Use smaller models for faster inference
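
The caching tip can be sketched as a small in-memory cache keyed by tool name and arguments. `ResponseCache` is hypothetical, and `fake_tool` stands in for a real tool call. Caching only makes sense for calls whose output should not vary, so sampled text generation is a poor fit.

```python
import asyncio
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by tool name and keyword arguments."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def make_key(tool_name, kwargs):
        # sort_keys makes the key independent of keyword argument order
        raw = json.dumps({"tool": tool_name, "args": kwargs}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    async def get_or_call(self, tool, **kwargs):
        """Return the cached result, calling the async tool only on a cache miss."""
        key = self.make_key(tool.__name__, kwargs)
        if key not in self._store:
            self._store[key] = await tool(**kwargs)
        return self._store[key]


async def fake_tool(text):
    """Hypothetical stand-in for a real tool such as feature_extraction."""
    return text.upper()


cache = ResponseCache()
first = asyncio.run(cache.get_or_call(fake_tool, text="hello"))
second = asyncio.run(cache.get_or_call(fake_tool, text="hello"))  # served from cache
```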

Rate Limits

Free Tier

Rate limited to prevent abuse
Suitable for testing and small projects
May experience queuing during high load

Pro Subscription ($9/month)

No rate limits
Priority access to models
Faster inference
No queuing

Visit huggingface.co/pricing for details.

Base64 Encoding

For images and audio, you need to provide base64 encoded data:

Python example:

import base64

# Encode image
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')

# Encode audio
with open("audio.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Decode image response (response is the dict returned by text_to_image)
image_bytes = base64.b64decode(response['image'])
with open("generated.jpg", "wb") as f:
    f.write(image_bytes)
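
The same steps can be wrapped in two small helpers that work for both images and audio. This is a sketch using only the standard library. `encode_file` and `decode_to_file` are hypothetical names.

```python
import base64
from pathlib import Path


def encode_file(path):
    """Read a binary file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def decode_to_file(data_base64, path):
    """Decode a base64 string and write the raw bytes to path."""
    Path(path).write_bytes(base64.b64decode(data_base64))
```

Usage would look like `image_base64 = encode_file("image.jpg")` before calling a vision tool, and `decode_to_file(result["image"], "generated.jpg")` after `text_to_image`.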

Parameter Tuning

Text Generation

temperature (0-2): Higher = more creative/random, Lower = more focused/deterministic
top_p (0-1): Nucleus sampling, typically 0.9-0.95
top_k: Number of highest probability tokens to keep
repetition_penalty: Penalize repeated tokens (>1.0 reduces repetition)
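
To make the effect of `temperature` and `top_p` concrete, the sketch below applies both to a toy three-token distribution. It illustrates the general sampling techniques and is not the code any particular model runs. The function names and logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(logits, 0.5)  # most mass on the top token
neutral = softmax_with_temperature(logits, 1.0)
diverse = softmax_with_temperature(logits, 2.0)  # flatter distribution
nucleus = top_p_filter([0.5, 0.3, 0.2], top_p=0.75)  # drops the least likely token
```

Lower temperatures push probability toward the top token, while `top_p` removes the low-probability tail before sampling.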

Image Generation

guidance_scale (1-20): Higher = follows prompt more strictly (typical: 7-7.5)
num_inference_steps: More steps = higher quality but slower (typical: 20-50)
negative_prompt: Describe what you don't want in the image

Error Handling

Common errors:

503 Service Unavailable: Model is loading (cold start), retry after 20-60 seconds
401 Unauthorized: Invalid or missing API token
429 Too Many Requests: Rate limit exceeded (upgrade to Pro)
400 Bad Request: Invalid parameters or model ID
504 Gateway Timeout: Model took too long to respond

Retry logic example:

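import asyncio

import httpx

async def generate_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await text_generation(prompt=prompt)
        except httpx.HTTPStatusError as e:
            # Retry only on 503 (model loading) while attempts remain
            if e.response.status_code == 503 and attempt < max_retries - 1:
                await asyncio.sleep(20)  # Wait for model to load without blocking the event loop
                continue
            raise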
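
A more general variant retries with exponential backoff instead of a fixed wait. The sketch below has no HTTP dependency: `ToolHTTPError` is a hypothetical stand-in for whatever exception carries the status code, and treating 429 and 504 as retryable alongside 503 is an assumption, not documented behavior.

```python
import asyncio

# Assumption: these statuses are transient and worth retrying
RETRYABLE_STATUS = {429, 503, 504}


class ToolHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


async def call_with_backoff(call, max_retries=3, base_delay=1.0, sleep=asyncio.sleep):
    """Await call(), retrying transient errors with delays of base_delay * 2**attempt."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return await call()
        except ToolHTTPError as exc:
            last_attempt = attempt == max_retries - 1
            if exc.status_code not in RETRYABLE_STATUS or last_attempt:
                raise
            await sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so tests can inject a no-op. For cold starts, a `base_delay` closer to the 20-second load time would be more realistic than the default.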

Finding Models

Browse models:

Visit huggingface.co/models
Filter by task (Text Generation, Image Generation, etc.)
Sort by downloads, likes, or trending
Check model card for usage examples

Popular categories:

Text Generation: 50,000+ models
Text Classification: 30,000+ models
Image Generation: 10,000+ models
Translation: 5,000+ models
Embeddings: 3,000+ models

Best Practices

Use popular models: Faster loading and better maintained
Implement timeouts: Set appropriate timeouts (60-120 seconds)
Cache responses: Store results to reduce API calls
Handle cold starts: Implement retry logic for 503 errors
Monitor usage: Track API calls and costs
Test locally: Use Hugging Face Transformers library for testing
Read model cards: Understand model capabilities and limitations
Optimize parameters: Tune settings for your use case
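
The timeout recommendation can be applied on the caller's side with `asyncio.wait_for`. This is a minimal sketch: `call_with_timeout` is a hypothetical wrapper, and the 90-second default is just a value inside the 60-120 second range suggested above.

```python
import asyncio


async def call_with_timeout(make_call, timeout_seconds=90.0):
    """Await make_call() but give up after timeout_seconds."""
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Re-raise as the built-in TimeoutError with a clearer message
        raise TimeoutError(f"tool call exceeded {timeout_seconds} seconds") from None
```

A call would then look like `await call_with_timeout(lambda: text_generation(prompt="Hello"))`. Passing a factory rather than a coroutine keeps the wrapper reusable across retries.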

Use Cases

Chatbots: LLM-powered conversational AI
Content Generation: Blog posts, articles, creative writing
Image Creation: Art, illustrations, product images
Sentiment Analysis: Customer feedback analysis
Translation: Multi-language support
Transcription: Meeting notes, podcast transcripts
Semantic Search: Embedding-based search
Data Extraction: NER for document processing
Content Moderation: Text and image classification
