mcp-huggingface

NimbleBrainInc/mcp-huggingface

3.2

If you are the rightful owner of mcp-huggingface and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

Hugging Face MCP Server provides access to a wide range of machine learning models via the Hugging Face Inference API, supporting tasks like text generation, image generation, and more.

Tools
5
Resources
0
Prompts
0

Hugging Face MCP Server

MCP server for accessing the Hugging Face Inference API. Run 200,000+ machine learning models including LLMs, image generation, text classification, embeddings, and more.

Features

  • Text Generation: LLMs like Llama-3, Mistral, Gemma
  • Image Generation: FLUX, Stable Diffusion XL, SD 2.1
  • Text Classification: Sentiment analysis, topic classification
  • Token Classification: Named entity recognition, POS tagging
  • Question Answering: Extract answers from context
  • Summarization: Condense long text
  • Translation: 200+ language pairs
  • Image-to-Text: Image captioning
  • Image Classification: Classify images into categories
  • Object Detection: Detect objects with bounding boxes
  • Text-to-Speech: Convert text to audio
  • Speech Recognition: Transcribe audio (Whisper)
  • Embeddings: Get text/sentence embeddings
  • And more: Fill-mask, sentence similarity

Setup

Prerequisites

  • Hugging Face account
  • API token (free or Pro)

Environment Variables

  • HUGGINGFACE_API_TOKEN (required): Your Hugging Face API token

How to get an API token:

  1. Go to huggingface.co/settings/tokens
  2. Click "New token"
  3. Give it a name and select permissions (read is sufficient for inference)
  4. Copy the token (starts with hf_)
  5. Store as HUGGINGFACE_API_TOKEN

Available Tools

Text Generation Tools

text_generation

Generate text using large language models.

Parameters:

  • prompt (string, required): Input text prompt
  • model_id (string, optional): Model ID (default: 'mistralai/Mistral-7B-Instruct-v0.3')
  • max_new_tokens (int, optional): Maximum tokens to generate
  • temperature (float, optional): Sampling temperature 0-2 (higher = more random)
  • top_p (float, optional): Nucleus sampling 0-1
  • top_k (int, optional): Top-k sampling
  • repetition_penalty (float, optional): Penalty for repetition
  • return_full_text (bool, optional): Return prompt + generation (default: False)

Popular models:

  • meta-llama/Llama-3.2-3B-Instruct - Meta's Llama 3.2
  • mistralai/Mistral-7B-Instruct-v0.3 - Mistral 7B
  • google/gemma-2-2b-it - Google Gemma 2
  • HuggingFaceH4/zephyr-7b-beta - Zephyr 7B
  • tiiuae/falcon-7b-instruct - Falcon 7B

Example:

result = await text_generation(
    prompt="Write a Python function to calculate fibonacci numbers:",
    model_id="mistralai/Mistral-7B-Instruct-v0.3",
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9
)

Classification Tools

text_classification

Classify text into categories (sentiment, topics, etc.).

Parameters:

  • text (string, required): Text to classify
  • model_id (string, optional): Model ID (default: 'distilbert-base-uncased-finetuned-sst-2-english')

Popular models:

  • distilbert-base-uncased-finetuned-sst-2-english - Sentiment (positive/negative)
  • facebook/bart-large-mnli - Zero-shot classification
  • cardiffnlp/twitter-roberta-base-sentiment-latest - Twitter sentiment
  • finiteautomata/bertweet-base-sentiment-analysis - Tweet sentiment

Example:

result = await text_classification(
    text="I love this product! It exceeded my expectations.",
    model_id="distilbert-base-uncased-finetuned-sst-2-english"
)
# Returns: [{'label': 'POSITIVE', 'score': 0.9998}]
token_classification

Token-level classification for NER, POS tagging, etc.

Parameters:

  • text (string, required): Input text
  • model_id (string, optional): Model ID (default: 'dslim/bert-base-NER')

Popular models:

  • dslim/bert-base-NER - Named Entity Recognition
  • Jean-Baptiste/roberta-large-ner-english - Large NER model
  • dbmdz/bert-large-cased-finetuned-conll03-english - CoNLL-2003 NER

Example:

result = await token_classification(
    text="Apple Inc. is located in Cupertino, California.",
    model_id="dslim/bert-base-NER"
)
# Returns entities: ORG (Apple Inc.), LOC (Cupertino), LOC (California)

Question Answering & Text Processing

question_answering

Answer questions based on provided context.

Parameters:

  • question (string, required): Question to answer
  • context (string, required): Context containing the answer
  • model_id (string, optional): Model ID (default: 'deepset/roberta-base-squad2')

Popular models:

  • deepset/roberta-base-squad2 - RoBERTa on SQuAD 2.0
  • distilbert-base-cased-distilled-squad - DistilBERT on SQuAD

Example:

result = await question_answering(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a landmark in Paris, France. It was built in 1889.",
    model_id="deepset/roberta-base-squad2"
)
# Returns: {'answer': 'Paris, France', 'score': 0.98, 'start': 35, 'end': 48}
summarization

Summarize long text into shorter version.

Parameters:

  • text (string, required): Text to summarize
  • model_id (string, optional): Model ID (default: 'facebook/bart-large-cnn')
  • max_length (int, optional): Maximum summary length
  • min_length (int, optional): Minimum summary length

Popular models:

  • facebook/bart-large-cnn - BART CNN summarization
  • google/pegasus-xsum - PEGASUS XSum
  • sshleifer/distilbart-cnn-12-6 - Distilled BART

Example:

result = await summarization(
    text="Long article text here...",
    model_id="facebook/bart-large-cnn",
    max_length=130,
    min_length=30
)
translation

Translate text between languages.

Parameters:

  • text (string, required): Text to translate
  • model_id (string, required): Model ID for language pair

Popular models:

  • Helsinki-NLP/opus-mt-en-es - English to Spanish
  • Helsinki-NLP/opus-mt-es-en - Spanish to English
  • Helsinki-NLP/opus-mt-en-fr - English to French
  • Helsinki-NLP/opus-mt-en-de - English to German
  • facebook/mbart-large-50-many-to-many-mmt - Multilingual (50 languages)

Example:

result = await translation(
    text="Hello, how are you?",
    model_id="Helsinki-NLP/opus-mt-en-es"
)
# Returns: "Hola, ¿cómo estás?"

Image Generation Tools

text_to_image

Generate images from text prompts.

Parameters:

  • prompt (string, required): Text description of desired image
  • model_id (string, optional): Model ID (default: 'black-forest-labs/FLUX.1-dev')
  • negative_prompt (string, optional): What to avoid in image
  • num_inference_steps (int, optional): Number of denoising steps
  • guidance_scale (float, optional): How closely to follow prompt

Popular models:

  • black-forest-labs/FLUX.1-dev - FLUX.1 (high quality)
  • stabilityai/stable-diffusion-xl-base-1.0 - SDXL
  • stabilityai/stable-diffusion-2-1 - SD 2.1
  • runwayml/stable-diffusion-v1-5 - SD 1.5

Example:

result = await text_to_image(
    prompt="A serene mountain landscape at sunset, photorealistic, 8k",
    model_id="black-forest-labs/FLUX.1-dev",
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=7.5
)
# Returns: {'image': 'base64_encoded_image', 'format': 'base64'}

Computer Vision Tools

image_to_text

Generate text descriptions from images (captioning).

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'Salesforce/blip-image-captioning-large')

Popular models:

  • Salesforce/blip-image-captioning-large - BLIP large
  • nlpconnect/vit-gpt2-image-captioning - ViT-GPT2

Example:

result = await image_to_text(
    image_base64="base64_encoded_image_data",
    model_id="Salesforce/blip-image-captioning-large"
)
# Returns: [{'generated_text': 'a dog playing in the park'}]
image_classification

Classify images into categories.

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'google/vit-base-patch16-224')

Popular models:

  • google/vit-base-patch16-224 - Vision Transformer
  • microsoft/resnet-50 - ResNet-50

Example:

result = await image_classification(
    image_base64="base64_encoded_image_data",
    model_id="google/vit-base-patch16-224"
)
# Returns: [{'label': 'golden retriever', 'score': 0.95}, ...]
object_detection

Detect objects in images with bounding boxes.

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'facebook/detr-resnet-50')

Popular models:

  • facebook/detr-resnet-50 - DETR with ResNet-50
  • hustvl/yolos-tiny - YOLOS tiny

Example:

result = await object_detection(
    image_base64="base64_encoded_image_data",
    model_id="facebook/detr-resnet-50"
)
# Returns: [{'label': 'dog', 'score': 0.98, 'box': {...}}, ...]

Audio Tools

text_to_speech

Convert text to speech audio.

Parameters:

  • text (string, required): Text to synthesize
  • model_id (string, optional): Model ID (default: 'facebook/mms-tts-eng')

Popular models:

  • facebook/mms-tts-eng - MMS TTS English
  • espnet/kan-bayashi_ljspeech_vits - VITS LJSpeech

Example:

result = await text_to_speech(
    text="Hello, this is a test of text to speech.",
    model_id="facebook/mms-tts-eng"
)
# Returns: {'audio': 'base64_encoded_audio', 'format': 'base64'}
automatic_speech_recognition

Transcribe audio to text (speech recognition).

Parameters:

  • audio_base64 (string, required): Base64 encoded audio
  • model_id (string, optional): Model ID (default: 'openai/whisper-large-v3')

Popular models:

  • openai/whisper-large-v3 - Whisper large v3 (best quality)
  • openai/whisper-medium - Whisper medium (faster)
  • facebook/wav2vec2-base-960h - Wav2Vec 2.0

Example:

result = await automatic_speech_recognition(
    audio_base64="base64_encoded_audio_data",
    model_id="openai/whisper-large-v3"
)
# Returns: {'text': 'transcribed audio text here'}

Embedding & Similarity Tools

sentence_similarity

Compute similarity between sentences.

Parameters:

  • source_sentence (string, required): Reference sentence
  • sentences (list, required): List of sentences to compare
  • model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

  • sentence-transformers/all-MiniLM-L6-v2 - Fast, good quality
  • sentence-transformers/all-mpnet-base-v2 - Best quality
  • BAAI/bge-small-en-v1.5 - BGE small

Example:

result = await sentence_similarity(
    source_sentence="The cat sits on the mat",
    sentences=[
        "A cat is sitting on a mat",
        "The dog runs in the park",
        "Cats are great pets"
    ],
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [0.95, 0.23, 0.65]
feature_extraction

Get embeddings (feature vectors) for text.

Parameters:

  • text (string, required): Input text
  • model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

  • sentence-transformers/all-MiniLM-L6-v2 - 384 dimensions
  • sentence-transformers/all-mpnet-base-v2 - 768 dimensions
  • BAAI/bge-small-en-v1.5 - 384 dimensions

Example:

result = await feature_extraction(
    text="This is a sample sentence.",
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [[0.012, -0.034, 0.056, ...]] (384-dimensional vector)
fill_mask

Fill in masked words in text.

Parameters:

  • text (string, required): Text with [MASK] token
  • model_id (string, optional): Model ID (default: 'bert-base-uncased')

Popular models:

  • bert-base-uncased - BERT base
  • roberta-base - RoBERTa base
  • distilbert-base-uncased - DistilBERT

Example:

result = await fill_mask(
    text="Paris is the [MASK] of France.",
    model_id="bert-base-uncased"
)
# Returns: [{'token_str': 'capital', 'score': 0.95}, ...]

Model Loading & Cold Starts

Important: Models may take 20-60 seconds to load on first request (cold start). Subsequent requests are faster.

Tips:

  • Use popular models for faster loading
  • Implement retry logic for timeouts
  • Consider caching model responses
  • Use smaller models for faster inference

Rate Limits

Free Tier

  • Rate limited to prevent abuse
  • Suitable for testing and small projects
  • May experience queuing during high load

Pro Subscription ($9/month)

  • No rate limits
  • Priority access to models
  • Faster inference
  • No queuing

Visit huggingface.co/pricing for details.

Base64 Encoding

For images and audio, you need to provide base64 encoded data:

Python example:

import base64

# Encode image
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')

# Encode audio
with open("audio.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Decode image response
image_bytes = base64.b64decode(response['image'])
with open("generated.jpg", "wb") as f:
    f.write(image_bytes)

Parameter Tuning

Text Generation

  • temperature (0-2): Higher = more creative/random, Lower = more focused/deterministic
  • top_p (0-1): Nucleus sampling, typically 0.9-0.95
  • top_k: Number of highest probability tokens to keep
  • repetition_penalty: Penalize repeated tokens (>1.0 reduces repetition)

Image Generation

  • guidance_scale (1-20): Higher = follows prompt more strictly (typical: 7-7.5)
  • num_inference_steps: More steps = higher quality but slower (typical: 20-50)
  • negative_prompt: Describe what you don't want in the image

Error Handling

Common errors:

  • 503 Service Unavailable: Model is loading (cold start), retry after 20-60 seconds
  • 401 Unauthorized: Invalid or missing API token
  • 429 Too Many Requests: Rate limit exceeded (upgrade to Pro)
  • 400 Bad Request: Invalid parameters or model ID
  • 504 Gateway Timeout: Model took too long to respond

Retry logic example:

import time

max_retries = 3
for attempt in range(max_retries):
    try:
        result = await text_generation(prompt="Hello")
        break
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 503 and attempt < max_retries - 1:
            time.sleep(20)  # Wait for model to load
            continue
        raise

Finding Models

Browse models:

  • Visit huggingface.co/models
  • Filter by task (Text Generation, Image Generation, etc.)
  • Sort by downloads, likes, or trending
  • Check model card for usage examples

Popular categories:

  • Text Generation: 50,000+ models
  • Text Classification: 30,000+ models
  • Image Generation: 10,000+ models
  • Translation: 5,000+ models
  • Embeddings: 3,000+ models

Best Practices

  1. Use popular models: Faster loading and better maintained
  2. Implement timeouts: Set appropriate timeouts (60-120 seconds)
  3. Cache responses: Store results to reduce API calls
  4. Handle cold starts: Implement retry logic for 503 errors
  5. Monitor usage: Track API calls and costs
  6. Test locally: Use Hugging Face Transformers library for testing
  7. Read model cards: Understand model capabilities and limitations
  8. Optimize parameters: Tune settings for your use case

Use Cases

  • Chatbots: LLM-powered conversational AI
  • Content Generation: Blog posts, articles, creative writing
  • Image Creation: Art, illustrations, product images
  • Sentiment Analysis: Customer feedback analysis
  • Translation: Multi-language support
  • Transcription: Meeting notes, podcast transcripts
  • Semantic Search: Embedding-based search
  • Data Extraction: NER for document processing
  • Content Moderation: Text and image classification

API Documentation

Support