
🔍 LinkedIn Profile Scraper - MCP Server

An advanced Model Context Protocol (MCP) server for automated LinkedIn profile discovery and data extraction. Search for professionals by occupation, extract detailed profile information, and export to Excel - all through a unified API.


✨ Features

🎯 Intelligent Profile Discovery

  • Google Search Integration: Uses Serper API to find LinkedIn profiles
  • Multi-Query Strategy: Employs multiple search patterns for comprehensive results
  • Smart URL Validation: Filters and validates LinkedIn profile URLs
  • Profession-Based Search: Target specific roles (AI Engineer, HR Manager, etc.)
  • Batch Processing: Search multiple professions in one operation
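The URL-validation step above can be sketched as follows. This is an illustrative helper, not the repository's actual code; the regex and function name are assumptions:

```python
import re

# Hypothetical helper: keep only canonical LinkedIn profile URLs and drop duplicates.
PROFILE_RE = re.compile(r"^https?://(?:www\.)?linkedin\.com/in/[A-Za-z0-9\-_%]+$")

def validate_profile_urls(urls):
    """Filter a list of URLs down to unique, valid LinkedIn profile links."""
    seen = set()
    valid = []
    for url in urls:
        url = url.split("?")[0].rstrip("/")  # strip query strings and trailing slash
        if PROFILE_RE.match(url) and url not in seen:
            seen.add(url)
            valid.append(url)
    return valid
```

Company pages, directories, and duplicate hits from the multi-query strategy are all filtered out before any profile data is fetched.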

📊 Comprehensive Data Extraction

  • Profile Information: Name, headline, location, industry
  • Current Position: Job title, company, location, dates
  • Contact Details: Email, phone (when available), website
  • Professional Stats: Connections count, followers count
  • Skills: Top skills and expertise areas
  • Education: Schools, degrees, fields of study
  • Experience History: Complete work history

🔧 MCP Server Tools

  • search_profession_profiles: Find profiles for a single profession
  • search_multiple_professions: Batch search across multiple roles
  • export_to_excel: Save data to formatted Excel spreadsheet
  • get_current_results: View summary of collected profiles
  • clear_results: Reset collected data

💾 Data Export

  • Excel Format: Professional formatted spreadsheets
  • Timestamp: Automatic file naming with timestamps
  • Custom Naming: Optional custom filenames
  • Rich Data: 15+ fields per profile
  • Pandas Integration: Easy data manipulation
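A minimal sketch of the timestamped-export behavior with pandas and openpyxl (the actual `export_to_excel` tool may differ; `export_profiles` is an illustrative name):

```python
from datetime import datetime
import pandas as pd

def export_profiles(profiles, filename=None):
    """Write collected profile dicts to an .xlsx file, auto-naming with a timestamp."""
    if filename is None:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"linkedin_profiles_{stamp}.xlsx"
    df = pd.DataFrame(profiles)
    df.to_excel(filename, index=False)  # pandas delegates .xlsx writing to openpyxl
    return filename
```

Passing `filename=None` yields names like `linkedin_profiles_20250115_103000.xlsx`, so repeated exports never overwrite each other.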

📋 Requirements

System Requirements

  • Python 3.8 or higher
  • Internet connection
  • API keys (RapidAPI, Serper)

Python Dependencies

httpx>=0.24.0
pandas>=2.0.0
openpyxl>=3.1.0
python-dotenv>=1.0.0
mcp>=0.1.0
fastmcp>=0.1.0

API Requirements

  1. RapidAPI Account - For LinkedIn Profile Data API
  2. Serper API Key - For Google search functionality

🔧 Installation

1. Clone or Download

git clone https://github.com/yourusername/linkedin-scraper-mcp.git
cd linkedin-scraper-mcp

2. Install Dependencies

pip install httpx pandas openpyxl python-dotenv mcp fastmcp

Or use requirements.txt:

pip install -r requirements.txt

3. Set Up API Keys

Get RapidAPI Key
  1. Visit RapidAPI.com
  2. Sign up for a free account
  3. Subscribe to Fresh LinkedIn Profile Data API
  4. Copy your API key from the dashboard
Get Serper API Key
  1. Visit Serper.dev
  2. Sign up for an account (free tier available)
  3. Get your API key from the dashboard

4. Configure Environment Variables

Create a .env file in the project directory:

RAPIDAPI_KEY=your_rapidapi_key_here
SERPER_API_KEY=your_serper_api_key_here

Security Note: Never commit your .env file to version control!

Add to .gitignore:

.env
*.xlsx
__pycache__/
*.pyc

🚀 Usage

Running as MCP Server

python linkedin_scraper.py

The server will start in stdio mode, ready to receive MCP tool calls.

Running as Standalone Script

Edit the `__main__` block at the bottom so it runs the test function instead of the MCP server:

if __name__ == "__main__":
    # Run the standalone test instead of the server
    asyncio.run(test_enhanced_scraper())

    # Re-enable when running as an MCP server
    # mcp.run(transport="stdio")

Then run:

python linkedin_scraper.py

Using MCP Tools

1. Search Single Profession
# Find AI Engineers
result = await search_profession_profiles(
    profession="AI Engineer",
    max_profiles=10
)
2. Search Multiple Professions
# Find multiple roles at once
result = await search_multiple_professions(
    professions="AI Engineer,HR Manager,Startup Founder",
    max_profiles_per_profession=5
)
3. Export to Excel
# Export with auto-generated filename
result = await export_to_excel()

# Export with custom filename
result = await export_to_excel(filename="my_contacts.xlsx")
4. Get Current Results
# View summary of collected data
summary = await get_current_results()
print(summary)
5. Clear Data
# Clear all collected profiles
result = await clear_results()

📊 Output Format

Excel Spreadsheet Columns

| Column | Description | Example |
|--------|-------------|---------|
| Name | Full name | John Doe |
| LinkedIn Profile | Profile URL | linkedin.com/in/johndoe |
| Email | Email address | john@example.com |
| Phone | Phone number | +1-555-0123 |
| Headline | Professional headline | Senior AI Engineer at Tech Corp |
| Current Company | Current employer | Tech Corp |
| Current Position | Current job title | Senior AI Engineer |
| Location | Geographic location | San Francisco, CA |
| Industry | Industry sector | Computer Software |
| Website | Personal/company website | johndoe.com |
| Connections | Connection count | 500+ |
| Followers | Follower count | 1,234 |
| Skills | Top skills (comma-separated) | Python, ML, TensorFlow |
| Profession Searched | Search query used | AI Engineer |
| Search Rank | Result position | 1 |
| Scraped At | Timestamp | 2025-01-15 10:30:00 |

JSON Response Format

{
  "full_name": "John Doe",
  "first_name": "John",
  "last_name": "Doe",
  "headline": "Senior AI Engineer | Machine Learning Expert",
  "location": "San Francisco Bay Area",
  "country": "United States",
  "industry": "Computer Software",
  "linkedin_url": "https://linkedin.com/in/johndoe",
  "profile_id": "johndoe",
  "website": "https://johndoe.com",
  "email": "john@example.com",
  "phone": "+1-555-0123",
  "current_position": {
    "title": "Senior AI Engineer",
    "company": "Tech Corp",
    "location": "San Francisco, CA",
    "start_date": "2023-01",
    "end_date": ""
  },
  "company": "Tech Corp",
  "skills": ["Python", "Machine Learning", "TensorFlow", "PyTorch"],
  "connections_count": 500,
  "followers_count": 1234,
  "education": [
    {
      "school": "Stanford University",
      "degree": "Master of Science",
      "field_of_study": "Computer Science"
    }
  ],
  "profession_searched": "AI Engineer",
  "search_rank": 1,
  "scrape_timestamp": "2025-01-15T10:30:00"
}
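Flattening this nested response into one spreadsheet row might look like the following sketch (field names follow the JSON example above; the helper name is illustrative):

```python
def profile_to_row(profile: dict) -> dict:
    """Flatten a nested profile JSON into a single flat row for the Excel export."""
    position = profile.get("current_position") or {}
    return {
        "Name": profile.get("full_name", ""),
        "LinkedIn Profile": profile.get("linkedin_url", ""),
        "Email": profile.get("email", "Not Available"),
        "Current Company": position.get("company", ""),
        "Current Position": position.get("title", ""),
        "Skills": ", ".join(profile.get("skills", [])),  # list -> comma-separated string
        "Connections": profile.get("connections_count", 0),
    }
```

Nested objects like `current_position` and list fields like `skills` are the only parts that need special handling; everything else maps one-to-one onto a column.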

⚙️ Configuration

Adjust Search Parameters

Number of Search Queries

Modify the search_queries list in search_google_for_linkedin_profiles:

search_queries = [
    f'site:linkedin.com/in "{profession}" -dir',
    f'"{profession}" LinkedIn profile site:linkedin.com/in',
    f'linkedin.com/in {profession} professional',
    f'{profession} site:linkedin.com/in USA',  # Add more
    f'intitle:"{profession}" site:linkedin.com/in'  # Add more
]
Rate Limiting

Adjust delays between requests:

# In process_profession
await asyncio.sleep(2)  # Delay between profiles (default: 2 seconds)

# In search_multiple_professions
await asyncio.sleep(3)  # Delay between professions (default: 3 seconds)

Customize Data Fields

Include More LinkedIn Data

Modify params in get_linkedin_data:

params = {
    "linkedin_url": linkedin_url,
    "include_skills": "true",
    "include_certifications": "true",  # Changed to true
    "include_publications": "true",     # Changed to true
    "include_honors": "true",           # Changed to true
    "include_volunteers": "true",       # Changed to true
    "include_projects": "true",         # Changed to true
}
Customize Excel Columns

Edit the save_to_excel method:

row = {
    'Name': result.get('full_name', ''),
    'LinkedIn Profile': result.get('linkedin_url', ''),
    # Add custom columns
    'Years of Experience': calculate_experience(result),
    'Education Level': get_highest_degree(result),
    'Custom Field': result.get('custom_data', '')
}

Change Search Location

Modify the Serper API payload:

payload = {
    'q': query,
    'num': num_results,
    'hl': 'en',
    'gl': 'us',  # Change country code (us, uk, ca, au, etc.)
    'location': 'San Francisco, CA'  # Add specific location
}

🏗️ Architecture

Component Diagram

┌─────────────────────────────────────────────┐
│           MCP Server Interface               │
│  • search_profession_profiles                │
│  • search_multiple_professions               │
│  • export_to_excel                           │
│  • get_current_results                       │
│  • clear_results                             │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│         LinkedInScraper Class                │
│  • Profile discovery                         │
│  • Data extraction                           │
│  • Result management                         │
└──────────────────┬──────────────────────────┘
                   │
         ┌─────────┴─────────┐
         │                   │
         ▼                   ▼
┌─────────────────┐  ┌──────────────────┐
│  Serper API     │  │  RapidAPI        │
│  (Google Search)│  │  (LinkedIn Data) │
└─────────────────┘  └──────────────────┘
         │                   │
         └─────────┬─────────┘
                   │
                   ▼
         ┌─────────────────┐
         │  Data Processing │
         │  • Extraction    │
         │  • Validation    │
         │  • Formatting    │
         └────────┬─────────┘
                  │
                  ▼
         ┌─────────────────┐
         │  Excel Export    │
         │  (Pandas/OpenPyXL)│
         └──────────────────┘

Data Flow

1. User Request
   ↓
2. Search Google (Serper API)
   • Multiple search queries
   • Extract LinkedIn URLs
   ↓
3. Validate URLs
   • Check format
   • Remove duplicates
   ↓
4. Fetch Profile Data (RapidAPI)
   • Get detailed profile info
   • Rate limiting (2s delay)
   ↓
5. Extract Contact Info
   • Parse JSON response
   • Extract email patterns
   • Structure data
   ↓
6. Store Results
   • Add to results list
   • Track metadata
   ↓
7. Export to Excel
   • Format as DataFrame
   • Save to .xlsx file
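The flow above can be condensed into a minimal orchestration sketch. The `search`, `fetch`, and `export` callables stand in for the Serper, RapidAPI, and Excel steps; none of these names come from the repository:

```python
import asyncio

async def run_pipeline(profession, search, fetch, export, max_profiles=10):
    """Sketch of the flow above: search -> validate -> fetch -> export."""
    urls = await search(profession)                  # step 2: Google search via Serper
    urls = list(dict.fromkeys(urls))[:max_profiles]  # step 3: dedupe, cap the batch
    results = []
    for url in urls:
        results.append(await fetch(url))             # step 4: profile data via RapidAPI
        await asyncio.sleep(0)                       # placeholder for the 2 s rate-limit delay
    return export(results)                           # step 7: Excel export
```

Keeping the three external services behind plain callables like this also makes the pipeline easy to unit-test with fakes.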

🎯 Use Cases

1. Recruitment & Talent Acquisition

# Find potential candidates
await search_multiple_professions(
    professions="Senior Python Developer,Machine Learning Engineer,Data Scientist",
    max_profiles_per_profession=20
)
await export_to_excel("tech_candidates.xlsx")

2. Sales & Lead Generation

# Find decision makers
await search_multiple_professions(
    professions="CTO,VP Engineering,Head of AI",
    max_profiles_per_profession=15
)
await export_to_excel("tech_leads.xlsx")

3. Market Research

# Research industry professionals
await search_profession_profiles(
    profession="Blockchain Developer",
    max_profiles=50
)
await export_to_excel("blockchain_market_research.xlsx")

4. Networking & Partnership

# Find potential collaborators
await search_multiple_professions(
    professions="Startup Founder,Angel Investor,Venture Capitalist",
    max_profiles_per_profession=10
)
await export_to_excel("potential_partners.xlsx")

5. Competitive Intelligence

# Research competitors' teams
await search_profession_profiles(
    profession="AI Engineer at OpenAI",
    max_profiles=30
)
await export_to_excel("competitor_analysis.xlsx")

🛠️ Troubleshooting

API Key Issues

Error: "RAPIDAPI_KEY is not set"

# Check your .env file exists
ls -la .env

# Verify contents
cat .env

# Ensure no extra spaces
RAPIDAPI_KEY=your_key_here  # ✓ Correct
RAPIDAPI_KEY = your_key_here  # ✗ Wrong (spaces)

Error: "Invalid API key"

  • Verify key is correct (copy-paste from dashboard)
  • Check API subscription is active
  • Ensure free tier limits not exceeded

Rate Limiting

Error: "429 Too Many Requests"

# Increase delays
await asyncio.sleep(5)  # Instead of 2

# Reduce batch size
max_profiles=5  # Instead of 10
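Beyond fixed delays, 429 errors can be absorbed with retry logic and exponential backoff (listed as a feature idea below; this is a hypothetical helper, not code from the repository):

```python
import asyncio
import random

async def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0):
    """Retry an async call on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return await fetch()
        except RuntimeError:  # stand-in for an HTTP 429 response
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Exponential backoff with jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```

With `base_delay=2.0` the waits grow roughly 2 s, 4 s, 8 s, which usually clears a free-tier rate-limit window.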

No Results Found

Issue: Search returns empty list

# Try different search terms
"AI Engineer" → "Artificial Intelligence Engineer"
"HR Manager" → "Human Resources Manager"

# Be more specific
"Developer" → "Senior Python Developer"

# Add location
"Data Scientist in San Francisco"

Email Not Found

Issue: Most profiles show "Not Available"

This is expected! LinkedIn doesn't publicly display emails for most users. Emails are only extracted when:

  • User includes email in their "About" section
  • Email is in headline or summary
  • Profile has contact information visible

Workaround:

# Use other contact methods
- LinkedIn messaging
- Company website contact forms
- Twitter/social media handles
- GitHub profile (for developers)

Excel Export Issues

Error: "No module named 'openpyxl'"

pip install openpyxl

Error: "Permission denied"

# File is open in Excel - close it first
# Or use different filename
await export_to_excel("contacts_v2.xlsx")

Memory Issues

Error: "MemoryError" with large batches

# Process in smaller batches
for i in range(0, 100, 10):
    await search_profession_profiles(
        profession="Engineer",
        max_profiles=10
    )
    await export_to_excel(f"batch_{i}.xlsx")
    await clear_results()  # Free memory

🔒 Privacy & Legal Considerations

Important Notes

⚠️ Use Responsibly: This tool is for legitimate business purposes only

Legal Compliance

  • Respect LinkedIn ToS: Review LinkedIn's terms of service
  • GDPR Compliance: If targeting EU profiles, ensure GDPR compliance
  • CAN-SPAM: Follow email marketing laws if using for outreach
  • Data Protection: Store collected data securely
  • Rate Limiting: Respect API rate limits

Best Practices

  1. Obtain Consent: Get consent before sending marketing emails
  2. Data Security: Encrypt stored data
  3. Limited Retention: Delete data when no longer needed
  4. Transparency: Be clear about data collection
  5. Opt-Out: Provide easy opt-out mechanisms

Prohibited Uses

  • ❌ Spam or unsolicited marketing
  • ❌ Data selling without consent
  • ❌ Harassment or stalking
  • ❌ Violating LinkedIn ToS
  • ❌ Impersonation or fraud

📊 Performance Metrics

Speed Benchmarks

  • Search Time: ~5-10 seconds per search query
  • Profile Fetch: ~2-3 seconds per profile
  • Total Time: ~30-40 seconds for 10 profiles
  • Batch Processing: ~5-7 minutes for 100 profiles

API Limits

Serper API (Free Tier)

  • 2,500 searches per month
  • ~25-30 searches per profession
  • Can find ~80-100 professions per month

RapidAPI (Basic Plan)

  • 500 requests per month
  • 1 request per profile
  • Can scrape ~500 profiles per month

Optimization Tips

  1. Batch Processing: Group similar professions
  2. Cache Results: Avoid re-scraping same profiles
  3. Parallel Processing: Use asyncio effectively
  4. Smart Queries: Use targeted search terms
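Tip 2 (cache results) can be as simple as persisting the set of already-scraped URLs between runs. A sketch, with an illustrative class name and file format:

```python
import json
from pathlib import Path

class ProfileCache:
    """Remember which LinkedIn URLs were already scraped, so repeat runs skip them."""

    def __init__(self, path="scraped_urls.json"):
        self.path = Path(path)
        # Load previously seen URLs from disk, if the cache file exists
        self.seen = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    def is_new(self, url: str) -> bool:
        return url not in self.seen

    def mark(self, url: str) -> None:
        self.seen.add(url)
        self.path.write_text(json.dumps(sorted(self.seen)))
```

Checking `is_new()` before each RapidAPI call saves one request per duplicate, which matters on a 500-requests-per-month plan.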

🤝 Contributing

Contributions welcome! Areas for improvement:

Feature Ideas

  • Add support for company page scraping
  • Implement profile scoring/ranking
  • Add email verification service
  • Create web dashboard interface
  • Add CSV export option
  • Implement database storage (SQLite/PostgreSQL)
  • Add profile deduplication
  • Create Docker container
  • Add webhook notifications
  • Implement retry logic with exponential backoff

Development Setup

# Clone repository
git clone https://github.com/yourusername/linkedin-scraper-mcp.git

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Submit PR

📝 License

This project is licensed under the MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • FastMCP: Model Context Protocol server framework
  • RapidAPI: API marketplace and Fresh LinkedIn Profile Data API
  • Serper: Google search API service
  • Pandas: Data manipulation and analysis
  • httpx: Modern HTTP client

📞 Support

Getting Help

  • Issues: Report bugs via GitHub issues
  • Discussions: Ask questions in GitHub discussions
  • Documentation: Check MCP documentation at modelcontextprotocol.io

Contact

🗺️ Roadmap

Q1 2025

  • Basic profile scraping
  • Excel export
  • Multi-profession search
  • Email verification integration
  • Profile scoring system

Q2 2025

  • Web dashboard
  • PostgreSQL integration
  • Advanced filtering
  • API authentication
  • Webhook support

Q3 2025

  • Company page scraping
  • Job posting extraction
  • Skills gap analysis
  • Chrome extension
  • Mobile app

Q4 2025

  • AI-powered matching
  • CRM integrations
  • Analytics dashboard
  • Team collaboration features
  • Enterprise features

💡 Tips & Best Practices

Search Optimization

  1. Be Specific: "Senior Backend Engineer Python" vs "Engineer"
  2. Use Titles: Actual job titles work best
  3. Location: Add location for targeted results
  4. Industry: Include industry keywords
  5. Company: Search by company name for specificity

Data Quality

  1. Verify Emails: Use email verification service
  2. Cross-Reference: Check multiple sources
  3. Update Regularly: LinkedIn profiles change
  4. Manual Review: Verify important contacts manually

API Management

  1. Monitor Usage: Track API call counts
  2. Upgrade Plans: Upgrade when hitting limits
  3. Cache Results: Store results to avoid re-fetching
  4. Error Handling: Implement robust error handling

Export Best Practices

  1. Regular Exports: Export data frequently
  2. Descriptive Names: Use clear filenames
  3. Backup Data: Keep multiple copies
  4. Version Control: Track data versions

Made with ❤️ for professional networking and recruitment

Star ⭐ this repository if you find it useful!


📸 Screenshots

Sample Excel Output

| Name          | LinkedIn Profile          | Email              | Current Position        |
|---------------|---------------------------|--------------------|-------------------------|
| John Doe      | linkedin.com/in/johndoe   | john@example.com   | Senior AI Engineer      |
| Jane Smith    | linkedin.com/in/janesmith | Not Available      | Machine Learning Lead   |
| Bob Johnson   | linkedin.com/in/bobjohnson| bob@company.com    | Data Scientist          |

Summary Output

{
  "total_profiles": 25,
  "professions": {
    "AI Engineer": 10,
    "Data Scientist": 8,
    "ML Engineer": 7
  },
  "profiles_with_email": 6,
  "profiles_with_phone": 2
}

🔐 Security Best Practices

  1. Never Commit API Keys: Use .env and .gitignore
  2. Rotate Keys: Change API keys periodically
  3. Limit Access: Restrict who has API credentials
  4. Monitor Usage: Watch for unusual activity
  5. Secure Storage: Encrypt exported data
  6. Access Control: Implement user authentication
  7. Audit Logs: Track all API usage
  8. Data Retention: Delete old data regularly

Disclaimer: This tool is for educational and legitimate business purposes only. Always comply with LinkedIn's Terms of Service, applicable laws, and regulations. The authors are not responsible for misuse of this tool.