9Mercury/LinkedIn-Profile-Scraper---MCP-Server
🔍 LinkedIn Profile Scraper - MCP Server
An advanced Model Context Protocol (MCP) server for automated LinkedIn profile discovery and data extraction. Search for professionals by occupation, extract detailed profile information, and export the results to Excel, all through a unified API.
✨ Features
🎯 Intelligent Profile Discovery
- Google Search Integration: Uses the Serper API to find LinkedIn profiles
- Multi-Query Strategy: Employs multiple search patterns for comprehensive results
- Smart URL Validation: Filters and validates LinkedIn profile URLs
- Profession-Based Search: Target specific roles (AI Engineer, HR Manager, etc.)
- Batch Processing: Search multiple professions in one operation
📊 Comprehensive Data Extraction
- Profile Information: Name, headline, location, industry
- Current Position: Job title, company, location, dates
- Contact Details: Email, phone (when available), website
- Professional Stats: Connections count, followers count
- Skills: Top skills and expertise areas
- Education: Schools, degrees, fields of study
- Experience History: Complete work history
🔧 MCP Server Tools
- search_profession_profiles: Find profiles for a single profession
- search_multiple_professions: Batch search across multiple roles
- export_to_excel: Save data to formatted Excel spreadsheet
- get_current_results: View summary of collected profiles
- clear_results: Reset collected data
💾 Data Export
- Excel Format: Professional formatted spreadsheets
- Timestamp: Automatic file naming with timestamps
- Custom Naming: Optional custom filenames
- Rich Data: 15+ fields per profile
- Pandas Integration: Easy data manipulation
📋 Requirements
System Requirements
- Python 3.10 or higher (required by the mcp SDK)
- Internet connection
- API keys (RapidAPI, Serper)
Python Dependencies
httpx>=0.24.0
pandas>=2.0.0
openpyxl>=3.1.0
python-dotenv>=1.0.0
mcp>=0.1.0
fastmcp>=0.1.0
API Requirements
- RapidAPI account: for the Fresh LinkedIn Profile Data API
- Serper API key: for Google search
🔧 Installation
1. Clone or Download
git clone https://github.com/yourusername/linkedin-scraper-mcp.git
cd linkedin-scraper-mcp
2. Install Dependencies
pip install httpx pandas openpyxl python-dotenv mcp fastmcp
Or use requirements.txt:
pip install -r requirements.txt
3. Set Up API Keys
Get RapidAPI Key
- Visit RapidAPI.com
- Sign up for a free account
- Subscribe to the Fresh LinkedIn Profile Data API
- Copy your API key from the dashboard
Get Serper API Key
- Visit Serper.dev
- Sign up for an account (free tier available)
- Get your API key from the dashboard
4. Configure Environment Variables
Create a .env file in the project directory:
RAPIDAPI_KEY=your_rapidapi_key_here
SERPER_API_KEY=your_serper_api_key_here
Security Note: Never commit your .env file to version control!
Add to .gitignore:
.env
*.xlsx
__pycache__/
*.pyc
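To catch a missing key before the server starts, you can check both variables up front. This is a sketch, not the project's own code: missing_keys and load_api_keys are hypothetical helper names, and it assumes the two variable names from the .env example above.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env example above
REQUIRED_KEYS = ("RAPIDAPI_KEY", "SERPER_API_KEY")


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def load_api_keys() -> dict[str, str]:
    """Hypothetical helper: load .env, then fail fast if a key is missing."""
    # Imported here so missing_keys() works even without python-dotenv installed
    from dotenv import load_dotenv

    load_dotenv()  # copies .env entries into os.environ (existing variables are kept)
    missing = missing_keys(os.environ)
    if missing:
        raise RuntimeError(f"Missing API keys in .env: {', '.join(missing)}")
    return {key: os.environ[key] for key in REQUIRED_KEYS}
```

Calling load_api_keys() once at startup turns a vague failure on the first request into a clear message naming the missing variable.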
🚀 Usage
Running as MCP Server
python linkedin_scraper.py
The server will start in stdio mode, ready to receive MCP tool calls.
Running as Standalone Script
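Edit the __main__ block at the bottom of linkedin_scraper.py so that it runs the test function instead of starting the server:
if __name__ == "__main__":
    # Test mode: run the built-in test function
    asyncio.run(test_enhanced_scraper())
    # Server mode: comment out the line above and uncomment this one
    # mcp.run(transport="stdio")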
Then run:
python linkedin_scraper.py
Using MCP Tools
1. Search Single Profession
# Find AI Engineers
result = await search_profession_profiles(
    profession="AI Engineer",
    max_profiles=10
)
2. Search Multiple Professions
# Find multiple roles at once
result = await search_multiple_professions(
    professions="AI Engineer,HR Manager,Startup Founder",
    max_profiles_per_profession=5
)
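search_multiple_professions takes its roles as one comma-separated string rather than a list. The server's splitting code is not shown here, so the following is only a sketch of a tolerant parser (parse_professions is a hypothetical name) that trims whitespace and drops empty or repeated entries before each role is searched:

```python
def parse_professions(professions: str) -> list[str]:
    """Split a comma-separated profession string into unique, trimmed names."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in professions.split(","):
        name = raw.strip()
        # Skip blanks (e.g. from a trailing comma) and case-insensitive duplicates
        if name and name.lower() not in seen:
            seen.add(name.lower())
            result.append(name)
    return result
```

For example, " AI Engineer , ai engineer,,CTO, " yields two searches, not four.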
3. Export to Excel
# Export with auto-generated filename
result = await export_to_excel()
# Export with custom filename
result = await export_to_excel(filename="my_contacts.xlsx")
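When no filename is passed, the name is generated from a timestamp. The exact pattern the server uses is not documented here, so the linkedin_profiles_YYYYMMDD_HHMMSS.xlsx format and the default_export_filename and ensure_xlsx helpers below are assumptions for illustration:

```python
from datetime import datetime
from typing import Optional


def default_export_filename(
    now: Optional[datetime] = None, prefix: str = "linkedin_profiles"
) -> str:
    """Return a timestamped name such as linkedin_profiles_20250115_103000.xlsx."""
    now = now or datetime.now()
    # %Y%m%d_%H%M%S sorts chronologically and avoids ':' (not allowed in Windows filenames)
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.xlsx"


def ensure_xlsx(filename: str) -> str:
    """Append .xlsx to a custom filename that lacks it."""
    return filename if filename.lower().endswith(".xlsx") else f"{filename}.xlsx"
```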
4. Get Current Results
# View summary of collected data
summary = await get_current_results()
print(summary)
5. Clear Data
# Clear all collected profiles
result = await clear_results()
📊 Output Format
Excel Spreadsheet Columns
| Column | Description | Example |
|---|---|---|
| Name | Full name | John Doe |
| LinkedIn Profile | Profile URL | linkedin.com/in/johndoe |
| Email | Email address | john@example.com |
| Phone | Phone number | +1-555-0123 |
| Headline | Professional headline | Senior AI Engineer at Tech Corp |
| Current Company | Current employer | Tech Corp |
| Current Position | Current job title | Senior AI Engineer |
| Location | Geographic location | San Francisco, CA |
| Industry | Industry sector | Computer Software |
| Website | Personal/company website | johndoe.com |
| Connections | Connection count | 500+ |
| Followers | Follower count | 1,234 |
| Skills | Top skills (comma-separated) | Python, ML, TensorFlow |
| Profession Searched | Search query used | AI Engineer |
| Search Rank | Result position | 1 |
| Scraped At | Timestamp | 2025-01-15 10:30:00 |
JSON Response Format
{
"full_name": "John Doe",
"first_name": "John",
"last_name": "Doe",
"headline": "Senior AI Engineer | Machine Learning Expert",
"location": "San Francisco Bay Area",
"country": "United States",
"industry": "Computer Software",
"linkedin_url": "https://linkedin.com/in/johndoe",
"profile_id": "johndoe",
"website": "https://johndoe.com",
"email": "john@example.com",
"phone": "+1-555-0123",
"current_position": {
"title": "Senior AI Engineer",
"company": "Tech Corp",
"location": "San Francisco, CA",
"start_date": "2023-01",
"end_date": ""
},
"company": "Tech Corp",
"skills": ["Python", "Machine Learning", "TensorFlow", "PyTorch"],
"connections_count": 500,
"followers_count": 1234,
"education": [
{
"school": "Stanford University",
"degree": "Master of Science",
"field_of_study": "Computer Science"
}
],
"profession_searched": "AI Engineer",
"search_rank": 1,
"scrape_timestamp": "2025-01-15T10:30:00"
}
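The Excel columns listed earlier are a flattened view of this JSON. A sketch of that mapping, assuming the field names in the sample response and the "Not Available" placeholder used in the sample Excel output; profile_to_row is a hypothetical name, and the project's own save_to_excel may differ in detail:

```python
from typing import Any

NOT_AVAILABLE = "Not Available"  # placeholder used in the sample Excel output


def profile_to_row(result: dict[str, Any]) -> dict[str, Any]:
    """Flatten one profile dict (JSON response format) into one Excel row."""
    position = result.get("current_position") or {}
    return {
        "Name": result.get("full_name", ""),
        "LinkedIn Profile": result.get("linkedin_url", ""),
        "Email": result.get("email") or NOT_AVAILABLE,
        "Phone": result.get("phone") or NOT_AVAILABLE,
        "Headline": result.get("headline", ""),
        "Current Company": position.get("company") or result.get("company", ""),
        "Current Position": position.get("title", ""),
        "Location": result.get("location", ""),
        "Industry": result.get("industry", ""),
        "Website": result.get("website", ""),
        "Connections": result.get("connections_count", ""),
        "Followers": result.get("followers_count", ""),
        # Skills arrive as a list and are joined into one comma-separated cell
        "Skills": ", ".join(result.get("skills") or []),
        "Profession Searched": result.get("profession_searched", ""),
        "Search Rank": result.get("search_rank", ""),
        "Scraped At": result.get("scrape_timestamp", ""),
    }
```

With pandas and openpyxl installed, pandas.DataFrame([profile_to_row(r) for r in results]).to_excel("profiles.xlsx", index=False, engine="openpyxl") writes one row per profile.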
⚙️ Configuration
Adjust Search Parameters
Number of Search Queries
Modify the search_queries list in search_google_for_linkedin_profiles:
search_queries = [
    f'site:linkedin.com/in "{profession}" -dir',
    f'"{profession}" LinkedIn profile site:linkedin.com/in',
    f'linkedin.com/in {profession} professional',
    f'{profession} site:linkedin.com/in USA',  # Add more
    f'intitle:"{profession}" site:linkedin.com/in'  # Add more
]
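Each query is sent to Serper, and the LinkedIn URLs are read back from the response. The sketch below assumes Serper's usual JSON shape, where organic results sit in an "organic" list whose items carry a "link" field; extract_linkedin_links is a hypothetical helper and the test response is made up.

```python
from typing import Any


def extract_linkedin_links(response: dict[str, Any]) -> list[str]:
    """Collect linkedin.com/in/ URLs from a Serper search response, keeping order."""
    links: list[str] = []
    for item in response.get("organic", []):
        link = item.get("link", "")
        # Keep only personal profile pages, not /company/ or /jobs/ pages
        if "linkedin.com/in/" in link and link not in links:
            links.append(link)
    return links
```

Adding queries to search_queries widens recall, but every extra query is one more Serper search per profession.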
Rate Limiting
Adjust delays between requests:
# In process_profession
await asyncio.sleep(2) # Delay between profiles (default: 2 seconds)
# In search_multiple_professions
await asyncio.sleep(3) # Delay between professions (default: 3 seconds)
Customize Data Fields
Include More LinkedIn Data
Modify params in get_linkedin_data:
params = {
    "linkedin_url": linkedin_url,
    "include_skills": "true",
    "include_certifications": "true",  # Changed to true
    "include_publications": "true",  # Changed to true
    "include_honors": "true",  # Changed to true
    "include_volunteers": "true",  # Changed to true
    "include_projects": "true",  # Changed to true
}
Customize Excel Columns
Edit the save_to_excel method:
row = {
    'Name': result.get('full_name', ''),
    'LinkedIn Profile': result.get('linkedin_url', ''),
    # Add custom columns (calculate_experience and get_highest_degree are
    # hypothetical helpers you would define yourself)
    'Years of Experience': calculate_experience(result),
    'Education Level': get_highest_degree(result),
    'Custom Field': result.get('custom_data', '')
}
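The custom-column snippet relies on helpers that are not defined in it. As one possible sketch of get_highest_degree, assuming the education list format from the JSON response above, a keyword ranking over the degree strings could look like this. The keyword table is a rough heuristic, not a standard.

```python
from typing import Any

# Checked from highest to lowest; substring matching, so results are approximate
DEGREE_LEVELS = [
    ("Doctorate", ("phd", "ph.d", "doctor")),
    ("Master", ("master", "mba", "m.sc", "msc")),
    ("Bachelor", ("bachelor", "b.sc", "bsc", "b.a.", "b.s.")),
    ("Associate", ("associate",)),
]


def get_highest_degree(result: dict[str, Any]) -> str:
    """Return the highest degree level found in result['education'], or ''."""
    degrees = [(entry.get("degree") or "").lower() for entry in result.get("education") or []]
    for level, keywords in DEGREE_LEVELS:
        if any(keyword in degree for degree in degrees for keyword in keywords):
            return level
    return ""
```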
Change Search Location
Modify the Serper API payload:
payload = {
    'q': query,
    'num': num_results,
    'hl': 'en',
    'gl': 'us',  # Change country code (us, uk, ca, au, etc.)
    'location': 'San Francisco, CA'  # Add specific location
}
🏗️ Architecture
Component Diagram
┌─────────────────────────────────────────────┐
│            MCP Server Interface             │
│  • search_profession_profiles               │
│  • search_multiple_professions              │
│  • export_to_excel                          │
│  • get_current_results                      │
│  • clear_results                            │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│            LinkedInScraper Class            │
│  • Profile discovery                        │
│  • Data extraction                          │
│  • Result management                        │
└──────────────────────┬──────────────────────┘
                       │
             ┌─────────┴─────────┐
             │                   │
             ▼                   ▼
    ┌─────────────────┐ ┌─────────────────┐
    │   Serper API    │ │    RapidAPI     │
    │ (Google Search) │ │ (LinkedIn Data) │
    └────────┬────────┘ └────────┬────────┘
             │                   │
             └─────────┬─────────┘
                       │
                       ▼
             ┌───────────────────┐
             │  Data Processing  │
             │ • Extraction      │
             │ • Validation      │
             │ • Formatting      │
             └─────────┬─────────┘
                       │
                       ▼
             ┌───────────────────┐
             │   Excel Export    │
             │ (Pandas/OpenPyXL) │
             └───────────────────┘
Data Flow
1. User Request
↓
2. Search Google (Serper API)
• Multiple search queries
• Extract LinkedIn URLs
↓
3. Validate URLs
• Check format
• Remove duplicates
↓
4. Fetch Profile Data (RapidAPI)
• Get detailed profile info
• Rate limiting (2s delay)
↓
5. Extract Contact Info
• Parse JSON response
• Extract email patterns
• Structure data
↓
6. Store Results
• Add to results list
• Track metadata
↓
7. Export to Excel
• Format as DataFrame
• Save to .xlsx file
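Step 3 (Validate URLs) can be approximated with the standard library alone. This is a sketch under assumptions: it treats linkedin.com/in/<slug> as the only valid profile form, accepts locale subdomains such as uk.linkedin.com, drops query strings, and treats slugs as case-insensitive when removing duplicates. normalize_profile_url and unique_profile_urls are hypothetical names.

```python
import re
from typing import Iterable, Optional
from urllib.parse import urlparse

# A profile path is /in/<slug>, optionally followed by a trailing slash
PROFILE_PATH = re.compile(r"^/in/([^/]+)/?$")


def normalize_profile_url(url: str) -> Optional[str]:
    """Return https://www.linkedin.com/in/<slug>, or None if url is not a profile."""
    url = url.strip()
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    # Accept linkedin.com itself plus subdomains such as www. or uk.
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    match = PROFILE_PATH.match(parsed.path)
    if not match:
        return None  # e.g. /company/..., /jobs/..., or /in/<slug>/details/...
    return f"https://www.linkedin.com/in/{match.group(1).lower()}"


def unique_profile_urls(urls: Iterable[str]) -> list[str]:
    """Normalize, drop invalid entries, and remove duplicates while keeping order."""
    seen: dict[str, None] = {}
    for url in urls:
        normalized = normalize_profile_url(url)
        if normalized:
            seen.setdefault(normalized, None)
    return list(seen)
```

Normalizing before deduplicating matters: the same person often appears under www., a country subdomain, and a tracking query string across different search queries.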
🎯 Use Cases
1. Recruitment & Talent Acquisition
# Find potential candidates
await search_multiple_professions(
    professions="Senior Python Developer,Machine Learning Engineer,Data Scientist",
    max_profiles_per_profession=20
)
await export_to_excel("tech_candidates.xlsx")
2. Sales & Lead Generation
# Find decision makers
await search_multiple_professions(
    professions="CTO,VP Engineering,Head of AI",
    max_profiles_per_profession=15
)
await export_to_excel("tech_leads.xlsx")
3. Market Research
# Research industry professionals
await search_profession_profiles(
    profession="Blockchain Developer",
    max_profiles=50
)
await export_to_excel("blockchain_market_research.xlsx")
4. Networking & Partnership
# Find potential collaborators
await search_multiple_professions(
    professions="Startup Founder,Angel Investor,Venture Capitalist",
    max_profiles_per_profession=10
)
await export_to_excel("potential_partners.xlsx")
5. Competitive Intelligence
# Research competitors' teams
await search_profession_profiles(
    profession="AI Engineer at OpenAI",
    max_profiles=30
)
await export_to_excel("competitor_analysis.xlsx")
🛠️ Troubleshooting
API Key Issues
Error: "RAPIDAPI_KEY is not set"
# Check your .env file exists
ls -la .env
# Verify contents
cat .env
# Use KEY=value with no spaces around "="
RAPIDAPI_KEY=your_key_here # ✓ Recommended
RAPIDAPI_KEY = your_key_here # ✗ Avoid (not every .env parser accepts spaces around =)
Error: "Invalid API key"
- Verify key is correct (copy-paste from dashboard)
- Check API subscription is active
- Ensure free tier limits not exceeded
Rate Limiting
Error: "429 Too Many Requests"
# Increase delays
await asyncio.sleep(5) # Instead of 2
# Reduce batch size
max_profiles=5 # Instead of 10
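Instead of only raising the fixed delays, a request can be retried with exponential backoff when the API answers 429. A stdlib-only sketch under assumptions: with_backoff and RateLimited are hypothetical names, and the code that calls Serper or RapidAPI would have to raise RateLimited itself when it sees a 429 (with httpx, by checking response.status_code == 429).

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by a request function when the API returns HTTP 429."""


async def with_backoff(
    request: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
) -> T:
    """Call request(), retrying on RateLimited with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return await request()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller see the 429
            # 2s, 4s, 8s, ... capped at max_delay, plus up to 10% random jitter
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await asyncio.sleep(delay * (1 + random.random() * 0.1))
```

Usage, with a hypothetical fetch_profile coroutine: profile = await with_backoff(lambda: fetch_profile(url)). The lambda creates a fresh coroutine for each attempt, since an awaited coroutine object cannot be awaited again.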
No Results Found
Issue: Search returns empty list
# Try different search terms
"AI Engineer" → "Artificial Intelligence Engineer"
"HR Manager" → "Human Resources Manager"
# Be more specific
"Developer" → "Senior Python Developer"
# Add location
"Data Scientist in San Francisco"
Email Not Found
Issue: Most profiles show "Not Available"
This is expected! LinkedIn doesn't publicly display emails for most users. Emails are only extracted when:
- User includes email in their "About" section
- Email is in headline or summary
- Profile has contact information visible
Workaround:
# Use other contact methods
- LinkedIn messaging
- Company website contact forms
- Twitter/social media handles
- GitHub profile (for developers)
Excel Export Issues
Error: "No module named 'openpyxl'"
pip install openpyxl
Error: "Permission denied"
# File is open in Excel - close it first
# Or use different filename
await export_to_excel("contacts_v2.xlsx")
Memory Issues
Error: "MemoryError" with large batches
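# Process in smaller batches, one profession at a time
professions = ["AI Engineer", "Data Scientist", "ML Engineer"]  # example list
for i, profession in enumerate(professions):
    await search_profession_profiles(
        profession=profession,
        max_profiles=10
    )
    await export_to_excel(f"batch_{i}.xlsx")
    await clear_results()  # Free memory before the next batch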
🔒 Privacy & Legal Considerations
Important Notes
⚠️ Use Responsibly: This tool is for legitimate business purposes only
Legal Compliance
- ✅ Respect LinkedIn ToS: Review LinkedIn's terms of service
- ✅ GDPR Compliance: If targeting EU profiles, ensure GDPR compliance
- ✅ CAN-SPAM: Follow email marketing laws if using for outreach
- ✅ Data Protection: Store collected data securely
- ✅ Rate Limiting: Respect API rate limits
Best Practices
- Obtain Consent: Get consent before sending marketing emails
- Data Security: Encrypt stored data
- Limited Retention: Delete data when no longer needed
- Transparency: Be clear about data collection
- Opt-Out: Provide easy opt-out mechanisms
Prohibited Uses
- ❌ Spam or unsolicited marketing
- ❌ Data selling without consent
- ❌ Harassment or stalking
- ❌ Violating LinkedIn ToS
- ❌ Impersonation or fraud
📊 Performance Metrics
Speed Benchmarks
- Search Time: ~5-10 seconds per search query
- Profile Fetch: ~2-3 seconds per profile
- Total Time: ~30-40 seconds for 10 profiles
- Batch Processing: ~5-7 minutes for 100 profiles
API Limits
Serper API (Free Tier)
- 2,500 searches per month
- ~25-30 searches per profession
- Can find ~80-100 professions per month
RapidAPI (Basic Plan)
- 500 requests per month
- 1 request per profile
- Can scrape ~500 profiles per month
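The monthly figures above come from dividing each quota by what one unit of work costs. A small sketch for checking a planned run against both quotas at once; the default quotas are the plan figures listed above, searches_per_profession defaults to the upper end of the ~25-30 estimate, and plan_run is a hypothetical helper.

```python
def plan_run(
    professions: int,
    profiles_per_profession: int,
    searches_per_profession: int = 30,
    serper_quota: int = 2500,
    rapidapi_quota: int = 500,
) -> dict[str, int]:
    """Estimate API usage for one run and how many such runs fit in a month."""
    if professions < 1 or profiles_per_profession < 1:
        raise ValueError("need at least one profession and one profile")
    searches = professions * searches_per_profession
    profile_calls = professions * profiles_per_profession  # 1 RapidAPI request per profile
    return {
        "serper_searches": searches,
        "rapidapi_requests": profile_calls,
        # The smaller of the two quotas is the bottleneck
        "runs_per_month": min(serper_quota // searches, rapidapi_quota // profile_calls),
    }
```

For example, plan_run(3, 10) needs 90 searches and 30 profile requests; RapidAPI allows 500 // 30 = 16 such runs and Serper 2500 // 90 = 27, so RapidAPI is the binding limit.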
Optimization Tips
- Batch Processing: Group similar professions
- Cache Results: Avoid re-scraping same profiles
- Parallel Processing: Use asyncio effectively
- Smart Queries: Use targeted search terms
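Using asyncio effectively here means overlapping network waits while still capping how many requests are in flight, so the rate limits are respected. A sketch assuming a coroutine fetch(url) that returns one profile dict (hypothetical; the project's get_linkedin_data may have a different signature): an asyncio.Semaphore bounds concurrency, and a sleep inside it spaces requests out.

```python
import asyncio
from typing import Any, Awaitable, Callable

Fetch = Callable[[str], Awaitable[dict[str, Any]]]


async def fetch_all(
    urls: list[str],
    fetch: Fetch,
    max_concurrency: int = 2,
    delay: float = 2.0,
) -> list[dict[str, Any]]:
    """Fetch every URL with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> dict[str, Any]:
        async with semaphore:
            profile = await fetch(url)
            # Hold the slot a little longer so consecutive calls stay spaced out
            await asyncio.sleep(delay)
            return profile

    # gather() returns results in the same order as urls
    return list(await asyncio.gather(*(worker(url) for url in urls)))
```

Setting max_concurrency=1 reproduces the sequential behavior with a fixed delay between profiles; raising it trades speed against a higher chance of 429 responses.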
🤝 Contributing
Contributions welcome! Areas for improvement:
Feature Ideas
- Add support for company page scraping
- Implement profile scoring/ranking
- Add email verification service
- Create web dashboard interface
- Add CSV export option
- Implement database storage (SQLite/PostgreSQL)
- Add profile deduplication
- Create Docker container
- Add webhook notifications
- Implement retry logic with exponential backoff
Development Setup
# Clone repository
git clone https://github.com/yourusername/linkedin-scraper-mcp.git
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/
# Submit PR
📝 License
This project is licensed under the MIT License - see LICENSE file for details.
🙏 Acknowledgments
- FastMCP: Model Context Protocol server framework
- RapidAPI: API marketplace and Fresh LinkedIn Profile Data API
- Serper: Google search API service
- Pandas: Data manipulation and analysis
- httpx: Modern HTTP client
📞 Support
Getting Help
- Issues: Report bugs via GitHub issues
- Discussions: Ask questions in GitHub discussions
- Documentation: Check MCP documentation at modelcontextprotocol.io
Contact
- Email: your.email@example.com
- Twitter: @yourusername
- LinkedIn: linkedin.com/in/yourprofile
🗺️ Roadmap
Q1 2025
- Basic profile scraping
- Excel export
- Multi-profession search
- Email verification integration
- Profile scoring system
Q2 2025
- Web dashboard
- PostgreSQL integration
- Advanced filtering
- API authentication
- Webhook support
Q3 2025
- Company page scraping
- Job posting extraction
- Skills gap analysis
- Chrome extension
- Mobile app
Q4 2025
- AI-powered matching
- CRM integrations
- Analytics dashboard
- Team collaboration features
- Enterprise features
💡 Tips & Best Practices
Search Optimization
- Be Specific: "Senior Backend Engineer Python" vs "Engineer"
- Use Titles: Actual job titles work best
- Location: Add location for targeted results
- Industry: Include industry keywords
- Company: Search by company name for specificity
Data Quality
- Verify Emails: Use email verification service
- Cross-Reference: Check multiple sources
- Update Regularly: LinkedIn profiles change
- Manual Review: Verify important contacts manually
API Management
- Monitor Usage: Track API call counts
- Upgrade Plans: Upgrade when hitting limits
- Cache Results: Store results to avoid re-fetching
- Error Handling: Implement robust error handling
Export Best Practices
- Regular Exports: Export data frequently
- Descriptive Names: Use clear filenames
- Backup Data: Keep multiple copies
- Version Control: Track data versions
Made with ❤️ for professional networking and recruitment
Star ⭐ this repository if you find it useful!
📸 Screenshots
Sample Excel Output
| Name | LinkedIn Profile | Email | Current Position |
|---------------|---------------------------|--------------------|-------------------------|
| John Doe | linkedin.com/in/johndoe | john@example.com | Senior AI Engineer |
| Jane Smith | linkedin.com/in/janesmith | Not Available | Machine Learning Lead |
| Bob Johnson | linkedin.com/in/bobjohnson| bob@company.com | Data Scientist |
Summary Output
{
"total_profiles": 25,
"professions": {
"AI Engineer": 10,
"Data Scientist": 8,
"ML Engineer": 7
},
"profiles_with_email": 6,
"profiles_with_phone": 2
}
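A summary in this shape can be computed from the collected profile dicts. A sketch assuming the JSON field names shown earlier and that missing contact fields are empty, absent, or hold "Not Available"; summarize_results is a hypothetical name, not necessarily how get_current_results is implemented.

```python
from collections import Counter
from typing import Any

MISSING = {"", "Not Available", None}


def summarize_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Count profiles overall, per searched profession, and with email/phone."""
    professions = Counter(r.get("profession_searched", "Unknown") for r in results)
    return {
        "total_profiles": len(results),
        "professions": dict(professions),
        "profiles_with_email": sum(r.get("email") not in MISSING for r in results),
        "profiles_with_phone": sum(r.get("phone") not in MISSING for r in results),
    }
```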
🔐 Security Best Practices
- Never Commit API Keys: Use .env and .gitignore
- Rotate Keys: Change API keys periodically
- Limit Access: Restrict who has API credentials
- Monitor Usage: Watch for unusual activity
- Secure Storage: Encrypt exported data
- Access Control: Implement user authentication
- Audit Logs: Track all API usage
- Data Retention: Delete old data regularly
Disclaimer: This tool is for educational and legitimate business purposes only. Always comply with LinkedIn's Terms of Service, applicable laws, and regulations. The authors are not responsible for misuse of this tool.