langextract-mcp-server

jsagir/langextract-mcp-server

3.2

If you are the rightful owner of langextract-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

LangExtract MCP Server is designed for structured data extraction using LangExtract and Gemini AI, offering a suite of tools for efficient data processing.

Tools
8
Resources
0
Prompts
0

🧠 Mindrian LangExtract MCP

Transform Research into Living Knowledge Graphs

Comprehensive research context extraction powered by LangExtract and Gemini AI

MIT License Python 3.8+ FastMCP Mindrian

🚀 Quick Start📖 Documentation🎓 Examples🛠️ Tools💬 Community


🌟 What is This?

Mindrian LangExtract MCP is not just another extraction tool—it's a cognitive system for transforming unstructured research into structured, queryable knowledge graphs.

While typical extraction tools give you entities, Mindrian LangExtract gives you:

  • Complete Context - Every extraction preserves its source text and surrounding context
  • 🔗 Living Relationships - Automatic linking of related concepts, methods, and constraints
  • 🎯 Implicit Intelligence - Surfaces unstated assumptions and hidden requirements
  • 📊 Graph-Ready Output - 30-column CSV schema designed for knowledge graphs
  • 🧬 Research DNA - 10 comprehensive categories capturing research essence

🎯 Why Mindrian?

The Mindrian Methodology Connection

This tool is specifically engineered for the Mindrian research framework—a systematic approach to innovation validation and opportunity discovery. Here's why this integration is revolutionary:

┌─────────────────────────────────────────────────────────────────┐
│                    THE MINDRIAN ADVANTAGE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Traditional Research Tools        →    Mindrian LangExtract   │
│  ═══════════════════════             ══════════════════════     │
│                                                                 │
│  📝 Extract entities               →    🧠 Capture reasoning    │
│  📚 Save citations                 →    🔗 Build relationship   │
│  📊 Count frequencies              →    💡 Surface patterns     │
│  🗂️  Categorize topics             →    🎯 Discover gaps        │
│  📄 Generate summaries             →    🚀 Enable innovation    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

🧬 Mindrian's Four Integrated Capabilities

Mindrian transforms organizations into self-evolving innovation systems through four capabilities:

  1. 📋 PWS Methodology - Systematic validation (Is it Real? Can We Win? Is it Worth It?)
  2. 🧠 Mindrian Infrastructure - Agentic cognitive system capturing validation reasoning
  3. 🏦 Bank of Opportunities - Living marketplace of validated problems
  4. 🤝 Alumni Network - Perpetual ecosystem of entrepreneurs and mentors

This LangExtract MCP server is the extraction engine for Capability #2 - the cognitive infrastructure that captures, structures, and activates research intelligence across cohorts.

🎓 Why Research Extraction Matters for Innovation

Without Structured Extraction:

  • ❌ Knowledge locked in papers
  • ❌ Assumptions remain implicit
  • ❌ Constraints undocumented
  • ❌ Relationships hidden
  • ❌ Each cohort starts from zero

With Mindrian LangExtract:

  • ✅ Knowledge becomes queryable
  • ✅ Implicit info surfaced
  • ✅ All constraints captured
  • ✅ Relationships mapped
  • ✅ Intelligence compounds forever

🚀 The Compounding Intelligence Effect

Year 1: 30 validations × 50 extractions = 1,500 insights captured
Year 2: +45 validations × 60 extractions = 4,200 insights (↑ 15% efficiency)
Year 3: +70 validations × 75 extractions = 9,450 insights (↑ 30% efficiency)
Year 5: Unbridgeable intelligence moat = Permanent competitive advantage

Every extraction feeds the Mindrian cognitive system, making future validations faster, smarter, and more accurate. The system gets smarter forever.


✨ Key Features

FeatureDescriptionImpact
🎯 10 Research CategoriesDomains, Methods, Constraints, Citations, Resources, Problems, Requirements, Trade-offs, Relationships, SolutionsComprehensive context capture
📊 30-Column CSV SchemaComplete metadata with relationships, citations, constraintsKnowledge graph ready
🔗 Automatic Relationship LinkingConnects related extractions by IDBuild cognitive networks
💡 Implicit Information ExtractionSurfaces unstated assumptionsDiscover hidden requirements
📚 Full Bibliographic DataDOIs, authors, years, typesCitation network analysis
🎨 Context PreservationOriginal text spans maintainedValidate and audit extractions
🔄 Multi-Pass Extraction1-5 passes for thoroughnessCatch everything
Parallel ProcessingUp to 50 workersFast at scale
📈 Multiple Export FormatsCSV, JSONL, HTMLFlexible integration
🤖 Two AI ModelsGemini 2.5 Flash & ProBalance speed/accuracy

🏗️ Architecture

┌────────────────────────────────────────────────────────────────┐
│                      MINDRIAN LANGEXTRACT                      │
│                    Cognitive Extraction Engine                 │
└────────────────────┬───────────────────────────────────────────┘
                     │
    ┌────────────────┼────────────────┐
    │                │                │
    ▼                ▼                ▼
┌─────────┐    ┌──────────┐    ┌──────────┐
│ Claude  │    │  FastMCP │    │  Python  │
│ Desktop │◄──►│  Server  │◄──►│   APIs   │
└─────────┘    └──────────┘    └──────────┘
                     │
         ┌───────────┼───────────┐
         │           │           │
         ▼           ▼           ▼
    ┌────────┐  ┌────────┐  ┌────────┐
    │Research│  │ Gemini │  │ Output │
    │Examples│  │  API   │  │ Files  │
    └────────┘  └────────┘  └────────┘
         │           │           │
         └───────────┼───────────┘
                     │
         ┌───────────▼───────────┐
         │                       │
    ┌────────┐              ┌────────┐
    │  CSV   │              │ Neo4j  │
    │ 30-col │              │ Graph  │
    └────────┘              └────────┘

🚀 Quick Start

Prerequisites

✓ Python 3.8+
✓ Gemini API Key (free tier: 15 RPM, 1M tokens/day)
✓ Claude Desktop (optional)
✓ FastMCP Cloud account

1️⃣ Get Gemini API Key

  1. Visit Google AI Studio
  2. Click "Create API key"
  3. Copy and save securely

2️⃣ Deploy on FastMCP Cloud

# Option A: Deploy from GitHub (Recommended)
1. Go to https://fastmcp.com/dashboard
2. Sign in with GitHub
3. Click "Create New Project"
4. Select: jsagir/langextract-mcp-server
5. Configure:
   - Server File: server.py
   - Environment Variable:
     Name: LANGEXTRACT_API_KEY
     Value: [your Gemini API key]
6. Click "Deploy"
7. Copy URL: https://your-project.fastmcp.cloud

3️⃣ Configure Claude Desktop

Location:

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add:

{
  "mcpServers": {
    "mindrian-langextract": {
      "url": "https://your-project.fastmcp.cloud"
    }
  }
}

Restart Claude Desktop

4️⃣ Test Installation

In Claude Desktop:

Test Mindrian LangExtract:

Extract research context from:
"Jensen & Sigmund (2011) introduced topology optimization for photonics 
using density-based methods, requiring minimum linewidth ≥100nm for TSMC."

Use extract_research_context

Expected: 8-12 extractions across 4 categories with relationships! 🎉


🛠️ 11 Powerful Tools

ToolPurposeUse When
extract_research_contextResearch extraction (built-in examples)Primary tool for papers
📊 export_to_research_csvExport to 30-column schemaAfter extraction
📚 get_research_examplesView training examplesLearning the format
🔧 extract_structured_dataCustom extractionDomain-specific needs
🌐 extract_from_urlExtract from URLsOnline papers/docs
💾 save_results_to_jsonlJSONL exportLangExtract format
🎨 generate_visualizationInteractive HTMLVisual inspection
📋 list_stored_resultsList all resultsSession management
🔍 get_extraction_detailsFull result detailsDeep inspection
📝 create_example_templateGenerate templatesCustom examples
ℹ️ get_supported_modelsModel infoConfiguration help

📊 The 30-Column CSV Schema

Click to expand full schema

Core Identity

ColumnDescriptionExample
idUnique row ID1, 2, 3...
categoryMain categoryCONSTRAINTS
subcategorySpecific typephysical_constraints
element_nameUnique identifierminimum_linewidth

Relationships

ColumnDescriptionExample
relationship_typeConnection typerequires, enables, causes
relationship_targetLinked IDs2,5,8
related_toLinked namestopology_opt,tsmc

Attributes

ColumnDescriptionExample
attribute_keyAttribute namevalue, type, spec
attribute_valueThe value≥100nm

Evidence & Confidence

ColumnDescriptionExample
evidence_typeEvidence categoryempirical, theoretical
confidence_levelCertaintycertain, high, medium
temporal_markerTime referencecurrent, 2024
impact_scoreImportance (1-10)10

Citations

ColumnDescriptionExample
citation_keyCitation IDJensen2011
citation_urlDOI/URLhttps://doi.org/...
citation_authorsAuthor listJensen,Sigmund
citation_yearYear2011
citation_typeTypejournal_article

Resources

ColumnDescriptionExample
resource_nameResource IDLumerical FDTD
resource_urlURLhttps://lumerical.com
resource_typeTypecommercial_software

Domain Hierarchy

ColumnDescriptionExample
domain_hierarchyFull pathOptics→Photonics→Integrated
domain_levelLevel (1-5)2
parent_domainParentPhotonics
child_domainsChildrenSilicon,III-V
cross_domain_refsOther domainsMath→Optimization

Constraints

ColumnDescriptionExample
constraint_typeCategorygeometric, regulatory
constraint_sourceOriginTSMC foundry
constraint_enforcementHow enforcedautomatic_DRC
constraint_dependenciesRelatedmin_gap,corners

Context Preservation

ColumnDescriptionExample
source_contextOriginal text"requires min linewidth..."
notesQualifiers"Hard constraint. Non-negotiable."

🎓 10 Research Categories Explained

┌───────────────────────────────────────────────────────────┐
│                  RESEARCH KNOWLEDGE GRAPH                 │
├───────────────────────────────────────────────────────────┤
│                                                           │
│     1. DOMAIN_CONTEXT ──────────┐                        │
│        • Field hierarchies       │                        │
│        • Terminology             │                        │
│        • Cross-domain bridges    │                        │
│                                  │                        │
│     2. CURRENT_APPROACHES ───────┤                        │
│        • Methods                 │                        │
│        • Techniques              ├──► 5. RESOURCES        │
│        • Performance             │    • Software          │
│                                  │    • Hardware          │
│     3. CONSTRAINTS ──────────────┤    • Facilities        │
│        • Physical limits         │                        │
│        • Regulatory              │                        │
│        • Economic                │                        │
│                                  │                        │
│     4. CITATIONS ────────────────┘                        │
│        • Papers                  ┌──────────────┐        │
│        • Standards               │ 6. PROBLEMS  │        │
│        • Code                    │ 7. REQUIREMENTS│      │
│        • Datasets                │ 8. TRADE-OFFS│        │
│                                  │ 9. RELATIONSHIPS│     │
│                                  │ 10. SOLUTIONS│        │
│                                  └──────────────┘        │
└───────────────────────────────────────────────────────────┘

Category Deep Dive

1. 🌐 DOMAIN_CONTEXT - Field structure and terminology

Purpose: Capture the intellectual landscape

Extracts:

  • Domain hierarchies (parent/child)
  • Key terminology and concepts
  • Interdisciplinary connections
  • Stakeholder ecosystem
  • Historical evolution

Example:

"Integrated photonics, a subfield of optical engineering..."
→ domain_hierarchy: Optics→Photonics→Integrated
→ parent_domain: Photonics
→ cross_domain_refs: Electronics, Materials Science
2. 🔬 CURRENT_APPROACHES - Methods and techniques

Purpose: Document existing solutions

Extracts:

  • Method classifications
  • Technical implementations
  • Performance profiles (strengths/weaknesses)
  • Use cases and applications

Example:

"Density-based topology optimization converges well..."
→ method: density_based_topology
→ performance: "accurate gradients"
→ citation: Jensen2011
3. ⚠️ CONSTRAINTS - All types of limitations

Purpose: Capture EVERY constraint (most critical category)

8 Subcategories:

  • Physical (manufacturing, materials)
  • Technical (computational, precision)
  • Economic (cost, resources)
  • Regulatory (standards, compliance)
  • Environmental (temperature, humidity)
  • Human (usability, skills)
  • System (interfaces, compatibility)
  • Temporal (deadlines, time windows)

Example:

"Requires minimum linewidth ≥100nm for TSMC fabrication"
→ constraint_type: geometric
→ constraint_source: TSMC foundry
→ constraint_enforcement: automatic_DRC_check
→ impact_score: 10 (critical)
4. 📚 CITATIONS_AND_REFERENCES - Complete bibliography

Purpose: Build citation networks

Extracts:

  • Journal papers (DOI, authors, year)
  • Conference proceedings
  • Standards documents
  • Patents
  • Code repositories
  • Datasets

Example:

"Jensen & Sigmund (2011) introduced..."
→ citation_key: Jensen2011
→ citation_authors: Jensen,Sigmund
→ citation_year: 2011
→ citation_url: https://doi.org/10.1364/OE.19.008451
5. 🛠️ RESOURCES - Required infrastructure

Purpose: Identify what's needed

Extracts:

  • Software tools
  • Hardware equipment
  • Computational resources
  • Facilities
  • Funding sources
  • Materials

Example:

"FDTD simulations using Lumerical require HPC (128+ cores)"
→ software: Lumerical FDTD
→ hardware: HPC_cluster
→ specifications: "128+ cores minimum"
6. ❌ PROBLEM_DEFINITION - Challenges and gaps

Purpose: Articulate what's broken

Extracts:

  • Problem statements
  • Failure modes
  • Gap analyses
  • Impact assessments

Example:

"Current methods fail to guarantee manufacturability"
→ failure_mode: manufacturability_failure
→ impact_score: 10
→ affected_domain: inverse_design
7. ✅ REQUIREMENTS - Solution criteria

Purpose: Define success

Extracts:

  • Functional requirements
  • Performance targets
  • Compatibility needs
  • Success criteria

Example:

"Must enforce physical limits during optimization"
→ requirement_type: functional
→ specification: "real-time constraint enforcement"
→ priority: critical
8. ⚖️ TRADE_OFFS - Competing objectives

Purpose: Document tensions

Extracts:

  • Competing objectives
  • Technical tensions
  • Resource allocation trade-offs

Example:

"Trading computational cost for accuracy"
→ trade_off_type: technical_tension
→ objectives: computational_cost vs simulation_accuracy
9. 🔗 RELATIONSHIPS - Dependencies and connections

Purpose: Map the graph

Extracts:

  • Causal relationships
  • Dependencies
  • Hierarchies
  • Domain bridges
  • Constraint interactions

Example:

minimum_linewidth REQUIRES tsmc_process
method CITES Jensen2011
optimization ENABLES_FROM mathematics
10. 💡 SOLUTION_SPACE - Opportunities

Purpose: Identify potential approaches

Extracts:

  • Proposed solutions
  • Research directions
  • Innovation opportunities

Example:

"Hybrid topology optimization with constraint projection"
→ approach_type: hybrid_method
→ novelty: high
→ feasibility: medium

💡 Usage Examples

Example 1: Basic Research Paper Extraction

Scenario: Extract context from a research paper abstract

Extract research context from this abstract:

"Jensen & Sigmund (2011) introduced topology optimization for integrated 
photonics using density-based methods. The approach converges well due to 
accurate gradients but requires minimum linewidth ≥100nm for TSMC 
fabrication. Current inverse design methods fail to guarantee 
manufacturability constraints, creating a gap between optimized designs 
and fabricable devices."

Use extract_research_context with model gemini-2.5-pro

Output:

✅ 12 extractions found
📊 Categories: CITATIONS (2), CURRENT_APPROACHES (3), CONSTRAINTS (4), 
              PROBLEM_DEFINITION (2), DOMAIN_CONTEXT (1)
🔗 8 relationships linked
⏱️ Processing time: ~45 seconds

Then export:

Export to CSV:
Use export_to_research_csv with result_id: [from above]

Result: research_context.csv with full context, relationships, and citations!


Example 2: Multi-Paper Literature Review

Workflow:

# Step 1: Extract from each paper via Claude
papers = ["paper1_abstract.txt", "paper2_abstract.txt", "paper3_abstract.txt"]

for i, paper in enumerate(papers):
    print(f"Extracting from paper {i+1}...")
    # Use Claude to extract
    # result_id = extract_research_context(paper)
    # export_to_research_csv(result_id, f"paper_{i+1}.csv")

# Step 2: Combine all CSVs
import pandas as pd
import glob

csvs = glob.glob("paper_*.csv")
combined = pd.concat([pd.read_csv(f) for f in csvs], ignore_index=True)

# Step 3: Deduplicate by element_name
combined = combined.drop_duplicates(subset=['element_name', 'category'])

# Step 4: Rebuild relationship IDs
# (relationships now point to combined dataframe)

combined.to_csv("literature_review_complete.csv", index=False)

print(f"✅ Combined {len(csvs)} papers into {len(combined)} unique extractions")

Example 3: Citation Network Analysis

After extraction, analyze:

import pandas as pd
import networkx as nx

# Load CSV
df = pd.read_csv('output/research_context.csv')

# Get citations
citations = df[df['category'] == 'CITATIONS_AND_REFERENCES']

# Count mentions
citation_counts = {}
for _, row in df.iterrows():
    if pd.notna(row['citation_key']):
        key = row['citation_key']
        citation_counts[key] = citation_counts.get(key, 0) + 1

# Build citation network
G = nx.DiGraph()
for _, cite in citations.iterrows():
    G.add_node(cite['citation_key'], 
               authors=cite['citation_authors'],
               year=cite['citation_year'])
    
    # Add edges based on 'related_to'
    if pd.notna(cite['related_to']):
        related = cite['related_to'].split(',')
        for r in related:
            if r in citation_counts:
                G.add_edge(cite['citation_key'], r)

# Analyze
print(f"📊 Citation Network:")
print(f"   Nodes: {G.number_of_nodes()}")
print(f"   Edges: {G.number_of_edges()}")
print(f"   Most cited: {max(citation_counts, key=citation_counts.get)}")

Example 4: Constraint Dependency Analysis

Find coupled constraints:

# Load data
df = pd.read_csv('output/research_context.csv')
constraints = df[df['category'] == 'CONSTRAINTS']

# Build dependency graph
import networkx as nx
G = nx.DiGraph()

for _, row in constraints.iterrows():
    G.add_node(row['element_name'],
               type=row['constraint_type'],
               source=row['constraint_source'],
               impact=row['impact_score'])
    
    if pd.notna(row['constraint_dependencies']):
        deps = [d.strip() for d in row['constraint_dependencies'].split(',')]
        for dep in deps:
            G.add_edge(dep, row['element_name'])

# Find critical paths
critical = [n for n in G.nodes() 
            if G.nodes[n].get('impact', 0) == '10']

print(f"⚠️ Critical Constraints: {critical}")
print(f"🔗 Dependency Network: {G.number_of_edges()} connections")

Example 5: Gap Analysis Report

Generate insights:

def generate_mindrian_gap_report(df):
    """Create Mindrian-style gap analysis"""
    
    report = ["# Mindrian Innovation Gap Analysis\n"]
    
    # 1. Problems Identified
    problems = df[df['subcategory'] == 'failure_modes']
    report.append("## 🔴 Critical Problems\n")
    for _, p in problems.iterrows():
        report.append(f"### {p['element_name']}")
        report.append(f"**Impact Score:** {p['impact_score']}/10")
        report.append(f"**Context:** {p['source_context']}\n")
    
    # 2. Current Gaps
    gaps = df[df['subcategory'] == 'gap_analysis']
    report.append("## 📊 Identified Gaps\n")
    for _, g in gaps.iterrows():
        report.append(f"- **{g['element_name']}**: {g['attribute_value']}")
    
    # 3. Requirements for Solution
    requirements = df[df['category'] == 'REQUIREMENTS']
    report.append("\n## ✅ Solution Requirements\n")
    for _, r in requirements.iterrows():
        report.append(f"- {r['element_name']}: {r['attribute_value']}")
    
    # 4. Critical Constraints
    constraints = df[
        (df['category'] == 'CONSTRAINTS') &
        (df['impact_score'].astype(str) == '10')
    ]
    report.append("\n## ⚠️ Hard Constraints\n")
    for _, c in constraints.iterrows():
        report.append(f"- **{c['element_name']}** ({c['constraint_type']})")
        report.append(f"  - Value: {c['attribute_value']}")
        report.append(f"  - Source: {c['constraint_source']}")
    
    # 5. Innovation Opportunities
    solutions = df[df['category'] == 'SOLUTION_SPACE']
    if not solutions.empty:
        report.append("\n## 💡 Opportunity Spaces\n")
        for _, s in solutions.iterrows():
            report.append(f"- {s['element_name']}: {s['source_context']}")
    
    return '\n'.join(report)

# Generate report
report = generate_mindrian_gap_report(df)
with open('mindrian_gap_analysis.md', 'w') as f:
    f.write(report)

print("✅ Mindrian gap analysis saved!")

⚙️ Configuration & Best Practices

Model Selection

Use CaseModelPassesWorkersBufferWhy
📄 Research Papersgemini-2.5-pro53010000Best context understanding
📚 Technical Docsgemini-2.5-pro3-4208000Balance accuracy/speed
⚡ Quick Extractgemini-2.5-flash2158000Fast iteration
🏭 High Volumegemini-2.5-flash1105000Production scale

Best Practices

1️⃣ Text Preparation
  • ✅ Keep complete sentences together
  • ✅ Preserve citation formats exactly
  • ✅ Include section headers
  • ❌ Don't pre-clean aggressively
2️⃣ Example Selection
  • ✅ Use 5-10 examples minimum
  • ✅ Cover all categories you want
  • ✅ Show interconnected extractions
  • ✅ Include implicit information examples
3️⃣ Extraction Strategy
  • ✅ Start with extract_research_context (built-in examples)
  • ✅ Use 5 passes for research papers
  • ✅ Use gemini-2.5-pro for accuracy
  • ✅ Always export to CSV for analysis
4️⃣ Post-Processing
  • ✅ Validate relationship links
  • ✅ Check for duplicate element_names
  • ✅ Verify citation completeness
  • ✅ Review high-impact items manually
5️⃣ Iterative Refinement
Extract → Review → Refine Examples → Re-Extract → Validate

🔧 Troubleshooting

Server not responding?
  • Check FastMCP Cloud logs
  • Verify LANGEXTRACT_API_KEY is set
  • Ensure server file is server.py
  • Try redeploying the project
Claude can't connect?
  • Verify URL in config starts with https://
  • Check JSON syntax is valid (use JSONLint)
  • Restart Claude Desktop completely
  • Wait 10-20 seconds after restart
Low extraction count?

Solutions:

  1. Increase extraction_passes to 5
  2. Switch to gemini-2.5-pro
  3. Verify text has clear sentence boundaries
  4. Check examples are relevant to domain
  5. Increase max_char_buffer to 10000
Missing relationships?

Fix:

  1. Ensure examples show related_to attribute
  2. Always use export_to_research_csv (performs ID resolution)
  3. Check relationship_target field in CSV
Source context truncated?

Fix:

  1. Set max_char_buffer to 10000
  2. Keep paragraphs together in input
  3. Don't fragment sentences
CSV export fails?

Solutions:

  1. Verify pandas installed: pip install pandas>=2.0.0
  2. Check output/ directory exists
  3. Verify disk space available
  4. Use list_stored_results to check result_id

📚 Advanced Topics

Integration with Neo4j

from neo4j import GraphDatabase
import pandas as pd

# Load CSV
df = pd.read_csv('output/research_context.csv')

# Connect to Neo4j
driver = GraphDatabase.driver("neo4j://localhost:7687", 
                              auth=("neo4j", "password"))

def create_knowledge_graph(tx, df):
    # Create nodes
    for _, row in df.iterrows():
        tx.run("""
            CREATE (n:Entity {
                id: $id,
                name: $name,
                category: $category,
                context: $context
            })
        """, id=row['id'], 
             name=row['element_name'],
             category=row['category'],
             context=row['source_context'])
    
    # Create relationships
    for _, row in df.iterrows():
        if pd.notna(row['relationship_target']):
            targets = row['relationship_target'].split(',')
            for target in targets:
                tx.run("""
                    MATCH (a {id: $source_id})
                    MATCH (b {id: $target_id})
                    CREATE (a)-[:RELATES_TO {
                        type: $rel_type
                    }]->(b)
                """, source_id=row['id'],
                     target_id=int(target),
                     rel_type=row['relationship_type'] or 'RELATES_TO')

with driver.session() as session:
    session.write_transaction(create_knowledge_graph, df)

print("✅ Knowledge graph created in Neo4j!")

Export to BibTeX

def export_bibtex(df, output_file='references.bib'):
    """Export citations to BibTeX format"""
    
    citations = df[df['category'] == 'CITATIONS_AND_REFERENCES']
    entries = []
    
    for _, cite in citations.iterrows():
        if cite['citation_type'] == 'journal_article':
            entry = f"""@article{{{cite['citation_key']},
    author = {{{cite['citation_authors'].replace(',', ' and ')}}},
    year = {{{cite['citation_year']}}},
    title = {{{cite['element_name']}}},
    doi = {{{cite['citation_url'].replace('https://doi.org/', '')}}}
}}"""
            entries.append(entry)
    
    with open(output_file, 'w') as f:
        f.write('\n\n'.join(entries))
    
    print(f"✅ Exported {len(entries)} citations to {output_file}")

export_bibtex(df)

🎯 Mindrian Success Metrics

System Performance

📊 Extraction Quality
├─ Context Preservation: >95% of extractions with source
├─ Relationship Coverage: >50% of extractions linked
├─ Category Coverage: 8-10 categories per research text
└─ Implicit Information: ~25% unstated assumptions surfaced

⚡ Processing Speed
├─ Short text (1 para): ~30 seconds
├─ Medium text (2-3 para): ~60 seconds
├─ Long text (5+ para): ~120 seconds
└─ Throughput: ~500 extractions/hour

🎯 Accuracy (with gemini-2.5-pro, 5 passes)
├─ Citation extraction: >98%
├─ Constraint capture: >90%
├─ Relationship accuracy: >85%
└─ Context preservation: >95%

Mindrian Intelligence Compounding

Year 1: 30 validations × 50 extractions = 1,500 insights
Year 2: +45 validations × 60 extractions = 4,200 insights (+15% efficiency)
Year 3: +70 validations × 75 extractions = 9,450 insights (+30% efficiency)
Year 5: Intelligence gap becomes unbridgeable = Permanent moat

📖 API Reference

Tool Signatures

# Primary Research Extraction
extract_research_context(
    text: str,
    model_id: str = "gemini-2.5-pro",
    extraction_passes: int = 5,
    max_workers: int = 30,
    api_key: Optional[str] = None
) -> Dict[str, Any]

# CSV Export
export_to_research_csv(
    result_id: str,
    output_name: str = "research_context.csv"
) -> Dict[str, Any]

# Get Examples
get_research_examples() -> Dict[str, Any]

# General Extraction
extract_structured_data(
    text: str,
    prompt_description: str,
    examples: List[Dict[str, Any]],
    model_id: str = "gemini-2.5-flash",
    extraction_passes: int = 1,
    max_workers: int = 10,
    max_char_buffer: int = 8000,
    api_key: Optional[str] = None
) -> Dict[str, Any]

# URL Extraction
extract_from_url(
    url: str,
    prompt_description: str,
    examples: List[Dict[str, Any]],
    model_id: str = "gemini-2.5-flash",
    extraction_passes: int = 2,
    max_workers: int = 20
) -> Dict[str, Any]

# Result Management
list_stored_results() -> Dict[str, Any]
get_extraction_details(result_id: str) -> Dict[str, Any]

# Utilities
save_results_to_jsonl(
    result_id: str,
    output_name: str = "extraction_results.jsonl"
) -> Dict[str, Any]

generate_visualization(
    result_id: str,
    output_name: str = "visualization.html"
) -> Dict[str, Any]

create_example_template(
    extraction_classes: List[str]
) -> Dict[str, Any]

get_supported_models() -> Dict[str, Any]

🤝 Community & Support

Resources

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open a Pull Request

Citation

If you use Mindrian LangExtract in research:

@software{mindrian_langextract_2025,
  title={Mindrian LangExtract MCP: Research Context Extraction},
  author={Mindrian Team},
  year={2025},
  url={https://github.com/jsagir/langextract-mcp-server}
}

📄 License

MIT License - see file


🙏 Acknowledgments

  • LangExtract by Google Research
  • FastMCP by the MCP community
  • Gemini AI by Google AI
  • Mindrian methodology by the Mindrian team
  • All contributors and early adopters

🚀 Ready to Transform Research into Intelligence?

Get Started NowView ExamplesRead Docs


Built with ❤️ for the Mindrian ecosystem

Transforming research into living knowledge graphs, one extraction at a time.