mcp-data-science

chokukil/mcp-data-science


The Advanced Data Science MCP Server offers comprehensive data science capabilities and intelligent analytics, supporting enterprise-level data analysis through state-of-the-art machine learning and deep learning algorithms.

Advanced Data Science MCP Server

An advanced MCP (Model Context Protocol) server that provides comprehensive data science capabilities and intelligent analytics. It supports enterprise-grade data analysis through current machine learning and deep learning algorithms, AutoML, intelligent problem-type detection, adaptive sampling, and interactive visualization.

🚀 Key Features

🧠 Intelligent automatic problem-type detection

  • Automatic problem-type identification based on data characteristics (classification / regression / clustering / time series / image / text)
  • Automatic recommendation of supervised or unsupervised learning
  • Automatic target-column recommendation with a suitability score
  • Automatic selection of an analysis strategy suited to the data size

🤖 Advanced AutoML & machine learning

  • Current boosting algorithms: XGBoost, LightGBM, CatBoost
  • AutoKeras integration: automatic neural architecture search (NAS)
  • Gaussian Process: uncertainty quantification
  • Advanced clustering: HDBSCAN, Spectral Clustering, OPTICS
  • Anomaly detection: Isolation Forest, One-Class SVM, Elliptic Envelope
  • Imbalanced data handling: SMOTE, ADASYN, undersampling

📊 Adaptive data processing and sampling

  • Intelligent sampling: automatic sampling for files of 100MB or more
  • Sampling methods: stratified, systematic, and random sampling
  • Quality validation: automatic check of statistical similarity between the sample and the original data
  • Memory optimization: efficient processing of large datasets
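
To make the stratified option concrete, here is a minimal standard-library sketch of proportional stratified sampling. `stratified_sample` and its parameters are hypothetical names for this example, and the server's own sampler may allocate rows differently.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, sample_size, seed=42):
    """Sample rows so that each stratum keeps roughly its original share."""
    if sample_size >= len(rows):
        return list(rows)

    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for members in strata.values():
        # Proportional allocation with at least one row per stratum; because of
        # rounding, the total can differ slightly from sample_size.
        quota = max(1, round(sample_size * len(members) / len(rows)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


rows = [{"label": "a"}] * 900 + [{"label": "b"}] * 100
sample = stratified_sample(rows, key=lambda r: r["label"], sample_size=100)
share_b = sum(r["label"] == "b" for r in sample) / len(sample)
print(len(sample), share_b)  # prints: 100 0.1
```

The minority class keeps its 10% share in the sample, which is the property that makes stratified sampling preferable to plain random sampling for imbalanced targets.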

🎨 Advanced visualization & dimensionality reduction

  • Interactive visualization: dynamic Plotly-based charts
  • Dimensionality reduction: t-SNE, UMAP, Isomap, PCA
  • Summary dashboard: data quality, correlations, and distributions at a glance
  • Plot types: histogram, scatter, boxplot, heatmap, pairplot, violin, distribution

🔬 Model interpretation & optimization

  • SHAP: black-box model interpretation and feature importance
  • Silhouette analysis: automatic selection of the optimal number of clusters
  • Cross-validation: model performance validated with multiple evaluation metrics
  • Feature importance: feature analysis using several methods

💻 Complete automatic code generation

  • Automatic generation of reproducible Python code for every analysis step
  • The whole pipeline (sampling logic, preprocessing, modelling) is captured as code
  • Standalone scripts suitable for production deployment
  • Code templates for API serving and model monitoring

📝 Comprehensive Markdown reports

  • Automatic generation of professional Markdown reports covering the analysis process and results
  • Includes embedded visualizations, code explanations, and practical usage guidance
  • Detailed description of sampling information, model performance, and recommendations

🛠️ Tool List

📊 Core analysis tools

  • load_dataset: intelligent data loading (automatic sampling, problem-type detection, target recommendation)
  • perform_eda: advanced exploratory data analysis (dashboard, dimensionality reduction, outlier analysis)
  • auto_ml_pipeline: intelligent AutoML pipeline (advanced algorithms, automatic evaluation)
  • create_visualization: advanced visualization (multiple plot types, interactive)

📁 File management tools

  • upload_local_file: upload a local file (automatic format validation, safe name generation)
  • copy_file_to_sandbox: copy a file (custom name supported)
  • list_uploaded_files: list uploaded files (with data preview)
  • list_available_datasets: list available datasets (with metadata)

📋 Tracking and reporting tools

  • generate_comprehensive_report: generate a comprehensive analysis report (Markdown, embedded visualizations)
  • get_operation_details: look up operation details (inputs/outputs, generated files)
  • list_generated_code: list generated code (file information, code type)
  • get_upload_instructions: file upload guide (usage, examples)

⚙️ System tools

  • get_environment_info: look up environment information (package availability, supported features)
  • health_check: check server status (system information, feature status)

🔍 Supported Algorithms in Detail

Classification

  • Basic: Random Forest, Gradient Boosting, Extra Trees, Logistic Regression, SVM, k-NN, Naive Bayes, Decision Tree, MLP
  • Advanced: XGBoost, LightGBM, CatBoost, Gaussian Process, QDA
  • Evaluation: Accuracy, Precision, Recall, F1-Score, ROC-AUC

Regression

  • Basic: Random Forest, Gradient Boosting, Extra Trees, Linear/Ridge/Lasso/ElasticNet, SVR, k-NN, Decision Tree, MLP
  • Advanced: XGBoost, LightGBM, CatBoost, Gaussian Process
  • Evaluation: R², RMSE, MAE, Cross-validation Score

Clustering

  • Basic: K-Means, DBSCAN, Agglomerative, Gaussian Mixture
  • Advanced: HDBSCAN, Spectral, OPTICS, Mean Shift, Affinity Propagation
  • Evaluation: Silhouette Score, automatic selection of the optimal number of clusters

Anomaly Detection

  • Algorithms: Isolation Forest, One-Class SVM, Elliptic Envelope, Local Outlier Factor
  • Evaluation: outlier ratio, outlier score

📈 Visualization Types

| Type | Description | Required parameters | Optional parameters |
|------|-------------|---------------------|---------------------|
| histogram | Histogram | x_column | title |
| scatter | Scatter plot | x_column, y_column | hue_column, title |
| boxplot | Box plot | x_column | y_column, title |
| heatmap | Correlation heatmap | - | title |
| bar | Bar chart | x_column | y_column, title |
| pairplot | Multivariate relationship plot | - | hue_column, title |
| violin | Violin plot | x_column, y_column | title |
| distribution | Distribution analysis (histogram + KDE + Q-Q) | x_column | title |
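
The required-parameter column above can be checked on the client side before a `create_visualization` call is made. The following is a small sketch under that assumption; `REQUIRED_PARAMS` and `missing_params` are hypothetical helper names, not part of the server.

```python
# Required parameters per plot type, transcribed from the table above.
REQUIRED_PARAMS = {
    "histogram": {"x_column"},
    "scatter": {"x_column", "y_column"},
    "boxplot": {"x_column"},
    "heatmap": set(),
    "bar": {"x_column"},
    "pairplot": set(),
    "violin": {"x_column", "y_column"},
    "distribution": {"x_column"},
}


def missing_params(plot_type, **params):
    """Return the required parameters that were not supplied for plot_type."""
    if plot_type not in REQUIRED_PARAMS:
        raise ValueError(f"Unsupported plot type: {plot_type!r}")
    provided = {name for name, value in params.items() if value is not None}
    return sorted(REQUIRED_PARAMS[plot_type] - provided)


print(missing_params("scatter", x_column="age"))   # prints: ['y_column']
print(missing_params("heatmap", title="Correlations"))  # prints: []
```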

📂 Supported File Formats

Input formats

  • CSV: .csv (recommended)
  • Excel: .xlsx, .xls
  • JSON: .json
  • Parquet: .parquet
  • Text: .txt, .tsv

Output formats

  • Data: CSV
  • Visualizations: PNG (static), HTML (interactive)
  • Models: PKL (joblib)
  • Reports: JSON, Markdown
  • Code: Python (.py)
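
For readers who want to load the same input formats outside the server, the sketch below maps each extension to a pandas reader. The mapping and the helper names `pick_reader` and `load_table` are assumptions for illustration; in particular, letting pandas sniff the delimiter for `.txt` files is a guess at reasonable behaviour, not the server's documented logic.

```python
from pathlib import Path

# Extension -> (pandas reader name, extra keyword arguments).
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".txt": ("read_csv", {"sep": None, "engine": "python"}),  # sniff the delimiter
    ".xlsx": ("read_excel", {}),
    ".xls": ("read_excel", {}),
    ".json": ("read_json", {}),
    ".parquet": ("read_parquet", {}),
}


def pick_reader(path):
    """Return (reader name, kwargs) for a supported file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return READERS[suffix]


def load_table(path):
    """Load a supported file into a DataFrame."""
    import pandas as pd  # imported lazily so pick_reader works without pandas

    reader_name, kwargs = pick_reader(path)
    return getattr(pd, reader_name)(path, **kwargs)


print(pick_reader("data/Sales.CSV"))  # prints: ('read_csv', {})
```

Excel and Parquet files need an extra engine installed alongside pandas (for example openpyxl or pyarrow).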

🚀 Quick Start

Linux/Mac users

# 1. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc  # or ~/.zshrc

# 2. Set up the project
git clone https://github.com/chokukil/mcp-data-science.git
cd mcp-data-science

# 3. Install the base dependencies
uv venv --python 3.10
source .venv/bin/activate
uv pip install -e .

# 4. Install the advanced features (optional)
uv pip install -e ".[all]"  # installs every advanced feature at once

# 5. Run the server
python mcp_data_science.py --sandbox-dir ./sandbox --port 8007

Windows users

# 1. Install uv (if not already installed) - run PowerShell as administrator
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Restart PowerShell

# 2. Set up the project
git clone https://github.com/chokukil/mcp-data-science.git
cd mcp-data-science

# 3. Install the base dependencies
uv venv --python 3.10
.\.venv\Scripts\activate
uv pip install -e .

# 4. Install the advanced features (optional)
uv pip install -e ".[all]"  # installs every advanced feature at once

3. Run the server

# Default run
python mcp_data_science.py

# Isolated run with uv
uv run --isolated mcp_data_science.py

# Custom settings (server options go after the script name)
uv run --isolated mcp_data_science.py --sandbox-dir ./my_sandbox --port 8008

# Run with all advanced features (may take several minutes)
uv run --isolated --extra all mcp_data_science.py

4. Basic usage

1. Upload a file

upload_result = await upload_local_file('/path/to/your/data.csv')

2. Load the data (automatic sampling, problem-type detection)

load_result = await load_dataset(upload_result['file_info']['destination_path'])
dataset_id = load_result['dataset_id']

3. Exploratory data analysis

eda_result = await perform_eda(dataset_id)

4. Run AutoML

automl_result = await auto_ml_pipeline(
    dataset_id=dataset_id,
    target_column='target',  # or None to detect it automatically
    include_advanced=True
)

5. Create a visualization

viz_result = await create_visualization(
    dataset_id=dataset_id,
    plot_type='pairplot',
    hue_column='category'
)

6. Generate the comprehensive report

report_result = await generate_comprehensive_report()


💡 Advanced Usage Scenarios

🎯 Large-scale data processing

# Processing a 500MB CSV file
await upload_local_file('/data/large_dataset.csv')
# → Intelligent sampling is applied automatically (stratified / systematic / random)

load_result = await load_dataset('large_dataset_20241207_143052.csv')
# Example output:
# - Original: 1,000,000 rows
# - Sample: 30,000 rows (stratified)
# - Quality: Excellent (distribution difference < 2%)

🔬 Automatic problem-type detection

# Load without a target column → automatic recommendation
load_result = await load_dataset('mystery_data.csv')
# Output:
# - Detected problem type: classification
# - Recommended target: 'survived' (classification, suitability score 85)
# - Alternative target: 'age' (regression, suitability score 70)

# Run AutoML with the recommended target
await auto_ml_pipeline('dataset_123', target_column='survived')

🧪 Using the advanced algorithms

# AutoML including the advanced algorithms
automl_result = await auto_ml_pipeline(
    dataset_id='dataset_123',
    target_column='target',
    include_advanced=True  # XGBoost, LightGBM, CatBoost, Gaussian Process
)

# Example result:
# 1st: CatBoost (accuracy: 94.2%)
# 2nd: XGBoost (accuracy: 93.8%)
# 3rd: LightGBM (accuracy: 93.5%)

📊 Automatic clustering optimization

# Unsupervised learning - the number of clusters is chosen automatically
await auto_ml_pipeline(
    dataset_id='dataset_no_target',
    target_column=None
)
# → Silhouette analysis selects the optimal k automatically (k=5 in this example)
# → Advanced algorithms such as HDBSCAN and Spectral Clustering are compared
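
Silhouette-based selection of k can be reproduced with scikit-learn, as in the sketch below. It assumes K-Means as the clustering algorithm and a candidate range of k from 2 to 8; the helper name `best_k_by_silhouette`, the candidate range, and the synthetic three-blob data are all choices made for this example rather than details of the server's pipeline.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 9), seed=0):
    """Fit K-Means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[(0, 0), (10, 10), (-10, 10)],
    cluster_std=0.8,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X)
print(best_k, round(scores[best_k], 3))
```

The silhouette score is only defined for two or more clusters, which is why the candidate range starts at 2.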

📈 Adaptive Sampling Strategy

| File size | Sampling method | Target sample size | Memory savings |
|-----------|-----------------|--------------------|----------------|
| < 100MB | No sampling | Full data | 0% |
| 100-500MB | Stratified / random | 50,000 rows | ~70% |
| 500MB-1GB | Stratified / systematic | 30,000 rows | ~80% |
| 1-2GB | Systematic | 20,000 rows | ~85% |
| > 2GB | Systematic | 15,000 rows | ~90% |
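
The size bands above translate naturally into a lookup function. The sketch below is one way to express them; it assumes binary megabytes and assigns boundary values (exactly 500MB, 1GB, 2GB) to the larger band, neither of which is stated in this README. `SamplingPlan` and `plan_for_size` are hypothetical names.

```python
from typing import NamedTuple, Optional

MB = 1024 * 1024  # assumption: 1MB = 1024 * 1024 bytes


class SamplingPlan(NamedTuple):
    methods: tuple               # candidate sampling methods, empty if no sampling
    target_rows: Optional[int]   # None means "use the full dataset"


def plan_for_size(size_bytes):
    """Map a file size to the sampling band listed in the table above."""
    size_mb = size_bytes / MB
    if size_mb < 100:
        return SamplingPlan((), None)
    if size_mb < 500:
        return SamplingPlan(("stratified", "random"), 50_000)
    if size_mb < 1024:
        return SamplingPlan(("stratified", "systematic"), 30_000)
    if size_mb < 2048:
        return SamplingPlan(("systematic",), 20_000)
    return SamplingPlan(("systematic",), 15_000)


print(plan_for_size(500 * MB))
```

With these boundaries a 500MB file lands in the 30,000-row band, which matches the large-file example earlier in this document.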

Sampling quality guarantees

  • Statistical similarity: difference in means < 5%
  • Distribution preservation: difference in target-variable distribution < 2%
  • Quality grade: rated automatically as Excellent / Good / Fair
  • Full validation: validating on the full dataset is recommended before production deployment
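
The two numeric thresholds above can be checked with a few lines of standard-library Python. This sketch interprets "difference in target-variable distribution" as the largest absolute gap in class share between the original and the sample, which is an assumption; the function names are hypothetical, and the mapping from these numbers to the Excellent / Good / Fair grades is not specified here, so it is left out.

```python
from collections import Counter
from statistics import fmean


def mean_shift(original, sample):
    """Relative difference between the sample mean and the original mean."""
    base = fmean(original)
    drift = abs(fmean(sample) - base)
    if base == 0:
        return 0.0 if drift == 0 else float("inf")
    return drift / abs(base)


def max_class_share_gap(original_labels, sample_labels):
    """Largest absolute gap in class share between the two label lists."""
    orig, samp = Counter(original_labels), Counter(sample_labels)
    return max(
        abs(orig[c] / len(original_labels) - samp[c] / len(sample_labels))
        for c in orig.keys() | samp.keys()
    )


def sample_passes(original, sample, original_labels, sample_labels,
                  mean_tol=0.05, dist_tol=0.02):
    """True if both the mean and the class distribution stay within tolerance."""
    return (mean_shift(original, sample) < mean_tol
            and max_class_share_gap(original_labels, sample_labels) < dist_tol)


values, labels = [10, 20, 30] * 100, ["a"] * 270 + ["b"] * 30
print(sample_passes(values, [10, 20, 30] * 10, labels, ["a"] * 27 + ["b"] * 3))
```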

🏗️ Project Structure

mcp-data-science/
├── mcp_data_science.py          # Main server code (single file)
├── pyproject.toml               # Project configuration and dependencies
├── README.md                    # This document
├── .env                         # Environment variables (optional)
│   └── OPENAI_API_KEY=...      # For the LLM-enhanced features
└── sandbox/                     # Working directory (created automatically)
    ├── datasets/                # Uploaded and loaded datasets
    ├── plots/                   # Generated visualizations (PNG, HTML)
    ├── models/                  # Trained models (PKL)
    ├── reports/                 # Analysis reports (JSON, MD)
    ├── logs/                    # Operation logs (JSON)
    └── generated_code/          # Generated Python code
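
The sandbox layout is created by the server itself, but the same structure is easy to reproduce, for example when preparing a test fixture. The sketch below builds the six subdirectories listed above; `ensure_sandbox` is a hypothetical helper and the temporary directory is only used for the demonstration.

```python
import tempfile
from pathlib import Path

SANDBOX_SUBDIRS = ("datasets", "plots", "models", "reports", "logs", "generated_code")


def ensure_sandbox(root):
    """Create the sandbox directory and its subdirectories if they are missing."""
    root_path = Path(root)
    for name in SANDBOX_SUBDIRS:
        (root_path / name).mkdir(parents=True, exist_ok=True)
    return root_path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sandbox = ensure_sandbox(Path(tmp) / "sandbox")
    created = sorted(p.name for p in sandbox.iterdir())

print(created)
```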

🔧 Environment Setup

Environment variables (.env file)

# OpenAI API key (optional - used to enhance reports)
OPENAI_API_KEY=sk-...

# Server settings (the defaults can be used as-is)
# MCP_SERVER_PORT=8007
# MCP_SANDBOX_DIR=./sandbox
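
A settings loader for these variables might look like the sketch below. It assumes the variables are already present in the process environment (loading the `.env` file itself is not shown) and that the defaults are port 8007 and `./sandbox`, as the commented-out lines suggest. `ServerSettings` and `load_settings` are hypothetical names, and the server may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerSettings:
    port: int
    sandbox_dir: str
    openai_api_key: Optional[str]


def load_settings(env=None):
    """Read server settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return ServerSettings(
        port=int(env.get("MCP_SERVER_PORT", "8007")),
        sandbox_dir=env.get("MCP_SANDBOX_DIR", "./sandbox"),
        openai_api_key=env.get("OPENAI_API_KEY") or None,  # empty string -> None
    )


print(load_settings({"MCP_SERVER_PORT": "8008"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.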

Installing by package group

# Base install (required packages only)
uv pip install -e .

# Install by feature
uv pip install -e ".[ml]"        # Machine learning boosting (XGBoost, LightGBM, CatBoost)
uv pip install -e ".[dl]"        # Deep learning (TensorFlow, AutoKeras, Keras-Tuner)
uv pip install -e ".[viz]"       # Advanced visualization (Plotly)
uv pip install -e ".[interpret]" # Model interpretation (SHAP)
uv pip install -e ".[advanced]"  # Advanced analysis (UMAP, HDBSCAN, Optuna)
uv pip install -e ".[stats]"     # Statistical analysis (statsmodels)
uv pip install -e ".[image]"     # Image processing (Pillow)
uv pip install -e ".[llm]"       # LLM integration (OpenAI)

# Combined installs
uv pip install -e ".[ml,viz,interpret]"  # A commonly used combination
uv pip install -e ".[all]"               # Everything

🛡️ Security and Performance

🔒 Data security

  • Local processing: all data is processed locally only
  • Sampling information: only statistical summaries are sent to the LLM; raw data is never transmitted
  • File isolation: files are managed safely inside the sandbox directory
  • Log tracking: every operation is logged in detail

⚡ Performance optimization

  • Adaptive sampling: large datasets are intelligently reduced in size
  • Memory efficiency: chunked processing minimizes memory usage
  • Early stopping: training stops automatically once performance converges
  • Caching: intermediate results are cached to speed up repeated work

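📊 Performance Benchmarks

| Data size | Rows | Processing time | Memory usage | Sampling applied |
|-----------|------|-----------------|--------------|------------------|
| Small | < 10K | < 30 s | < 500MB | ❌ |
| Medium | 10K-100K | 1-5 min | 500MB-2GB | ❌ |
| Large | 100K-1M | 2-15 min | 1-4GB | ✅ |
| Very large | > 1M | 5-30 min | 2-8GB | ✅ |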

🚨 Troubleshooting

Common problems

1. Out-of-memory errors
# Fix: use a smaller sample size
# Lower the 100MB threshold to 50MB
import os
os.environ['DATA_SIZE_THRESHOLD_MB'] = '50'

2. TensorFlow/AutoKeras installation problems
# Apple Silicon Mac
pip install tensorflow-macos tensorflow-metal

# CUDA GPU support (quoted so the shell does not expand the brackets)
pip install "tensorflow[and-cuda]"

# Or the CPU-only build
pip install tensorflow-cpu

3. Package version conflicts
# Recreate the virtual environment
python -m venv venv --clear
source venv/bin/activate  # or on Windows: venv\Scripts\activate
pip install --upgrade pip
# Reinstall the packages...

4. File upload failures
# Fix: check the file path and permissions
import os
print(f"File exists: {os.path.exists('/path/to/file.csv')}")
print(f"Readable: {os.access('/path/to/file.csv', os.R_OK)}")

5. Server fails to start
# Check for a port conflict
netstat -tlnp | grep 8007

# Use a different port
python mcp_data_science.py --port 8008

Checking the logs

# Follow the operation logs
tail -f sandbox/logs/operation_*.json

# Server logs are written to the console output

⚙️ MCP Server Configuration

To use the MCP server, you need to set up a configuration file.

Open the MCP configuration file of the environment you want to use and add the content below.

Configuration file content
{
  "mcpServers": {
    "data-science": {
      "transport": "sse",
      "url": "http://localhost:8007/sse"
    }
  }
}

Port settings

If the default port 8007 is already in use, you can use a different port:

# Check whether the port is in use (Linux/Mac)
lsof -i :8007

# Check whether the port is in use (Windows)
netstat -an | findstr :8007

# Run the server on a different port
python mcp_data_science.py --port 8008

Change the port in the configuration file to match:

{
  "mcpServers": {
    "data-science": {
      "transport": "sse",
      "url": "http://localhost:8008/sse"
    }
  }
}
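
Keeping the server port and the client configuration in sync can be scripted. The sketch below probes a port with the standard `socket` module and emits a matching configuration in the shape shown above; the helper names and the candidate port list are assumptions for this example.

```python
import json
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        return probe.connect_ex((host, port)) == 0


def pick_port(candidates=(8007, 8008)):
    """Return the first candidate port that nothing is listening on."""
    for port in candidates:
        if not port_in_use(port):
            return port
    raise RuntimeError("No free port among the candidates")


def client_config(port):
    """Build the MCP client configuration for a server running on the given port."""
    return {
        "mcpServers": {
            "data-science": {
                "transport": "sse",
                "url": f"http://localhost:{port}/sse",
            }
        }
    }


print(json.dumps(client_config(8008), indent=2))
```

A typical use would be to call `pick_port()`, start the server with `--port` set to the result, and write `client_config(port)` into the client's configuration file.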

🎯 MCP Call Examples

The following are practical examples of using the data science MCP server from Cursor.


📊 Basic data analysis workflow

User: "Upload the CSV file and analyze the data."

Cursor: upload_local_file() → load_dataset() → perform_eda() → auto_ml_pipeline()

🔍 Step-by-step MCP tool calls

1. Check the environment
User: "Check that the data science server is working properly."

Cursor calls: health_check()

2. Upload a file
User: "Analyze the local file sales_data.csv."

Cursor calls:
1. upload_local_file('/path/to/sales_data.csv')
2. load_dataset('sales_data_20241207_143052.csv')

3. Exploratory data analysis
User: "Show me the basic characteristics and distributions of the data."

Cursor calls:
1. perform_eda(dataset_id='dataset_123')
2. create_visualization(dataset_id='dataset_123', plot_type='pairplot')

4. Machine learning modelling
User: "Build a sales prediction model."

Cursor calls:
1. auto_ml_pipeline(
     dataset_id='dataset_123',
     target_column='sales',
     include_advanced=True
   )

5. Generate the results report
User: "Create a report that summarizes the whole analysis."

Cursor calls:
1. generate_comprehensive_report()
2. list_generated_code()
💡 Advanced usage scenarios

Large-scale data processing
User: "Analyze the 500MB customer dataset, and process it in a memory-efficient way."

Cursor handles this automatically:
1. upload_local_file() → detects that sampling is needed
2. load_dataset() → creates a 30,000-row sample using stratified sampling
3. perform_eda() → runs EDA on the sampled data
4. auto_ml_pipeline() → models with the advanced algorithms

Automatic problem-type detection
User: "Tell me what kinds of analysis this dataset supports."

Cursor calls:
1. load_dataset() → automatic problem-type detection
2. Result: "Detected as a classification problem; recommended target: 'customer_churn'"

Custom visualization
User: "Visualize purchase patterns by age group."

Cursor calls:
create_visualization(
  dataset_id='dataset_123',
  plot_type='boxplot',
  x_column='age_group',
  y_column='purchase_amount',
  title='Purchase patterns by age group'
)

🚀 Practical Tips

  1. Efficient workflow

    • Start by checking the server status with health_check
    • Proceed in the order upload_local_file → load_dataset → perform_eda
    • For large datasets, review the automatic sampling result first
  2. Error handling

    • If a file upload fails, consult get_upload_instructions
    • If memory runs out, request a smaller sample size
    • If modelling fails, check the details with get_operation_details
  3. Using the results

    • Use list_generated_code to find the reproducible Python code
    • Use generate_comprehensive_report to produce a professional analysis report
    • Trained models are available under sandbox/models/

📝 License

MIT License - free to use, modify, and distribute


🔄 Version History

v0.3.0 (current)

  • 🎯 Intelligent automatic problem-type detection
  • 🧠 Advanced ML algorithms (XGBoost, LightGBM, CatBoost, Gaussian Process)
  • 📊 Adaptive sampling (stratified / systematic / random)
  • 🎨 Advanced visualization (Plotly, dimensionality reduction, dashboard)
  • 🔬 SHAP-based model interpretation
  • 💻 Complete automatic code generation
  • 📝 Comprehensive Markdown reports
  • 📁 File upload and management
  • ⚡ Performance optimization and stability

Key improvements

  • Intelligent analysis: automatic problem-type detection based on data characteristics
  • Quality assurance: automatic validation of sampling quality
  • User friendliness: an intuitive API and detailed guides
  • Enterprise readiness: stable processing of large datasets
  • Reproducibility: complete code generated for every analysis

💡 Tip: for best results, use a supervised learning problem with a target column. For large datasets, review the automatic sampling result first and, if needed, re-validate on the full dataset.