Federico Calò

AI in Healthcare: Diagnostics, Drug Discovery, and Patient Flow

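A radiologist analyzing thousands of X-rays a day, a scientist spending years searching for a candidate drug molecule, a physician who has to extract the relevant information from hundreds of pages of medical records: these scenarios describe the everyday problems of modern medicine. Artificial intelligence is transforming each of these areas in profound and measurable ways, not as a promise for the future but as an operational reality in 2025.

The FDA has passed the threshold of 1,240 authorized AI medical devices by the end of 2025, 1,039 of them in radiology alone. Medical imaging accounts for 77% of all AI authorizations in healthcare. In drug discovery, more than 75 AI-designed molecules have entered clinical trials, and the first fully AI-designed molecule successfully completed Phase 2a in 2025. The Italian digital health market is worth USD 7.38 billion in 2025 and is projected to grow to USD 26.5 billion by 2035 (CAGR 13.6%).

This article covers the full spectrum of AI in healthcare: from diagnostic imaging to clinical NLP, from drug discovery to federated learning for privacy, through to the EU MDR and AI Act regulations. It includes Python code examples for the most relevant use cases.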

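What You Will Learn

How AI works in diagnostic imaging (radiology, pathology, dermatology)
Drug discovery with ML: molecular generation, virtual screening, and property prediction
Clinical NLP for EHRs: named entity recognition and automated ICD coding
Federated learning to train models without sharing sensitive data
FHIR/HL7 interoperability and integration with hospital systems
Regulation: EU MDR, AI Act, and CE marking for AI medical devices
Ethics and bias in healthcare AI: real risks and practical mitigations
3 Python code examples: imaging classifier, drug property predictor, clinical NER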

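Data Warehouse, AI, and Digital Transformation Series Overview

#	Article	Focus
1	Evolution of the Data Warehouse	From SQL Server to the data lakehouse
2	Data Mesh and Decentralized Architecture	Decentralizing company data
3	ETL vs Modern ELT	dbt, Airbyte, Fivetran
4	Pipeline Orchestration	Airflow, Dagster, and Prefect
5	AI in Manufacturing	Predictive maintenance and digital twins
6	AI in Finance	Fraud detection, credit scoring, and risk
7	AI in Retail	Demand forecasting and recommendation engines
8	You are here - AI in Healthcare	Diagnostics, drug discovery, and patient flow
9	AI in Logistics	Route optimization and warehouse automation
10	LLMs in Business	Enterprise RAG, fine-tuning, and guardrails
11	Enterprise Vector Databases	pgVector, Pinecone, and Weaviate
12	MLOps for Business	AI models in production with MLflow
13	Data Governance and Data Quality	The foundation for trustworthy AI
14	Data-Driven Roadmap for SMEs	Putting AI and DWH into practice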

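Context: Why AI in Healthcare Is Different

AI in healthcare is not simply "applying ML to medical data". It is a domain with unique characteristics that shape every technology, architecture, and governance choice and make it more complex than other sectors.

Maximum stakes: a diagnostic error can cost lives
Highly sensitive data: protected by GDPR, HIPAA, and national regulations
Strict regulation: EU MDR, AI Act, FDA 510(k) and PMA approval
Critical bias: models trained on unrepresentative populations create disparities in care
Complex integration: legacy EHR/HIS systems, DICOM, HL7 v2/FHIR R4
Clinical acceptance: physicians must trust and understand AI recommendations

Despite these challenges, the potential is enormous. The NIH estimates that AI could reduce healthcare costs by 20-30% over the next ten years through early diagnosis, more effective treatments, and optimized care pathways. In Italy, the PNRR has allocated EUR 1.67 billion to the digitalization of healthcare, including dedicated funding for telemedicine, electronic health records, and the adoption of AI tools.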

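Medical Imaging AI: From Radiology to Digital Pathology

Diagnostic imaging is the most mature area of healthcare AI. With more than 1,039 FDA-authorized AI devices in radiology (data as of end of 2025), computer-aided detection (CADe) and diagnosis (CADx) systems are now an integral part of radiology workflows in major hospitals worldwide.

Radiology: Chest X-Ray and CT

Models for detecting lung pathologies on chest X-rays were the first to reach clinical-grade performance. Stanford's CheXpert dataset (224,316 X-rays) and NIH ChestX-ray14 (112,120 images) made it possible to train models that exceed the average accuracy of radiologists on specific tasks:

Pneumothorax detection: AUC 0.944 vs 0.888 for radiologists
COVID-19 diagnosis from lung CT: 96% sensitivity, 93% specificity
Lung cancer screening (NLST trial): 20% reduction in mortality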
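
Sensitivity and specificity figures like the ones above are derived from a confusion matrix. The sketch below shows that arithmetic for a binary screening task. The function name `screening_metrics` and the patient counts are illustrative assumptions, not data from the studies cited in this article.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of diseased cases caught
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # share of healthy cases cleared
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # how often a positive call is right
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}


# Hypothetical cohort: 100 diseased and 900 healthy patients (10% prevalence)
metrics = screening_metrics(tp=96, fp=63, tn=837, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up cohort, a model with 96% sensitivity and 93% specificity is right on only about 60% of its positive calls, which is why prevalence matters when reading headline numbers.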

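Digital Pathology and Histology

Digital pathology turns histology slides (WSI, Whole Slide Images) into data that AI can analyze. Foundation models such as CONCH, PLIP, and UNI are pretrained on millions of histology images and deliver performance reported to exceed that of pathologists on specific tasks such as prostate cancer grading (Gleason system).

Dermatology: AI Accessible via Smartphone

Dermatology is the field where AI has the greatest democratization potential: a smartphone with a good camera can become a diagnostic tool. Google's skin lesion classification model (trained on 600,000 images) reaches the accuracy of board-certified dermatologists on the 26 most common conditions.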

CNN Architectures for Medical Imaging

Architecture	Use case	Typical datasets	Performance
ResNet-50/101	X-ray classification	CheXpert, NIH ChestX-ray14	AUC 0.89-0.95
U-Net	Organ/tumor segmentation	BraTS, CHAOS	Dice 0.85-0.94
EfficientNet-B4	Skin lesion classification	ISIC 2020, HAM10000	AUC 0.93-0.96
ViT / DINO	WSI digital pathology	TCGA, CAMELYON	AUC 0.94-0.98
3D U-Net	Volumetric CT/MRI segmentation	Medical Segmentation Decathlon	Dice 0.82-0.91
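
Segmentation models such as U-Net are usually scored with the Dice coefficient rather than AUC. As a minimal sketch, assuming flattened binary masks held in plain Python lists, the metric can be computed as follows. The helper name `dice_coefficient` and the toy masks are hypothetical.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 3x3 masks flattened row by row (hypothetical values)
predicted = [0, 1, 1, 0, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 0, 1, 0, 0, 0, 0]
print(f"Dice: {dice_coefficient(predicted, reference):.3f}")
```

Here the predicted mask has four foreground pixels, the reference has three, and they share three, so Dice is 6/7, about 0.857. Real pipelines compute the same quantity on tensors, typically per class and per volume.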

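Practical Example: Medical Image Classifier with PyTorch

The following example implements a lung disease classifier for chest X-rays. It uses transfer learning with an EfficientNet pretrained on ImageNet, an approach that is common in clinical research projects and hospital proofs of concept.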

"""
Medical image classifier for chest X-rays
Classes: Normal, Pneumonia, COVID-19, Lung Cancer
Requires: torch, torchvision, timm, Pillow, numpy, scikit-learn
"""
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import numpy as np
import timm
from pathlib import Path
from typing import Dict, List, Tuple, Optional
import json

# ========================
# Configuration
# ========================

CLASSES = ['Normal', 'Pneumonia', 'COVID-19', 'Lung_Cancer']
IMAGE_SIZE = 224
BATCH_SIZE = 32
NUM_EPOCHS = 30
LEARNING_RATE = 1e-4
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ========================
# Dataset
# ========================

class ChestXRayDataset(Dataset):
    """
    Chest X-ray dataset. Expected layout:
    data_dir/
      train/
        Normal/
        Pneumonia/
        COVID-19/
        Lung_Cancer/
      val/
        ...
    """

    def __init__(
        self,
        data_dir: str,
        split: str = 'train',
        transform: Optional[transforms.Compose] = None
    ) -> None:
        self.data_dir = Path(data_dir) / split
        self.transform = transform
        self.samples: List[Tuple[Path, int]] = []

        for class_idx, class_name in enumerate(CLASSES):
            class_dir = self.data_dir / class_name
            if class_dir.exists():
                for img_path in class_dir.glob('*.jpg'):
                    self.samples.append((img_path, class_idx))
                for img_path in class_dir.glob('*.png'):
                    self.samples.append((img_path, class_idx))

        # Dataset statistics (guard against an empty split to avoid dividing by zero)
        class_counts = [0] * len(CLASSES)
        for _, label in self.samples:
            class_counts[label] += 1
        total = len(self.samples)
        print(f"[{split}] Total: {total} images")
        for name, count in zip(CLASSES, class_counts):
            share = count / total * 100 if total else 0.0
            print(f"  {name}: {count} ({share:.1f}%)")

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, int]:
        img_path, label = self.samples[idx]
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, label


# ========================
# Transforms with data augmentation
# ========================

def get_transforms(split: str) -> transforms.Compose:
    """
    Transforms for training (with augmentation) and validation.
    RandomAutocontrast gives a rough contrast boost; it is not true CLAHE.
    """
    if split == 'train':
        return transforms.Compose([
            transforms.Resize((IMAGE_SIZE + 32, IMAGE_SIZE + 32)),
            transforms.RandomCrop(IMAGE_SIZE),
            # A horizontal flip mirrors left/right anatomy (e.g. heart side): keep it infrequent
            transforms.RandomHorizontalFlip(p=0.3),
            transforms.RandomRotation(degrees=10),
            transforms.ColorJitter(
                brightness=0.2,
                contrast=0.2,
                saturation=0.1
            ),
            transforms.RandomAutocontrast(p=0.3),
            transforms.ToTensor(),
            # Normalize with ImageNet statistics (transfer learning)
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]
            )
        ])
    else:
        return transforms.Compose([
            transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]
            )
        ])


# ========================
# EfficientNet model
# ========================

class MedicalImageClassifier(nn.Module):
    """
    Medical image classifier built on EfficientNet-B4.
    Transfer learning from ImageNet with progressive fine-tuning.
    """

    def __init__(
        self,
        num_classes: int = len(CLASSES),
        backbone: str = 'efficientnet_b4',
        dropout_rate: float = 0.3
    ) -> None:
        super().__init__()

        # Pretrained backbone (timm library)
        self.backbone = timm.create_model(
            backbone,
            pretrained=True,
            num_classes=0,  # Drops the original classification head
            global_pool='avg'
        )

        # Feature dimension produced by the backbone
        feature_dim = self.backbone.num_features

        # Custom classification head
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout_rate),
            nn.Linear(feature_dim, 512),
            nn.ReLU(inplace=True),
            nn.BatchNorm1d(512),
            nn.Dropout(p=dropout_rate / 2),
            nn.Linear(512, num_classes)
        )

        # Freeze the backbone initially
        self._freeze_backbone()

    def _freeze_backbone(self) -> None:
        """Freeze the backbone for the initial warm-up."""
        for param in self.backbone.parameters():
            param.requires_grad = False

    def unfreeze_backbone(self, unfreeze_last_n_blocks: int = 3) -> None:
        """Unfreeze the last N backbone stages for fine-tuning."""
        # Freeze everything first
        for param in self.backbone.parameters():
            param.requires_grad = False

        # timm EfficientNet keeps its stages in `backbone.blocks`. Slicing the
        # top-level children() instead would select bn2/pooling/identity head
        # and unfreeze almost nothing.
        stages = getattr(self.backbone, 'blocks', None)
        blocks = list(stages) if stages is not None else list(self.backbone.children())
        for block in blocks[-unfreeze_last_n_blocks:]:
            for param in block.parameters():
                param.requires_grad = True

        trainable = sum(p.numel() for p in self.parameters() if p.requires_grad)
        print(f"Trainable parameters: {trainable:,}")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)
        return self.classifier(features)


# ========================
# Training with class weighting
# ========================

def compute_class_weights(dataset: ChestXRayDataset) -> torch.Tensor:
    """
    Computes weights inversely proportional to class frequency.
    Essential for imbalanced datasets (e.g. Normal >> pathological).
    """
    labels = [label for _, label in dataset.samples]
    class_counts = np.bincount(labels, minlength=len(CLASSES))
    # Clamp to 1 so an absent class does not get a huge weight that swamps the rest
    weights = 1.0 / np.maximum(class_counts, 1)
    weights = weights / weights.sum() * len(CLASSES)
    return torch.tensor(weights, dtype=torch.float32)


def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device
) -> Dict[str, float]:
    model.train()
    total_loss = 0.0
    correct = 0
    total = 0

    for batch_idx, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()

        # Gradient clipping for stability
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()

        total_loss += loss.item()
        _, predicted = outputs.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)

    return {
        'loss': total_loss / len(loader),
        'accuracy': 100.0 * correct / total
    }


@torch.no_grad()
def evaluate(
    model: nn.Module,
    loader: DataLoader,
    criterion: nn.Module,
    device: torch.device
) -> Dict[str, float]:
    model.eval()
    total_loss = 0.0
    all_preds: List[int] = []
    all_labels: List[int] = []
    all_probs: List[np.ndarray] = []

    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)

        total_loss += loss.item()
        probs = torch.softmax(outputs, dim=1).cpu().numpy()
        preds = outputs.argmax(dim=1).cpu().numpy()

        all_preds.extend(preds.tolist())
        all_labels.extend(labels.cpu().numpy().tolist())
        all_probs.extend(probs.tolist())

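    # Per-class metrics
    from sklearn.metrics import (
        accuracy_score, classification_report, roc_auc_score
    )
    class_ids = list(range(len(CLASSES)))
    accuracy = accuracy_score(all_labels, all_preds)
    report = classification_report(
        all_labels, all_preds,
        labels=class_ids,  # keeps report keys aligned with CLASSES even if a class is absent
        target_names=CLASSES,
        output_dict=True,
        zero_division=0
    )

    # Multi-class AUC-ROC (one-vs-rest)
    try:
        auc = roc_auc_score(
            all_labels,
            np.array(all_probs),
            multi_class='ovr',
            average='macro',
            labels=class_ids
        )
    except ValueError:
        # Raised when a class is missing from the validation labels
        auc = 0.0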

    return {
        'loss': total_loss / len(loader),
        'accuracy': accuracy * 100,
        'auc_roc': auc,
        'per_class': {
            cls: {
                'precision': report[cls]['precision'],
                'recall': report[cls]['recall'],
                'f1': report[cls]['f1-score']
            }
            for cls in CLASSES if cls in report
        }
    }


# ========================
# Main pipeline
# ========================

def train_medical_classifier(data_dir: str) -> None:
    """Full training pipeline: head-only warm-up, then partial backbone fine-tuning."""
    print(f"Device: {DEVICE}")

    # Datasets and DataLoaders
    train_dataset = ChestXRayDataset(data_dir, 'train', get_transforms('train'))
    val_dataset = ChestXRayDataset(data_dir, 'val', get_transforms('val'))

    class_weights = compute_class_weights(train_dataset).to(DEVICE)

    train_loader = DataLoader(
        train_dataset, batch_size=BATCH_SIZE,
        shuffle=True, num_workers=4, pin_memory=True,
        drop_last=True  # BatchNorm1d in the head fails on a final training batch of size 1
    )
    val_loader = DataLoader(
        val_dataset, batch_size=BATCH_SIZE,
        shuffle=False, num_workers=4, pin_memory=True
    )

    # Model
    model = MedicalImageClassifier().to(DEVICE)
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    # PHASE 1: warm-up (head only, backbone frozen)
    print("\n--- PHASE 1: Warm-up (5 epochs) ---")
    optimizer = torch.optim.AdamW(
        filter(lambda p: p.requires_grad, model.parameters()),
        lr=LEARNING_RATE, weight_decay=1e-4
    )
    for epoch in range(5):
        train_metrics = train_epoch(model, train_loader, optimizer, criterion, DEVICE)
        print(f"Epoch {epoch+1}/5 | Loss: {train_metrics['loss']:.4f} | Acc: {train_metrics['accuracy']:.2f}%")

    # PHASE 2: backbone fine-tuning
    print("\n--- PHASE 2: Fine-tuning (25 epochs) ---")
    model.unfreeze_backbone(unfreeze_last_n_blocks=3)
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=LEARNING_RATE / 10,  # Lower LR once backbone layers are trainable
        weight_decay=1e-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=25, eta_min=1e-7
    )

    best_auc = 0.0
    best_state = None  # in-memory copy of the best weights, restored for the final report
    for epoch in range(25):
        train_m = train_epoch(model, train_loader, optimizer, criterion, DEVICE)
        val_m = evaluate(model, val_loader, criterion, DEVICE)
        scheduler.step()

        print(
            f"Epoch {epoch+1}/25 | "
            f"Train Loss: {train_m['loss']:.4f} Acc: {train_m['accuracy']:.2f}% | "
            f"Val Loss: {val_m['loss']:.4f} Acc: {val_m['accuracy']:.2f}% AUC: {val_m['auc_roc']:.4f}"
        )

        # Save the best model
        if val_m['auc_roc'] > best_auc:
            best_auc = val_m['auc_roc']
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'val_metrics': val_m,
                'classes': CLASSES
            }, 'best_medical_classifier.pth')
            print(f"  -> New best model saved (AUC: {best_auc:.4f})")

    # Final report on the best checkpoint, not on the last epoch
    if best_state is not None:
        model.load_state_dict(best_state)
    print("\n--- Final per-class metrics ---")
    final_metrics = evaluate(model, val_loader, criterion, DEVICE)
    for cls, metrics in final_metrics['per_class'].items():
        print(
            f"{cls:15s} | "
            f"Precision: {metrics['precision']:.3f} | "
            f"Recall: {metrics['recall']:.3f} | "
            f"F1: {metrics['f1']:.3f}"
        )


if __name__ == '__main__':
    train_medical_classifier('./chest_xray_dataset')

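Warning: Clinical Use of AI Models

Machine learning models for medical diagnosis must not be used as standalone diagnostic tools without clinical validation, medical device certification (CE mark/FDA clearance), and qualified medical supervision. The code in this article is intended for educational and research purposes.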

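Drug Discovery with Machine Learning

Drug discovery has traditionally been a 10-15 year process with an average cost of USD 2.6 billion per approved molecule. The failure rate is brutal: only about 10% of the candidates that enter Phase 1 reach approval. AI is fundamentally changing this picture.

Drug Discovery Phases Accelerated by AI

The drug discovery pipeline comprises several phases where ML makes a distinct contribution:

Target identification: Graph Neural Networks (GNNs) on protein-protein interaction networks to identify priority therapeutic targets
Hit discovery: virtual screening of libraries with millions of molecules (Schrödinger Glide, AutoDock Vina, and ML-based models such as DeepDocking)
Lead optimization: QSAR (Quantitative Structure-Activity Relationship) models to predict biological activity and toxicity
Molecular generation: Variational Autoencoders (VAEs), flow-based models, and diffusion models to generate novel molecules
ADMET prediction: predicting Absorption, Distribution, Metabolism, Excretion, and Toxicity without in vitro tests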
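
To make the hit discovery step concrete, the sketch below ranks a tiny library by Tanimoto similarity to a known active, which is the simplest form of ligand-based virtual screening. It is an illustration under stated assumptions: fingerprints are represented as plain sets of on-bit indices, and the compound names, bit values, threshold, and the functions `tanimoto` and `screen_library` are all hypothetical. It does not reproduce how Glide, AutoDock Vina, or DeepDocking work, since those are docking-based tools.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen_library(
    query: Set[int],
    library: Dict[str, Set[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return library entries at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical on-bit indices; real fingerprints would come from a cheminformatics toolkit
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12, 15},
}
for name, score in screen_library(known_active, library):
    print(f"{name}: {score:.3f}")
```

With these toy values, cmpd_C (5 shared bits out of 6) ranks ahead of cmpd_A (4 out of 6), and cmpd_B is filtered out. Similarity search is cheap enough to run over millions of molecules, which is why it often serves as a first-pass filter before more expensive docking.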

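The Insilico Medicine Case: The First Fully AI-Designed Molecule

In 2025, the first drug whose target and molecule were both designed entirely by AI successfully completed a Phase 2a clinical trial: ISM001-055 from Insilico Medicine, an inhibitor of TRAF2- and NCK-interacting kinase (TNIK) for idiopathic pulmonary fibrosis (IPF). The trial showed a dose-dependent improvement in forced vital capacity. This result reset expectations for the entire industry.

AlphaFold 3 and Protein Structures

DeepMind's AlphaFold solved the protein folding problem. AlphaFold 3 (2024-2025) extends these capabilities to predicting protein-DNA, protein-RNA, and protein-ligand complexes with unprecedented accuracy. The public database contains predicted structures for more than 200 million proteins, making information that previously required years of crystallography accessible to every researcher.

Example: Molecular Property Prediction with RDKit and ML

The following code implements a QSAR pipeline that assesses oral bioavailability (Lipinski's Rule of Five) and predicts aqueous solubility using Morgan molecular fingerprints and gradient boosting models. This is the basis of many hit-optimization projects.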

"""
Drug property prediction pipeline with RDKit and scikit-learn
Predicts: Lipinski compliance and aqueous solubility (LogS)
Requires: rdkit, scikit-learn, numpy, pandas
"""
import numpy as np
import pandas as pd
from dataclasses import dataclass, field
from typing import List, Optional, Dict, Any, Tuple
from rdkit import Chem
from rdkit.Chem import Descriptors, AllChem, QED, Crippen
from rdkit.Chem import rdMolDescriptors
from rdkit import RDLogger
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, mean_squared_error
import warnings

# Silence RDKit warnings for the demo
RDLogger.DisableLog('rdApp.*')
warnings.filterwarnings('ignore')


# ========================
# Data structures
# ========================

@dataclass
class MoleculeFeatures:
    """Features computed for a single molecule."""
    smiles: str
    mol_weight: float = 0.0
    logp: float = 0.0
    hbd: int = 0      # H-Bond Donors
    hba: int = 0      # H-Bond Acceptors
    tpsa: float = 0.0 # Topological Polar Surface Area
    rotatable_bonds: int = 0
    aromatic_rings: int = 0
    qed_score: float = 0.0   # Quantitative Estimate of Drug-likeness
    morgan_fp: List[int] = field(default_factory=list)
    # Lipinski's Rule of Five: all four criteria satisfied
    lipinski_compliant: bool = False
    # Training labels
    solubility_class: Optional[str] = None  # 'low', 'medium', 'high'
    log_solubility: Optional[float] = None  # LogS (mol/L)


# ========================
# Feature Extraction
# ========================

class MolecularFeatureExtractor:
    """
    Extracts molecular features for QSAR models.
    Uses RDKit to compute fingerprints and physicochemical descriptors.
    """

    MORGAN_RADIUS = 2
    MORGAN_NBITS = 2048

    def extract(self, smiles: str) -> Optional[MoleculeFeatures]:
        """Extract features from a molecule given as a SMILES string."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return None

        # Morgan fingerprint (ECFP4 equivalent)
        morgan_fp = AllChem.GetMorganFingerprintAsBitVect(
            mol, radius=self.MORGAN_RADIUS, nBits=self.MORGAN_NBITS
        )
        fp_array = list(morgan_fp.ToBitString())

        # Physicochemical descriptors
        mw = Descriptors.MolWt(mol)
        logp = Crippen.MolLogP(mol)
        hbd = rdMolDescriptors.CalcNumHBD(mol)
        hba = rdMolDescriptors.CalcNumHBA(mol)
        tpsa = Descriptors.TPSA(mol)
        rot_bonds = rdMolDescriptors.CalcNumRotatableBonds(mol)
        arom_rings = rdMolDescriptors.CalcNumAromaticRings(mol)
        qed_score = QED.qed(mol)

        # Lipinski's Rule of Five check (strict: no violations allowed)
        lipinski = (
            mw <= 500 and
            logp <= 5 and
            hbd <= 5 and
            hba <= 10
        )

        return MoleculeFeatures(
            smiles=smiles,
            mol_weight=mw,
            logp=logp,
            hbd=hbd,
            hba=hba,
            tpsa=tpsa,
            rotatable_bonds=rot_bonds,
            aromatic_rings=arom_rings,
            qed_score=qed_score,
            morgan_fp=[int(b) for b in fp_array],
            lipinski_compliant=lipinski
        )

    def batch_extract(
        self,
        smiles_list: List[str]
    ) -> pd.DataFrame:
        """Extract features for a batch of molecules; invalid SMILES are skipped."""
        records = []
        for smiles in smiles_list:
            features = self.extract(smiles)
            if features:
                records.append({
                    'smiles': features.smiles,
                    'mol_weight': features.mol_weight,
                    'logp': features.logp,
                    'hbd': features.hbd,
                    'hba': features.hba,
                    'tpsa': features.tpsa,
                    'rotatable_bonds': features.rotatable_bonds,
                    'aromatic_rings': features.aromatic_rings,
                    'qed_score': features.qed_score,
                    'lipinski_compliant': int(features.lipinski_compliant),
                    # Morgan fingerprint as separate columns (only the first 256 bits, to keep the demo small)
                    **{f'fp_{i}': int(features.morgan_fp[i])
                       for i in range(min(256, len(features.morgan_fp)))}
                })
        return pd.DataFrame(records)


# ========================
# Modelli QSAR
# ========================

class SolubilityPredictor:
    """
    Predicts the aqueous solubility (LogS) of drug-like molecules.
    High solubility is essential for oral bioavailability.
    """

    def __init__(self) -> None:
        self.extractor = MolecularFeatureExtractor()
        self.classifier: Optional[Pipeline] = None
        self.regressor: Optional[Pipeline] = None

    def _prepare_features(self, df: pd.DataFrame) -> np.ndarray:
        """Build the feature matrix from the DataFrame."""
        feature_cols = [
            'mol_weight', 'logp', 'hbd', 'hba',
            'tpsa', 'rotatable_bonds', 'aromatic_rings', 'qed_score'
        ]
        fp_cols = [c for c in df.columns if c.startswith('fp_')]
        return df[feature_cols + fp_cols].values.astype(np.float32)

    def train(
        self,
        smiles_list: List[str],
        log_solubility: List[float]
    ) -> Dict[str, Any]:
        """
        Trains two models:
        1. Classifier: low/medium/high solubility
        2. Regressor: continuous LogS value
        """
        df = self.extractor.batch_extract(smiles_list)
        X = self._prepare_features(df)

        # Regression target: LogS, aligned with the molecules that actually parsed
        # (slicing the input list would shift labels whenever a SMILES is rejected)
        logs_by_smiles = dict(zip(smiles_list, log_solubility))
        y_reg = np.array([logs_by_smiles[s] for s in df['smiles']])

        # Classification target: solubility categories
        def categorize(logs: float) -> str:
            if logs < -4: return 'low'
            elif logs < -2: return 'medium'
            else: return 'high'

        y_cls = np.array([categorize(v) for v in y_reg])

        # Pipelines with scaling
        self.regressor = Pipeline([
            ('scaler', StandardScaler()),
            ('model', GradientBoostingRegressor(
                n_estimators=200,
                max_depth=4,
                learning_rate=0.05,
                subsample=0.8,
                random_state=42
            ))
        ])

        self.classifier = Pipeline([
            ('scaler', StandardScaler()),
            ('model', GradientBoostingClassifier(
                n_estimators=200,
                max_depth=4,
                learning_rate=0.05,
                random_state=42
            ))
        ])

        # 5-fold cross-validation
        cv_rmse = cross_val_score(
            self.regressor, X, y_reg,
            cv=5, scoring='neg_root_mean_squared_error'
        )
        cv_acc = cross_val_score(
            self.classifier, X, y_cls,
            cv=StratifiedKFold(5), scoring='accuracy'
        )

        # Final fit on all data
        self.regressor.fit(X, y_reg)
        self.classifier.fit(X, y_cls)

        return {
            'regressor_cv_rmse': float(-cv_rmse.mean()),
            'regressor_cv_std': float(cv_rmse.std()),
            'classifier_cv_accuracy': float(cv_acc.mean()),
            'classifier_cv_std': float(cv_acc.std()),
            'n_molecules': len(df)
        }

    def predict(self, smiles: str) -> Dict[str, Any]:
        """Predict solubility properties for a single molecule."""
        if not self.regressor or not self.classifier:
            raise ValueError("Model not trained. Call train() first.")

        features = self.extractor.extract(smiles)
        if not features:
            raise ValueError(f"Invalid SMILES: {smiles}")

        df = self.extractor.batch_extract([smiles])
        X = self._prepare_features(df)

        log_s = float(self.regressor.predict(X)[0])
        solubility_class = self.classifier.predict(X)[0]
        class_proba = dict(zip(
            self.classifier.classes_,
            self.classifier.predict_proba(X)[0]
        ))

        return {
            'smiles': smiles,
            'mol_weight': features.mol_weight,
            'logp': features.logp,
            'hbd': features.hbd,
            'hba': features.hba,
            'tpsa': features.tpsa,
            'qed_score': round(features.qed_score, 3),
            'lipinski_compliant': features.lipinski_compliant,
            'predicted_log_solubility': round(log_s, 3),
            'solubility_class': solubility_class,
            'class_probabilities': {k: round(v, 3) for k, v in class_proba.items()},
            'drug_likeness': 'Good' if features.lipinski_compliant and features.qed_score > 0.5 else 'Poor'
        }


# ========================
# Usage example
# ========================

def demo_drug_prediction() -> None:
    """Demo with well-known drug molecules."""

    # Toy dataset: SMILES + approximate LogS values from the literature
    training_data = [
        # Aspirin
        ('CC(=O)Oc1ccccc1C(=O)O', -1.69),
        # Ibuprofen
        ('CC(C)Cc1ccc(cc1)C(C)C(=O)O', -3.97),
        # Paracetamol
        ('CC(=O)Nc1ccc(O)cc1', -1.29),
        # Atorvastatin (SMILES corrected: five-membered pyrrole core with 4-fluorophenyl)
        ('CC(C)c1c(C(=O)Nc2ccccc2)c(-c2ccccc2)c(-c2ccc(F)cc2)n1CCC(O)CC(O)CC(=O)O', -5.21),
        # Metformin
        ('CN(C)C(=N)NC(=N)N', 0.81),
        # Amoxicillin
        ('CC1(C)SC2C(NC(=O)C(N)c3ccc(O)cc3)C(=O)N2C1C(=O)O', -1.84),
        # Caffeine
        ('Cn1cnc2c1c(=O)n(c(=O)n2C)C', -1.36),
        # Warfarin (low solubility)
        ('CC(=O)CC(c1ccccc1)c1c(O)c2ccccc2oc1=O', -4.66),
        # Sildenafil (low solubility; SMILES corrected to the pyrazolopyrimidinone core)
        ('CCCC1=NN(C)C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC', -5.01),
        # Carbamazepine (medium solubility; SMILES corrected to the seven-membered azepine ring)
        ('NC(=O)N1c2ccccc2C=Cc2ccccc21', -2.73),
    ]

    smiles_list = [s for s, _ in training_data]
    log_s_list = [v for _, v in training_data]

    # Training
    predictor = SolubilityPredictor()
    metrics = predictor.train(smiles_list, log_s_list)

    print("=== Training Metrics (5-fold CV) ===")
    print(f"Regressor RMSE:      {metrics['regressor_cv_rmse']:.3f} +/- {metrics['regressor_cv_std']:.3f}")
    print(f"Classifier Acc:      {metrics['classifier_cv_accuracy']:.3f} +/- {metrics['classifier_cv_std']:.3f}")
    print(f"Training molecules:  {metrics['n_molecules']}")

    # Predictions on new molecules
    test_molecules = [
        ('O=C(O)c1ccccc1O', 'Salicylic acid'),          # Aspirin without the acetyl group
        ('c1ccc(cc1)CCN', 'Phenethylamine'),
        ('CC(=O)c1ccc(O)cc1', '4-Hydroxyacetophenone'),
    ]

    print("\n=== Predictions on New Molecules ===")
    for smiles, name in test_molecules:
        result = predictor.predict(smiles)
        print(f"\n{name} ({smiles})")
        print(f"  Mol Weight:     {result['mol_weight']:.1f} Da")
        print(f"  LogP:           {result['logp']:.2f}")
        print(f"  QED Score:      {result['qed_score']}")
        print(f"  Lipinski OK:    {result['lipinski_compliant']}")
        print(f"  Predicted LogS: {result['predicted_log_solubility']}")
        print(f"  Class:          {result['solubility_class']}")
        print(f"  Drug-likeness:  {result['drug_likeness']}")


if __name__ == '__main__':
    demo_drug_prediction()

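Clinical NLP: From Records to Intelligence

Electronic health records (EHR/EMR) hold an enormous amount of information as unstructured text: medical histories, radiology reports, discharge letters, prescriptions. Using clinical NLP to extract structured information from these texts is one of the highest-ROI use cases in healthcare AI.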

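Clinical Named Entity Recognition (NER)

A clinical NER model identifies and classifies entities such as:

Medical problems: diagnoses, symptoms, chronic conditions
Medications: name, dosage, frequency, route of administration
Diagnostic tests: blood tests, imaging, biopsies
Procedures: surgeries, therapies
Anatomy: organs and anatomical structures involved
Clinical values: blood pressure, blood glucose, temperature, oxygen saturation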

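Automatic ICD-10 Coding

ICD (International Classification of Diseases) coding is a costly, error-prone manual process: in the United States, an estimated 25-40% of manually assigned codes contain errors. AI systems built on fine-tuned models such as BioBERT, ClinicalBERT, and RoBERTa reach over 90% accuracy on single-label ICD-10 tasks and over 75% on multi-label coding. John Snow Labs Healthcare NLP offers more than 2,500 pre-trained pipelines, including resolvers for SNOMED CT, RxNorm, and ICD-10.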

Example: A Rule-Based Clinical NER Baseline

"""
Rule-based clinical Named Entity Recognition (baseline).
Extracts medical entities from Italian clinical notes.
Requires only the Python standard library; in production, pair it with
spaCy/scispaCy or a fine-tuned transformer such as BioBERT.
"""
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
import json
import re

# ========================
# Data structures
# ========================

@dataclass
class ClinicalEntity:
    """Clinical entity extracted from text."""
    text: str
    label: str          # PROBLEM, MEDICATION, TEST, PROCEDURE, ANATOMY, VALUE
    start: int
    end: int
    confidence: float = 0.0
    icd10_code: Optional[str] = None
    rxnorm_code: Optional[str] = None
    normalized_value: Optional[str] = None


@dataclass
class ClinicalDocument:
    """Clinical document with its extracted entities."""
    text: str
    patient_id: str
    document_type: str  # 'discharge_summary', 'radiology_report', 'progress_note'
    entities: List[ClinicalEntity] = field(default_factory=list)
    icd_codes: List[str] = field(default_factory=list)
    medications: List[Dict[str, str]] = field(default_factory=list)


# ========================
# Rule-based NER for Italian clinical text
# ========================

class ItalianClinicalNERRules:
    """
    Rule-based NER for Italian clinical text.
    Use it as a baseline or for highly structured entity types
    (numeric values, drugs with dosage).
    In production: combine with an ML model fine-tuned on Italian corpora.
    """

    # Patterns for common drugs with dosage
    MEDICATION_PATTERNS = [
        r'\b([A-Z][a-z]+(?:ina|olo|ide|ato|ico)?)\s+(\d+(?:\.\d+)?)\s*(mg|mcg|g|UI|mEq)\b',
        r'\b([A-Z][a-z]+)\s+cp\s+(\d+\s*mg)\b',
        r'\b([A-Z][a-z]+)\s+(\d+)\s*(mg|g)\s+(?:x|per)\s+(\d+)',
    ]

    # Patterns for clinical values
    VALUE_PATTERNS = [
        (r'\bPA\s*[:\s]?\s*(\d+)\s*/\s*(\d+)\b', 'blood_pressure'),
        (r'\bFC\s*[:\s]?\s*(\d+)\s*bpm\b', 'heart_rate'),
        (r'\bGlicemia\s*[:\s]?\s*(\d+(?:\.\d+)?)\s*mg/dL\b', 'glycemia'),
        (r'\bSpO2\s*[:\s]?\s*(\d+(?:\.\d+)?)\s*%\b', 'oxygen_saturation'),
        # Match the full unit ("°C" or "C") instead of a single character
        (r'\bTemperatura\s*[:\s]?\s*(\d+(?:\.\d+)?)\s*°?\s*C\b', 'temperature'),
        (r'\bHbA1c\s*[:\s]?\s*(\d+(?:\.\d+)?)\s*%', 'hba1c'),
        (r'\bCreatinina\s*[:\s]?\s*(\d+(?:\.\d+)?)\s*(?:mg/dL|mmol/L)', 'creatinine'),
    ]

    # Common diagnoses for keyword matching (subset)
    PROBLEM_KEYWORDS = [
        'diabete mellito', 'ipertensione arteriosa', 'scompenso cardiaco',
        'fibrillazione atriale', 'infarto miocardico', 'angina pectoris',
        'broncopneumopatia', 'BPCO', 'insufficienza renale',
        'neoplasia', 'carcinoma', 'adenocarcinoma', 'linfoma',
        'polmonite', 'sepsi', 'ictus ischemico', 'emorragia cerebrale',
        'frattura', 'osteoporosi', 'artrite reumatoide', 'lupus eritematoso',
    ]

    # Diagnosis -> ICD-10 map (illustrative subset)
    DIAGNOSIS_TO_ICD10 = {
        'diabete mellito': 'E11',
        'ipertensione arteriosa': 'I10',
        'scompenso cardiaco': 'I50',
        'fibrillazione atriale': 'I48',
        'infarto miocardico': 'I21',
        'broncopneumopatia': 'J44',
        'BPCO': 'J44',
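        'insufficienza renale': 'N19',  # unspecified kidney failure (N17 is acute only)
        'polmonite': 'J18',
        'sepsi': 'A41',
        'ictus ischemico': 'I63',
        'frattura': 'T14.2',  # fracture, unspecified region (M84 covers non-traumatic bone disorders)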
    }

    def extract_entities(self, text: str) -> List[ClinicalEntity]:
        """Extracts entities from Italian clinical text."""
        entities: List[ClinicalEntity] = []

        # Medical problems (case-insensitive keyword match)
        for keyword in self.PROBLEM_KEYWORDS:
            pattern = re.compile(re.escape(keyword), re.IGNORECASE)
            for match in pattern.finditer(text):
                # Look up the keyword as written: keys such as 'BPCO' are uppercase
                icd = self.DIAGNOSIS_TO_ICD10.get(keyword)
                entities.append(ClinicalEntity(
                    text=match.group(),
                    label='PROBLEM',
                    start=match.start(),
                    end=match.end(),
                    confidence=0.85,
                    icd10_code=icd
                ))

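        # Drugs with dosage. No IGNORECASE here: the patterns rely on a capitalized
        # drug name, otherwise any word before a dose (e.g. "poi 25 mg") would match.
        for pattern_str in self.MEDICATION_PATTERNS:
            for match in re.finditer(pattern_str, text):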
                entities.append(ClinicalEntity(
                    text=match.group(),
                    label='MEDICATION',
                    start=match.start(),
                    end=match.end(),
                    confidence=0.90
                ))

        # Clinical values
        for pattern_str, value_type in self.VALUE_PATTERNS:
            for match in re.finditer(pattern_str, text, re.IGNORECASE):
                entities.append(ClinicalEntity(
                    text=match.group(),
                    label='VALUE',
                    start=match.start(),
                    end=match.end(),
                    confidence=0.95,
                    normalized_value=value_type
                ))

        # Sort by position, then drop overlapping entities
        entities.sort(key=lambda e: e.start)
        return self._deduplicate(entities)

    def _deduplicate(
        self,
        entities: List[ClinicalEntity]
    ) -> List[ClinicalEntity]:
        """Removes overlapping entities, preferring the higher-confidence one."""
        if not entities:
            return []
        deduplicated = [entities[0]]
        for current in entities[1:]:
            last = deduplicated[-1]
            if current.start >= last.end:
                deduplicated.append(current)
            elif current.confidence > last.confidence:
                deduplicated[-1] = current
        return deduplicated


# ========================
# Document Processor
# ========================

class ClinicalDocumentProcessor:
    """
    Processes clinical documents: NER, ICD coding, structured extraction.
    In production: use ML models (ClinicalBERT, ItalianMedBERT) for NER.
    This example uses the rule-based extractor as a baseline.
    """

    def __init__(self) -> None:
        self.ner = ItalianClinicalNERRules()

    def process(
        self,
        text: str,
        patient_id: str,
        doc_type: str = 'discharge_summary'
    ) -> ClinicalDocument:
        """Processes a complete clinical document."""
        doc = ClinicalDocument(
            text=text,
            patient_id=patient_id,
            document_type=doc_type
        )

        # NER
        doc.entities = self.ner.extract_entities(text)

        # Collect ICD codes from PROBLEM entities (sorted for stable output)
        doc.icd_codes = sorted({
            e.icd10_code for e in doc.entities
            if e.label == 'PROBLEM' and e.icd10_code
        })

        # Collect medications
        doc.medications = [
            {'text': e.text, 'confidence': str(e.confidence)}
            for e in doc.entities if e.label == 'MEDICATION'
        ]

        return doc

    def to_fhir_condition(
        self,
        doc: ClinicalDocument
    ) -> List[Dict[str, Any]]:
        """
        Converts extracted diagnoses into FHIR Condition resources.
        Simplified FHIR R4 format.
        """
        conditions = []
        for entity in doc.entities:
            if entity.label == 'PROBLEM':
                condition: Dict[str, Any] = {
                    'resourceType': 'Condition',
                    'subject': {
                        'reference': f'Patient/{doc.patient_id}'
                    },
                    'code': {
                        'text': entity.text
                    }
                }
                if entity.icd10_code:
                    condition['code']['coding'] = [{
                        'system': 'http://hl7.org/fhir/sid/icd-10',
                        'code': entity.icd10_code,
                        'display': entity.text
                    }]
                conditions.append(condition)
        return conditions


# ========================
# Demo
# ========================

def demo_clinical_nlp() -> None:
    """Demo on an Italian hospital discharge note."""

    nota_dimissione = """
    NOTA DI DIMISSIONE - Reparto di Medicina Interna

    Paziente: M.R., 72 anni, ricoverato il 15/01/2025
    Diagnosi principale: Scompenso cardiaco in paziente con Fibrillazione atriale cronica
    Diagnosi secondarie: Diabete mellito tipo 2, Ipertensione arteriosa, BPCO moderata

    Anamnesi: Il paziente è noto per scompenso cardiaco con FE ridotta (30%) e fibrillazione
    atriale permanente in terapia anticoagulante. Si presenta per dispnea da sforzo ingravescente
    e ortopnea da 3 giorni.

    Esame obiettivo: PA 150/90, FC 98 bpm, SpO2 94% in aria ambiente, Temperatura 36.8°C.
    Edemi declivi bilaterali. Crepitii bibasali.

    Esami: Glicemia 187 mg/dL, HbA1c 8.2%, Creatinina 1.8 mg/dL, NT-proBNP 3450 pg/mL.

    Terapia impostata:
    - Furosemide 40 mg x 2/die ev per 3 giorni poi 25 mg os
    - Bisoprololo 2.5 mg 1 cp/die
    - Ramipril 5 mg 1 cp/die
    - Apixaban 5 mg x 2/die
    - Metformina 500 mg x 3/die
    """

    processor = ClinicalDocumentProcessor()
    doc = processor.process(nota_dimissione, patient_id='PAZ-2025-001')

    print("=== Extracted Entities ===")
    for entity in doc.entities:
        icd_str = f" [ICD-10: {entity.icd10_code}]" if entity.icd10_code else ""
        val_str = f" [Type: {entity.normalized_value}]" if entity.normalized_value else ""
        print(
            f"  [{entity.label:12s}] {entity.text[:50]:50s}"
            f" conf={entity.confidence:.2f}{icd_str}{val_str}"
        )

    print("\n=== Extracted ICD-10 Codes ===")
    for code in doc.icd_codes:
        print(f"  {code}")

    print("\n=== Identified Medications ===")
    for med in doc.medications:
        print(f"  {med['text']} (conf: {med['confidence']})")

    print("\n=== FHIR Condition Resources ===")
    fhir_conditions = processor.to_fhir_condition(doc)
    print(json.dumps(fhir_conditions[:2], indent=2, ensure_ascii=False))


if __name__ == '__main__':
    demo_clinical_nlp()

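Federated Learning for Healthcare Data Privacy

One of the fundamental problems of AI in healthcare is the tension between the need for large datasets to train accurate models and the impossibility of centralizing sensitive patient data. Federated learning resolves this elegantly: the model is trained locally at each hospital, and only gradients or model weights (not the data) are shared with a central server.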

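How Federated Learning Works in Healthcare

A typical federated learning process in a hospital setting follows these steps:

The central server distributes the initial model (weights) to all participating nodes.
Each hospital trains the model locally on its own data for N epochs.
Each node sends only the weight deltas, not the raw data, to the server.
The server aggregates the weights using the FedAvg algorithm (or a variant such as FedProx).
The aggregated model is redistributed and the process repeats.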
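The aggregation step can be sketched in a few lines of plain Python. This is a minimal illustration, not the API of any framework named in this article: `local_update` is a hypothetical stand-in for real local training, weights are flat lists rather than tensors, clients send full weights instead of deltas for simplicity, and the gradients and sample counts are made up.

```python
from typing import Dict, List, Tuple

# Parameter name -> flat list of values (a stand-in for real tensors).
Weights = Dict[str, List[float]]


def local_update(global_w: Weights, gradient: Weights, lr: float = 0.1) -> Weights:
    """Hypothetical local training: a single gradient step on hospital data."""
    return {
        name: [w - lr * g for w, g in zip(values, gradient[name])]
        for name, values in global_w.items()
    }


def fed_avg(updates: List[Tuple[Weights, int]]) -> Weights:
    """FedAvg: average client weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("clients reported no training samples")
    reference, _ = updates[0]
    return {
        name: [
            sum(w[name][i] * n for w, n in updates) / total
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }


# One round with two hospitals; gradients and sample counts are made up.
global_w: Weights = {"coef": [0.0, 0.0], "bias": [0.0]}
hospital_a = local_update(global_w, {"coef": [1.0, -1.0], "bias": [0.5]})
hospital_b = local_update(global_w, {"coef": [-1.0, 1.0], "bias": [0.5]})
global_w = fed_avg([(hospital_a, 300), (hospital_b, 100)])
print(global_w)
```

Because hospital A contributes three times as many samples, the aggregated coefficients land closer to its local result. Only the weight dictionaries and sample counts cross the hospital boundary.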

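Real-World Results

A framework combining FHIR R4 with federated learning reported the following in a 2025 study:

FL model accuracy comparable to or better than centralized training, measured by classification AUC
38% lower latency than traditional centralized systems
95% success rate in data retrieval across multiple hospital systems
Full compliance with FHIR R4 and GDPR without sharing individual-level data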

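Available Frameworks

PySyft (OpenMined): Python framework for privacy-preserving ML, with support for FL and secure multi-party computation
NVIDIA FLARE: Federated Learning Application Runtime Environment, designed for healthcare enterprises
Flower (flwr): framework-agnostic, supports PyTorch and TensorFlow, easy to use
TensorFlow Federated (TFF): Google framework with built-in differential privacy support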

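Interoperability: FHIR, HL7, and EHR Integration

Hospital information systems in Italy and Europe are fragmented. CPOE (Computerized Physician Order Entry), LIS (Laboratory Information System), RIS (Radiology Information System), and PACS (Picture Archiving and Communication System) often speak different languages. Interoperability is a necessary prerequisite for any AI project in healthcare.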

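FHIR R4: The Standard for Healthcare AI

HL7 FHIR (Fast Healthcare Interoperability Resources) R4 is the de facto standard for modern healthcare interoperability. Each clinical entity (Patient, Condition, Medication, Observation, Procedure) is represented as a resource accessible through a JSON REST API. The main reasons FHIR is central to healthcare AI are:

Standard RESTful API: eases integration with ML/AI systems
JSON/XML format: structured data that Python pipelines can process directly
Standardized terminologies: SNOMED CT, LOINC, RxNorm, ICD-10
National profiles: in Italy, HL7 Italia publishes FHIR profiles for FSE 2.0
SMART on FHIR: OAuth2 authorization for third-party clinical apps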
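To show what "processed directly in a Python pipeline" can look like, here is a minimal sketch that flattens a FHIR R4 Observation into a feature row. The sample resource is hand-written for illustration (the patient ID reuses the one from the NER demo, and the LOINC code is an example), and `observation_to_row` is a hypothetical helper, not part of any FHIR library.

```python
import json
from typing import Any, Dict, Optional

# Hand-written sample resource; a real one would come from a FHIR server.
observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "2339-0",
                       "display": "Glucose [Mass/volume] in Blood"}]},
  "subject": {"reference": "Patient/PAZ-2025-001"},
  "effectiveDateTime": "2025-01-15T08:30:00+01:00",
  "valueQuantity": {"value": 187, "unit": "mg/dL",
                    "system": "http://unitsofmeasure.org", "code": "mg/dL"}
}
"""


def observation_to_row(resource: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten a numeric Observation into a row; return None if it does not apply."""
    if resource.get("resourceType") != "Observation":
        return None
    quantity = resource.get("valueQuantity")
    if quantity is None or "value" not in quantity:
        return None  # non-numeric observations are skipped in this sketch
    codings = resource.get("code", {}).get("coding", [])
    loinc = next((c for c in codings if c.get("system") == "http://loinc.org"), None)
    reference = resource.get("subject", {}).get("reference", "")
    return {
        "patient_id": reference.split("/")[-1],
        "loinc_code": loinc["code"] if loinc else None,
        "value": float(quantity["value"]),
        "unit": quantity.get("unit"),
        "timestamp": resource.get("effectiveDateTime"),
    }


row = observation_to_row(json.loads(observation_json))
print(row)
```

Keying features on the LOINC code rather than the display text is the design choice that makes rows comparable across hospitals that label the same test differently.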

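FHIR Technology Stack for Healthcare AI

Layer	Technology	Function
FHIR server	HAPI FHIR, Azure Health Data Services, Google Cloud Healthcare API	FHIR R4 storage and API
ETL/ingestion	Apache NiFi, HL7 MLLP listener, dbt	HL7 v2 → FHIR R4 transformation
Data lake	Delta Lake / Apache Iceberg on S3 or ADLS	Analytical storage for ML training
Feature store	Feast, Tecton, Databricks Feature Store	Clinical features for ML models
ML training	PyTorch, TensorFlow, scikit-learn on Databricks/SageMaker	Training classification and prediction models
Model serving	MLflow + FastAPI, Triton Inference Server	Real-time predictions served to the EHR
Privacy	NVIDIA FLARE, PySyft, differential privacy	FL and privacy-preserving training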

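Patient Flow Optimization and Operational AI

Beyond diagnostics and research, AI in healthcare has a major operational impact. Optimized patient flow reduces waiting times, prevents emergency department overcrowding, improves bed occupancy, and makes for a better patient experience.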

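Readmission Risk Prediction

30-day readmission is one of the most closely monitored metrics in healthcare (and, in some countries, it carries financial penalties). ML models for readmission risk use structured data (diagnoses, procedures, medications, lab values, demographics) with gradient boosting, or LSTMs for time series, and reach AUC 0.75-0.85. Proactive interventions for high-risk patients can reduce readmissions by 15-20%.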
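To make the AUC figures concrete: AUC is the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted one. The sketch below computes it from that definition in plain Python; the labels and scores are invented for illustration and do not come from any model in this article.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# 1 = readmitted within 30 days; scores are made-up model outputs.
readmitted = [1, 0, 1, 0, 0, 1, 0, 0]
risk_score = [0.81, 0.30, 0.42, 0.55, 0.10, 0.67, 0.20, 0.45]
auc = roc_auc(readmitted, risk_score)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic, which is fine for an illustration; on real cohorts a rank-based implementation from an ML library is the usual choice.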

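ED Crowding Prediction

Emergency department overcrowding is a critical problem for both patient safety and hospital efficiency. Crowding prediction models use time series of arrivals, weather data, local event calendars, and flu trends to forecast attendance peaks 24-72 hours in advance, so that staffing can be planned proactively.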
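Any crowding model should beat a simple seasonal baseline before it is trusted. The sketch below is one such baseline, assuming hourly arrival counts: it forecasts each future hour as the mean of the same hour of the week over recent weeks. It is a reference point rather than the kind of model described above, and it ignores weather, events, and flu trends; the function name and parameters are illustrative.

```python
from statistics import mean
from typing import List


def seasonal_naive_forecast(
    hourly_arrivals: List[float],
    horizon: int = 24,
    season: int = 168,   # hours in a week
    n_seasons: int = 4,  # how many past weeks to average
) -> List[float]:
    """Forecast each future hour as the mean of the same hour-of-week in past weeks."""
    if len(hourly_arrivals) < season:
        raise ValueError("need at least one full week of hourly history")
    if not 0 < horizon <= season:
        raise ValueError("horizon must be between 1 and one season")
    n = len(hourly_arrivals)
    forecast = []
    for h in range(horizon):
        target = n + h  # index of the future hour being predicted
        same_hour = [
            hourly_arrivals[target - k * season]
            for k in range(1, n_seasons + 1)
            if target - k * season >= 0
        ]
        forecast.append(mean(same_hour))
    return forecast


# Two weeks of synthetic history with a purely weekly pattern.
history = [float(i % 168) for i in range(336)]
next_day = seasonal_naive_forecast(history, horizon=24)
print(next_day[:3])
```

Comparing a richer model's error against this baseline on held-out weeks shows how much the extra signals actually contribute.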

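Sepsis Early Warning

Sepsis is a leading cause of death in intensive care units. AI early warning systems (for example the Epic Sepsis Model, deployed in many US hospitals) continuously monitor vital signs and lab values to identify at-risk patients 4-6 hours earlier than traditional clinical criteria (qSOFA, SIRS). Multicenter studies report an absolute reduction in sepsis mortality of 3-5% when clinicians act on the AI alerts.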
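For reference, the qSOFA criteria that ML early-warning models are compared against fit in a few lines. The sketch below encodes the standard three items (respiratory rate of at least 22/min, systolic blood pressure of 100 mmHg or less, altered mentation taken here as GCS below 15), with a score of 2 or more flagging the patient. The `Vitals` class and function names are illustrative, and this is the rule-based comparator, not the Epic model.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    respiratory_rate: float  # breaths per minute
    systolic_bp: float       # mmHg
    gcs: int                 # Glasgow Coma Scale, 3-15


def qsofa_score(v: Vitals) -> int:
    """Count how many of the three qSOFA criteria are met (0-3)."""
    return (
        int(v.respiratory_rate >= 22)
        + int(v.systolic_bp <= 100)
        + int(v.gcs < 15)
    )


def needs_sepsis_review(v: Vitals) -> bool:
    """A qSOFA score of 2 or more is the usual trigger for further assessment."""
    return qsofa_score(v) >= 2


patient = Vitals(respiratory_rate=24, systolic_bp=95, gcs=15)
print(qsofa_score(patient), needs_sepsis_review(patient))
```

A learned model earns its place only if it flags the same patients earlier, or with fewer false alarms, than this three-line rule on the same data.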

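Regulation: EU MDR, AI Act, and CE Marking

Healthcare AI is one of the most heavily regulated fields. Before an AI system with clinical impact can be placed on the EU market, it has to navigate a dual regulatory framework: EU MDR/IVDR and the AI Act.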

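EU Medical Device Regulation (MDR 2017/745)

The MDR classifies AI software as a medical device according to risk:

Class I: low risk (e.g., administrative support software)
Class IIa: medium-low risk (e.g., medication reminders)
Class IIb: medium-high risk (e.g., diagnostic support, treatment recommendations)
Class III: high risk (e.g., autonomous diagnostic decisions for life-threatening conditions)

Class IIa and above require assessment by a Notified Body. CE marking is not an automatic stamp but a process that includes a clinical evaluation report, a post-market surveillance plan, and a quality management system (ISO 13485).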

EU AI Act: Timeline for Healthcare AI

The EU AI Act classifies AI systems in healthcare as high-risk (Annex III). The implementation timeline, including for AI medical devices, is:

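August 2024: the AI Act enters into force
February 2025: prohibitions on unacceptable-risk AI apply
August 2025: obligations for general-purpose AI (GPAI) models apply
August 2026: most remaining provisions apply, including obligations for Annex III high-risk systems
August 2027: high-risk obligations apply to AI embedded in regulated products (including medical devices)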

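AI Act: Requirements for High-Risk AI in Healthcare

High-risk AI systems in healthcare must provide:

A documented, continuous risk management system
Data governance: quality, representativeness, and absence of bias in training data
Complete, up-to-date technical documentation
Record-keeping (logging) for audit and traceability
Transparency toward users: disclosure that the system is AI
Human oversight: mechanisms for humans to override AI decisions
Accuracy, robustness, and cybersecurity
Registration in the EU database for high-risk AI systems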

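US FDA AI/ML Action Plan

By the end of 2025, the FDA had authorized more than 1,240 AI/ML devices. The FDA regulatory framework distinguishes between:

Locked algorithms: fixed-performance models that need a new 510(k)/PMA for each update
Adaptive algorithms: continuously updated models that need a PCCP (Predetermined Change Control Plan)

In 2025, the average time to FDA clearance for an AI device was 142 days, and a quarter of devices were cleared within 90 days thanks to new streamlined pathways and AI-specific pre-submission meetings.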

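Ethics and Bias in Healthcare AI

Bias in healthcare AI is not a theoretical problem: it is documented, measurable, and harmful to patients. Real-world examples include:

Racial bias in pulse oximetry: studies from 2020-2022 showed that pulse oximeters (and ML models trained on their data) overestimate oxygen saturation in patients with darker skin, which can delay COVID-19 treatment.
Gender bias in cardiac models: training datasets for heart attack diagnosis have historically under-represented women, whose symptoms differ from men's, leading to misdiagnoses.
Geographic bias: models trained on data from populations of white European ancestry generalize poorly to Asian or African populations, especially for diseases with a strong genetic component.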

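Bias Mitigation Strategies

Developing equitable AI systems in healthcare requires:

Dataset audits: systematic analysis of demographic representativeness (age, sex, ethnicity, geographic origin)
Stratified evaluation: separate performance metrics for each demographic subgroup
Fairness metrics: equal opportunity, demographic parity, calibration across groups
Federated learning: training on diverse populations without centralizing data
Explainability (XAI): SHAP values, attention maps, and LIME for transparent decisions
Prospective clinical validation: testing on populations not seen in training before deployment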
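Stratified evaluation and two of the fairness metrics listed above can be sketched as follows. Demographic parity compares the rate of positive predictions across groups, and equal opportunity compares the true positive rate. The group labels, predictions, and helper names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def group_rates(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-group positive prediction rate and true positive rate."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "true_pos": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        c["pos"] += y
        c["true_pos"] += int(y == 1 and p == 1)
    return {
        g: {
            "positive_rate": c["pred_pos"] / c["n"],
            # TPR is undefined for a group with no positive cases
            "tpr": c["true_pos"] / c["pos"] if c["pos"] else float("nan"),
        }
        for g, c in counts.items()
    }


def max_gap(rates: Dict[str, Dict[str, float]], metric: str) -> float:
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
print("demographic parity gap:", max_gap(rates, "positive_rate"))
print("equal opportunity gap: ", max_gap(rates, "tpr"))
```

In this toy data the model catches every true case in group A but only half of those in group B, which is exactly the kind of gap an aggregate accuracy figure hides.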

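Case Studies: AI in the Italian Healthcare System

The healthcare AI landscape in Italy is evolving rapidly, driven by PNRR investments. Some concrete examples:

Fondazione Policlinico Universitario Agostino Gemelli (Rome)

Gemelli launched several AI projects in 2024-2025:

An AI system (CADe) for colorectal cancer screening during colonoscopy, with a 17% reduction in the polyp miss rate
A predictive model for readmission risk after cardiac surgery (AUC 0.79)
NLP for automatic structuring of discharge letters for FSE 2.0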

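IRCCS Istituto Europeo di Oncologia (IEO, Milan)

In collaboration with academic partners, IEO has developed models for:

Mammography image analysis for breast cancer screening
AI grading (Gleason score) of digital pathology images for prostate cancer
Predicting chemotherapy response from radiological images (radiomics)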

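PNRR and the Electronic Health Record 2.0

The PNRR has allocated EUR 1.67 billion to the digitalization of Italian healthcare. The Fascicolo Sanitario Elettronico 2.0 (FSE 2.0, the national electronic health record), based on the FHIR R4 standard, is the data infrastructure that will enable future AI projects. By 2025, 80% of Italian health documents should be digitally available in the FSE, creating a large-scale longitudinal dataset for research and AI (subject to an appropriate governance and consent framework).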

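Best Practices for Healthcare AI Projects

Checklist: Healthcare AI Project

Governance and compliance: GDPR, EU MDR (where applicable), and AI Act risk assessment completed
Bias audit: dataset analyzed for demographic representativeness
Explainability: SHAP or attention maps implemented for debugging and clinical trust
Clinical validation: prospective validation on independent data, not just a train/test split
Human-in-the-loop: the physician always has the final word, with the AI acting as a "second reader"
Monitoring: drift detection on input data and model performance
FHIR integration: model outputs in FHIR format for integration with the EHR
Technical documentation: model card, datasheet, intended use, and known limitations
Incident management: a documented process for handling model failures
Continuous learning: a plan for updating the model over time without regressions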

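Anti-Patterns to Avoid

Training-serving skew: training on historical data and deploying on real-time data with a different distribution. In healthcare, populations change (new pathogens, demographic shifts), so continuous monitoring is required.
Overfitting to retrospective data: retrospective datasets often carry label bias (undiagnosed cases never appear in the records). Use prospective cohorts where available.
Ignoring workflow integration: an accurate model that disrupts the clinical workflow will not be adopted. Integrate into the existing EHR with minimal friction.
No uncertainty quantification: a model must communicate when it is unsure. A prediction without a confidence interval is dangerous in healthcare.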
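One common way to watch for the training-serving skew described in the first anti-pattern is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its live distribution. The sketch below is a minimal plain-Python version; the equal-width binning, the `eps` floor, and the 0.25 alert threshold are conventional rules of thumb rather than values from this article, and the data is synthetic.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index between a training sample and a live sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("training feature is constant; PSI is undefined")
    width = (hi - lo) / n_bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        eps = 1e-6  # floor to avoid log(0) on empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic feature values: the live population has shifted upward.
train_values = [i / 100 for i in range(100)]
live_values = [0.5 + i / 200 for i in range(100)]

drift = psi(train_values, live_values)
print(f"PSI = {drift:.2f}", "-> investigate" if drift > 0.25 else "-> stable")
```

Running this check per feature on a schedule gives an early signal that the model is seeing a population it was not trained on, before performance metrics (which need outcome labels) can confirm it.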

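Conclusions and Next Steps

AI in healthcare is entering a phase of maturity: no longer academic experiments, but real clinical deployments with measurable impact. The numbers speak clearly: more than 1,240 FDA-authorized AI devices, more than 75 AI-derived molecules in clinical trials, and an Italian digital health market of USD 7.38 billion growing at a 13.6% CAGR.

The main opportunities in Italy for 2025-2027 are:

FSE 2.0 as an AI-ready data infrastructure at national scale
Clinical NLP for automatic structuring of medical documents and ICD coding
AI for cancer screening (mammography, colonoscopy), where the shortage of radiologists is a reality
Federated learning for GDPR-compliant collaboration between hospitals
Patient flow optimization and readmission prediction to reduce hospital costs

Regulation (EU MDR + AI Act) should be seen not as an obstacle but as a trust framework: it is the path to building certifiable AI systems and to their adoption in clinical practice at scale. Companies and hospital IT teams that invest in compliance by design now will have a significant competitive advantage in 2027, when the AI Act's high-risk obligations are fully in force.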

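Continue the Series

Previous: AI in Retail: Demand Forecasting and Recommendation Engines - how AI optimizes demand, pricing, and recommendations
Next: AI in Logistics: Route Optimization and Warehouse Automation - VRP, last-mile delivery, and automated picking
Related (MLOps): MLOps for Business: AI Models in Production with MLflow - how to bring healthcare models to production
Related (AI Engineering): LLMs for Business: Enterprise RAG, Fine-Tuning, and Guardrails - LLMs for clinical decision support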