Hello!

Federico Calò

Software Developer | Tech Communicator

I build modern web applications and custom digital tools that help businesses grow through technological innovation. My passion is combining computer science and economics to create real value.


About

My passion for computer science was born in the classrooms of the Istituto Tecnico Commerciale in Maglie, where I discovered the power of programming and the appeal of building digital solutions. I soon realized that computer science was not just code, but an extraordinary tool for turning ideas into reality.

During my high-school studies in Business Information Systems (Sistemi Informativi Aziendali), I began to combine computer science and economics, learning how technology can be the engine of growth for any business. This vision followed me to the Università degli Studi di Bari, where I earned my degree in Computer Science, deepening my technical skills and my passion for software development.

Today I put this experience at the service of businesses, professionals and startups, building tailored digital solutions that automate processes, optimize resources and open up new business opportunities. Because real innovation begins when technology meets people's real needs.

Skills

Data Analysis & Forecasting Models

I turn data into strategic insights through in-depth analysis and predictive models that support informed decisions.

Process Automation

I build custom tools that automate repetitive operations and free up time for higher-value activities.

Custom Systems

I develop tailored software systems, from cross-platform integrations to custom dashboards.

const federico = {
  nome: "Federico Calò",
  ruolo: "Software Developer",
  città: "Bari, Italy",
  missione: "Helping people through computer science",
  passioni: [
    "Clean Code",
    "Innovation",
    "Continuous Growth"
  ]
};

Mission

I firmly believe that computer science is the most powerful tool for turning ideas into reality and improving people's lives.

🚀

Democratizing Technology

My mission is to make computer science accessible to everyone: from small local businesses to innovative startups and professionals who want to digitize their work. Every organization deserves to harness the potential of digital technology.

💡

Integrating IT and Business

It is not just about writing code: it is about understanding how technology can generate real value. By combining IT skills with an economic perspective, I help businesses grow, optimize their processes and reach new levels of efficiency and profitability.

🎯

Tailored Solutions

Every business is unique, and so must be its solutions. I develop custom tools that meet each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.

Transform Your Business with Technology

December 2024

Master SQL

RoadMap.sh

November 2024

Oracle Certified Foundations Associate

Oracle

October 2024

People Leadership Credential

Connect

September 2024

💻 Languages & Technologies

☕Java

🐍Python

📜JavaScript

🅰️Angular

⚛️React

🔷TypeScript

🗄️SQL

🐘PHP

🎨CSS/SCSS

🔧Node.js

🐳Docker

🌿Git

💼

12/2024 - Present

Custom Software Engineering Analyst

Accenture

Bari, Puglia, Italy · Hybrid. Analysis and development of IT systems with Java and Quarkus in the Health and Public Sector. Ongoing training on modern technologies for building custom, efficient software solutions, and on AI agents.

💼

06/2022 - 12/2024

Software Analyst and Back End Developer Associate Consultant

Links Management and Technology SpA

Analysis of as-is software systems and ETL flows using PowerCenter. Completed training on Spring Boot for building modern, scalable backend applications. Backend developer specializing in Spring Boot, with experience in database design and in the analysis, development and testing of assigned tasks.

💼

02/2021 - 10/2021

Software Programmer

Adesso.it (formerly WebScience srl)

AS-IS and TO-BE analysis, SEO improvements and website enhancements to improve performance and user engagement.

🎓

2018 - 2025

Bachelor's Degree in Computer Science (Laurea in Informatica)

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Business Information Systems (Sistemi Informativi Aziendali)

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.


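AI Underwriting: Feature Engineering and Risk Assessment in Modern Insurance

Underwriting is the core of every insurance company: it decides whether to accept a risk, at what price and on what terms. For decades this process depended on the expertise of human underwriters who analyzed paper documents and applied tabulated actuarial rules. The result: decisions within 3-5 business days, high operating costs and subjective variability between underwriters.

Artificial intelligence is fundamentally rewriting these rules. According to McKinsey, global investment in AI solutions for insurance will exceed a record $6 billion in 2025, and BCG estimates that 36% of the total value of AI in insurance is concentrated precisely in the underwriting function. The operational figures are equally striking: for standard policies, the average underwriting decision time has dropped from 3-5 days to 12.4 minutes, while maintaining 99.3% risk-assessment accuracy.

But how does an AI underwriting system actually work? This guide breaks down the entire technology stack, from feature ingestion and engineering to risk-scoring models, interpretability and bias management, with production-oriented code examples.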

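What You Will Learn

End-to-end architecture of an AI underwriting system
Insurance-specific feature engineering
ML models for risk assessment: XGBoost in a two-stage frequency/severity setup
SHAP interpretability for auditable, compliance-ready decisions
Bias detection and fairness mitigation in the EU regulatory context
MLOps for underwriting models in production with MLflow
Data drift monitoring with PSI (Population Stability Index)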

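The Underwriting Process: From Legacy to AI-Driven

Before designing an AI system, it is essential to understand the traditional workflow we are automating. The underwriting process consists of four fundamental phases:

Information gathering: the applicant provides data about themselves and the risk (questionnaires, documents, possibly a physical inspection of the insured asset)
Risk analysis: the underwriter assesses the probability and severity of future claims
Pricing: the premium is set based on the assessed risk and the portfolio's target combined ratio
Decision: acceptance, rejection, or acceptance with conditions (exclusions, deductibles, premium loadings)

An AI-based system does not eliminate these phases; it transforms them radically. Data is collected automatically from heterogeneous sources (public data, telematics, credit bureaus), risk analysis is performed by ML models in milliseconds, pricing becomes dynamic and personalized for each applicant, and decisions on standard cases are automated, with human oversight for complex or borderline cases.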
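The routing step described above (automatic decisions for standard cases, human oversight for complex or borderline ones) can be sketched as a small policy function. This is a minimal sketch under stated assumptions: the 0-100 risk score, the threshold of 60 (borrowed from the `_score_to_decision` bands used later in this article), the 5-point borderline margin and the `has_complex_flags` input are all hypothetical choices, not a regulatory or industry standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical routing thresholds on a 0-100 risk score."""
    accept_below: float = 60.0      # scores under this are "standard" risks
    borderline_margin: float = 5.0  # scores this close to the threshold go to a human


def route_application(
    risk_score: float,
    has_complex_flags: bool,
    policy: RoutingPolicy = RoutingPolicy(),
) -> Tuple[str, List[str]]:
    """Return (route, reasons). Only clear-cut standard cases are automated."""
    if not 0.0 <= risk_score <= 100.0:
        raise ValueError("risk_score must be in [0, 100]")

    reasons: List[str] = []
    if has_complex_flags:
        reasons.append("complex case flagged (e.g. missing documents, unusual risk)")
    if abs(risk_score - policy.accept_below) <= policy.borderline_margin:
        reasons.append("score is borderline")
    elif risk_score >= policy.accept_below:
        # Adverse outcomes are never fully automated in this sketch
        reasons.append("score above the automatic acceptance band")

    if reasons:
        return "HUMAN_REVIEW", reasons
    return "AUTO_ACCEPT", ["standard risk"]


print(route_application(20.0, False))  # clear standard risk
print(route_application(58.0, False))  # borderline, goes to an underwriter
```

Routing every non-standard outcome to a person is a deliberately conservative design choice: it keeps declines and surcharges under human review, in line with the human-oversight requirement discussed in the regulatory section.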

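Regulatory Framework: The EU AI Act and Underwriting

The European AI Act, whose obligations phase in through August 2027, classifies as high-risk AI (Annex III) the systems used for creditworthiness assessment and credit scoring of natural persons and for risk assessment and pricing in life and health insurance. For these systems it requires transparency about automated decisions, human oversight, detailed technical documentation and a conformity assessment before the system is placed on the market. Motor underwriting is not listed in Annex III as such, but these requirements are still sound design practice, and the GDPR rules on automated decision-making apply regardless. AI underwriting systems should therefore build these requirements into the architecture from the start, not retrofit them later.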

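Feature Engineering for Insurance Underwriting

The quality of feature engineering is the factor that most distinguishes a mediocre underwriting model from an excellent one. Unlike domains such as computer vision, where features are extracted automatically by convolutional layers, tabular insurance data requires deep manual engineering based on actuarial domain knowledge.

Features in the motor line fall into five main categories:

Demographic features: age, marital status, housing type
Driving features: years licensed, age at first license, accident and violation history
Vehicle features: make, model, year, value, power, registration year
Geographic features: urban density, area crime index, weather risk
Economic features: credit score, requested policy type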

import pandas as pd
import numpy as np
from typing import Dict, Optional
from dataclasses import dataclass
from datetime import date


@dataclass
class PolicyApplicant:
    """Raw data for a motor-policy applicant."""
    applicant_id: str
    birth_date: date
    license_date: date
    zip_code: str
    vehicle_make: str
    vehicle_year: int
    vehicle_value: float
    annual_mileage: int
    claims_3yr: int
    violations_3yr: int
    credit_score: Optional[int] = None
    marital_status: str = "single"
    housing_type: str = "tenant"


class AutoInsuranceFeatureEngineer:
    """
    Feature engineering for motor underwriting.
    Produces 40+ features from the applicant's raw data,
    including derived features, interactions and
    domain-specific encodings.
    """

    VEHICLE_MAKE_RISK: Dict[str, int] = {
        "Ferrari": 5, "Lamborghini": 5, "Porsche": 4,
        "BMW": 3, "Mercedes": 3, "Audi": 3,
        "Toyota": 1, "Honda": 1, "Volkswagen": 2,
        "Ford": 2, "Fiat": 2, "Renault": 2,
    }

    def __init__(self, reference_date: Optional[date] = None):
        self.reference_date = reference_date or date.today()

    def engineer_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        features: Dict[str, float] = {}
        features.update(self._demographic_features(applicant))
        features.update(self._driving_experience_features(applicant))
        features.update(self._vehicle_features(applicant))
        features.update(self._claims_features(applicant))
        features.update(self._geographic_features(applicant))
        if applicant.credit_score is not None:
            features.update(self._credit_features(applicant))
        features.update(self._interaction_features(features))
        return features

    def _demographic_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        age = (self.reference_date - applicant.birth_date).days / 365.25
        return {
            "age": age,
            "age_squared": age ** 2,
            "age_under_25": float(age < 25),
            "age_over_70": float(age > 70),
            "age_risk_young": max(0.0, (25 - age) / 25) if age < 25 else 0.0,
            "age_risk_senior": max(0.0, (age - 70) / 20) if age > 70 else 0.0,
            "is_married": float(applicant.marital_status == "married"),
            "is_homeowner": float(applicant.housing_type == "owner"),
        }

    def _driving_experience_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        years_licensed = (self.reference_date - applicant.license_date).days / 365.25
        age = (self.reference_date - applicant.birth_date).days / 365.25
        age_at_license = age - years_licensed
        return {
            "years_licensed": years_licensed,
            "years_licensed_squared": years_licensed ** 2,
            "age_at_first_license": age_at_license,
            "late_license_ratio": max(0.0, (age_at_license - 18) / 10),
            "is_new_driver": float(years_licensed < 2),
            "is_experienced_driver": float(years_licensed > 10),
        }

    def _vehicle_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        vehicle_age = self.reference_date.year - applicant.vehicle_year
        make_risk = self.VEHICLE_MAKE_RISK.get(applicant.vehicle_make, 2)
        return {
            "vehicle_age": float(vehicle_age),
            "vehicle_value": applicant.vehicle_value,
            "vehicle_value_log": np.log1p(applicant.vehicle_value),
            "vehicle_make_risk_score": float(make_risk),
            "is_high_performance": float(make_risk >= 4),
            "is_new_vehicle": float(vehicle_age <= 2),
            "is_old_vehicle": float(vehicle_age > 10),
            "annual_mileage": float(applicant.annual_mileage),
            "annual_mileage_log": np.log1p(applicant.annual_mileage),
            "high_mileage": float(applicant.annual_mileage > 20000),
        }

    def _claims_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        claims = applicant.claims_3yr
        violations = applicant.violations_3yr
        return {
            "claims_3yr": float(claims),
            "violations_3yr": float(violations),
            "has_any_claim": float(claims > 0),
            "has_multiple_claims": float(claims > 1),
            "has_violations": float(violations > 0),
            # Weighted combined score: claims weigh twice as much as violations (3.0 vs 1.5)
            "incident_score": claims * 3.0 + violations * 1.5,
            "claims_x_violations": float(claims * violations),
        }

    def _geographic_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        # In production: lookup on geographic databases (ISTAT, OpenStreetMap, crime data).
        # Placeholder: a deterministic hash of the postcode. The built-in hash() is
        # salted per process for strings (PYTHONHASHSEED), so it would yield
        # different features on every run.
        zip_hash = sum(ord(c) * 31 ** i for i, c in enumerate(applicant.zip_code)) % 100
        urban_score = (zip_hash % 5) / 4.0
        crime_index = (zip_hash % 3) / 2.0
        weather_risk = (zip_hash % 4) / 3.0
        return {
            "urban_density_score": urban_score,
            "area_crime_index": crime_index,
            "area_weather_risk": weather_risk,
            "composite_geo_risk": (urban_score + crime_index + weather_risk) / 3,
        }

    def _credit_features(self, applicant: PolicyApplicant) -> Dict[str, float]:
        score = applicant.credit_score or 0
        return {
            "credit_score": float(score),
            "credit_score_normalized": (score - 300) / (850 - 300),
            "poor_credit": float(score < 580),
            "fair_credit": float(580 <= score < 670),
            "good_credit": float(670 <= score < 740),
            "excellent_credit": float(score >= 740),
        }

    def _interaction_features(self, features: Dict[str, float]) -> Dict[str, float]:
        return {
            # Young driver + sports car = very high risk
            "young_high_perf": (
                features.get("age_risk_young", 0) *
                features.get("is_high_performance", 0)
            ),
            # Past claims in dense urban areas amplify the risk
            "claims_urban": (
                features.get("claims_3yr", 0) *
                features.get("urban_density_score", 0)
            ),
            # High mileage + old vehicle = increased mechanical risk
            "mileage_old_vehicle": (
                features.get("annual_mileage_log", 0) *
                features.get("is_old_vehicle", 0)
            ),
        }

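Risk-Scoring Models: Approaches and Trade-offs

Choosing a machine-learning model for risk assessment means balancing predictive accuracy, interpretability (essential for compliance), inference speed and ease of maintenance. The main approaches in the insurance industry are: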

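Comparison of Insurance Risk-Scoring Models

Model	Accuracy	Interpretability	Ideal use case
GLM (Poisson/Gamma)	Medium	Very high	Actuarial baseline, regulatory acceptance
Random Forest	High	Medium	Feature importance, robustness to outliers
XGBoost / LightGBM	Very high	Medium	Production standard, state of the art on tabular data
Tabular neural networks	High	Low	Complex features with categorical embeddings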
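To make the GLM row of the table concrete, here is a minimal Poisson GLM frequency baseline using scikit-learn's `PoissonRegressor` (log link). The data is synthetic, and the "young drivers have twice the claim rate" effect is an assumption built into the simulation, so the fitted relativity `exp(coef)` should land near 2. Exposure enters as a claim-rate target plus exposure weights, which for a log-link Poisson model is equivalent to a `log(exposure)` offset.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor


def fit_frequency_glm(
    X: np.ndarray, claims: np.ndarray, exposure: np.ndarray
) -> PoissonRegressor:
    """Poisson GLM on claim rate (claims / exposure), weighted by exposure."""
    glm = PoissonRegressor(alpha=1e-6, max_iter=1000)  # near-unpenalized fit
    glm.fit(X, claims / exposure, sample_weight=exposure)
    return glm


# Synthetic portfolio (assumption): base rate 5% per year, young drivers 2x
rng = np.random.default_rng(42)
n = 50_000
is_young = rng.integers(0, 2, size=n).astype(float)
exposure = rng.uniform(0.25, 1.0, size=n)           # policy-years
true_rate = 0.05 * np.where(is_young == 1.0, 2.0, 1.0)
claims = rng.poisson(true_rate * exposure)

glm = fit_frequency_glm(is_young.reshape(-1, 1), claims, exposure)
young_relativity = float(np.exp(glm.coef_[0]))       # multiplicative effect
base_rate = float(np.exp(glm.intercept_))            # annual rate, non-young drivers
```

The fitted coefficients read directly as tariff relativities, which is a large part of why actuaries and regulators accept GLMs so readily; a gradient-boosted model then has to beat this baseline on Gini or deviance to justify its extra complexity.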

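The most established approach in the industry is the two-stage model: one model for frequency (the expected number of claims per unit of exposure, which for low frequencies is close to the probability of having at least one claim) and one for severity (the expected cost of a claim, given that one occurs). The expected pure premium is frequency × severity.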
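As a worked illustration of frequency × severity with made-up figures (0.08 expected claims per policy-year and an average cost of EUR 3,500 per claim, both assumptions), the pure premium is 0.08 × 3,500 = EUR 280 per year, and half of that for a six-month policy. The hypothetical helper below captures that arithmetic before any expense loadings, profit margin or taxes.

```python
def pure_premium(
    claims_per_year: float,
    cost_per_claim: float,
    exposure_years: float = 1.0,
) -> float:
    """Expected loss cost = frequency x exposure x severity (no loadings)."""
    if min(claims_per_year, cost_per_claim, exposure_years) < 0:
        raise ValueError("inputs must be non-negative")
    return claims_per_year * exposure_years * cost_per_claim


# Hypothetical figures from the paragraph above
annual = pure_premium(0.08, 3_500.0)            # about 280 EUR
six_months = pure_premium(0.08, 3_500.0, 0.5)   # about 140 EUR
```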

import xgboost as xgb
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import pandas as pd
from typing import Dict, Optional


class TwoStageRiskScorer:
    """
    Two-stage model for motor insurance pricing.

    Stage 1: Frequency model (Poisson regression with XGBoost)
             Target = claim frequency per policy, adjusted for exposure
    Stage 2: Severity model (Tweedie/Gamma with XGBoost)
             Target = claim cost, trained only on policies with claims

    Pure Premium = E[Frequency] * E[Severity | has_claim]
    """

    FREQUENCY_PARAMS: Dict = {
        "objective": "count:poisson",
        "eval_metric": "poisson-nloglik",
        "max_depth": 6,
        "learning_rate": 0.05,
        "n_estimators": 500,
        "min_child_weight": 50,  # actuarial stability: min hessian weight per leaf (roughly expected claims for Poisson)
        "subsample": 0.8,
        "colsample_bytree": 0.8,
        "reg_alpha": 0.1,
        "reg_lambda": 1.0,
        "tree_method": "hist",
        "early_stopping_rounds": 50,
    }

    SEVERITY_PARAMS: Dict = {
        "objective": "reg:tweedie",
        "tweedie_variance_power": 1.5,  # 1=Poisson, 2=Gamma
        "eval_metric": "tweedie-nloglik@1.5",
        "max_depth": 5,
        "learning_rate": 0.05,
        "n_estimators": 300,
        "min_child_weight": 30,
        "subsample": 0.8,
        "colsample_bytree": 0.7,
        "reg_alpha": 0.1,
        "reg_lambda": 1.0,
        "tree_method": "hist",
        "early_stopping_rounds": 30,
    }

    def __init__(self) -> None:
        self.frequency_model = xgb.XGBRegressor(**self.FREQUENCY_PARAMS)
        self.severity_model = xgb.XGBRegressor(**self.SEVERITY_PARAMS)
        self.feature_names: list = []

    def fit(
        self,
        X: pd.DataFrame,
        y_claims: pd.Series,
        y_amounts: pd.Series,
        exposure: pd.Series,
        eval_fraction: float = 0.2,
    ) -> "TwoStageRiskScorer":
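        """
        Train both models.

        IMPORTANT: uses a temporal split, not a random shuffle, so X must be
        sorted by policy inception date. Insurance data is autocorrelated over time.
        Assumes X, y_claims, y_amounts and exposure share the same index.
        """
        self.feature_names = X.columns.tolist()
        split_idx = int(len(X) * (1 - eval_fraction))

        X_train, X_val = X.iloc[:split_idx], X.iloc[split_idx:]
        # Poisson on the claim *rate* (claims / exposure) weighted by exposure is
        # equivalent to a Poisson model on counts with log(exposure) as offset
        claim_rate = y_claims / exposure
        w_train, w_val = exposure.iloc[:split_idx], exposure.iloc[split_idx:]

        # Stage 1: Frequency
        self.frequency_model.fit(
            X_train, claim_rate.iloc[:split_idx],
            sample_weight=w_train,
            eval_set=[(X_val, claim_rate.iloc[split_idx:])],
            sample_weight_eval_set=[w_val],
            verbose=50,
        )

        # Stage 2: Severity, only on policies with claims.
        # Target = average cost per claim, weighted by the number of claims,
        # so that E[Freq] * E[Severity] does not double-count multi-claim policies
        has_claim = (y_claims > 0) & (y_amounts > 0)
        X_sev = X[has_claim]
        n_sev = y_claims[has_claim]
        y_sev = y_amounts[has_claim] / n_sev
        sev_split = int(len(X_sev) * (1 - eval_fraction))

        self.severity_model.fit(
            X_sev.iloc[:sev_split], y_sev.iloc[:sev_split],
            sample_weight=n_sev.iloc[:sev_split],
            eval_set=[(X_sev.iloc[sev_split:], y_sev.iloc[sev_split:])],
            sample_weight_eval_set=[n_sev.iloc[sev_split:]],
            verbose=30,
        )
        return self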

    def predict_pure_premium(
        self, X: pd.DataFrame, exposure: float = 1.0
    ) -> np.ndarray:
        """Pure premium: E[Freq] * exposure * E[Severity]."""
        freq = self.frequency_model.predict(X) * exposure
        sev = self.severity_model.predict(X)
        return freq * sev

    def evaluate(self, X: pd.DataFrame, y_claims: pd.Series) -> Dict[str, float]:
        pred = self.frequency_model.predict(X)
        mae = mean_absolute_error(y_claims, pred)
        rmse = float(np.sqrt(mean_squared_error(y_claims, pred)))
        gini = self._gini_coefficient(y_claims.values, pred)
        lift = self._lift_at_decile(y_claims.values, pred, 0.1)
        return {
            "mae": round(mae, 6),
            "rmse": round(rmse, 6),
            "gini_coefficient": round(gini, 4),
            "lift_top_decile": round(lift, 4),
        }

    def _gini_coefficient(self, actual: np.ndarray, predicted: np.ndarray) -> float:
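        """Gini coefficient: standard actuarial ranking metric for frequency models."""
        # Order policies from highest to lowest predicted risk: a model that
        # concentrates actual claims at the top yields a positive Gini
        idx = np.argsort(-predicted, kind="stable")
        cum = np.cumsum(actual[idx])
        if cum[-1] == 0:
            return 0.0
        cum_norm = cum / cum[-1]
        n = len(actual)
        lorenz_area = float(np.sum(cum_norm)) / n
        return 2 * (lorenz_area - 0.5)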

    def _lift_at_decile(
        self, actual: np.ndarray, predicted: np.ndarray, decile: float
    ) -> float:
        k = max(1, int(len(actual) * decile))
        top_idx = np.argsort(predicted)[-k:]
        base_rate = actual.mean()
        if base_rate == 0:
            return 0.0
        return float(actual[top_idx].mean() / base_rate)

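Interpretability with SHAP: Auditable Decisions

In a regulated context such as insurance, a black-box model is not enough. Underwriting decisions must be explainable to the customer (the GDPR provisions on automated decisions, often called the right to explanation), to the underwriter (when reviewing borderline cases) and to regulators (Solvency II governance, including the ORSA under Pillar 2, and Pillar 3 reporting). SHAP (SHapley Additive exPlanations) is the industry's reference tool for post-hoc interpretability of ensemble models.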
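To see what "additive" means in SHAP, the sketch below computes exact Shapley values by brute force for a toy scoring function in log (margin) space. It is illustrative only: the function, its coefficients and the all-zero baseline are assumptions, and it replaces "absent" features with a single baseline value, a simplification of what `shap.TreeExplainer` computes efficiently on real tree ensembles. Its cost grows exponentially with the number of features, so it only works for a handful of them.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def exact_shapley(
    f: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Shapley value of each feature: weighted average marginal contribution over all coalitions."""
    names = list(x)
    n = len(names)
    phi: Dict[str, float] = {}
    for i in names:
        others = [j for j in names if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                present = set(coalition)
                # Features outside the coalition take their baseline value
                without_i = {j: x[j] if j in present else baseline[j] for j in names}
                with_i = dict(without_i, **{i: x[i]})
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi


def toy_log_rate(v: Dict[str, float]) -> float:
    # Hypothetical margin: claims, sports car, and a young x sports-car interaction
    return (-2.5 + 0.4 * v["claims_3yr"] + 0.1 * v["is_high_perf"]
            + 0.6 * v["is_young"] * v["is_high_perf"])


applicant = {"claims_3yr": 2.0, "is_young": 1.0, "is_high_perf": 1.0}
reference = {"claims_3yr": 0.0, "is_young": 0.0, "is_high_perf": 0.0}
phi = exact_shapley(toy_log_rate, applicant, reference)
```

Here the interaction term's 0.6 is split evenly between `is_young` and `is_high_perf` (0.3 each), so the values come out as 0.8, 0.3 and 0.4, and they sum exactly to f(applicant) - f(reference) = 1.5. That additivity is what lets a per-feature audit trail be reconciled with the model output.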

import shap
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple


class UnderwritingExplainer:
    """
    SHAP explanations for underwriting decisions.
    Produces output at three levels: customer, underwriter, compliance.
    """

    FEATURE_LABELS: Dict[str, str] = {
        "age": "driver age",
        "years_licensed": "years licensed",
        "claims_3yr": "claims in the last 3 years",
        "violations_3yr": "traffic violations in the last 3 years",
        "vehicle_make_risk_score": "vehicle risk category",
        "vehicle_age": "vehicle age",
        "vehicle_value": "vehicle value",
        "annual_mileage": "declared annual mileage",
        "composite_geo_risk": "geographic area risk",
        "credit_score": "credit score",
        "young_high_perf": "young driver + high-performance vehicle",
    }

    def __init__(self, model, feature_names: List[str]) -> None:
        self.feature_names = feature_names
        # For log-link objectives (count:poisson, reg:tweedie) SHAP values are in log (margin) space
        self.explainer = shap.TreeExplainer(model)

    def explain(
        self, X_row: pd.DataFrame, risk_score: float
    ) -> Dict:
        """Full explanation for a single assessment."""
        shap_values = self.explainer.shap_values(X_row)

        impacts: List[Tuple[str, float]] = sorted(
            zip(self.feature_names, shap_values[0]),
            key=lambda x: abs(x[1]),
            reverse=True
        )

        return {
            "risk_score": round(risk_score, 2),
            "decision": self._score_to_decision(risk_score),
            "customer_message": self._customer_message(impacts, risk_score),
            "top_risk_factors": [
                {
                    "name": name,
                    "label": self.FEATURE_LABELS.get(name, name),
                    "direction": "increases risk" if value > 0 else "reduces risk",
                    "magnitude": round(abs(float(value)), 4),
                }
                # 'value' instead of 'shap' avoids shadowing the shap module name
                for name, value in impacts[:5]
            ],
            "audit_trail": {
                # expected_value can be a scalar or a 1-element array depending on the SHAP version
                "base_expected_value": float(np.ravel(self.explainer.expected_value)[0]),
                "all_shap_values": {
                    n: round(float(s), 6)
                    for n, s in zip(self.feature_names, shap_values[0])
                },
                "input_features": X_row.to_dict(orient="records")[0],
            },
        }

    def _customer_message(
        self, impacts: List[Tuple[str, float]], score: float
    ) -> str:
        # Only contributions above 0.1 (in model-output units) are worth mentioning
        significant = [(n, v) for n, v in impacts if abs(v) > 0.1]
        if not significant:
            return "Your profile falls within the standard risk band."
        favourable = [self.FEATURE_LABELS.get(n, n) for n, v in significant[:3] if v < 0]
        adverse = [self.FEATURE_LABELS.get(n, n) for n, v in significant[:3] if v > 0]
        parts = []
        if adverse:
            parts.append(f"Factors increasing your risk profile: {', '.join(adverse)}.")
        if favourable:
            parts.append(f"Factors in your favour: {', '.join(favourable)}.")
        return " ".join(parts)

    def _score_to_decision(self, score: float) -> str:
        # Bands assume risk_score is on a 0-100 scale
        if score < 30:
            return "ACCEPT_PREFERRED"
        elif score < 60:
            return "ACCEPT_STANDARD"
        elif score < 80:
            return "ACCEPT_SUBSTANDARD"
        return "DECLINE_OR_MANUAL_REVIEW"

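Fairness and Bias Detection in the EU Context

Proxy variables (postcode, credit score) can lead to indirect discrimination, which is prohibited by law. In Europe, the Gender Directive, as interpreted by the Court of Justice of the EU in its 2011 Test-Achats ruling, prohibits using gender as a pricing factor in insurance. The AI Act adds further constraints for the high-risk systems listed in Annex III, which require a mandatory conformity assessment before deployment.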
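A simple first screen for proxy variables is to measure how well each individual feature separates a protected group, using the protected attribute only for auditing and never as a model input. The sketch below uses scikit-learn's `roc_auc_score` and rescales it so that 0 means no separation and 1 means perfect separation; the 0.3 flag threshold and the synthetic "postcode-derived" feature are assumptions for illustration. A univariate check misses proxies that only emerge from combinations of features, for which a classifier predicting the protected attribute from all features is the usual next step.

```python
from typing import Dict

import numpy as np
from sklearn.metrics import roc_auc_score


def proxy_strength(feature: np.ndarray, protected: np.ndarray) -> float:
    """|AUC - 0.5| * 2: 0 = feature says nothing about the group, 1 = perfect separation."""
    auc = roc_auc_score(protected, feature)
    return abs(auc - 0.5) * 2.0


def flag_proxies(
    features: Dict[str, np.ndarray],
    protected: np.ndarray,
    threshold: float = 0.3,  # illustrative cut-off, not a regulatory value
) -> Dict[str, float]:
    """Return the features whose proxy strength exceeds the threshold."""
    scores = {name: proxy_strength(col, protected) for name, col in features.items()}
    return {name: s for name, s in scores.items() if s > threshold}


# Synthetic audit data (assumption): one feature correlated with the group, one not
rng = np.random.default_rng(0)
n = 5_000
protected = rng.binomial(1, 0.3, size=n)
features = {
    "zip_urban_score": protected + rng.normal(0.0, 0.5, size=n),
    "vehicle_age": rng.normal(8.0, 3.0, size=n),
}
flagged = flag_proxies(features, protected)
```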

import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
from typing import Dict, List


class FairnessAuditor:
    """
    Fairness auditor for underwriting models (EU context).

    Implemented metrics:
    - Disparate Impact (80% rule)
    - Demographic Parity Gap
    - True positive rate per group (input for Equal Opportunity checks)
    - Average score per group (a first check before a full calibration analysis)
    """

    DISPARATE_IMPACT_THRESHOLD = 0.8  # "four-fifths" rule from US EEOC guidance
    MAX_DP_GAP = 0.1                   # illustrative internal threshold, not a regulatory limit

    def __init__(
        self,
        predictions: np.ndarray,
        true_labels: np.ndarray,
        sensitive_df: pd.DataFrame,
    ) -> None:
        self.predictions = predictions
        self.true_labels = true_labels
        self.sensitive_df = sensitive_df

    def full_audit(self) -> Dict:
        results: Dict = {}

        for attr in self.sensitive_df.columns:
            groups = self.sensitive_df[attr].unique()
            attr_results: Dict = {}

            for group in groups:
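                # Positional boolean mask aligned with the predictions array
                mask = (self.sensitive_df[attr] == group).to_numpy()
                g_pred = self.predictions[mask]
                g_true = self.true_labels[mask]

                attr_results[str(group)] = {
                    "count": int(mask.sum()),
                    # Scores below 0.6 count as accepted (assumes scores in [0, 1])
                    "acceptance_rate": float((g_pred < 0.6).mean()),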
                    "avg_score": round(float(g_pred.mean()), 4),
                    "tpr": self._tpr(g_true, g_pred),
                }

            di = self._disparate_impact(attr_results)
            dp = self._dp_gap(attr_results)

            attr_results["_metrics"] = {
                "disparate_impact": round(di, 4),
                "demographic_parity_gap": round(dp, 4),
                "passes_di_rule": di >= self.DISPARATE_IMPACT_THRESHOLD,
                "passes_dp_rule": dp <= self.MAX_DP_GAP,
                "overall_fair": di >= self.DISPARATE_IMPACT_THRESHOLD and dp <= self.MAX_DP_GAP,
            }
            results[attr] = attr_results

        return results

    def _tpr(self, labels: np.ndarray, preds: np.ndarray, thr: float = 0.5) -> float:
        if len(labels) < 10:
            return float("nan")
        binary = (preds >= thr).astype(int)
        try:
            tn, fp, fn, tp = confusion_matrix(labels, binary, labels=[0, 1]).ravel()
            return round(tp / (tp + fn), 4) if (tp + fn) > 0 else 0.0
        except ValueError:
            return float("nan")

    def _disparate_impact(self, groups: Dict) -> float:
        rates = [v["acceptance_rate"] for k, v in groups.items()
                 if not k.startswith("_") and isinstance(v, dict)]
        if not rates or max(rates) == 0:
            return 1.0
        return min(rates) / max(rates)

    def _dp_gap(self, groups: Dict) -> float:
        rates = [v["acceptance_rate"] for k, v in groups.items()
                 if not k.startswith("_") and isinstance(v, dict)]
        return (max(rates) - min(rates)) if rates else 0.0

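MLOps and Monitoring in Production

Underwriting models are frequently subject to concept drift: the applicant profile changes (new electric-vehicle models, demographic shifts), repair costs rise with inflation, and extreme weather events change loss patterns. A continuous monitoring system based on the Population Stability Index (PSI) is essential to identify when a model needs retraining. PSI tracks shifts in the input distributions (data drift); confirming concept drift also requires comparing predicted and realized losses as claims mature.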
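For binned distributions, PSI sums (a_i - e_i) · ln(a_i / e_i) over the bins, where e_i is the share of the reference population in bin i and a_i the share of the current one. As a worked example with hypothetical numbers: if electric vehicles go from 5% of applicants in the training window to 20% today, PSI ≈ 0.0258 + 0.2079 ≈ 0.234, which falls in the 0.1-0.25 "monitor" band used by the `DriftMonitor` class. The helper below is a minimal sketch that works directly on bin proportions.

```python
import numpy as np


def psi_from_shares(expected: np.ndarray, actual: np.ndarray, eps: float = 1e-6) -> float:
    """PSI = sum((actual - expected) * ln(actual / expected)) over bins."""
    e = np.clip(np.asarray(expected, dtype=float), eps, None)  # avoid log(0)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


# Hypothetical EV share among applicants: [non-EV, EV]
reference_shares = np.array([0.95, 0.05])
current_shares = np.array([0.80, 0.20])
psi = psi_from_shares(reference_shares, current_shares)  # about 0.234
```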

from scipy import stats
import numpy as np
import pandas as pd
from typing import Dict, List
from datetime import datetime


class DriftMonitor:
    """
    Monitors data drift for underwriting models.
    Uses PSI (Population Stability Index) as the primary metric.

    PSI interpretation:
    - PSI < 0.1:  No significant change
    - PSI 0.1-0.25: Moderate change, keep monitoring
    - PSI > 0.25:  Significant change, retraining recommended
    """

    def __init__(self, reference_df: pd.DataFrame, features: List[str]) -> None:
        self.reference_df = reference_df
        self.features = features

    def check_drift(self, current_df: pd.DataFrame) -> Dict:
        feature_results: Dict = {}
        critical_features = []

        for feat in self.features:
            if feat not in current_df.columns:
                continue

            psi = self._psi(self.reference_df[feat], current_df[feat])
            ks_stat, ks_p = stats.ks_2samp(
                self.reference_df[feat].dropna(),
                current_df[feat].dropna()
            )

            status = "ok" if psi < 0.1 else ("warning" if psi < 0.25 else "critical")

            feature_results[feat] = {
                "psi": round(psi, 4),
                "ks_statistic": round(ks_stat, 4),
                "ks_pvalue": round(ks_p, 4),
                "status": status,
            }

            if status == "critical":
                critical_features.append(feat)

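        psis = [v["psi"] for v in feature_results.values()]
        avg_psi = float(np.mean(psis)) if psis else 0.0

        return {
            "checked_at": datetime.now().isoformat(),
            "overall_psi": round(avg_psi, 4),
            # Consistent with the PSI bands: retrain when any feature is
            # critical or the average PSI itself reaches 0.25
            "retraining_recommended": bool(critical_features) or avg_psi >= 0.25,
            "critical_features": critical_features,
            "feature_details": feature_results,
        }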

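    def _psi(self, ref: pd.Series, cur: pd.Series, bins: int = 10) -> float:
        ref_clean = ref.dropna().to_numpy(dtype=float)
        cur_clean = cur.dropna().to_numpy(dtype=float)
        # Quantile bin edges taken from the reference distribution
        edges = np.unique(np.percentile(ref_clean, np.linspace(0, 100, bins + 1)))
        if len(edges) < 2:
            return 0.0  # constant feature in the reference window: PSI undefined
        # Clip current values so out-of-range observations land in the outer
        # bins instead of being silently dropped by np.histogram
        cur_clean = np.clip(cur_clean, edges[0], edges[-1])

        ref_counts, _ = np.histogram(ref_clean, bins=edges)
        cur_counts, _ = np.histogram(cur_clean, bins=edges)

        # Floor the proportions to avoid log(0) on empty bins
        ref_pct = np.clip(ref_counts / len(ref_clean), 1e-6, None)
        cur_pct = np.clip(cur_counts / len(cur_clean), 1e-6, None)

        return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))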

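Best Practices and Anti-Patterns

Best Practices for AI Underwriting

Two-stage architecture (frequency/severity): it is the actuarial standard and yields more accurate prices than a single model fitted directly on claim amounts.
Mandatory temporal split: insurance data is autocorrelated over time. Never use a random shuffle for the train/test split.
Exposure as offset: always use the policy duration (years of exposure) as an offset in the Poisson model, so that claim counts are normalized.
Keep a GLM baseline: generalized linear models are more easily validated by regulators and provide a benchmark for measuring the added value of ML.
Shadow mode before go-live: run the model alongside human underwriters for 30-90 days, comparing decisions, before automating.
Monitor PSI weekly: drift in the motor line is frequent because of new vehicle types, repair-cost inflation and regulatory changes.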
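The mandatory temporal split from the list above can be sketched with pandas: train on policies that started before a cutoff date and evaluate on those that started on or after it. The column name `inception_date`, the sample policies and the cutoff are hypothetical; in practice a gap between the two windows is often added so that claims on the most recent training policies have time to be reported.

```python
from typing import Tuple

import pandas as pd


def temporal_split(
    df: pd.DataFrame, date_col: str, cutoff: str
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split by policy inception date instead of shuffling rows."""
    dates = pd.to_datetime(df[date_col])
    cutoff_ts = pd.Timestamp(cutoff)
    ordered = df.assign(**{date_col: dates}).sort_values(date_col)
    train = ordered[ordered[date_col] < cutoff_ts]
    test = ordered[ordered[date_col] >= cutoff_ts]
    return train, test


policies = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3", "P4"],
    "inception_date": ["2022-01-15", "2023-06-01", "2021-03-10", "2024-02-20"],
    "claims": [0, 1, 0, 2],
})
train, test = temporal_split(policies, "inception_date", "2023-01-01")
```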

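Anti-Patterns to Avoid

Feature leakage: never use variables that are only available after a claim (claim amount, reserves) as training features for the frequency model.
Optimizing AUC only: in insurance the relevant metrics are the Gini coefficient, the combined ratio and the lift in the top risk decile.
Models with 500+ features: impossible to validate actuarially or to justify to regulators. Prefer rigorous feature selection (40-60 features at most).
Ignoring portfolio concentration: a model that accepts only very-low-risk profiles creates adverse selection and an unbalanced portfolio.
Proxy discrimination: variables such as postcode can act as proxies for ethnicity. Always check for disparate impact before deployment.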
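One way to enforce the feature-leakage rule above mechanically is to keep metadata on when each feature becomes known and to filter candidate training features against the decision point. The registry, stage names and feature names below are hypothetical, and features without metadata fail closed (raise an error) rather than silently passing.

```python
from typing import Dict, Iterable, List

# Hypothetical lifecycle stages, in chronological order
STAGE_ORDER: Dict[str, int] = {"application": 0, "binding": 1, "post_claim": 2}

# Hypothetical registry: the earliest stage at which each feature is available
FEATURE_AVAILABILITY: Dict[str, str] = {
    "age": "application",
    "claims_3yr": "application",
    "vehicle_value": "application",
    "claim_amount": "post_claim",
    "claim_reserve": "post_claim",
}


def safe_training_features(
    candidates: Iterable[str],
    availability: Dict[str, str] = FEATURE_AVAILABILITY,
    decision_stage: str = "application",
) -> List[str]:
    """Keep only features already known at the decision stage."""
    limit = STAGE_ORDER[decision_stage]
    allowed: List[str] = []
    for name in candidates:
        if name not in availability:
            raise KeyError(f"no availability metadata for feature {name!r}")
        if STAGE_ORDER[availability[name]] <= limit:
            allowed.append(name)
    return allowed


features = safe_training_features(["age", "claim_amount", "claims_3yr", "claim_reserve"])
```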

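Conclusions and Next Steps

AI underwriting does not replace human underwriters; it amplifies them. It can fully automate standard policies (80-90% of volume) with accuracy higher than the human average, freeing experts to focus on complex cases, where their experience and judgment are irreplaceable.

The keys to a successful system are: deep feature engineering grounded in actuarial knowledge, a two-stage frequency/severity architecture, SHAP interpretability for compliance, mandatory fairness audits, and continuous PSI-based monitoring to manage drift.

In the next article in the series we look at claims automation with Computer Vision and NLP: from digital FNOL (First Notice of Loss) to automatic damage assessment from photos, accelerating end-to-end settlement.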

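InsurTech Engineering Series

01 - The Insurance Domain for Developers: Products, Actors and Data Model
02 - Cloud-Native Policy Management: API-First Architecture
03 - Telematics Pipeline: Processing UBI Data at Scale
04 - AI Underwriting: Feature Engineering and Risk Scoring (this article)
05 - Claims Automation: Computer Vision and NLP
06 - Fraud Detection: Graph Analytics and Behavioral Signals
07 - ACORD Standards and Insurance API Integration
08 - Compliance Engineering: Solvency II and IFRS 17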