Claims Automation: Computer Vision and NLP for Insurance Claims Management
Claims management is traditionally the costliest and most customer-sensitive process in the insurance industry. A standard motor claim takes 8-15 days to settle, involves 4-7 touchpoints between the customer and the insurer, and requires paper document collection, physical vehicle inspections, and back-and-forth between multiple departments. Historically, customer dissatisfaction peaks during the claims phase, more than in any other interaction with an insurer.
Artificial intelligence is transforming this process from the ground up. Industry results in 2025 are striking: Admiral Seguros achieved 90% fully touchless auto estimates, with 98% of assessments completed in under 15 minutes. Some carriers report end-to-end automation rates of up to 57%, with average settlement time for standard motor claims cut from weeks to hours. Document data extraction accuracy reaches 96%, versus 65% for human operators.
This guide builds a complete claims automation system: from digital FNOL (First Notice of Loss) management, to damage assessment with computer vision, to information extraction from documents with NLP, through to settlement workflow orchestration.
What You Will Learn
- End-to-end claims automation system architecture
- Digital FNOL: receiving and automatically triaging claims notifications
- Computer vision for vehicle damage estimation: CNN and vision transformer (ViT) models
- NLP for extracting data from insurance documents: medical reports, police reports
- Advanced OCR for digitizing legacy documents
- Workflow orchestration for the settlement process
- Monitoring metrics for claims automation systems
Claims Automation System Architecture
A modern claims automation system consists of distinct layers with well-defined responsibilities. A microservices architecture allows independent scaling of components and updating of individual models without impacting the entire pipeline.
Architectural Layers
| Layer | Components | Technologies |
|---|---|---|
| Ingestion | FNOL intake, document upload, API gateway | FastAPI, S3/GCS, Kafka |
| Processing | OCR, NLP extraction, Computer Vision | Tesseract, spaCy, PyTorch, Hugging Face |
| Intelligence | Damage estimation, fraud scoring, reserve calculation | YOLOv8, Detectron2, XGBoost |
| Orchestration | Workflow engine, SLA management, escalation | Temporal.io, Airflow, state machines |
| Output | Settlement offer, customer communications, audit log | Twilio, SendGrid, EventStore |
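Layers communicate through events on the message bus rather than direct calls, so each layer can be scaled or redeployed independently. The sketch below shows one plausible shape for such an event envelope; the field names (`event_type`, `producer`, and so on) are illustrative assumptions, not a standard schema:

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ClaimEvent:
    """Envelope passed between layers on the bus (e.g. a Kafka topic)."""
    event_type: str      # e.g. "fnol.received", "damage.assessed"
    claim_id: str
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    producer: str = "ingestion"  # which layer emitted the event


def serialize(event: ClaimEvent) -> bytes:
    """Serialize for the bus; downstream layers deserialize symmetrically."""
    return json.dumps(asdict(event)).encode("utf-8")


event = ClaimEvent(
    event_type="fnol.received",
    claim_id="CLM-001",
    payload={"policy_number": "POL-123", "photos": 4},
)
raw = serialize(event)
decoded = json.loads(raw)
```

Because every event carries its own `event_id` and timestamp, the same stream also doubles as the audit log feeding the Output layer.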
Digital FNOL: Intelligent First Notice of Loss
The FNOL (First Notice of Loss) is the initial claim notification by the customer. Traditionally done by phone or in person, in modern systems it happens via mobile app, chatbot, or web portal. AI intervenes from the very first moment to classify the claim type, assess its complexity, and route it to the correct workflow.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional, Dict
import uuid
class ClaimType(str, Enum):
AUTO_COLLISION = "auto_collision"
AUTO_THEFT = "auto_theft"
AUTO_WINDSHIELD = "auto_windshield"
PROPERTY_WATER = "property_water"
PROPERTY_FIRE = "property_fire"
LIABILITY = "liability"
HEALTH = "health"
UNKNOWN = "unknown"
class ClaimComplexity(str, Enum):
SIMPLE = "simple" # automatic settlement
STANDARD = "standard" # guided process
COMPLEX = "complex" # human intervention
LITIGIOUS = "litigious" # legal team
@dataclass
class FNOLSubmission:
"""Represents a FNOL notification received from the customer."""
policy_number: str
incident_date: datetime
incident_description: str
location: str
photos: List[str] = field(default_factory=list)
documents: List[str] = field(default_factory=list)
contact_phone: str = ""
contact_email: str = ""
third_parties_involved: bool = False
injuries_reported: bool = False
police_report_available: bool = False
claim_id: str = field(default_factory=lambda: str(uuid.uuid4()))
received_at: datetime = field(default_factory=datetime.now)
@dataclass
class FNOLAssessment:
"""AI analysis result for an FNOL submission."""
claim_id: str
claim_type: ClaimType
complexity: ClaimComplexity
estimated_severity: str
auto_settlement_eligible: bool
required_documents: List[str]
assigned_workflow: str
fraud_risk_score: float
priority: int
routing_notes: str = ""
class FNOLTriageService:
"""
Automatic triage service for FNOL submissions.
Combines business rules and ML models to classify
each claim and assign it to the appropriate workflow.
"""
CLAIM_TYPE_KEYWORDS: Dict[ClaimType, List[str]] = {
ClaimType.AUTO_COLLISION: [
"collision", "crash", "accident", "impact", "rear-end", "sideswipe"
],
ClaimType.AUTO_THEFT: [
"theft", "stolen", "missing vehicle", "car stolen"
],
ClaimType.AUTO_WINDSHIELD: [
"windshield", "windscreen", "cracked glass", "window break"
],
ClaimType.PROPERTY_WATER: [
"flood", "water damage", "leak", "pipe burst", "rain infiltration"
],
ClaimType.PROPERTY_FIRE: [
"fire", "burn", "flames", "smoke damage"
],
}
REQUIRED_DOCS: Dict[ClaimType, List[str]] = {
ClaimType.AUTO_COLLISION: [
"Police report or agreed statement form",
"Vehicle photos (4+ angles)",
"Driver's licence",
"Vehicle registration certificate",
],
ClaimType.AUTO_THEFT: [
"Police report (within 48h)",
"Proof of ownership",
"Both sets of keys",
],
ClaimType.PROPERTY_WATER: [
"Damage photos",
"Plumber report (if available)",
"Repair estimate",
],
}
def triage(self, fnol: FNOLSubmission, fraud_score: float = 0.0) -> FNOLAssessment:
claim_type = self._classify_type(fnol.incident_description)
complexity = self._assess_complexity(fnol, fraud_score)
severity = self._estimate_severity(fnol, claim_type)
auto_eligible = self._is_auto_settlement_eligible(fnol, complexity, fraud_score)
workflow = self._assign_workflow(claim_type, complexity)
priority = self._calculate_priority(fnol, complexity, fraud_score)
return FNOLAssessment(
claim_id=fnol.claim_id,
claim_type=claim_type,
complexity=complexity,
estimated_severity=severity,
auto_settlement_eligible=auto_eligible,
required_documents=self.REQUIRED_DOCS.get(claim_type, [
"Proof of identity",
"Damage photos",
"Repair estimate",
]),
assigned_workflow=workflow,
fraud_risk_score=fraud_score,
priority=priority,
routing_notes=self._build_notes(fnol, complexity, fraud_score),
)
def _classify_type(self, description: str) -> ClaimType:
desc_lower = description.lower()
scores: Dict[ClaimType, int] = {}
for claim_type, keywords in self.CLAIM_TYPE_KEYWORDS.items():
score = sum(1 for kw in keywords if kw in desc_lower)
if score > 0:
scores[claim_type] = score
return max(scores, key=lambda k: scores[k]) if scores else ClaimType.UNKNOWN
def _assess_complexity(self, fnol: FNOLSubmission, fraud_score: float) -> ClaimComplexity:
if fnol.injuries_reported:
return ClaimComplexity.LITIGIOUS
if fraud_score > 0.7:
return ClaimComplexity.COMPLEX
if fnol.third_parties_involved and not fnol.police_report_available:
return ClaimComplexity.COMPLEX
if fnol.third_parties_involved or fraud_score > 0.4:
return ClaimComplexity.STANDARD
return ClaimComplexity.SIMPLE
def _estimate_severity(self, fnol: FNOLSubmission, claim_type: ClaimType) -> str:
if fnol.injuries_reported:
return "high"
if claim_type == ClaimType.AUTO_THEFT:
return "high"
if fnol.third_parties_involved:
return "medium"
return "low"
def _is_auto_settlement_eligible(
self, fnol: FNOLSubmission, complexity: ClaimComplexity, fraud_score: float
) -> bool:
if complexity not in [ClaimComplexity.SIMPLE, ClaimComplexity.STANDARD]:
return False
if fraud_score > 0.3:
return False
if fnol.injuries_reported or fnol.third_parties_involved:
return False
return len(fnol.photos) >= 2
def _assign_workflow(self, claim_type: ClaimType, complexity: ClaimComplexity) -> str:
workflow_map = {
(ClaimType.AUTO_COLLISION, ClaimComplexity.SIMPLE): "auto_collision_fast_track",
(ClaimType.AUTO_COLLISION, ClaimComplexity.STANDARD): "auto_collision_standard",
(ClaimType.AUTO_COLLISION, ClaimComplexity.COMPLEX): "auto_collision_manual",
(ClaimType.AUTO_THEFT, ClaimComplexity.SIMPLE): "auto_theft_standard",
(ClaimType.AUTO_WINDSHIELD, ClaimComplexity.SIMPLE): "windshield_auto",
}
return workflow_map.get(
(claim_type, complexity),
f"generic_{complexity.value}_workflow"
)
def _calculate_priority(
self, fnol: FNOLSubmission, complexity: ClaimComplexity, fraud_score: float
) -> int:
if fnol.injuries_reported:
return 1
if complexity == ClaimComplexity.LITIGIOUS:
return 1
if fraud_score > 0.7:
return 2
if complexity == ClaimComplexity.COMPLEX:
return 3
return 5
def _build_notes(
self, fnol: FNOLSubmission, complexity: ClaimComplexity, fraud_score: float
) -> str:
notes = []
if fnol.injuries_reported:
notes.append("ALERT: personal injuries reported - mandatory legal/medical escalation")
if fraud_score > 0.5:
notes.append(f"High fraud score ({fraud_score:.2f}) - SIU review recommended")
if not fnol.photos:
notes.append("No photos attached - request from customer before proceeding")
return "; ".join(notes) if notes else "No special routing notes"
Computer Vision for Vehicle Damage Assessment
Automated damage estimation is the highest-impact AI component in claims automation. The customer photographs the damaged vehicle with their smartphone, and the system analyzes the images to identify damaged parts, estimate the type of intervention needed (repair vs replacement), and calculate a cost estimate based on updated pricing databases.
The most widely used models in the industry combine object detection (identifying damaged parts) with damage severity classification (classifying the damage extent from minor to total loss). Players like Tractable, the market leader, have demonstrated that these systems can match or exceed the accuracy of an experienced human appraiser.
import torch
import torchvision.transforms as T
from torchvision.models import resnet50, ResNet50_Weights
import numpy as np
from PIL import Image
from typing import Dict, List, Tuple
from dataclasses import dataclass
import io
@dataclass
class DamageRegion:
"""A damage region identified in an image."""
part_name: str
damage_type: str # scratch, dent, crack, broken
severity: str # minor, moderate, severe, total_loss
confidence: float
repair_vs_replace: str # "repair" or "replace"
estimated_cost_gbp: float
@dataclass
class VehicleDamageAssessment:
"""Complete result of vehicle damage image analysis."""
claim_id: str
images_analyzed: int
damage_regions: List[DamageRegion]
total_estimated_cost: float
total_loss_likelihood: float
settlement_recommendation: str
confidence_overall: float
requires_physical_inspection: bool
assessment_notes: str
class VehicleDamageClassifier:
"""
Vehicle damage classifier based on transfer learning.
Architecture: ResNet-50 fine-tuned on proprietary vehicle damage dataset.
Output: per-part classification + severity + damage type.
In production: consider specialized vendors (Tractable, Mitchell)
or models trained on your own claims dataset.
"""
VEHICLE_PARTS = [
"bumper_front", "bumper_rear",
"hood", "trunk",
"door_front_left", "door_front_right",
"door_rear_left", "door_rear_right",
"fender_front_left", "fender_front_right",
"windshield_front", "windshield_rear",
"headlight_left", "headlight_right",
"mirror_left", "mirror_right",
]
# Reference repair costs (GBP, 2025 estimates)
REPAIR_COST_TABLE: Dict[str, Dict[str, float]] = {
"bumper_front": {"minor": 180, "moderate": 550, "severe": 1100, "replace": 800},
"bumper_rear": {"minor": 180, "moderate": 500, "severe": 1000, "replace": 750},
"hood": {"minor": 280, "moderate": 750, "severe": 1400, "replace": 1100},
"door_front_left": {"minor": 220, "moderate": 650, "severe": 1300, "replace": 1000},
"door_front_right": {"minor": 220, "moderate": 650, "severe": 1300, "replace": 1000},
"windshield_front": {"minor": 0, "moderate": 320, "severe": 550, "replace": 420},
"headlight_left": {"minor": 70, "moderate": 180, "severe": 380, "replace": 320},
"headlight_right": {"minor": 70, "moderate": 180, "severe": 380, "replace": 320},
}
DEFAULT_COST = {"minor": 130, "moderate": 400, "severe": 850, "replace": 650}
def __init__(self, model_path: str = "damage_classifier.pt") -> None:
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model = self._load_model(model_path)
self.transform = T.Compose([
T.Resize((224, 224)),
T.ToTensor(),
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def _load_model(self, model_path: str) -> torch.nn.Module:
try:
model = resnet50(weights=None)
num_classes = len(self.VEHICLE_PARTS) * 4
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
state = torch.load(model_path, map_location=self.device)
model.load_state_dict(state)
except FileNotFoundError:
print("Fine-tuned model not found, using pretrained ResNet50 (demo only)")
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.eval()
return model.to(self.device)
def analyze_image(self, image_bytes: bytes) -> List[Tuple[str, str, float]]:
"""Analyze a single image. Returns: list of (part_name, severity, confidence)."""
image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
tensor = self.transform(image).unsqueeze(0).to(self.device)
with torch.no_grad():
logits = self.model(tensor)
            # Multi-label problem: several parts can be damaged at once, so use
            # a per-class sigmoid rather than softmax (softmax forces classes to
            # compete and would rarely let more than one exceed the 0.30 threshold).
            probs = torch.sigmoid(logits).cpu().numpy()[0]
results = []
severity_levels = ["minor", "moderate", "severe", "total_loss"]
for i, part in enumerate(self.VEHICLE_PARTS):
for j, sev in enumerate(severity_levels):
idx = i * 4 + j
if idx < len(probs) and probs[idx] > 0.30:
results.append((part, sev, float(probs[idx])))
return sorted(results, key=lambda x: x[2], reverse=True)
def estimate_repair_cost(self, part: str, severity: str) -> Tuple[float, str]:
cost_table = self.REPAIR_COST_TABLE.get(part, self.DEFAULT_COST)
if severity in ["severe", "total_loss"]:
replace_cost = cost_table.get("replace", 650)
repair_cost = cost_table.get("severe", 850)
if replace_cost < repair_cost * 0.8:
return replace_cost, "replace"
return repair_cost, "repair"
return cost_table.get(severity, 130), "repair"
def assess_multiple_images(
self, claim_id: str, image_bytes_list: List[bytes]
) -> VehicleDamageAssessment:
"""Complete assessment from multiple vehicle photos."""
all_detections: Dict[str, List[Tuple[str, float]]] = {}
for img_bytes in image_bytes_list:
for part, severity, confidence in self.analyze_image(img_bytes):
all_detections.setdefault(part, []).append((severity, confidence))
severity_order = {"minor": 0, "moderate": 1, "severe": 2, "total_loss": 3}
damage_regions: List[DamageRegion] = []
total_cost = 0.0
for part, sev_confs in all_detections.items():
max_sev = max(sev_confs, key=lambda x: severity_order.get(x[0], 0))
severity, _ = max_sev
avg_confidence = np.mean([c for _, c in sev_confs])
cost, repair_replace = self.estimate_repair_cost(part, severity)
total_cost += cost
damage_regions.append(DamageRegion(
part_name=part,
damage_type="structural" if severity in ["severe", "total_loss"] else "cosmetic",
severity=severity,
confidence=round(float(avg_confidence), 3),
repair_vs_replace=repair_replace,
estimated_cost_gbp=cost,
))
total_loss_prob = self._total_loss_probability(damage_regions, total_cost)
needs_inspection = (
total_loss_prob > 0.5 or total_cost > 8000 or
any(d.severity == "severe" and "windshield" in d.part_name for d in damage_regions)
)
return VehicleDamageAssessment(
claim_id=claim_id,
images_analyzed=len(image_bytes_list),
damage_regions=damage_regions,
total_estimated_cost=round(total_cost, 2),
total_loss_likelihood=round(total_loss_prob, 3),
settlement_recommendation=self._settlement_recommendation(total_cost, total_loss_prob),
confidence_overall=round(
float(np.mean([d.confidence for d in damage_regions])) if damage_regions else 0.0, 3
),
requires_physical_inspection=needs_inspection,
assessment_notes=self._build_notes(damage_regions, total_loss_prob),
)
def _total_loss_probability(self, regions: List[DamageRegion], cost: float) -> float:
if cost > 15000:
return 0.95
if cost > 10000:
return 0.70
severe_count = sum(1 for r in regions if r.severity in ["severe", "total_loss"])
if severe_count >= 4:
return 0.80
if severe_count >= 2:
return 0.40
return max(0.0, (cost - 5000) / 10000) if cost > 5000 else 0.05
def _settlement_recommendation(self, cost: float, total_loss_prob: float) -> str:
if total_loss_prob > 0.7:
return "TOTAL_LOSS_SETTLEMENT"
if cost > 8000:
return "HIGH_VALUE_REPAIR_AUTHORIZATION"
if cost > 3000:
return "STANDARD_REPAIR_AUTHORIZATION"
return "FAST_TRACK_SETTLEMENT"
def _build_notes(self, regions: List[DamageRegion], total_loss_prob: float) -> str:
notes = []
if total_loss_prob > 0.5:
notes.append("High total loss likelihood - verify vehicle market value")
severe = [r.part_name for r in regions if r.severity == "severe"]
if severe:
notes.append(f"Severe damage detected on: {', '.join(severe)}")
return "; ".join(notes) if notes else "Assessment completed without anomalies"
NLP for Document Information Extraction
Every claim generates dozens of documents: police reports, medical reports, repair estimates, witness statements, agreed statement forms. Manual processing of these documents is slow (1-3 days) and error-prone. Modern NLP systems, combined with advanced OCR, automatically extract structured information with 90-96% accuracy.
import spacy
import re
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
from datetime import datetime, date
from enum import Enum
class DocumentType(str, Enum):
POLICE_REPORT = "police_report"
MEDICAL_REPORT = "medical_report"
REPAIR_ESTIMATE = "repair_estimate"
AGREED_STATEMENT = "agreed_statement"
WITNESS_STATEMENT = "witness_statement"
INVOICE = "invoice"
UNKNOWN = "unknown"
@dataclass
class ExtractedEntity:
entity_type: str
value: str
confidence: float
source_text: str
position: Tuple[int, int]
@dataclass
class DocumentExtraction:
document_type: DocumentType
document_date: Optional[date]
entities: List[ExtractedEntity]
structured_data: Dict
extraction_confidence: float
raw_text: str
class InsuranceDocumentExtractor:
"""
NLP extractor for insurance documents.
Combines spaCy NER with domain-specific regex patterns.
"""
PATTERNS: Dict[str, str] = {
"uk_reg": r"\b[A-Z]{2}\d{2}\s?[A-Z]{3}\b|\b[A-Z]\d{1,3}[A-Z]{1,3}\b",
"policy_number": r"(?:policy|pol\.?)\s*(?:no\.?|number)?\s*[:\.]?\s*([A-Z0-9/-]{6,20})",
"ni_number": r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b",
"gbp_amount": r"(?:GBP|£)\s*[\d,.]{1,12}|[\d,.]{1,12}\s*(?:GBP|£)",
"date_uk": r"\b\d{1,2}[/\-\.]\d{1,2}[/\-\.]\d{2,4}\b",
"phone_uk": r"(?:0|\+44)\s*\d{2,5}[\s\-]?\d{3,8}[\s\-]?\d{0,6}",
"postcode": r"\b[A-Z]{1,2}\d{1,2}[A-Z]?\s?\d[A-Z]{2}\b",
"email": r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,6}",
}
CLASSIFICATION_RULES = [
(DocumentType.POLICE_REPORT, ["incident report", "police", "constabulary", "officer", "ref no"]),
(DocumentType.MEDICAL_REPORT, ["diagnosis", "prognosis", "injury", "hospital", "physician", "A&E"]),
(DocumentType.AGREED_STATEMENT, ["agreed statement", "mutual declaration", "no fault"]),
(DocumentType.REPAIR_ESTIMATE, ["estimate", "bodyshop", "garage", "parts", "labour cost"]),
(DocumentType.INVOICE, ["invoice", "total amount", "VAT", "payment due"]),
]
def __init__(self, spacy_model: str = "en_core_web_lg") -> None:
try:
self.nlp = spacy.load(spacy_model)
except OSError:
print(f"spaCy model '{spacy_model}' not found.")
print(f"Install with: python -m spacy download {spacy_model}")
self.nlp = None
def classify_document(self, text: str) -> DocumentType:
text_lower = text.lower()
scores: Dict[DocumentType, int] = {}
for doc_type, keywords in self.CLASSIFICATION_RULES:
score = sum(1 for kw in keywords if kw in text_lower)
if score > 0:
scores[doc_type] = score
return max(scores, key=lambda k: scores[k]) if scores else DocumentType.UNKNOWN
def extract(self, raw_text: str) -> DocumentExtraction:
doc_type = self.classify_document(raw_text)
entities = self._extract_entities(raw_text)
structured = self._build_structured_data(entities, doc_type)
confidence = self._calculate_confidence(entities, doc_type)
doc_date = self._extract_date(entities)
return DocumentExtraction(
document_type=doc_type,
document_date=doc_date,
entities=entities,
structured_data=structured,
extraction_confidence=confidence,
raw_text=raw_text,
)
def _extract_entities(self, text: str) -> List[ExtractedEntity]:
entities: List[ExtractedEntity] = []
for entity_type, pattern in self.PATTERNS.items():
for match in re.finditer(pattern, text, re.IGNORECASE):
entities.append(ExtractedEntity(
entity_type=entity_type,
value=match.group().strip(),
confidence=0.85,
source_text=text[max(0, match.start()-20):match.end()+20],
position=(match.start(), match.end()),
))
if self.nlp:
doc = self.nlp(text[:100000])
for ent in doc.ents:
if ent.label_ in ["PERSON", "ORG", "GPE", "DATE", "MONEY"]:
entities.append(ExtractedEntity(
entity_type=f"spacy_{ent.label_.lower()}",
value=ent.text.strip(),
confidence=0.75,
source_text=text[max(0, ent.start_char-20):ent.end_char+20],
position=(ent.start_char, ent.end_char),
))
return entities
def _build_structured_data(self, entities: List[ExtractedEntity], doc_type: DocumentType) -> Dict:
structured: Dict = {"document_type": doc_type.value}
by_type: Dict[str, List[str]] = {}
for ent in entities:
by_type.setdefault(ent.entity_type, []).append(ent.value)
if "uk_reg" in by_type:
structured["vehicle_registrations"] = list(set(by_type["uk_reg"]))
if "policy_number" in by_type:
structured["policy_numbers"] = list(set(by_type["policy_number"]))
if "gbp_amount" in by_type:
amounts = []
for s in by_type["gbp_amount"]:
clean = re.sub(r"[^\d.]", "", s.replace(",", ""))
try:
amounts.append(float(clean))
except ValueError:
pass
structured["amounts_gbp"] = sorted(amounts)
structured["max_amount_gbp"] = max(amounts) if amounts else 0
if "spacy_person" in by_type:
structured["persons_mentioned"] = list(set(by_type["spacy_person"]))
return structured
def _calculate_confidence(self, entities: List[ExtractedEntity], doc_type: DocumentType) -> float:
if not entities:
return 0.1
avg = float(sum(e.confidence for e in entities) / len(entities))
return round(min(avg * (0.7 if doc_type == DocumentType.UNKNOWN else 1.0), 1.0), 3)
def _extract_date(self, entities: List[ExtractedEntity]) -> Optional[date]:
date_entities = [e for e in entities if e.entity_type == "date_uk"]
for de in date_entities:
for fmt in ["%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%d/%m/%y"]:
try:
return datetime.strptime(de.value, fmt).date()
except ValueError:
continue
return None
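Even without the spaCy model installed, the regex layer alone recovers the highest-value fields. The snippet below runs simplified variants of two of the class's patterns (the amount pattern is tightened here to require at least one digit) against a made-up police-report fragment:

```python
import re

# Simplified variants of the InsuranceDocumentExtractor patterns above.
PATTERNS = {
    "uk_reg": r"\b[A-Z]{2}\d{2}\s?[A-Z]{3}\b",          # current-style UK plate
    "gbp_amount": r"£\s*\d[\d,]*(?:\.\d{2})?",          # £1,250.00 etc.
}

# Hypothetical fragment of an OCR'd police report.
sample = (
    "Incident ref 4417: vehicle AB12 CDE collided at the junction. "
    "Initial bodyshop estimate £1,250.00 excluding VAT."
)

found = {name: re.findall(pat, sample) for name, pat in PATTERNS.items()}
```

A practical consequence: fields captured by domain regexes get the higher 0.85 confidence in `_extract_entities`, while the broader spaCy entities sit at 0.75, so regex hits dominate when both fire on the same span.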
Claims Automation KPIs and Monitoring
Monitoring a claims automation system requires metrics that go beyond standard ML metrics. Business and operational metrics are equally critical to ensure the system is genuinely improving customer experience and operational profitability.
Claims Automation KPIs
| Metric | Definition | Target |
|---|---|---|
| Automation Rate | % claims closed without human intervention | > 50% |
| Touchless Claims Rate | % auto claims without physical appraisal | > 70% (standard motor) |
| Avg Settlement Time | Time from FNOL to payment for motor claims | < 24h (fast track) |
| Document Extraction Accuracy | % fields extracted correctly | > 90% |
| Damage Estimate Deviation | % variance between AI estimate and final cost | < 15% |
| False Positive Fraud Rate | % legitimate claims flagged as fraudulent | < 2% |
| Customer NPS (post-claim) | Net Promoter Score after claim settlement | > +30 (vs +10 legacy) |
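The first two rows of the table fall out of the claims ledger directly. A minimal sketch of the computation, over a hypothetical set of closed claims (each record holds a human-touched flag plus FNOL and payment timestamps):

```python
from datetime import datetime, timedelta

# Hypothetical closed-claims sample: (human_touched, fnol_time, paid_time).
claims = [
    (False, datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 14, 0)),
    (False, datetime(2025, 3, 2, 10, 0), datetime(2025, 3, 2, 22, 0)),
    (True,  datetime(2025, 3, 3, 8, 0), datetime(2025, 3, 8, 8, 0)),
    (True,  datetime(2025, 3, 4, 8, 0), datetime(2025, 3, 6, 8, 0)),
]

# Automation rate: share of claims closed without human intervention.
automation_rate = sum(1 for touched, *_ in claims if not touched) / len(claims)

# Average settlement time for the touchless (fast-track) subset only:
# mixing in manually handled claims would mask fast-track performance.
touchless = [paid - fnol for touched, fnol, paid in claims if not touched]
avg_settlement = sum(touchless, timedelta()) / len(touchless)
```

Segmenting settlement time by workflow matters in practice: a single litigious claim open for months would otherwise swamp the fast-track average the SLA is meant to track.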
Best Practices and Anti-patterns
Best Practices for Claims Automation
- Start with windshield claims: the simplest full-automation use case — clearly visible damage, standardized costs, no third party involved
- Mandatory human-in-the-loop for personal injuries: never fully automate claims with bodily injury; escalation to a human adjuster is always required
- Always require minimum 4 photos: front, rear, left side, right side; plus odometer and registration plate; insufficient photos dramatically reduce accuracy
- Integrate fraud scoring upstream: the fraud check must happen before damage estimation, not after, to avoid processing fraudulent claims
- Immutable audit trail: every system decision (automated or manual) must be logged with timestamp, model version, and input values for dispute resolution
Anti-patterns to Avoid
- Automation without safety nets: always implement a minimum confidence threshold; below that threshold, the claim must automatically go to human review
- Estimates without geographic calibration: repair costs vary enormously by region; a national average estimate can be wildly inaccurate locally
- Ignoring photo quality: blurry, poorly lit, or partial photos strongly degrade accuracy; implement a quality check before processing
- Premature settlement offers: offering a settlement before the customer has assessed the full extent of damage generates costly disputes later
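The first anti-pattern, automation without safety nets, has a one-line remedy worth making explicit. A minimal sketch of a confidence floor (the 0.80 threshold is an illustrative assumption to be calibrated against your own deviation metrics):

```python
def route_assessment(confidence: float, threshold: float = 0.80) -> str:
    """Safety net: below the confidence floor, the claim is never
    auto-settled, no matter how attractive the automated estimate looks."""
    return "auto_settle" if confidence >= threshold else "human_review"


high = route_assessment(0.91)   # confident assessment -> proceed
low = route_assessment(0.62)    # ambiguous assessment -> adjuster queue
```

The threshold should be tuned, not guessed: plot the damage-estimate deviation KPI against model confidence on historical claims and pick the floor where deviation stays inside your tolerance.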
Conclusions and Next Steps
Claims automation with Computer Vision and NLP delivers some of the highest ROI of any AI application in insurance: it reduces operational costs by 40-60%, improves customer satisfaction, and accelerates settlement times from weeks to hours for standard cases.
The key to success is a gradual approach: start with the simplest cases (windshield, minor cosmetic damage), accurately measure system accuracy against human appraisers, and progressively expand automation to more complex cases while always maintaining a robust human escalation mechanism.
The next article in this series dives into Insurance Fraud Detection: how to combine graph analytics to identify fraudulent networks and behavioral signals to detect anomalous claim patterns.
InsurTech Engineering Series
- 01 - Insurance Domain for Developers: Products, Actors and Data Model
- 02 - Cloud-Native Policy Management: API-First Architecture
- 03 - Telematics Pipeline: Processing UBI Data at Scale
- 04 - AI Underwriting: Feature Engineering and Risk Scoring
- 05 - Claims Automation: Computer Vision and NLP (this article)
- 06 - Fraud Detection: Graph Analytics and Behavioral Signals
- 07 - ACORD Standards and Insurance API Integration
- 08 - Compliance Engineering: Solvency II and IFRS 17