Named Entity Recognition: Extracting Information from Text
Every day, NLP systems automatically extract structured information from billions of documents: news articles, contracts, emails, medical records, social media posts. The engine driving this extraction is Named Entity Recognition (NER) — a task that identifies and classifies named entities in text: persons, organizations, locations, dates, monetary values, and much more.
NER is the first step in many information extraction pipelines: without knowing who does what, where and when, we cannot build knowledge graphs, feed RAG systems, automate contract analysis, or parse financial news. In this article we build NER systems from a spaCy baseline to BERT fine-tuning, with specific attention to Italian-language text.
What You Will Learn
- What NER is and the main entity categories (PER, ORG, LOC, DATE, MONEY...)
- The BIO (Beginning-Inside-Outside) format for token annotation
- NER with spaCy: pre-trained models and customization
- Fine-tuning BERT for NER with HuggingFace Transformers
- Metrics: span-level F1, precision, recall with seqeval
- Handling WordPiece tokenization for label alignment
- NER for Italian with spaCy it_core_news and Italian BERT models
- NER on long documents: sliding window and post-processing
- Advanced architectures: CRF layer, RoBERTa, DeBERTa for NER
- Production pipeline, visualization and end-to-end case study
1. What is Named Entity Recognition
NER is a token classification task: for each token in the text, the model must predict whether it is part of a named entity and what type it belongs to. Unlike sentence classification (which produces one output per sentence), NER produces one output per token — this makes it more complex both architecturally and in post-processing.
NER Example
Input: "Elon Musk founded Tesla in 2003 in San Carlos, California."
Annotated output:
- Elon Musk → PER (person)
- Tesla → ORG (organization)
- 2003 → DATE
- San Carlos → LOC (location)
- California → LOC (location)
1.1 The BIO Format
NER annotation uses the BIO (Beginning-Inside-Outside) format:
- B-TYPE: first token of an entity of type TYPE
- I-TYPE: token inside an entity of type TYPE
- O: token outside any named entity
# BIO format example
sentence = "Elon Musk founded Tesla in San Carlos in 2003"
bio_labels = [
("Elon", "B-PER"), # beginning of person
("Musk", "I-PER"), # inside person
("founded", "O"),
("Tesla", "B-ORG"), # beginning of organization
("in", "O"),
("San", "B-LOC"), # beginning of location
("Carlos", "I-LOC"), # inside location
("in", "O"),
("2003", "B-DATE"), # date
]
# BIOES format (extended): adds S-TYPE for single-token entities
# and E-TYPE for the final token of multi-token entities.
# Here "Tesla" would be tagged S-ORG instead of B-ORG.
# Plain BIO remains the most common scheme in modern NER datasets.
# Label set for CoNLL-2003 (most widely used NER benchmark):
CONLL_LABELS = [
'O',
'B-PER', 'I-PER', # persons
'B-ORG', 'I-ORG', # organizations
'B-LOC', 'I-LOC', # locations
'B-MISC', 'I-MISC', # miscellaneous
]
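Going from BIO tags back to entity spans is a small but error-prone step. A minimal decoder, written here in pure Python with no library assumed, makes the scheme concrete:

```python
def bio_to_spans(tokens, labels):
    """Decode (token, BIO-label) pairs into (entity_text, type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_tokens:                      # close the previous entity
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)            # continue the current entity
        else:                                       # "O" or an inconsistent I- tag
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:                              # flush a trailing entity
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["Elon", "Musk", "founded", "Tesla", "in", "San", "Carlos", "in", "2003"]
labels = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "I-LOC", "O", "B-DATE"]
print(bio_to_spans(tokens, labels))
# [('Elon Musk', 'PER'), ('Tesla', 'ORG'), ('San Carlos', 'LOC'), ('2003', 'DATE')]
```

Note that an I- tag whose type does not match the open entity is treated as O here; stricter decoders (and seqeval) handle this edge case differently.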
1.2 NER Benchmarks and Datasets
Standard NER Datasets for Benchmarking
| Dataset | Language | Entities | Train size | Best F1 |
|---|---|---|---|---|
| CoNLL-2003 | EN | PER, ORG, LOC, MISC | 14,041 sent | ~94% (DeBERTa) |
| OntoNotes 5.0 | EN | 18 types | ~75K sent | ~92% |
| Evalita 2009 NER | IT | PER, ORG, LOC, GPE | ~10K sent | ~88% |
| WikiNEuRal IT | IT | PER, ORG, LOC, MISC | ~40K sent | ~90% |
| I2B2 2014 | EN (medical) | PHI (de-identification) | 27K sent | ~97% |
2. NER with spaCy
spaCy offers pre-trained NER models for many languages, including Italian. It is the fastest starting point for a production NER system.
2.1 Out-of-the-Box NER with spaCy
import spacy
from spacy import displacy
# Load Italian model with NER
# python -m spacy download it_core_news_lg
nlp_it = spacy.load("it_core_news_lg")
# English model for comparison
# python -m spacy download en_core_web_trf
nlp_en = spacy.load("en_core_web_trf") # Transformer-based, more accurate
# NER on Italian text
text_it = """
Il presidente Sergio Mattarella ha incontrato ieri a Roma il CEO di Stellantis
Carlos Tavares per discutere del piano industriale 2025-2030.
L'incontro è avvenuto al Quirinale e ha riguardato investimenti per 5 miliardi di euro.
"""
doc_it = nlp_it(text_it)
print("=== Italian NER ===")
for ent in doc_it.ents:
print(f" '{ent.text}' -> {ent.label_} ({spacy.explain(ent.label_)})")
# NER on English text
text_en = "Apple CEO Tim Cook announced a new $3 billion investment in Austin, Texas on Monday."
doc_en = nlp_en(text_en)
print("\n=== English NER ===")
for ent in doc_en.ents:
print(f" '{ent.text}' -> {ent.label_}")
# HTML visualization (useful in Jupyter)
html = displacy.render(doc_en, style="ent", page=False)
with open("ner_visualization.html", "w") as f:
f.write(html)
2.2 spaCy Entity Categories for Italian
| Label | Type | Example |
|---|---|---|
| PER | Person | Mario Draghi, Sophia Loren |
| ORG | Organization | ENI, Juventus, Banca d'Italia |
| LOC | Generic location | Alpi, Mar Mediterraneo |
| GPE | Geopolitical entity | Italia, Roma, Lombardia |
| DATE | Date/period | 3 marzo, estate 2024 |
| MONEY | Currency | 5 miliardi di euro |
| MISC | Miscellaneous | Coppa del Mondo, COVID-19 |
2.3 Training a Custom spaCy NER Model
import spacy
from spacy.training import Example
import random
# Annotated training data (with character offsets)
TRAIN_DATA = [
(
"La startup Satispay ha raccolto 320 milioni dalla BAFIN.",
{"entities": [(11, 19, "ORG"), (32, 43, "MONEY"), (50, 55, "ORG")]}
),
(
"Andrea Pirlo allena la Juve a Torino.",
{"entities": [(0, 12, "PER"), (23, 27, "ORG"), (30, 36, "LOC")]}
),
(
"Ferrari ha presentato la nuova SF-23 al Gran Premio di Monza.",
{"entities": [(0, 7, "ORG"), (31, 36, "MISC"), (40, 60, "MISC")]}
),
]
def train_custom_ner(train_data, n_iter=30):
"""Train a custom spaCy NER component."""
nlp = spacy.blank("it")
ner = nlp.add_pipe("ner")
# Add labels
for _, annotations in train_data:
for _, _, label in annotations.get("entities", []):
ner.add_label(label)
# Training loop
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    with nlp.select_pipes(disable=other_pipes):  # spaCy 3 replacement for disable_pipes
        optimizer = nlp.initialize()  # spaCy 3 replacement for begin_training()
for i in range(n_iter):
random.shuffle(train_data)
losses = {}
for text, annotations in train_data:
doc = nlp.make_doc(text)
example = Example.from_dict(doc, annotations)
nlp.update([example], sgd=optimizer, losses=losses)
if (i + 1) % 10 == 0:
print(f"Iteration {i+1}: losses = {losses}")
return nlp
custom_nlp = train_custom_ner(TRAIN_DATA)
# Test
test_text = "Enel ha investito 2 miliardi a Milano."
doc = custom_nlp(test_text)
for ent in doc.ents:
print(f" '{ent.text}' -> {ent.label_}")
3. NER with BERT and HuggingFace Transformers
Transformer models outperform spaCy on most NER benchmarks, especially on complex text or when entities are ambiguous. They require more data and training time, but deliver significantly higher precision and recall on challenging entity types.
3.1 CoNLL-2003 Dataset
from datasets import load_dataset
# CoNLL-2003: standard English NER benchmark
dataset = load_dataset("conll2003")  # recent datasets versions may require the "eriktks/conll2003" mirror
print(dataset)
# train: 14,041 | validation: 3,250 | test: 3,453
# Dataset structure
example = dataset['train'][0]
print("Tokens:", example['tokens'])
print("NER tags:", example['ner_tags'])
# Tokens: ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
# NER tags: [3, 0, 7, 0, 0, 0, 7, 0, 0]
# (3=B-ORG, 0=O, 7=B-MISC)
# ID to label mapping
label_names = dataset['train'].features['ner_tags'].feature.names
print("Labels:", label_names)
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
3.2 The Token-Label Alignment Problem
BERT uses WordPiece tokenization: a word can be split into multiple subtokens. We must align word-level NER labels with BERT subtokens. This is one of the transformer-specific challenges in NER that does not exist with spaCy.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# Example: word "Johannesburg" and its labels
words = ["Johannesburg", "is", "the", "largest", "city"]
word_labels = ["B-LOC", "O", "O", "O", "O"]
# WordPiece tokenization
tokenized = tokenizer(
words,
is_split_into_words=True, # input already word-tokenized
return_offsets_mapping=True
)
print("Subword tokens:", tokenizer.convert_ids_to_tokens(tokenized['input_ids']))
# ['[CLS]', 'Johann', '##es', '##burg', 'is', 'the', 'largest', 'city', '[SEP]']
# Label alignment (strategy: -100 for non-first subtokens)
def align_labels(tokenized, word_labels, label2id):
word_ids = tokenized.word_ids()
label_ids = []
prev_word_id = None
for word_id in word_ids:
if word_id is None:
# Special token [CLS] or [SEP]
label_ids.append(-100)
elif word_id != prev_word_id:
# First subtoken of word: use the real label
label_ids.append(label2id[word_labels[word_id]])
else:
# Subsequent subtokens: -100 (ignored in loss)
label_ids.append(-100)
prev_word_id = word_id
return label_ids
label2id = {"O": 0, "B-LOC": 1, "I-LOC": 2, "B-PER": 3, "I-PER": 4,
"B-ORG": 5, "I-ORG": 6, "B-MISC": 7, "I-MISC": 8}
aligned = align_labels(tokenized, word_labels, label2id)
tokens = tokenizer.convert_ids_to_tokens(tokenized['input_ids'])
for tok, lab in zip(tokens, aligned):
print(f" {tok:15s}: {lab}")
# [CLS] : -100
# Johann : 1 (B-LOC)
# ##es : -100 (ignored)
# ##burg : -100 (ignored)
# is : 0 (O)
# ...
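An alternative to masking continuation subtokens with -100 is to propagate the word label to every subtoken, converting B- into I- for the continuation pieces (this is what the `label_all_tokens` option does in HuggingFace example scripts). A sketch over a precomputed word_ids list, with hypothetical values so no tokenizer is needed:

```python
def propagate_labels(word_ids, word_labels):
    """Give every subtoken a label; continuation pieces get the I- variant."""
    label_ids = []
    prev_word_id = None
    for word_id in word_ids:
        if word_id is None:                 # special tokens [CLS] / [SEP]
            label_ids.append(None)
        elif word_id != prev_word_id:       # first subtoken: keep the word label
            label_ids.append(word_labels[word_id])
        else:                               # continuation subtoken: B-X becomes I-X
            label = word_labels[word_id]
            label_ids.append("I-" + label[2:] if label.startswith("B-") else label)
        prev_word_id = word_id
    return label_ids

# word_ids as tokenized.word_ids() would return for
# ['[CLS]', 'Johann', '##es', '##burg', 'is', '[SEP]']
word_ids = [None, 0, 0, 0, 1, None]
print(propagate_labels(word_ids, ["B-LOC", "O"]))
# [None, 'B-LOC', 'I-LOC', 'I-LOC', 'O', None]
```

With this strategy every subtoken contributes to the loss; empirically the two approaches give very similar F1, so the -100 masking shown above remains the more common default.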
3.3 Complete BERT Fine-tuning for NER
from transformers import (
AutoModelForTokenClassification,
AutoTokenizer,
TrainingArguments,
Trainer,
DataCollatorForTokenClassification
)
from datasets import load_dataset
import evaluate
import numpy as np
# Configuration
MODEL_NAME = "bert-base-cased"
DATASET_NAME = "conll2003"
MAX_LENGTH = 128
dataset = load_dataset(DATASET_NAME)
label_names = dataset['train'].features['ner_tags'].feature.names
num_labels = len(label_names)
id2label = {i: l for i, l in enumerate(label_names)}
label2id = {l: i for i, l in enumerate(label_names)}
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
def tokenize_and_align_labels(examples):
tokenized = tokenizer(
examples["tokens"],
truncation=True,
max_length=MAX_LENGTH,
is_split_into_words=True
)
all_labels = []
for i, labels in enumerate(examples["ner_tags"]):
word_ids = tokenized.word_ids(batch_index=i)
label_ids = []
prev_word_id = None
for word_id in word_ids:
if word_id is None:
label_ids.append(-100)
elif word_id != prev_word_id:
label_ids.append(labels[word_id])
else:
label_ids.append(-100)
prev_word_id = word_id
all_labels.append(label_ids)
tokenized["labels"] = all_labels
return tokenized
tokenized_datasets = dataset.map(
tokenize_and_align_labels,
batched=True,
remove_columns=dataset["train"].column_names
)
# Model
model = AutoModelForTokenClassification.from_pretrained(
MODEL_NAME,
num_labels=num_labels,
id2label=id2label,
label2id=label2id
)
# Data collator with dynamic padding for NER
data_collator = DataCollatorForTokenClassification(tokenizer)
# seqeval metrics for span-level NER evaluation
seqeval = evaluate.load("seqeval")
def compute_metrics(p):
predictions, labels = p
predictions = np.argmax(predictions, axis=2)
true_predictions = [
[label_names[p] for (p, l) in zip(pred, label) if l != -100]
for pred, label in zip(predictions, labels)
]
true_labels = [
[label_names[l] for (p, l) in zip(pred, label) if l != -100]
for pred, label in zip(predictions, labels)
]
results = seqeval.compute(predictions=true_predictions, references=true_labels)
return {
"precision": results["overall_precision"],
"recall": results["overall_recall"],
"f1": results["overall_f1"],
"accuracy": results["overall_accuracy"],
}
# Training
args = TrainingArguments(
output_dir="./results/bert-ner-conll",
num_train_epochs=3,
per_device_train_batch_size=32,
per_device_eval_batch_size=64,
learning_rate=2e-5,
warmup_ratio=0.1,
weight_decay=0.01,
    eval_strategy="epoch",  # called evaluation_strategy in older transformers versions
save_strategy="epoch",
load_best_model_at_end=True,
metric_for_best_model="f1",
fp16=True,
report_to="none"
)
trainer = Trainer(
model=model,
args=args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["validation"],
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics
)
trainer.train()
# Expected F1 on CoNLL-2003 test: ~91-92% (BERT-base-cased)
# With RoBERTa-large: ~93-94%
4. Advanced NER Architectures
Beyond classic BERT fine-tuning, there are architectural variants that improve NER performance, particularly for capturing dependencies between BIO labels.
4.1 BERT + CRF Layer
A CRF (Conditional Random Field) applied on top of BERT imposes
structural constraints on label sequences: for example, an I-ORG
token cannot follow a B-PER. This reduces common sequence errors
in purely neural architectures.
# BERT + CRF with pytorch-crf
# pip install pytorch-crf
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel
from torchcrf import CRF
class BertCRFForNER(BertPreTrainedModel):
"""BERT fine-tuned with a CRF layer for NER."""
def __init__(self, config, num_labels):
super().__init__(config)
self.bert = BertModel(config)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.classifier = nn.Linear(config.hidden_size, num_labels)
self.crf = CRF(num_labels, batch_first=True)
self.init_weights()
def forward(self, input_ids, attention_mask, token_type_ids=None, labels=None):
outputs = self.bert(
input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids
)
sequence_output = self.dropout(outputs[0])
emissions = self.classifier(sequence_output) # (batch, seq_len, num_labels)
        if labels is not None:
            # Training: CRF negative log-likelihood.
            # Note: the CRF cannot handle -100 labels; map ignored positions
            # to the 'O' index (and rely on the mask) before calling it.
            loss = -self.crf(emissions, labels, mask=attention_mask.bool(), reduction='mean')
            return {'loss': loss, 'logits': emissions}
else:
# Inference: Viterbi decoding
predictions = self.crf.decode(emissions, mask=attention_mask.bool())
return {'predictions': predictions, 'logits': emissions}
# Advantages of CRF:
# + Guarantees valid BIO sequences (no I-X without B-X before)
# + Improves F1 by ~0.5-1.5 points on CoNLL
# Disadvantages:
# - Slower at inference (Viterbi decoding O(n * L^2))
# - More complex to implement
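The constraints the CRF enforces can be stated explicitly. As a small pure-Python sketch, independent of any CRF library, a validity check over BIO transitions shows exactly which label pairs a decoder should never emit:

```python
def is_valid_bio_transition(prev_label, label):
    """Return True if `label` may follow `prev_label` in a BIO sequence."""
    if label.startswith("I-"):
        # I-X is only valid right after B-X or I-X of the same type
        return prev_label in ("B-" + label[2:], "I-" + label[2:])
    return True  # O and B-X may follow anything

def is_valid_bio_sequence(labels):
    prev = "O"  # a sequence implicitly starts outside any entity
    for label in labels:
        if not is_valid_bio_transition(prev, label):
            return False
        prev = label
    return True

print(is_valid_bio_sequence(["B-PER", "I-PER", "O"]))   # True
print(is_valid_bio_sequence(["B-PER", "I-ORG", "O"]))   # False: I-ORG after B-PER
print(is_valid_bio_sequence(["O", "I-LOC"]))            # False: I-LOC without B-LOC
```

A CRF learns (or can be hard-coded with) transition scores that assign minus infinity to exactly these invalid pairs, which is what guarantees well-formed output at Viterbi decoding time.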
4.2 More Recent Models: RoBERTa and DeBERTa for NER
from transformers import pipeline
# RoBERTa-large: ~1.5% more F1 than BERT-base on CoNLL-2003
# Use the same code but change MODEL_NAME
# Best English NER model on CoNLL benchmark:
model_name = "Jean-Baptiste/roberta-large-ner-english"
ner_pipeline = pipeline(
"ner",
model=model_name,
aggregation_strategy="simple"
)
text = "Elon Musk's Tesla announced a new Gigafactory in Berlin, Germany, with a 5B EUR investment."
entities = ner_pipeline(text)
for ent in entities:
print(f" '{ent['word']}' -> {ent['entity_group']} (score={ent['score']:.3f})")
# Benchmark comparison (CoNLL-2003 test set F1):
# BERT-base-cased: ~92.0%
# RoBERTa-large: ~93.5%
# DeBERTa-v3-large: ~94.0%
# XLNet-large: ~93.0%
5. Inference and Post-processing for NER
After training, inference requires post-processing to reconstruct named entities from token spans.
from transformers import pipeline
import torch
# HuggingFace Pipeline (handles post-processing automatically)
ner_pipeline = pipeline(
"ner",
model="./results/bert-ner-conll",
tokenizer="./results/bert-ner-conll",
aggregation_strategy="simple" # groups subtokens of the same entity
)
texts = [
"Tim Cook presented Apple's new iPhone 16 in Cupertino last September.",
"The European Central Bank in Frankfurt raised rates by 25 basis points.",
"Enel Green Power signed a deal worth 2.5 billion euros with the Italian government.",
]
for text in texts:
entities = ner_pipeline(text)
print(f"\nText: {text}")
for ent in entities:
print(f" '{ent['word']}' -> {ent['entity_group']} "
f"(score={ent['score']:.3f}, start={ent['start']}, end={ent['end']})")
# Available aggregation strategies:
# "none": returns all tokens with their label
# "simple": groups consecutive tokens with the same entity group
# "first": uses the label of the first subtoken of each word
# "average": averages the scores across a word's subtokens
# "max": takes the label of the highest-scoring subtoken
5.1 NER on Long Documents (over 512 tokens)
def ner_long_document(text, ner_pipeline, max_length=400, stride=50):
"""
NER on documents longer than 512 tokens using a sliding window.
max_length: maximum tokens per window
stride: overlap between consecutive windows (avoids boundary artifacts)
"""
words = text.split()
all_entities = []
processed_positions = set()
for start_idx in range(0, len(words), max_length - stride):
end_idx = min(start_idx + max_length, len(words))
chunk = ' '.join(words[start_idx:end_idx])
entities = ner_pipeline(chunk)
# Adjust offset for position in original text
chunk_offset = len(' '.join(words[:start_idx])) + (1 if start_idx > 0 else 0)
for ent in entities:
abs_start = ent['start'] + chunk_offset
abs_end = ent['end'] + chunk_offset
# Avoid duplicates from overlap
if abs_start not in processed_positions:
all_entities.append({
'word': ent['word'],
'entity_group': ent['entity_group'],
'score': ent['score'],
'start': abs_start,
'end': abs_end
})
processed_positions.add(abs_start)
if end_idx == len(words):
break
return sorted(all_entities, key=lambda x: x['start'])
# Alternative: use Longformer (supports up to 4096 tokens natively)
# from allenai/longformer-base-4096
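The window arithmetic in ner_long_document is worth checking in isolation. A hypothetical helper that mirrors the stride logic above enumerates the (start, end) word ranges and lets us confirm that consecutive windows overlap and that every word is covered:

```python
def window_bounds(n_words, max_length=400, stride=50):
    """(start, end) word ranges for a sliding window with `stride` words of overlap."""
    bounds = []
    for start in range(0, n_words, max_length - stride):
        end = min(start + max_length, n_words)
        bounds.append((start, end))
        if end == n_words:      # last window reached the end of the document
            break
    return bounds

print(window_bounds(1000, max_length=400, stride=50))
# [(0, 400), (350, 750), (700, 1000)]
```

Each window starts `stride` words before the previous one ended, so entities that straddle a boundary appear whole in at least one window; that is what the duplicate check in ner_long_document then resolves.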
6. NER for Italian
Italian has morphological characteristics that make NER more challenging: gender and number agreement, clitic forms, proper names with definite articles ("la Roma", "il Milan"). Here are the best available options.
import spacy
from transformers import pipeline
# spaCy NER for Italian
nlp_it = spacy.load("it_core_news_lg")
italian_texts = [
"Il primo ministro Giorgia Meloni ha incontrato il presidente francese Macron a Parigi.",
"Fiat Chrysler Automobiles ha annunciato fusione con PSA Group per 50 miliardi.",
"L'AS Roma ha battuto la Lazio per 2-1 allo Stadio Olimpico domenica sera.",
"Il Tribunale di Milano ha condannato Mediaset a pagare 300 milioni a Vivendi.",
]
print("=== Italian NER with spaCy it_core_news_lg ===")
for text in italian_texts:
doc = nlp_it(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(f"\nText: {text[:70]}")
print(f"Entities: {entities}")
# BERT NER for Italian
try:
it_ner = pipeline(
"ner",
model="osiria/bert-base-italian-uncased-ner",
aggregation_strategy="simple"
)
text = "Matteo Renzi ha fondato Italia Viva a Firenze nel 2019."
entities = it_ner(text)
print("\n=== BERT NER Italian ===")
for ent in entities:
print(f" '{ent['word']}' -> {ent['entity_group']} ({ent['score']:.3f})")
except Exception as e:
print(f"Model not available: {e}")
# Italian NER options summary:
print("\nItalian NER Options:")
print(" 1. spaCy it_core_news_lg (fastest, F1 ~85%)")
print(" 2. osiria/bert-base-italian-uncased-ner (more accurate, F1 ~88%)")
print(" 3. Custom fine-tuning on WikiNEuRal IT (highest quality)")
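The definite articles mentioned above often end up inside the extracted span ("la Roma", "il Milan"). A hypothetical post-processing normalizer can strip them when downstream systems expect the bare name; apply it selectively, since for club names the article can itself be distinctive:

```python
# Italian definite articles that may prefix an entity mention
ITALIAN_ARTICLES = {"il", "lo", "la", "i", "gli", "le"}

def strip_leading_article(entity_text):
    """Remove a leading Italian definite article from an entity mention."""
    if entity_text.lower().startswith("l'"):        # elided article: l'Inter
        return entity_text[2:].strip()
    first, _, rest = entity_text.partition(" ")
    if first.lower() in ITALIAN_ARTICLES and rest:
        return rest
    return entity_text

print(strip_leading_article("la Roma"))     # Roma
print(strip_leading_article("L'Inter"))     # Inter
print(strip_leading_article("Juventus"))    # Juventus
```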
7. Evaluation and NER Metrics
from seqeval.metrics import (
classification_report,
f1_score,
precision_score,
recall_score
)
# seqeval evaluates at span level (entire entity)
# more appropriate than token-level accuracy
true_sequences = [
['O', 'B-PER', 'I-PER', 'O', 'B-ORG', 'O'],
['B-LOC', 'I-LOC', 'O', 'O', 'B-DATE', 'O'],
]
pred_sequences = [
['O', 'B-PER', 'I-PER', 'O', 'O', 'O'], # misses ORG
['B-LOC', 'I-LOC', 'O', 'O', 'B-DATE', 'O'], # perfect
]
print("=== NER Evaluation (span-level) ===")
print(classification_report(true_sequences, pred_sequences))
print(f"Overall F1: {f1_score(true_sequences, pred_sequences):.4f}")
print(f"Overall Precision: {precision_score(true_sequences, pred_sequences):.4f}")
print(f"Overall Recall: {recall_score(true_sequences, pred_sequences):.4f}")
# Types of NER errors:
# 1. False Negative (Missed): entity not recognized
# 2. False Positive (Spurious): entity invented where there is none
# 3. Wrong Type: entity found but wrong type (PER instead of ORG)
# 4. Wrong Boundary: entity found but span partially incorrect
# Key difference:
# Token-level accuracy: counts correct tokens / total tokens
# Span-level F1 (seqeval): an entity is correct ONLY if
# ALL its tokens have the right label
# -> much stricter and more realistic
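The gap between the two metrics is easy to demonstrate without any library. The sketch below re-implements span extraction and span-level F1 in a few lines, as a simplified stand-in for seqeval, and compares it with token accuracy on the same prediction:

```python
def extract_spans(labels):
    """(start, end, type) spans from a BIO sequence (simplified decoder)."""
    spans, start, etype = [], None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel flushes the last span
        if label.startswith("B-") or label == "O" or (etype and label[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
            start, etype = (i, label[2:]) if label.startswith("B-") else (None, None)
    return spans

true_seq = ["O", "B-PER", "I-PER", "O", "B-ORG", "O"]
pred_seq = ["O", "B-PER", "I-PER", "O", "O", "O"]  # misses the ORG entity

token_acc = sum(t == p for t, p in zip(true_seq, pred_seq)) / len(true_seq)
true_spans, pred_spans = set(extract_spans(true_seq)), set(extract_spans(pred_seq))
correct = len(true_spans & pred_spans)
precision = correct / len(pred_spans) if pred_spans else 0.0
recall = correct / len(true_spans)
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"Token accuracy: {token_acc:.2f}")  # 0.83: looks fine
print(f"Span F1:        {f1:.2f}")         # 0.67: reveals the missed entity
```

One wrong token out of six barely dents accuracy, but losing the whole ORG span halves recall; this is why seqeval, not accuracy, is the metric to report.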
8. Case Study: NER on Financial News Articles
Let us build a complete NER pipeline to extract entities from financial articles: companies, key people, monetary values, and dates.
from transformers import pipeline
from collections import defaultdict
class FinancialNERExtractor:
"""
NER extractor specialized for financial news.
Extracts: companies, key people, monetary values and dates.
"""
def __init__(self, model_name="dslim/bert-large-NER"):
self.ner = pipeline(
"ner",
model=model_name,
aggregation_strategy="simple"
)
        # Note: CoNLL-trained models such as dslim/bert-large-NER emit only
        # PER, ORG, LOC, MISC; the MONEY/DATE/GPE mappings below only apply
        # with an OntoNotes-style model (or spaCy)
        self.entity_types = {
            'ORG': 'companies',
            'PER': 'people',
            'MONEY': 'values',
            'DATE': 'dates',
            'LOC': 'locations',
            'GPE': 'locations'
        }
def extract(self, text: str) -> dict:
"""Extract and organize entities by type."""
entities = self.ner(text)
result = defaultdict(list)
for ent in entities:
group = ent['entity_group']
mapped = self.entity_types.get(group)
if mapped and ent['score'] > 0.8:
result[mapped].append({
'text': ent['word'],
'score': round(ent['score'], 3),
'position': (ent['start'], ent['end'])
})
return dict(result)
def analyze_article(self, title: str, body: str) -> dict:
"""Full analysis of a financial article."""
full_text = f"{title}. {body}"
raw_entities = self.extract(full_text)
# Deduplicate (same text, different positions)
for etype, ents in raw_entities.items():
seen = set()
deduped = []
for e in ents:
if e['text'] not in seen:
seen.add(e['text'])
deduped.append(e)
raw_entities[etype] = deduped
return {
'title': title,
'entities': raw_entities,
'entity_count': sum(len(v) for v in raw_entities.values())
}
# Test
extractor = FinancialNERExtractor()
articles = [
{
"title": "Amazon acquires Whole Foods for $13.7 billion",
"body": "Jeff Bezos announced the acquisition in Seattle on June 16, 2017. Whole Foods CEO John Mackey will remain in his role."
},
{
"title": "Tesla opens new Gigafactory in Germany",
"body": "Elon Musk inaugurated the Berlin factory in March 2022. The facility in Gruenheide will employ 12,000 people and produce 500,000 vehicles per year."
},
]
for article in articles:
result = extractor.analyze_article(article['title'], article['body'])
print(f"Title: {result['title']}")
print(f"Total entities: {result['entity_count']}")
for etype, ents in result['entities'].items():
if ents:
texts = [e['text'] for e in ents]
print(f" {etype:12s}: {', '.join(texts)}")
print()
9. Optimized NER Pipeline for Production
A production NER system must balance precision, speed, and computational cost. Below is an optimized pipeline combining a lexical pre-filter, batch inference, and result caching for high-volume scenarios.
from transformers import pipeline
import hashlib
import time
from typing import List, Dict
class OptimizedNERPipeline:
"""
Production-optimized NER pipeline:
- LRU-style result caching
- Adaptive batch processing
- Confidence filtering
- Latency and accuracy monitoring
"""
def __init__(
self,
model_name: str = "dslim/bert-large-NER",
batch_size: int = 8,
min_confidence: float = 0.75,
cache_size: int = 1024
):
self.ner = pipeline(
"ner",
model=model_name,
aggregation_strategy="simple",
batch_size=batch_size,
device=0 # -1 for CPU, 0 for first GPU
)
self.min_confidence = min_confidence
self._cache: Dict[str, list] = {}
self._cache_size = cache_size
self._stats = {"hits": 0, "misses": 0, "total_time_ms": 0.0}
def _text_hash(self, text: str) -> str:
return hashlib.md5(text.encode()).hexdigest()
def extract(self, texts: List[str]) -> List[List[Dict]]:
"""NER extraction with caching and batch processing."""
results = [None] * len(texts)
uncached_indices = []
uncached_texts = []
# Check cache
for i, text in enumerate(texts):
key = self._text_hash(text)
if key in self._cache:
results[i] = self._cache[key]
self._stats["hits"] += 1
else:
uncached_indices.append(i)
uncached_texts.append(text)
self._stats["misses"] += 1
# Process texts not in cache
if uncached_texts:
start = time.perf_counter()
raw_results = self.ner(uncached_texts)
elapsed_ms = (time.perf_counter() - start) * 1000
self._stats["total_time_ms"] += elapsed_ms
            # The pipeline returns a list of entity lists for list input;
            # wrap defensively in case a flat list of dicts comes back
            if raw_results and isinstance(raw_results[0], dict):
                raw_results = [raw_results]
for idx, raw in zip(uncached_indices, raw_results):
# Filter by confidence and clean
filtered = [
{
'word': e['word'].replace(' ##', '').strip(),
'entity_group': e['entity_group'],
'score': round(e['score'], 4),
'start': e['start'],
'end': e['end']
}
for e in raw
if e['score'] >= self.min_confidence
]
key = self._text_hash(texts[idx])
# Simple FIFO cache eviction
if len(self._cache) >= self._cache_size:
oldest_key = next(iter(self._cache))
del self._cache[oldest_key]
self._cache[key] = filtered
results[idx] = filtered
return results
def get_stats(self) -> Dict:
"""Return pipeline performance statistics."""
total = self._stats["hits"] + self._stats["misses"]
return {
"cache_hit_rate": self._stats["hits"] / total if total > 0 else 0.0,
"avg_latency_ms": self._stats["total_time_ms"] / max(self._stats["misses"], 1),
"cache_size": len(self._cache),
**self._stats
}
# Usage
ner_pipe = OptimizedNERPipeline(min_confidence=0.80)
batch_texts = [
"Mario Draghi led the ECB from 2011 to 2019.",
"Amazon acquired MGM Studios for $8.45 billion.",
"MIT researchers published a study on GPT-4 capabilities.",
"Sergio Mattarella is the President of the Italian Republic.",
]
# First call: full processing
results1 = ner_pipe.extract(batch_texts)
# Second call: all from cache!
results2 = ner_pipe.extract(batch_texts)
print("NER Pipeline Statistics:")
for k, v in ner_pipe.get_stats().items():
print(f" {k}: {v}")
print("\nExtraction results:")
for text, entities in zip(batch_texts, results1):
print(f"\n Text: {text[:60]}")
for ent in entities:
print(f" '{ent['word']}' -> {ent['entity_group']} ({ent['score']:.3f})")
9.1 NER Model Comparison: Practical Benchmark
NER Benchmark: Speed vs Accuracy (2024-2025)
| Model | CoNLL F1 | Speed (CPU) | Params | Language | Use Case |
|---|---|---|---|---|---|
| spaCy en_core_web_sm | ~84% | Very fast (<5ms) | 12M | EN | Rapid prototyping |
| spaCy en_core_web_trf | ~89% | Slow on CPU (100ms+) | 125M | EN | High accuracy, spaCy API |
| dslim/bert-base-NER | ~91% | Medium (50-100ms) | 110M | EN | GPU production |
| dslim/bert-large-NER | ~92% | Slow (100-200ms) | 340M | EN | High accuracy |
| Jean-Baptiste/roberta-large-ner-english | ~93.5% | Slow (150-250ms) | 355M | EN | State of the art EN |
| osiria/bert-base-italian-uncased-ner | ~88% | Medium (50-100ms) | 110M | IT | Best Italian model |
9.2 Text Anonymization with NER
A critical use case in legal, medical, and GDPR contexts is automatic anonymization of personal data. NER can automatically identify PER, ORG, LOC, and DATE entities for pseudonymization or redaction of sensitive documents.
from transformers import pipeline
class TextAnonymizer:
"""
NER-based text anonymizer.
Replaces sensitive entities with typed placeholders.
Useful for GDPR-compliant data processing and training dataset creation.
"""
    # Note: the DATE/MONEY/GPE placeholders only fire with OntoNotes-style
    # models; CoNLL-trained models emit just PER, ORG, LOC, MISC
    REPLACEMENT_MAP = {
'PER': '<PERSON>',
'ORG': '<ORGANIZATION>',
'LOC': '<LOCATION>',
'GPE': '<LOCATION>',
'DATE': '<DATE>',
'MONEY': '<AMOUNT>',
'MISC': '<OTHER>',
}
def __init__(self, model_name="dslim/bert-large-NER"):
self.ner = pipeline(
"ner",
model=model_name,
aggregation_strategy="simple"
)
def anonymize(self, text: str, entity_types: list = None) -> dict:
"""
Anonymize text by replacing entities.
entity_types: list of types to anonymize (None = all)
"""
entities = self.ner(text)
# Filter by type if specified
if entity_types:
entities = [e for e in entities if e['entity_group'] in entity_types]
# Sort by position descending to replace from the end
entities_sorted = sorted(entities, key=lambda e: e['start'], reverse=True)
anonymized = text
replacements = []
for ent in entities_sorted:
placeholder = self.REPLACEMENT_MAP.get(ent['entity_group'], '<ENTITY>')
original = text[ent['start']:ent['end']]
anonymized = anonymized[:ent['start']] + placeholder + anonymized[ent['end']:]
replacements.append({
'original': original,
'replacement': placeholder,
'type': ent['entity_group'],
'confidence': round(ent['score'], 3)
})
return {
'original': text,
'anonymized': anonymized,
'replacements': replacements,
'num_entities': len(replacements)
}
# Test
anonymizer = TextAnonymizer()
sensitive_texts = [
"Patient John Smith, born March 15, 1978, was admitted to Massachusetts General Hospital on January 3, 2024 with a diagnosis of pneumonia.",
"Accenture plc, headquartered at 161 North Clark Street in Chicago, reported revenues of $64.9 billion in fiscal year 2023.",
"Attorney Sarah Johnson from Skadden Arps represented Apple Inc. in the appeal to the Federal Circuit Court.",
]
print("=== Text Anonymization with NER ===\n")
for text in sensitive_texts:
result = anonymizer.anonymize(text, entity_types=['PER', 'ORG', 'LOC', 'GPE', 'DATE', 'MONEY'])
print(f"Original: {result['original'][:100]}")
print(f"Anonymized: {result['anonymized'][:100]}")
print(f"Replaced: {result['num_entities']} entities")
print()
10. NER Production Best Practices
Anti-Pattern: Ignoring Post-processing
NER models output raw BIO token-level predictions. In production, you must always reconstruct spans, handle WordPiece subtokens, and filter low-confidence entities. Never expose raw token predictions to end users.
Anti-Pattern: Evaluating with Token Accuracy Only
Token accuracy on CoNLL-2003 is typically 98-99% even for mediocre models, because most tokens carry the label O. Always evaluate with seqeval: span-level F1 is the standard metric for NER.
Production NER Checklist
- Evaluate with seqeval (span F1), not just token accuracy
- Set confidence thresholds (typically 0.7-0.85) to filter false positives
- Handle overlapping entities (rare but possible)
- Normalize extracted entities (deduplication, canonicalization)
- Monitor entity distribution over time to detect domain shift
- Use visualization (displacy) for debugging predictions
- Test across different text domains: news, contracts, social media behave very differently
- For Italian: use it_core_news_lg (fast) or BERT fine-tuned on WikiNEuRal IT (accurate)
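The normalization item in the checklist can be as simple as case- and whitespace-insensitive deduplication that keeps a canonical surface form. A minimal sketch with a hypothetical helper (real systems usually add alias dictionaries and entity linking on top):

```python
def canonicalize_entities(entities):
    """Deduplicate entity mentions by normalized key, keeping the best-scored form."""
    canonical = {}
    for ent in entities:
        # normalize: collapse whitespace, lowercase; key also includes the type
        key = (" ".join(ent["text"].split()).lower(), ent["type"])
        if key not in canonical or ent["score"] > canonical[key]["score"]:
            canonical[key] = ent
    return sorted(canonical.values(), key=lambda e: -e["score"])

mentions = [
    {"text": "Banca d'Italia", "type": "ORG", "score": 0.97},
    {"text": "banca  d'italia", "type": "ORG", "score": 0.91},  # same entity, noisier form
    {"text": "Mario Draghi", "type": "PER", "score": 0.99},
]
for ent in canonicalize_entities(mentions):
    print(ent["text"], ent["type"], ent["score"])
# Mario Draghi PER 0.99
# Banca d'Italia ORG 0.97
```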
Conclusions and Next Steps
NER is one of the most useful NLP tasks in real-world applications: information extraction, knowledge graph construction, feeding RAG systems, data anonymization. With spaCy for simple cases and fine-tuned BERT for high precision, you have all the tools to build robust NER pipelines for both Italian and English.
The key to excellent performance in a specific domain is always fine-tuning on annotated data from your context: even a few hundred domain-specific examples can significantly improve performance over the generic model.
Continue the Series
- Next: Multi-label Text Classification — classifying texts with multiple simultaneous labels
- Article 7: HuggingFace Transformers: Complete Guide — Trainer API, Model Hub, optimization
- Article 8: LoRA Fine-tuning — train LLMs locally on consumer GPU
- Article 9: Semantic Similarity
- Related series: AI Engineering/RAG — NER as an extraction step in RAG pipelines