SLM in 2026: Overview of Small Language Models and Benchmarks
In 2023, "AI model" almost always meant GPT-4 or Claude. In 2026, the landscape is radically different: Phi-4-mini (3.8 billion parameters) outperforms Mixtral 8x7B (46B) on mathematical reasoning benchmarks, the 135-million-parameter SmolLM2 runs on a Raspberry Pi 4, and Gemma 3n E4B has an LMArena Elo above 1300, higher than many 70B models from a year ago. The era of Small Language Models has arrived, and it has concrete implications for those who develop AI applications.
What You Will Learn
- The map of the main SLMs in 2026: Phi-4-mini, Gemma 3n, Qwen 3, SmolLM2, DeepSeek
- How to interpret the benchmarks: MMLU, HumanEval, MATH, GPQA
- Which model to choose for coding, reasoning, chat and classification tasks
- Hardware required for local inference with each model
- How to run custom benchmarks on your use case
What Defines a "Small" Language Model in 2026
The definition of "small" has changed over time. In 2024, "small" meant below 7B parameters. In 2026, with 1B models competing with the 13Bs of two years ago, the practical threshold has moved: we consider SLMs to be models under 10B parameters that can run on consumer hardware without aggressive quantization.
| Model | Parameters | Creator | License | VRAM (fp16) |
|---|---|---|---|---|
| SmolLM2 | 135M - 1.7B | HuggingFace | Apache 2.0 | 0.3 - 3.5 GB |
| Phi-4-mini | 3.8B | Microsoft | MIT | 7.6 GB |
| Gemma 3n E4B | 4B eff. | Google | Gemma ToS | 8 GB |
| Qwen 3 (1.7B) | 1.7B | Alibaba | Apache 2.0 | 3.4 GB |
| Qwen 3 (7B) | 7B | Alibaba | Apache 2.0 | 14 GB |
| DeepSeek-R1 (7B) | 7B distilled | DeepSeek | MIT | 14 GB |
| Mistral 7B v0.3 | 7B | Mistral AI | Apache 2.0 | 14 GB |
| Llama 3.2 (3B) | 3B | Meta | Llama 3.2 Community | 6 GB |
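The fp16 VRAM column in the table follows from a simple rule of thumb: fp16 stores two bytes per parameter, so weight size in GB is roughly parameters × 2 / 10⁹ (runtime overhead such as the KV cache comes on top). A minimal sketch:

```python
def fp16_weight_size_gb(n_params: float) -> float:
    """Rule of thumb: fp16 stores 2 bytes per parameter."""
    return n_params * 2 / 1e9

# Phi-4-mini, 3.8B parameters
print(round(fp16_weight_size_gb(3.8e9), 1))  # 7.6
```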
Benchmarks: How to Interpret Them Correctly
Academic benchmarks are useful but should be interpreted with caution. A model that excels on MMLU (general knowledge) may be inadequate for generating clean Python code. Here are the main benchmarks and what they actually measure.
Key Academic Benchmarks
```python
# Comparison of key benchmarks (approximate values, February 2026)
benchmarks = {
    "Phi-4-mini (3.8B)": {
        "MMLU": 72.8,          # General knowledge, 57 subjects
        "HumanEval": 62.3,     # Python code completion
        "MATH": 70.5,          # Mathematical reasoning (AMC/AIME)
        "GPQA Diamond": 36.2,  # PhD-level science questions
        "MT-Bench": 7.8,       # Multi-turn conversation (1-10)
    },
    "Gemma 3n E4B": {
        "MMLU": 74.1,
        "HumanEval": 58.7,
        "MATH": 65.3,
        "GPQA Diamond": 34.8,
        "MT-Bench": 8.1,
        "LMArena Elo": 1312,   # Human preference (chess-style Elo)
    },
    "Qwen 3 7B": {
        "MMLU": 78.3,
        "HumanEval": 72.1,
        "MATH": 78.9,
        "GPQA Diamond": 41.2,
        "MT-Bench": 8.4,
    },
    "DeepSeek-R1 7B distilled": {
        "MMLU": 75.2,
        "HumanEval": 68.4,
        "MATH": 82.3,          # Excels at mathematical reasoning
        "GPQA Diamond": 38.7,
        "MT-Bench": 8.0,
    },
    # For comparison: larger models
    "Mixtral 8x7B (46B)": {
        "MMLU": 71.4,          # Phi-4-mini (3.8B) beats it!
        "HumanEval": 60.1,
        "MATH": 66.8,
    },
}

# IMPORTANT NOTE: benchmarks do not measure everything, e.g.:
# - Hallucination rate (frequency of plausible fabrications)
# - Instruction following on complex tasks
# - Context handling on long documents
# - Inference speed on specific hardware
# - Energy consumption
```
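To turn the score table above into a quick ranking, take the arg-max per benchmark. A small self-contained sketch (scores copied from the figures above, restricted to the three SLMs that report all three metrics):

```python
# Approximate scores from the comparison above (February 2026)
scores = {
    "Phi-4-mini (3.8B)": {"MMLU": 72.8, "HumanEval": 62.3, "MATH": 70.5},
    "Qwen 3 7B":         {"MMLU": 78.3, "HumanEval": 72.1, "MATH": 78.9},
    "DeepSeek-R1 7B":    {"MMLU": 75.2, "HumanEval": 68.4, "MATH": 82.3},
}

# For each benchmark, find the best-scoring model
for bench in ["MMLU", "HumanEval", "MATH"]:
    best = max(scores, key=lambda m: scores[m][bench])
    print(f"{bench:10s} -> {best} ({scores[best][bench]})")
```

This makes the headline result concrete: Qwen 3 7B leads on knowledge and coding, while the DeepSeek-R1 distill leads on MATH.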
How to Build a Benchmark on Your Use Case
```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def benchmark_slm_for_custom_task(
    model_name: str,
    task_examples: list[dict],
    metric_fn: callable,
) -> dict:
    """
    Benchmark an SLM on a custom task.

    task_examples: list of {"input": str, "expected": str}
    metric_fn: function returning a float in [0, 1] (accuracy, F1, etc.)
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
    )

    results = []
    total_time = 0.0
    for example in task_examples:
        start = time.time()
        inputs = tokenizer(example["input"], return_tensors="pt").to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                do_sample=False,  # greedy decoding for deterministic tasks
            )
        elapsed = time.time() - start

        generated = tokenizer.decode(
            outputs[0][inputs.input_ids.shape[1]:],
            skip_special_tokens=True,
        )
        score = metric_fn(generated, example["expected"])
        results.append({
            "input": example["input"][:50],
            "expected": example["expected"],
            "generated": generated,
            "score": score,
            "latency_ms": elapsed * 1000,
        })
        total_time += elapsed

    avg_score = sum(r["score"] for r in results) / len(results)
    avg_latency = sum(r["latency_ms"] for r in results) / len(results)
    return {
        "model": model_name,
        "task_score": round(avg_score, 4),
        "avg_latency_ms": round(avg_latency, 1),
        "total_examples": len(results),
        "hardware": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU",
        "detail": results,
    }


# Example usage: sentiment classification in Italian
sentiment_examples = [
    {"input": "Classifica il sentiment: 'Il prodotto e eccellente!' -> ", "expected": "positivo"},
    {"input": "Classifica il sentiment: 'Esperienza terribile, non lo raccomando' -> ", "expected": "negativo"},
    # ... 100 examples from your own real dataset ...
]

def exact_match(generated: str, expected: str) -> float:
    return 1.0 if expected.lower() in generated.lower() else 0.0

# Run the benchmark on multiple models
for model in ["microsoft/phi-4-mini", "Qwen/Qwen3-7B", "google/gemma-3n-e4b"]:
    result = benchmark_slm_for_custom_task(model, sentiment_examples, exact_match)
    print(f"{model}: score={result['task_score']}, latency={result['avg_latency_ms']}ms")
```
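Exact match is brittle for free-form generations: a model that answers "Il sentiment e positivo" instead of "positivo" should not score zero. A token-overlap F1 is a common, more forgiving drop-in for `metric_fn`; a minimal sketch:

```python
def token_f1(generated: str, expected: str) -> float:
    """Token-level F1: credits partial overlap between generation and reference."""
    gen = generated.lower().split()
    exp = expected.lower().split()
    if not gen or not exp:
        return 0.0
    # Count tokens shared between the two sequences (with multiplicity)
    common = sum(min(gen.count(t), exp.count(t)) for t in set(exp))
    if common == 0:
        return 0.0
    precision = common / len(gen)
    recall = common / len(exp)
    return 2 * precision * recall / (precision + recall)
```

It slots in unchanged: `benchmark_slm_for_custom_task(model, sentiment_examples, token_f1)`.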
Hardware Requirements: What You Need to Run SLMs
One of the main advantages of SLMs is that they run on consumer hardware. Here is a practical guide.
| Hardware | VRAM / RAM | Compatible Models (fp16) | Tokens/sec (approx) |
|---|---|---|---|
| MacBook M3 Pro (18GB) | 18 GB unified | Phi-4-mini, Gemma 3n, Llama 3.2 3B | 25-40 tok/s |
| MacBook M4 Max (48GB) | 48 GB unified | All 7B, Llama 3 8B | 60-80 tok/s |
| RTX 4060 (8GB) | 8GB VRAM | Phi-4-mini q4, SmolLM2 1.7B | 35-55 tok/s |
| RTX 4070 (12GB) | 12GB VRAM | Phi-4-mini fp16, Qwen 3 7B q4 | 50-70 tok/s |
| RTX 4090 (24GB) | 24GB VRAM | All 7B fp16, Llama 3 8B | 100-130 tok/s |
| A100 Server (80GB) | 80GB VRAM | Models up to 40B | 200-400 tok/s |
```python
import torch

# Check whether a model fits in the available VRAM
def check_model_fits_vram(
    model_name: str,
    quantization: str = "fp16",
    safety_margin: float = 0.85,
) -> dict:
    """
    Estimate the VRAM a model needs and check compatibility.

    quantization: 'fp32', 'fp16', 'int8', 'int4' (gguf q4)
    """
    # Estimated parameter counts
    param_counts = {
        "microsoft/phi-4-mini": 3.8e9,
        "google/gemma-3n-E4B": 4.0e9,
        "Qwen/Qwen3-7B": 7.6e9,
        "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 7.6e9,
        "meta-llama/Llama-3.2-3B": 3.2e9,
        "HuggingFaceTB/SmolLM2-1.7B": 1.7e9,
    }
    bytes_per_param = {
        "fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5,
    }
    params = param_counts.get(model_name, 0)
    if params == 0:
        return {"error": f"Model {model_name} not in database"}

    model_vram_gb = (params * bytes_per_param.get(quantization, 2)) / 1e9
    overhead_gb = 1.5  # KV cache + activations
    total_vram_gb = model_vram_gb + overhead_gb

    available_vram = 0.0
    if torch.cuda.is_available():
        available_vram = torch.cuda.get_device_properties(0).total_memory / 1e9
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        # Apple Silicon: use available unified system memory
        import psutil
        available_vram = psutil.virtual_memory().available / 1e9 * 0.7

    fits = total_vram_gb <= available_vram * safety_margin
    return {
        "model": model_name,
        "quantization": quantization,
        "model_vram_gb": round(model_vram_gb, 2),
        "total_with_overhead_gb": round(total_vram_gb, 2),
        "available_gb": round(available_vram, 2),
        "fits": fits,
        "recommendation": "OK" if fits else "Insufficient: try int4 or a smaller model",
    }

# Test
for model in ["microsoft/phi-4-mini", "Qwen/Qwen3-7B"]:
    for quant in ["fp16", "int8", "int4"]:
        result = check_model_fits_vram(model, quant)
        status = "OK" if result["fits"] else "NO"
        print(f"[{status}] {model} ({quant}): {result['total_with_overhead_gb']:.1f}GB needed")
```
Which SLM to Choose for Your Use Case
The choice of model mainly depends on the task. Here is a practical guide based on benchmarks and community testing in 2026.
Use Case Recommendations
- Coding (Python, TypeScript, SQL): Qwen 3 7B or DeepSeek-R1 7B — best in class for 7B
- Mathematical/logical reasoning: DeepSeek-R1 7B distilled — huge improvement over the base
- Chat and general assistant: Phi-4-mini or Gemma 3n — best quality/size ratio
- Simple classification and NLU: SmolLM2 1.7B — already good enough for many tasks
- Mobile on-device: Gemma 3n E4B (optimized for NPU) or SmolLM2 135M
- Italian RAG: Phi-4-mini (strong multilingual support) or Mistral 7B v0.3
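The recommendations above can be captured in a tiny routing table; the category names and the default are hypothetical, and the repo IDs are the ones used in the code examples earlier in this article:

```python
# Hypothetical task-to-model routing based on the recommendations above
RECOMMENDED = {
    "coding": "Qwen/Qwen3-7B",
    "math": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "chat": "microsoft/phi-4-mini",
    "classification": "HuggingFaceTB/SmolLM2-1.7B",
    "mobile": "google/gemma-3n-E4B",
}

def pick_model(task: str) -> str:
    # Fall back to the best general-purpose quality/size ratio
    return RECOMMENDED.get(task, "microsoft/phi-4-mini")

print(pick_model("coding"))  # Qwen/Qwen3-7B
```

In a real application, the router would also take the hardware check from the previous section into account before committing to a model.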
Conclusions
2026 has definitively confirmed the era of Small Language Models: 3-7B models with the right architecture and the right training data beat models 10x their size from two years ago. The choice is no longer "LLM vs SLM" but "which SLM, for which task, on which hardware".
The next article in the series compares Phi-4-mini and Gemma 3n in detail: the two most interesting choices for edge deployment in 2026, with side-by-side benchmarks on coding, reasoning, and conversational tasks in Italian and English.
Series: Small Language Models
- Article 1 (this): SLM in 2026 - Overview and Benchmark
- Article 2: Phi-4-mini vs Gemma 3n - Detailed Comparison
- Article 3: Fine-tuning with LoRA and QLoRA
- Article 4: Quantization for Edge - GGUF, ONNX, INT4
- Article 5: Ollama - SLM Locally in 5 Minutes