Hi! I'm

Federico Calò

Software Developer | Technical Writer

I create modern web applications and custom digital tools to help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.

Contact Me

About Me

My passion for computer science was born at the Technical Commercial Institute of Maglie, where I discovered the power of programming and the fascination of creating digital solutions. From the start, I understood that computer science was not just code, but an extraordinary tool for turning ideas into reality.

During my studies in Business Information Systems, I began to interweave computer science and economics, understanding how technology can be the engine of growth for any business. This vision accompanied me to the University of Bari, where I obtained my degree in Computer Science, deepening my technical skills and passion for software development.

Today I put this experience at the service of businesses, professionals and startups, creating tailor-made digital solutions that automate processes, optimize resources and open new business opportunities. Because true innovation begins when technology meets the real needs of people.

My Skills

Data Analysis & Predictive Models

I transform data into strategic insights with in-depth analysis and predictive models for informed decisions

Process Automation

I create custom tools that automate repetitive operations and free up time for value-added activities

Custom Systems

I develop tailor-made software systems, from platform integrations to customized dashboards

const federico = {
  nome: "Federico Calò",
  ruolo: "Sviluppatore Software",
  città: "Bari, Italia",
  missione: "Aiutare attraverso l'informatica",
  passioni: [
    "Codice Pulito",
    "Innovazione",
    "Crescita Continua"
  ]
};

La Mia Missione

Credo fermamente che l'informatica sia lo strumento più potente per trasformare le idee in realtà e migliorare la vita delle persone.

🚀

Democratizzare la Tecnologia

La mia missione è rendere l'informatica accessibile a tutti: dalle piccole imprese locali alle startup innovative, fino ai professionisti che vogliono digitalizzare la propria attività. Ogni realtà merita di sfruttare le potenzialità del digitale.

💡

Unire Informatica ed Economia

Non è solo questione di scrivere codice: è capire come la tecnologia possa generare valore reale. Intrecciando competenze informatiche e visione economica, aiuto le attività a crescere, ottimizzare processi e raggiungere nuovi traguardi di efficienza e redditività.

🎯

Creare Soluzioni su Misura

Ogni attività è unica, e così devono esserlo le soluzioni. Sviluppo strumenti personalizzati che rispondono alle esigenze specifiche di ciascun cliente, automatizzando processi ripetitivi e liberando tempo per ciò che conta davvero: far crescere il business.

Trasforma la Tua Attività con la Tecnologia

December 2024

View

Master SQL

RoadMap.sh

Novembre 2024

View

Oracle Certified Foundations Associate

Oracle

October 2024

View

People Leadership Credential

Connect

Settembre 2024

💻 Languages & Technologies

☕Java

🐍Python

📜JavaScript

🅰️Angular

⚛️React

🔷TypeScript

🗄️SQL

🐘PHP

🎨CSS/SCSS

🔧Node.js

🐳Docker

🌿Git

💼

12/2024 - Presente

Custom Software Engineering Analyst

Accenture

Bari, Puglia, Italia · Ibrida Analisi e sviluppo di sistemi informatici attraverso l'utilizzo di Java e Quarkus in Health and Public Sector. Formazione continua su tecnologie moderne per la creazione di soluzioni software personalizzate ed efficienti e sugli agenti.

💼

06/2022 - 12/2024

Analista software e Back End Developer Associate Consultant

Links Management and Technology SpA

Esperienza nell'analisi di sistemi software as-is e flussi ETL utilizzando PowerCenter. Formazione completata su Spring Boot per lo sviluppo di applicazioni backend moderne e scalabili. Sviluppatore Backend specializzato in Spring Boot, con esperienza in progettazione di database, analisi, sviluppo e testing dei task assegnati.

💼

02/2021 - 10/2021

Programmatore software

Adesso.it (prima era WebScience srl)

Esperienza nell'analisi AS-IS e TO-BE, evoluzioni SEO ed evoluzioni website per migliorare le performance e l'engagement degli utenti.

🎓

2018 - 2025

Laurea in Informatica

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Sistemi Informativi Aziendali

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.

Contattami

Hai un progetto in mente? Parliamone! Compila il form qui sotto e ti risponderò al più presto.

* Campi obbligatori. I tuoi dati saranno utilizzati solo per rispondere alla tua richiesta.

Phi-4-mini vs Gemma 3n: Microsoft vs Google for Edge AI

Two tech giants, two different philosophies, both with the goal of creating the best model under 5 billion parameters. Microsoft has bet on data quality training with Phi-4-mini: you can teach a small model to think like one great if you train it on "textbook-quality" data. Google focused on architecture hardware-aware with Gemma 3n: a model designed from the beginning to run efficiently on mobile NPUs. This direct comparison reveals when to choose one and when to choose the other.

What You Will Learn

The Phi-4-mini architecture: why textbook-quality data works
Gemma 3n E4B: the MatFormer architecture and the concept of "effective 4B"
Side-by-side benchmarks on coding, reasoning and speaking in Italian
Sweet spot hardware for each model
When to choose Phi-4-mini and when Gemma 3n

Phi-4-mini: The Philosophy of Quality Data

Phi-4-mini (Microsoft, December 2024) and based on a simple but powerful thesis: the problem of small models it is not the size but the quality of the training data. The Phi series uses synthetic data generated from larger models, filtered for pedagogical quality — such as i textbooks versus lecture notes. The result is that Phi-4-mini (3.8B) passes Mixtral 8x7B (46B, twelve times larger) on reasoning benchmark.

# Phi-4-mini: setup iniziale con transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "microsoft/Phi-4-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # 7.6 GB VRAM
    device_map="auto",
    trust_remote_code=True       # richiesto per Phi-4
)

# Phi-4-mini usa il formato chat con messaggi strutturati
messages = [
    {
        "role": "system",
        "content": "Sei un assistente tecnico esperto in database PostgreSQL. "
                   "Rispondi sempre in italiano con esempi pratici."
    },
    {
        "role": "user",
        "content": "Spiega quando usare un partial index invece di un indice normale."
    }
]

# Applicare il template di chat
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        do_sample=True,
        top_p=0.9,
        repetition_penalty=1.1
    )

response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)
print(response)

Strengths of Phi-4-mini

# Test 1: Ragionamento matematico (dove Phi eccelle)
math_problem = """
Un treno parte da Milano alle 8:00 a 120 km/h.
Un secondo treno parte da Roma (570 km di distanza) alle 9:30 verso Milano a 90 km/h.
A che ora si incontrano e a che distanza da Milano?
Mostra tutti i passaggi.
"""

# Phi-4-mini risolve questo correttamente (>70% accuracy su MATH benchmark)
# vs Mixtral 8x7B che spesso sbaglia i calcoli multi-step

# Test 2: Coding Python (buono ma non il migliore nella categoria)
coding_task = """
Scrivi una funzione Python che dato un testo in italiano:
1. Rimuova le stopwords italiane
2. Applichi lemmatizzazione con spaCy
3. Ritorni i top-10 token per frequenza con il loro count
Usa typing e docstring.
"""

# Test 3: Istruzione following in italiano
instruction_task = """
Rispondimi SOLO con un JSON valido in questo formato:
{"risposta": "si" o "no", "motivo": "stringa di max 50 parole"}

PostgreSQL 18 supporta OAuth 2.0 nativamente?
"""
# Phi-4-mini segue le istruzioni di formato con alta fedeltà

Gemma 3n E4B: Hardware-Aware Architecture

Gemma 3n E4B (Google, April 2025) introduces a radically different architecture: MatFormer, which uses a nested structure of transformers to create efficient sub-models. The suffix “E4B” means “Effective 4 Billion” — the model technically has more parameters, but uses a "matryoshka" system where it can run with the computational equivalent of 4B parameters.

# Gemma 3n E4B: richiede Keras 3 o transformers >= 4.49
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "google/gemma-3n-E4B-it"  # variante instruction-tuned

# Per dispositivi con 8GB VRAM: usare int4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

# Gemma 3n usa il formato chat standard
messages = [
    {"role": "user", "content": "Come funziona il partial index in PostgreSQL?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7
    )

print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))

3n Gem for Mobile (its Strength)

# Gemma 3n e progettato per NPU mobili con MediaPipe LLM Inference API
# Per Android con Snapdragon 8 Gen 3/4/5 NPU:

# 1. Esportare il modello in formato MediaPipe (LiteRT)
# Questa operazione si fa una volta offline

"""
# requirements: pip install ai-edge-torch
import ai_edge_torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Esportare per MediaPipe (dispositivi Android)
edge_model = ai_edge_torch.convert(
    model,
    sample_inputs=(torch.ones(1, 512, dtype=torch.long),),
    quant_config=ai_edge_torch.quantize.QuantConfig(
        generative_weights_dtype=ai_edge_torch.quantize.QuantDtype.AI_EDGE_TORCH_INT4
    )
)
edge_model.export("gemma3n_android.tflite")
"""

# 2. Usare nel codice Android (Kotlin)
"""
// build.gradle.kts
dependencies {
    implementation("com.google.mediapipe:tasks-genai:0.10.22")
}

// LlmInference.kt
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/data/local/tmp/gemma3n_android.tflite")
    .setMaxTokens(512)
    .setPreferredBackend(LlmInference.Backend.GPU)  // usa NPU/GPU
    .build()

val llmInference = LlmInference.createFromOptions(context, options)
val response = llmInference.generateResponse("Come ottimizzare PostgreSQL?")
"""

Side-by-Side Comparative Benchmark

Tests performed on RTX 4070 (12GB VRAM) with fp16. Each task was tested 50 times to have statistically significant averages.

import time
from transformers import pipeline

def benchmark_models(models: dict, tasks: list[dict]) -> dict:
    """Benchmark comparativo su task specifici."""
    results = {}

    for model_name, model_pipeline in models.items():
        model_results = {"tasks": {}}

        for task in tasks:
            times = []
            scores = []

            for _ in range(task.get("repetitions", 10)):
                start = time.time()
                output = model_pipeline(
                    task["prompt"],
                    max_new_tokens=task.get("max_tokens", 256),
                    temperature=0.1
                )
                elapsed = time.time() - start

                generated = output[0]["generated_text"]
                score = task["eval_fn"](generated)
                times.append(elapsed * 1000)
                scores.append(score)

            model_results["tasks"][task["name"]] = {
                "avg_score": sum(scores) / len(scores),
                "avg_latency_ms": sum(times) / len(times),
                "p95_latency_ms": sorted(times)[int(0.95 * len(times))]
            }

        results[model_name] = model_results
    return results

# Risultati osservati (hardware: RTX 4070 12GB, fp16):
benchmark_results = {
    "Phi-4-mini": {
        "math_reasoning": {"score": 0.72, "latency_ms": 1840},
        "python_coding": {"score": 0.63, "latency_ms": 1650},
        "italian_chat": {"score": 0.81, "latency_ms": 1200},
        "instruction_following": {"score": 0.88, "latency_ms": 900},
        "json_output": {"score": 0.92, "latency_ms": 850},
    },
    "Gemma-3n-E4B": {
        "math_reasoning": {"score": 0.67, "latency_ms": 1620},
        "python_coding": {"score": 0.61, "latency_ms": 1580},
        "italian_chat": {"score": 0.84, "latency_ms": 1100},
        "instruction_following": {"score": 0.85, "latency_ms": 870},
        "json_output": {"score": 0.87, "latency_ms": 790},
    }
}

# Analisi: Phi-4-mini vince su math e JSON, Gemma 3n su chat e velocita

Summary Table of the Comparison

Characteristic	Phi-4-mini (3.8B)	Gemma 3n E4B	Winner
Mathematical reasoning	72% (MATH)	67% (MATH)	Phi-4-mini
Python code generation	62.3% (HumanEval)	58.7% (HumanEval)	Phi-4-mini
Conversation in Italian	Excellent	Excellent	Gemma 3n
Speed (tokens/sec RTX 4070)	55 tok/s	63 tok/s	Gemma 3n
Mobile Efficiency (NPU)	Good	Excellent (MatFormer)	Gemma 3n
License	MIT (commercial OK)	Gem ToS (restrictions)	Phi-4-mini
VRAM fp16	7.6 GB	8GB (or 4GB int4)	Similar
Context window	128K tokens	128K tokens	Even

When to Choose Which Model

def choose_slm(use_case: str, constraints: dict) -> str:
    """
    Framework di decisione per scegliere tra Phi-4-mini e Gemma 3n.
    """
    # Vincoli hardware
    if constraints.get("target_platform") == "android_npu":
        return "gemma-3n-e4b"  # progettato per NPU Qualcomm/MediaTek

    if constraints.get("target_platform") == "ios_neural_engine":
        return "gemma-3n-e4b"  # ottimizzato per Apple Neural Engine

    # Licenza
    if constraints.get("commercial_use") and constraints.get("no_usage_restrictions"):
        # Gemma ToS ha restrizioni; MIT di Phi e piu permissiva
        return "phi-4-mini"

    # Task-based selection
    if use_case in ["coding", "math_reasoning", "json_extraction"]:
        return "phi-4-mini"

    if use_case in ["conversational_ai", "multilingual_chat"]:
        return "gemma-3n-e4b"

    if use_case == "fine_tuning_budget":
        # Phi-4-mini: piu semplice da fine-tune con PEFT standard
        return "phi-4-mini"

    # Default per uso generico
    return "phi-4-mini"

# Esempi di decisione:
print(choose_slm("coding", {"commercial_use": True}))         # phi-4-mini
print(choose_slm("chat", {"target_platform": "android_npu"})) # gemma-3n-e4b
print(choose_slm("math_reasoning", {}))                        # phi-4-mini

Conclusions

There is no clear winner: Phi-4-mini is superior for tasks requiring reasoning structured, code and JSON output, with a more permissive MIT license for commercial use. Gemma 3n E4B excels in Italian conversation, has superior inference speed, and and the optimal model for deployment on Android/iOS mobile NPUs.

The next article deals with fine-tuning: how to adapt Phi-4-mini or Qwen 3 to yours domain with QLoRA on 8-12 GB VRAM consumer GPUs, with the complete end-to-end workflow from ingesting the dataset to uploading it to Hugging Face Hub.

Series: Small Language Models

Article 1: SLM in 2026 - Overview and Benchmark
Article 2 (this): Phi-4-mini vs Gemma 3n - Detailed Comparison
Article 3: Fine-tuning with LoRA and QLoRA
Article 4: Quantization for Edge - GGUF, ONNX, INT4
Article 5: Ollama - SLM Locally in 5 Minutes