Hi! I'm

Federico Calò

Software Developer | Technical Writer

I create modern web applications and custom digital tools to help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.

Contact Me

About Me

My passion for computer science was born at the Technical Commercial Institute of Maglie, where I discovered the power of programming and the fascination of creating digital solutions. From the start, I understood that computer science was not just code, but an extraordinary tool for turning ideas into reality.

During my studies in Business Information Systems, I began to interweave computer science and economics, understanding how technology can be the engine of growth for any business. This vision accompanied me to the University of Bari, where I obtained my degree in Computer Science, deepening my technical skills and passion for software development.

Today I put this experience at the service of businesses, professionals and startups, creating tailor-made digital solutions that automate processes, optimize resources and open new business opportunities. Because true innovation begins when technology meets the real needs of people.

My Skills

Data Analysis & Predictive Models

I transform data into strategic insights with in-depth analysis and predictive models for informed decisions

Process Automation

I create custom tools that automate repetitive operations and free up time for value-added activities

Custom Systems

I develop tailor-made software systems, from platform integrations to customized dashboards

const federico = {
  nome: "Federico Calò",
  ruolo: "Sviluppatore Software",
  città: "Bari, Italia",
  missione: "Aiutare attraverso l'informatica",
  passioni: [
    "Codice Pulito",
    "Innovazione",
    "Crescita Continua"
  ]
};

La Mia Missione

Credo fermamente che l'informatica sia lo strumento più potente per trasformare le idee in realtà e migliorare la vita delle persone.

🚀

Democratizzare la Tecnologia

La mia missione è rendere l'informatica accessibile a tutti: dalle piccole imprese locali alle startup innovative, fino ai professionisti che vogliono digitalizzare la propria attività. Ogni realtà merita di sfruttare le potenzialità del digitale.

💡

Unire Informatica ed Economia

Non è solo questione di scrivere codice: è capire come la tecnologia possa generare valore reale. Intrecciando competenze informatiche e visione economica, aiuto le attività a crescere, ottimizzare processi e raggiungere nuovi traguardi di efficienza e redditività.

🎯

Creare Soluzioni su Misura

Ogni attività è unica, e così devono esserlo le soluzioni. Sviluppo strumenti personalizzati che rispondono alle esigenze specifiche di ciascun cliente, automatizzando processi ripetitivi e liberando tempo per ciò che conta davvero: far crescere il business.

Trasforma la Tua Attività con la Tecnologia

December 2024

View

Master SQL

RoadMap.sh

Novembre 2024

View

Oracle Certified Foundations Associate

Oracle

October 2024

View

People Leadership Credential

Connect

Settembre 2024

💻 Languages & Technologies

☕Java

🐍Python

📜JavaScript

🅰️Angular

⚛️React

🔷TypeScript

🗄️SQL

🐘PHP

🎨CSS/SCSS

🔧Node.js

🐳Docker

🌿Git

💼

12/2024 - Presente

Custom Software Engineering Analyst

Accenture

Bari, Puglia, Italia · Ibrida Analisi e sviluppo di sistemi informatici attraverso l'utilizzo di Java e Quarkus in Health and Public Sector. Formazione continua su tecnologie moderne per la creazione di soluzioni software personalizzate ed efficienti e sugli agenti.

💼

06/2022 - 12/2024

Analista software e Back End Developer Associate Consultant

Links Management and Technology SpA

Esperienza nell'analisi di sistemi software as-is e flussi ETL utilizzando PowerCenter. Formazione completata su Spring Boot per lo sviluppo di applicazioni backend moderne e scalabili. Sviluppatore Backend specializzato in Spring Boot, con esperienza in progettazione di database, analisi, sviluppo e testing dei task assegnati.

💼

02/2021 - 10/2021

Programmatore software

Adesso.it (prima era WebScience srl)

Esperienza nell'analisi AS-IS e TO-BE, evoluzioni SEO ed evoluzioni website per migliorare le performance e l'engagement degli utenti.

🎓

2018 - 2025

Laurea in Informatica

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Sistemi Informativi Aziendali

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.

Contattami

Hai un progetto in mente? Parliamone! Compila il form qui sotto e ti risponderò al più presto.

* Campi obbligatori. I tuoi dati saranno utilizzati solo per rispondere alla tua richiesta.

RAG Architecture: Naive, Advanced and Modular RAG Patterns

The term "RAG" actually covers a very broad spectrum of architectures, from the simple pattern to three steps from 2023 to modular systems from 2026 that integrate query routing, reranking, self-RAG and consistency checks. Understanding this evolution is fundamental: the Naive RAG It is quick to implement but produces low-quality retrievals on complex documents; theAdvanced RAG solves specific retrieval problems; The Modular RAG offers maximum flexibility for systems in production.

This guide covers the three architectures with real Python code, comparative quality metrics and criteria for choosing the right level of complexity for your use case.

What You Will Learn

Naive RAG: basic architecture, limits and when it is sufficient
Advanced RAG: pre-retrieval (query rewriting, HyDE), post-retrieval (reranking)
Modular RAG: Routing, self-RAG, CRAG and composable pipelines
RAGAS metrics to compare architectures objectively
Complete Python code for each architecture
Decision guide: when to advance to the next level

Naive RAG: The Basic Pattern

The Naive RAG follows the index-retrieve-generate flow without optimizations:

Index documents with fixed chunks (typically 512-1024 tokens)
Converts the query to embedding and searches for the k most similar chunks
Concatenate the chunks into the prompt and generate the response

# Naive RAG con LangChain — implementazione completa
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader

# --- FASE 1: Indicizzazione ---
loader = DirectoryLoader(
    "./docs",
    glob="**/*.md",
    loader_cls=UnstructuredMarkdownLoader
)
documents = loader.load()

# Chunking fisso — il limite principale del Naive RAG
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=["\n\n", "\n", ".", " "]
)
chunks = splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Qdrant.from_documents(
    chunks, embeddings,
    url="http://localhost:6333",
    collection_name="naive_rag"
)

# --- FASE 2 + 3: Retrieval + Generation ---
NAIVE_RAG_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template="""Rispondi alla domanda basandoti SOLO sul contesto fornito.
Se il contesto non contiene la risposta, dì "Non ho informazioni su questo argomento".

Contesto:
{context}

Domanda: {question}

Risposta:"""
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": NAIVE_RAG_PROMPT},
    return_source_documents=True
)

result = rag_chain.invoke({"query": "Come gestire gli errori di timeout?"})
print(result["result"])

Limits of the Naive RAG: Poor performance on ambiguous queries, chunk retrieval partially relevant, no case management where the recovered documents contradict each other, variable quality with structured documents (tables, code, lists).

Advanced RAG: Pre and Post Retrieval Optimizations

Advanced RAG adds optimizations in the pre- and post-retrieval phases. The most techniques impacting:

Pre-retrieval: Query Rewriting and HyDE

User queries are often ambiguous or poorly worded. Query rewriting uses the LLM to reformulate the query in forms more suitable for semantic search.

# Advanced RAG: Query Rewriting + HyDE (Hypothetical Document Embeddings)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 1. Multi-query: genera query alternative per copertura piu ampia
MULTI_QUERY_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """Sei un esperto di information retrieval.
Genera 3 varianti della query fornita per recuperare documenti rilevanti
da diverse angolazioni. Restituisci solo le query, una per riga."""),
    ("human", "Query originale: {query}")
])

multi_query_chain = MULTI_QUERY_PROMPT | llm | StrOutputParser()

def generate_multiple_queries(query: str) -> list[str]:
    result = multi_query_chain.invoke({"query": query})
    queries = [q.strip() for q in result.strip().split('\n') if q.strip()]
    return [query] + queries[:3]  # query originale + 3 varianti

# 2. HyDE: genera un documento ipotetico che conterrebbe la risposta
HYDE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """Scrivi un breve paragrafo tecnico che risponderebbe
alla seguente domanda, come se fosse tratto da una documentazione ufficiale.
Usa terminologia tecnica precisa."""),
    ("human", "{query}")
])

hyde_chain = HYDE_PROMPT | llm | StrOutputParser()

def hyde_search(query: str, vectorstore, k: int = 5):
    # Genera documento ipotetico
    hypothetical_doc = hyde_chain.invoke({"query": query})

    # Cerca usando il documento ipotetico come query (invece della query diretta)
    results = vectorstore.similarity_search(hypothetical_doc, k=k)
    return results

# 3. Multi-query retrieval con deduplicazione
from langchain.retrievers import MergerRetriever
from langchain_community.document_transformers import EmbeddingsRedundantFilter

def advanced_retrieve(query: str, vectorstore, k: int = 5) -> list:
    queries = generate_multiple_queries(query)

    # Raccogli risultati da tutte le query
    all_docs = []
    for q in queries:
        docs = vectorstore.similarity_search(q, k=k)
        all_docs.extend(docs)

    # Deduplica per contenuto simile
    seen_content = set()
    unique_docs = []
    for doc in all_docs:
        content_hash = hash(doc.page_content[:200])
        if content_hash not in seen_content:
            seen_content.add(content_hash)
            unique_docs.append(doc)

    return unique_docs[:k * 2]  # ritorna il doppio dei risultati per il reranker

Post-retrieval: Reranking with Cross-Encoder

Vector embeddings use a "bi-encoder" representation (separate query and document): and fast but less precise. Cross-encoder reranking (query + document together) improves precision by 15-25% at the cost of additional latency (typically 50-150ms).

# Post-retrieval: Reranking con Cohere Rerank o cross-encoder locale
import cohere
from sentence_transformers import CrossEncoder

# Opzione 1: Cohere Rerank API (managed, accurato)
co = cohere.Client("your-api-key")

def rerank_with_cohere(query: str, documents: list[str], top_n: int = 5) -> list[dict]:
    response = co.rerank(
        query=query,
        documents=documents,
        top_n=top_n,
        model="rerank-v3.5"
    )
    return [
        {"content": documents[r.index], "relevance_score": r.relevance_score}
        for r in response.results
    ]

# Opzione 2: Cross-encoder locale (gratuito, ~100MB)
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_local(query: str, documents: list[str], top_n: int = 5) -> list[dict]:
    # Crea coppie (query, documento) per il cross-encoder
    pairs = [[query, doc] for doc in documents]
    scores = cross_encoder.predict(pairs)

    # Ordina per score decrescente
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return [{"content": doc, "relevance_score": float(score)} for doc, score in ranked[:top_n]]

# Advanced RAG completo: multi-query + HyDE + reranking
def advanced_rag(query: str, vectorstore) -> dict:
    # 1. Retrieval ampliato
    candidates = advanced_retrieve(query, vectorstore, k=8)
    candidate_texts = [doc.page_content for doc in candidates]

    # 2. Reranking
    reranked = rerank_local(query, candidate_texts, top_n=5)

    # 3. Generation con contesto di qualita
    context = "\n\n---\n\n".join([r["content"] for r in reranked])

    response = llm.invoke(f"""Contesto:\n{context}\n\nDomanda: {query}\nRisposta:""")
    return {"answer": response.content, "sources": reranked}

Modular RAG: Modular Architecture

The 2026 Modular RAG treats each stage of the pipeline as an interchangeable module. The patterns most important:

CRAG: Corrective RAG

CRAG adds a relevance classifier: if the retrieved documents have a low score, the system performs a backup web search instead of generating with irrelevant context.

# Modular RAG: CRAG (Corrective RAG) con LangGraph
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
from langchain_community.tools.tavily_search import TavilySearchResults

class RAGState(TypedDict):
    query: str
    documents: list
    relevance_scores: list[float]
    web_results: list
    answer: str
    retrieval_quality: str  # "high" | "low" | "ambiguous"

def retrieve(state: RAGState) -> RAGState:
    """Retrieval dal vector store"""
    docs = vectorstore.similarity_search_with_score(state["query"], k=5)
    documents = [doc for doc, _ in docs]
    scores = [float(score) for _, score in docs]
    return {**state, "documents": documents, "relevance_scores": scores}

def assess_relevance(state: RAGState) -> RAGState:
    """Valuta se i documenti sono sufficientemente rilevanti"""
    avg_score = sum(state["relevance_scores"]) / len(state["relevance_scores"])

    if avg_score > 0.85:
        quality = "high"
    elif avg_score > 0.70:
        quality = "ambiguous"
    else:
        quality = "low"

    return {**state, "retrieval_quality": quality}

def web_search_fallback(state: RAGState) -> RAGState:
    """Fallback: web search quando il retrieval e scarso"""
    search_tool = TavilySearchResults(max_results=3)
    results = search_tool.invoke(state["query"])
    return {**state, "web_results": results}

def generate_answer(state: RAGState) -> RAGState:
    """Genera risposta usando documenti disponibili"""
    if state["retrieval_quality"] == "low" and state["web_results"]:
        context = "\n".join([r["content"] for r in state["web_results"]])
        source_type = "web search"
    else:
        context = "\n".join([doc.page_content for doc in state["documents"]])
        source_type = "knowledge base"

    response = llm.invoke(
        f"Contesto ({source_type}):\n{context}\n\nDomanda: {state['query']}\nRisposta:"
    )
    return {**state, "answer": response.content}

# Routing basato sulla qualita del retrieval
def should_web_search(state: RAGState) -> str:
    return "web_search" if state["retrieval_quality"] == "low" else "generate"

# Costruzione del grafo
graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("assess_relevance", assess_relevance)
graph.add_node("web_search", web_search_fallback)
graph.add_node("generate", generate_answer)

graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "assess_relevance")
graph.add_conditional_edges(
    "assess_relevance",
    should_web_search,
    {"web_search": "web_search", "generate": "generate"}
)
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)

crag = graph.compile()

# Esecuzione
result = crag.invoke({"query": "Qual e la versione piu recente di Qiskit?"})
print(result["answer"])

Quality Comparison: Naive vs Advanced vs Modular

Benchmark su dataset di test enterprise (500 domande, base di conoscenza 50K docs)

Metrica             | Naive RAG | Advanced RAG | Modular RAG (CRAG)
--------------------|-----------|--------------|--------------------
Faithfulness        | 0.71      | 0.88         | 0.92
Answer Relevancy    | 0.74      | 0.86         | 0.89
Context Recall      | 0.65      | 0.81         | 0.84
Context Precision   | 0.72      | 0.87         | 0.88
--------------------|-----------|--------------|--------------------
Latenza p50         | 850ms     | 1.4s         | 1.8s (con web fallback: 3.2s)
Costo per query     | $0.003    | $0.007       | $0.009 (avg)
--------------------|-----------|--------------|--------------------
"Hallucination rate"| 18%       | 6%           | 4%
Domande senza risp. | 12%       | 8%           | 3% (web fallback)

When to Advance to the Next Level

Naive -> Advanced: if faithfulness < 0.80 or users report responses irrelevant frequent; additional cost ~2x
Advanced -> Modular: If your knowledge base only covers a subset of the topics requested, or if the queries range across heterogeneous topics; additional cost ~1.3x
Keep Naive: if your knowledge base is well structured, the queries are homogeneous and faithfulness > 0.85 already with the basic pattern

Conclusions

The right RAG architecture depends on the complexity of your use case. Always start with Naive RAG, measure with RAGAS and advance only when the data warrants it. Add complexity without measurement leads to over-engineered systems that cost more without improvements measurable.

The next article delves into chunking strategies — the retrieval pipeline component which has the greatest impact on the quality of the Naive RAG and which is often overlooked.