I build modern web applications and custom digital tools that help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.
My passion for computer science was born in the classrooms of the Istituto Tecnico Commerciale di Maglie, where I discovered the power of programming and the appeal of creating digital solutions. I understood right away that computer science was not just code, but an extraordinary tool for turning ideas into reality.
During my secondary studies in Business Information Systems, I began to weave together computer science and economics, understanding how technology can drive growth for any business. This vision accompanied me to the Università degli Studi di Bari, where I earned my degree in Computer Science, deepening my technical skills and my passion for software development.
Today I put this experience at the service of companies, professionals, and startups, creating tailor-made digital solutions that automate processes, optimize resources, and open new business opportunities. Because true innovation begins when technology meets people's real needs.
My Skills
Data Analysis & Forecasting Models
I turn data into strategic insights with in-depth analysis and predictive models for informed decisions
Process Automation
I build custom tools that automate repetitive operations and free up time for value-added activities
Custom Systems
I develop tailor-made software systems, from platform integrations to personalized dashboards
I firmly believe that computer science is the most powerful tool for turning ideas into reality and improving people's lives.
🚀
Democratizing Technology
My mission is to make computing accessible to everyone: from small local businesses to innovative startups, to professionals who want to digitize their activity. Every business deserves to harness the potential of digital.
💡
Combining Computer Science and Economics
It is not just about writing code: it is about understanding how technology can generate real value. By weaving together technical skills and economic vision, I help businesses grow, streamline processes, and reach new levels of efficiency and profitability.
🎯
Creating Tailor-Made Solutions
Every business is unique, and so should its solutions be. I develop custom tools that address each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.
Transform Your Business with Technology
Whether you run a shop, a professional practice, or a company, I can help you harness the potential of computing to work better, faster, and smarter.
My academic background and the technologies I work with
Professional Certifications
8 certifications earned
Reinvention With Agentic AI Learning Program · Anthropic · December 2024
Agentic AI Fluency · Anthropic · December 2024
AI Fluency for Students · Anthropic · December 2024
AI Fluency: Framework and Foundations · Anthropic · December 2024
Claude with the Anthropic API · Anthropic · December 2024
Master SQL · RoadMap.sh · November 2024
Oracle Certified Foundations Associate · Oracle · October 2024
People Leadership Credential · Connect · September 2024
💻 Languages & Technologies
☕Java
🐍Python
📜JavaScript
🅰️Angular
⚛️React
🔷TypeScript
🗄️SQL
🐘PHP
🎨CSS/SCSS
🔧Node.js
🐳Docker
🌿Git
💼
12/2024 - Present
Custom Software Engineering Analyst
Accenture
Bari, Puglia, Italy · Hybrid
Analysis and development of IT systems using Java and Quarkus in the Health and Public Sector. Continuous training on modern technologies for building custom, efficient software solutions and on AI agents.
💼
06/2022 - 12/2024
Software Analyst and Back-End Developer Associate Consultant
Links Management and Technology SpA
Experience analyzing as-is software systems and ETL flows using PowerCenter. Completed training on Spring Boot for developing modern, scalable backend applications. Backend developer specializing in Spring Boot, with experience in database design and in the analysis, development, and testing of assigned tasks.
💼
02/2021 - 10/2021
Software Developer
Adesso.it (formerly WebScience srl)
Experience in AS-IS and TO-BE analysis, SEO improvements, and website enhancements to boost performance and user engagement.
🎓
2018 - 2025
Bachelor's Degree in Computer Science
Università degli Studi di Bari Aldo Moro
Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.
📚
2013 - 2018
Diploma - Business Information Systems
Istituto Tecnico Commerciale di Maglie
Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.
Knowledge Graphs and AI: Integrating Structured Knowledge into LLMs
Large Language Models are remarkable at generating fluent text,
but they suffer from a fundamental limitation: the knowledge they contain is
implicit, distributed across model parameters, difficult to update,
and impossible to query in a structured way. "Give me all the people who
work at AI companies founded after 2020" is trivial in a knowledge graph,
but impossible to guarantee with an LLM.
Knowledge Graphs (KGs) represent knowledge as graphs of entities
and relationships: explicit structure that is queryable, updatable, and verifiable.
Integrating KGs with LLMs — the GraphRAG paradigm — produces systems
capable of structured reasoning that traditional RAG cannot offer.
In this article we build GraphRAG systems with Neo4j, explore automatic graph
extraction from text using LLMs, and see how to query knowledge graphs
to enrich RAG systems.
What You Will Learn
Knowledge graph fundamentals: nodes, relationships, properties, RDF and Property Graph
Neo4j: the model, Cypher query language, and LangChain integration
Automatic KG extraction from unstructured text using LLMs
GraphRAG: combining graph traversal and vector retrieval
KG for RAG: enriching chunks with entity relationships
Multi-hop reasoning on knowledge graphs
Wikidata and public Knowledge Graphs for enrichment
Best practices for building maintainable KGs in production
1. Knowledge Graph Fundamentals
A knowledge graph is a representation of knowledge as a
graph where nodes represent entities (people, organizations,
concepts, events) and edges represent relationships between them.
Each triple (subject, predicate, object) encodes a fact.
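At this level of abstraction a knowledge graph needs no database at all. A short Python sketch (the entities are made up for illustration) shows a list of triples answering the structured query from the introduction:

```python
# A knowledge graph, at its simplest: (subject, predicate, object) triples.
triples = [
    ("Ada", "WORKS_AT", "AcmeAI"),
    ("Bob", "WORKS_AT", "OldCorp"),
    ("AcmeAI", "SECTOR", "AI"),
    ("AcmeAI", "FOUNDED", 2022),
    ("OldCorp", "SECTOR", "AI"),
    ("OldCorp", "FOUNDED", 2010),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# "People who work at AI companies founded after 2020" -- a structured
# query that is trivial here but impossible to guarantee with a bare LLM.
people = [
    s for s, p, company in triples
    if p == "WORKS_AT"
    and "AI" in objects(company, "SECTOR")
    and any(year > 2020 for year in objects(company, "FOUNDED"))
]
print(people)  # ['Ada']
```

Real KG stores add indexing, persistence, and a query language on top, but the data model is exactly this.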
There are two main knowledge graph models, each with different tradeoffs:
RDF vs Property Graph Comparison

| Dimension | RDF/SPARQL | Property Graph (Neo4j) |
| --- | --- | --- |
| Model | Standardized triples (S, P, O) | Nodes and edges with arbitrary properties |
| Standard | W3C standard, interoperable | Proprietary but more flexible |
| Query language | SPARQL (complex) | Cypher (more readable) |
| Properties on relationships | Complicated (reification) | Native and simple |
| AI ecosystem | Wikidata, DBpedia, Schema.org | Neo4j (LangChain integration) |
| When to use | Open data, interoperability | AI applications, GraphRAG |
1.1 Why Knowledge Graphs Matter for AI
The combination of knowledge graphs with LLMs addresses three critical limitations
of pure vector-based RAG systems:
Hallucination prevention: factual relationships stored in a KG
are verifiable and can be used to ground LLM outputs, reducing confabulation.
Multi-hop reasoning: "Which products are made by companies that
compete with OpenAI?" requires traversing multiple relationship hops — trivial
in a graph query, unreliable with pure semantic search.
Updateability: you can add or modify facts in a KG without
retraining the model. Critical knowledge (pricing, org changes, product versions)
stays current.
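Multi-hop reasoning is nothing more than repeated edge traversal. A toy sketch over a hand-built adjacency structure (the companies and products are illustrative, not a real dataset):

```python
# Toy graph: relation type -> list of (source, target) edges.
edges = {
    "COMPETES_WITH": [("OpenAI", "Anthropic"), ("OpenAI", "Mistral")],
    "DEVELOPS": [("Anthropic", "Claude"), ("Mistral", "Mistral Large"),
                 ("OpenAI", "GPT-4")],
}

def targets(relation, source):
    """One hop: follow `relation` edges out of `source`."""
    return [t for s, t in edges[relation] if s == source]

# Two hops: products developed by companies that compete with OpenAI.
products = [
    product
    for competitor in targets("COMPETES_WITH", "OpenAI")
    for product in targets("DEVELOPS", competitor)
]
print(products)  # ['Claude', 'Mistral Large']
```

A semantic search over text chunks has no reliable way to compose these two hops; the graph makes the composition explicit.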
2. Neo4j: Setup and the Cypher Query Language
Neo4j is the most widely adopted graph database for AI applications,
with excellent integration with LangChain. The Cypher language
uses an intuitive ASCII-art syntax to express graph patterns.
Neo4j Setup and Basic Cypher Queries
from neo4j import GraphDatabase
from typing import List, Dict, Any, Optional
import os

class Neo4jKnowledgeGraph:
    """Python interface for a Neo4j knowledge graph"""

    def __init__(
        self,
        uri: str = "bolt://localhost:7687",
        user: str = "neo4j",
        password: str = "password"
    ):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def execute_query(self, query: str, parameters: dict = None) -> List[Dict]:
        """Execute a Cypher query and return results"""
        with self.driver.session() as session:
            result = session.run(query, parameters or {})
            return [record.data() for record in result]

    def create_entity(self, label: str, properties: Dict) -> str:
        """Create a node with a label and properties"""
        props_str = ", ".join(f"{k}: ${k}" for k in properties.keys())
        query = f"CREATE (n:{label} {{{props_str}}}) RETURN id(n) as id"
        result = self.execute_query(query, properties)
        return result[0]["id"] if result else None

    def create_relationship(
        self,
        from_label: str, from_props: Dict,
        rel_type: str, rel_props: Dict,
        to_label: str, to_props: Dict
    ):
        """Create a relationship between two nodes"""
        from_match = " AND ".join(f"a.{k} = $from_{k}" for k in from_props)
        to_match = " AND ".join(f"b.{k} = $to_{k}" for k in to_props)
        rel_props_str = ", ".join(f"{k}: $rel_{k}" for k in rel_props) if rel_props else ""
        params = {
            **{f"from_{k}": v for k, v in from_props.items()},
            **{f"to_{k}": v for k, v in to_props.items()},
            **{f"rel_{k}": v for k, v in rel_props.items()}
        }
        query = f"""
        MATCH (a:{from_label}) WHERE {from_match}
        MATCH (b:{to_label}) WHERE {to_match}
        MERGE (a)-[r:{rel_type} {{{rel_props_str}}}]->(b)
        RETURN type(r) as rel_type"""
        return self.execute_query(query, params)

    def upsert_entity(self, label: str, match_props: Dict, set_props: Dict = None):
        """Upsert: create if not exists, update if it does"""
        match_str = ", ".join(f"{k}: ${k}" for k in match_props)
        query = f"MERGE (n:{label} {{{match_str}}})"
        params = dict(match_props)
        if set_props:
            set_str = ", ".join(f"n.{k} = $set_{k}" for k in set_props)
            query += f" ON CREATE SET {set_str} ON MATCH SET {set_str}"
            params.update({f"set_{k}": v for k, v in set_props.items()})
        query += " RETURN n"
        return self.execute_query(query, params)

# Examples of advanced Cypher queries
CYPHER_EXAMPLES = {
    # Find all AI companies founded after 2020
    "recent_companies": """
        MATCH (c:Company {sector: 'AI'})
        WHERE c.founded > 2020
        RETURN c.name, c.founded
        ORDER BY c.founded DESC""",
    # Find shortest path between two people (degrees of separation)
    "social_path": """
        MATCH path = shortestPath(
            (p1:Person {name: $person1})-[*..6]-(p2:Person {name: $person2})
        )
        RETURN path, length(path) as degrees""",
    # Find related entities cluster (community detection)
    "related_entities": """
        MATCH (n:Company)-[r]-(related)
        WHERE n.name = $company_name
        RETURN related, type(r), n
        LIMIT 50""",
    # Multi-hop: products made by companies competing with X
    "competitor_products": """
        MATCH (c1:Company)-[:COMPETES_WITH]->(c2:Company)
        WHERE c1.name = $company_name
        MATCH (c2)-[:DEVELOPS]->(p:Product)
        RETURN DISTINCT p.name, p.category, c2.name as developed_by"""
}
2.1 Schema Design in Neo4j
Before populating a knowledge graph, defining a clear ontology prevents costly
restructuring later. The following constraints and indexes encode the key design principles.
Neo4j Constraints and Indexes for Production
# Create uniqueness constraints and indexes
# Run these once during database initialization
SCHEMA_SETUP_QUERIES = [
    # Uniqueness constraints prevent duplicate entities
    "CREATE CONSTRAINT person_name IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE",
    "CREATE CONSTRAINT company_name IF NOT EXISTS FOR (c:Company) REQUIRE c.name IS UNIQUE",
    "CREATE CONSTRAINT product_name IF NOT EXISTS FOR (p:Product) REQUIRE p.name IS UNIQUE",
    # Full-text index for fuzzy name matching
    """CREATE FULLTEXT INDEX entity_names IF NOT EXISTS
    FOR (n:Person|Company|Product|Technology)
    ON EACH [n.name, n.description]""",
    # Range index for temporal queries
    "CREATE INDEX company_founded IF NOT EXISTS FOR (c:Company) ON (c.founded)",
    # Composite index for typed entity lookup
    "CREATE INDEX entity_type_name IF NOT EXISTS FOR (n:Person) ON (n.name, n.role)"
]

def initialize_schema(kg: Neo4jKnowledgeGraph):
    """Initialize schema constraints and indexes"""
    for query in SCHEMA_SETUP_QUERIES:
        try:
            kg.execute_query(query)
            print(f"Schema query OK: {query[:60]}...")
        except Exception as e:
            print(f"Schema query failed (may already exist): {e}")
3. Automatic Knowledge Graph Extraction from Text
Building a knowledge graph manually is expensive and time-consuming. Modern LLMs
allow automatic extraction of entities and relationships from unstructured text,
populating the graph semi-automatically. This is one of the most powerful
applications of LLMs beyond text generation.
KG Extraction with LLM and Pydantic Schemas
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Dict, List, Optional

# Structured extraction schema
class Entity(BaseModel):
    """An entity extracted from text"""
    name: str = Field(description="The entity name")
    entity_type: str = Field(
        description="Type: Person, Company, Product, Technology, Location, Event, Concept"
    )
    description: Optional[str] = Field(
        description="Brief description of the entity", default=None
    )
    properties: dict = Field(
        description="Additional properties (e.g. founded_year, role)",
        default_factory=dict
    )

class Relationship(BaseModel):
    """A relationship between two entities"""
    source: str = Field(description="Name of the source entity")
    target: str = Field(description="Name of the target entity")
    relationship_type: str = Field(
        description="Relationship type (e.g. WORKS_AT, FOUNDED, COMPETES_WITH)"
    )
    properties: dict = Field(
        description="Relationship properties (e.g. since_year)",
        default_factory=dict
    )

class KnowledgeGraphExtraction(BaseModel):
    """Result of knowledge graph extraction from text"""
    entities: List[Entity] = Field(description="Entities extracted from text")
    relationships: List[Relationship] = Field(description="Relationships between entities")

class LLMKnowledgeGraphExtractor:
    """Extracts knowledge graphs from text using LLMs"""

    def __init__(self, model: str = "gpt-4o-mini"):
        llm = ChatOpenAI(model=model, temperature=0)
        self.structured_llm = llm.with_structured_output(KnowledgeGraphExtraction)
        self.extraction_prompt = ChatPromptTemplate.from_template("""
Extract entities and relationships from the following text to build a knowledge graph.

Entity types to extract: Person, Company, Product, Technology, Location, Event, Concept
Common relationship types: WORKS_AT, FOUNDED, DEVELOPS, USES, COMPETES_WITH, PART_OF,
LOCATED_IN, ACQUIRED_BY, INVESTED_IN, AUTHORED_BY

Text to analyze:
{text}

Extract ALL entities and relationships mentioned, including implicit ones.
For properties, only extract those explicitly mentioned in the text.""")

    def extract(self, text: str) -> KnowledgeGraphExtraction:
        """Extract entities and relationships from text"""
        return self.structured_llm.invoke(
            self.extraction_prompt.format_messages(text=text)
        )

    def extract_and_store(
        self,
        text: str,
        neo4j_kg: Neo4jKnowledgeGraph,
        source_metadata: Dict = None
    ) -> dict:
        """Extract from text and store directly in Neo4j"""
        extraction = self.extract(text)
        stored_entities = 0
        stored_relationships = 0

        # Store entities
        for entity in extraction.entities:
            props = {
                "name": entity.name,
                **(entity.properties or {}),
            }
            if entity.description:
                props["description"] = entity.description
            if source_metadata:
                props["source"] = source_metadata.get("source", "")
            neo4j_kg.upsert_entity(
                label=entity.entity_type,
                match_props={"name": entity.name},
                set_props=props
            )
            stored_entities += 1

        # Store relationships
        for rel in extraction.relationships:
            # Verify entities exist before creating relationship
            source_entity = next(
                (e for e in extraction.entities if e.name == rel.source), None
            )
            target_entity = next(
                (e for e in extraction.entities if e.name == rel.target), None
            )
            if source_entity and target_entity:
                neo4j_kg.create_relationship(
                    from_label=source_entity.entity_type,
                    from_props={"name": rel.source},
                    rel_type=rel.relationship_type,
                    rel_props=rel.properties or {},
                    to_label=target_entity.entity_type,
                    to_props={"name": rel.target}
                )
                stored_relationships += 1

        return {
            "entities_found": len(extraction.entities),
            "relationships_found": len(extraction.relationships),
            "entities_stored": stored_entities,
            "relationships_stored": stored_relationships
        }

# Example usage
extractor = LLMKnowledgeGraphExtractor()
kg = Neo4jKnowledgeGraph()

text = """
OpenAI, founded by Sam Altman and Elon Musk in 2015, developed GPT-4 and ChatGPT.
The company received a $10 billion investment from Microsoft in 2023.
Anthropic, founded by former OpenAI employees including Dario Amodei, develops Claude,
a model that directly competes with ChatGPT.
"""

result = extractor.extract_and_store(text, kg, {"source": "news_article.txt"})
print(f"Extracted: {result['entities_found']} entities, {result['relationships_found']} relationships")
3.1 Handling Extraction Quality
LLM-based extraction is powerful but imperfect. Entity co-reference (multiple
mentions of the same entity with different names) and hallucinated relationships
are the two most common failure modes. Here is a validation layer:
Entity Resolution and Extraction Validation
from difflib import SequenceMatcher
from typing import Dict

class EntityResolver:
    """
    Resolves entity co-references and deduplicates before KG storage.
    Example: "OpenAI", "Open AI", "the company" -> "OpenAI"
    """

    def __init__(self, similarity_threshold: float = 0.85):
        self.threshold = similarity_threshold
        self._known_entities: Dict[str, str] = {}  # name variant -> canonical name

    def _similarity(self, a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def resolve(self, entity_name: str) -> str:
        """Return canonical name for an entity"""
        # Check exact match first
        if entity_name in self._known_entities:
            return self._known_entities[entity_name]
        # Fuzzy match against known entities
        for known in list(self._known_entities):
            if self._similarity(entity_name, known) >= self.threshold:
                # Map this variant to the canonical form
                self._known_entities[entity_name] = self._known_entities[known]
                return self._known_entities[known]
        # New entity: register with itself as canonical
        self._known_entities[entity_name] = entity_name
        return entity_name

    def resolve_extraction(
        self,
        extraction: KnowledgeGraphExtraction
    ) -> KnowledgeGraphExtraction:
        """Resolve all entity names in an extraction result"""
        resolved_entities = []
        name_map: Dict[str, str] = {}
        for entity in extraction.entities:
            canonical = self.resolve(entity.name)
            name_map[entity.name] = canonical
            # Keep exactly one entity per canonical name
            if canonical not in [e.name for e in resolved_entities]:
                resolved_entities.append(
                    Entity(
                        name=canonical,
                        entity_type=entity.entity_type,
                        description=entity.description,
                        properties=entity.properties
                    )
                )

        resolved_relationships = []
        for rel in extraction.relationships:
            resolved_relationships.append(
                Relationship(
                    source=name_map.get(rel.source, rel.source),
                    target=name_map.get(rel.target, rel.target),
                    relationship_type=rel.relationship_type,
                    properties=rel.properties
                )
            )

        return KnowledgeGraphExtraction(
            entities=resolved_entities,
            relationships=resolved_relationships
        )
Production Warning: Extraction Accuracy
LLM extraction accuracy for entities is typically 85-92% on domain text.
For relationships it drops to 70-80%. Always implement a human review
loop for high-stakes knowledge (medical, legal, financial). Use
gpt-4o instead of gpt-4o-mini for critical
extraction tasks — the accuracy difference on complex relational text
is significant (approximately +12% on relationship extraction benchmarks).
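Before any human review, a cheap structural gate already catches many bad extractions: drop relationships whose type falls outside your ontology or whose endpoints were never extracted as entities. A minimal sketch, where the whitelist and the sample tuples are illustrative assumptions:

```python
# Relationship types accepted by the (assumed) ontology.
ALLOWED_RELATIONS = {
    "WORKS_AT", "FOUNDED", "DEVELOPS", "USES", "COMPETES_WITH",
    "PART_OF", "LOCATED_IN", "ACQUIRED_BY", "INVESTED_IN", "AUTHORED_BY",
}

def gate_relationships(entity_names, relationships):
    """Split raw (source, type, target) tuples into accepted vs needs-review."""
    accepted, review = [], []
    for source, rel_type, target in relationships:
        if rel_type in ALLOWED_RELATIONS and source in entity_names and target in entity_names:
            accepted.append((source, rel_type, target))
        else:
            review.append((source, rel_type, target))
    return accepted, review

entities = {"OpenAI", "Microsoft", "Sam Altman"}
raw = [
    ("Sam Altman", "FOUNDED", "OpenAI"),      # passes
    ("Microsoft", "INVESTED_IN", "OpenAI"),   # passes
    ("OpenAI", "RELATED_TO", "Microsoft"),    # unknown type -> review
    ("OpenAI", "DEVELOPS", "GPT-4"),          # endpoint not extracted -> review
]
accepted, review = gate_relationships(entities, raw)
print(len(accepted), len(review))  # 2 2
```

The review bucket is then the input to the human loop, which keeps reviewer load proportional to the suspicious fraction rather than the whole extraction.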
4. GraphRAG: Combining Graph and Vector Retrieval
GraphRAG is the paradigm that combines traditional semantic
search (vector search) with knowledge graph traversal. For questions that require
reasoning about relationships between entities, GraphRAG significantly outperforms
classic RAG. Microsoft's GraphRAG paper (2024) showed up to 40% improvement on
community-level questions compared to naive RAG.
GraphRAG System with LangChain and Neo4j
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

class GraphRAGSystem:
    """
    GraphRAG system combining:
    1. Vector retrieval for semantic questions
    2. Cypher queries on Neo4j for structured questions
    3. LLM to synthesize both sources
    """

    def __init__(self, neo4j_url: str, username: str, password: str, vector_retriever):
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
        self.vector_retriever = vector_retriever

        # Neo4j connection for LangChain
        self.graph = Neo4jGraph(
            url=neo4j_url,
            username=username,
            password=password
        )

        # Chain to automatically generate and execute Cypher queries
        self.cypher_chain = GraphCypherQAChain.from_llm(
            cypher_llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
            qa_llm=self.llm,
            graph=self.graph,
            verbose=True,
            return_intermediate_steps=True,
            allow_dangerous_requests=True  # Required for auto-generated queries
        )

        # Router to decide which source to use
        self.router_chain = (
            ChatPromptTemplate.from_template("""
Analyze this question and decide the best retrieval strategy.

Question: {question}

Choose ONE strategy:
- "graph": the question requires entity relationships, counts, paths, or specific attributes
- "vector": the question requires explanations, concepts, procedures, or narrative text
- "hybrid": the question benefits from both sources

Reply ONLY with: graph, vector, or hybrid""")
            | self.llm
        )

    def _classify_query(self, question: str) -> str:
        """Classify query type"""
        result = self.router_chain.invoke({"question": question})
        strategy = result.content.strip().lower()
        return strategy if strategy in ["graph", "vector", "hybrid"] else "vector"

    def query(self, question: str) -> dict:
        """Answer the question using the optimal strategy"""
        strategy = self._classify_query(question)
        print(f"Selected strategy: {strategy}")

        graph_context = ""
        vector_context = ""

        if strategy in ["graph", "hybrid"]:
            try:
                # Auto-generate and execute Cypher query
                graph_result = self.cypher_chain.invoke({"query": question})
                graph_context = str(graph_result.get("result", ""))
            except Exception as e:
                graph_context = f"Graph query error: {e}"

        if strategy in ["vector", "hybrid"]:
            docs = self.vector_retriever.invoke(question)
            vector_context = "\n".join(d.page_content for d in docs[:3])

        # Final synthesis (sections built outside the f-string: backslashes
        # inside f-string expressions are invalid before Python 3.12)
        graph_section = f"Data from knowledge graph:\n{graph_context}\n" if graph_context else ""
        vector_section = f"Relevant documents:\n{vector_context}\n" if vector_context else ""
        synthesis_prompt = f"""Question: {question}
{graph_section}{vector_section}
Answer comprehensively based on the available information."""

        final_answer = self.llm.invoke(synthesis_prompt).content

        return {
            "answer": final_answer,
            "strategy": strategy,
            "graph_context": graph_context,
            "vector_context": vector_context[:200] if vector_context else ""
        }

# Examples showing GraphRAG advantages by query type
graph_rag_examples = [
    # Structural question: best answered with graph
    "How many AI companies were founded after 2020?",
    # Relational question: best with graph
    "Who are the people working for companies that compete with OpenAI?",
    # Semantic question: best with vector
    "How does the attention mechanism in transformers work?",
    # Hybrid question: benefits from both
    "What is Anthropic's product development strategy?"
]
4.1 When GraphRAG Outperforms Standard RAG
Query Type Performance Comparison

| Query Type | Standard RAG | GraphRAG | Best Approach |
| --- | --- | --- | --- |
| Factual lookup ("When was X founded?") | Good | Excellent | Graph |
| Multi-hop ("Who works at X's competitors?") | Poor | Excellent | Graph |
| Aggregation ("How many companies in sector Y?") | Poor | Excellent | Graph |
| Conceptual explanation | Excellent | Poor | Vector |
| Procedural ("How to implement X?") | Excellent | Poor | Vector |
| Entity overview with context | Good | Good | Hybrid |
5. Knowledge Graph Enrichment for RAG
Even without implementing full GraphRAG, a knowledge graph can significantly
enrich a traditional RAG system: by expanding queries with related entities,
filtering documents by relevant relationships, or adding structured context
to retrieved chunks.
KG-Enhanced RAG: Query Expansion via Graph
class KGEnhancedRetriever:
    """
    Retriever that uses the knowledge graph to expand queries
    with related entities before performing vector search.
    """

    def __init__(self, kg: Neo4jKnowledgeGraph, vector_retriever, llm):
        self.kg = kg
        self.retriever = vector_retriever
        self.llm = llm

    def extract_entities_from_query(self, query: str) -> List[str]:
        """Extract entities from query using NER"""
        prompt = f"""Extract entity names (people, organizations, products, technologies)
from the following query. Return only the names, one per line.

Query: {query}"""
        result = self.llm.invoke(prompt).content
        entities = [e.strip() for e in result.split('\n') if e.strip()]
        return entities

    def get_related_entities(self, entity_name: str, max_hops: int = 2) -> List[str]:
        """Get related entities from the graph"""
        query = f"""
        MATCH (n)-[*1..{max_hops}]-(related)
        WHERE n.name CONTAINS $entity_name
        RETURN DISTINCT related.name as name
        LIMIT 20"""
        results = self.kg.execute_query(query, {"entity_name": entity_name})
        return [r["name"] for r in results if r["name"]]

    def enhanced_retrieve(self, query: str, top_k: int = 5) -> list:
        """
        Retrieve documents with KG-based query expansion.
        1. Extract entities from the query
        2. Find related entities in the graph
        3. Expand the query with related entities
        4. Perform vector search on the expanded query
        """
        # Step 1: Extract entities from query
        entities = self.extract_entities_from_query(query)
        print(f"Entities found: {entities}")

        # Step 2: Find related entities
        all_related = set()
        for entity in entities[:3]:  # Limit to 3 entities
            related = self.get_related_entities(entity)
            all_related.update(related[:5])  # Max 5 related per entity

        # Step 3: Expand query
        if all_related:
            expansion = ", ".join(list(all_related)[:10])
            expanded_query = f"{query} [Related entities: {expansion}]"
            print(f"Query expanded with: {expansion}")
        else:
            expanded_query = query

        # Step 4: Vector search on expanded query
        docs = self.retriever.invoke(expanded_query)
        return docs[:top_k]

    def get_entity_context(self, entity_name: str) -> str:
        """Get structured context for an entity from the graph"""
        query = """
        MATCH (n {name: $name})
        OPTIONAL MATCH (n)-[r]->(related)
        RETURN n, type(r) as rel_type, related.name as related_name
        LIMIT 20"""
        results = self.kg.execute_query(query, {"name": entity_name})
        if not results:
            return ""
        lines = [f"Entity: {entity_name}"]
        for r in results:
            if r.get("rel_type") and r.get("related_name"):
                lines.append(f"  -> {r['rel_type']}: {r['related_name']}")
        return "\n".join(lines)

    def retrieve_with_graph_context(self, query: str, top_k: int = 5) -> list:
        """
        Retrieve documents AND attach relevant graph context
        to each document for richer LLM grounding.
        """
        docs = self.enhanced_retrieve(query, top_k)
        entities = self.extract_entities_from_query(query)

        # Attach graph context to each document's metadata
        enriched_docs = []
        for doc in docs:
            entity_contexts = []
            for entity in entities[:2]:
                ctx = self.get_entity_context(entity)
                if ctx:
                    entity_contexts.append(ctx)
            enriched_doc = doc.copy()
            if entity_contexts:
                enriched_doc.metadata["graph_context"] = "\n\n".join(entity_contexts)
            enriched_docs.append(enriched_doc)
        return enriched_docs
6. Public Knowledge Graphs: Wikidata and Schema.org
Building a KG from scratch is expensive. Public knowledge graphs like
Wikidata (100M+ entities) and DBpedia
provide a foundation you can extend with domain-specific knowledge.
LangChain provides a native Wikidata tool:
Wikidata Integration for KG Enrichment
import requests
from langchain.tools import WikidataQueryRun
from langchain_community.utilities import WikidataAPIWrapper

# Use LangChain's Wikidata tool for agent-based KG queries
wikidata_tool = WikidataQueryRun(api_wrapper=WikidataAPIWrapper())

# Direct SPARQL queries to Wikidata endpoint
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def query_wikidata_sparql(sparql_query: str) -> List[Dict]:
    """Execute a SPARQL query against Wikidata"""
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "KG-RAG-System/1.0"
    }
    params = {
        "query": sparql_query,
        "format": "json"
    }
    response = requests.get(WIKIDATA_ENDPOINT, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()

    results = []
    for binding in data.get("results", {}).get("bindings", []):
        row = {k: v.get("value", "") for k, v in binding.items()}
        results.append(row)
    return results

# Example: find CEOs of IT companies
CEO_QUERY = """
SELECT ?companyLabel ?ceoLabel ?foundedDate WHERE {
  ?company wdt:P31 wd:Q4830453.    # instance of business
  ?company wdt:P169 ?ceo.          # has CEO
  ?company wdt:P571 ?foundedDate.  # founding date
  ?company wdt:P452 wd:Q11661.     # industry: information technology
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50"""
class WikidataKGEnricher:
    """Enriches a local Neo4j KG with data from Wikidata"""

    def __init__(self, kg: Neo4jKnowledgeGraph):
        self.kg = kg

    def enrich_company(self, company_name: str) -> dict:
        """Fetch Wikidata data for a company and store in local KG"""
        # Double braces escape literal { } inside the f-string
        sparql = f"""
        SELECT ?companyLabel ?foundedDate ?ceoLabel ?countryLabel WHERE {{
          ?company rdfs:label "{company_name}"@en.
          OPTIONAL {{ ?company wdt:P571 ?foundedDate. }}
          OPTIONAL {{ ?company wdt:P169 ?ceo. ?ceo rdfs:label ?ceoLabel. FILTER(LANG(?ceoLabel)="en") }}
          OPTIONAL {{ ?company wdt:P17 ?country. }}
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }} LIMIT 1"""
        results = query_wikidata_sparql(sparql)
        if not results:
            return {"status": "not_found"}

        row = results[0]
        enrichment = {
            "wikidata_enriched": True,
            "founded": row.get("foundedDate", "")[:4] if row.get("foundedDate") else None,
            "country": row.get("countryLabel", ""),
            "ceo": row.get("ceoLabel", "")
        }

        # Update local KG
        self.kg.upsert_entity(
            label="Company",
            match_props={"name": company_name},
            set_props={k: v for k, v in enrichment.items() if v}
        )
        return {"status": "enriched", "data": enrichment}
7. Best Practices and Anti-Patterns
Knowledge Graph Best Practices for AI
Start small and iterate: do not try to build the perfect KG
from scratch. Start with the entities and relationships most important to your
use case, then expand incrementally as you understand your query patterns better.
Define a clear ontology first: before populating the graph,
define node types and relationship types. A poorly designed ontology is
extremely costly to change after the graph has been populated at scale.
Validate automatic extraction: LLMs make mistakes in extraction.
Implement a human review process for critical data, especially in the early
stages before you trust the extraction pipeline.
Use MERGE, not CREATE: in Neo4j, always use MERGE for entities
to avoid duplicates. CREATE always creates a new node even if one already exists.
Index search properties: create Neo4j indexes on properties
used in WHERE clauses (e.g. name, date). Without indexes, queries on large
graphs become unacceptably slow.
Version your ontology: as your domain evolves, relationship
types and entity schemas will change. Use property versioning or schema migration
scripts, similar to database migrations in relational DBs.
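The MERGE-vs-CREATE point is easy to see by mimicking both semantics over an in-memory node store. This illustrates the behavior, not how Neo4j implements it:

```python
nodes = []  # each node: (label, name)

def create(label, name):
    """Like Cypher CREATE: always adds a new node, even if an identical one exists."""
    nodes.append((label, name))

def merge(label, name):
    """Like Cypher MERGE: match first, create only if absent."""
    if (label, name) not in nodes:
        nodes.append((label, name))

create("Company", "Anthropic")
create("Company", "Anthropic")   # a duplicate slips in silently
merge("Company", "Anthropic")    # no-op: the node already exists
print(len(nodes))  # 2 -- the two CREATE calls left a duplicate behind
```

In an extraction pipeline that re-processes overlapping documents, CREATE would mint a new node for every mention; MERGE (backed by the uniqueness constraints from section 2.1) keeps one node per entity.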
Anti-Patterns to Avoid
KG as a complete replacement for RAG: GraphRAG is powerful
but has high setup costs. For purely semantic questions, traditional RAG
is often better and cheaper. Use GraphRAG selectively where it adds value.
Overly generic relationships: a "RELATED_TO" relationship
carries no information. Relationships must be semantically precise
(FOUNDED, WORKS_AT, COMPETES_WITH, ACQUIRED_BY).
No update strategy: a static KG becomes stale quickly.
Define from the beginning how and how often the graph will be updated,
and implement incremental update pipelines.
Unvalidated LLM-generated Cypher queries: LLM-generated
Cypher can be dangerous (accidental deletion, performance issues).
Use parameterized query templates where possible, and add query validation
before execution in production.
Neglecting graph schema documentation: without clear
documentation of node types, relationship types, and property meanings,
the KG becomes unmaintainable as the team grows.
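For the Cypher anti-pattern above, a first line of defense is a read-only check before execution. A keyword blocklist is crude compared to real query parsing (it will also reject legitimate read-only CALLs), so treat this as a sketch:

```python
import re

# Clauses that mutate or administer the database -- none of them
# belong in an auto-generated read query.
FORBIDDEN = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE",
             "DROP", "CALL")

def is_safe_cypher(query: str) -> bool:
    """Reject Cypher containing write/admin clauses (whole-word, case-insensitive)."""
    return not any(
        re.search(rf"\b{kw}\b", query, re.IGNORECASE) for kw in FORBIDDEN
    )

assert is_safe_cypher("MATCH (c:Company) WHERE c.founded > 2020 RETURN c.name")
assert not is_safe_cypher("MATCH (n) DETACH DELETE n")
```

Running the generated query against a Neo4j user with read-only permissions is the more robust complement to any in-process check.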
7.1 GraphRAG vs Standard RAG: When to Choose
Decision Framework

| Factor | Choose Standard RAG | Choose GraphRAG |
| --- | --- | --- |
| Query types | Mostly conceptual / procedural | Relational, multi-hop, aggregation |
| Data structure | Unstructured documents | Entities with defined relationships |
| Update frequency | Infrequent, batch updates | Frequent, incremental updates |
| Team expertise | Python, ML basics | Graph databases, Cypher, ontology design |
| Setup time | Days to weeks | Weeks to months |
| Hallucination risk | Moderate | Lower (structured facts) |
Conclusions
Knowledge graphs bring something that traditional RAG systems cannot offer:
structured, relational, and verifiable knowledge. GraphRAG
combines the best of both worlds: the flexibility of semantic retrieval with
the precision of structured graph reasoning. We explored Neo4j, automatic KG
extraction with LLMs, GraphRAG with LangChain, and enriching traditional RAG
with graph-based query expansion.
The key takeaways from this article:
KGs represent knowledge as entities and relationships: explicit and queryable structure
LLMs enable automatic KG extraction from unstructured text at scale
GraphRAG significantly outperforms classic RAG for relational and multi-hop questions
KG query expansion improves recall in traditional RAG pipelines
Start with a simple ontology and expand iteratively based on real query patterns
Use MERGE over CREATE, define indexes, and validate LLM-generated Cypher queries
This article concludes the AI Engineering and Advanced RAG series.
We have traveled the full stack: from RAG fundamentals, through embeddings, vector
databases, hybrid retrieval, context window management, multi-agent systems, prompt
engineering in production, and finally knowledge graphs. The field is evolving
rapidly — continue following the blog for updates as new techniques emerge.