I build modern web applications and custom digital tools that help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.
My passion for computer science was born in the classrooms of the Istituto Tecnico Commerciale di Maglie, where I discovered the power of programming and the appeal of creating digital solutions. I understood right away that computer science was not just code, but an extraordinary tool for turning ideas into reality.
During my secondary studies in Business Information Systems, I began to weave together computer science and economics, understanding how technology can drive growth for any business. This vision accompanied me to the Università degli Studi di Bari, where I earned my degree in Computer Science, deepening my technical skills and my passion for software development.
Today I put this experience at the service of companies, professionals, and startups, creating tailor-made digital solutions that automate processes, optimize resources, and open new business opportunities. Because true innovation begins when technology meets people's real needs.
My Skills
Data Analysis & Forecasting Models
I turn data into strategic insights with in-depth analysis and predictive models for informed decisions
Process Automation
I build custom tools that automate repetitive operations and free up time for value-added activities
Custom Systems
I develop tailor-made software systems, from platform integrations to personalized dashboards
I firmly believe that computer science is the most powerful tool for turning ideas into reality and improving people's lives.
🚀
Democratizing Technology
My mission is to make computing accessible to everyone: from small local businesses to innovative startups, to professionals who want to digitize their activity. Every business deserves to harness the potential of digital.
💡
Combining Computer Science and Economics
It is not just about writing code: it is about understanding how technology can generate real value. By weaving together technical skills and economic vision, I help businesses grow, streamline processes, and reach new levels of efficiency and profitability.
🎯
Creating Tailor-Made Solutions
Every business is unique, and so should its solutions be. I develop custom tools that address each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.
Transform Your Business with Technology
Whether you run a shop, a professional practice, or a company, I can help you harness the potential of computing to work better, faster, and smarter.
My academic background and the technologies I work with
Professional Certifications
8 certifications earned
Reinvention With Agentic AI Learning Program · Anthropic · December 2024
Agentic AI Fluency · Anthropic · December 2024
AI Fluency for Students · Anthropic · December 2024
AI Fluency: Framework and Foundations · Anthropic · December 2024
Claude with the Anthropic API · Anthropic · December 2024
Master SQL · RoadMap.sh · November 2024
Oracle Certified Foundations Associate · Oracle · October 2024
People Leadership Credential · Connect · September 2024
💻 Languages & Technologies
☕Java
🐍Python
📜JavaScript
🅰️Angular
⚛️React
🔷TypeScript
🗄️SQL
🐘PHP
🎨CSS/SCSS
🔧Node.js
🐳Docker
🌿Git
💼
12/2024 - Present
Custom Software Engineering Analyst
Accenture
Bari, Puglia, Italy · Hybrid
Analysis and development of IT systems using Java and Quarkus in the Health and Public Sector. Continuous training on modern technologies for building custom, efficient software solutions and on AI agents.
💼
06/2022 - 12/2024
Software Analyst and Back-End Developer Associate Consultant
Links Management and Technology SpA
Experience analyzing as-is software systems and ETL flows using PowerCenter. Completed training on Spring Boot for developing modern, scalable backend applications. Backend developer specializing in Spring Boot, with experience in database design and in the analysis, development, and testing of assigned tasks.
💼
02/2021 - 10/2021
Software Developer
Adesso.it (formerly WebScience srl)
Experience in AS-IS and TO-BE analysis, SEO improvements, and website enhancements to boost performance and user engagement.
🎓
2018 - 2025
Bachelor's Degree in Computer Science
Università degli Studi di Bari Aldo Moro
Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.
📚
2013 - 2018
Diploma - Business Information Systems
Istituto Tecnico Commerciale di Maglie
Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.
Knowledge Graphs and AI: Integrating Structured Knowledge into LLMs
Large Language Models are remarkable at generating fluent text,
but they suffer from a fundamental limitation: the knowledge they contain is
implicit, distributed across model parameters, difficult to update,
and impossible to query in a structured way. "Give me all the people who
work at AI companies founded after 2020" is trivial in a knowledge graph,
but impossible to guarantee with an LLM.
Knowledge Graphs (KGs) represent knowledge as graphs of entities
and relationships: explicit structure that is queryable, updatable, and verifiable.
Integrating KGs with LLMs — the GraphRAG paradigm — produces systems
capable of structured reasoning that traditional RAG cannot offer.
In this article we build GraphRAG systems with Neo4j, explore automatic graph
extraction from text using LLMs, and see how to query knowledge graphs
to enrich RAG systems.
What You Will Learn
Knowledge graph fundamentals: nodes, relationships, properties, RDF and Property Graph
Neo4j: the model, Cypher query language, and LangChain integration
Automatic KG extraction from unstructured text using LLMs
GraphRAG: combining graph traversal and vector retrieval
KG for RAG: enriching chunks with entity relationships
Multi-hop reasoning on knowledge graphs
Wikidata and public Knowledge Graphs for enrichment
Best practices for building maintainable KGs in production
1. Knowledge Graph Fundamentals
A knowledge graph is a representation of knowledge as a
graph where nodes represent entities (people, organizations,
concepts, events) and edges represent relationships between them.
Each triple (subject, predicate, object) encodes a fact.
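At this level of abstraction a knowledge graph needs no database at all. A short Python sketch (the entities are made up for illustration) shows a list of triples answering the structured query from the introduction:

```python
# A knowledge graph, at its simplest: (subject, predicate, object) triples.
triples = [
    ("Ada", "WORKS_AT", "AcmeAI"),
    ("Bob", "WORKS_AT", "OldCorp"),
    ("AcmeAI", "SECTOR", "AI"),
    ("AcmeAI", "FOUNDED", 2022),
    ("OldCorp", "SECTOR", "AI"),
    ("OldCorp", "FOUNDED", 2010),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# "People who work at AI companies founded after 2020" -- a structured
# query that is trivial here but impossible to guarantee with a bare LLM.
people = [
    s for s, p, company in triples
    if p == "WORKS_AT"
    and "AI" in objects(company, "SECTOR")
    and any(year > 2020 for year in objects(company, "FOUNDED"))
]
print(people)  # ['Ada']
```

Real KG stores add indexing, persistence, and a query language on top, but the data model is exactly this.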
There are two main knowledge graph models, each with different tradeoffs:
RDF vs Property Graph Comparison

| Dimension | RDF/SPARQL | Property Graph (Neo4j) |
| --- | --- | --- |
| Model | Standardized triples (S, P, O) | Nodes and edges with arbitrary properties |
| Standard | W3C standard, interoperable | Proprietary but more flexible |
| Query language | SPARQL (complex) | Cypher (more readable) |
| Properties on relationships | Complicated (reification) | Native and simple |
| AI ecosystem | Wikidata, DBpedia, Schema.org | Neo4j (LangChain integration) |
| When to use | Open data, interoperability | AI applications, GraphRAG |
1.1 Why Knowledge Graphs Matter for AI
The combination of knowledge graphs with LLMs addresses three critical limitations
of pure vector-based RAG systems:
Hallucination prevention: factual relationships stored in a KG
are verifiable and can be used to ground LLM outputs, reducing confabulation.
Multi-hop reasoning: "Which products are made by companies that
compete with OpenAI?" requires traversing multiple relationship hops — trivial
in a graph query, unreliable with pure semantic search.
Updateability: you can add or modify facts in a KG without
retraining the model. Critical knowledge (pricing, org changes, product versions)
stays current.
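Multi-hop reasoning is nothing more than repeated edge traversal. A toy sketch over a hand-built adjacency structure (the companies and products are illustrative, not a real dataset):

```python
# Toy graph: relation type -> list of (source, target) edges.
edges = {
    "COMPETES_WITH": [("OpenAI", "Anthropic"), ("OpenAI", "Mistral")],
    "DEVELOPS": [("Anthropic", "Claude"), ("Mistral", "Mistral Large"),
                 ("OpenAI", "GPT-4")],
}

def targets(relation, source):
    """One hop: follow `relation` edges out of `source`."""
    return [t for s, t in edges[relation] if s == source]

# Two hops: products developed by companies that compete with OpenAI.
products = [
    product
    for competitor in targets("COMPETES_WITH", "OpenAI")
    for product in targets("DEVELOPS", competitor)
]
print(products)  # ['Claude', 'Mistral Large']
```

A semantic search over text chunks has no reliable way to compose these two hops; the graph makes the composition explicit.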
2. Neo4j: Setup and the Cypher Query Language
Neo4j is the most widely adopted graph database for AI applications,
with excellent integration with LangChain. The Cypher language
uses an intuitive ASCII-art syntax to express graph patterns.
Neo4j Setup and Basic Cypher Queries
from neo4j import GraphDatabase
from typing import List, Dict, Any, Optional
import os

class Neo4jKnowledgeGraph:
    """Python interface for a Neo4j knowledge graph"""

    def __init__(
        self,
        uri: str = "bolt://localhost:7687",
        user: str = "neo4j",
        password: str = "password"
    ):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def execute_query(self, query: str, parameters: dict = None) -> List[Dict]:
        """Execute a Cypher query and return results"""
        with self.driver.session() as session:
            result = session.run(query, parameters or {})
            return [record.data() for record in result]

    def create_entity(self, label: str, properties: Dict) -> str:
        """Create a node with a label and properties"""
        props_str = ", ".join(f"{k}: ${k}" for k in properties.keys())
        query = f"CREATE (n:{label} {{{props_str}}}) RETURN id(n) as id"
        result = self.execute_query(query, properties)
        return result[0]["id"] if result else None

    def create_relationship(
        self,
        from_label: str, from_props: Dict,
        rel_type: str, rel_props: Dict,
        to_label: str, to_props: Dict
    ):
        """Create a relationship between two nodes"""
        from_match = " AND ".join(f"a.{k} = $from_{k}" for k in from_props)
        to_match = " AND ".join(f"b.{k} = $to_{k}" for k in to_props)
        rel_props_str = ", ".join(f"{k}: $rel_{k}" for k in rel_props) if rel_props else ""
        params = {
            **{f"from_{k}": v for k, v in from_props.items()},
            **{f"to_{k}": v for k, v in to_props.items()},
            **{f"rel_{k}": v for k, v in rel_props.items()}
        }
        query = f"""
        MATCH (a:{from_label}) WHERE {from_match}
        MATCH (b:{to_label}) WHERE {to_match}
        MERGE (a)-[r:{rel_type} {{{rel_props_str}}}]->(b)
        RETURN type(r) as rel_type"""
        return self.execute_query(query, params)

    def upsert_entity(self, label: str, match_props: Dict, set_props: Dict = None):
        """Upsert: create if not exists, update if it does"""
        match_str = ", ".join(f"{k}: ${k}" for k in match_props)
        query = f"MERGE (n:{label} {{{match_str}}})"
        params = dict(match_props)
        if set_props:
            set_str = ", ".join(f"n.{k} = $set_{k}" for k in set_props)
            query += f" ON CREATE SET {set_str} ON MATCH SET {set_str}"
            params.update({f"set_{k}": v for k, v in set_props.items()})
        query += " RETURN n"
        return self.execute_query(query, params)

# Examples of advanced Cypher queries
CYPHER_EXAMPLES = {
    # Find all AI companies founded after 2020
    "recent_companies": """
        MATCH (c:Company {sector: 'AI'})
        WHERE c.founded > 2020
        RETURN c.name, c.founded
        ORDER BY c.founded DESC""",
    # Find shortest path between two people (degrees of separation)
    "social_path": """
        MATCH path = shortestPath(
            (p1:Person {name: $person1})-[*..6]-(p2:Person {name: $person2})
        )
        RETURN path, length(path) as degrees""",
    # Find related entities cluster (community detection)
    "related_entities": """
        MATCH (n:Company)-[r]-(related)
        WHERE n.name = $company_name
        RETURN related, type(r), n
        LIMIT 50""",
    # Multi-hop: products made by companies competing with X
    "competitor_products": """
        MATCH (c1:Company)-[:COMPETES_WITH]->(c2:Company)
        WHERE c1.name = $company_name
        MATCH (c2)-[:DEVELOPS]->(p:Product)
        RETURN DISTINCT p.name, p.category, c2.name as developed_by"""
}
2.1 Schema Design in Neo4j
Before populating a knowledge graph, defining a clear ontology prevents costly
restructuring later. The following constraints and indexes encode the key design principles.
Neo4j Constraints and Indexes for Production
# Create uniqueness constraints and indexes
# Run these once during database initialization
SCHEMA_SETUP_QUERIES = [
    # Uniqueness constraints prevent duplicate entities
    "CREATE CONSTRAINT person_name IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE",
    "CREATE CONSTRAINT company_name IF NOT EXISTS FOR (c:Company) REQUIRE c.name IS UNIQUE",
    "CREATE CONSTRAINT product_name IF NOT EXISTS FOR (p:Product) REQUIRE p.name IS UNIQUE",
    # Full-text index for fuzzy name matching
    """CREATE FULLTEXT INDEX entity_names IF NOT EXISTS
    FOR (n:Person|Company|Product|Technology)
    ON EACH [n.name, n.description]""",
    # Range index for temporal queries
    "CREATE INDEX company_founded IF NOT EXISTS FOR (c:Company) ON (c.founded)",
    # Composite index for typed entity lookup
    "CREATE INDEX entity_type_name IF NOT EXISTS FOR (n:Person) ON (n.name, n.role)"
]

def initialize_schema(kg: Neo4jKnowledgeGraph):
    """Initialize schema constraints and indexes"""
    for query in SCHEMA_SETUP_QUERIES:
        try:
            kg.execute_query(query)
            print(f"Schema query OK: {query[:60]}...")
        except Exception as e:
            print(f"Schema query failed (may already exist): {e}")
3. Automatic Knowledge Graph Extraction from Text
Building a knowledge graph manually is expensive and time-consuming. Modern LLMs
allow automatic extraction of entities and relationships from unstructured text,
populating the graph semi-automatically. This is one of the most powerful
applications of LLMs beyond text generation.
KG Extraction with LLM and Pydantic Schemas
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Dict, List, Optional

# Structured extraction schema
class Entity(BaseModel):
    """An entity extracted from text"""
    name: str = Field(description="The entity name")
    entity_type: str = Field(
        description="Type: Person, Company, Product, Technology, Location, Event, Concept"
    )
    description: Optional[str] = Field(
        description="Brief description of the entity", default=None
    )
    properties: dict = Field(
        description="Additional properties (e.g. founded_year, role)",
        default_factory=dict
    )

class Relationship(BaseModel):
    """A relationship between two entities"""
    source: str = Field(description="Name of the source entity")
    target: str = Field(description="Name of the target entity")
    relationship_type: str = Field(
        description="Relationship type (e.g. WORKS_AT, FOUNDED, COMPETES_WITH)"
    )
    properties: dict = Field(
        description="Relationship properties (e.g. since_year)",
        default_factory=dict
    )

class KnowledgeGraphExtraction(BaseModel):
    """Result of knowledge graph extraction from text"""
    entities: List[Entity] = Field(description="Entities extracted from text")
    relationships: List[Relationship] = Field(description="Relationships between entities")

class LLMKnowledgeGraphExtractor:
    """Extracts knowledge graphs from text using LLMs"""

    def __init__(self, model: str = "gpt-4o-mini"):
        llm = ChatOpenAI(model=model, temperature=0)
        self.structured_llm = llm.with_structured_output(KnowledgeGraphExtraction)
        self.extraction_prompt = ChatPromptTemplate.from_template("""
Extract entities and relationships from the following text to build a knowledge graph.

Entity types to extract: Person, Company, Product, Technology, Location, Event, Concept
Common relationship types: WORKS_AT, FOUNDED, DEVELOPS, USES, COMPETES_WITH, PART_OF,
LOCATED_IN, ACQUIRED_BY, INVESTED_IN, AUTHORED_BY

Text to analyze:
{text}

Extract ALL entities and relationships mentioned, including implicit ones.
For properties, only extract those explicitly mentioned in the text.""")

    def extract(self, text: str) -> KnowledgeGraphExtraction:
        """Extract entities and relationships from text"""
        return self.structured_llm.invoke(
            self.extraction_prompt.format_messages(text=text)
        )

    def extract_and_store(
        self,
        text: str,
        neo4j_kg: Neo4jKnowledgeGraph,
        source_metadata: Dict = None
    ) -> dict:
        """Extract from text and store directly in Neo4j"""
        extraction = self.extract(text)
        stored_entities = 0
        stored_relationships = 0

        # Store entities
        for entity in extraction.entities:
            props = {
                "name": entity.name,
                **(entity.properties or {}),
            }
            if entity.description:
                props["description"] = entity.description
            if source_metadata:
                props["source"] = source_metadata.get("source", "")
            neo4j_kg.upsert_entity(
                label=entity.entity_type,
                match_props={"name": entity.name},
                set_props=props
            )
            stored_entities += 1

        # Store relationships
        for rel in extraction.relationships:
            # Verify entities exist before creating relationship
            source_entity = next(
                (e for e in extraction.entities if e.name == rel.source), None
            )
            target_entity = next(
                (e for e in extraction.entities if e.name == rel.target), None
            )
            if source_entity and target_entity:
                neo4j_kg.create_relationship(
                    from_label=source_entity.entity_type,
                    from_props={"name": rel.source},
                    rel_type=rel.relationship_type,
                    rel_props=rel.properties or {},
                    to_label=target_entity.entity_type,
                    to_props={"name": rel.target}
                )
                stored_relationships += 1

        return {
            "entities_found": len(extraction.entities),
            "relationships_found": len(extraction.relationships),
            "entities_stored": stored_entities,
            "relationships_stored": stored_relationships
        }

# Example usage
extractor = LLMKnowledgeGraphExtractor()
kg = Neo4jKnowledgeGraph()

text = """
OpenAI, founded by Sam Altman and Elon Musk in 2015, developed GPT-4 and ChatGPT.
The company received a $10 billion investment from Microsoft in 2023.
Anthropic, founded by former OpenAI employees including Dario Amodei, develops Claude,
a model that directly competes with ChatGPT.
"""

result = extractor.extract_and_store(text, kg, {"source": "news_article.txt"})
print(f"Extracted: {result['entities_found']} entities, {result['relationships_found']} relationships")
3.1 Handling Extraction Quality
LLM-based extraction is powerful but imperfect. Entity co-reference (multiple
mentions of the same entity with different names) and hallucinated relationships
are the two most common failure modes. Here is a validation layer:
Entity Resolution and Extraction Validation
from difflib import SequenceMatcher
from typing import Dict

class EntityResolver:
    """
    Resolves entity co-references and deduplicates before KG storage.
    Example: "OpenAI", "Open AI", "the company" -> "OpenAI"
    """

    def __init__(self, similarity_threshold: float = 0.85):
        self.threshold = similarity_threshold
        self._known_entities: Dict[str, str] = {}  # name variant -> canonical name

    def _similarity(self, a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def resolve(self, entity_name: str) -> str:
        """Return canonical name for an entity"""
        # Check exact match first
        if entity_name in self._known_entities:
            return self._known_entities[entity_name]
        # Fuzzy match against known entities
        for known in list(self._known_entities):
            if self._similarity(entity_name, known) >= self.threshold:
                # Map this variant to the canonical form
                self._known_entities[entity_name] = self._known_entities[known]
                return self._known_entities[known]
        # New entity: register with itself as canonical
        self._known_entities[entity_name] = entity_name
        return entity_name

    def resolve_extraction(
        self,
        extraction: KnowledgeGraphExtraction
    ) -> KnowledgeGraphExtraction:
        """Resolve all entity names in an extraction result"""
        resolved_entities = []
        name_map: Dict[str, str] = {}
        for entity in extraction.entities:
            canonical = self.resolve(entity.name)
            name_map[entity.name] = canonical
            # Keep exactly one entity per canonical name
            if canonical not in [e.name for e in resolved_entities]:
                resolved_entities.append(
                    Entity(
                        name=canonical,
                        entity_type=entity.entity_type,
                        description=entity.description,
                        properties=entity.properties
                    )
                )

        resolved_relationships = []
        for rel in extraction.relationships:
            resolved_relationships.append(
                Relationship(
                    source=name_map.get(rel.source, rel.source),
                    target=name_map.get(rel.target, rel.target),
                    relationship_type=rel.relationship_type,
                    properties=rel.properties
                )
            )

        return KnowledgeGraphExtraction(
            entities=resolved_entities,
            relationships=resolved_relationships
        )
Production Warning: Extraction Accuracy
LLM extraction accuracy for entities is typically 85-92% on domain text.
For relationships it drops to 70-80%. Always implement a human review
loop for high-stakes knowledge (medical, legal, financial). Use
gpt-4o instead of gpt-4o-mini for critical
extraction tasks — the accuracy difference on complex relational text
is significant (approximately +12% on relationship extraction benchmarks).
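Before any human review, a cheap structural gate already catches many bad extractions: drop relationships whose type falls outside your ontology or whose endpoints were never extracted as entities. A minimal sketch, where the whitelist and the sample tuples are illustrative assumptions:

```python
# Relationship types accepted by the (assumed) ontology.
ALLOWED_RELATIONS = {
    "WORKS_AT", "FOUNDED", "DEVELOPS", "USES", "COMPETES_WITH",
    "PART_OF", "LOCATED_IN", "ACQUIRED_BY", "INVESTED_IN", "AUTHORED_BY",
}

def gate_relationships(entity_names, relationships):
    """Split raw (source, type, target) tuples into accepted vs needs-review."""
    accepted, review = [], []
    for source, rel_type, target in relationships:
        if rel_type in ALLOWED_RELATIONS and source in entity_names and target in entity_names:
            accepted.append((source, rel_type, target))
        else:
            review.append((source, rel_type, target))
    return accepted, review

entities = {"OpenAI", "Microsoft", "Sam Altman"}
raw = [
    ("Sam Altman", "FOUNDED", "OpenAI"),      # passes
    ("Microsoft", "INVESTED_IN", "OpenAI"),   # passes
    ("OpenAI", "RELATED_TO", "Microsoft"),    # unknown type -> review
    ("OpenAI", "DEVELOPS", "GPT-4"),          # endpoint not extracted -> review
]
accepted, review = gate_relationships(entities, raw)
print(len(accepted), len(review))  # 2 2
```

The review bucket is then the input to the human loop, which keeps reviewer load proportional to the suspicious fraction rather than the whole extraction.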
4. GraphRAG: Combining Graph and Vector Retrieval
GraphRAG is the paradigm that combines traditional semantic
search (vector search) with knowledge graph traversal. For questions that require
reasoning about relationships between entities, GraphRAG significantly outperforms
classic RAG. Microsoft's GraphRAG paper (2024) showed up to 40% improvement on
community-level questions compared to naive RAG.
GraphRAG System with LangChain and Neo4j
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

class GraphRAGSystem:
    """
    GraphRAG system combining:
    1. Vector retrieval for semantic questions
    2. Cypher queries on Neo4j for structured questions
    3. LLM to synthesize both sources
    """

    def __init__(self, neo4j_url: str, username: str, password: str, vector_retriever):
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
        self.vector_retriever = vector_retriever

        # Neo4j connection for LangChain
        self.graph = Neo4jGraph(
            url=neo4j_url,
            username=username,
            password=password
        )

        # Chain to automatically generate and execute Cypher queries
        self.cypher_chain = GraphCypherQAChain.from_llm(
            cypher_llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
            qa_llm=self.llm,
            graph=self.graph,
            verbose=True,
            return_intermediate_steps=True,
            allow_dangerous_requests=True  # Required for auto-generated queries
        )

        # Router to decide which source to use
        self.router_chain = (
            ChatPromptTemplate.from_template("""
Analyze this question and decide the best retrieval strategy.

Question: {question}

Choose ONE strategy:
- "graph": the question requires entity relationships, counts, paths, or specific attributes
- "vector": the question requires explanations, concepts, procedures, or narrative text
- "hybrid": the question benefits from both sources

Reply ONLY with: graph, vector, or hybrid""")
            | self.llm
        )

    def _classify_query(self, question: str) -> str:
        """Classify query type"""
        result = self.router_chain.invoke({"question": question})
        strategy = result.content.strip().lower()
        return strategy if strategy in ["graph", "vector", "hybrid"] else "vector"

    def query(self, question: str) -> dict:
        """Answer the question using the optimal strategy"""
        strategy = self._classify_query(question)
        print(f"Selected strategy: {strategy}")

        graph_context = ""
        vector_context = ""

        if strategy in ["graph", "hybrid"]:
            try:
                # Auto-generate and execute Cypher query
                graph_result = self.cypher_chain.invoke({"query": question})
                graph_context = str(graph_result.get("result", ""))
            except Exception as e:
                graph_context = f"Graph query error: {e}"

        if strategy in ["vector", "hybrid"]:
            docs = self.vector_retriever.invoke(question)
            vector_context = "\n".join(d.page_content for d in docs[:3])

        # Final synthesis (sections built outside the f-string: backslashes
        # inside f-string expressions are invalid before Python 3.12)
        graph_section = f"Data from knowledge graph:\n{graph_context}\n" if graph_context else ""
        vector_section = f"Relevant documents:\n{vector_context}\n" if vector_context else ""
        synthesis_prompt = f"""Question: {question}
{graph_section}{vector_section}
Answer comprehensively based on the available information."""

        final_answer = self.llm.invoke(synthesis_prompt).content

        return {
            "answer": final_answer,
            "strategy": strategy,
            "graph_context": graph_context,
            "vector_context": vector_context[:200] if vector_context else ""
        }

# Examples showing GraphRAG advantages by query type
graph_rag_examples = [
    # Structural question: best answered with graph
    "How many AI companies were founded after 2020?",
    # Relational question: best with graph
    "Who are the people working for companies that compete with OpenAI?",
    # Semantic question: best with vector
    "How does the attention mechanism in transformers work?",
    # Hybrid question: benefits from both
    "What is Anthropic's product development strategy?"
]
4.1 When GraphRAG Outperforms Standard RAG
Query Type Performance Comparison

| Query Type | Standard RAG | GraphRAG | Best Approach |
| --- | --- | --- | --- |
| Factual lookup ("When was X founded?") | Good | Excellent | Graph |
| Multi-hop ("Who works at X's competitors?") | Poor | Excellent | Graph |
| Aggregation ("How many companies in sector Y?") | Poor | Excellent | Graph |
| Conceptual explanation | Excellent | Poor | Vector |
| Procedural ("How to implement X?") | Excellent | Poor | Vector |
| Entity overview with context | Good | Good | Hybrid |
5. Knowledge Graph Enrichment for RAG
Even without implementing full GraphRAG, a knowledge graph can significantly
enrich a traditional RAG system: by expanding queries with related entities,
filtering documents by relevant relationships, or adding structured context
to retrieved chunks.
KG-Enhanced RAG: Query Expansion via Graph
class KGEnhancedRetriever:
    """
    Retriever that uses the knowledge graph to expand queries
    with related entities before performing vector search.
    """

    def __init__(self, kg: Neo4jKnowledgeGraph, vector_retriever, llm):
        self.kg = kg
        self.retriever = vector_retriever
        self.llm = llm

    def extract_entities_from_query(self, query: str) -> List[str]:
        """Extract entities from query using NER"""
        prompt = f"""Extract entity names (people, organizations, products, technologies)
from the following query. Return only the names, one per line.

Query: {query}"""
        result = self.llm.invoke(prompt).content
        entities = [e.strip() for e in result.split('\n') if e.strip()]
        return entities

    def get_related_entities(self, entity_name: str, max_hops: int = 2) -> List[str]:
        """Get related entities from the graph"""
        query = f"""
        MATCH (n)-[*1..{max_hops}]-(related)
        WHERE n.name CONTAINS $entity_name
        RETURN DISTINCT related.name as name
        LIMIT 20"""
        results = self.kg.execute_query(query, {"entity_name": entity_name})
        return [r["name"] for r in results if r["name"]]

    def enhanced_retrieve(self, query: str, top_k: int = 5) -> list:
        """
        Retrieve documents with KG-based query expansion.
        1. Extract entities from the query
        2. Find related entities in the graph
        3. Expand the query with related entities
        4. Perform vector search on the expanded query
        """
        # Step 1: Extract entities from query
        entities = self.extract_entities_from_query(query)
        print(f"Entities found: {entities}")

        # Step 2: Find related entities
        all_related = set()
        for entity in entities[:3]:  # Limit to 3 entities
            related = self.get_related_entities(entity)
            all_related.update(related[:5])  # Max 5 related per entity

        # Step 3: Expand query
        if all_related:
            expansion = ", ".join(list(all_related)[:10])
            expanded_query = f"{query} [Related entities: {expansion}]"
            print(f"Query expanded with: {expansion}")
        else:
            expanded_query = query

        # Step 4: Vector search on expanded query
        docs = self.retriever.invoke(expanded_query)
        return docs[:top_k]

    def get_entity_context(self, entity_name: str) -> str:
        """Get structured context for an entity from the graph"""
        query = """
        MATCH (n {name: $name})
        OPTIONAL MATCH (n)-[r]->(related)
        RETURN n, type(r) as rel_type, related.name as related_name
        LIMIT 20"""
        results = self.kg.execute_query(query, {"name": entity_name})
        if not results:
            return ""
        lines = [f"Entity: {entity_name}"]
        for r in results:
            if r.get("rel_type") and r.get("related_name"):
                lines.append(f"  -> {r['rel_type']}: {r['related_name']}")
        return "\n".join(lines)

    def retrieve_with_graph_context(self, query: str, top_k: int = 5) -> list:
        """
        Retrieve documents AND attach relevant graph context
        to each document for richer LLM grounding.
        """
        docs = self.enhanced_retrieve(query, top_k)
        entities = self.extract_entities_from_query(query)

        # Attach graph context to each document's metadata
        enriched_docs = []
        for doc in docs:
            entity_contexts = []
            for entity in entities[:2]:
                ctx = self.get_entity_context(entity)
                if ctx:
                    entity_contexts.append(ctx)
            enriched_doc = doc.copy()
            if entity_contexts:
                enriched_doc.metadata["graph_context"] = "\n\n".join(entity_contexts)
            enriched_docs.append(enriched_doc)
        return enriched_docs
6. Public Knowledge Graphs: Wikidata and Schema.org
Building a KG from scratch is expensive. Public knowledge graphs like
Wikidata (100M+ entities) and DBpedia
provide a foundation you can extend with domain-specific knowledge.
LangChain provides a native Wikidata tool:
Wikidata Integration for KG Enrichment
import requests
from langchain.tools import WikidataQueryRun
from langchain_community.utilities import WikidataAPIWrapper

# Use LangChain's Wikidata tool for agent-based KG queries
wikidata_tool = WikidataQueryRun(api_wrapper=WikidataAPIWrapper())

# Direct SPARQL queries to Wikidata endpoint
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def query_wikidata_sparql(sparql_query: str) -> List[Dict]:
    """Execute a SPARQL query against Wikidata"""
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "KG-RAG-System/1.0"
    }
    params = {
        "query": sparql_query,
        "format": "json"
    }
    response = requests.get(WIKIDATA_ENDPOINT, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()

    results = []
    for binding in data.get("results", {}).get("bindings", []):
        row = {k: v.get("value", "") for k, v in binding.items()}
        results.append(row)
    return results

# Example: find CEOs of IT companies
CEO_QUERY = """
SELECT ?companyLabel ?ceoLabel ?foundedDate WHERE {
  ?company wdt:P31 wd:Q4830453.    # instance of business
  ?company wdt:P169 ?ceo.          # has CEO
  ?company wdt:P571 ?foundedDate.  # founding date
  ?company wdt:P452 wd:Q11661.     # industry: information technology
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50"""
class WikidataKGEnricher:
    """Enriches a local Neo4j KG with data from Wikidata"""

    def __init__(self, kg: Neo4jKnowledgeGraph):
        self.kg = kg

    def enrich_company(self, company_name: str) -> dict:
        """Fetch Wikidata data for a company and store in local KG"""
        # Double braces escape literal { } inside the f-string
        sparql = f"""
        SELECT ?companyLabel ?foundedDate ?ceoLabel ?countryLabel WHERE {{
          ?company rdfs:label "{company_name}"@en.
          OPTIONAL {{ ?company wdt:P571 ?foundedDate. }}
          OPTIONAL {{ ?company wdt:P169 ?ceo. ?ceo rdfs:label ?ceoLabel. FILTER(LANG(?ceoLabel)="en") }}
          OPTIONAL {{ ?company wdt:P17 ?country. }}
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }} LIMIT 1"""
        results = query_wikidata_sparql(sparql)
        if not results:
            return {"status": "not_found"}

        row = results[0]
        enrichment = {
            "wikidata_enriched": True,
            "founded": row.get("foundedDate", "")[:4] if row.get("foundedDate") else None,
            "country": row.get("countryLabel", ""),
            "ceo": row.get("ceoLabel", "")
        }

        # Update local KG
        self.kg.upsert_entity(
            label="Company",
            match_props={"name": company_name},
            set_props={k: v for k, v in enrichment.items() if v}
        )
        return {"status": "enriched", "data": enrichment}
7. Best Practices and Anti-Patterns
Knowledge Graph Best Practices for AI
Start small and iterate: do not try to build the perfect KG
from scratch. Start with the entities and relationships most important to your
use case, then expand incrementally as you understand your query patterns better.
Define a clear ontology first: before populating the graph,
define node types and relationship types. A poorly designed ontology is
extremely costly to change after the graph has been populated at scale.
Validate automatic extraction: LLMs make mistakes in extraction.
Implement a human review process for critical data, especially in the early
stages before you trust the extraction pipeline.
Use MERGE, not CREATE: in Neo4j, always use MERGE for entities
to avoid duplicates. CREATE always creates a new node even if one already exists.
Index search properties: create Neo4j indexes on properties
used in WHERE clauses (e.g. name, date). Without indexes, queries on large
graphs become unacceptably slow.
Version your ontology: as your domain evolves, relationship
types and entity schemas will change. Use property versioning or schema migration
scripts, similar to database migrations in relational DBs.
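The MERGE-vs-CREATE point is easy to see by mimicking both semantics over an in-memory node store. This illustrates the behavior, not how Neo4j implements it:

```python
nodes = []  # each node: (label, name)

def create(label, name):
    """Like Cypher CREATE: always adds a new node, even if an identical one exists."""
    nodes.append((label, name))

def merge(label, name):
    """Like Cypher MERGE: match first, create only if absent."""
    if (label, name) not in nodes:
        nodes.append((label, name))

create("Company", "Anthropic")
create("Company", "Anthropic")   # a duplicate slips in silently
merge("Company", "Anthropic")    # no-op: the node already exists
print(len(nodes))  # 2 -- the two CREATE calls left a duplicate behind
```

In an extraction pipeline that re-processes overlapping documents, CREATE would mint a new node for every mention; MERGE (backed by the uniqueness constraints from section 2.1) keeps one node per entity.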
Anti-Patterns to Avoid
KG as a complete replacement for RAG: GraphRAG is powerful
but has high setup costs. For purely semantic questions, traditional RAG
is often better and cheaper. Use GraphRAG selectively where it adds value.
Overly generic relationships: a "RELATED_TO" relationship
carries no information. Relationships must be semantically precise
(FOUNDED, WORKS_AT, COMPETES_WITH, ACQUIRED_BY).
No update strategy: a static KG becomes stale quickly.
Define from the beginning how and how often the graph will be updated,
and implement incremental update pipelines.
Unvalidated LLM-generated Cypher queries: LLM-generated
Cypher can be dangerous (accidental deletion, performance issues).
Use parameterized query templates where possible, and add query validation
before execution in production.
Neglecting graph schema documentation: without clear
documentation of node types, relationship types, and property meanings,
the KG becomes unmaintainable as the team grows.
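For the Cypher anti-pattern above, a first line of defense is a read-only check before execution. A keyword blocklist is crude compared to real query parsing (it will also reject legitimate read-only CALLs), so treat this as a sketch:

```python
import re

# Clauses that mutate or administer the database -- none of them
# belong in an auto-generated read query.
FORBIDDEN = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE",
             "DROP", "CALL")

def is_safe_cypher(query: str) -> bool:
    """Reject Cypher containing write/admin clauses (whole-word, case-insensitive)."""
    return not any(
        re.search(rf"\b{kw}\b", query, re.IGNORECASE) for kw in FORBIDDEN
    )

assert is_safe_cypher("MATCH (c:Company) WHERE c.founded > 2020 RETURN c.name")
assert not is_safe_cypher("MATCH (n) DETACH DELETE n")
```

Running the generated query against a Neo4j user with read-only permissions is the more robust complement to any in-process check.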
7.1 GraphRAG vs Standard RAG: When to Choose
Decision Framework

| Factor | Choose Standard RAG | Choose GraphRAG |
| --- | --- | --- |
| Query types | Mostly conceptual / procedural | Relational, multi-hop, aggregation |
| Data structure | Unstructured documents | Entities with defined relationships |
| Update frequency | Infrequent, batch updates | Frequent, incremental updates |
| Team expertise | Python, ML basics | Graph databases, Cypher, ontology design |
| Setup time | Days to weeks | Weeks to months |
| Hallucination risk | Moderate | Lower (structured facts) |
Conclusions
Knowledge graphs bring something that traditional RAG systems cannot offer:
structured, relational, and verifiable knowledge. GraphRAG
combines the best of both worlds: the flexibility of semantic retrieval with
the precision of structured graph reasoning. We explored Neo4j, automatic KG
extraction with LLMs, GraphRAG with LangChain, and enriching traditional RAG
with graph-based query expansion.
The key takeaways from this article:
KGs represent knowledge as entities and relationships: explicit and queryable structure
LLMs enable automatic KG extraction from unstructured text at scale
GraphRAG significantly outperforms classic RAG for relational and multi-hop questions
KG query expansion improves recall in traditional RAG pipelines
Start with a simple ontology and expand iteratively based on real query patterns
Use MERGE over CREATE, define indexes, and validate LLM-generated Cypher queries
This article concludes the AI Engineering and Advanced RAG series.
We have traveled the full stack: from RAG fundamentals, through embeddings, vector
databases, hybrid retrieval, context window management, multi-agent systems, prompt
engineering in production, and finally knowledge graphs. The field is evolving
rapidly — continue following the blog for updates as new techniques emerge.