Enterprise Vector Databases: pgvector, Pinecone and Weaviate
The vector database market has exploded from a niche technology used only by advanced AI teams into mainstream infrastructure adopted by companies of all sizes. In 2025, the market is worth $2.65 billion and is projected to reach $8.9 billion by 2030 with a CAGR of 27.5%. The primary driver is straightforward: Large Language Models and RAG pipelines need to semantically search across billions of documents in milliseconds, and traditional relational databases simply are not designed for this task.
A vector database is not simply a database that "stores vectors": it is a system optimized for computing high-dimensional semantic similarity (typically 768-4096 dimensions) at massive scale, with queries returning the documents most similar to a natural language question. The difference from a SQL LIKE query or full-text index is profound: while keyword engines search for exact term matches, a vector database finds meaning, even when the words are entirely different.
Choosing the right vector database for an enterprise project is not straightforward. Available options in 2025 range from fully managed, zero-infrastructure solutions like Pinecone, to powerful open-source databases like Weaviate and Qdrant, to the pgvector extension that brings vector search directly into PostgreSQL. Each solution has distinct strengths and limitations. This article builds a concrete decision framework, with real code, cost benchmarks, and production-ready architectural patterns.
What You Will Learn
- What a vector database is and how it works internally (HNSW, IVF, PQ)
- Detailed comparison: Pinecone, Weaviate, Qdrant, Milvus, pgvector, ChromaDB
- Embedding models: OpenAI text-embedding-3, sentence-transformers, FastEmbed
- Similarity search and hybrid search implementation with real Python code
- Scaling from millions to billions of vectors: architectures and strategies
- Enterprise use cases: RAG, semantic search, recommendations, fraud detection
- Cost analysis: TCO managed vs self-hosted across different volumes
- Decision framework for choosing the right solution
The Data Warehouse, AI and Digital Transformation Series
| # | Article | Focus |
|---|---|---|
| 1 | Data Warehouse Evolution | From SQL Server to Data Lakehouse |
| 2 | Data Mesh Architecture | Domain ownership of data |
| 3 | Modern ETL vs ELT | dbt, Airbyte and Fivetran |
| 4 | Pipeline Orchestration | Airflow, Dagster and Prefect |
| 5 | AI in Manufacturing | Predictive Maintenance and Digital Twin |
| 6 | AI in Finance | Fraud Detection and Credit Scoring |
| 7 | AI in Retail | Demand Forecasting and Recommendations |
| 8 | AI in Healthcare | Diagnostics and Drug Discovery |
| 9 | AI in Logistics | Route Optimization and Warehouse Automation |
| 10 | Enterprise LLMs | RAG and AI Guardrails |
| 11 | You are here - Enterprise Vector Databases | pgvector, Pinecone and Weaviate |
| 12 | MLOps for Business | AI Models in Production with MLflow |
| 13 | Data Governance | Data Quality for Trustworthy AI |
| 14 | Data-Driven Roadmap | How SMBs Adopt AI and DWH |
What Is a Vector Database and How Does It Work
A vector database is a specialized storage system for saving, indexing, and querying high-dimensional vectors (embeddings). These vectors are numerical representations of unstructured data: text, images, audio, video, source code. Each embedding captures the "semantic meaning" of the original data in a mathematical space where similar elements are close to each other.
The core of every vector database is the Approximate Nearest Neighbor (ANN) algorithm: given a query vector, find the K nearest (most similar) vectors in the entire dataset. Computing the exact distance between a vector and all others (brute force) is computationally prohibitive for millions of vectors: with 10 million vectors at 1536 dimensions, an exhaustive scan takes hundreds of milliseconds on a CPU. ANN algorithms trade a small percentage of recall (typically 1-5%) to reduce latency by 100-1000x.
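To make the baseline concrete, here is a minimal exact (brute-force) top-K search in NumPy on synthetic data; this exhaustive scan, one dot product per stored vector, is precisely the computation that ANN indexes approximate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic corpus: 100,000 vectors, 256 dimensions (toy scale)
corpus = rng.standard_normal((100_000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalize once

def exact_top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exhaustive cosine search: one dot product per corpus vector."""
    query = query / np.linalg.norm(query)
    scores = corpus @ query                      # (100_000,) similarities
    top = np.argpartition(scores, -k)[-k:]       # unordered top-k in O(n)
    return top[np.argsort(scores[top])[::-1]]    # sort only the k winners

query = rng.standard_normal(256).astype(np.float32)
ids = exact_top_k(query, k=5)
print("Top-5 ids:", ids)
```

This is perfectly fine at small scale; ANN indexes exist because the `corpus @ query` line grows linearly with the dataset.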
Main Indexing Algorithms
The 3 Primary ANN Algorithms
| Algorithm | Type | Recall | Query Speed | Memory | Used By |
|---|---|---|---|---|---|
| HNSW | Graph-based | 95-99% | Very high | High | Pinecone, Weaviate, Qdrant, pgvector |
| IVF (+ PQ) | Cluster-based | 85-95% | High | Low (with PQ) | Milvus, FAISS |
| DiskANN | Graph on disk | 90-98% | Medium | Minimal (SSD) | Azure AI Search |
HNSW (Hierarchical Navigable Small World) is the dominant algorithm: it builds a multi-layer graph where connected nodes are close in vector space. Search starts at the highest level (few highly connected nodes), progressively descends finding closer nodes, until it reaches level 0 where the entire dataset lives. The result is latencies under 10ms even with tens of millions of vectors.
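The navigation idea is easiest to see in a toy, single-layer sketch (pure NumPy, synthetic data): build a k-nearest-neighbor graph, then greedily hop to whichever neighbor is closest to the query until no neighbor improves. Real HNSW adds multiple layers and a candidate list (ef), which this deliberately omits.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.standard_normal((500, 32)).astype(np.float32)

# Build a single-layer "navigable" graph: each node links to its M nearest neighbors
M = 8
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)               # exclude self-links
neighbors = np.argsort(dists, axis=1)[:, :M]  # (500, M) adjacency list

def greedy_search(query: np.ndarray, entry: int = 0) -> int:
    """Hop to the neighbor closest to the query until no neighbor improves."""
    current = entry
    best = float(np.linalg.norm(data[current] - query))
    while True:
        cand = neighbors[current]
        cand_d = np.linalg.norm(data[cand] - query, axis=1)
        i = int(np.argmin(cand_d))
        if cand_d[i] >= best:
            return current            # local minimum: no neighbor is closer
        current, best = int(cand[i]), float(cand_d[i])

query = rng.standard_normal(32).astype(np.float32)
found = greedy_search(query)
exact = int(np.argmin(np.linalg.norm(data - query, axis=1)))
print(found, exact)  # greedy often, but not always, matches the exact neighbor
```

HNSW's upper layers exist exactly to give this greedy descent a good entry point, and its candidate list reduces the chance of getting stuck in a local minimum.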
Product Quantization (PQ), often combined with IVF, compresses vectors reducing required memory by 4-32x at the cost of a slight recall decrease. It is the preferred technique when managing billions of vectors with limited hardware budget.
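A minimal illustration of the PQ idea in NumPy (synthetic data, naive k-means): each vector is split into m sub-vectors, each sub-vector is replaced by the index of its nearest codebook centroid, and storage drops from 4 bytes per dimension (float32) to one byte per sub-vector.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 64)).astype(np.float32)  # toy: 64-dim vectors
M, K = 8, 256              # 8 sub-vectors, 256 centroids each -> 1-byte codes
SUB = data.shape[1] // M   # dimensions per sub-vector

def kmeans(x, k, iters=10):
    """Naive k-means, enough for a demo."""
    cent = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        assign = ((x[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            pts = x[assign == c]
            if len(pts):
                cent[c] = pts.mean(0)
    return cent

# One codebook per sub-vector block
codebooks = [kmeans(data[:, i * SUB:(i + 1) * SUB], K) for i in range(M)]

def pq_encode(x: np.ndarray) -> np.ndarray:
    """Replace each sub-vector with the index of its nearest centroid."""
    codes = np.empty((len(x), M), dtype=np.uint8)
    for i, cb in enumerate(codebooks):
        block = x[:, i * SUB:(i + 1) * SUB]
        codes[:, i] = ((block[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

codes = pq_encode(data)
original_bytes = data.shape[1] * 4   # float32 storage per vector
compressed_bytes = M                 # one uint8 code per sub-vector
print(f"{original_bytes} B -> {compressed_bytes} B per vector "
      f"({original_bytes // compressed_bytes}x compression)")
```

Production systems (FAISS, Milvus) use far more sophisticated training and asymmetric distance computation, but the storage arithmetic is the same: compression ratio = 4 × dimensions / m.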
Similarity Metrics
The choice of distance metric depends on the embedding type and intended use:
# Similarity metrics in vector databases
# 1. Cosine Similarity (most common for text embeddings)
# Measures the angle between vectors, ignores magnitude
# Range: -1 (opposite) -> 0 (orthogonal) -> 1 (identical)
# Best for: text embeddings, OpenAI, sentence-transformers
# 2. Dot Product (Inner Product)
# Measures both angle and magnitude
# Faster than cosine if vectors are already normalized
# Best for: pre-normalized vectors, maximum inner product search
# 3. L2 (Euclidean Distance)
# Geometric distance in n-dimensional space
# Range: 0 (identical) -> infinity
# Best for: images, audio, numerical data
# Example with numpy to understand the differences
import numpy as np
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
def dot_product(a, b):
return np.dot(a, b)
def euclidean_distance(a, b):
return np.linalg.norm(a - b)
# Sample vectors (toy 4-dimensional "embeddings", not unit-normalized)
v1 = np.array([0.1, 0.8, 0.3, 0.5])
v2 = np.array([0.2, 0.7, 0.4, 0.4]) # Semantically close
v3 = np.array([0.9, 0.1, 0.1, 0.1]) # Semantically distant
print(f"Cosine(v1,v2): {cosine_similarity(v1, v2):.4f}") # ~0.98
print(f"Cosine(v1,v3): {cosine_similarity(v1, v3):.4f}") # ~0.27
print(f"L2(v1,v2): {euclidean_distance(v1, v2):.4f}") # ~0.20
print(f"L2(v1,v3): {euclidean_distance(v1, v3):.4f}") # ~1.15
Comparing the Leading Enterprise Solutions
The vector database landscape in 2025 is rich and differentiated. Let us analyze the most widely adopted production solutions, focusing on enterprise features, scalability and costs.
Enterprise Vector Database Comparison 2025
| Solution | Type | Max Scale | Hybrid Search | Deployment | Cost/month (10M vectors) |
|---|---|---|---|---|---|
| Pinecone | Managed SaaS | Billions | Yes (sparse+dense) | Cloud only | ~$675 |
| Weaviate | Open-source / Cloud | Billions | Yes (BM25+vector) | Cloud / Self-hosted | ~$200 (infra) |
| Qdrant | Open-source / Cloud | Billions | Yes | Cloud / Self-hosted | ~$150 (infra) |
| Milvus / Zilliz | Open-source / Cloud | Tens of billions | Yes | Cloud / K8s | ~$300 (Zilliz Cloud) |
| pgvector | PostgreSQL extension | 10-100M | Yes (full-text+vector) | Same Postgres DB | ~$50-250 (Postgres host) |
| ChromaDB | Open-source | Millions (dev) | Limited | Local / Self-hosted | Free (own infra) |
Pinecone: Enterprise Managed with Zero Ops
Pinecone is the fully managed vector database par excellence. Its value proposition is simple: zero infrastructure to manage, enterprise SLAs, predictable performance, and an intuitive API. It is the ideal choice for teams that want to move fast without dedicated database operations staff.
Pinecone's strengths include consistently low query latency with configurable recall, support for sparse-dense hybrid search (combining exact keyword matching with semantic search), namespaces for multi-tenant data isolation, and advanced metadata filtering. The Serverless tier (2024) made pricing more accessible for variable workloads. The main limitation is cost: at high scale, Pinecone becomes significantly more expensive than self-hosted alternatives.
Weaviate: AI-Native with Advanced Hybrid Search
Weaviate distinguishes itself with an AI-native philosophy: the database internally manages data vectorization through integrated modules (text2vec-openai, text2vec-cohere, img2vec-neural), eliminating the need for external embedding pipelines. Its standout feature is native hybrid search that combines BM25 (keyword search) with vector search in a single query, with a configurable alpha parameter to balance the two approaches.
Weaviate is particularly well-suited for applications where semantic context and exact matching coexist: product search, enterprise knowledge bases, RAG systems with category or date filters. Its GraphQL-like API makes queries expressive and powerful.
Qdrant: High Performance with Advanced Filtering
Qdrant, written in Rust, has earned broad enterprise adoption through its combination of high performance and flexible payload filtering. Unlike other vector databases, where metadata filters can significantly degrade performance, Qdrant applies filters during the ANN search phase, maintaining low latency even with complex filter conditions.
Qdrant's published benchmarks report 41.47 QPS at 99% recall on a 50-million-vector dataset (figures vary with hardware and configuration). It supports scalar and binary quantization to reduce memory usage, and an on-disk mode to handle datasets that do not fit in RAM. It is the preferred choice for complex RAG pipelines where documents are filtered by metadata (date, author, category, confidentiality level).
Milvus: Extreme Scale with GPU Acceleration
Milvus is the reference solution for billion-scale and GPU acceleration. Born at Zilliz and donated to the CNCF, Milvus supports multiple ANN index types (HNSW, IVF, PQ, DISKANN) and can leverage NVIDIA GPUs to accelerate both index building and queries. The disaggregated architecture (storage separated from compute) enables independent horizontal scaling of both layers.
Milvus is ideal for use cases like global recommendation engines (billions of items), image search in e-commerce with massive catalogs, and fraud detection systems on massive transaction streams. Operational complexity is however significant: Kubernetes deployment, dependencies on etcd and Kafka, and a DevOps team with ML infrastructure experience.
pgvector: The Pragmatism of PostgreSQL
pgvector is the extension that brings vector search directly into PostgreSQL. Its value proposition is compelling for companies already using Postgres: zero additional infrastructure, natural joins between vector data and relational tables, ACID compliance, and all the familiarity of SQL. For workloads up to 10-100 million vectors, pgvector with HNSW indexing offers performance comparable to dedicated databases.
pgvector Scale Limitation
pgvector with HNSW indexing works well up to about 10-100 million vectors. Beyond this threshold, performance degrades significantly. If your use case requires hundreds of millions or billions of vectors, consider Qdrant, Weaviate or Milvus from the start: migrating later has high costs. For most SMBs, pgvector is sufficient and offers the lowest TCO.
Embedding Models: The Choice Matters
The quality of semantic search depends at least as much on the embedding model as on the vector database. A vector is only as good as the model that generated it: choosing the wrong model compromises all results regardless of the database's efficiency.
Leading Embedding Models in 2025
| Model | Dimensions | Cost | Quality | Latency | Best For |
|---|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Excellent | API call | Enterprise RAG, maximum quality |
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | Very good | API call | Cost/quality balance |
| all-MiniLM-L6-v2 | 384 | Free (local) | Good | Very low | High volume, limited budget |
| BAAI/bge-large-en-v1.5 | 1024 | Free (local) | Excellent | Low (GPU) | Open-source OpenAI alternative |
| Cohere embed-v3 | 1024 | $0.10/1M tokens | Very good | API call | Multilingual, enterprise |
| FastEmbed (Qdrant) | 384-1024 | Free | Good-Very good | Very low | On-device, edge, real-time |
For multilingual enterprise contexts, Cohere embed-multilingual-v3 and multilingual-e5-large (Microsoft Research) offer superior quality for indexing documents in multiple languages including technical manuals, regulations and internal communications. Optimal embedding dimensions involve a trade-off: higher dimensions mean greater expressive capacity but also more memory and search latency.
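OpenAI's text-embedding-3 models expose a dimensions parameter for requesting shorter (Matryoshka-style) vectors; the effect can be approximated by truncating and re-normalizing a full embedding. A NumPy sketch of the memory side of the trade-off (the vector here is a synthetic placeholder, not a real embedding):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length."""
    out = vec[:dims]
    return out / np.linalg.norm(out)

rng = np.random.default_rng(1)
full = rng.standard_normal(3072).astype(np.float32)  # e.g. text-embedding-3-large size
full /= np.linalg.norm(full)

short = truncate_embedding(full, 1024)
print(full.nbytes, "->", short.nbytes, "bytes per vector")  # 12288 -> 4096
```

A 3x memory reduction per vector compounds at scale: index size, RAM footprint, and per-query distance computations all shrink proportionally, at the cost of some retrieval quality.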
Implementation: Similarity Search from Scratch
Let us build a complete semantic search system, from document loading to query, using Qdrant as the vector database and sentence-transformers for embeddings. This pattern is reusable for RAG, knowledge base search, and recommendation systems.
Qdrant Setup and Document Loading
# Install dependencies
# pip install qdrant-client sentence-transformers openai langchain
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer
import uuid
# Initialize Qdrant client (local for development)
client = QdrantClient(":memory:") # In-memory for testing
# For production: QdrantClient(host="localhost", port=6333)
# For Qdrant Cloud: QdrantClient(url="https://xxx.cloud.qdrant.io", api_key="...")
# Embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
VECTOR_SIZE = 384 # Dimension of the chosen model
# Create collection
client.create_collection(
collection_name="knowledge_base",
vectors_config=VectorParams(
size=VECTOR_SIZE,
distance=Distance.COSINE, # Cosine similarity
# Options: COSINE, DOT, EUCLID
)
)
# Documents to index (example: enterprise technical documentation)
documents = [
{
"id": str(uuid.uuid4()),
"text": "The onboarding process requires 3 business days. "
"The candidate must bring a photo ID and social security number.",
"metadata": {
"department": "HR",
"category": "onboarding",
"language": "en",
"last_updated": "2025-01-15"
}
},
{
"id": str(uuid.uuid4()),
"text": "The annual budget for project ALPHA is $500,000. "
"Expenses must be approved by the CFO for amounts over $50,000.",
"metadata": {
"department": "Finance",
"category": "budget",
"language": "en",
"confidentiality": "internal"
}
},
{
"id": str(uuid.uuid4()),
"text": "Account passwords must be at least 12 characters long, "
"including uppercase, lowercase, numbers and special characters.",
"metadata": {
"department": "IT",
"category": "security",
"language": "en"
}
},
]
# Generate embeddings and upload
def index_documents(documents: list[dict]) -> None:
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
points = [
PointStruct(
id=doc["id"],
vector=embedding.tolist(),
payload=doc["metadata"] | {"text": doc["text"]}
)
for doc, embedding in zip(documents, embeddings)
]
client.upsert(
collection_name="knowledge_base",
points=points,
wait=True # Wait for confirmation before proceeding
)
print(f"Indexed {len(points)} documents")
index_documents(documents)
# Verify
collection_info = client.get_collection("knowledge_base")
print(f"Total vectors: {collection_info.points_count}")
Search Query with Filters
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
def search_knowledge_base(
query: str,
top_k: int = 5,
department: str | None = None,
score_threshold: float = 0.7
) -> list[dict]:
"""
Semantic search in the enterprise knowledge base.
Supports filtering by department and relevance threshold.
"""
# Generate query embedding
query_vector = model.encode(query).tolist()
# Build optional filter
query_filter = None
if department:
query_filter = Filter(
must=[
FieldCondition(
key="department",
match=MatchValue(value=department)
)
]
)
# Vector search with metadata filter
results = client.search(
collection_name="knowledge_base",
query_vector=query_vector,
query_filter=query_filter,
limit=top_k,
score_threshold=score_threshold,
with_payload=True,
with_vectors=False # Do not return vectors to save bandwidth
)
return [
{
"id": hit.id,
"text": hit.payload.get("text", ""),
"metadata": {k: v for k, v in hit.payload.items() if k != "text"},
"score": hit.score
}
for hit in results
]
# Example queries
print("=== Generic search ===")
results = search_knowledge_base("How does hiring a new employee work?")
for r in results:
print(f"Score: {r['score']:.3f} | {r['text'][:80]}...")
print("\n=== Department-filtered search ===")
results = search_knowledge_base(
"What are the password security requirements?",
department="IT",
top_k=3
)
for r in results:
print(f"Score: {r['score']:.3f} | Dept: {r['metadata']['department']}")
print(f" {r['text'][:100]}...")
Hybrid Search: Semantics + Keywords in One Query
Pure semantic search has a critical limitation for enterprise applications: it fails on queries with domain-specific terms (product codes, proper names, acronyms, contract numbers) that do not appear in the embedding model's training data. A user searching for "contract ALPHA-2024-001" does not want semantically similar results like "commercial agreement": they want that specific contract.
Hybrid search solves this problem by combining vector similarity search with BM25 (Best Match 25), the standard algorithm for full-text search. The result is a system that understands both meaning (vector) and exact words (keyword), with an alpha parameter controlling the balance between the two approaches.
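Weaviate internally offers ranked and relative-score fusion strategies; as a simplified illustration of the alpha weighting (not Weaviate's exact algorithm), here is a relative-score-style blend in NumPy: both score lists are min-max normalized onto [0, 1] so their scales are comparable, then combined.

```python
import numpy as np

def hybrid_scores(bm25: np.ndarray, dense: np.ndarray, alpha: float = 0.5):
    """alpha=0 -> pure keyword (BM25), alpha=1 -> pure vector search."""
    def norm(s):
        # Min-max normalize so BM25 scores and cosine similarities are comparable
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return alpha * norm(dense) + (1 - alpha) * norm(bm25)

bm25 = np.array([12.4, 0.0, 3.1, 8.7])      # hypothetical keyword scores
dense = np.array([0.62, 0.91, 0.55, 0.60])  # hypothetical cosine similarities
print(hybrid_scores(bm25, dense, alpha=0.5).round(3))
```

Note how document 0 (keyword winner) and document 1 (semantic winner) trade places as alpha moves between 0 and 1; tuning alpha per use case is exactly the knob Weaviate exposes.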
Hybrid Search with Weaviate
import weaviate
import weaviate.classes as wvc
# Connect to Weaviate (local or cloud)
client = weaviate.connect_to_local()
# For Weaviate Cloud:
# client = weaviate.connect_to_weaviate_cloud(
# cluster_url="https://xxx.weaviate.network",
# auth_credentials=wvc.init.Auth.api_key("YOUR_API_KEY"),
# )
# Create schema with integrated vectorization module
documents = client.collections.create(
name="CompanyDocuments",
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
model="text-embedding-3-small"
),
# Weaviate automatically handles embedding generation!
properties=[
wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="department", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="doc_id", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="date", data_type=wvc.config.DataType.DATE),
]
)
# Insert documents (Weaviate generates embeddings automatically)
with documents.batch.dynamic() as batch:
batch.add_object({
"doc_id": "PROC-2025-001",
"title": "Procurement Procedure ALPHA-2024-001",
"content": "The procurement procedure for contract ALPHA-2024-001 requires "
"approval from the procurement manager and CFO for amounts over $100,000. "
"Suppliers must be registered in the approved vendor list.",
"department": "Procurement",
"date": "2025-01-01T00:00:00Z"
})
batch.add_object({
"doc_id": "SEC-2025-042",
"title": "IT Security Policy Revision 2025",
"content": "All systems must implement two-factor authentication. "
"Passwords must be changed every 90 days. "
"Access to critical systems is recorded with audit logs.",
"department": "IT Security",
"date": "2025-02-01T00:00:00Z"
})
# HYBRID SEARCH: combines keyword + semantic
# alpha=0.0 -> pure keyword search (BM25)
# alpha=1.0 -> pure semantic search (vector)
# alpha=0.5 -> 50/50 balance (recommended default)
results = documents.query.hybrid(
query="procurement contract ALPHA-2024-001 approval",
alpha=0.5, # Keyword/semantic balance
limit=5,
return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True)
)
for obj in results.objects:
print(f"Score: {obj.metadata.score:.4f}")
print(f"Doc ID: {obj.properties['doc_id']}")
print(f"Title: {obj.properties['title']}")
print(f"Explain: {obj.metadata.explain_score}")
print("---")
# HYBRID SEARCH with department filter
from weaviate.classes.query import Filter
results_filtered = documents.query.hybrid(
query="security policy password",
alpha=0.6,
filters=Filter.by_property("department").equal("IT Security"),
limit=3
)
client.close()
When to Use Hybrid Search
- Enterprise document search: contracts, procedures, regulations with specific codes
- E-commerce search: product search with SKU codes and semantic descriptions
- IT knowledge base: tickets, bug reports with IDs and natural language descriptions
- Legal/compliance search: exact regulatory references + semantic context
- Customer support RAG: combination of ticket numbers and problem descriptions
Scaling from Millions to Billions of Vectors
Managing large volumes of vectors requires specific architectural strategies. Choosing the right database is not enough: the entire pipeline must be designed with scalability in mind from the start.
Partitioning and Namespacing Strategies
For multi-tenant applications or data of very different natures, logical and physical partitioning of vectors improves performance and simplifies security management. Pinecone uses namespaces, Weaviate uses separate classes, Qdrant supports multiple collections and payload filtering.
# Multi-tenant partitioning strategy with Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct,
Filter, FieldCondition, MatchValue,
ScalarQuantization, ScalarQuantizationConfig, ScalarType
)
client = QdrantClient(host="localhost", port=6333)
# Collection with scalar quantization to reduce memory by 4x
client.create_collection(
collection_name="enterprise_docs",
vectors_config=VectorParams(
size=1536,
distance=Distance.COSINE,
),
# Quantization: reduces memory by 75% with ~1-2% recall loss
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8, # From float32 to int8 = 4x compression
quantile=0.99, # Preserve 99% of distribution
always_ram=True # Keep quantized in RAM
)
),
# Sharding for horizontal scaling
shard_number=4, # 4 shards distributed across nodes
replication_factor=2, # 2 replicas for HA
)
def upload_tenant_documents(
tenant_id: str,
documents: list[dict],
embeddings: list[list[float]]
) -> None:
"""
Upload documents with tenant_id in payload for logical isolation.
More efficient than separate collections for many tenants.
"""
points = [
PointStruct(
id=doc["id"],
vector=emb,
payload={
"tenant_id": tenant_id, # Key for multi-tenant filter
"text": doc["text"],
"created_at": doc.get("created_at"),
"doc_type": doc.get("doc_type", "general"),
}
)
for doc, emb in zip(documents, embeddings)
]
client.upsert(
collection_name="enterprise_docs",
points=points,
wait=False # Async for fast batch upload
)
def search_tenant(
tenant_id: str,
query_vector: list[float],
top_k: int = 5,
doc_type: str | None = None
) -> list:
"""
Search with mandatory tenant_id filter.
Without this filter, a tenant would see other tenants' documents.
"""
must_conditions = [
FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))
]
if doc_type:
must_conditions.append(
FieldCondition(key="doc_type", match=MatchValue(value=doc_type))
)
return client.search(
collection_name="enterprise_docs",
query_vector=query_vector,
query_filter=Filter(must=must_conditions),
limit=top_k,
with_payload=True
)
Enterprise Use Cases: Real-World Applications
1. RAG for Knowledge Management
The most widespread use case for enterprise vector databases in 2025: RAG systems that allow LLMs to answer company questions based on internal documents. Documented results include 40-60% reduction in information search time, 35% improvement in customer service response quality, and 50% faster onboarding of new employees.
The vector database in a RAG system acts as long-term memory: it converts thousands of documents into embeddings during ingestion, and at runtime retrieves the top-K most relevant fragments for the user's question. These fragments are then included in the LLM context to generate an accurate, citable response. For more details on enterprise RAG architecture, see the previous article in the series: Enterprise LLMs: RAG, Fine-Tuning and AI Guardrails.
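The retrieval step described above reduces to a few lines. This sketch uses placeholder chunk embeddings and a stub `embed()` function; in production the vectors would come from a real embedding model and the top-K search from the vector database.

```python
import numpy as np

rng = np.random.default_rng(3)

# Document chunks with pre-computed embeddings (placeholders for demo purposes)
chunks = ["Onboarding takes 3 business days.",
          "Budget approvals over $50,000 go to the CFO.",
          "Passwords must be at least 12 characters."]
chunk_vecs = rng.standard_normal((len(chunks), 384)).astype(np.float32)
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def embed(text: str) -> np.ndarray:
    """Stub: stands in for a real embedding model call."""
    v = rng.standard_normal(384).astype(np.float32)
    return v / np.linalg.norm(v)

def build_rag_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the top-K chunks and assemble a citable LLM prompt."""
    scores = chunk_vecs @ embed(question)
    top = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(f"[{i + 1}] {chunks[j]}" for i, j in enumerate(top))
    return (f"Answer using only the context below, citing sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

print(build_rag_prompt("How long does onboarding take?"))
```

The resulting prompt is what gets sent to the LLM; the numbered fragments make the answer citable back to source documents.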
2. Semantic Search for E-Commerce
Semantic search in product catalogs is one of the use cases with the most measurable ROI. Companies like Shopify and Zalando report 15-25% conversion rate increases after introducing vector search compared to traditional keyword search. A user searching for "comfortable shoes for long walks" finds relevant results even if no product in the catalog uses exactly those words.
3. Real-Time Fraud Detection
In the finance sector, vector databases are used to detect fraud patterns similar to previous transactions. Each transaction is converted into a vector capturing features like amount, merchant, geolocation, time, recent frequency, and the system retrieves the N most similar transactions from the historical database. If the current transaction resembles known fraud, it gets flagged for review.
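A schematic version of this lookup: transaction features are standardized into vectors, and a new transaction is flagged when most of its nearest historical neighbors are known fraud. The features, data, and threshold here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic history: 1,000 transactions with 3 numeric features each,
# plus a known-fraud label (~5% of cases)
history = rng.standard_normal((1_000, 3)).astype(np.float32)
labels = rng.random(1_000) < 0.05

mean, std = history.mean(0), history.std(0)  # stats for standardization

def flag_transaction(raw: np.ndarray, k: int = 10,
                     threshold: float = 0.5) -> bool:
    """Flag if more than `threshold` of the k nearest neighbors are fraud."""
    vec = (raw - mean) / std                     # standardize like the history
    d = np.linalg.norm(history - vec, axis=1)    # exact NN for demo purposes
    nearest = np.argsort(d)[:k]
    return bool(labels[nearest].mean() > threshold)

print(flag_transaction(np.array([2.5, -1.0, 3.0], dtype=np.float32)))
```

At production scale the exact scan is replaced by an ANN query against the vector database, which is what keeps the per-transaction decision within real-time latency budgets.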
4. Recommendation Engine
Vector-based collaborative filtering outperforms traditional sparse matrix similarity methods. User embeddings capture latent preferences; finding the most similar users (user-based CF) or items (item-based CF) in vector space returns more accurate recommendations with latency under 10ms.
Vector Database ROI by Enterprise Use Case
| Use Case | Improved Metric | Typical Improvement | Time-to-Value |
|---|---|---|---|
| RAG / Knowledge Base | Information search time | -40-60% | 4-8 weeks |
| E-commerce Search | Conversion Rate | +15-25% | 6-12 weeks |
| Customer Support RAG | First Contact Resolution | +30-40% | 8-16 weeks |
| Fraud Detection | Fraud Precision/Recall | +20-30% | 12-20 weeks |
| Recommendation Engine | Click-through Rate | +10-20% | 8-16 weeks |
Cost Analysis: Managed vs Self-Hosted
The choice between managed and self-hosted solutions depends on data volume, query count, the team's DevOps skills, and time horizon. The rule of thumb: for fewer than 5 million vectors and a team without ML DevOps, managed solutions are competitive. Beyond 50 million vectors with intensive queries, self-hosted almost always becomes more economical.
TCO Comparison: Managed vs Self-Hosted (100M vectors, 10K queries/day)
| Solution | Infra Cost/month | Ops Cost/month | Total/month | Notes |
|---|---|---|---|---|
| Pinecone Enterprise | $2,000-5,000 | $0 | $2,000-5,000 | Zero ops, guaranteed SLA |
| Weaviate Cloud | $800-2,000 | $200 | $1,000-2,200 | Minimal ops |
| Qdrant Cloud | $600-1,500 | $200 | $800-1,700 | Minimal ops |
| Qdrant Self-hosted (K8s) | $300-800 | $800 | $1,100-1,600 | Requires DevOps |
| pgvector (RDS Postgres) | $200-500 | $100 | $300-600 | Only up to 100M vectors |
| Milvus / Zilliz Cloud | $1,000-3,000 | $0-500 | $1,000-3,500 | Scales to billions |
Hidden Costs to Consider
In TCO calculations, do not forget embedding costs: with OpenAI text-embedding-3-small at $0.02 per million tokens, indexing 10 million documents of 500 tokens each costs about $100. Every full re-indexing (model update, schema change) incurs that cost again. Open-source models like sentence-transformers eliminate the API fee but require dedicated GPU or compute, typically $200-500/month to serve embeddings in real time at 100+ req/sec.
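The arithmetic from the callout, as a small reusable helper (the prices are the ones quoted above and will change over time):

```python
def embedding_cost_usd(num_docs: int, tokens_per_doc: int,
                       price_per_million_tokens: float) -> float:
    """One-off cost of embedding a corpus at a given API price."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens

# 10M documents x 500 tokens at $0.02/1M tokens (text-embedding-3-small)
print(f"${embedding_cost_usd(10_000_000, 500, 0.02):,.2f}")   # -> $100.00
# Same corpus at text-embedding-3-large pricing ($0.13/1M tokens)
print(f"${embedding_cost_usd(10_000_000, 500, 0.13):,.2f}")   # -> $650.00
```

Multiplying by the expected number of re-indexings over the project lifetime gives a more honest embedding line in the TCO.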
Decision Framework: Choosing the Right Vector Database
Decision Tree for Vector Database Selection
| Criterion | If... | Then |
|---|---|---|
| Already using PostgreSQL | Dataset < 50M vectors, small team | pgvector (zero additional infra) |
| Extreme scale | Billions of vectors, GPU acceleration | Milvus / Zilliz Cloud |
| Zero ops, speed-to-market | Team without ML DevOps, fast MVP | Pinecone Serverless |
| Hybrid search critical | Documents with specific codes + semantics | Weaviate (native BM25 + vector) |
| Complex filtering | Multi-tenant, rich metadata, GDPR isolation | Qdrant (filtering during ANN) |
| Limited budget, open-source | SMB, internal project, proof-of-concept | ChromaDB (dev) or Qdrant (prod) |
| Data sovereignty / on-premise | Sensitive data, strict compliance, no cloud | Qdrant or Weaviate self-hosted |
Integration with the Broader Data Stack
Vector databases do not operate in isolation: they are part of broader data pipelines that include ETL/ELT (see article 3 of the series), orchestration (article 4) and LLM systems (article 10). The choice of vector database must account for available native integrations:
- LangChain / LlamaIndex: All major vector databases have native integrations
- dbt + pgvector: Generate embeddings as a dbt transformation in PostgreSQL
- Spark + Milvus: Batch indexing of Petabyte-scale datasets
- Kafka + Qdrant: Real-time embedding updates from event streams
- MLflow + any vector DB: Versioning of embedding models and indexes
Cross-Link: Related Series
- AI Engineering / RAG: Advanced RAG architectures with re-ranking and query expansion (AI Engineering Series)
- PostgreSQL AI: pgvector in depth, HNSW vs IVFFlat, query optimization (PostgreSQL AI Series)
- MLOps: Versioning embedding models and quality monitoring (article 12 of this series)
Conclusion
Vector databases have become fundamental infrastructure for any company looking to build enterprise AI applications in 2025. This is no longer an experimental technology: with a market worth $2.65 billion and 27.5% annual growth, it is a standard component of the modern data stack.
Choosing the right solution depends on specific context. pgvector is the ideal starting point for teams already using PostgreSQL: zero additional infrastructure, immediate ROI, sufficient for most SMBs. Qdrant and Weaviate cover the enterprise tier with excellent performance, advanced filtering, and hybrid search. Pinecone wins on operational simplicity when budget allows. Milvus is the choice for billion-scale operations.
But remember: the vector database is only one piece of the puzzle. The quality of embeddings, the RAG pipeline architecture, the document chunking strategy, and quality monitoring over time matter at least as much as the database choice. Start with a simple prototype using ChromaDB or pgvector, measure results, and scale toward more robust solutions when volumes demand it.
Next Steps
- Article 12: MLOps for Business: AI Models in Production with MLflow - Versioning and monitoring embedding models
- Article 10 (previous): Enterprise LLMs: RAG, Fine-Tuning and AI Guardrails - How to use vector databases in a complete RAG pipeline
- PostgreSQL AI Series: Advanced pgvector, HNSW tuning, query optimization
- AI Engineering Series: Advanced RAG with re-ranking, query expansion, evaluation