Enterprise Vector Databases: pgvector, Pinecone and Weaviate
The vector database market has exploded from a niche technology used only by advanced AI teams into mainstream infrastructure adopted by companies of all sizes. In 2025, the market is worth $2.65 billion and is projected to reach $8.9 billion by 2030 with a CAGR of 27.5%. The primary driver is straightforward: Large Language Models and RAG pipelines need to semantically search across billions of documents in milliseconds, and traditional relational databases simply are not designed for this task.
A vector database is not simply a database that "stores vectors": it is a system optimized for computing high-dimensional semantic similarity (typically 768-4096 dimensions) at massive scale, with queries returning the documents most similar to a natural language question. The difference from a SQL LIKE query or full-text index is profound: while keyword engines search for exact term matches, a vector database finds meaning, even when the words are entirely different.
Choosing the right vector database for an enterprise project is not straightforward. Available options in 2025 range from fully managed, zero-infrastructure solutions like Pinecone, to powerful open-source databases like Weaviate and Qdrant, to the pgvector extension that brings vector search directly into PostgreSQL. Each solution has distinct strengths and limitations. This article builds a concrete decision framework, with real code, cost benchmarks, and production-ready architectural patterns.
What You Will Learn
- What a vector database is and how it works internally (HNSW, IVF, PQ)
- Detailed comparison: Pinecone, Weaviate, Qdrant, Milvus, pgvector, ChromaDB
- Embedding models: OpenAI text-embedding-3, sentence-transformers, FastEmbed
- Similarity search and hybrid search implementation with real Python code
- Scaling from millions to billions of vectors: architectures and strategies
- Enterprise use cases: RAG, semantic search, recommendations, fraud detection
- Cost analysis: TCO managed vs self-hosted across different volumes
- Decision framework for choosing the right solution
The Data Warehouse, AI and Digital Transformation Series
| # | Article | Focus |
|---|---|---|
| 1 | Data Warehouse Evolution | From SQL Server to Data Lakehouse |
| 2 | Data Mesh Architecture | Domain ownership of data |
| 3 | Modern ETL vs ELT | dbt, Airbyte and Fivetran |
| 4 | Pipeline Orchestration | Airflow, Dagster and Prefect |
| 5 | AI in Manufacturing | Predictive Maintenance and Digital Twin |
| 6 | AI in Finance | Fraud Detection and Credit Scoring |
| 7 | AI in Retail | Demand Forecasting and Recommendations |
| 8 | AI in Healthcare | Diagnostics and Drug Discovery |
| 9 | AI in Logistics | Route Optimization and Warehouse Automation |
| 10 | Enterprise LLMs | RAG and AI Guardrails |
| 11 | You are here - Enterprise Vector Databases | pgvector, Pinecone and Weaviate |
| 12 | MLOps for Business | AI Models in Production with MLflow |
| 13 | Data Governance | Data Quality for Trustworthy AI |
| 14 | Data-Driven Roadmap | How SMBs Adopt AI and DWH |
What Is a Vector Database and How Does It Work
A vector database is a specialized storage system for saving, indexing, and querying high-dimensional vectors (embeddings). These vectors are numerical representations of unstructured data: text, images, audio, video, source code. Each embedding captures the "semantic meaning" of the original data in a mathematical space where similar elements are close to each other.
The core of every vector database is the Approximate Nearest Neighbor (ANN) algorithm: given a query vector, find the K nearest (most similar) vectors in the entire dataset. Computing the exact distance between a vector and all others (brute force) is computationally prohibitive for millions of vectors: with 10 million vectors at 1536 dimensions, an exhaustive scan takes hundreds of milliseconds on a CPU. ANN algorithms trade a small percentage of recall (typically 1-5%) to reduce latency by 100-1000x.
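To make the baseline concrete, here is a minimal exact (brute-force) top-K search in NumPy on synthetic data; this exhaustive scan, one dot product per stored vector, is precisely the computation that ANN indexes approximate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic corpus: 100,000 vectors, 256 dimensions (toy scale)
corpus = rng.standard_normal((100_000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalize once

def exact_top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exhaustive cosine search: one dot product per corpus vector."""
    query = query / np.linalg.norm(query)
    scores = corpus @ query                      # (100_000,) similarities
    top = np.argpartition(scores, -k)[-k:]       # unordered top-k in O(n)
    return top[np.argsort(scores[top])[::-1]]    # sort only the k winners

query = rng.standard_normal(256).astype(np.float32)
ids = exact_top_k(query, k=5)
print("Top-5 ids:", ids)
```

This is perfectly fine at small scale; ANN indexes exist because the `corpus @ query` line grows linearly with the dataset.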
Main Indexing Algorithms
The 3 Primary ANN Algorithms
| Algorithm | Type | Recall | Query Speed | Memory | Used By |
|---|---|---|---|---|---|
| HNSW | Graph-based | 95-99% | Very high | High | Pinecone, Weaviate, Qdrant, pgvector |
| IVF (+ PQ) | Cluster-based | 85-95% | High | Low (with PQ) | Milvus, FAISS |
| DiskANN | Graph on disk | 90-98% | Medium | Minimal (SSD) | Azure AI Search |
HNSW (Hierarchical Navigable Small World) is the dominant algorithm: it builds a multi-layer graph where connected nodes are close in vector space. Search starts at the highest level (few highly connected nodes), progressively descends finding closer nodes, until it reaches level 0 where the entire dataset lives. The result is latencies under 10ms even with tens of millions of vectors.
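The navigation idea is easiest to see in a toy, single-layer sketch (pure NumPy, synthetic data): build a k-nearest-neighbor graph, then greedily hop to whichever neighbor is closest to the query until no neighbor improves. Real HNSW adds multiple layers and a candidate list (ef), which this deliberately omits.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.standard_normal((500, 32)).astype(np.float32)

# Build a single-layer "navigable" graph: each node links to its M nearest neighbors
M = 8
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)               # exclude self-links
neighbors = np.argsort(dists, axis=1)[:, :M]  # (500, M) adjacency list

def greedy_search(query: np.ndarray, entry: int = 0) -> int:
    """Hop to the neighbor closest to the query until no neighbor improves."""
    current = entry
    best = float(np.linalg.norm(data[current] - query))
    while True:
        cand = neighbors[current]
        cand_d = np.linalg.norm(data[cand] - query, axis=1)
        i = int(np.argmin(cand_d))
        if cand_d[i] >= best:
            return current            # local minimum: no neighbor is closer
        current, best = int(cand[i]), float(cand_d[i])

query = rng.standard_normal(32).astype(np.float32)
found = greedy_search(query)
exact = int(np.argmin(np.linalg.norm(data - query, axis=1)))
print(found, exact)  # greedy often, but not always, matches the exact neighbor
```

HNSW's upper layers exist exactly to give this greedy descent a good entry point, and its candidate list reduces the chance of getting stuck in a local minimum.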
Product Quantization (PQ), often combined with IVF, compresses vectors reducing required memory by 4-32x at the cost of a slight recall decrease. It is the preferred technique when managing billions of vectors with limited hardware budget.
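A minimal illustration of the PQ idea in NumPy (synthetic data, naive k-means): each vector is split into m sub-vectors, each sub-vector is replaced by the index of its nearest codebook centroid, and storage drops from 4 bytes per dimension (float32) to one byte per sub-vector.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 64)).astype(np.float32)  # toy: 64-dim vectors
M, K = 8, 256              # 8 sub-vectors, 256 centroids each -> 1-byte codes
SUB = data.shape[1] // M   # dimensions per sub-vector

def kmeans(x, k, iters=10):
    """Naive k-means, enough for a demo."""
    cent = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        assign = ((x[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            pts = x[assign == c]
            if len(pts):
                cent[c] = pts.mean(0)
    return cent

# One codebook per sub-vector block
codebooks = [kmeans(data[:, i * SUB:(i + 1) * SUB], K) for i in range(M)]

def pq_encode(x: np.ndarray) -> np.ndarray:
    """Replace each sub-vector with the index of its nearest centroid."""
    codes = np.empty((len(x), M), dtype=np.uint8)
    for i, cb in enumerate(codebooks):
        block = x[:, i * SUB:(i + 1) * SUB]
        codes[:, i] = ((block[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

codes = pq_encode(data)
original_bytes = data.shape[1] * 4   # float32 storage per vector
compressed_bytes = M                 # one uint8 code per sub-vector
print(f"{original_bytes} B -> {compressed_bytes} B per vector "
      f"({original_bytes // compressed_bytes}x compression)")
```

Production systems (FAISS, Milvus) use far more sophisticated training and asymmetric distance computation, but the storage arithmetic is the same: compression ratio = 4 × dimensions / m.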
Similarity Metrics
The choice of distance metric depends on the embedding type and intended use:
# Similarity metrics in vector databases
# 1. Cosine Similarity (most common for text embeddings)
# Measures the angle between vectors, ignores magnitude
# Range: -1 (opposite) -> 0 (orthogonal) -> 1 (identical)
# Best for: text embeddings, OpenAI, sentence-transformers
# 2. Dot Product (Inner Product)
# Measures both angle and magnitude
# Faster than cosine if vectors are already normalized
# Best for: pre-normalized vectors, maximum inner product search
# 3. L2 (Euclidean Distance)
# Geometric distance in n-dimensional space
# Range: 0 (identical) -> infinity
# Best for: images, audio, numerical data
# Example with numpy to understand the differences
import numpy as np
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
def dot_product(a, b):
return np.dot(a, b)
def euclidean_distance(a, b):
return np.linalg.norm(a - b)
# Sample vectors (toy 4-dimensional "embeddings", not unit-normalized)
v1 = np.array([0.1, 0.8, 0.3, 0.5])
v2 = np.array([0.2, 0.7, 0.4, 0.4]) # Semantically close
v3 = np.array([0.9, 0.1, 0.1, 0.1]) # Semantically distant
print(f"Cosine(v1,v2): {cosine_similarity(v1, v2):.4f}") # ~0.98
print(f"Cosine(v1,v3): {cosine_similarity(v1, v3):.4f}") # ~0.27
print(f"L2(v1,v2): {euclidean_distance(v1, v2):.4f}") # ~0.20
print(f"L2(v1,v3): {euclidean_distance(v1, v3):.4f}") # ~1.15
Comparing the Leading Enterprise Solutions
The vector database landscape in 2025 is rich and differentiated. Let us analyze the most widely adopted production solutions, focusing on enterprise features, scalability and costs.
Enterprise Vector Database Comparison 2025
| Solution | Type | Max Scale | Hybrid Search | Deployment | Cost/month (10M vectors) |
|---|---|---|---|---|---|
| Pinecone | Managed SaaS | Billions | Yes (sparse+dense) | Cloud only | ~$675 |
| Weaviate | Open-source / Cloud | Billions | Yes (BM25+vector) | Cloud / Self-hosted | ~$200 (infra) |
| Qdrant | Open-source / Cloud | Billions | Yes | Cloud / Self-hosted | ~$150 (infra) |
| Milvus / Zilliz | Open-source / Cloud | Tens of billions | Yes | Cloud / K8s | ~$300 (Zilliz Cloud) |
| pgvector | PostgreSQL extension | 10-100M | Yes (full-text+vector) | Same Postgres DB | ~$50-250 (Postgres host) |
| ChromaDB | Open-source | Millions (dev) | Limited | Local / Self-hosted | Free (own infra) |
Pinecone: Enterprise Managed with Zero Ops
Pinecone is the fully managed vector database par excellence. Its value proposition is simple: zero infrastructure to manage, enterprise SLAs, predictable performance, and an intuitive API. It is the ideal choice for teams that want to move fast without dedicated database operations staff.
Pinecone's strengths include consistently low query latency with configurable recall, support for sparse-dense hybrid search (combining exact keyword matching with semantic search), namespaces for multi-tenant data isolation, and advanced metadata filtering. The Serverless tier (2024) made pricing more accessible for variable workloads. The main limitation is cost: at high scale, Pinecone becomes significantly more expensive than self-hosted alternatives.
Weaviate: AI-Native with Advanced Hybrid Search
Weaviate distinguishes itself with an AI-native philosophy: the database internally manages data vectorization through integrated modules (text2vec-openai, text2vec-cohere, img2vec-neural), eliminating the need for external embedding pipelines. Its standout feature is native hybrid search that combines BM25 (keyword search) with vector search in a single query, with a configurable alpha parameter to balance the two approaches.
Weaviate is particularly well-suited for applications where semantic context and exact matching coexist: product search, enterprise knowledge bases, RAG systems with category or date filters. Its GraphQL-like API makes queries expressive and powerful.
Qdrant: High Performance with Advanced Filtering
Qdrant, written in Rust, has earned broad enterprise adoption through its combination of high performance and flexible payload filtering. Unlike other vector databases, where metadata filters can significantly degrade performance, Qdrant applies filters during the ANN search phase, maintaining low latency even with complex filter conditions.
Qdrant's published benchmarks report 41.47 QPS at 99% recall on a 50-million-vector dataset (figures vary with hardware and configuration). It supports scalar and binary quantization to reduce memory usage, and an on-disk mode to handle datasets that do not fit in RAM. It is the preferred choice for complex RAG pipelines where documents are filtered by metadata (date, author, category, confidentiality level).
Milvus: Extreme Scale with GPU Acceleration
Milvus is the reference solution for billion-scale and GPU acceleration. Born at Zilliz and donated to the CNCF, Milvus supports multiple ANN index types (HNSW, IVF, PQ, DISKANN) and can leverage NVIDIA GPUs to accelerate both index building and queries. The disaggregated architecture (storage separated from compute) enables independent horizontal scaling of both layers.
Milvus is ideal for use cases like global recommendation engines (billions of items), image search in e-commerce with massive catalogs, and fraud detection systems on massive transaction streams. Operational complexity is however significant: Kubernetes deployment, dependencies on etcd and Kafka, and a DevOps team with ML infrastructure experience.
pgvector: The Pragmatism of PostgreSQL
pgvector is the extension that brings vector search directly into PostgreSQL. Its value proposition is compelling for companies already using Postgres: zero additional infrastructure, natural joins between vector data and relational tables, ACID compliance, and all the familiarity of SQL. For workloads up to 10-100 million vectors, pgvector with HNSW indexing offers performance comparable to dedicated databases.
pgvector Scale Limitation
pgvector with HNSW indexing works well up to about 10-100 million vectors. Beyond this threshold, performance degrades significantly. If your use case requires hundreds of millions or billions of vectors, consider Qdrant, Weaviate or Milvus from the start: migrating later has high costs. For most SMBs, pgvector is sufficient and offers the lowest TCO.
Embedding Models: The Choice Matters
The quality of semantic search depends at least as much on the embedding model as on the vector database. A vector is only as good as the model that generated it: choosing the wrong model compromises all results regardless of the database's efficiency.
Leading Embedding Models in 2025
| Model | Dimensions | Cost | Quality | Latency | Best For |
|---|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Excellent | API call | Enterprise RAG, maximum quality |
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | Very good | API call | Cost/quality balance |
| all-MiniLM-L6-v2 | 384 | Free (local) | Good | Very low | High volume, limited budget |
| BAAI/bge-large-en-v1.5 | 1024 | Free (local) | Excellent | Low (GPU) | Open-source OpenAI alternative |
| Cohere embed-v3 | 1024 | $0.10/1M tokens | Very good | API call | Multilingual, enterprise |
| FastEmbed (Qdrant) | 384-1024 | Free | Good-Very good | Very low | On-device, edge, real-time |
For multilingual enterprise contexts, Cohere embed-multilingual-v3 and multilingual-e5-large (Microsoft Research) offer superior quality for indexing documents in multiple languages including technical manuals, regulations and internal communications. Optimal embedding dimensions involve a trade-off: higher dimensions mean greater expressive capacity but also more memory and search latency.
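OpenAI's text-embedding-3 models expose a dimensions parameter for requesting shorter (Matryoshka-style) vectors; the effect can be approximated by truncating and re-normalizing a full embedding. A NumPy sketch of the memory side of the trade-off (the vector here is a synthetic placeholder, not a real embedding):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length."""
    out = vec[:dims]
    return out / np.linalg.norm(out)

rng = np.random.default_rng(1)
full = rng.standard_normal(3072).astype(np.float32)  # e.g. text-embedding-3-large size
full /= np.linalg.norm(full)

short = truncate_embedding(full, 1024)
print(full.nbytes, "->", short.nbytes, "bytes per vector")  # 12288 -> 4096
```

A 3x memory reduction per vector compounds at scale: index size, RAM footprint, and per-query distance computations all shrink proportionally, at the cost of some retrieval quality.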
Implementation: Similarity Search from Scratch
Let us build a complete semantic search system, from document loading to query, using Qdrant as the vector database and sentence-transformers for embeddings. This pattern is reusable for RAG, knowledge base search, and recommendation systems.
Qdrant Setup and Document Loading
# Install dependencies
# pip install qdrant-client sentence-transformers openai langchain
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer
import uuid
# Initialize Qdrant client (local for development)
client = QdrantClient(":memory:") # In-memory for testing
# For production: QdrantClient(host="localhost", port=6333)
# For Qdrant Cloud: QdrantClient(url="https://xxx.cloud.qdrant.io", api_key="...")
# Embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
VECTOR_SIZE = 384 # Dimension of the chosen model
# Create collection
client.create_collection(
collection_name="knowledge_base",
vectors_config=VectorParams(
size=VECTOR_SIZE,
distance=Distance.COSINE, # Cosine similarity
# Options: COSINE, DOT, EUCLID
)
)
# Documents to index (example: enterprise technical documentation)
documents = [
{
"id": str(uuid.uuid4()),
"text": "The onboarding process requires 3 business days. "
"The candidate must bring a photo ID and social security number.",
"metadata": {
"department": "HR",
"category": "onboarding",
"language": "en",
"last_updated": "2025-01-15"
}
},
{
"id": str(uuid.uuid4()),
"text": "The annual budget for project ALPHA is $500,000. "
"Expenses must be approved by the CFO for amounts over $50,000.",
"metadata": {
"department": "Finance",
"category": "budget",
"language": "en",
"confidentiality": "internal"
}
},
{
"id": str(uuid.uuid4()),
"text": "Account passwords must be at least 12 characters long, "
"including uppercase, lowercase, numbers and special characters.",
"metadata": {
"department": "IT",
"category": "security",
"language": "en"
}
},
]
# Generate embeddings and upload
def index_documents(documents: list[dict]) -> None:
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
points = [
PointStruct(
id=doc["id"],
vector=embedding.tolist(),
payload=doc["metadata"] | {"text": doc["text"]}
)
for doc, embedding in zip(documents, embeddings)
]
client.upsert(
collection_name="knowledge_base",
points=points,
wait=True # Wait for confirmation before proceeding
)
print(f"Indexed {len(points)} documents")
index_documents(documents)
# Verify
collection_info = client.get_collection("knowledge_base")
print(f"Total vectors: {collection_info.points_count}")
Search Query with Filters
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
def search_knowledge_base(
query: str,
top_k: int = 5,
department: str | None = None,
score_threshold: float = 0.7
) -> list[dict]:
"""
Semantic search in the enterprise knowledge base.
Supports filtering by department and relevance threshold.
"""
# Generate query embedding
query_vector = model.encode(query).tolist()
# Build optional filter
query_filter = None
if department:
query_filter = Filter(
must=[
FieldCondition(
key="department",
match=MatchValue(value=department)
)
]
)
# Vector search with metadata filter
results = client.search(
collection_name="knowledge_base",
query_vector=query_vector,
query_filter=query_filter,
limit=top_k,
score_threshold=score_threshold,
with_payload=True,
with_vectors=False # Do not return vectors to save bandwidth
)
return [
{
"id": hit.id,
"text": hit.payload.get("text", ""),
"metadata": {k: v for k, v in hit.payload.items() if k != "text"},
"score": hit.score
}
for hit in results
]
# Example queries
print("=== Generic search ===")
results = search_knowledge_base("How does hiring a new employee work?")
for r in results:
print(f"Score: {r['score']:.3f} | {r['text'][:80]}...")
print("\n=== Department-filtered search ===")
results = search_knowledge_base(
"What are the password security requirements?",
department="IT",
top_k=3
)
for r in results:
print(f"Score: {r['score']:.3f} | Dept: {r['metadata']['department']}")
print(f" {r['text'][:100]}...")
Hybrid Search: Semantics + Keywords in One Query
Pure semantic search has a critical limitation for enterprise applications: it fails on queries with domain-specific terms (product codes, proper names, acronyms, contract numbers) that do not appear in the embedding model's training data. A user searching for "contract ALPHA-2024-001" does not want semantically similar results like "commercial agreement": they want that specific contract.
Hybrid search solves this problem by combining vector similarity search with BM25 (Best Match 25), the standard algorithm for full-text search. The result is a system that understands both meaning (vector) and exact words (keyword), with an alpha parameter controlling the balance between the two approaches.
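Weaviate internally offers ranked and relative-score fusion strategies; as a simplified illustration of the alpha weighting (not Weaviate's exact algorithm), here is a relative-score-style blend in NumPy: both score lists are min-max normalized onto [0, 1] so their scales are comparable, then combined.

```python
import numpy as np

def hybrid_scores(bm25: np.ndarray, dense: np.ndarray, alpha: float = 0.5):
    """alpha=0 -> pure keyword (BM25), alpha=1 -> pure vector search."""
    def norm(s):
        # Min-max normalize so BM25 scores and cosine similarities are comparable
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return alpha * norm(dense) + (1 - alpha) * norm(bm25)

bm25 = np.array([12.4, 0.0, 3.1, 8.7])      # hypothetical keyword scores
dense = np.array([0.62, 0.91, 0.55, 0.60])  # hypothetical cosine similarities
print(hybrid_scores(bm25, dense, alpha=0.5).round(3))
```

Note how document 0 (keyword winner) and document 1 (semantic winner) trade places as alpha moves between 0 and 1; tuning alpha per use case is exactly the knob Weaviate exposes.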
Hybrid Search with Weaviate
import weaviate
import weaviate.classes as wvc
# Connect to Weaviate (local or cloud)
client = weaviate.connect_to_local()
# For Weaviate Cloud:
# client = weaviate.connect_to_weaviate_cloud(
# cluster_url="https://xxx.weaviate.network",
# auth_credentials=wvc.init.Auth.api_key("YOUR_API_KEY"),
# )
# Create schema with integrated vectorization module
documents = client.collections.create(
name="CompanyDocuments",
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
model="text-embedding-3-small"
),
# Weaviate automatically handles embedding generation!
properties=[
wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="department", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="doc_id", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="date", data_type=wvc.config.DataType.DATE),
]
)
# Insert documents (Weaviate generates embeddings automatically)
with documents.batch.dynamic() as batch:
batch.add_object({
"doc_id": "PROC-2025-001",
"title": "Procurement Procedure ALPHA-2024-001",
"content": "The procurement procedure for contract ALPHA-2024-001 requires "
"approval from the procurement manager and CFO for amounts over $100,000. "
"Suppliers must be registered in the approved vendor list.",
"department": "Procurement",
"date": "2025-01-01T00:00:00Z"
})
batch.add_object({
"doc_id": "SEC-2025-042",
"title": "IT Security Policy Revision 2025",
"content": "All systems must implement two-factor authentication. "
"Passwords must be changed every 90 days. "
"Access to critical systems is recorded with audit logs.",
"department": "IT Security",
"date": "2025-02-01T00:00:00Z"
})
# HYBRID SEARCH: combines keyword + semantic
# alpha=0.0 -> pure keyword search (BM25)
# alpha=1.0 -> pure semantic search (vector)
# alpha=0.5 -> 50/50 balance (recommended default)
results = documents.query.hybrid(
query="procurement contract ALPHA-2024-001 approval",
alpha=0.5, # Keyword/semantic balance
limit=5,
return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True)
)
for obj in results.objects:
print(f"Score: {obj.metadata.score:.4f}")
print(f"Doc ID: {obj.properties['doc_id']}")
print(f"Title: {obj.properties['title']}")
print(f"Explain: {obj.metadata.explain_score}")
print("---")
# HYBRID SEARCH with department filter
from weaviate.classes.query import Filter
results_filtered = documents.query.hybrid(
query="security policy password",
alpha=0.6,
filters=Filter.by_property("department").equal("IT Security"),
limit=3
)
client.close()
When to Use Hybrid Search
- Enterprise document search: contracts, procedures, regulations with specific codes
- E-commerce search: product search with SKU codes and semantic descriptions
- IT knowledge base: tickets, bug reports with IDs and natural language descriptions
- Legal/compliance search: exact regulatory references + semantic context
- Customer support RAG: combination of ticket numbers and problem descriptions
Scaling from Millions to Billions of Vectors
Managing large volumes of vectors requires specific architectural strategies. Choosing the right database is not enough: the entire pipeline must be designed with scalability in mind from the start.
Partitioning and Namespacing Strategies
For multi-tenant applications or data of very different natures, logical and physical partitioning of vectors improves performance and simplifies security management. Pinecone uses namespaces, Weaviate uses separate classes, Qdrant supports multiple collections and payload filtering.
# Multi-tenant partitioning strategy with Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct,
Filter, FieldCondition, MatchValue,
ScalarQuantization, ScalarQuantizationConfig, ScalarType
)
client = QdrantClient(host="localhost", port=6333)
# Collection with scalar quantization to reduce memory by 4x
client.create_collection(
collection_name="enterprise_docs",
vectors_config=VectorParams(
size=1536,
distance=Distance.COSINE,
),
# Quantization: reduces memory by 75% with ~1-2% recall loss
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8, # From float32 to int8 = 4x compression
quantile=0.99, # Preserve 99% of distribution
always_ram=True # Keep quantized in RAM
)
),
# Sharding for horizontal scaling
shard_number=4, # 4 shards distributed across nodes
replication_factor=2, # 2 replicas for HA
)
def upload_tenant_documents(
tenant_id: str,
documents: list[dict],
embeddings: list[list[float]]
) -> None:
"""
Upload documents with tenant_id in payload for logical isolation.
More efficient than separate collections for many tenants.
"""
points = [
PointStruct(
id=doc["id"],
vector=emb,
payload={
"tenant_id": tenant_id, # Key for multi-tenant filter
"text": doc["text"],
"created_at": doc.get("created_at"),
"doc_type": doc.get("doc_type", "general"),
}
)
for doc, emb in zip(documents, embeddings)
]
client.upsert(
collection_name="enterprise_docs",
points=points,
wait=False # Async for fast batch upload
)
def search_tenant(
tenant_id: str,
query_vector: list[float],
top_k: int = 5,
doc_type: str | None = None
) -> list:
"""
Search with mandatory tenant_id filter.
Without this filter, a tenant would see other tenants' documents.
"""
must_conditions = [
FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))
]
if doc_type:
must_conditions.append(
FieldCondition(key="doc_type", match=MatchValue(value=doc_type))
)
return client.search(
collection_name="enterprise_docs",
query_vector=query_vector,
query_filter=Filter(must=must_conditions),
limit=top_k,
with_payload=True
)
Enterprise Use Cases: Real-World Applications
1. RAG for Knowledge Management
The most widespread use case for enterprise vector databases in 2025: RAG systems that allow LLMs to answer company questions based on internal documents. Documented results include 40-60% reduction in information search time, 35% improvement in customer service response quality, and 50% faster onboarding of new employees.
The vector database in a RAG system acts as long-term memory: it converts thousands of documents into embeddings during ingestion, and at runtime retrieves the top-K most relevant fragments for the user's question. These fragments are then included in the LLM context to generate an accurate, citable response. For more details on enterprise RAG architecture, see the previous article in the series: Enterprise LLMs: RAG, Fine-Tuning and AI Guardrails.
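The retrieval step described above reduces to a few lines. This sketch uses placeholder chunk embeddings and a stub `embed()` function; in production the vectors would come from a real embedding model and the top-K search from the vector database.

```python
import numpy as np

rng = np.random.default_rng(3)

# Document chunks with pre-computed embeddings (placeholders for demo purposes)
chunks = ["Onboarding takes 3 business days.",
          "Budget approvals over $50,000 go to the CFO.",
          "Passwords must be at least 12 characters."]
chunk_vecs = rng.standard_normal((len(chunks), 384)).astype(np.float32)
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def embed(text: str) -> np.ndarray:
    """Stub: stands in for a real embedding model call."""
    v = rng.standard_normal(384).astype(np.float32)
    return v / np.linalg.norm(v)

def build_rag_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the top-K chunks and assemble a citable LLM prompt."""
    scores = chunk_vecs @ embed(question)
    top = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(f"[{i + 1}] {chunks[j]}" for i, j in enumerate(top))
    return (f"Answer using only the context below, citing sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

print(build_rag_prompt("How long does onboarding take?"))
```

The resulting prompt is what gets sent to the LLM; the numbered fragments make the answer citable back to source documents.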
2. Semantic Search for E-Commerce
Semantic search in product catalogs is one of the use cases with the most measurable ROI. Companies like Shopify and Zalando report 15-25% conversion rate increases after introducing vector search compared to traditional keyword search. A user searching for "comfortable shoes for long walks" finds relevant results even if no product in the catalog uses exactly those words.
3. Real-Time Fraud Detection
In the finance sector, vector databases are used to detect fraud patterns similar to previous transactions. Each transaction is converted into a vector capturing features like amount, merchant, geolocation, time, recent frequency, and the system retrieves the N most similar transactions from the historical database. If the current transaction resembles known fraud, it gets flagged for review.
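A schematic version of this lookup: transaction features are standardized into vectors, and a new transaction is flagged when most of its nearest historical neighbors are known fraud. The features, data, and threshold here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic history: 1,000 transactions with 3 numeric features each,
# plus a known-fraud label (~5% of cases)
history = rng.standard_normal((1_000, 3)).astype(np.float32)
labels = rng.random(1_000) < 0.05

mean, std = history.mean(0), history.std(0)  # stats for standardization

def flag_transaction(raw: np.ndarray, k: int = 10,
                     threshold: float = 0.5) -> bool:
    """Flag if more than `threshold` of the k nearest neighbors are fraud."""
    vec = (raw - mean) / std                     # standardize like the history
    d = np.linalg.norm(history - vec, axis=1)    # exact NN for demo purposes
    nearest = np.argsort(d)[:k]
    return bool(labels[nearest].mean() > threshold)

print(flag_transaction(np.array([2.5, -1.0, 3.0], dtype=np.float32)))
```

At production scale the exact scan is replaced by an ANN query against the vector database, which is what keeps the per-transaction decision within real-time latency budgets.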
4. Recommendation Engine
Vector-based collaborative filtering outperforms traditional sparse matrix similarity methods. User embeddings capture latent preferences; finding the most similar users (user-based CF) or items (item-based CF) in vector space returns more accurate recommendations with latency under 10ms.
Vector Database ROI by Enterprise Use Case
| Use Case | Improved Metric | Typical Improvement | Time-to-Value |
|---|---|---|---|
| RAG / Knowledge Base | Information search time | -40-60% | 4-8 weeks |
| E-commerce Search | Conversion Rate | +15-25% | 6-12 weeks |
| Customer Support RAG | First Contact Resolution | +30-40% | 8-16 weeks |
| Fraud Detection | Fraud Precision/Recall | +20-30% | 12-20 weeks |
| Recommendation Engine | Click-through Rate | +10-20% | 8-16 weeks |
Cost Analysis: Managed vs Self-Hosted
The choice between managed and self-hosted solutions depends on data volume, query count, the team's DevOps skills, and time horizon. The rule of thumb: for fewer than 5 million vectors and a team without ML DevOps, managed solutions are competitive. Beyond 50 million vectors with intensive queries, self-hosted almost always becomes more economical.
TCO Comparison: Managed vs Self-Hosted (100M vectors, 10K queries/day)
| Solution | Infra Cost/month | Ops Cost/month | Total/month | Notes |
|---|---|---|---|---|
| Pinecone Enterprise | $2,000-5,000 | $0 | $2,000-5,000 | Zero ops, guaranteed SLA |
| Weaviate Cloud | $800-2,000 | $200 | $1,000-2,200 | Minimal ops |
| Qdrant Cloud | $600-1,500 | $200 | $800-1,700 | Minimal ops |
| Qdrant Self-hosted (K8s) | $300-800 | $800 | $1,100-1,600 | Requires DevOps |
| pgvector (RDS Postgres) | $200-500 | $100 | $300-600 | Only up to 100M vectors |
| Milvus / Zilliz Cloud | $1,000-3,000 | $0-500 | $1,000-3,500 | Scales to billions |
Hidden Costs to Consider
In TCO calculations, do not forget embedding costs: with OpenAI text-embedding-3-small at $0.02 per million tokens, indexing 10 million documents of 500 tokens each costs about $100. Every full re-indexing (model update, schema change) incurs that cost again. Open-source models like sentence-transformers eliminate the API fee but require dedicated GPU or compute, typically $200-500/month to serve embeddings in real time at 100+ req/sec.
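The arithmetic from the callout, as a small reusable helper (the prices are the ones quoted above and will change over time):

```python
def embedding_cost_usd(num_docs: int, tokens_per_doc: int,
                       price_per_million_tokens: float) -> float:
    """One-off cost of embedding a corpus at a given API price."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens

# 10M documents x 500 tokens at $0.02/1M tokens (text-embedding-3-small)
print(f"${embedding_cost_usd(10_000_000, 500, 0.02):,.2f}")   # -> $100.00
# Same corpus at text-embedding-3-large pricing ($0.13/1M tokens)
print(f"${embedding_cost_usd(10_000_000, 500, 0.13):,.2f}")   # -> $650.00
```

Multiplying by the expected number of re-indexings over the project lifetime gives a more honest embedding line in the TCO.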
Decision Framework: Choosing the Right Vector Database
Decision Tree for Vector Database Selection
| Criterion | If... | Then |
|---|---|---|
| Already using PostgreSQL | Dataset < 50M vectors, small team | pgvector (zero additional infra) |
| Extreme scale | Billions of vectors, GPU acceleration | Milvus / Zilliz Cloud |
| Zero ops, speed-to-market | Team without ML DevOps, fast MVP | Pinecone Serverless |
| Hybrid search critical | Documents with specific codes + semantics | Weaviate (native BM25 + vector) |
| Complex filtering | Multi-tenant, rich metadata, GDPR isolation | Qdrant (filtering during ANN) |
| Limited budget, open-source | SMB, internal project, proof-of-concept | ChromaDB (dev) or Qdrant (prod) |
| Data sovereignty / on-premise | Sensitive data, strict compliance, no cloud | Qdrant or Weaviate self-hosted |
Integration with the Broader Data Stack
Vector databases do not operate in isolation: they are part of broader data pipelines that include ETL/ELT (see article 3 of the series), orchestration (article 4) and LLM systems (article 10). The choice of vector database must account for available native integrations:
- LangChain / LlamaIndex: All major vector databases have native integrations
- dbt + pgvector: Generate embeddings as a dbt transformation in PostgreSQL
- Spark + Milvus: Batch indexing of Petabyte-scale datasets
- Kafka + Qdrant: Real-time embedding updates from event streams
- MLflow + any vector DB: Versioning of embedding models and indexes
Cross-Link: Related Series
- AI Engineering / RAG: Advanced RAG architectures with re-ranking and query expansion (AI Engineering Series)
- PostgreSQL AI: pgvector in depth, HNSW vs IVFFlat, query optimization (PostgreSQL AI Series)
- MLOps: Versioning embedding models and quality monitoring (article 12 of this series)
Conclusion
Vector databases have become fundamental infrastructure for any company looking to build enterprise AI applications in 2025. This is no longer an experimental technology: with a market worth $2.65 billion and 27.5% annual growth, it is a standard component of the modern data stack.
Choosing the right solution depends on specific context. pgvector is the ideal starting point for teams already using PostgreSQL: zero additional infrastructure, immediate ROI, sufficient for most SMBs. Qdrant and Weaviate cover the enterprise tier with excellent performance, advanced filtering, and hybrid search. Pinecone wins on operational simplicity when budget allows. Milvus is the choice for billion-scale operations.
But remember: the vector database is only one piece of the puzzle. The quality of embeddings, the RAG pipeline architecture, the document chunking strategy, and quality monitoring over time matter at least as much as the database choice. Start with a simple prototype using ChromaDB or pgvector, measure results, and scale toward more robust solutions when volumes demand it.
Next Steps
- Article 12: MLOps for Business: AI Models in Production with MLflow - Versioning and monitoring embedding models
- Article 10 (previous): Enterprise LLMs: RAG, Fine-Tuning and AI Guardrails - How to use vector databases in a complete RAG pipeline
- PostgreSQL AI Series: Advanced pgvector, HNSW tuning, query optimization
- AI Engineering Series: Advanced RAG with re-ranking, query expansion, evaluation