Vector Database - Startup Super School

What Is a Vector Database?

A vector database is a database built specifically to store and search vector embeddings at scale. Unlike a traditional database that finds rows matching exact values, a vector database finds vectors that are most similar to a query vector - a process called approximate nearest neighbor (ANN) search.

When you embed a user’s question and want to find the 5 most relevant documents in your knowledge base, a vector database returns them in milliseconds - even across millions of stored vectors.

Why Traditional Databases Fall Short

You could technically store embeddings in PostgreSQL as arrays, but a full similarity scan across millions of 1,536-dimensional vectors would be prohibitively slow. Vector databases use specialized indexing structures (like HNSW - Hierarchical Navigable Small World graphs) that make nearest-neighbor search fast enough for production use.

Popular Vector Databases

Database	Type	Best For
Pinecone	Managed cloud	Easiest to start, production-ready
Weaviate	Self-hosted / cloud	Open-source, flexible schema
Qdrant	Self-hosted / cloud	High performance, open-source
Chroma	Local / cloud	Local development and prototyping
pgvector	PostgreSQL extension	Teams already on Postgres
Milvus	Self-hosted / cloud	Large-scale enterprise deployments

For most early-stage startups, pgvector is the pragmatic choice: it adds vector search to an existing Postgres database with no additional infrastructure. Graduate to a dedicated vector DB when query latency becomes a bottleneck.

How Vector Databases Fit Into the AI Stack

A typical RAG pipeline uses a vector database as follows:

Ingest: Split documents into chunks → generate embeddings → store in vector DB
Query: Embed user’s question → query vector DB for top-k similar chunks → inject into LLM prompt

The vector database handles the retrieval step. Its performance directly impacts the quality of your AI product’s answers.

Key Considerations for Startups

Dimensionality: Your vector DB must match the dimensions of your embedding model (e.g., 1,536 for OpenAI’s small model)
Metadata filtering: Good vector DBs let you filter by metadata (e.g., “only search documents from this customer’s account”) before doing similarity search
Cost: Managed services like Pinecone start at ~$70/month for production tiers; pgvector is free if you’re already paying for Postgres

Key Takeaway

A vector database is essential infrastructure for any AI product that needs to retrieve relevant information from a large corpus - search engines, chatbots with custom knowledge, recommendation systems, and more. Start with pgvector if you’re already on Postgres, move to a dedicated solution when you need sub-10ms latency at scale.

Frequently Asked Questions

What is a vector database used for in AI?

A vector database stores and searches vector embeddings at scale. It's the infrastructure backbone for AI-powered semantic search, RAG (Retrieval-Augmented Generation) systems, and recommendation engines - enabling fast similarity search across millions of documents in milliseconds.

What is the best vector database for a startup?

For teams already using PostgreSQL, pgvector is the pragmatic starting point - it adds vector search with no additional infrastructure cost. For dedicated vector search at scale, Pinecone is the easiest managed option. Qdrant and Weaviate are strong open-source alternatives for teams that want to self-host.

How is a vector database different from a regular database?

A traditional database finds exact matches - rows where a column equals a specific value. A vector database finds approximate nearest neighbors - the most semantically similar items to a query vector. This is fundamentally different: it's optimized for 'find things like this' rather than 'find this exact thing'.

Do I need a vector database for RAG?

Not necessarily for prototyping. You can start with a simple in-memory similarity search or pgvector. A dedicated vector database becomes worthwhile when you have more than ~100,000 documents and need sub-100ms query latency in production.