Vector Database Cheat Sheet
What is a Vector Database?
A vector database stores and retrieves high-dimensional vectors (embeddings), usually produced by machine learning models, that represent the meaning of text, images, or other data.
Instead of keyword matching, it uses vector similarity to answer:
"Which stored items are most similar to this one?"
Key Concepts
Vector (Embedding)
- A list of floats, e.g. [0.12, -0.88, ..., 768 values]
- Captures the meaning/essence of the input
- Generated by embedding models from providers such as OpenAI, Gemini, or HuggingFace (see the sketch below)
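For instance, here is a minimal sketch of generating an embedding with the openai Node.js package; the model name and input text are placeholders, not part of any flow described here:

```js
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment
const openai = new OpenAI();

const response = await openai.embeddings.create({
  model: "text-embedding-ada-002", // placeholder; any embedding model works
  input: "Drink a glass of water before every meal.",
});

const embedding = response.data[0].embedding; // array of 1536 floats
console.log(embedding.length);
```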
Similarity Measures
- Cosine similarity: the angle between vectors, ignoring magnitude (most common)
- Dot product: combines angle and magnitude; equals cosine similarity when vectors are normalized
- Euclidean distance: the straight-line distance between points in vector space (see the sketch below)
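A quick sketch of the latter two measures on toy vectors (cosine similarity is implemented further down in this sheet); the numbers are made up for illustration:

```js
// Toy 3-dimensional vectors; real embeddings have hundreds of dimensions
const a = [0.1, 0.3, -0.2];
const b = [0.2, 0.1, -0.4];

// Dot product: sum of element-wise products (magnitude-sensitive)
const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Euclidean distance: square root of the summed squared differences
const euclidean = Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

console.log(dot, euclidean);
```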
Approximate Nearest Neighbor (ANN)
- Searches large vector collections far faster than an exact scan, trading a little accuracy for speed
- Algorithms: HNSW, IVF, PQ, etc. (a brute-force baseline for comparison is sketched below)
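For contrast, this hypothetical sketch shows the exact (brute-force) search that ANN indexes approximate; HNSW and friends exist precisely to avoid scanning every stored vector like this:

```js
// Exact k-nearest-neighbor search by scanning every stored vector.
// ANN indexes (HNSW, IVF, PQ) trade a little recall to skip the full scan.
function bruteForceKnn(query, vectors, k = 5) {
  const cosine = (a, b) => {
    const dot = a.reduce((s, ai, i) => s + ai * b[i], 0);
    const normA = Math.sqrt(a.reduce((s, ai) => s + ai ** 2, 0));
    const normB = Math.sqrt(b.reduce((s, bi) => s + bi ** 2, 0));
    return dot / (normA * normB);
  };
  return vectors
    .map((v, index) => ({ index, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```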
Vector DB Schema (Typical)
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique document ID |
| content | text | Original text |
| embedding | vector(768/1536) | Vector from the embedding model |
| metadata | JSONB | Tags, source, date, etc. |
Popular Vector Databases
| DB | Type | Notes |
|---|---|---|
| Pinecone | SaaS | Scalable, easy to use, free tier |
| Weaviate | Open source / Cloud | Hybrid (text + vector) search, built-in modules |
| Qdrant | Open source | High performance, filterable metadata |
| Milvus | Open source | Large-scale deployments, GPU support |
| pgvector | Postgres extension | Adds vector support to a relational DB |
| FAISS | Local library | Facebook's fast local vector search engine |
Use Cases
- Semantic Search: Find documents with similar meaning
- Chatbot Memory: Recall similar past conversations
- Content Deduplication: Detect near-duplicate entries
- Recommendation: Suggest similar items based on embeddings
Your n8n Flow Example
[Generate Tip] → [Embed Text]
        ↓
[Search in Supabase Vector DB]
        ↓
[Too Similar?] → Yes → [Regenerate]
        ↓
        No
        ↓
[Send to Telegram + Save to Supabase]
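The "Too Similar?" branch could be a small n8n Code node like the sketch below; the 0.9 threshold and the similarity field name are assumptions that depend on how your Supabase search node returns its results:

```js
// Hypothetical n8n Code node deciding whether to regenerate the tip
const SIMILARITY_THRESHOLD = 0.9; // assumption: tune for your data

// Assumes the previous node returns the best match with a similarity score
const bestMatch = $input.first().json;
const tooSimilar = Boolean(bestMatch) && bestMatch.similarity >= SIMILARITY_THRESHOLD;

return [{ json: { tooSimilar } }];
```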
Supabase Vector Setup (SQL)
```sql
-- Enable pgvector and create a documents table
create extension if not exists vector;

create table documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  embedding vector(768),
  metadata jsonb
);

-- Semantic search with cosine distance
-- (<=> returns smaller values for closer vectors)
select *
from documents
order by embedding <=> '[your_query_vector]'::vector
limit 5;
```
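The same search can be run from Node.js; here is a minimal sketch using the pg client, where the connection string and query vector are placeholders:

```js
import pg from "pg";

// Placeholder connection string; Supabase exposes a standard Postgres URL
const client = new pg.Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

const queryVector = [/* 768 floats from your embedding model */];

// pgvector accepts vectors as a '[v1,v2,...]' literal, which JSON.stringify produces
const { rows } = await client.query(
  `select id, content, 1 - (embedding <=> $1::vector) as similarity
   from documents
   order by embedding <=> $1::vector
   limit 5`,
  [JSON.stringify(queryVector)]
);

console.log(rows);
await client.end();
```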
Embedding Models & Sizes
| Model | Vector Size |
|---|---|
| OpenAI text-embedding-ada-002 | 1536 |
| Gemini Text Embedding | 768 |
| BERT/MPNet-based (HuggingFace) | 384-768 |
Vector Comparison Example (Cosine)
```js
// Cosine similarity: dot product divided by the product of the vector norms.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai ** 2, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi ** 2, 0));
  return dot / (normA * normB);
}
```
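Example usage (values chosen for illustration): vectors pointing the same way score 1, orthogonal vectors score 0.

```js
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
console.log(cosineSimilarity([0.12, -0.88], [0.10, -0.91])); // ~0.9996, nearly identical
```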
Best Practices
- Normalize text before embedding (trim, lowercase) (see the sketch below)
- Store metadata for filtering (e.g., by date or topic)
- Use cosine similarity or the <=> operator for similarity search
- Store recent vectors (e.g. 7-30 days) for LLM short-term memory
- Periodically clean up or archive old vectors
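A minimal normalization helper for the first bullet; collapsing internal whitespace is an extra assumption beyond trim/lowercase:

```js
// Normalize text before sending it to the embedding model
function normalizeForEmbedding(text) {
  return text
    .trim()                // drop leading/trailing whitespace
    .toLowerCase()         // case-insensitive matching
    .replace(/\s+/g, " "); // collapse runs of whitespace (extra assumption)
}

console.log(normalizeForEmbedding("  Drink   MORE water!  ")); // "drink more water!"
```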