🧠 Vector Database Cheat Sheet

📌 What is a Vector Database?

A vector database stores and retrieves high-dimensional vectors called embeddings. Embeddings are usually produced by machine learning models and represent the meaning of text, images, or other data.

Instead of keyword matching, it uses vector similarity to answer:

“Which stored items are most similar to this one?”


Key Concepts

🔢 Vector (Embedding)

  • A fixed-length list of floats, e.g. [0.12, -0.88, ...] with 768 values
  • Captures the meaning of the input, so similar inputs map to nearby vectors
  • Generated by embedding models from providers such as OpenAI, Google (Gemini), or Hugging Face

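🧮 Similarity Measures

  • Cosine similarity: Cosine of the angle between vectors; ignores magnitude (most common for text embeddings)
  • Dot product: Depends on both angle and magnitude; equals cosine similarity when vectors are unit-normalized
  • Euclidean distance: Straight-line distance between two points in the vector space (smaller means more similar)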
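
To make the three measures concrete, here is a minimal sketch in plain JavaScript. The function names are illustrative and not taken from any library, and both vectors are assumed to be non-zero and of equal length.

```javascript
// Illustrative helpers (hypothetical names, no library involved).
function dotProduct(a, b) {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function cosine(a, b) {
  return dotProduct(a, b) / Math.sqrt(dotProduct(a, a) * dotProduct(b, b));
}

// Scale a vector to unit length.
function normalize(v) {
  const norm = Math.sqrt(dotProduct(v, v));
  return v.map((x) => x / norm);
}

const a = [3, 4];
const b = [6, 8]; // same direction as a, twice the length

console.log(cosine(a, b));            // 1: same direction, magnitude ignored
console.log(dotProduct(a, b));        // 50: grows with magnitude
console.log(euclideanDistance(a, b)); // 5: non-zero despite the same direction
```

After normalize(), the dot product of two vectors matches their cosine similarity up to floating-point rounding, which is why some setups normalize embeddings once at write time and then use the cheaper dot product.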

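⚡ Approximate Nearest Neighbor (ANN)

  • Trades a small amount of accuracy for much faster search over large vector collections
  • Common techniques: HNSW (graph-based index), IVF (inverted file, cluster-based), PQ (product quantization, a compression method), etc.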
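
For contrast, the exact search that ANN approximates can be sketched as a brute-force scan. This is not HNSW, IVF, or PQ; it is the linear-time baseline that those techniques are designed to avoid at scale. The function name and the item shape ({ id, embedding }) are assumptions for illustration.

```javascript
// Cosine similarity of two equal-length, non-zero vectors.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai ** 2, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi ** 2, 0));
  return dot / (normA * normB);
}

// Exact k-nearest-neighbor search: score every item, sort, keep the top k.
function exactTopK(query, items, k) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const items = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0.9, 0.1] },
  { id: "c", embedding: [0, 1] },
];

const results = exactTopK([1, 0], items, 2);
console.log(results.map((r) => r.id)); // [ 'a', 'b' ]
```

Every query touches every stored vector here, so cost grows linearly with collection size. An ANN index returns roughly the same top results while inspecting only a fraction of the vectors.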

🗂️ Vector DB Schema (Typical)

Field      Type              Description
id         UUID              Unique document ID
content    text              Original text
embedding  vector(768/1536)  Vector from embedding model
metadata   JSONB             Tags, source, date, etc.

Common vector database options:

Database  Type                  Notes
Pinecone  SaaS                  Scalable, easy to use, free tier
Weaviate  Open source / cloud   Hybrid search (text + vector), built-in modules
Qdrant    Open source           High performance, filterable metadata
Milvus    Open source           Large-scale & GPU support
pgvector  Postgres extension    Adds vector columns and search to a relational DB
FAISS     Local library         Meta's (Facebook AI) fast similarity search library; an index, not a full database

🧠 Use Cases

  • Semantic Search: Find documents with similar meaning
  • Chatbot Memory: Recall similar past conversations
  • Content Deduplication: Detect near-duplicate entries
  • Recommendation: Suggest similar items based on embeddings

🔄 Your n8n Flow Example

[Generate Tip] → [Embed Text]
              ↓
     [Search in Supabase Vector DB]
              ↓
[Too Similar?] → Yes → [Regenerate]
              ↓
             No
              ↓
     [Send to Telegram + Save to Supabase]
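
The [Too Similar?] decision can be sketched as a small function, for example inside an n8n Code node. The 0.9 threshold, the function names, and the shape of the stored embeddings are assumptions for illustration and not part of the flow above. A suitable threshold depends on the embedding model and is best tuned on real data.

```javascript
// Cosine similarity of two equal-length, non-zero vectors.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai ** 2, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi ** 2, 0));
  return dot / (normA * normB);
}

// True when the candidate reaches `threshold` similarity with any
// previously stored embedding (hypothetical default of 0.9).
function isTooSimilar(candidate, storedEmbeddings, threshold = 0.9) {
  return storedEmbeddings.some(
    (stored) => cosineSimilarity(candidate, stored) >= threshold
  );
}

const stored = [
  [1, 0, 0],
  [0, 1, 0],
];

console.log(isTooSimilar([0.99, 0.05, 0], stored)); // true: near-duplicate of the first entry
console.log(isTooSimilar([0.5, 0.5, 0.7], stored)); // false: distinct enough to send
```

In the flow itself the database search would normally return only the closest matches, so a check like this only has to look at a handful of rows.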

🔧 Supabase Vector Setup (SQL)

-- Enable the pgvector extension
create extension if not exists vector;

-- Table for documents and their embeddings
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  embedding vector(768),  -- must match the embedding model's output size
  metadata jsonb
);

-- Semantic search: <=> is cosine distance, so smaller values mean more similar
select *
from documents
order by embedding <=> '[your_query_vector]'::vector
limit 5;
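
The '[your_query_vector]' placeholder expects pgvector's text format: comma-separated numbers inside square brackets. A minimal sketch of a helper that builds that literal from a JavaScript array follows. The function name and the dimension check are illustrative assumptions, and in a real client the value is better passed as a bound query parameter than concatenated into the SQL string.

```javascript
// Convert a numeric array into pgvector's text format, e.g. "[0.1,0.2,0.3]".
// `expectedDim` should match the column definition, e.g. 768 for vector(768).
function toPgVectorLiteral(embedding, expectedDim) {
  if (embedding.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

console.log(toPgVectorLiteral([0.12, -0.88, 0.5], 3)); // [0.12,-0.88,0.5]
```

Checking the length up front surfaces a model/column mismatch (for example a 1536-value embedding sent to a vector(768) column) before the query reaches the database.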

📦 Embedding Models & Sizes

Model                             Vector Size
OpenAI text-embedding-ada-002     1536
Gemini Text Embedding             768
BERT/MPNet-based (Hugging Face)   384–768

🧪 Vector Comparison Example (Cosine)

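// Cosine similarity of two equal-length numeric arrays; result is in [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai ** 2, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi ** 2, 0));
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treated as 0 here
  return dot / (normA * normB);
}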

🧠 Best Practices

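  • Normalize text before embedding (e.g. trim whitespace, lowercase) and apply the same steps to queries
  • Store metadata for filtering (e.g., by date or topic)
  • Use cosine similarity for comparison; in pgvector the <=> operator returns cosine distance (1 minus cosine similarity)
  • Store recent vectors (e.g. 7–30 days) for LLM short-term memory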
  • Periodically clean up or archive old vectors
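
Two of these practices, text normalization and keeping only recent vectors, can be sketched in a few lines. The function names and the createdAt field are hypothetical, and the sample records are made up purely for illustration.

```javascript
// Trim, collapse runs of whitespace, and lowercase.
function normalizeText(text) {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Keep only records created within the last `maxAgeDays` days.
// `createdAt` is a hypothetical ISO timestamp field on each record.
function keepRecent(records, maxAgeDays, now = new Date()) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return records.filter((r) => new Date(r.createdAt).getTime() >= cutoff);
}

console.log(normalizeText("  Drink   MORE Water \n")); // drink more water

const now = new Date("2024-06-30T00:00:00Z");
const records = [
  { id: 1, createdAt: "2024-06-25T00:00:00Z" },
  { id: 2, createdAt: "2024-05-01T00:00:00Z" },
];
console.log(keepRecent(records, 30, now).map((r) => r.id)); // [ 1 ]
```

Lowercasing is a judgment call: it helps catch trivially different duplicates, but it can discard signal for models that treat case as meaningful.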