Text is meaningless to machines — until it becomes a vector. Explore the hidden geometry where words become coordinates, meaning becomes distance, and search becomes “what’s nearby?”
The most famous embedding discovery: you can do math with meaning. king - man + woman ≈ queen. The vector that points from “man” to “king” is roughly the same as the vector from “woman” to “queen” — both encode “royalty.”
Retrieval-Augmented Generation converts your query into a vector, then finds the closest document vectors. The top-k nearest results become context for the LLM.
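The retrieval step can be sketched in a few lines. This is a minimal illustration, not the demo's implementation: the three-dimensional document vectors and the names `top_k`, `docs`, and `query` are made up for the example. A real system would get its vectors from an embedding model and would usually search a vector index instead of scanning every document.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical document vectors, invented for illustration only.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "taxes": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded user query

results = top_k(query, docs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The text of the top-ranked documents, not their vectors, is what would be passed to the LLM as context.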
An embedding model converts text into a fixed-length array of numbers (a vector). Meaning is spread across all the dimensions together; a single dimension rarely maps to one human-readable concept.
"king" → [0.21, -0.45, 0.89, ...]
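Real embeddings range from a few dozen to a few thousand dimensions: 50 for GloVe 50d, 384 for all-MiniLM-L6-v2, and 768–3072 for many larger models. This demo uses a simplified hash-based projection into 2D so you can see the geometry.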
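To make "hash-based projection into 2D" concrete, here is one possible sketch. The demo's actual algorithm is not shown in this text, so the function `hash_embed_2d` and its choice of SHA-256 are assumptions for illustration. It shows only the two properties named here, a fixed-length output and deterministic results. Hashing the whole string this way does not place related words near each other.

```python
import hashlib

def hash_embed_2d(text):
    """Map text to a deterministic point in [-1, 1] x [-1, 1].

    Hypothetical helper: not the demo's real projection.
    """
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    # Use the first 4 bytes for x and the next 4 bytes for y,
    # each scaled from [0, 2^32 - 1] to [0, 1].
    x = int.from_bytes(digest[0:4], "big") / 0xFFFFFFFF
    y = int.from_bytes(digest[4:8], "big") / 0xFFFFFFFF
    # Shift and stretch to [-1, 1] so points spread around the origin.
    return [2 * x - 1, 2 * y - 1]

point = hash_embed_2d("king")
print(point)
```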
king - man + woman ≈ queen
Embedding spaces encode relationships as directions. The “gender” direction (man → woman) can be added to “king” to reach “queen.” The analogy only works in the modes that use real pre-trained vectors (GloVe or Transformers.js), not with the hash-based projection.
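The arithmetic can be sketched with toy vectors. Everything below is an assumption made for illustration: the two-dimensional vectors are hand-built so that axis 0 acts as "royalty" and axis 1 as "gender", and `analogy` is a hypothetical helper. Real pre-trained vectors are learned, have many more dimensions, and match the analogy only approximately.

```python
import math

# Hand-built toy vectors: [royalty, gender]. Not real embeddings.
vectors = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "prince": [0.8, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vocab):
    """Return the word nearest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    # Skip the three input words, which otherwise tend to win the search.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vocab[w]))

result = analogy("king", "man", "woman", vectors)
print(result)
```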
Two vectors are “similar” if they point in roughly the same direction. Cosine similarity measures this: 1.0 = identical direction, 0 = orthogonal, -1.0 = opposite.
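Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch, with small hand-picked vectors chosen to hit the three reference values (the function name `cosine_similarity` is just for this example):

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, different length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # reversed direction

print(same, orthogonal, opposite)
```

Up to floating-point rounding, the three results are 1, 0, and -1. Because the lengths are divided out, only direction matters: doubling a vector does not change its similarity to anything.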
Hash-based: Fast, deterministic, approximate. Good for seeing how the math works.
GloVe 50d: Real pre-trained word vectors from Stanford. Best for word analogies. Requires a one-time setup (generate-glove-json.js).
Transformers.js: Real neural embeddings (all-MiniLM-L6-v2, 384d) running in your browser via WebAssembly. Best for sentence-level similarity.