Data Science Dojo & Weaviate

Intro to

Vector Search

Victoria Slocum

Machine Learning Engineer

search systems

run the world

Search systems are everywhere, 

from Google, to e-commerce sites, to internal documents

Traditional search

(keyword search)

Matches exact terms between the query and the documents

Struggles with:

  • Capturing meaning
  • Conversational language
  • Typos and synonyms

 

basic vector search

kNN (k nearest neighbors)

Calculates the exact distance between the query and the document vectors,

for every document

To find the most similar documents:

  1. Store document embeddings
  2. Convert query to vector embedding
  3. Calculate a similarity score for every document by measuring the distance between its embedding and the query embedding
  4. Return the documents with the highest scores

Implementing basic vector search

in Python
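
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the talk: the 3-dimensional vectors are made up by hand (a real system would get them from an embedding model), and `cosine_similarity` and `knn_search` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(query_vec, doc_vecs, k=2):
    """Score every document against the query and return the top k."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]

# 1. Store document embeddings (toy vectors, invented for illustration).
doc_vecs = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

# 2. Convert the query to a vector (here we pretend a model already did).
query_vec = [0.85, 0.15, 0.05]

# 3 and 4. Compare against every document and keep the best matches.
results = knn_search(query_vec, doc_vecs, k=2)
print(results)
```

Note that `knn_search` touches every stored vector on every query, which is exactly the cost that grows with the size of the collection.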

basic vector search is just* math

the weird ML part is all in the vector embeddings

So...

why do you even need a vector database?

ANN

kNN

ANN algorithms

Trade exact results for large speed improvements

Examples of ANN methods:

  • trees - e.g. ANNOY
  • proximity graphs - e.g. HNSW
  • clustering - e.g. the inverted-file (IVF) indexes in FAISS
  • hashing - e.g. LSH
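
To make the trade-off concrete, here is a minimal sketch of the hashing idea, using random-hyperplane LSH. It is an illustration under assumptions, not how any particular library implements it: the function names, the toy 2-d vectors, and the choice of four hyperplanes are all made up for the example.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes through the origin (one normal vector each)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]

def lsh_key(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        sum(v * h for v, h in zip(vec, plane)) >= 0 for plane in hyperplanes
    )

def build_index(doc_vecs, hyperplanes):
    """Group document ids into buckets keyed by their bit pattern."""
    buckets = {}
    for doc_id, vec in doc_vecs.items():
        buckets.setdefault(lsh_key(vec, hyperplanes), []).append(doc_id)
    return buckets

def candidates(query_vec, buckets, hyperplanes):
    """Only documents in the query's bucket are compared further."""
    return buckets.get(lsh_key(query_vec, hyperplanes), [])

# Toy data, invented for illustration.
doc_vecs = {"a": [1.0, 0.2], "b": [0.9, 0.3], "c": [-1.0, 0.1]}
planes = make_hyperplanes(num_planes=4, dim=2)
buckets = build_index(doc_vecs, planes)
print(candidates([1.0, 0.25], buckets, planes))
```

Vectors pointing in similar directions tend to share a bucket, so a query only needs exact distance calculations against that bucket. The price is that a true nearest neighbor can land in a different bucket and be missed, which is the "approximate" in ANN.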

approximate nearest neighbor

HNSW

(Hierarchical Navigable Small Worlds)

Construct:

  • All vectors are on the bottom layer
  • Subsets of vectors on upper layers
  • Lets the search "hop" to the right region quickly

Organizes vectors into a hierarchical, multi-layered graph structure, which allows for fast navigation through the dataset during search operations
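
The layered "hopping" can be illustrated with a toy sketch. This is a deliberately simplified stand-in, not real HNSW and not Weaviate's implementation: it builds each layer by brute force and picks upper-layer nodes by uniform sampling, whereas actual HNSW inserts vectors incrementally, draws each vector's top layer from an exponential distribution, and searches with a candidate list rather than a single current node. All names and parameters (`build_layers`, `greedy_search`, `m`, the one-quarter sampling ratio) are assumptions made for the example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_layers(vectors, num_layers=3, m=4, seed=0):
    """Layer 0 holds every vector; each higher layer is a subset of the one below."""
    rng = random.Random(seed)
    ids = list(range(len(vectors)))
    layers = []
    for level in range(num_layers):
        if level > 0:
            # Keep about a quarter of the layer below, never fewer than one node.
            ids = rng.sample(ids, max(1, len(ids) // 4))
        graph = {}
        for i in ids:
            # Link each node to its m nearest neighbors within this layer.
            others = sorted((j for j in ids if j != i),
                            key=lambda j: dist(vectors[i], vectors[j]))
            graph[i] = others[:m]
        layers.append(graph)
    return layers  # layers[0] is the bottom layer

def greedy_search(query, vectors, layers):
    """Start on the sparse top layer and hop to closer nodes on the way down."""
    current = next(iter(layers[-1]))  # arbitrary entry point on the top layer
    for graph in reversed(layers):
        improved = True
        while improved:
            improved = False
            for neighbor in graph[current]:
                if dist(query, vectors[neighbor]) < dist(query, vectors[current]):
                    current, improved = neighbor, True
    return current

rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(200)]
layers = build_layers(vectors)
query = [0.5, 0.5]
found = greedy_search(query, vectors, layers)
print(found, dist(query, vectors[found]))
```

The sparse upper layers cover long distances in a few hops, and the dense bottom layer refines the result. Because the search is greedy, the node it returns is close to the query but not guaranteed to be the exact nearest neighbor.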

HNSW

(Hierarchical Navigable Small Worlds)

In Weaviate:

  • Custom implementation
  • CRUD operations
  • Fast, scalable, and accurate
  • Tuneable settings

HNSW in Weaviate

HNSWlib

  • open-source library
  • lacks full database operations and functionality

Weaviate

  • open-source database
  • supports full CRUD operations and a modular architecture

Vector search isn't limited to just text

multimodal vector search

multilingual vector search

Hybrid search

what does "at scale" mean?

The difference between...

So when we say we do billion-scale vector search, it's kinda a big deal
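
A quick back-of-the-envelope calculation shows why. The numbers below are assumptions chosen for illustration (768-dimensional embeddings stored as 32-bit floats), not figures from the talk.

```python
num_vectors = 1_000_000_000   # "billion-scale"
dims = 768                    # assumed embedding size
bytes_per_value = 4           # float32

# Raw storage for the vectors alone, before any index overhead.
raw_bytes = num_vectors * dims * bytes_per_value
raw_terabytes = raw_bytes / 1e12

# Brute-force kNN compares the query against every stored vector.
comparisons_per_query = num_vectors

print(f"{raw_terabytes:.3f} TB of raw vectors")
print(f"{comparisons_per_query:,} distance calculations per brute-force query")
```

Under these assumptions the vectors alone take about 3 TB, and an exact search would need a billion distance calculations per query.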

Weaviate is built for AI (at scale)

demo time

next webinar!

with Data Science Dojo

connect with me on LinkedIn!