Building a Search Engine
Most production information retrieval systems are built on top of Lucene, which ranks results with BM25 by default.
Current state-of-the-art techniques use embeddings for retrieval. This workshop covers common information retrieval concepts, what companies have used in the past, and how newer systems use embeddings.
Outline:
- Non-deep-learning retrieval
- Embeddings and Vector Similarity Overview
- Serving Vector Similarity using Approximate Nearest Neighbors (ANN)
By the end of the session, participants will be able to build a production information retrieval system that uses embeddings and vector similarity, served with ANN. This will let them apply state-of-the-art techniques on top of traditional information retrieval systems.
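As a rough illustration of the two retrieval styles the workshop contrasts, here is a minimal Python sketch. It is not Lucene's implementation and not the workshop's material. The function names (`bm25_scores`, `cosine`, `top_k`), the toy inputs, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration. The vector search is exact brute force, the baseline that ANN indexes approximate to trade a little accuracy for speed.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query.

    Uses one common BM25 variant; assumes docs is a non-empty list
    containing at least one non-empty token list.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query_vec, vectors, k=2):
    """Exact nearest neighbors: indices of the k most similar vectors."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query_vec, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    docs = [["lucene", "search"], ["vector", "search", "search"], ["cats"]]
    print(bm25_scores(["lucene"], docs))
    # Hypothetical 2-d "embeddings"; real ones come from a trained model.
    print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2))
```

The brute-force `top_k` compares the query against every stored vector, so its cost grows linearly with the collection. That linear scan is the cost an ANN index is meant to avoid at production scale.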