
YouTube

The Next Era of Semantic Search: Auto Embedding in Vector Search

MongoDB published 2026-02-03 added 2026-04-15 score 5/10
mongodb vector-search embeddings semantic-search databases ai-infrastructure

ELI5/TLDR

MongoDB now generates vector embeddings for you automatically. Instead of wiring up your own pipeline to convert text into numbers that capture meaning, you define a search index and MongoDB handles the rest — embedding your documents, keeping everything in sync, and letting you query with plain English. They bundled this with their new Voyage 4 embedding models, which they claim beat OpenAI and Google on benchmarks.

The Full Story

The Problem Everyone Was Quietly Hating

Semantic search is the kind where you type “fairy tale where a monster falls in love in a swamp” and get Shrek back. It requires turning text into vectors. Think of vectors as coordinates in a high-dimensional space where similar meanings sit close together. The old workflow needed two pipelines. An indexing pipeline converts all your documents into vectors. A query pipeline converts each search into a vector and finds the nearest matches.
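Here is a minimal Python sketch of the two pipelines. The `embed` function is a hypothetical stand-in, not a real embedding model: it counts words from a tiny fixed vocabulary. Documents are ranked by cosine similarity, which is a common choice for comparing embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an embedding model: word counts over a fixed
# vocabulary. Real models capture meaning, not just shared words.
VOCAB = ["monster", "swamp", "love", "ogre", "fairy", "tale",
         "space", "ship", "robot", "ocean"]

def embed(text: str) -> list[float]:
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Indexing pipeline: embed every document once and keep the vectors.
docs = {
    "Shrek": "An ogre in a swamp falls in love in a fairy tale",
    "WALL-E": "A robot on a space ship falls in love",
    "Finding Nemo": "A fish searches the ocean for his son",
}
index = {title: embed(plot) for title, plot in docs.items()}

# Query pipeline: embed the query, then rank documents by similarity.
def search(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda title: cosine(q, index[title]), reverse=True)
    return ranked[:k]

print(search("fairy tale where a monster falls in love in a swamp"))
```

This toy only counts shared words, so it cannot tell that “monster” and “ogre” are related. Closing that gap is the job of a trained embedding model.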

MongoDB already had vector search built into the database. But users still had to manage the embedding step themselves. That meant calling an external model API every time a document was inserted, handling retries and authentication, keeping embeddings in sync when documents changed, and building task queues and monitoring when you had millions of records. One slow embedding call could block your entire database write path.
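Below is a rough sketch of the do-it-yourself write path. It assumes two hypothetical pieces: an `embed_fn` client that raises `TransientError` on retryable failures, and a plain dict standing in for the collection. The point is structural. The insert cannot finish until the embedding call succeeds, so every retry and every backoff delay sits on the write path.

```python
import time

class TransientError(Exception):
    """Hypothetical retryable failure from an external embedding API."""

def insert_with_embedding(doc, embed_fn, store, field="plot",
                          max_retries=3, base_delay=0.5):
    """Embed a document synchronously, then write it (hypothetical helper)."""
    attempt = 0
    while True:
        try:
            vector = embed_fn(doc[field])  # the write waits on this network call
            break
        except TransientError:
            if attempt == max_retries:
                raise  # give up: the document is never written
            time.sleep(base_delay * 2 ** attempt)  # backoff also blocks the write
            attempt += 1
    store[doc["_id"]] = {**doc, "embedding": vector}
    return attempt + 1  # embedding API calls this single insert needed
```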

Auto Embedding: Let the Database Do It

The fix is straightforward. MongoDB introduced a new index type called “auto embed.” You define it like any other search index — specify the model, the modality (text, for now), and the document field you want to search on. That is the entire setup. No orchestration code, no external pipeline.
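The three settings can be sketched as an index definition built in Python. The key names (`"type": "autoEmbed"`, `modality`, `model`, `path`) and the model identifier `voyage-4` are assumptions based on the talk's description, not a confirmed schema. Check MongoDB's documentation before relying on them.

```python
def auto_embed_index_definition(path: str, model: str, modality: str = "text") -> dict:
    """Build a search-index definition for an auto-embedded field.

    ASSUMPTION: the key names mirror the three settings named in the talk
    (model, modality, field path); the exact MongoDB schema may differ.
    """
    if modality != "text":
        # The preview described in the talk supports text only.
        raise ValueError("auto embedding currently supports text only")
    return {
        "fields": [
            {"type": "autoEmbed", "modality": modality, "model": model, "path": path}
        ]
    }

definition = auto_embed_index_definition(path="plot", model="voyage-4")
```

With PyMongo, a definition like this would typically be wrapped in a `SearchIndexModel` and passed to `Collection.create_search_index`. Whether auto-embed indexes use the `vectorSearch` index type is also an assumption here.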

On the indexing side, one mongot instance is designated as the leader (mongot is MongoDB’s search daemon). The leader scans the collection, batches documents, calls the Voyage embedding API, and writes the resulting vectors back to a reserved namespace in the database. Your normal writes are never blocked. Embedding happens asynchronously, so the embeddings are eventually consistent with the collection. When documents change, their embeddings are updated automatically.
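The write/embed split can be modeled with a small simulator. This is not MongoDB code. Two dicts stand in for storage: `collection` for the collection and `vectors` for the reserved namespace. `embed_batch` is a hypothetical batch embedding client, and `run_embedding_pass` plays the role of the leader mongot.

```python
class AutoEmbedSimulator:
    """Toy model of asynchronous, batched embedding (illustration only)."""

    def __init__(self, embed_batch, batch_size=2):
        self.embed_batch = embed_batch  # hypothetical: list[str] -> list[vector]
        self.batch_size = batch_size
        self.collection = {}            # stand-in for the MongoDB collection
        self.vectors = {}               # stand-in for the reserved namespace
        self.pending = {}               # insertion-ordered set of ids to embed

    def write(self, doc_id, text):
        # Returns immediately: the document is stored and queued, never embedded here.
        self.collection[doc_id] = text
        self.pending[doc_id] = None

    def run_embedding_pass(self):
        # The "leader" drains the queue in batches; returns the number of API calls.
        ids = list(self.pending)
        self.pending.clear()
        api_calls = 0
        for start in range(0, len(ids), self.batch_size):
            batch = ids[start:start + self.batch_size]
            vectors = self.embed_batch([self.collection[i] for i in batch])
            self.vectors.update(zip(batch, vectors))
            api_calls += 1
        return api_calls
```

In this toy, an updated document keeps its old vector until the next pass runs. That gap is the eventual-consistency window in miniature.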

On the query side, you pass plain text to MongoDB’s $vectorSearch aggregation stage. The system embeds your query, runs a nearest-neighbor search against a Lucene index under the hood, and returns the matching documents. You never touch a vector directly.
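A query could be expressed as an aggregation pipeline like the one below. The `index`, `path`, `numCandidates`, and `limit` fields and the `vectorSearchScore` metadata are standard `$vectorSearch` features. Two parts are assumptions for illustration: the `query: {"text": ...}` key that carries plain text in place of `queryVector`, and the index name.

```python
def text_vector_search(query_text, index="plot_auto_embed", path="plot",
                       limit=10, num_candidates=100):
    """Aggregation pipeline for a plain-text query against an auto-embed index.

    ASSUMPTION: plain text goes under a "query" key instead of "queryVector";
    the default index name is hypothetical.
    """
    if num_candidates < limit:
        # $vectorSearch requires numCandidates to be at least limit.
        raise ValueError("numCandidates must be >= limit")
    return [
        {"$vectorSearch": {
            "index": index,
            "path": path,
            "query": {"text": query_text},
            "numCandidates": num_candidates,
            "limit": limit,
        }},
        # Surface the similarity score next to each returned document.
        {"$project": {"title": 1, "plot": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With PyMongo you would run it as `list(collection.aggregate(text_vector_search("monster in a swamp")))`.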

“The user just sees a plain text query and gets a list of source documents. This whole notion of calling an embedding model and vector indexes is just hidden from the user.”

The Voyage Models

MongoDB acquired a company called Voyage AI, which makes embedding models. The new Voyage 4 lineup ships with auto embedding on day zero:

  • Voyage 4 Large: the flagship. It uses a mixture-of-experts architecture, which MongoDB describes as a first for production embedding models. MongoDB reports that it outperforms OpenAI’s text-embedding-3-large by 14%, Cohere Embed v4 by 8%, and Google’s Gemini Embedding 001 by 4% on the MTEB benchmark.
  • Voyage 4: balanced accuracy and latency.
  • Voyage 4 Lite: optimized for throughput and speed.
  • Voyage Code 3: specialized for code search.

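The interesting trick: you can use one model at indexing time and a different one at query time. This works because the Voyage 4 models are designed to share an embedding space, so a query vector from one model can be compared with document vectors from another. In the demo, switching the query model from Voyage 4 to Voyage 4 Large on the same “monster in a swamp” query moved Shrek from sixth place to first. Model quality matters, and now you can A/B test it without re-indexing.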
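One cheap way to run that A/B comparison is mean reciprocal rank (MRR) over a handful of queries with known right answers. MRR is our choice of metric, not something from the talk. The only data point taken from the demo is Shrek moving from rank 6 to rank 1; the other result ids in the test are placeholders.

```python
def reciprocal_rank(results, relevant):
    """1/rank of the relevant result (ranks start at 1), or 0.0 if it is absent."""
    for rank, doc_id in enumerate(results, start=1):
        if doc_id == relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run, judgments):
    """Average reciprocal rank over all judged queries.

    `run` maps each query to its ranked result ids (from one model);
    `judgments` maps each query to the id a user expects to see.
    """
    total = sum(reciprocal_rank(run[q], judgments[q]) for q in judgments)
    return total / len(judgments)
```

For the demo query alone, that move lifts the reciprocal rank from 1/6 (about 0.17) to 1.0.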

The Architecture Under the Hood

The system runs two processes: mongod (the database) and mongot (the search daemon). In a replica set, one mongot is designated as the leader responsible for calling the embedding API — this deduplicates API calls across the cluster. Vectors are persisted in mongod itself, so they survive restarts. Other mongot nodes pick up the vectors from the database and build their own local Lucene indexes.

For queries, any mongot can handle the request — it extracts the plain text, calls the embedding API once, searches its local Lucene index, and returns document IDs.
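The deduplication argument is easiest to see by counting API calls. The toy replica set below uses a hypothetical `Cluster` class and a hypothetical `embed` client, with a dict standing in for vectors persisted in mongod. Only the leader embeds documents. Every node builds its local index from the stored vectors. Each query costs exactly one embedding call on whichever node serves it.

```python
class Cluster:
    """Toy replica set: one leader mongot embeds documents, every mongot indexes."""

    def __init__(self, node_names, leader, embed):
        if leader not in node_names:
            raise ValueError("leader must be one of the nodes")
        self.leader = leader
        self.embed = embed                                  # hypothetical API client
        self.stored_vectors = {}                            # stand-in for mongod
        self.local_indexes = {n: {} for n in node_names}    # each mongot's index
        self.api_calls = {n: 0 for n in node_names}         # calls made per node

    def _call_api(self, node, text):
        self.api_calls[node] += 1
        return self.embed(text)

    def index_documents(self, docs):
        # Only the leader calls the embedding API, once per document.
        for doc_id, text in docs.items():
            self.stored_vectors[doc_id] = self._call_api(self.leader, text)
        # Every mongot, leader included, builds its local index from mongod.
        for local in self.local_indexes.values():
            local.update(self.stored_vectors)

    def query(self, node, text):
        # One embedding call on the serving node, then a local search
        # (highest dot product here, standing in for Lucene vector search).
        q = self._call_api(node, text)
        local = self.local_indexes[node]
        return max(local, key=lambda d: sum(a * b for a, b in zip(q, local[d])))
```

Without a leader, each of the three nodes in the test would embed both documents: six calls instead of two.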

Practical Details

Auto embedding launched in public preview on MongoDB Community (self-managed). You need three things: mongod, mongot (now open source), and a Voyage API key (available through MongoDB Atlas or voyageai.com). It has day-zero integrations with MongoDB’s MCP server, LangChain, LangGraph, and the language drivers for Java, Go, and others.

Current limitations are few. It is text-only for now, though multimodal support is coming. There is also no built-in query caching, but you can implement semantic caching yourself, for example via LangChain. On the plus side, the Voyage models support 80+ languages, with cross-lingual search out of the box.
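A semantic cache can be sketched in a few lines of plain Python rather than LangChain. It assumes a hypothetical `embed` client, a hypothetical `search` function, and a cosine `threshold` that you would tune on real traffic. A query whose embedding sits close to an earlier query reuses that earlier query's results.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Reuse results for near-duplicate queries (illustrative sketch)."""

    def __init__(self, embed, search, threshold=0.95):
        self.embed = embed          # hypothetical: text -> vector
        self.search = search        # hypothetical: text -> results
        self.threshold = threshold  # arbitrary cosine cutoff; tune it
        self.entries = []           # list of (query_vector, results)
        self.hits = 0

    def lookup(self, query):
        vec = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            self.hits += 1
            return best[1]          # cache hit: skip the search entirely
        results = self.search(query)
        self.entries.append((vec, results))
        return results
```

Note the trade-off. The cache needs a client-side embedding of every query, so a hit saves the search but not that embedding call.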

Key Takeaways

  • Auto embedding eliminates the embedding pipeline. Define an index, write plain text to MongoDB, query with plain text. No vectors in your application code.
  • Writes are never blocked by embedding generation. The system is eventually consistent — your inserts go through immediately, embeddings catch up asynchronously.
  • You can swap models at query time without re-indexing. Index with Voyage 4 Large and query with Voyage 4 Lite for speed, or the other way around. This is a genuinely useful capability for iterating on search quality.
  • Voyage 4 Large uses mixture-of-experts, which MongoDB calls a first for production embedding models. MongoDB claims a 14% improvement over OpenAI’s text-embedding-3-large on MTEB.
  • Vectors are persisted in mongod, not just in the search process. This gives durability across restarts and consistency across the nodes of a replica set.
  • One leader mongot makes all document-embedding API calls, which deduplicates network requests across the cluster. Query text is embedded by whichever mongot serves the query.
  • Embedding models are poor at structured data like dates. Use MQL pre-filters for numeric or date constraints alongside semantic search.
  • MongoDB Views let you create composite search fields. Combine title + plot (or any fields) into a single searchable unit.
  • Cross-lingual search works by default. Index in English, query in French. The models cover 80+ languages.
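The pre-filter and view takeaways can be sketched as pipeline builders. Several pieces are standard MongoDB features: the `filter` option of `$vectorSearch`, MQL operators such as `$gte`, and the `$addFields`, `$concat`, and `$ifNull` expressions. Other pieces are assumptions: the plain-text `query` key, the index name, and the `year` and `title_plot` field names. The filtered field must also be indexed as a filter field.

```python
def filtered_text_search(query_text, min_year, index="movies_auto_embed",
                         path="title_plot", limit=10):
    """Semantic query plus an exact MQL pre-filter on a numeric field.

    ASSUMPTIONS: the "query" text key and the index/field names are
    hypothetical; "filter" itself is a standard $vectorSearch option.
    """
    return [{"$vectorSearch": {
        "index": index,
        "path": path,
        "query": {"text": query_text},
        "filter": {"year": {"$gte": min_year}},  # exact constraint, not left to embeddings
        "numCandidates": limit * 10,
        "limit": limit,
    }}]

def composite_field_view_pipeline(fields, target="title_plot", sep=". "):
    """View pipeline that joins several string fields into one searchable field.

    $ifNull guards against $concat returning null when a field is missing.
    """
    parts = []
    for i, name in enumerate(fields):
        if i:
            parts.append(sep)
        parts.append({"$ifNull": ["$" + name, ""]})
    return [{"$addFields": {target: {"$concat": parts}}}]
```

With PyMongo, the view itself could be created with `db.create_collection("movies_search", viewOn="movies", pipeline=composite_field_view_pipeline(["title", "plot"]))` before defining the search index on it.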

Claude’s Take

This is a well-executed product announcement dressed up as a technical talk. The core insight is real and useful: embedding pipelines are tedious infrastructure that most teams would rather not maintain. Pushing that complexity into the database layer is the obvious move, and MongoDB is arguably late to it — but they are doing it with their own models baked in, which is a differentiator.

The Voyage 4 benchmark numbers sound impressive, but benchmark claims from model providers should always be taken with a grain of salt. The MTEB leaderboard is real and respected. Still, a headline like “14% over OpenAI” depends heavily on which tasks are averaged, and on whether the gain is relative or in absolute points. The demo showing Shrek jumping from sixth to first when switching models was a nice concrete illustration, though it is cherry-picked by definition.

What is genuinely interesting is the ability to swap query-time models without re-indexing. That is not something most vector search setups offer, and it makes experimentation much cheaper. The architecture — persisting vectors in mongod for durability, using a leader election pattern for API deduplication — is sensible engineering, not groundbreaking but solid.

The score is a 5. It is a product launch talk. Useful if you are already in the MongoDB ecosystem and want to know how auto embedding works, but there is no deep technical insight or novel thinking here. The information density is moderate, padded with the usual conference-talk pacing and a demo that could have been a two-minute clip.

Further Reading