FE-611 | Add Vector Index Feature #21793

Open
wants to merge 16 commits into
base: devel

@bluepal-nadeem-abdun commented Jun 2, 2025

Scope & Purpose

  • ✨ Feature

Adds support for the new experimental vector index to the collection Indexes view in the core DB UI. The index appears as "Vector index (beta)" in the dropdown, and is only shown when the server is started with the --experimental-vector-index flag.

Checklist

  • Tests
    • Manually tested
  • 📖 CHANGELOG entry made

Related Information

cmyk47 previously requested changes Jun 2, 2025
@cmyk47 dismissed their stale review June 6, 2025 07:18

fixed

label: "Index Factory",
name: "params.factory",
type: "text",
tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number in IVF must match nLists (e.g. IVF100 → nLists = 100).`

Suggested change
tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number in IVF must match nLists (e.g. IVF100 → nLists = 100).`
tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number following "IVF" must match nLists (e.g. IVF100 → nLists = 100).`

Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
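
For illustration, the two rules stated in the tooltip could be checked on the client roughly like this. This is only a sketch: `checkFactory` is a hypothetical helper that is not part of this PR, and treating a missing number after "IVF" as a mismatch is an assumption, not confirmed server behavior.

```typescript
// Hypothetical helper (not from the PR): validates the two rules named in
// the "Index Factory" tooltip. Returns an error message, or null if both pass.
function checkFactory(factory: string, nLists: number): string | null {
  // Rule 1: the factory string must start with "IVF".
  if (!factory.startsWith("IVF")) {
    return 'The factory must start with "IVF".';
  }
  // Rule 2: the number directly following "IVF" must equal nLists.
  // Assumption: a missing number is treated as a mismatch.
  const match = /^IVF(\d+)/.exec(factory);
  if (match === null || Number(match[1]) !== nLists) {
    return `The number following "IVF" must match nLists (${nLists}).`;
  }
  return null;
}
```

With the tooltip's own example, `checkFactory("IVF100_HNSW10,Flat", 100)` returns `null`, while `checkFactory("IVF50,Flat", 100)` returns the mismatch message.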

label: "Default Number of Probes",
name: "params.defaultNProbe",
type: "number",
tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. Default is 1."

Suggested change
tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. Default is 1."
tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. The default is 1."

Resolved the suggested change. Please have a look and let me know if further adjustments are needed.

name: "params.nLists",
type: "number",
isRequired: true,
tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: sqrt(N) / 15, where N is the number of documents."

As per the FAISS paper, it is 15-20x the square root of the number of documents. The resulting number can be higher than allowed (if you have <223 docs) but I think this is already described adequately.

Suggested change
tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: sqrt(N) / 15, where N is the number of documents."
tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: 15 * sqrt(N), where N is the number of documents."

Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
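
As a rough sketch of the corrected heuristic, the suggested value could be computed as 15 * sqrt(N) and then capped at N, since the tooltip says nLists must not exceed the number of documents. `suggestNLists` is a hypothetical name, and the rounding and capping are assumptions made for illustration, not something the UI or server is known to do.

```typescript
// Hypothetical helper (not from the PR): suggests an nLists value using the
// 15 * sqrt(N) rule of thumb from the review, capped at the document count.
function suggestNLists(documentCount: number): number {
  if (!Number.isInteger(documentCount) || documentCount < 1) {
    throw new RangeError("documentCount must be a positive integer");
  }
  // Rule of thumb from the review comment: 15 times the square root of N.
  const heuristic = Math.round(15 * Math.sqrt(documentCount));
  // Constraint from the tooltip: nLists must not exceed the number of documents.
  return Math.min(heuristic, documentCount);
}
```

For one million documents this gives 15000. For small collections the cap applies, for example 100 documents yield 100 rather than 150.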

label: "Training Iterations",
name: "params.trainingIterations",
type: "number",
tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. Default is 25."

Suggested change
tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. Default is 25."
tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. The default is 25."

Resolved the suggested change. Please have a look and let me know if further adjustments are needed.

@Simran-B left a comment

The vector index supports the following options (not persisted / returned by the server), similar to other index types:

  • inBackground (see below)
  • parallelism: The number of threads to use for indexing. The default is 2.

The only other index with a parallelism option seems to be the inverted index, but we don't display it in the UI. I'm not sure whether this is a conscious decision. I believe we found that 2 is a bit faster than 1 and that any larger number performed worse than 2, so there is little point in changing it for the inverted index. It could be very different for the vector index, though.
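
To make the defaults mentioned in this review concrete, here is a minimal sketch of how form values could be filled in before submission. The `VectorIndexFormValues` shape and `applyDefaults` are hypothetical and only cover the options named in this thread. Any other fields a real vector index definition needs are deliberately left out, and no default is assumed for `inBackground` because none is stated here.

```typescript
// Hypothetical form shape (not from the PR); limited to options named in this thread.
interface VectorIndexFormValues {
  inBackground?: boolean;
  parallelism?: number;
  params: {
    nLists: number;
    defaultNProbe?: number;
    trainingIterations?: number;
    factory?: string;
  };
}

// Fills in the defaults quoted in the tooltips and the comment above:
// parallelism 2, defaultNProbe 1, trainingIterations 25.
function applyDefaults(values: VectorIndexFormValues) {
  return {
    ...values,
    parallelism: values.parallelism ?? 2,
    params: {
      ...values.params,
      defaultNProbe: values.params.defaultNProbe ?? 1,
      trainingIterations: values.params.trainingIterations ?? 25,
    },
  };
}
```

Values the user sets explicitly are kept as-is, so `applyDefaults({ parallelism: 4, params: { nLists: 100 } })` keeps a parallelism of 4 and only fills in the two missing `params` defaults.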

I noticed that e.g. for the persistent index, we don't have a particularly useful tooltip for Create in background:

image

My suggestion is to change it everywhere to something like this:

Enable this option to keep the collection/shards available for write operations by not using an exclusive write lock for the duration of the index creation.
