FE-611 | Add Vector Index Feature #21793
base: devel
Resolved review threads (collapsed) on:
...ps/system/_admin/aardvark/APP/react/src/views/collections/indices/useSupportedIndexTypes.tsx
.../system/_admin/aardvark/APP/react/src/views/collections/indices/CollectionIndicesContext.tsx
...pps/system/_admin/aardvark/APP/react/src/views/collections/indices/addIndex/AddIndexForm.tsx
...ardvark/APP/react/src/views/collections/indices/addIndex/vectorIndex/useCreateVectorIndex.ts
label: "Index Factory",
name: "params.factory",
type: "text",
tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number in IVF must match nLists (e.g. IVF100 → nLists = 100).`
Suggested change:
- tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number in IVF must match nLists (e.g. IVF100 → nLists = 100).`
+ tooltip: `Defines the FAISS index factory. Must start with "IVF". Example: IVF100_HNSW10,Flat. The number following "IVF" must match nLists (e.g. IVF100 → nLists = 100).`
Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
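The two rules in this tooltip (the factory string starts with "IVF", and the number following "IVF" equals nLists) could also be checked in the form before submitting. A minimal sketch, assuming the number always directly follows "IVF"; `checkFactoryAgainstNLists` is a hypothetical helper and not part of this PR:

```typescript
// Hypothetical helper (not from the PR): checks the two rules stated in the
// "Index Factory" tooltip. Returns an error message, or null if both rules hold.
function checkFactoryAgainstNLists(
  factory: string,
  nLists: number
): string | null {
  // Assumption: the list count is the run of digits right after "IVF".
  const match = /^IVF(\d+)/.exec(factory.trim());
  if (match === null) {
    return 'Factory must start with "IVF" followed by a number.';
  }
  const ivfCount = Number(match[1]);
  if (ivfCount !== nLists) {
    return `IVF${ivfCount} does not match nLists = ${nLists}.`;
  }
  return null; // both rules are satisfied
}
```

With the tooltip's own example, `checkFactoryAgainstNLists("IVF100_HNSW10,Flat", 100)` returns `null`, while passing `nLists = 50` yields a mismatch message.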
label: "Default Number of Probes",
name: "params.defaultNProbe",
type: "number",
tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. Default is 1."
Suggested change:
- tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. Default is 1."
+ tooltip: "The number of inverted lists (clusters) to search during queries by default. Increasing this value improves recall at the cost of speed. The default is 1."
Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
name: "params.nLists",
type: "number",
isRequired: true,
tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: sqrt(N) / 15, where N is the number of documents."
As per the FAISS paper, it is 15-20x the square root of the number of documents. The resulting number can be higher than allowed (if you have <223 docs) but I think this is already described adequately.
Suggested change:
- tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: sqrt(N) / 15, where N is the number of documents."
+ tooltip: "The number of Voronoi cells (nLists) to partition the vector space into. A higher value improves recall but increases indexing time. The value must not exceed the number of documents. Suggested: 15 * sqrt(N), where N is the number of documents."
Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
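To make the arithmetic in this thread concrete, here is a small sketch of the corrected suggestion, 15 * sqrt(N), capped at the number of documents. `suggestNLists` is a hypothetical helper, and rounding down is an assumption; with that rounding, the cap only kicks in for fewer than 223 documents, which matches the comment above:

```typescript
// Hypothetical helper (not from the PR): suggests nLists as factor * sqrt(N),
// rounded down (an assumption) and capped so it never exceeds the document count.
function suggestNLists(documentCount: number, factor: number = 15): number {
  if (!Number.isInteger(documentCount) || documentCount < 1) {
    throw new RangeError("documentCount must be a positive integer");
  }
  const suggested = Math.floor(factor * Math.sqrt(documentCount));
  // The tooltip requires nLists <= number of documents.
  return Math.max(1, Math.min(suggested, documentCount));
}
```

For example, `suggestNLists(10000)` gives 1500, whereas `suggestNLists(222)` is capped at 222 because the uncapped value would be 223.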
label: "Training Iterations",
name: "params.trainingIterations",
type: "number",
tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. Default is 25."
Suggested change:
- tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. Default is 25."
+ tooltip: "The number of iterations to use during index training. More iterations improve cluster quality and accuracy, but increase training time. The default is 25."
Resolved the suggested change. Please have a look and let me know if further adjustments are needed.
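The defaults quoted in the tooltips above (1 probe, 25 training iterations) can be illustrated with a short sketch that fills in omitted optional fields. Whether the UI or the server applies these defaults is not stated in this PR; `VectorIndexTuning` and `withTooltipDefaults` are hypothetical names used only for illustration:

```typescript
// Hypothetical shape for the two optional tuning fields discussed above.
interface VectorIndexTuning {
  defaultNProbe?: number;
  trainingIterations?: number;
}

// Default values as quoted in the tooltips under review.
const TOOLTIP_DEFAULTS = { defaultNProbe: 1, trainingIterations: 25 };

// Returns a copy with every omitted field replaced by its tooltip default.
function withTooltipDefaults(
  input: VectorIndexTuning
): Required<VectorIndexTuning> {
  return {
    defaultNProbe: input.defaultNProbe ?? TOOLTIP_DEFAULTS.defaultNProbe,
    trainingIterations:
      input.trainingIterations ?? TOOLTIP_DEFAULTS.trainingIterations,
  };
}
```

Using `??` rather than `||` keeps an explicit value such as `0` intact instead of silently replacing it with the default.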
The vector index supports the following options (not persisted / returned by the server), similar to other index types:
- inBackground (see below)
- parallelism: The number of threads to use for indexing. The default is 2.
The only other index with a parallelism option seems to be the inverted index, but we don't display it in the UI. I'm not sure if this is a conscious decision or not. I believe we found that 2 is a bit faster than 1 but that any larger number performed worse than 2, so there is little point in changing it for the inverted index. It could be very different for the vector index, though.
I noticed that e.g. for the persistent index, we don't have a particularly useful tooltip for Create in background:
My suggestion is to change it everywhere to something like this:
Enable this option to keep the collection/shards available for write operations by not using an exclusive write lock for the duration of the index creation.
Scope & Purpose
Adds support for the new experimental vector index to the collection Indexes view in the core DB UI. The index appears as "Vector index (beta)" in the dropdown, and is only shown when the server is started with the --experimental-vector-index flag.
Checklist
Related Information
Note: This backend PR addresses a bug where the vector index appears even without the --experimental-vector-index flag. This PR can proceed with a mention of the backend PR as an upstream dependency.
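The gating described under Scope & Purpose can be pictured as a simple conditional on the dropdown options. This is only a sketch under assumptions: how the UI actually learns whether the flag is set is not shown here, and `IndexTypeOption`, `listIndexTypeOptions`, and the `"vector"` value are hypothetical; only the label "Vector index (beta)" comes from the description.

```typescript
// Hypothetical option shape for the index type dropdown.
interface IndexTypeOption {
  value: string;
  label: string;
}

// Appends the vector index entry only when the server reports that it was
// started with --experimental-vector-index (passed in here as a boolean).
function listIndexTypeOptions(
  baseOptions: IndexTypeOption[],
  vectorIndexEnabled: boolean
): IndexTypeOption[] {
  if (!vectorIndexEnabled) {
    return baseOptions;
  }
  return [...baseOptions, { value: "vector", label: "Vector index (beta)" }];
}
```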