Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.5.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): go-milvus 2.5.3
- OS(Ubuntu or CentOS): Ubuntu
Current Behavior
We have a use case (schema modifications) where we need to query all the data in a collection and insert it into a new one. The collection currently holds ~1.2M rows. Our approach is to run an empty Query in a loop, using Offset and Limit to fetch 10k-row batches (a sketch of the loop follows the error trace below). This works well in standalone mode. However, in cluster mode the tenth iteration fails with the following error:
/workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/v2/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/querynode/client/client.go:106 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/querynode/client/client.go:231 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).Query
/workspace/source/internal/proxy/task_query.go:612 github.com/milvus-io/milvus/internal/proxy.(*queryTask).queryShard
/workspace/source/internal/proxy/lb_policy.go:210 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1
/workspace/source/pkg/util/retry/retry.go:44 github.com/milvus-io/milvus/pkg/v2/util/retry.Do
/workspace/source/internal/proxy/lb_policy.go:179 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry
/workspace/source/internal/proxy/lb_policy.go:246 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func1: rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (548853608 vs. 536870912)
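For context, the copy loop is essentially the following. This is a minimal sketch written against the milvus-sdk-go/v2 API (our actual code uses the 2.5.3 Go client, and the address, collection name, and output fields here are placeholders), but the Offset/Limit paging pattern is the same:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/milvus-io/milvus-sdk-go/v2/client"
)

func main() {
	ctx := context.Background()

	// Placeholder address; adjust for your deployment.
	c, err := client.NewClient(ctx, client.Config{Address: "localhost:19530"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	const (
		collection = "old_collection" // placeholder collection name
		batchSize  = 10000            // 10k-row batches, as described above
	)

	// Page through the whole collection with an empty filter expression,
	// advancing Offset by the batch size on each iteration.
	for offset := int64(0); ; offset += batchSize {
		rs, err := c.Query(ctx, collection, nil, "", []string{"*"},
			client.WithOffset(offset),
			client.WithLimit(batchSize),
		)
		if err != nil {
			log.Fatalf("query failed at offset %d: %v", offset, err)
		}
		if len(rs) == 0 || rs[0].Len() == 0 {
			break // no more rows to copy
		}
		fmt.Printf("fetched %d rows at offset %d\n", rs[0].Len(), offset)
		// ...insert this batch into the new collection here...
	}
}
```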
After doing some code sleuthing I tried raising the gRPC limits and narrowed it down to queryNode.grpc.{clientMaxRecvSize,serverMaxSendSize}. When I doubled those (setting them to 1 GiB = 1073741824 bytes), the number of records I was able to retrieve also doubled, to 200k. This leads me to believe the query node is executing the query, retrieving the full offset+limit rows, and returning all of them to the proxy instead of trimming off the offset first.
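For reference, the override I applied was roughly the following (milvus.yaml-style keys following the queryNode.grpc.* paths named above; values are bytes, and 1073741824 = 1 GiB, double the 512 MiB limit implied by the error message):

```yaml
queryNode:
  grpc:
    serverMaxSendSize: 1073741824   # 1 GiB (error message suggests a 536870912 = 512 MiB default)
    clientMaxRecvSize: 1073741824   # 1 GiB
```

With this in place the failure simply moved from the 100k mark to the 200k mark, which is what points to the offset not being trimmed on the query node side.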
Expected Behavior
I should be able to page through any number of records using offset+limit, as long as each individual limit (batch size) is reasonable.
Steps To Reproduce
Milvus Log
No response
Anything else?
No response