Closed
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version:2.5-20250604-fdfb78b9-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2025/06/05 01:56:48.627 +00:00] [INFO] [task/executor.go:120] ["execute the action of task"] [taskID=1749086769340] [collectionID=458512562322749924] [replicaID=-1] [step=0] [source=segment_checker]
[2025/06/05 01:56:48.627 +00:00] [WARN] [task/executor.go:293] ["no shard leader for the segment to execute releasing"] [taskID=1749086769340] [collectionID=458512562322749924] [replicaID=-1] [segmentID=458512562322554259] [node=4] [source=segment_checker] [error="shard delegator not found: channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]"] [errorVerbose="shard delegator not found: channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/v2/util/merr.warpChannelErr\n | \t/workspace/source/pkg/util/merr/utils.go:708\n | github.com/milvus-io/milvus/pkg/v2/util/merr.WrapErrChannelNotFound\n | \t/workspace/source/pkg/util/merr/utils.go:714\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment\n | \t/workspace/source/internal/querycoordv2/task/executor.go:292\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction\n | \t/workspace/source/internal/querycoordv2/task/executor.go:163\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).Execute.func1\n | \t/workspace/source/internal/querycoordv2/task/executor.go:123\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1700\nWraps: (2) shard delegator not found\nWraps: (3) channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: (nil)
BACKTRACE:
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: (nil)
BACKTRACE:
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment
/workspace/source/internal/querycoordv2/task/executor.go:297 pc=0x5180d8e
[2025/06/05 01:56:53.364 +00:00] [INFO] [task/executor.go:120] ["execute the action of task"] [taskID=1749086769343] [collectionID=458512562322749924] [replicaID=-1] [step=0] [source=segment_checker]
[2025/06/05 01:56:53.365 +00:00] [WARN] [task/executor.go:293] ["no shard leader for the segment to execute releasing"] [taskID=1749086769343] [collectionID=458512562322749924] [replicaID=-1] [segmentID=458512562322554768] [node=4] [source=segment_checker] [error="shard delegator not found: channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]"] [errorVerbose="shard delegator not found: channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/v2/util/merr.warpChannelErr\n | \t/workspace/source/pkg/util/merr/utils.go:708\n | github.com/milvus-io/milvus/pkg/v2/util/merr.WrapErrChannelNotFound\n | \t/workspace/source/pkg/util/merr/utils.go:714\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment\n | \t/workspace/source/internal/querycoordv2/task/executor.go:292\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction\n | \t/workspace/source/internal/querycoordv2/task/executor.go:163\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).Execute.func1\n | \t/workspace/source/internal/querycoordv2/task/executor.go:123\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1700\nWraps: (2) shard delegator not found\nWraps: (3) channel not found[channel=by-dev-rootcoord-dml_6_458512562322749924v1]\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: (nil)
BACKTRACE:
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment
/workspace/source/internal/querycoordv2/task/executor.go:297 pc=0x5180d8e
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5180d8e]
Expected Behavior
No response
Steps To Reproduce
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/19431/pipeline
log:
artifacts-mixcoord-pod-failure-19431-server-logs.tar.gz
Anything else?
This issue also reproduces when no pods are killed, for example in the etcd-follower chaos test.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/19440/pipeline
log:
artifacts-etcd-followers-pod-failure-19440-server-logs.tar.gz
[2025/06/05 03:28:37.382 +00:00] [INFO] [task/scheduler.go:407] ["task added"] [task="[id=1749094110901] [type=Move] [source=balance_checker] [reason=channel unbalanced] [collectionID=458514015146293816] [replicaID=458514526585225217] [resourceGroup=__default_resource_group] [priority=High] [actionsCount=2] [actions={[type=Grow][node=6][shard=by-dev-rootcoord-dml_13_458514015146293816v0]},{[type=Reduce][node=5][shard=by-dev-rootcoord-dml_13_458514015146293816v0]},] [channel=by-dev-rootcoord-dml_13_458514015146293816v0]"]
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment
/workspace/source/internal/querycoordv2/task/executor.go:297 pc=0x5180d8e
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5180d8e]
goroutine 3556 gp=0xc001a828c0 m=9 mp=0xc000189008 [running]:
panic({0x5a78b60?, 0x9163150?})
/usr/local/go/src/runtime/panic.go:811 +0x168 fp=0xc001d07bd0 sp=0xc001d07b20 pc=0x1f5ea08
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:925 +0x359 fp=0xc001d07c30 sp=0xc001d07bd0 pc=0x1f61d79
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment(0xc00735c000, 0xc002b19820, 0x0)
/workspace/source/internal/querycoordv2/task/executor.go:297 +0xaee fp=0xc001d07f58 sp=0xc001d07c30 pc=0x5180d8e
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction(0xc00735c000, 0xc002b19820, 0x0)
/workspace/source/internal/querycoordv2/task/executor.go:163 +0x8f fp=0xc001d07f80 sp=0xc001d07f58 pc=0x517eeef
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).Execute.func1()
/workspace/source/internal/querycoordv2/task/executor.go:123 +0x105 fp=0xc001d07fe0 sp=0xc001d07f80 pc=0x517e9e5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc001d07fe8 sp=0xc001d07fe0 pc=0x1f68501
created by github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).Execute in goroutine 3555
/workspace/source/internal/querycoordv2/task/executor.go:119 +0x517
cluster:4am
ns:chaos-testing
pods
+ kubectl get pods -o wide
+ grep etcd-followers-pod-failure-19440
etcd-followers-pod-failure-19440-0 1/1 Running 0 42m 10.104.24.111 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-1 1/1 Running 2 (24m ago) 42m 10.104.19.71 4am-node28 <none> <none>
etcd-followers-pod-failure-19440-2 1/1 Running 0 42m 10.104.23.196 4am-node27 <none> <none>
etcd-followers-pod-failure-19440-milvus-datanode-894d99d76mdqvz 1/1 Running 2 (41m ago) 42m 10.104.33.154 4am-node36 <none> <none>
etcd-followers-pod-failure-19440-milvus-datanode-894d99d76mptbc 1/1 Running 2 (41m ago) 42m 10.104.26.170 4am-node32 <none> <none>
etcd-followers-pod-failure-19440-milvus-indexnode-684c4c86bc4lj 1/1 Running 2 (41m ago) 42m 10.104.17.17 4am-node23 <none> <none>
etcd-followers-pod-failure-19440-milvus-indexnode-684c4c86tlhf7 1/1 Running 2 (41m ago) 42m 10.104.33.155 4am-node36 <none> <none>
etcd-followers-pod-failure-19440-milvus-indexnode-684c4c86xqkq8 1/1 Running 2 (41m ago) 42m 10.104.32.202 4am-node39 <none> <none>
etcd-followers-pod-failure-19440-milvus-mixcoord-595fbc5887fk56 1/1 Running 4 (8m43s ago) 42m 10.104.33.153 4am-node36 <none> <none>
etcd-followers-pod-failure-19440-milvus-proxy-84fc884889-shpzw 1/1 Running 2 (41m ago) 42m 10.104.17.16 4am-node23 <none> <none>
etcd-followers-pod-failure-19440-milvus-querynode-f87f584d2lwzn 1/1 Running 2 (41m ago) 42m 10.104.26.171 4am-node32 <none> <none>
etcd-followers-pod-failure-19440-milvus-querynode-f87f584dmsgbs 1/1 Running 2 (41m ago) 42m 10.104.32.203 4am-node39 <none> <none>
etcd-followers-pod-failure-19440-milvus-querynode-f87f584dtqqkk 1/1 Running 2 (41m ago) 42m 10.104.33.156 4am-node36 <none> <none>
etcd-followers-pod-failure-19440-minio-0 1/1 Running 0 42m 10.104.19.68 4am-node28 <none> <none>
etcd-followers-pod-failure-19440-minio-1 1/1 Running 0 42m 10.104.24.113 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-minio-2 1/1 Running 0 42m 10.104.23.197 4am-node27 <none> <none>
etcd-followers-pod-failure-19440-minio-3 1/1 Running 0 42m 10.104.15.82 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-bookie-0 1/1 Running 0 42m 10.104.15.76 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-bookie-1 1/1 Running 0 42m 10.104.19.75 4am-node28 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-bookie-2 1/1 Running 0 42m 10.104.24.119 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-bookie-init-72vm2 0/1 Completed 0 42m 10.104.15.67 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-broker-0 1/1 Running 0 42m 10.104.15.71 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-broker-1 1/1 Running 0 42m 10.104.24.108 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-proxy-0 1/1 Running 0 42m 10.104.15.68 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-proxy-1 1/1 Running 0 42m 10.104.19.63 4am-node28 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-pulsar-init-xzn45 0/1 Completed 0 42m 10.104.24.106 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-recovery-0 1/1 Running 0 42m 10.104.9.49 4am-node14 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-zookeeper-0 1/1 Running 0 42m 10.104.15.73 4am-node20 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-zookeeper-1 1/1 Running 0 42m 10.104.24.112 4am-node29 <none> <none>
etcd-followers-pod-failure-19440-pulsarv3-zookeeper-2 1/1 Running 0 42m 10.104.19.72 4am-node28 <none> <none>