Consistent hash exchange bindings are changed and break when a node restarts #3386

luos · 2021-09-08T12:54:57Z

Hi,

When a single node is restarted out of a cluster the consistent hash exchange will add existing bindings again into the ring.

Reproduction:

Create 3 node cluster
Create a consistent hash exchange
Bind a queue to it with any weight (for example 1)
Stop a node (with stop_app or anything)

Output:

root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>}},
                14}]
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'^C
root@rmq1:/# rabbitmqctl start_app^C
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>}},
                16}]
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>},
                  16 => {resource,<<"/">>,queue,<<"test-qq">>},
                  17 => {resource,<<"/">>,queue,<<"test-qq">>}},
                18}]
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>},
                  16 => {resource,<<"/">>,queue,<<"test-qq">>},
                  17 => {resource,<<"/">>,queue,<<"test-qq">>},
                  18 => {resource,<<"/">>,queue,<<"test-qq">>},
                  19 => {resource,<<"/">>,queue,<<"test-qq">>}},
                20}]

After this cleaning up the bindings and recreating them breaks the exchange:

Remove all bindings from the queue / exchange
Add a binding to a queue
Try to publish messages

rmq1.monitored.host | 2021-09-08 12:52:49.746842+00:00 [warn] <0.32348.1> Bucket 6 not found
rmq1.monitored.host | 2021-09-08 12:52:53.163576+00:00 [warn] <0.32375.1> Bucket 6 not found

The text was updated successfully, but these errors were encountered:

michaelklishin · 2021-09-08T13:06:57Z

I only see two options:

Make hash ring additions idempotent
Assume that if the table can be sync, we can use its existing data (this may require using disk nodes, which I don't know how to make upgradeable)

Binding recovery will happen on node boot regardless of what we do. Besides bindings themselves, exchange state is a really rare edge case that's not really handled in RabbitMQ.

Ideas and PRs would be appreciated, our team doesn't have the capacity to redesign this plugin any time soon.

luos · 2021-09-08T14:29:58Z

I think it would be fine to not apply the binding if we already have weight number of entries - though it makes it harder that you can bind the queue multiple times.

Maybe as a quick fix just resolving the counter(?) which goes wrong with the buckets would be fine so at least there is a queue where the message goes even if it's not consistent.

michaelklishin · 2021-09-08T15:24:32Z

We can ignore the fact that you can bind a queue multiple times as that's clearly something the binding recovery process and such a stateful exchange type cannot handle (we don't know what other bindings are going to be recovered down the line).

For this exchange type, multiple bindings likely is a result of a mistake. We can assume that if you have enough ring entries ("weight") for the recovered binding, we can do nothing.

FalconerTC · 2021-09-23T17:28:47Z

Very interested in a resolution to this issue! I'm seeing the same issue, where node restarts cause the exchange to become unusable, though I'm not seeing the binding count increase.

FalconerTC · 2021-11-16T20:51:24Z

Turns out I didn't really resolve this in #3594... When bindings are copied over when a new node comes up, they aren't sorted anymore so remove_binding doesn't properly adjust

An example after node removal:

I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
 {chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
                #{0 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  1 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  2 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  3 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  4 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  5 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  6 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-jkw6b">>}},
                7},

Which means the ring breaks when one of those bindings is removed

I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
 {chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
                #{3 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-jkw6b">>}},
                4},

This also can happen by doing the following

Create binding1 with queue1
Create binding2 with queue2
Create binding3 with queue1 (with a different weight)
Delete binding1

I imagine this can be fixed with some sorting in remove_binding but as per

For this exchange type, multiple bindings likely is a result of a mistake

I wonder if it would be better to just reject new bindings with the same name? Any suggestion @michaelklishin ?

luos · 2021-11-18T15:54:29Z

I think that maybe rebinding should just ensure that there are as many buckets as the new binding and drop the old binding. What do you think?

I'd say rejecting a binding may be unexpected for some applications, though most probably for 99% of users it does not matter (and now it's broken anyway).

de1ayer · 2022-02-08T11:42:16Z

Hi all! I've experienced same issue last week ("Bucket not found" in log, no routing out of exchange) and wondering around for fix. #3594 is merged to 3.8.24 (we have 3.8.3 versioned cluster at the moment) so I may update to get partly fix.

What about adding new nodes which is the main way to scale cluster? Is it unsafe for 'x-consistent-hash' exchanges ?

caoheyang · 2022-02-16T08:43:00Z

Hi all! Is there have any updates?

Heyang

Regards

luos · 2022-02-16T09:06:24Z

Hi,

From my side, I'd prefer if the plugin rejects rebinding, that would simplify it, but it definitely needs some more thinking and implementation. :)
I think node restarts then deleting the binding still break it. I have not tested but possibility it's OK when you do not delete bindings, keeping in mind, restarting will probably reshuffle the exchange.

luos · 2022-02-23T12:56:55Z

Hi,

I made some tests and here are my findings:

Restarting a node does not really break the exchange, however it adds more bindings as previously. This means that the messages will be reshuffled.
After a restart, deleting a binding will leave the exchange in an inconsistent state, losing messages.

I tested this on 3.9.13.

nunoguerreirorosa · 2022-03-01T16:50:58Z

Consistent hash exchange is the only option as far as I am aware in RabbitMQ to be able to maintain the order of dependent messages while scaling up the number of consumers. This plugin was also added in Amazon MQ but apparently with this particular bug it ends up rendering this plugin unusable. I would say it is quite a critical bug for anyone thinking on using this particular plugin, since a node restart within a cluster is quite a standard operation. Is there any way this can be prioritized in the roadmap @michaelklishin ?

stgolem · 2022-04-20T04:56:07Z

@FalconerTC is there any chance on fixing this issue? Or are there any known workarounds for this aside from full rebinding? We're stuck with this issue in our application also.

michaelklishin · 2022-04-20T07:09:55Z

@nunoguerreirorosa this is open source software, so you if want something "prioritized", dive in and provide a set of steps to reasonably reliably reproduce, then feel free to look into the root cause. Pressing others to "prioritize" something in a piece of software you get for free is not how open source software evolves.

Consistent hashing exchange, unlike every other "built-in" (or tier 1) exchange type, is stateful. Booting nodes would recover ring state while applications can already begin modifying the same state by binding or unbinding. Besides delaying client connection listener startup
until after all plugins are enabled and some time passes, there is no way to avoid this that
would not result in refused client operations.

@luos ring state recovery does not involve creating any bindings. The plugin uses a list of
durable bindings to recompute the (transient) state and then adapts it as bindings are
added or removed.

@stgolem we don't understand what the issue is or how to reproduce it. You are welcome to
spend some time on that.

michaelklishin · 2022-04-20T07:32:20Z

I now see some investigations above 👍

Making binding addition idempotent sounds like the right thing to do (removal, of course, already is).

This is a behavior change so we either have to try to squeeze it into 3.10 or anger the SemVer lovers and ship it in a patch release (even though it would count as a bug fix to many).

michaelklishin · 2022-04-20T08:31:22Z

Other team members suggest that we cannot delay 3.10 over this, so it would have to go into a patch release at some point.

luos · 2022-04-20T08:41:04Z

I do not think that's a problem. It is a bug and now it is quite broken, I'd say it is not recommended to use in a clustered environment, so if it gets fixed at some point, it will be a bug fix.

stgolem · 2022-04-20T12:34:29Z

Thanks for your time. We have exactly same issue as others with this plugin in cluster environment and node restarts. We will gently wait for this issue to resolve in some next release. As a user i classify this as a bug.

SteamUpdate · 2022-04-20T13:25:03Z

Greetings! I have a case how to reproduce "bucket not found" error.

RMQ Version: 3.9.15
Erlang Version: 24.3.3

RMQ runs as a cluster of three nodes. One exchange and four queues has been created. 1 & 3 queues are located on node 2, 2 & 4 - on node 1. Four bindings with routing key "1" has been created also.

Check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test3'
Ring index: 3, queue: 'test4'

Send a message with routing key 101 in the exchange. The message has been routed to queue 'test1'.

Next, stop node 1 with rabbitmqctl stop_app command and check the ring status:

rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'

Next, start node 1 with rabbitmqctl start_app command and check the ring status:

rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'
Ring index: 2, queue: 'test1'
Ring index: 3, queue: 'test3'
Ring index: 4, queue: 'test2'
Ring index: 5, queue: 'test4'

Next, stop node 2 and check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'

Next, start node 2 and check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'
Ring index: 3, queue: 'test2'
Ring index: 4, queue: 'test4'
Ring index: 5, queue: 'test1'
Ring index: 6, queue: 'test3'

Finally, send a message with routing key 101 in the exchange.

Result: the message published, but not routed. Logs: 2022-04-20 13:20:04.078869+00:00 [warn] <0.24750.1> Bucket 0 not found

cc: @luos @michaelklishin

tronyx · 2022-06-08T22:57:30Z

Is this planned to be fixed in both 3.9.X and 3.10.X? Are there any recommendations on a quicker way to resolve this after container recreation other than simply removing and recreating the exchange?

michaelklishin · 2022-06-16T20:06:42Z

I have been looking at how we can make binding addition and deletion idempotent for this exchange type. I see only one simple way: to keep track of what (destination, type, routing key) have been added and if they are known, make the function do nothing.

This has one limitation. Code that repeatedly binds the same exchange and queue with different routing keys/weights is no longer going to work:

# in pseudocode
bind(chx, q1, weight: 1)
bind(chx, q1, weight: 3)

will lead to a single binding with "weight = 1". I find this acceptable but there may be some code that's going to break as a result, or at least the routing distribution will be affected.
Quite frankly, I don't think the weights idea is worth the complexity anyway, and this may
be our first step towards getting rid of it entirely for 3.11 or 4.0.

Trying to make hash ring updates idempotent is going to be too complex for me to agree to.

HoloRin · 2022-06-17T10:50:39Z

If it is the case that the effect of weights is indeterminate in the face of node restarts, then adopting a first wins approach does not seem unreasonable to me.

michaelklishin · 2022-06-20T11:42:16Z

The hash ring state record changes would require a schema database migration, which seems like an overkill here. Will try to make the function idempotent using the bindings/routes tables instead which are very stable.

michaelklishin · 2022-06-29T06:47:01Z

First attempt at relying on binding data to make hash ring state management idempotent failed.
Because RabbitMQ will create bindings anyway, it would only create confusion.

A longer term solution would be to make this the first plugin that adopts Khepri, our Raft-based next generation schema data store.

In the short term I'll look into using the state of the ring itself and/or the good old distributed locking module in Erlang.

First binding wins. Duplicate bindings, i.e. bindings with the same source exchange and same destination queue / exchange but possibly different routing key (weight) are ignored from now on by the consistent hash exchange. This applies only to bindings being added. For bindings being deleted, any duplicate binding (independent of its routing key) will delete all buckets for the given source and destination. (This is to ensure that buckets for a given source and destination can be deleted for upgrades prior to this change. This was also the behaviour prior to this commit, so nothing changes in that regard.) Note that duplicate bindings continue to be created in RabbitMQ. (They are only ignored by the consistent hash exchange.) Adding a binding will perform linear search in the bucket map. This is already stated in the README: "These two operations use linear algorithms to update the ring." The linear search when adding a binding could be optimised by adding another Mnesia table field which will require a new migration and feature flag. Hence, such an optimization is left out in this commit. Fixes #3386.

First binding wins. Duplicate bindings, i.e. bindings with the same source exchange and same destination queue / exchange but possibly different routing key (weight) are ignored from now on by the consistent hash exchange. This applies only to bindings being added. For bindings being deleted, any duplicate binding (independent of its routing key) will delete all buckets for the given source and destination. (This is to ensure that buckets for a given source and destination can be deleted for when upgrading from a version prior to this commit. This was also the behaviour prior to this commit, so nothing changes in that regard.) Note that duplicate bindings continue to be created in RabbitMQ. (They are only ignored by the consistent hash exchange.) Adding a binding will perform linear search in the bucket map. This is already stated in the README: "These two operations use linear algorithms to update the ring." The linear search when adding a binding could be optimised by adding another Mnesia table field which will require a new migration and feature flag. Hence, such an optimization is left out in this commit. Fixes #3386.

ansd · 2022-06-30T10:09:29Z

Thanks @SteamUpdate for the excellent reproduction steps in #3386 (comment).

#5121 makes adding bindings idempotent and will therefore fix this issue.

As @michaelklishin already mentioned this comes with a slight breaking change for applications that previously relied upon adding duplicate bindings (i.e. bindings with the same source exchange and same destination queue but possibly different routing key).

First binding wins. Duplicate bindings, i.e. bindings with the same source exchange and same destination queue / exchange but possibly different routing key (weight) are ignored from now on by the consistent hash exchange. This applies only to bindings being added. For bindings being deleted, any duplicate binding (independent of its routing key) will delete all buckets for the given source and destination. (This is to ensure that buckets for a given source and destination can be deleted for when upgrading from a version prior to this commit. This was also the behaviour prior to this commit, so nothing changes in that regard.) Note that duplicate bindings continue to be created in RabbitMQ. (They are only ignored by the consistent hash exchange.) Adding a binding will perform linear search in the bucket map. This is already stated in the README: "These two operations use linear algorithms to update the ring." The linear search when adding a binding could be optimised by adding another Mnesia table field which will require a new migration and feature flag. Hence, such an optimization is left out in this commit. Fixes #3386. (cherry picked from commit 878f369)

First binding wins. Duplicate bindings, i.e. bindings with the same source exchange and same destination queue / exchange but possibly different routing key (weight) are ignored from now on by the consistent hash exchange. This applies only to bindings being added. For bindings being deleted, any duplicate binding (independent of its routing key) will delete all buckets for the given source and destination. (This is to ensure that buckets for a given source and destination can be deleted for when upgrading from a version prior to this commit. This was also the behaviour prior to this commit, so nothing changes in that regard.) Note that duplicate bindings continue to be created in RabbitMQ. (They are only ignored by the consistent hash exchange.) Adding a binding will perform linear search in the bucket map. This is already stated in the README: "These two operations use linear algorithms to update the ring." The linear search when adding a binding could be optimised by adding another Mnesia table field which will require a new migration and feature flag. Hence, such an optimization is left out in this commit. Fixes #3386. (cherry picked from commit 878f369) (cherry picked from commit ab4f8a9)

FalconerTC mentioned this issue Oct 19, 2021

Correct consistent hashing exchange binding deletion when there are duplicate bindings #3594

Merged

12 tasks

michaelklishin added this to the 3.10.1 milestone Apr 21, 2022

michaelklishin removed this from the 3.10.1 milestone May 9, 2022

michaelklishin self-assigned this Jun 15, 2022

michaelklishin mentioned this issue Jun 30, 2022

Consistent hashing exchange: make addition of bindings idempotent w.r.t. ring state #5121

Merged

michaelklishin closed this as completed in #5121 Jun 30, 2022

michaelklishin added this to the 3.10.6 milestone Jun 30, 2022

Nov	DEC	Jan
	02
2023	2024	2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistent hash exchange bindings are changed and break when a node restarts #3386

Consistent hash exchange bindings are changed and break when a node restarts #3386

luos commented Sep 8, 2021

michaelklishin commented Sep 8, 2021

luos commented Sep 8, 2021

michaelklishin commented Sep 8, 2021

FalconerTC commented Sep 23, 2021 •

edited

Loading

FalconerTC commented Nov 16, 2021 •

edited

Loading

luos commented Nov 18, 2021

de1ayer commented Feb 8, 2022

caoheyang commented Feb 16, 2022 •

edited

Loading

luos commented Feb 16, 2022

luos commented Feb 23, 2022

nunoguerreirorosa commented Mar 1, 2022 •

edited

Loading

stgolem commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

luos commented Apr 20, 2022

stgolem commented Apr 20, 2022

SteamUpdate commented Apr 20, 2022

tronyx commented Jun 8, 2022

michaelklishin commented Jun 16, 2022 •

edited

Loading

HoloRin commented Jun 17, 2022

michaelklishin commented Jun 20, 2022

michaelklishin commented Jun 29, 2022

ansd commented Jun 30, 2022

Consistent hash exchange bindings are changed and break when a node restarts #3386

Consistent hash exchange bindings are changed and break when a node restarts #3386

Comments

luos commented Sep 8, 2021

michaelklishin commented Sep 8, 2021

luos commented Sep 8, 2021

michaelklishin commented Sep 8, 2021

FalconerTC commented Sep 23, 2021 • edited Loading

FalconerTC commented Nov 16, 2021 • edited Loading

luos commented Nov 18, 2021

de1ayer commented Feb 8, 2022

caoheyang commented Feb 16, 2022 • edited Loading

luos commented Feb 16, 2022

luos commented Feb 23, 2022

nunoguerreirorosa commented Mar 1, 2022 • edited Loading

stgolem commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

michaelklishin commented Apr 20, 2022

luos commented Apr 20, 2022

stgolem commented Apr 20, 2022

SteamUpdate commented Apr 20, 2022

tronyx commented Jun 8, 2022

michaelklishin commented Jun 16, 2022 • edited Loading

HoloRin commented Jun 17, 2022

michaelklishin commented Jun 20, 2022

michaelklishin commented Jun 29, 2022

ansd commented Jun 30, 2022

FalconerTC commented Sep 23, 2021 •

edited

Loading

FalconerTC commented Nov 16, 2021 •

edited

Loading

caoheyang commented Feb 16, 2022 •

edited

Loading

nunoguerreirorosa commented Mar 1, 2022 •

edited

Loading

michaelklishin commented Jun 16, 2022 •

edited

Loading