fix(core): Prevent worker from recovering finished executions #16094

ivov · 2025-06-06T12:39:08Z

Summary

Crash recovery has plenty of known issues but I found a fundamental one.

Consider this scenario:

Main enqueues a job for execution 1 whose workflow has an error-handling workflow
Worker A picks up job, runs execution 1, which throws, finishing with error status
Worker A enqueues job for execution 2, for the error-handling workflow
Worker B picks up job, runs execution 2 successfully

In this scenario, we end up with inconsistent event logs:

n8nEventLog.log (main)
- n8n.workflow.started for execution 1
- n8n.workflow.failed for execution 1

n8nEventLog-worker.log (worker A)
- n8n.node.{started|finished} events for execution 1

n8nEventLog-worker.log (worker B)
- n8n.workflow.started for execution 2
- n8n.node.{started|finished} events for execution 2
- n8n.workflow.success for execution 2

This is because of our lifecycle hooks setup:

Main uses getLifecycleHooksForScalingMain which includes hookFunctionsWorkflowEvents
Worker uses getLifecycleHooksForScalingWorker which excludes hookFunctionsWorkflowEvents
Subexecutions use getLifecycleHooksForSubExecutions which includes hookFunctionsWorkflowEvents

This event log inconsistency impacts crash recovery, which assumes all events for an execution are in the event log for a given instance. This means crash recovery wrongly identifies execution 1 as unfinished (instead of finished with an error) leading it to wrongly amend details for a finished execution (instead of skipping it), which will also lead to another crash recovery on the next restart, as the details are incomplete.

In short, this means every worker tries to recover its own finished failed executions (instead of only unfinished crashed ones), which the worker does not even have the data for, and so this cycle repeats on restart. The root solution would be to straighten out our hooks setup and have event logs for workers remain with workers, but lifecycle hooks are so central and so legacy that this would be a big dedicated effort.

Hence for now, this fix filters out finished executions downstream and defers crash recovery logging until after we know which were the actually recovered execution IDs.

Related Linear tickets, Github issues, and Community forum posts

n/a

Review / Merge checklist

PR title and summary are descriptive. (conventions)
Docs updated or follow-up ticket created.
Tests included.
PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

cubic-dev-ai

cubic found 1 issue across 3 files. Review it in cubic.dev

_{React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.}

packages/cli/src/eventbus/message-event-bus/message-event-bus.ts

codecov · 2025-06-06T12:44:53Z

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...rc/eventbus/message-event-bus/message-event-bus.ts	0.00%	6 Missing ⚠️

📢 Thoughts on this report? Let us know!

fix(core): Prevent worker from recovering finished executions

355cfa8

cubic-dev-ai bot reviewed Jun 6, 2025

View reviewed changes

packages/cli/src/eventbus/message-event-bus/message-event-bus.ts Show resolved Hide resolved

ivov requested a review from despairblue June 6, 2025 12:54

n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Jun 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(core): Prevent worker from recovering finished executions #16094

fix(core): Prevent worker from recovering finished executions #16094

ivov commented Jun 6, 2025 •

edited

Loading

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

codecov bot commented Jun 6, 2025

Uh oh!

Uh oh!

fix(core): Prevent worker from recovering finished executions #16094

Are you sure you want to change the base?

fix(core): Prevent worker from recovering finished executions #16094

Conversation

ivov commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Linear tickets, Github issues, and Community forum posts

Review / Merge checklist

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Jun 6, 2025

Codecov Report

Uh oh!

Uh oh!

ivov commented Jun 6, 2025 •

edited

Loading