Skip to content

[Impeller] when decoding image on iOS, block on command buffer scheduling and log gpu errors. #169378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 13, 2025

Conversation

jonahwilliams
Copy link
Member

@jonahwilliams jonahwilliams commented May 23, 2025

Attempted fix for #166668 . Plan is to ask for testing once rolled into g3 since we weren't given a repro.

Explaination: we can only submit command buffers to the metal command queue when the device is in the foreground. push notifications temporarily wake up the app and can trigger images to begin decoding. This decoding can fail, possibly due to the app re-backgrounding. In theory, the sync switch should prevent this from happening - but because we don't actually block on the scheduling of the command buffer there is technically a race that can happen.

Add the ability to block submission on scheduling for metal. note that other graphics APIs do not have a distinction for commited vs scheduled and so it no-ops for them.

See https://developer.apple.com/documentation/metal/preparing-your-metal-app-to-run-in-the-background

When UIKit calls your app delegate’s applicationDidEnterBackground(_:) method, make sure Metal has scheduled all command buffers you’ve already committed before your app returns control to the system. ... Finish encoding commands to render the frame and commit the command buffer, then call waitUntilScheduled().

@github-actions github-actions bot added engine flutter/engine repository. See also e: labels. e: impeller Impeller rendering backend issues and features requests labels May 23, 2025
@jonahwilliams jonahwilliams marked this pull request as ready for review May 23, 2025 18:53
@jonahwilliams jonahwilliams requested a review from gaaclarke May 23, 2025 20:15
<< "GPU Error submitting image decoding command buffer.";
}
},
/*block_on_schedule=*/true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be inefficient, needlessly synchronizing the cpu and the gpu in most cases.

I think the better way to implement this would be for ImageDecoderImpeller to maintain a receipt from submitting the command buffer. It will add a fml::SyncSwitch::Observer to the fml::SyncSwitch, when that notifies the ImageDecoderImpeller that we are losing access to the GPU it uses the receipt to get access to the buffer and call waitUntilScheduled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is not what block_on_scheduled does. in fact, scheduling has nothing to do with gpu usage at all (that would be waitUntilCompleted). All it blocks on is the queue acknowledging the buffer, which is not synchronous unlike GLES/Vulkan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an asynchronous operation happening which waitUntilScheduled is causing the calling thread to block on. The approach I'm proposing will only block the thread in the case of applicationDidEnterBackground: which is my understanding of what Apple is recommending.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but blocking a background worker for a ms or two as part of an operation (image decoding) that regularly takes 100s of ms hardly seems like a problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. I'm not quite sure why we would prefer it though when we know of an alternative. Plus the alternative doesn't make everyone that submits a command buffer learn what that flag means when it's only relevant on one platform, in one usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus we're dealing with a bit of a black box with the driver. I'd say it's preferred we follow Apple's recommendation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and I'll add that we don't know if it fixes it for real or not. If we want to engineer a better solution if it works, I'm down for that. But if we end up back at square one because it doesn't fix it I plan to revert this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline chat: Calling waitForScheduled when we don't have access to the gpu will cause deadlock, but this is happening only inside of a syncswitch block to guarantee we have the GPU.

<< "GPU Error submitting image decoding command buffer.";
}
},
/*block_on_schedule=*/true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only going to be an issue for the decoder? Seems like command buffers submitted on the raster thread could run into the same problem too, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never gotten a report of that. Perhaps if it did happen it isn't noticable that the encoding fails since by the time the app is actually foregrounded another frame will have rendered. Whereas if the upload fails for image decoding that failure sticks around

Copy link
Member

@gaaclarke gaaclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm fine going ahead with this to test out once we've cleaned it up. Add documentation and file an tech debt issue for the extra thread blocking we could remove.

@@ -107,7 +107,8 @@ class CommandBuffer {

virtual std::shared_ptr<BlitPass> OnCreateBlitPass() = 0;

[[nodiscard]] virtual bool OnSubmitCommands(CompletionCallback callback) = 0;
[[nodiscard]] virtual bool OnSubmitCommands(bool block_on_schedule,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring for block_on_schedule. (just link to SubmitCommands)

@@ -127,9 +128,10 @@ class CommandBuffer {
///
/// @param[in] callback The completion callback.
///
[[nodiscard]] bool SubmitCommands(const CompletionCallback& callback);
[[nodiscard]] bool SubmitCommands(bool block_on_schedule,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring to block_on_schedule.

@jonahwilliams jonahwilliams marked this pull request as ready for review June 12, 2025 21:20
@jonahwilliams
Copy link
Member Author

Re-opening this PR as the other fix didn't fix it

@jonahwilliams jonahwilliams requested a review from gaaclarke June 12, 2025 21:20
@gaaclarke
Copy link
Member

Re-opening this PR as the other fix didn't fix it

Sounds good. I was/am fine with trying this out after the above feedback is addressed.

@jonahwilliams
Copy link
Member Author

I added the doc comment

Copy link
Member

@gaaclarke gaaclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I added the issue: #170599

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Jun 13, 2025
@auto-submit auto-submit bot added this pull request to the merge queue Jun 13, 2025
Merged via the queue into flutter:master with commit f7d37ca Jun 13, 2025
181 of 182 checks passed
@flutter-dashboard flutter-dashboard bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
e: impeller Impeller rendering backend issues and features requests engine flutter/engine repository. See also e: labels.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants