Memory diagnoser fix for Tiered Compilation #1543
Hmm, you had mentioned on the other issue that 5.0 is free of this issue; is this really specific to 3.1? Timings have changed between 3.1 and 5.0 such that the rejits would happen sooner in 5.0, but it seems unlikely that this would affect microbenchmarks (the change is easily visible in real-world cases with many thousands of methods, but there would only be a few methods here).

For allocation measurements, especially with long-running methods and with a strong assertion about allocation (that there shouldn't be any), my general suggestion is to disable tiered compilation. Tier 0 jitted code may allocate where optimized code would not, and the former may allocate much more than the latter. An alternative to the delay may be to set

Aside from that, I wonder what is actually causing allocation during jitting. There are some small allocation-dependent things I'm aware of regarding virtual methods that can be fixed, but otherwise, since all static construction, etc. should already have been done by the time of a rejit, I'm not sure what would be allocating. Do you have an idea?
Also, I don't think there should be any guarantee that no allocation happens in the background. As we move more functionality to managed code, such as the thread pool, things unrelated to the app's code may happen in the background and cause allocation, and that should be OK.
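Because background allocations cannot be ruled out, it helps to see which allocation counter actually observes them. Below is a minimal C# sketch, assuming .NET Core 3.0 or later (where `GC.GetTotalAllocatedBytes` exists): a separate thread allocates a large array while the calling thread reads both `GC.GetAllocatedBytesForCurrentThread` and `GC.GetTotalAllocatedBytes`. `AllocationCounters` and `MeasureWithBackgroundAllocation` are hypothetical names, and the extra thread only stands in for runtime work such as tiered compilation.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of BenchmarkDotNet.
// Requires .NET Core 3.0+ for GC.GetTotalAllocatedBytes.
public static class AllocationCounters
{
    // Allocates `backgroundBytes` on a separate thread and reports how far the
    // per-thread and the process-wide counters moved, as seen by the calling thread.
    public static (long PerThread, long ProcessWide) MeasureWithBackgroundAllocation(int backgroundBytes)
    {
        long threadBefore = GC.GetAllocatedBytesForCurrentThread();
        long totalBefore = GC.GetTotalAllocatedBytes(precise: true);

        // Stands in for background work (for example, runtime housekeeping) that allocates.
        var background = new Thread(() => GC.KeepAlive(new byte[backgroundBytes]));
        background.Start();
        background.Join();

        long threadAfter = GC.GetAllocatedBytesForCurrentThread();
        long totalAfter = GC.GetTotalAllocatedBytes(precise: true);

        // The per-thread delta only includes what this thread allocated (the Thread
        // object, the delegate, ...); the process-wide delta also includes the array.
        return (threadAfter - threadBefore, totalAfter - totalBefore);
    }
}
```

Called with `1_000_000`, the process-wide delta should include the whole array, while the per-thread delta only covers the few small objects the calling thread created to start the other thread. That is consistent with the per-thread counter being unaffected by background allocations.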
Yes, it looks like it's specific to 3.1 (I did not test 2.2 and 3.0, as they are no longer supported):
BenchmarkDotNet=v0.12.1.20201002-develop, OS=Windows 10.0.18363.1082 (1909/November2019Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.100-rc.1.20452.10
[Host] : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-WTPPRD : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
Job-MUUDGX : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-TXYPKS : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
Job-IEOLEB : .NET Framework 4.8 (4.8.4220.0), X64 RyuJIT
| Method | Runtime | Allocated |
|----------- |--------------------- |----------:|
| Benchmark1 | .NET 5.0 | - |
| Benchmark1 | .NET Core 2.1 | - |
| Benchmark1 | .NET Core 3.1 | 9 B |
| Benchmark1 | .NET Framework 4.8 | - |
| | | |
| Benchmark2 | .NET 5.0 | - |
| Benchmark2 | .NET Core 2.1 | - |
| Benchmark2 | .NET Core 3.1 | 20 B |
| Benchmark2 | .NET Framework 4.8 | - |
| | | |
| Benchmark3 | .NET 5.0 | - |
| Benchmark3 | .NET Core 2.1 | - |
| Benchmark3 | .NET Core 3.1 | 21 B |
| Benchmark3 | .NET Framework 4.8 | - |
| | | |
| Benchmark4 | .NET 5.0 | - |
| Benchmark4 | .NET Core 2.1 | - |
| Benchmark4 | .NET Core 3.1 | 178 B |
| Benchmark4 | .NET Framework 4.8 | - |
| | | |
| Benchmark5 | .NET 5.0 | - |
| Benchmark5 | .NET Core 2.1 | - |
| Benchmark5 | .NET Core 3.1 | 101 B |
| Benchmark5 | .NET Framework 4.8 | - |
OK, I think it is the virtual slot backpatching storage. In 5.0 the rejit timing probably causes the allocation to happen earlier, so it does not show up in the benchmark.
I am afraid that this could lead to BDN reporting "too perfect" results that differ from what end users with default settings experience.
I was also curious and tried to use the VS Memory Profiler to find out, but failed. In this particular case, the VS Profiler shows me memory allocated for jitting the methods that are executed for the first time. But it does not show me anything attributed to the thread pool (TP) thread and
I totally agree. But I also expect users to be quite surprised when BDN reports allocated memory for code that clearly does not allocate anything explicitly.
Possibly. With aggressive tiering, methods still go through the normal tiering stages, just more quickly. It would change rejit timings, and more methods may be rejitted, though the code quality would likely be similar to the default mode (there is no guarantee, since it can depend on timing and on the code paths hit in the calls before the rejit). The other option is to give the test more warmup time to stabilize.
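One way to read "more warmup time to stabilize" is to keep invoking the workload until its process-wide allocation delta stops changing, with an upper bound on the number of runs. The C# sketch below assumes .NET Core 3.0 or later; `StableAllocationWarmup`, `WarmUpUntilStable`, `requiredStableComparisons` and `maxRuns` are hypothetical names, and this is not BenchmarkDotNet's actual warmup heuristic.

```csharp
using System;

// Hypothetical helper; this is NOT BenchmarkDotNet's warmup heuristic.
// Requires .NET Core 3.0+ for GC.GetTotalAllocatedBytes.
public static class StableAllocationWarmup
{
    // Invokes `workload` until the process-wide allocation delta of one invocation
    // has matched the previous one `requiredStableComparisons` times in a row,
    // or until `maxRuns` invocations have been made.
    public static (long LastDelta, int Runs) WarmUpUntilStable(
        Action workload, int requiredStableComparisons = 5, int maxRuns = 1000)
    {
        if (workload == null) throw new ArgumentNullException(nameof(workload));

        long? previousDelta = null; // no previous run yet, so the first run never "matches"
        long delta = 0;
        int stableComparisons = 0;
        int runs = 0;

        while (runs < maxRuns && stableComparisons < requiredStableComparisons)
        {
            long before = GC.GetTotalAllocatedBytes(precise: true);
            workload();
            delta = GC.GetTotalAllocatedBytes(precise: true) - before;
            runs++;

            // Reset the streak whenever the delta changes (for example, when a rejit
            // or other background work allocates during this invocation).
            stableComparisons = previousDelta == delta ? stableComparisons + 1 : 0;
            previousDelta = delta;
        }

        return (delta, runs);
    }
}
```

A stable delta is only a heuristic: it does not prove that tiering has finished, and a rejit can still land during a later measurement.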
In #1542 @ronbrogan reported a very unusual bug: code that was CPU-bound and did not allocate at all was reporting allocations on .NET Core 3.1 (it works fine on 2.1, because there we use `GC.GetAllocatedBytesForCurrentThread` instead of `GC.GetTotalAllocatedBytes`).

After some investigation, I've narrowed the problem down to the Tiered JIT thread, which from time to time promotes methods from Tier 0 to Tier 1 and allocates memory during the iteration in which we call `GC.GetTotalAllocatedBytes`.

I had a few ideas, but the only one that worked was putting the thread to sleep for 250 ms before we call `GC.GetTotalAllocatedBytes`. During this time, the TC thread kicks in and promotes the methods. It's of course far from perfect, as TC might not finish the promotion before we make the first call to `GC.GetTotalAllocatedBytes`. I don't want to prolong the sleeping period, because it would increase the time needed to run the benchmarks.

@kouvel do you have a better idea of how we could prevent TC from working at a given point in time?
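The sleep-before-measuring idea can be sketched roughly as follows. This is a minimal C# illustration, assuming .NET Core 3.0 or later, and not the code from this PR: `SettledAllocationMeasurement`, `MeasureBytesPerInvocation` and `settleDelayMs` are hypothetical names, and 250 ms is simply the delay mentioned above.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the "sleep before measuring" idea; not BenchmarkDotNet's code.
// Requires .NET Core 3.0+ for GC.GetTotalAllocatedBytes.
public static class SettledAllocationMeasurement
{
    // Sleeps for `settleDelayMs` so that the background tiering thread has a chance
    // to promote methods before the first counter read, then measures the
    // process-wide bytes allocated per invocation. Nothing guarantees that tiering
    // has finished when the sleep ends.
    public static long MeasureBytesPerInvocation(Action invocation, int invocationCount, int settleDelayMs = 250)
    {
        if (invocation == null) throw new ArgumentNullException(nameof(invocation));
        if (invocationCount <= 0) throw new ArgumentOutOfRangeException(nameof(invocationCount));

        Thread.Sleep(settleDelayMs);

        long before = GC.GetTotalAllocatedBytes(precise: true);
        for (int i = 0; i < invocationCount; i++)
        {
            invocation();
        }
        long after = GC.GetTotalAllocatedBytes(precise: true);

        // Integer division: the per-invocation value is rounded down.
        return (after - before) / invocationCount;
    }
}
```

The trade-off is the one described above: a longer delay makes it more likely that promotion has finished, but it is paid on every measurement, and any allocation by another thread between the two reads is still attributed to the benchmark.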
I've confirmed that it works as expected by modifying the `GetExtraStats` method to emit some extra events and filtering the TC events in PerfView to this particular period of time.
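The PR does not show which events `GetExtraStats` emits. One plausible way to produce such time markers is a custom `System.Diagnostics.Tracing.EventSource`, which PerfView can collect alongside the runtime's events. In the C# sketch below, the provider name `Demo-AllocationMeasurement`, the class `MeasurementMarkers` and both event methods are hypothetical.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical marker events; the provider name "Demo-AllocationMeasurement" is made up.
[EventSource(Name = "Demo-AllocationMeasurement")]
public sealed class MeasurementMarkers : EventSource
{
    public static readonly MeasurementMarkers Log = new MeasurementMarkers();

    private MeasurementMarkers() { }

    // Intended to be written right before the first GC.GetTotalAllocatedBytes call.
    [Event(1, Level = EventLevel.Informational)]
    public void MeasurementBegin() => WriteEvent(1);

    // Intended to be written right after the last GC.GetTotalAllocatedBytes call.
    [Event(2, Level = EventLevel.Informational)]
    public void MeasurementEnd() => WriteEvent(2);
}
```

With the provider enabled in the trace, the timestamps of the two marker events bound the window in which any tiered compilation activity would be counted by the measurement.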