The Wayback Machine - https://web.archive.org/web/20211228162208/https://github.com/dotnet/aspnetcore/pull/39216

Faster ParseHeaders #39216

Draft: wants to merge 1 commit into base: main

@EgorBo EgorBo commented Dec 28, 2021

Judging by the native traces from the Platform-Plaintext TE benchmark (linux-x64), we seem to spend a noticeable amount of time in ParseHeaders:
image

The current algorithm walks the span of data to find "\r\n", then tries to extract the name and value while doing some validation checks and trimming the value. My implementation walks the span just once: it first looks for ":" using a sort of IndexOfAny(span, ':', ' ', '\t', '\n', '\r') function, which finds the position of ':' and makes sure no illegal characters come before it.
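A minimal Python sketch of the single-pass idea described above, for illustration only. The PR itself targets C# and, judging by the AVX/SSE remark, relies on vectorized search, which this sketch does not attempt to reproduce. `parse_header_line` and `STOP_BYTES` are hypothetical names; the stop set is taken from the IndexOfAny call quoted above, and the CRLF and trimming rules are simplified assumptions rather than Kestrel's actual validation.

```python
# Bytes that end the scan of a header name. ':' is the expected terminator;
# the others are treated as illegal inside a name (assumption based on the
# IndexOfAny call quoted in the description).
STOP_BYTES = b": \t\n\r"


def parse_header_line(data: bytes):
    """Parse one 'Name: value\\r\\n' line from the start of data.

    Returns (name, value, consumed), or None if the line is incomplete.
    Raises ValueError on a malformed line.
    Hypothetical helper, not Kestrel's API.
    """
    n = len(data)

    # Step 1: scan forward to the first ':' or illegal byte.
    i = 0
    while i < n and data[i] not in STOP_BYTES:
        i += 1
    if i == n:
        return None  # no ':' seen yet, more data is needed
    if data[i] != ord(":") or i == 0:
        raise ValueError("illegal byte in header name, or empty name")
    name = data[:i]

    # Step 2: keep going from the colon to the line terminator,
    # without rescanning the bytes already visited.
    j = i + 1
    while j < n and data[j] not in b"\r\n":
        j += 1
    if j == n:
        return None  # terminator not received yet
    if data[j] != ord("\r"):
        raise ValueError("bare LF in header line")
    if j + 1 == n:
        return None  # CR received, LF still pending
    if data[j + 1] != ord("\n"):
        raise ValueError("CR not followed by LF")

    # Trim optional spaces and tabs around the value.
    value = data[i + 1:j].strip(b" \t")
    return name, value, j + 2


result = parse_header_line(b"Host: example.com\r\nAccept: */*\r\n")
print(result)  # (b'Host', b'example.com', 19)
```

Each byte is visited once: the name scan stops at the colon and the value scan resumes from there, instead of first locating "\r\n" and then walking the same bytes again to split and validate them.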

I haven't added Arm64 support yet (waiting for some feedback on this first) and haven't compared AVX against SSE, but what I have so far seems to show fairly stable improvements for Platform-Plaintext (up to +250_000 RPS):

[] |                        |                             baseline |                            mychanges |    diff |
[] | CPU Usage (%)          |                                   92 |                                   93 |  +1.09% |
[] | Cores usage (%)        |                                2,590 |                                2,617 |  +1.04% |
[] | Working Set (MB)       |                                   37 |                                   37 |   0.00% |
[] | Private Memory (MB)    |                                  370 |                                  370 |   0.00% |
[] | Start Time (ms)        |                                    0 |                                    0 |         |
[] | First Request (ms)     |                                   62 |                                   60 |  -3.23% |
[] | Requests/sec           |                           11,625,062 |                           11,875,956 |  +2.16% |
[] | Requests               |                          175,435,163 |                          179,301,568 |  +2.20% |
[] | Mean latency (ms)      |                                 1.21 |                                 1.16 |  -4.13% |
[] | Max latency (ms)       |                                53.73 |                                61.02 | +13.57% |
[] | Bad responses          |                                    0 |                                    0 |         |
[] | Socket errors          |                                    0 |                                    0 |         |
[] | Read throughput (MB/s) |                             1,392.64 |                             1,423.36 |  +2.21% |
[] | Latency 50th (ms)      |                                 0.71 |                                 0.69 |  -2.95% |
[] | Latency 75th (ms)      |                                 1.07 |                                 1.04 |  -2.80% |
[] | Latency 90th (ms)      |                                 1.79 |                                 1.78 |  -0.56% |
[] | Latency 99th (ms)      |                                13.91 |                                12.92 |  -7.12% |

(max RPS with my changes was around 11,913,453 req/s)
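For reference, the diff column appears to be the relative change against the baseline. A small sketch of that arithmetic follows; `percent_diff` is a hypothetical helper, not part of crank, and the formula is an assumption that happens to match the rows checked here.

```python
def percent_diff(baseline: float, changed: float) -> float:
    """Relative change from baseline to changed, in percent."""
    return (changed - baseline) / baseline * 100.0


# Requests/sec row from the table.
rps = percent_diff(11_625_062, 11_875_956)
print(f"{rps:+.2f}%")  # +2.16%

# Max latency (ms) row from the table.
max_latency = percent_diff(53.73, 61.02)
print(f"{max_latency:+.2f}%")  # +13.57%
```

A few rows (for example Latency 50th) do not reproduce exactly from the rounded values shown, presumably because the tool computes the diff from unrounded measurements.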

Test methodology

I was using crank like this:

crank --profile aspnet-citrine-lin \
 --application.framework net7.0 \
 --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml \
 --scenario plaintext \
 --json t1.json \
 --application.options.outputFiles "/path/to/Microsoft.AspNetCore.Server.Kestrel.Core.dll"

where Microsoft.AspNetCore.Server.Kestrel.Core.dll is either the "baseline" build or the "baseline + my changes" build.

Also, I didn't optimize the multi-span case (where a header is split between two reads), but from what I see it happens quite rarely (I can print some statistics from our benchmarks if you need them).
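To make the single-span versus multi-span distinction concrete, here is a rough Python sketch, assuming the input arrives as a list of byte segments. `read_header_line` is a hypothetical name, and copying the segments into one buffer is just one simple way to handle the rare case, not necessarily what Kestrel does.

```python
def read_header_line(segments):
    """Return (line, used_fast_path) for the first CRLF-terminated line.

    line is None when no complete line is available yet.
    Hypothetical helper for illustration only.
    """
    first = segments[0] if segments else b""

    # Fast path: the whole line sits inside the first segment.
    end = first.find(b"\r\n")
    if end != -1:
        return first[:end], True

    # Slow path: the line is split across reads, so copy the
    # segments into one contiguous buffer and search again.
    joined = b"".join(segments)
    end = joined.find(b"\r\n")
    if end == -1:
        return None, False  # still incomplete
    return joined[:end], False


print(read_header_line([b"Host: a\r\nX: b\r\n"]))
print(read_header_line([b"Host: exa", b"mple.com\r\n"]))
```

Only the first call takes the fast path; the second has the header split between two reads and goes through the copying fallback.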

@msftbot msftbot bot added the area-runtime label Dec 28, 2021