
Attempt to expand listener to multi-threaded #1313


Draft
wants to merge 6 commits into master

Conversation

@jiankyu (Contributor) commented May 6, 2025

Attempt to resolve #1312. A summary of the changes can be found in the commit message.
This is still a work in progress, with a stability issue present. Bringing it in early to check whether any failing CI job can shed some light.

This commit resolves issue pistacheio#1312.

Listener now supports multiple "acceptor" threads handling new client
socket connections. For plain HTTP this has no benefit, as only a single
`accept()` syscall is involved. However, if the server is armed with SSL,
this significantly increases server performance in terms of TPS.

The single server thread limits the TPS in two ways:

1. The regular TCP handshake is "non-blocking" for the server: as soon as
the server sends the SYN+ACK frame, the server side is considered
established and `accept()` returns. With SSL, however, the server needs to
send its certificate and wait for the client's response, which blocks the
`SSL_accept()` call until the handshake completes. If the client comes from
a remote data center with high latency, that latency contributes to the
blocking of the listener thread, resulting in very low TPS.

2. The SSL handshake involves cryptographic arithmetic, which is CPU
bound. Even if the client-to-server latency can be ignored, one listening
thread is only able to handle ~1000 TPS on an Intel Xeon core @ 2.1 GHz.

In this commit, we add two more options to Endpoint::Options that specify
the number of "acceptor" threads and the thread name;
Listener::runThreaded() consumes them and initializes that number of
threads. To give every thread an equal chance to receive epoll events, the
lock operation before calling `poller.poll()` is removed. Only two fds are
registered to the Listener's poller, the listen_fd and the shutdownFd, so
there is no scenario in which an fd is removed while it is being handled.
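
For illustration, a minimal sketch of how an application might opt in. The exact names of the two new options are not quoted here, so setAcceptorThreads()/setAcceptorThreadsName() are placeholders shown commented out; everything else is the existing Endpoint API:

#include <pistache/endpoint.h>
#include <pistache/http.h>

using namespace Pistache;

struct Hello : public Http::Handler
{
    HTTP_PROTOTYPE(Hello)
    void onRequest(const Http::Request&, Http::ResponseWriter writer) override
    {
        writer.send(Http::Code::Ok, "hello\n");
    }
};

int main()
{
    Http::Endpoint server(Address(Ipv4::any(), Port(9443)));

    auto opts = Http::Endpoint::options().threads(8); // existing reactor worker threads
    // Placeholder names for the two new options described above:
    // opts.setAcceptorThreads(4);              // number of "acceptor" threads
    // opts.setAcceptorThreadsName("acceptor"); // their thread name

    server.init(opts);
    server.setHandler(Http::make_handler<Hello>());
    server.useSSL("./server.crt", "./server.key"); // SSL is where extra acceptors pay off
    server.serve();
}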
codecov bot commented May 6, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 75.90%. Comparing base (31ef837) to head (0fd80ce).
Report is 2 commits behind head on master.

Files with missing lines   Patch %   Lines
src/server/endpoint.cc      40.00%   6 Missing ⚠️
src/server/listener.cc      91.42%   3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1313      +/-   ##
==========================================
- Coverage   76.29%   75.90%   -0.39%     
==========================================
  Files          64       63       -1     
  Lines       11115     9946    -1169     
==========================================
- Hits         8480     7550     -930     
+ Misses       2635     2396     -239     


jiankyu added 2 commits May 6, 2025 10:12
@dgreatwood (Collaborator) commented

Hi @jiankyu -

Couple of things.

1/ Windows compile error

The build sees error:
../src/server/listener.cc(589): error C2065: 'lLoggedSetThreadDescriptionFail': undeclared identifier

This is in the code to set the "thread description" in the Windows case. I think you copied this over from reactor.cc? Which is fine. But you would also need to copy over (i.e. add to listener.cc) the declaration:
static std::atomic_bool lLoggedSetThreadDescriptionFail = false;

2/ Test case

I take it from the prior conversation that you are seeing the SIGPIPE failure from your own application. It seems the SIGPIPE is not demonstrated by the existing Pistache unit tests? Presuming not, could you add a test that activates the problem?
Most likely you would add the test in https_server_test.cc, though you could also inspect https_client_test_net.cc for possibly relevant example tests.

Thanks in advance.

@kiplingw (Member) commented May 6, 2025

@jiankyu, I don't know if it's possible through a unit test, but it would be helpful if you are able to benchmark before and after your (finalized) PR when ready. That will help us evaluate whether the underlying resource bottleneck problem has been resolved.

@jiankyu (Contributor, Author) commented May 7, 2025


@dgreatwood

  1. Yes, I copied the code from reactor.cc; I will also copy the declaration to resolve the compile error.
  2. Sure, I will add a test case to reveal the problem and show the performance data I got.

@jiankyu (Contributor, Author) commented May 7, 2025

Replying to @kiplingw: I actually did the benchmarking. Below is the test data; the numbers were generated by a multi-threaded client request generator.
Column 1 is the number of acceptWorker threads, column 2 is the average requests per second during the test, and columns 3-5 are the median, 90th-, and 99th-percentile request latencies. The last column, efficiency, is defined as ops / num_threads / ops_with_1_thread; for example, with 2 threads I got 1979.47 ops, so its efficiency is 1979.47 / 2 / 1062.23 = 93.18%.

#threads        op/s       P50     P90     P99   efficiency
1            1062.23    37.282  38.688  42.565   100%
2            1979.47    20.087  21.730  23.066   93.18%
3            2850.90    13.996  15.079  16.234   89.46%
4            3820.15    10.361  11.429  13.018   89.90%
5            4397.98     9.035   9.824  10.720   82.81%
6            5085.45     7.764   8.548   9.441   79.79%
7            5874.28     6.742   7.517   8.337   79.00%
8            6431.21     6.110   6.860   7.689   75.68%
9            6939.28     5.640   6.495   7.375   72.59%
10           7561.68     5.156   5.962   6.860   71.19%
15           9806.96     3.935   4.874   5.794   61.55%
20          12185.99     3.119   3.995   5.292   57.36%
25          12980.32     2.984   3.786   5.209   48.87%
30          14252.14     2.698   3.617   4.996   44.72%
35          14826.37     2.481   3.588   5.674   39.88%
40          14826.68     2.477   3.635   5.714   34.90%
45*         15187.71     2.710   4.101   6.111   31.77%
50*         15304.74     3.016   4.558   6.615   28.81%
60*         15156.50     3.672   5.490   7.697   23.78%

As we can see, the maximum ops we can get is 15304.74 at 50 threads; increasing the thread count further doesn't increase the ops.
Also, as we increase the threads, the efficiency drops; the performance gain doesn't scale linearly with the number of threads. This makes me think of a different approach to the problem: maybe we can offload the SSL handshake to the AsyncImpl::Worker threads, so the listener stays single-threaded and only does accept(). This requires changes in the Peer class, and the listener has to provide a read-only interface that exposes its SSL context to the workers so they can create the SSL object later. I didn't take this approach because it involves more changes across different classes and I'm not quite confident about it, but theoretically it should perform better.
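
For illustration only, here is a rough sketch of that alternative idea, written against plain POSIX sockets and OpenSSL rather than Pistache's actual Listener/Peer classes: the listener thread only calls accept(), then hands the raw fd plus a shared SSL_CTX to a worker that performs the blocking SSL_accept():

#include <openssl/ssl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <condition_variable>
#include <mutex>
#include <queue>

// Listener thread: only the cheap, non-cryptographic accept() happens here.
void acceptLoop(int listenFd, std::queue<int>& pending,
                std::mutex& mtx, std::condition_variable& cv)
{
    for (;;)
    {
        int clientFd = ::accept(listenFd, nullptr, nullptr);
        if (clientFd < 0)
            break;
        {
            std::lock_guard<std::mutex> lock(mtx);
            pending.push(clientFd);
        }
        cv.notify_one(); // wake one worker to run the TLS handshake
    }
}

// Worker thread: performs the blocking, CPU-bound SSL_accept(). It needs
// read-only access to the listener's SSL_CTX to create the SSL object.
void handshakeWorker(SSL_CTX* ctx, std::queue<int>& pending,
                     std::mutex& mtx, std::condition_variable& cv)
{
    for (;;)
    {
        int fd;
        {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [&] { return !pending.empty(); });
            fd = pending.front();
            pending.pop();
        }
        SSL* ssl = SSL_new(ctx);
        SSL_set_fd(ssl, fd);
        if (SSL_accept(ssl) == 1)
        {
            // hand (fd, ssl) over to the normal request-processing path
        }
        else
        {
            SSL_free(ssl);
            ::close(fd);
        }
    }
}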

@dgreatwood (Collaborator) commented

Cool. Out of interest, what kind of CPU was it, with how many parallel execution units?

Also, as per @kiplingw's prior comment, it would be awesome to put some form of this - e.g. 1, 2, 25, 50, 75 threads for a few seconds each - into a unit test if practical.

@jiankyu (Contributor, Author) commented May 7, 2025

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  104
  On-line CPU(s) list:   0-103
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  26
    Socket(s):           2
    Stepping:            7
    CPU(s) scaling MHz:  49%
    CPU max MHz:         4000.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            4200.00

@jiankyu (Contributor, Author) commented May 9, 2025

Unfortunately the current Http::Experimental::Client() can't be used to benchmark this; there are a couple of problems with it.
The blocking issue is that it reuses connections and there is no way to opt out: I used this client to send 10k requests, and only 1 connection establishment was captured.
Secondly, the implementation also seems buggy: it allows setting the "threads" used by the underlying reactor, but only the first thread does any work. When that thread hits 100% CPU it can do around 2400 requests per second, while all the other threads are idle.

The Client performance issue can be worked around by multiple clients making requests simultaneously, but I got stuck on the connection-reuse problem; I tried different fixes, but none of them has worked so far.
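
For reference, the multi-client workaround looks roughly like this, following the usage pattern of Pistache's http_client example; the URL, thread count and request count are arbitrary, and each client still reuses its single connection:

#include <pistache/client.h>
#include <pistache/http.h>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

using namespace Pistache;

// One Http::Experimental::Client per thread, so connections are at least
// established in parallel; each client still reuses its own connection.
void clientThread(const std::string& url, int requests)
{
    Http::Experimental::Client client;
    client.init(Http::Experimental::Client::options().threads(1));

    std::vector<Async::Promise<Http::Response>> responses;
    for (int i = 0; i < requests; ++i)
    {
        auto resp = client.get(url).send();
        resp.then([](Http::Response) {}, [](std::exception_ptr) {});
        responses.push_back(std::move(resp));
    }

    auto all = Async::whenAll(responses.begin(), responses.end());
    Async::Barrier<std::vector<Http::Response>> barrier(all);
    barrier.wait_for(std::chrono::seconds(30));
    client.shutdown();
}

int main()
{
    std::vector<std::thread> clients;
    for (int t = 0; t < 8; ++t)
        clients.emplace_back(clientThread, std::string("https://127.0.0.1:9443/"), 1000);
    for (auto& t : clients)
        t.join();
}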

The shutdownFd is used to tell the acceptor worker threads to exit. When
there are multiple threads, the current implementation is not enough to
ensure that every worker gets the event; if a worker is busy doing
something, it will miss the notification.

Without changing the NotifyFd interface, the guaranteed way is to notify
in a loop and check the number of alive worker threads, and keep notifying
them until all the threads are done.
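
A minimal sketch of that loop, using shutdownFlag and aliveAcceptors as stand-ins for the real shutdownFd notification and worker bookkeeping (not the actual Listener members):

#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Stand-ins for the Listener internals: shutdownFlag models the shutdownFd
// event, aliveAcceptors counts acceptor threads still running.
std::atomic<bool> shutdownFlag{false};
std::atomic<int>  aliveAcceptors{0};

void acceptorLoop()
{
    aliveAcceptors.fetch_add(1);
    while (!shutdownFlag.load())
    {
        // real code: poller.poll(), then handle listen_fd / shutdownFd events
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    aliveAcceptors.fetch_sub(1); // the last thread to exit ends the notify loop below
}

void requestShutdown()
{
    // A single notification may be consumed by one worker, or missed by a
    // busy one, so keep notifying until every acceptor thread has exited.
    while (aliveAcceptors.load() > 0)
    {
        shutdownFlag.store(true); // real code: notify the shutdownFd again
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main()
{
    std::vector<std::thread> acceptors;
    for (int i = 0; i < 4; ++i)
        acceptors.emplace_back(acceptorLoop);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    requestShutdown();
    for (auto& t : acceptors)
        t.join();
}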

Successfully merging this pull request may close these issues.

Limited Performance on HTTPS server