-
Notifications
You must be signed in to change notification settings - Fork 893
[core] Cleanup SRT state after a fork() (issue #3177) #3179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
API change: new API function |
Why should such a function appear in the public SRT API? Shouldn't this remain internal to libsrt, as an internal cleanup mechanism? It user applications do not need to call it, it may create some confusion. |
ls->second->m_QueuedSockets.erase(s->m_SocketID); | ||
} | ||
s->core().closeInternal(); | ||
operator delete(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slick. This frees the memory, bypassing the destructor (including unsubscribing from the multiplexer and cleaning out its threads). I'm not sure if socket objects can be safely deleted this way - at least I'd do it by first closing all sockets, then waiting until all multiplexers are gone (this should not involve any waiting if we state that no threads are running, but still multiplexer objects will still exist, even if they don't have the duplicated UDP socket, and therefore they must be also destroyed). Only then can all sockets be deleted, and then the groups, stating that sockets were deleted "decently", that is, they were removed as a group member first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This frees the memory, bypassing the destructor
Why did you write this? The delete
C++ operator calls the destructor before freeing the memory. The following two forms are identical, as far as I understand:
operator delete(s);
delete s;
The one which frees memory without calling the destructor, in C++20, probably for desperate situations, is:
operator delete(s, std::destroying_delete);
Did I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You are wrong. https://en.cppreference.com/w/cpp/memory/new/operator_delete
The version from C++20 is irrelevant - it contains way more nuisances on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally use delete without calling the destructor here. After a fork()
, all threads managing the sockets no longer exist in the child process. Since the destructor attempts to destroy mutexes and condition variables that are left in an undefined state, I chose to simply free the memory without performing any further cleanup to avoid potential issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't refer to just the fact that you freed the memory. What I meant was that there were more things to possibly destroy, maybe not through managing the data or exitting threads, but there are also bound objects - such as multiplexers. As file descriptors also get duplicated (every example of fork() speak about that), then definitely any open UDP sockets inside multiplexers must be closed as well. Not sure how mutexes and condition variables are handled in that case.
It's not wrong that you have only reclaimed memory and bypassed destructors - the problem is that it might be not enough to reallly free all resources that sockets occupied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You are wrong. https://en.cppreference.com/w/cpp/memory/new/operator_delete
You are right, my bad, delete
and operator delete
are not equivalent, I was confused by the C++20 trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That from C++20 works only if you overload the operator delete
in particular class this specific way - this will make the delete p
call bypass the destructor, exceptionally. I didn't even kow about this trick ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to what I have read about system objects in case of fork - at least on Linux, which has it defined - only one thread is duplicated, the one that calls fork(), and all mutexes and CVs are in the state derived from the parent. So, in order to do a complete cleanup of these things, all mutexes would have to be forcefully unlocked and CVs signaled, then all destroyed. Otherwise these objects constitute a leak. Theoretically most implementations of mutexes do not operate with dynamic allocation here, but none of the standards we use does give any guarantees on that. Of course, we don't risk any stuck threads here, as fork() only duplicates the current thread, but there is no guarantee that deallocation of memory containing a "sunken" mutex without performing any destruction action first is enough to free up all resources that the mutex was using.
Hi, I have tried this patch on the case from #3177. The symptoms are radically different, but still not working properly. For the record, sender command:
And receiver command, creating a process every 4 seconds, exhibiting the problem:
There is no longer any hung or zombie process. However, the SRT session is disconnected, probably due to SRT cleanup in the child process. The precise scenario is:
Reproducer:
If you can't reproduce this behaviour on a test program, I recommend to test with the exact commands above. I can help setting up TSDuck if necessary. |
Considering the discussions in the previous posts and the results which are observed, I wonder if these cleanup considerations are just over-engineering. First, in a multi-threaded application, we never fork to continue in the same executable on the long run. This is a parallelism pattern which can be easily replaced with multi-threading. So, if we are in a multithreaded application, we usually fork in two cases only:
In my scenario, the intermediate process is case 2 and the grand-child process is case 1. So, first consideration, we do not care about "leaks". There is no leak in threads because fork didn't recreate other threads. There is no memory leak either. Having N bytes of memory becoming unused once is just a theoretical leak, not a practical one. A practical leak is having N bytes left unused every X seconds, endlessly. In that case, and in that case only, you eventually run into a problem. Concerning the sockets (in UDP sense, not SRT socket), the child process should So, probably, a plausible scenario should be:
|
Hmm, this looks like then the only thing to do on fork() is to run over the list of multiplexers and close all UDP sockets in there. Nothing else matters, even memory allocated for the objects because all of this will be disposed anyway on exec(). |
In terms of file descriptors, probably. Not even necessary if However, there is probably some hidden issue. In the original problem #3177, the intermediate process hangs after returning from
So, there is something after completion of |
Ah, yes, this is possible. In case of after-fork destruction there's no thread to join and maybe this isn't properly checked... How about adding a condition |
Is it safe to assume that the thread state can be trusted after Why not simply set a global boolean in an atfork handler saying "after fork, in child process"? This boolean simply means "all libsrt data are unsafe, no thread is available, don't do anything on sockets, because it could interfere with parent process, which remains the sole owner of the SRT state, just close all file descriptors". |
Hi, With the latest commit, db8375d, the problem disappeared on Linux (not tested yet on macOS). There is no more hung or zombie process, the transmission continues on both sides, process are regularly created and complete without interference. |
I’m glad to hear that. Thank you for your feedback, @lelegard. I will double-check that the UDP sockets are properly closed, and if everything looks good, I will proceed with the merge. |
This change ensures proper cleanup and reinitialization of the libsrt internal state after a
fork()
system call. Since the child process does not inherit the parent’s threads, it must explicitly close all inherited SRT sockets and reinitialize the library by callingsrt_startup()
again.Without this cleanup, the child process may hang when calling
srt_cleanup()
. With this pull request, the child process should now exit cleanly without hanging.