[zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?
Bill Torpey
wallstprog at gmail.com
Sun Feb 25 19:22:16 CET 2018
Hi Francesco:
A few more points below …
Good luck, and please post back if you find out anything interesting!
Regards,
Bill
> On Feb 25, 2018, at 4:54 AM, Francesco <francesco.montorsi at gmail.com> wrote:
>
> Hi Bill,
> thanks for your answer.
>
>
>
> 2018-02-24 21:49 GMT+01:00 Bill Torpey <wallstprog at gmail.com>:
> ...
> If T2 and T4 were application code, this would be a clear violation of ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket types).
> Right.
>
>
> For instance, one technique would be to perform initialization (e.g., binds) in the main thread, and only after that is done, spin up other threads to do processing. In this case, TSAN wouldn't have any way to know that the application guarantees that accesses cannot result in a race, so TSAN would flag it. I’ve gotten in the habit of using mutexes to protect code like this even though it should not strictly be needed, just to keep tools like TSAN happy, and also because I don’t know the ZMQ code well enough to be 100% certain that the mutexes are not necessary — better safe than sorry!
> Yeah, that's a possibility but it results in a lot of "clutter" that decreases code readability and makes it harder to maintain in the long run...
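For what it’s worth, here is roughly the shape of what I mean (just a hypothetical sketch, not code from any of my apps; the socket types, names and the inproc endpoint are all made up). All of the setup happens in the main thread, the worker thread only touches the socket after that, and the mutex exists purely to make that hand-off visible to TSAN, even though ZMQ’s rules shouldn’t strictly require it:

#include <zmq.h>
#include <mutex>
#include <thread>

int main() {
    void *ctx  = zmq_ctx_new();
    void *pull = zmq_socket(ctx, ZMQ_PULL);
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    std::mutex pull_mtx;

    {
        // all initialization is done in the main thread, before any worker exists
        std::lock_guard<std::mutex> guard(pull_mtx);
        zmq_bind(pull, "inproc://init-demo");
        zmq_connect(push, "inproc://init-demo");
    }
    zmq_send(push, "hi", 2, 0);

    std::thread worker([&] {
        // from here on, only this thread touches 'pull'; the lock is only there
        // so the hand-off from the main thread is visible to TSAN
        std::lock_guard<std::mutex> guard(pull_mtx);
        char buf[16];
        zmq_recv(pull, buf, sizeof(buf), 0);
    });
    worker.join();

    zmq_close(push);
    zmq_close(pull);
    zmq_ctx_term(ctx);
    return 0;
}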
>
> This situation is different, though, since T4 is not an application thread — it’s an internal ZMQ worker thread. So, I think in this case we kind of have to accept that ZMQ is doing the right thing here.
> At least, that’s the approach I’ve been taking. When I instrument my apps and libraries with TSAN I specifically do NOT instrument ZMQ, and I also use the “called_from_lib:libzmq.so” suppression (which is listed as an example for TSAN: https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
> Understood. I have a question though: if you use the “called_from_lib:libzmq.so” suppression, are you still able to spot a race condition where T2 and T4 are both application threads (instead of one application thread and one ZMQ-internal thread)?
> I wonder whether TSAN, on detecting that one of the two threads involved in the data race is inside ZMQ, will suppress the race warning entirely, or will instead suppress only race conditions involving two internal ZMQ threads...
Good question. I could only find one post that discusses this suppression: https://groups.google.com/forum/#!topic/thread-sanitizer/NEcgiPEG0N8
The gist of that thread is that called_from_lib suppresses only interceptors (like read or memset) called directly from the library, and that it's intended for non-instrumented libraries.
However, when I try this with my test code, enabling the suppression actually increases the number of false positives reported by TSAN. Disabling the suppression results in a smaller number of mostly different false positives. You may want to experiment with this — I plan to take another look at whether enabling this suppression is a good idea based on what I’ve seen in my tests.
Unfortunately, the race:libzmq.so suppression isn’t a practical way to avoid false positives in ZMQ either, since it suppresses ALL warnings where libzmq.so appears ANYWHERE in the stack trace, and that is much too broad.
So, there’s no simple answer. I’ve developed some scripts that parse the output of TSAN and generate MD5 hashes of the stack traces, which can then be used to suppress individual stack traces. Going that route is a lot of work, but it’s the only way I know of at this time to provide more granular suppressions with TSAN. It would be nice if the suppression mechanism in TSAN were more robust (e.g., more like valgrind’s), but it isn’t.
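For reference, a minimal suppressions file along those lines might look something like the following (passed to TSAN via something like TSAN_OPTIONS="suppressions=/path/to/tsan.supp"; the last entry is just a hypothetical example of narrowing the suppression to one specific symbol rather than the whole library):

# suppress interceptor-level reports (read, memset, ...) coming directly from the uninstrumented library
called_from_lib:libzmq.so

# much too broad: hides every report with libzmq.so anywhere in the stack trace
#race:libzmq.so

# narrower (hypothetical) example: suppress only reports mentioning a specific symbol
#race:zmq::ypipe_t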
>
>
> Instrumenting libzmq and/or omitting the suppression causes a LOT of warnings, esp. in the ZMQ worker threads. So, unless I'm willing to commit the time and effort to go through and investigate each of these warnings, I feel I have little choice but to accept that at this point in its lifetime ZMQ should be race-free for all practical purposes.
>
> FWIW, I’ve done fairly extensive testing, and specifically stress testing, and have yet to find anything that looks like an honest-to-goodness bug in ZMQ. (Which is not to say that the docs are always clear about what to expect in certain situations ;-) I did have one problem which appears to have been a bug in epoll, and which was resolved by upgrading Linux, but that’s it.
>
> I agree, I was just surprised to see so many warnings...
>
>
> BTW, there’s an excellent overview of how this all works at http://zeromq.org/whitepapers:architecture — although it’s somewhat old, it appears to still be relatively accurate.
>
> thanks for the link, I read it and it's quite interesting. However, it does not mention how multi-thread safety is achieved.
> Just out of curiosity I will take a look at the ypipe implementation.
Yes, that would be interesting. Luca is saying that ZMQ is basically race-free, but I have never seen any documentation on how that is achieved or verified. From a quick look at the code, it appears that ZMQ uses a combination of plain old pthread mutexes along with knowledge of ZMQ’s internal threading architecture to know when mutexes are unnecessary.
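To be clear, the following is NOT ZMQ’s actual ypipe code, just a minimal sketch (the names are mine) of the general single-producer/single-consumer idea it relies on: the writer and the reader each own one index, the only cross-thread communication happens through two atomics, and no mutex is needed as long as exactly one thread writes and exactly one thread reads.

#include <atomic>
#include <cstddef>

// Illustrative SPSC ring buffer, not zmq::ypipe_t.
template <typename T, size_t N>
class spsc_ring_sketch {
    T buf[N];
    std::atomic<size_t> head{0};   // advanced only by the producer thread
    std::atomic<size_t> tail{0};   // advanced only by the consumer thread
public:
    bool push(const T &v) {        // must be called from exactly one writer thread
        size_t h = head.load(std::memory_order_relaxed);
        size_t next = (h + 1) % N;
        if (next == tail.load(std::memory_order_acquire))
            return false;          // full
        buf[h] = v;
        head.store(next, std::memory_order_release);   // publish the slot to the reader
        return true;
    }
    bool pop(T &v) {               // must be called from exactly one reader thread
        size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire))
            return false;          // empty
        v = buf[t];
        tail.store((t + 1) % N, std::memory_order_release);   // hand the slot back to the writer
        return true;
    }
};

As far as I can tell the real ypipe is cleverer than this (it batches writes and lets the reader signal when it finds the pipe empty, so the writer knows when to wake it), but the single-reader/single-writer assumption is the key reason no mutex is needed on the fast path.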
>
>
>> I tried forcing use of C++11 internal atomics and all the warnings about zmq::atomic_ptr_t<> disappear (I still get tons about ypipe and "pipe")..
>> maybe it's best to prefer C++11 atomics when available?
>
> I would actually suggest using plain old pthread for synchronization primitives — TSAN’s support for pthread is pretty mature. Support for C++11 is a bit less mature, and has certain constraints (https://clang.llvm.org/docs/ThreadSanitizer.html):
>
> However, using plain old pthread for synchronization you probably lose a lot of performance... if C++11 atomics are supported by TSAN just fine, why not prefer them over the GCC intrinsics?
At least on Linux, pthread is implemented in terms of futexes, which should perform reasonably well, at least in situations where mutexes are not heavily contended. (With futexes, acquiring an uncontended mutex is an atomic operation that happens completely in userspace.)
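Just to illustrate that point, here is a rough sketch of the well-known three-state futex lock (along the lines of Ulrich Drepper’s “Futexes Are Tricky” paper, NOT glibc’s actual pthread_mutex code): the uncontended acquire and release are single atomic operations, and the kernel only gets involved when there is contention.

#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Illustrative three-state futex lock: 0 = unlocked, 1 = locked, 2 = locked with waiters.
class futex_lock_sketch {
    std::atomic<int> state{0};

    long futex(int op, int val) {
        // assumes std::atomic<int> is layout-compatible with int (true in practice on Linux)
        return syscall(SYS_futex, reinterpret_cast<int *>(&state), op, val,
                       nullptr, nullptr, 0);
    }

public:
    void lock() {
        int c = 0;
        // fast path: an uncontended acquire is a single CAS in userspace, no syscall
        if (state.compare_exchange_strong(c, 1, std::memory_order_acquire))
            return;
        // slow path: mark the lock contended, then sleep in the kernel until woken
        if (c != 2)
            c = state.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            futex(FUTEX_WAIT_PRIVATE, 2);
            c = state.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        // fast path: with no waiters, release is a single atomic exchange
        if (state.exchange(0, std::memory_order_release) == 2)
            futex(FUTEX_WAKE_PRIVATE, 1);   // someone was waiting; wake one thread
    }
};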
Plus pthread support in TSAN is more mature, so that may be a more practical approach in terms of avoiding false positives.
In a pinch you could test with pthread even if you plan to deploy with another mechanism.
>
>
> ThreadSanitizer is in beta stage. It is known to work on large C++ programs using pthreads, but we do not promise anything (yet). C++11 threading is supported with llvm libc++. The test suite is integrated into CMake build and can be run with make check-tsan command.
>
> In particular, to use C++11 you also need to use llvm’s libcxx, rather than gnu’s libstdc++.
>
> I think that the quoted text is referring to "C++11 threading", i.e. http://en.cppreference.com/w/cpp/thread
> As I mentioned, from a simple test I see that at least gcc 5.3.0's TSAN handles C++11 atomics fine (without forcing you to link LLVM's libc++)
Right — gcc does support TSAN with libstdc++; I should have clarified that I was referring to clang. gcc is also able to support TSAN in a shared library — I believe that clang requires the TSAN runtime to be statically linked into the executable containing main() (or at least it did — that may have changed with newer versions of clang).
OTOH, TSAN support in gcc tends to lag behind the support in clang — if you are able to use clang for TSAN builds you will be using a (much?) more recent version.
>
>
>
>> Btw I will try to write a suppression file for ThreadSanitizer and ZMQ... I'm just starting to doubt: is all ZMQ code really race-free? :)
>
> As mentioned above, that is likely to be a huge undertaking. At this point, I think the only confirmation we have that ZMQ is race-free is based on “black-box” testing. The good news is that ZMQ appears to be used widely enough that the code is well-tested. (Of course, failure to find a bug doesn’t prove that there isn’t one, so having a rigorous analysis of ZMQ’s threading behavior would be A Good Thing — but a big job).
>
> Yeah, I agree!
>
>
> Thanks,
> Francesco
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev