[zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?
Luca Boccassi
luca.boccassi at gmail.com
Sun Feb 25 20:32:54 CET 2018
On Sun, 2018-02-25 at 13:22 -0500, Bill Torpey wrote:
> Hi Francesco:
>
> A few more points below …
>
> Good luck, and please post back if you find out anything interesting!
>
> Regards,
>
> Bill
>
> > On Feb 25, 2018, at 4:54 AM, Francesco <francesco.montorsi at gmail.com> wrote:
> >
> > Hi Bill,
> > thanks for your answer.
> >
> >
> >
> > 2018-02-24 21:49 GMT+01:00 Bill Torpey <wallstprog at gmail.com>:
> > ...
> > If T2 and T4 were application code, this would be a clear violation
> > of ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket
> > types).
> > Right.
> >
> >
> > For instance, one technique would be to perform initialization
> > (e.g., binds) in the main thread, and only after that is done spin
> > up other threads to do processing. In this case, TSAN wouldn't
> > have any way to know that the application guarantees that accesses
> > cannot result in a race, so TSAN would flag it. I’ve gotten in the
> > habit of using mutexes to protect code like this even though it
> > should not strictly be needed, just to keep tools like TSAN happy,
> > and also because I don’t know the ZMQ code well enough to be 100%
> > certain that the mutexes are not necessary — better safe than
> > sorry!
> > Yeah, that's a possibility, but it results in a lot of "clutter"
> > that decreases code readability and makes the code harder to
> > maintain in the long run...
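A minimal C++ sketch of the pattern Bill describes, using only the standard library. `Resource`, `init`, `process` and `init_then_process` are hypothetical names; `Resource` stands in for a non-thread-safe object such as a legacy ZMQ socket, and no ZMQ calls are made. The locks marked "not strictly needed" are the defensive clutter Francesco mentions: in plain C++ the `std::thread` constructor and `join()` already order the accesses.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a non-thread-safe object such as a legacy ZMQ
// socket. It is not a ZMQ type and makes no ZMQ calls.
struct Resource {
    std::string endpoint;  // written once during initialization
    int processed = 0;     // written by the worker thread
};

std::mutex resource_mutex;  // guards every access to the Resource

// Runs in the main thread before any worker exists.
void init(Resource& r, const std::string& endpoint) {
    std::lock_guard<std::mutex> lock(resource_mutex);  // not strictly needed
    r.endpoint = endpoint;
}

// Runs in the worker thread, which is the only user of r after hand-off.
void process(Resource& r, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(resource_mutex);  // not strictly needed
        if (!r.endpoint.empty()) ++r.processed;
    }
}

// Initialize first, then start the worker: the std::thread constructor
// orders init() before process(), and join() orders process() before the
// final read, so the locks above are defensive rather than required.
int init_then_process(Resource& r, const std::string& endpoint, int iterations) {
    init(r, endpoint);
    std::thread worker(process, std::ref(r), iterations);
    worker.join();
    return r.processed;
}
```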
> >
> > This situation is different, though, since T4 is not an application
> > thread — it’s an internal ZMQ worker thread. So, I think in this
> > case we kind of have to accept that ZMQ is doing the right thing
> > here.
> > At least, that’s the approach I’ve been taking. When I instrument
> > my apps and libraries with TSAN I specifically do NOT instrument
> > ZMQ, and I also use the “called_from_lib:libzmq.so” suppression
> > (which is listed as an example for TSAN:
> > https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
> > Understood. I have a question, though: if you use the
> > “called_from_lib:libzmq.so” suppression, are you still able to
> > spot the race condition when T2 and T4 are both application threads
> > (instead of one application thread and one internal ZMQ thread)?
> > I wonder whether TSAN, on detecting that one of the two threads
> > involved in the data race is inside ZMQ, suppresses the race warning
> > entirely, or only suppresses races involving two internal ZMQ
> > threads.
>
> Good question. I could only find one post that discusses this
> suppression: https://groups.google.com/forum/#!topic/thread-sanitizer/NEcgiPEG0N8
>
> called_from_lib suppresses only interceptors (like read or
> memset) called directly from the lib. It's intended for non-
> instrumented libraries.
>
> However, when I try this with my test code, enabling the suppression
> actually increases the number of false positives reported by
> TSAN. Disabling the suppression results in a smaller number of
> mostly different false positives. You may want to experiment with
> this — I plan to take another look at whether enabling this
> suppression is a good idea based on what I’ve seen in my tests.
>
> Unfortunately, it’s not possible to use the race:libzmq.so
> suppression to avoid all false positives in ZMQ, since that
> suppresses ALL warnings where libzmq.so appears ANYWHERE in the stack
> trace, and that is much too broad.
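Both kinds of rule can also be compiled into the instrumented program instead of being kept in a file passed via `TSAN_OPTIONS=suppressions=...`. The minimal C++ sketch below assumes a Clang or GCC ThreadSanitizer runtime that honors the weak `__tsan_default_suppressions()` hook. It enables only `called_from_lib:libzmq.so` and leaves the overly broad `race:libzmq.so` commented out. `has_suppression` is a hypothetical helper added only for checking the list.

```cpp
#include <cstring>

// Picked up by the ThreadSanitizer runtime at startup (assumption: a
// toolchain whose TSAN runtime provides this weak hook). Without TSAN it is
// just an ordinary function.
extern "C" const char* __tsan_default_suppressions() {
    return
        // Suppress interceptors (read, memset, ...) called directly from
        // the non-instrumented libzmq.so.
        "called_from_lib:libzmq.so\n"
        // Deliberately NOT enabled: it would hide every race report with
        // libzmq.so anywhere in the stack trace, which is much too broad.
        // "race:libzmq.so\n"
        ;
}

// Hypothetical helper for checking which rules are active.
bool has_suppression(const char* rule) {
    return std::strstr(__tsan_default_suppressions(), rule) != nullptr;
}
```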
>
> So, there’s no simple answer. I’ve developed some scripts that parse
> the output of TSAN and generate MD5 hashes of the stack traces, which
> can then be used to suppress individual stack traces. Going that
> route is a lot of work, but it’s the only way I know of at this time
> to provide more granular suppressions with TSAN. It would be nice if
> the suppression mechanism in TSAN were more robust (e.g., more like
> valgrind’s), but it isn’t.
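Bill's scripts are not shown in the thread, so the following is only a hypothetical C++ sketch of the general idea: reduce each TSAN report to its stack frames, hash them, and skip reports whose hash is on an accepted list. It assumes TSAN's default frame format (`#N function file:line (module+0xOFFSET)`), drops the frame index and the module offset so the hash survives rebuilds that only shift addresses, and uses 64-bit FNV-1a in place of MD5 to stay dependency-free. `fingerprint` and `is_suppressed` are illustrative names.

```cpp
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_set>

constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// 64-bit FNV-1a, used here as a simple stand-in for MD5.
std::uint64_t fnv1a(const std::string& s, std::uint64_t h = kFnvOffset) {
    for (unsigned char c : s) {
        h ^= c;
        h *= kFnvPrime;
    }
    return h;
}

// Hash only the frame lines of a report, ignoring thread ids, pids,
// memory addresses and module offsets.
std::uint64_t fingerprint(const std::string& report) {
    std::istringstream in(report);
    std::string line;
    std::uint64_t h = kFnvOffset;
    while (std::getline(in, line)) {
        const auto pos = line.find('#');
        // Keep only frame lines: '#' immediately followed by a digit.
        if (pos == std::string::npos || pos + 1 >= line.size() ||
            !std::isdigit(static_cast<unsigned char>(line[pos + 1])))
            continue;
        const auto space = line.find(' ', pos);  // end of the "#N" index
        if (space == std::string::npos) continue;
        std::string frame = line.substr(space + 1);
        const auto suffix = frame.rfind(" (");  // trailing "(module+0x...)"
        if (suffix != std::string::npos) frame.erase(suffix);
        h = fnv1a(frame + "\n", h);
    }
    return h;
}

// A report is suppressed if its fingerprint was previously accepted.
bool is_suppressed(const std::string& report,
                   const std::unordered_set<std::uint64_t>& accepted) {
    return accepted.count(fingerprint(report)) != 0;
}
```

In practice the accepted hashes would be loaded from a file, and a report would be printed only when `is_suppressed` returns false.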
>
> >
> >
> > Instrumenting libzmq and/or omitting the suppression causes a LOT
> > of warnings, esp. in the ZMQ worker threads. So, unless I'm
> > willing to commit the time and effort to go through and investigate
> > each of these warnings, I feel I have little choice but to accept
> > that at this point in its lifetime ZMQ should be race-free for all
> > practical purposes.
> >
> > FWIW, I’ve done fairly extensive testing, and specifically stress
> > testing, and have yet to find anything that looks like an honest-
> > to-goodness bug in ZMQ. (Which is not to say that the docs are
> > always clear about what to expect in certain situations ;-) I did
> > have one problem which appears to have been a bug in epoll, and
> > which was resolved by upgrading Linux, but that’s it.
> >
> > I agree, I was just surprised to see so many warnings...
> >
> >
> > BTW, there’s an excellent overview of how this all works at
> > http://zeromq.org/whitepapers:architecture. Although it’s
> > somewhat old, it appears to still be relatively accurate.
> >
> > Thanks for the link; I read it and it's quite interesting.
> > However, it does not mention how thread safety is achieved.
> > Just out of curiosity, I will take a look at the ypipe implementation.
>
> Yes, that would be interesting. Luca is saying that ZMQ is basically
> race-free, but I have never seen any documentation on how that is
> achieved or verified. From a quick look at the code, it appears that
> ZMQ uses a combination of plain old pthread mutexes along with
> knowledge of ZMQ’s internal threading architecture to know when
> mutexes are unnecessary.
The queues and pipes use atomic pointer swapping and a lock-free
single-writer/single-reader algorithm (somewhat similar to some of the
data structures provided by liburcu, although much simpler and focused
on a single use case).
You can see the implementation in ypipe.hpp and yqueue.hpp; both are
fairly small.
--
Kind regards,
Luca Boccassi