[zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

Luca Boccassi luca.boccassi at gmail.com
Sun Feb 25 20:32:54 CET 2018


On Sun, 2018-02-25 at 13:22 -0500, Bill Torpey wrote:
> Hi Franceso:
> 
> A few more points below …
> 
> Good luck, and please post back if you find out anything interesting!
> 
> Regards,
> 
> Bill
> 
> > On Feb 25, 2018, at 4:54 AM, Francesco <francesco.montorsi at gmail.co
> > m> wrote:
> > 
> > Hi Bill,
> > thanks for your answer.
> > 
> > 
> > 
> > 2018-02-24 21:49 GMT+01:00 Bill Torpey <wallstprog at gmail.com
> > <mailto:wallstprog at gmail.com>>:
> > ...
> > If T2 and T4 were application code, this would be a clear violation
> > of ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket
> > types).  
> > Right. 
> > 
> >  
> > For instance, one technique would be to perform initialization
> > (e.g., bind’s) in the main thread, and only after that is done spin
> > up other threads to do processing.  In this case, TSAN wouldn't
> > have any way to know that the application guarantees that accesses
> > cannot result in a race, so TSAN would flag it.  I’ve gotten in the
> > habit of using mutexes to protect code like this even though it
> > should not strictly be needed, just to keep tools like TSAN happy,
> > and also because I don’t know the ZMQ code well enough to be 100%
> > certain that the mutexes are not necessary — better safe than
> > sorry!
> > Yeah, that's a possibility but it results in a lot of "clutter"
> > that decrease code readability and makes it harder to maintain in
> > the long run...
> > 
> > This situation is different, though, since T4 is not an application
> > thread — it’s an internal ZMQ worker thread.   So, I think in this
> > case we kind of have to accept that ZMQ is doing the right thing
> > here.
> > At least, that’s the approach I’ve been taking.  When I instrument
> > my apps and libraries with TSAN I specifically do NOT instrument
> > ZMQ, and I also use the “called_from_lib:libzmq.so” suppression
> > (which is listed as an example for TSAN: https://github.com/google/
> > sanitizers/wiki/ThreadSanitizerSuppressions
> > <https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressi
> > ons>).
> > Understood. I have a question though: if you
> > use   “called_from_lib:libzmq.so”  suppression, are you able to
> > spot the race condition due to T2 and T4 being application threads
> > (instead of being 1 application and 1 zmq internal)? 
> > I wonder if TSAN, detecting that one of the 2 threads generating
> > the data race is inside ZMQ, entirely suppress the race warning or
> > instead will suppress only race conditions involving 2 internal zmq
> > threads..
> 
> Good question.  I could only find one post that discusses this
> suppression: https://groups.google.com/forum/#!topic/thread-sanitizer
> /NEcgiPEG0N8 <https://groups.google.com/forum/#!topic/thread-
> sanitizer/NEcgiPEG0N8>
> 
> 	called_from_lib suppresses only interceptors (like read or
> memset) called directly from the lib. It's intended for non-
> instrumented libraries. 
> 
> However, when I try this with my test code, enabling the suppression
> actually increases the number of false positives reported by
> TSAN.  Disabling the suppression results in a smaller number of
> mostly different false positives.  You may want to experiment with
> this — I plan to take another look at whether enabling this
> suppression is a good idea based on what I’ve seen in my tests.
> 
> Unfortunately, it’s not possible to use the race:libzmq.so
> suppression to avoid all false positives in ZMQ, since that
> suppresses ALL warnings where libzmq.so appears ANYWHERE in the stack
> trace, and that is much too broad.
> 
> So, there’s no simple answer.  I’ve developed some scripts that parse
> the output of TSAN and generate MD5 hashes of the stack traces, which
> can then be used to suppress individual stack traces.  Going that
> route is a lot of work, but it’s the only way I know of at this time
> to provide more granular suppressions with TSAN.  It would be nice if
> the suppression mechanism in TSAN were more robust (e.g., more like
> valgrind’s), but it isn’t.
> 
> > 
> > 
> > Instrumenting libzmq and/or omitting the suppression causes a LOT
> > of warnings, esp. in the ZMQ worker threads.  So, unless I'm
> > willing to commit the time and effort to go through and investigate
> > each of these warnings, I feel I have little choice but to accept
> > that at this point in its lifetime ZMQ should be race-free for all
> > practical purposes.
> > 
> > FWIW, I’ve done fairly extensive testing, and specifically stress
> > testing, and have yet to find anything that looks like an honest-
> > to-goodness bug in ZMQ.  (Which is not to say that the docs are
> > always clear about what to expect in certain situations ;-)  I did
> > have one problem which appears to have been a bug in epoll, and
> > which was resolved by upgrading Linux, but that’s it.
> > 
> > I agree, I was just surprised to see so many warnings... 
> > 
> >  
> > BTW, there’s an excellent overview of how this all works at http://
> > zeromq.org/whitepapers:architecture
> > <http://zeromq.org/whitepapers:architecture> — although it’s
> > somewhat old, it appears to still be relatively accurate.
> > 
> > thanks for the link, I read that and it's quite interesting.
> > However it does not mention how multi-thread safety is achieved. 
> > Just out of curiosity I will take a look at ypipe implementation.
> 
> Yes, that would be interesting. Luca is saying that ZMQ is basically
> race-free, but I have never seen any documentation on how that is
> achieved or verified. From a quick look at the code, it appears that
> ZMQ uses a combination of plain old pthread mutexes along with
> knowledge of ZMQ’s internal threading architecture to know when
> mutexes are unnecessary.  

The queues and pipes use atomic pointer-swapping and a lock-free
single-writer-single-reader algorithm (somewhat similar, although much
simpler and focused for a single use case, to some of the data
structures provided by liburcu)

You can see the implementation in ypipe.hpp and yqueue.hpp, they are
both fairly small.

-- 
Kind regards,
Luca Boccassi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20180225/059fc53e/attachment.sig>


More information about the zeromq-dev mailing list