[zeromq-dev] rc == 0 (./zmq/mutex.hpp:94)

Aamir M aamirjvm at gmail.com
Mon Jul 13 18:59:39 CEST 2009


I don't know if I'm on the right track, but I think I've collected
further evidence that the mutex is being locked before it's
initialized. I am printing the mutex memory addresses in the following
way:

        inline mutex_t ()
        {
            int rc = pthread_mutex_init (&mutex, NULL);
            printf("0MQ MUTEX INIT: %p\n", (void*)&mutex);
            if (rc)
                posix_assert (rc);
        }

        inline void lock ()
        {
            int rc = pthread_mutex_lock (&mutex);
            printf("0MQ LOCK: %p\n", (void*)&mutex);
            if (rc)
            {
                printf("0MQ BROKEN LOCK: %p\n", (void*)&mutex);
                posix_assert (rc);
            }
        }
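
A possible refinement of the instrumentation above, offered only as a
sketch: pthread functions report failure through their return value
rather than through errno, so printing rc together with strerror (rc) in
the failing branch would show which error the lock call actually
returned. The helper names checked_lock and relock_errorcheck_mutex
below are hypothetical and not part of 0MQ. The second function only
demonstrates the reporting on a case that fails predictably (relocking
an error-checking mutex from the same thread); it does not reproduce the
failure discussed in this thread.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of 0MQ): lock a mutex and, on failure,
// print the error number that pthread_mutex_lock returned.
// The return code is passed back so the caller can still assert on it.
inline int checked_lock (pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock (m);
    if (rc)
        fprintf (stderr, "0MQ BROKEN LOCK: %p rc=%d (%s)\n",
            (void*) m, rc, strerror (rc));
    return rc;
}

// Demonstration only: an error-checking mutex rejects a second lock by
// the thread that already owns it and returns EDEADLK.
inline int relock_errorcheck_mutex ()
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init (&attr);
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t m;
    pthread_mutex_init (&m, &attr);
    pthread_mutexattr_destroy (&attr);

    int first = checked_lock (&m);   // first lock succeeds
    assert (first == 0);
    int rc = checked_lock (&m);      // second lock fails with EDEADLK

    pthread_mutex_unlock (&m);
    pthread_mutex_destroy (&m);
    return rc;
}
```

With this kind of reporting, the log line for a failed lock would carry
the error number itself (for example EINVAL or EDEADLK) next to the
address.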

The output log is attached. It shows that the "0MQ MUTEX INIT" line is
always printed before a successful lock on the same address, as
expected. Whenever a lock fails ("0MQ BROKEN LOCK"), however, the
constructor's "0MQ MUTEX INIT" line was never printed for that memory
address, which probably means that the mutex object was never
initialized. Another interesting detail is that the failing lock occurs
twice on the same mutex object (presumably once for each of my two
receiver threads).
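
The check described above can be sketched as a small log scan. This is
an illustration under stated assumptions, not the procedure actually
used on the attached log: the function name locked_before_init is
hypothetical, each log line is assumed to be intact (no interleaving
between threads), and the line prefixes are the ones printed by the
instrumented constructor and lock () shown earlier. It uses C++11
syntax.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every address that appears in a
// "0MQ LOCK:" or "0MQ BROKEN LOCK:" line without an earlier
// "0MQ MUTEX INIT:" line for the same address.
inline std::set<std::string> locked_before_init (
    const std::vector<std::string> &lines)
{
    const std::string init_tag = "0MQ MUTEX INIT: ";
    const std::string lock_tag = "0MQ LOCK: ";
    const std::string broken_tag = "0MQ BROKEN LOCK: ";

    std::set<std::string> initialized;
    std::set<std::string> suspicious;

    for (const std::string &line : lines) {
        std::string addr;
        if (line.compare (0, init_tag.size (), init_tag) == 0) {
            // Constructor ran for this address.
            initialized.insert (line.substr (init_tag.size ()));
            continue;
        }
        if (line.compare (0, lock_tag.size (), lock_tag) == 0)
            addr = line.substr (lock_tag.size ());
        else if (line.compare (0, broken_tag.size (), broken_tag) == 0)
            addr = line.substr (broken_tag.size ());
        else
            continue;   // unrelated output

        // A lock attempt on an address with no recorded constructor.
        if (initialized.count (addr) == 0)
            suspicious.insert (addr);
    }
    return suspicious;
}
```

One limitation of this approach: the destructor is not instrumented, so
a mutex that was initialized, destroyed, and then locked again at the
same address would not be flagged.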

The puzzle for me is whether the bug is due to me incorrectly using
the 0MQ library, or whether something more sinister is happening
inside the library.

Thanks again.

Aamir

On Mon, Jul 13, 2009 at 12:14 PM, Martin Hurton<hurtonm at gmail.com> wrote:
> Hi Aamir,
>
> What OS are you running your application on? And what thread library
> are you using?
>
> Regards,
> Martin
>
>
> On Mon, Jul 13, 2009 at 5:48 PM, Aamir M<aamirjvm at gmail.com> wrote:
>> I think I've narrowed down the cause of the error to one specific
>> messaging mechanism in our application ...
>>
>> I have a process-scope exchange and two threads with local-scope
>> queues receiving messages from the exchange. We started seeing the
>> error when we changed the process-scope exchange from load-balancing
>> style to data-distribution style. The error seems to go away if we
>> turn off the receiver threads or switch it back to load-balancing.
>>
>> Thanks,
>> Aamir
>>
>> On Mon, Jul 13, 2009 at 11:32 AM, Martin Hurton<hurtonm at gmail.com> wrote:
>>> Hi Aamir,
>>>
>>> Please apply the attached patch to 0.6.1 tree and let us know what's
>>> printed when the assertion fails.
>>>
>>> Regards,
>>> Martin
>>>
>>> On Mon, Jul 13, 2009 at 4:36 PM, Aamir M<aamirjvm at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> We have a somewhat large/complex multi-threaded program that makes
>>>> heavy use of 0MQ for both process-scope and network-scope messaging.
>>>> Recently we implemented some changes and started seeing the following
>>>> error:
>>>>
>>>> Success
>>>> rc == 0 (./zmq/mutex.hpp:94)
>>>> Aborted
>>>>
>>>> 0MQ is asserting on ./zmq/mutex.hpp:94 and aborting the program.
>>>> Before the 0MQ assert occurs, some other function is causing the word
>>>> "Success" to be printed onto the screen.
>>>>
>>>> What could be causing this problem? It is proving very difficult to
>>>> debug this error because I have no idea which line triggers the
>>>> problem. Like any other bug related to a multi-threaded race
>>>> condition, the difficulty is compounded by the fact that the error
>>>> only occurs SOME of the time (i.e. it cannot be deterministically
>>>> reproduced).
>>>>
>>>> Does anyone have any ideas on how to isolate the offending code? When
>>>> does 0MQ use this pthread mutex and how could this assert happen while
>>>> sending / receiving messages?
>>>>
>>>> We have been careful to make sure that threads never share the same
>>>> zmq_api object ... each thread has its own instance of zmq_api, so I
>>>> don't think this could be the problem.
>>>>
>>>> Thanks,
>>>> Aamir
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev at lists.zeromq.org
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zmq_mutex_errror.log
Type: application/octet-stream
Size: 13653 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20090713/ba9dbe77/attachment.obj>

