[zeromq-dev] rc == 0 (./zmq/mutex.hpp:94)

Martin Hurton hurtonm at gmail.com
Mon Jul 13 19:19:49 CEST 2009


On Mon, Jul 13, 2009 at 6:15 PM, Aamir M<aamirjvm at gmail.com> wrote:
> Hello,
>
> I made some further modifications to mutex.hpp ... I changed the constructor to:
>
>        inline mutex_t ()
>        {
>            int rc = pthread_mutex_init (&mutex, NULL);
>            printf("0MQ MUTEX INIT: %p\n", (void*)&mutex);
>            if (rc)
>                posix_assert (rc);
>        }
>
> And the lock function to:
>
>        inline void lock ()
>        {
>            int rc = pthread_mutex_lock (&mutex);
>            if (rc)
>            {
>                printf("0MQ LOCK: %p\n", (void*)&mutex);
>                posix_assert (rc);
>            }
>        }
>
>
> I've attached the output. Notice that the mutex with memory address
> 0x100c9bc0 (which caused the lock to fail) seems to have never been
> initialized (since the address doesn't appear anywhere else in the
> output) ... Could it be that somehow the lock is being called before
> pthread_mutex_init has completed?
>
> One thing I don't understand ... I thought 0MQ uses lock-free queues
> ... so why is 0MQ calling mutex_lock so frequently? Is the mutex a
> member of each 0MQ message?

I am afraid ZeroMQ does not currently support lock-free synchronisation
on the PowerPC architecture. On that platform the current implementation
is based on mutexes.
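
For tracking down a failing lock like the one quoted above, it may help to
print the pthread return code itself. pthread functions return the error
number directly and do not set errno, so a message derived from errno can
be misleading. Below is a minimal sketch of a mutex wrapper along those
lines. The names debug_mutex_t and describe_pthread_error are hypothetical
and are not part of 0MQ. The use of an error-checking mutex type is also an
assumption made for debugging purposes, not something the 0.6.1 tree is
known to do.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

//  Hypothetical helper (not part of 0MQ): format a pthread return code.
//  pthread_* functions return the error number; they do not set errno.
inline std::string describe_pthread_error (const char *what,
    const void *addr, int rc)
{
    char buf [256];
    snprintf (buf, sizeof buf, "%s (%p): %s", what, addr, strerror (rc));
    return std::string (buf);
}

//  Hypothetical debugging mutex, loosely modelled on the quoted mutex_t.
class debug_mutex_t
{
public:

    debug_mutex_t ()
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init (&attr);
        //  Error-checking type: relocking from the owning thread returns
        //  EDEADLK instead of deadlocking.
        pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
        report ("pthread_mutex_init", pthread_mutex_init (&mutex, &attr));
        pthread_mutexattr_destroy (&attr);
    }

    ~debug_mutex_t ()
    {
        report ("pthread_mutex_destroy", pthread_mutex_destroy (&mutex));
    }

    //  Both functions return the raw pthread code so that the caller
    //  can decide whether to assert, log or recover.
    int lock ()
    {
        return report ("pthread_mutex_lock", pthread_mutex_lock (&mutex));
    }

    int unlock ()
    {
        return report ("pthread_mutex_unlock",
            pthread_mutex_unlock (&mutex));
    }

private:

    //  Print the failing call, the mutex address and the decoded code.
    int report (const char *what, int rc)
    {
        if (rc != 0)
            fprintf (stderr, "%s\n",
                describe_pthread_error (what, &mutex, rc).c_str ());
        return rc;
    }

    pthread_mutex_t mutex;

    //  Copying a pthread mutex is not meaningful, so disable it.
    debug_mutex_t (const debug_mutex_t&);
    void operator = (const debug_mutex_t&);
};
```

With a wrapper like this, a failed lock prints the name of the failing
call, the address of the mutex and the text for the returned code, which
can then be matched against the addresses logged at initialisation time.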

- Martin

>
> Thanks,
> Aamir
>
>
> On Mon, Jul 13, 2009 at 11:32 AM, Martin Hurton<hurtonm at gmail.com> wrote:
>> Hi Aamir,
>>
>> Please apply the attached patch to 0.6.1 tree and let us know what's
>> printed when the assertion fails.
>>
>> Regards,
>> Martin
>>
>> On Mon, Jul 13, 2009 at 4:36 PM, Aamir M<aamirjvm at gmail.com> wrote:
>>> Hello,
>>>
>>> We have a somewhat large/complex multi-threaded program that makes
>>> heavy use of 0MQ for both process-scope and network-scope messaging.
>>> Recently we implemented some changes and started seeing the following
>>> error:
>>>
>>> Success
>>> rc == 0 (./zmq/mutex.hpp:94)
>>> Aborted
>>>
>>> 0MQ is asserting on ./zmq/mutex.hpp:94 and aborting the program.
>>> Before the 0MQ assert occurs, some other function is causing the word
>>> "Success" to be printed onto the screen.
>>>
>>> What could be causing this problem? It is proving very difficult to
>>> debug this error because I have no idea which line triggers the
>>> problem. Like any other bug related to a multi-threaded race
>>> condition, the difficulty is compounded by the fact that the error
>>> only occurs SOME of the time (i.e. it cannot be deterministically
>>> reproduced).
>>>
>>> Does anyone have any ideas on how to isolate the offending code? When
>>> does 0MQ use this pthread mutex and how could this assert happen while
>>> sending / receiving messages?
>>>
>>> We have been careful to make sure that threads never share the same
>>> zmq_api object ... each thread has its own instance of zmq_api, so I
>>> don't think this could be the problem.
>>>
>>> Thanks,
>>> Aamir
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>
>


