[zeromq-dev] zmq "bug"

Martin Sustrik sustrik at fastmq.com
Mon Dec 22 17:36:40 CET 2008


Matus Hamorsky wrote:
> On Mon, Dec 22, 2008 at 3:23 PM, Martin Sustrik <sustrik at fastmq.com> wrote:
>> Matus Hamorsky wrote:
>>> Comments inlined
>>>
>>> On Mon, Dec 22, 2008 at 2:49 PM, Martin Sustrik <sustrik at fastmq.com>
>>> wrote:
>>>> Comments inlined:
>>>>
>>>>> zmq_server &
>>>>> ./q_test.py
>>>>> ./q_send.py Q_TEST hello
>>>>>  -- q_test prints 'hello'
>>>>> killall zmq_server
>>>> At this point q_test is still running; however, there is no directory
>>>> service (zmq_server) running that can identify the actual location of
>>>> q_test on the network.
>>> True, but the problem is that I still need to reach that now-unreachable
>>> process. The q_test process is still running correctly, and I need it
>>> either to keep working after zmq_server is restarted or to terminate
>>> gracefully (closing all files, transactions, connections etc.).
>>>
>>> Maybe zmq_server could store the queue:port bindings persistently (in
>>> shared memory or similar), so that the information would survive
>>> zmq_server restarts.
>> The right solution IMO would be for each client to monitor its connection
>> to zmq_server. Once the connection fails, the client should try to
>> reconnect and re-register all its global objects (queues & exchanges).
>>
>> A first draft of this functionality is already implemented in the 0.3.3
>> branch. We would like to move it to trunk so that it appears in the 0.5
>> release.
>>
>> Martin
> 
> The problem with this is that you may have a resource-heavy process
> (holding pre-allocated memory or database connections).
> After zmq_server fails, the process is still listening on the
> queue (port) and everything works perfectly.
> But the proposed solution would cause the process to go down if it
> cannot re-acquire the queue name or exchange name.

No, that wasn't what I meant. The idea is that applications will 
continue running as if nothing happened. At the same time they'll try to 
reconnect to zmq_server and re-register their global objects (so that 
their addresses are available for new applications).
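A minimal sketch of this behaviour, assuming hypothetical `connect`, `register` and `ping` callables that stand in for whatever the client library really uses to talk to zmq_server (the actual 0.3.3 branch API is not shown here). The application keeps serving its queues untouched; a background thread periodically checks the directory connection and, once zmq_server is back, re-registers every global object it remembers.

```python
import threading


class RegistrationKeeper:
    """Keeps an application's global objects (queues, exchanges) registered
    with a directory service such as zmq_server.

    Hypothetical callables supplied by the caller (not a real 0MQ API):
      connect()                          -> connection, or raises OSError
      register(conn, kind, name, addr)   -> None, raises OSError if conn is broken
      ping(conn)                         -> None, raises OSError if conn is broken
    """

    def __init__(self, connect, register, ping, retry_interval=1.0):
        self._connect = connect
        self._register = register
        self._ping = ping
        self._retry_interval = retry_interval
        self._objects = []              # (kind, name, address) to re-register
        self._conn = None               # current directory connection, if any
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def add(self, kind, name, address):
        """Remember a global object and register it now if connected."""
        with self._lock:
            self._objects.append((kind, name, address))
            if self._conn is not None:
                try:
                    self._register(self._conn, kind, name, address)
                except OSError:
                    self._conn = None   # the next step() reconnects and retries

    def step(self):
        """One monitoring pass. Returns True if registered with the directory."""
        with self._lock:
            if self._conn is not None:
                try:
                    self._ping(self._conn)
                    return True
                except OSError:
                    self._conn = None   # directory service went away
            try:
                conn = self._connect()
                for kind, name, address in self._objects:
                    self._register(conn, kind, name, address)
            except OSError:
                return False            # still down; the application keeps running
            self._conn = conn
            return True

    def start(self):
        """Run step() periodically in a background daemon thread."""
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def _run(self):
        while not self._stop.is_set():
            self.step()
            self._stop.wait(self._retry_interval)

    def stop(self):
        self._stop.set()
```

A failed reconnect only makes `step()` return False and is retried later, so the process never goes down just because zmq_server is unavailable; the keeper never touches the application's existing data connections.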

> In my case, the restart logic would have to be moved from one central
> location (the process supervisor) to each and every module that uses
> queues or exchanges.

Yes.

> Also, for each language binding this would have to be signalled via a
> callback or a special value returned from receive.
> 
> I am not sure how stable it would be to call a Python function from an
> OS callback handler (signal handler or IOCP thread) while the Python
> runtime is busy doing something else.

To be investigated. If you have any ideas on how to implement it, they are
welcome. However, in many use cases the callback can be avoided
altogether by using the auto-reconnect functionality (as already done in
the 0.3.3 branch). That way the application won't even be notified about
the connection failure; the connection will be re-established
automatically. Optionally, you can ask for a special message with "there
are messages missing at this point" semantics to be inserted into the
queue.
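Auto-reconnect with an optional gap marker could look roughly like the following sketch. Everything in it is hypothetical: `open_connection`, the connection's `read()` method and the `GAP` sentinel are stand-ins, not the 0.3.3 branch's API. A background thread owns the network connection and only puts messages into a thread-safe `queue.Queue`, so no Python function is ever called from an OS callback; the application sees a failure, if at all, only as a `GAP` value returned from `recv()`.

```python
import queue
import threading

# Hypothetical sentinel meaning "there may be messages missing at this point".
GAP = object()


class ReconnectingReceiver:
    def __init__(self, open_connection, mark_gaps=True, retry_interval=0.5):
        self._open = open_connection        # hypothetical: returns an object with read()
        self._mark_gaps = mark_gaps
        self._retry_interval = retry_interval
        self._inbox = queue.Queue()         # thread-safe hand-off to the application
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            try:
                conn = self._open()
            except OSError:
                # Server still down: wait a little and try again.
                self._stop.wait(self._retry_interval)
                continue
            try:
                # Note: a read() that blocks forever also blocks close();
                # a real implementation would use timeouts or non-blocking I/O.
                while not self._stop.is_set():
                    self._inbox.put(conn.read())
            except OSError:
                # Connection dropped: optionally mark the gap, then reconnect.
                if self._mark_gaps:
                    self._inbox.put(GAP)

    def recv(self, timeout=None):
        """Return the next message (or GAP); raises queue.Empty on timeout."""
        return self._inbox.get(timeout=timeout)

    def close(self):
        """Ask the background thread to stop after its current operation."""
        self._stop.set()
```

With `mark_gaps=False` the reader never notices the failure at all; with `mark_gaps=True` it receives `GAP` exactly at the point in the stream where messages may have been lost.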

Martin
