[zeromq-dev] zmq "bug"
Martin Sustrik
sustrik at fastmq.com
Mon Dec 22 17:36:40 CET 2008
Matus Hamorsky wrote:
> On Mon, Dec 22, 2008 at 3:23 PM, Martin Sustrik <sustrik at fastmq.com> wrote:
>> Matus Hamorsky wrote:
>>> Comments inlined
>>>
>>> On Mon, Dec 22, 2008 at 2:49 PM, Martin Sustrik <sustrik at fastmq.com> wrote:
>>>> Comments inlined:
>>>>
>>>>> zmq_server &
>>>>> ./q_test.py
>>>>> ./q_send.py Q_TEST hello
>>>>> -- q_test prints 'hello'
>>>>> killall zmq_server
>>>> At this point q_test still runs; however, there is no directory service
>>>> (zmq_server) running that can identify the actual location of q_test on
>>>> the network.
>>> True, but the problem is that I need to reach that non-responding
>>> process. The q_test process is still running correctly and I need
>>> either to work after restarting the zmq_server or to terminate
>>> gracefully (to close all files, transactions, connections etc.).
>>>
>>> Maybe zmq_server could store the queue:port bindings in a persistent
>>> manner, such as shared memory or the like, so the information would
>>> persist between zmq_server restarts.
>> The right solution IMO would be for each individual client to monitor its
>> connection to the zmq_server. Once the connection fails, the client should
>> try to reconnect and re-register all its global objects (queues & exchanges).
>>
>> A first draft of this functionality is already implemented in the 0.3.3
>> branch. We would like to move it to trunk so that it appears in the 0.5
>> release.
>>
>> Martin
>
> The problem with this is that you may have a resource-heavy process
> (holding pre-allocated memory or database connections).
> After the zmq_server fails the process is still listening on the
> queue (port) and everything is working perfectly.
> But the proposed solution would cause the process to go down if it
> cannot re-acquire the queue name or exchange name.
No, that wasn't what I meant. The idea is that applications will
continue running as if nothing happened. At the same time they'll try to
reconnect to zmq_server and re-register their global objects (so that
their addresses are available for new applications).
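
To illustrate the idea, here is a minimal sketch in Python. It is not the
0.3.3 code and does not use the real 0MQ API: FakeDirectory stands in for
zmq_server, and Registrar, ping(), poll() and the generation counter are
hypothetical names made up for this example. The point is only that the
application keeps its own copy of what it registered and replays it once
the directory service is reachable again, without ever stopping.

```python
class FakeDirectory:
    """Hypothetical stand-in for zmq_server: maps global names to locations."""

    def __init__(self):
        self.up = True
        self.generation = 1   # changes on every restart
        self.names = {}

    def ping(self):
        if not self.up:
            raise ConnectionError("directory service unreachable")
        return self.generation

    def register(self, name, location):
        if not self.up:
            raise ConnectionError("directory service unreachable")
        self.names[name] = location

    def kill(self):
        self.up = False

    def restart(self):
        # A restarted service has forgotten every registration.
        self.names = {}
        self.generation += 1
        self.up = True


class Registrar:
    """Remembers the application's global objects and replays them."""

    def __init__(self, directory):
        self.directory = directory
        self.registered = {}          # local copy: name -> location
        self.seen_generation = None   # None means "needs (re)registration"

    def register(self, name, location):
        self.registered[name] = location
        self.seen_generation = None   # force a replay on the next poll
        self.poll()

    def poll(self):
        """Call periodically. Returns False while the directory is down;
        the application carries on either way."""
        try:
            generation = self.directory.ping()
            if generation != self.seen_generation:
                for name, location in self.registered.items():
                    self.directory.register(name, location)
                self.seen_generation = generation
        except ConnectionError:
            return False
        return True


if __name__ == "__main__":
    directory = FakeDirectory()
    registrar = Registrar(directory)
    registrar.register("Q_TEST", "host-a:5555")
    directory.kill()
    registrar.poll()       # fails quietly, q_test would keep running
    directory.restart()
    registrar.poll()       # Q_TEST is registered again
    print(directory.names)
```

A failed poll() is just a return value here, so a resource-heavy process
would keep its memory and database connections and simply try again later.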
> In my case, the restart logic would have to be moved from one central
> location (the process supervisor) to each and every module that uses
> queues or exchanges.
Yes.
> Also, for each language binding this would have to perform a callback
> or return a special value from receive.
>
> I am not sure how stable it would be to call a Python function from an
> OS callback handler (signal or IOCP thread) while the Python runtime
> is doing some other thing.
To be investigated. If you have any ideas on how to implement it, they are
welcome. However, in many use cases the callback can be avoided
altogether by using auto-reconnect functionality (as already done in the
0.3.3 branch). That way the application won't even be notified about the
connection failure. The connection will be re-established automatically.
Optionally, you can ask for a special message with "there are messages
missing at this point" semantics to be inserted into the queue.
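
As a rough sketch of that last option (again plain Python, not the actual
0MQ API; InboundQueue, GAP, on_message(), on_reconnected() and the
notify_gaps flag are all hypothetical names for this example), the marker
can be an ordinary item placed in the inbound queue by whatever code
performs the automatic reconnect:

```python
from collections import deque

# Hypothetical sentinel meaning "messages may be missing at this point".
GAP = object()


class InboundQueue:
    """Inbound queue that can record a reconnect as an in-band marker."""

    def __init__(self, notify_gaps=False):
        self.notify_gaps = notify_gaps
        self.items = deque()

    def on_message(self, message):
        # Called by the (hypothetical) transport for each received message.
        self.items.append(message)

    def on_reconnected(self):
        # Called by the transport after it has silently re-established
        # the connection. Without notify_gaps nothing is visible at all.
        if self.notify_gaps:
            self.items.append(GAP)

    def receive(self):
        # Returns the next item, or None if the queue is empty.
        return self.items.popleft() if self.items else None


if __name__ == "__main__":
    queue = InboundQueue(notify_gaps=True)
    queue.on_message("hello")
    queue.on_reconnected()
    queue.on_message("world")
    while True:
        item = queue.receive()
        if item is None:
            break
        print("gap detected" if item is GAP else item)
```

Because the marker arrives through the normal receive path, the
application would see it in its own loop and no function has to be called
from a signal handler or an I/O thread.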
Martin