[zeromq-dev] Assertion failure with socket monitor when clients disconnect
Auer, Jens
jens.auer at cgi.com
Tue Oct 11 10:36:39 CEST 2016
Hi Doron,
> I'm not a big fan of monitoring and prefer other solutions. I am actually thinking of deprecating it: the implementation is buggy and actually violates the "don't share a socket between threads" rule, which is probably what is causing your issue.
That’s quite a shock given that it is part of the stable API. I think in this case it should not only be deprecated but removed from the API completely. On the other hand, something to monitor the connections is probably needed. Our clients are very concerned with TCP connections and would always insist on logging capabilities for TCP connection state changes. It would also be nice to get some statistics from the connections, e.g. the number of received bytes or messages.
> Anyway, you have two options. I think you can manage without the monitoring, and I can try to help you find other solutions. Another option is to try not listening to all events; maybe this will avoid the sharing violation.
I would like to replace the monitor with something else, but I am not sure how to do this given our requirements. Currently, we use the monitor to:
- Log TCP connection events for connect, disconnect, accept, listen and reconnection retry events
- Limit the number of reconnection attempts
Unfortunately this includes disconnect events which are exactly the events causing the crashes right now. The other events are probably rare enough that there is no problem.
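To make the two requirements concrete, here is a minimal sketch of the kind of handler they imply. It is not our actual code: the class name `ReconnectLimiter`, its `on_event` method and the `max_retries` parameter are hypothetical, and the sketch assumes the event number, event value and endpoint have already been read from the monitor. Only the event constants are taken from zmq.h.

```python
import logging
from collections import defaultdict

# Event numbers as defined in zmq.h.
ZMQ_EVENT_CONNECTED = 0x0001
ZMQ_EVENT_CONNECT_RETRIED = 0x0004
ZMQ_EVENT_DISCONNECTED = 0x0200

log = logging.getLogger("connection-monitor")


class ReconnectLimiter:
    """Hypothetical helper: logs every event and counts retries per endpoint."""

    def __init__(self, max_retries):
        self.max_retries = max_retries
        self.retries = defaultdict(int)

    def on_event(self, event, value, endpoint):
        """Return True while the endpoint may keep reconnecting."""
        log.info("event=0x%04x value=%d endpoint=%s", event, value, endpoint)
        if event == ZMQ_EVENT_CONNECTED:
            self.retries[endpoint] = 0  # a successful connect resets the count
        elif event == ZMQ_EVENT_CONNECT_RETRIED:
            self.retries[endpoint] += 1
        return self.retries[endpoint] <= self.max_retries
```

When `on_event` returns False, the application would give up on that endpoint, for example by disconnecting it. How that is done depends on the socket type and is left out here.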
Is there another way to implement this without using the socket monitor? We use Router, Dealer, Sub, XPub and Stream sockets. I have full control over the protocol between the Router/Dealer, but I cannot change the protocol between Pub/Sub and external clients over Stream sockets, so I cannot add control messages here.
I think an easy fix for my issues would be to add a mutex to protect the monitor socket in socket_base_t. I guess this was not done because it would block the thread and probably impact performance, but at least it will work correctly and not crash. It should be good enough for our use-case.
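The locking idea can be modelled outside libzmq as follows. This is only an illustration of the pattern, not a patch for socket_base_t: `LockedMonitorSink` and `run_io_threads` are made-up names, and a plain list stands in for the monitor socket, which must never be used by two threads at once.

```python
import threading


class LockedMonitorSink:
    """Stand-in for the shared monitor socket; this is not libzmq code."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def send_event(self, event, value, endpoint):
        # Only one I/O thread at a time may touch the shared sink.
        # A thread that finds the lock taken blocks here until it is free.
        with self._lock:
            self.events.append((event, value, endpoint))


def run_io_threads(sink, thread_count, events_per_thread):
    """Simulate several I/O threads reporting disconnects concurrently."""

    def worker(index):
        for n in range(events_per_thread):
            sink.send_event(0x0200, n, f"tcp://client-{index}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The cost is the one mentioned above: an I/O thread that wants to report an event waits while another thread holds the lock.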
An idea for a non-blocking solution would be to have an independent monitor class, similar to the existing listener and accepter classes, which owns an inproc SUB socket and the PAIR socket. Each ZeroMQ socket would then create a monitor when it is created, and each session object would have a PUB socket to broadcast events to the monitor. The monitor then forwards the events it receives from individual clients/sessions on different IO threads to the PAIR socket, where the application code can connect.
I have given this some more thought and I think there is an easy solution, but it will break the socket monitor API. Instead of using a ZMQ_PAIR socket where applications can connect, it could use a ZMQ_PUB socket over inproc to publish events. The application would create a ZMQ_SUB socket and pass its address to zmq_socket_monitor, just as it is done currently with the PAIR socket. Each engine then creates its own PUB socket and connects to the provided SUB socket (can this also happen after engines have been created?). This will create one socket per engine, so no socket is accessed from multiple threads anymore.
This is an API change for zmq_socket_monitor, but it could also replace the event filtering by using the event type as the subscription. I would also change the event message itself. Right now, the first frame is a 48-bit value consisting of a 16-bit event type and a 32-bit event value, followed by a second frame with the socket address as a string. I don’t like the packing of two values and would propose sending a three-frame message where each part is in its own frame. The event type would be the first frame and thus could be used to filter event types. I think this fits better into the ZeroMQ API as a whole. The signature of zmq_socket_monitor would change because the events parameter is removed, and old code would not compile anymore. I think this is good because it shows that something changed and is not compatible anymore. If the signature were identical, it would cause runtime errors because the socket type changed.
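The two layouts can be compared with a short sketch. The function names are hypothetical, the fields are assumed to be in host byte order, and the proposed layout is just my reading of the idea above, not an agreed format. The event number is the value from zmq.h.

```python
import struct

ZMQ_EVENT_DISCONNECTED = 0x0200  # value from zmq.h


def pack_current(event, value, address):
    """Current layout: [16-bit event + 32-bit value][address]."""
    # "=" selects native byte order without padding, giving 6 bytes.
    return [struct.pack("=HI", event, value), address.encode()]


def unpack_current(frames):
    """Split the packed first frame back into event and value."""
    event, value = struct.unpack("=HI", frames[0])
    return event, value, frames[1].decode()


def pack_proposed(event, value, address):
    """Proposed layout: [event][value][address], one field per frame."""
    return [struct.pack("=H", event), struct.pack("=I", value), address.encode()]


def subscription_matches(subscription, frames):
    """A SUB socket matches a subscription as a prefix of the first frame."""
    return frames[0].startswith(subscription)
```

With the event type alone in the first frame, subscribing to the two bytes of `ZMQ_EVENT_DISCONNECTED` delivers only disconnect events. An application interested in several event types would add one subscription per type instead of passing a bitmask.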
Best wishes,
Jens