[zeromq-dev] Debugging "No route to host" when sending over ROUTER

Greg Ward greg at gerg.ca
Tue Dec 17 18:13:31 CET 2013


Hi all --

I'm working on overhauling the 0MQ layer of an existing app. (It's
written in Python, using zeromq 3.2.3 and pyzmq 13.1.0.) The good
thing about the existing code is that the entire 0MQ layer is isolated
in one Python module. Unfortunately, that module appears to reinvent
0MQ: it does all 0MQ interactions on a background thread, with several
shared data structures to get stuff from the main thread to the 0MQ
interaction thread. So I'm trying to rewrite it in what I think is a
more straightforward, idiomatic way: when the application needs to
send a network message, it calls the 0MQ layer, which simply does a
0MQ send() (or whatever).

Unfortunately, I'm getting lots of "No route to host" (EHOSTUNREACH)
errors trying to send messages over a ROUTER socket. Specifically,
I've got a server process that sets up like this:

    logger.debug('binding ROUTER socket to %s', me.rr)
    self.sock_in = self.context.socket(zmq.ROUTER)
    # 1 = fail sends to unroutable peers with EHOSTUNREACH instead of
    # silently dropping the message
    self.sock_in.setsockopt(zmq.ROUTER_BEHAVIOR, 1)
    self.poller.register(self.sock_in, zmq.POLLIN)
    self.sock_in.bind(me.rr)

(Clients always connect with a DEALER socket.)

The 0MQ layer doesn't expose 0MQ IDs to its clients. Instead, every
node has a human-readable name which is included in every message.
When the server *receives* a message over the ROUTER, it saves the 0MQ
ID in a map:

    # ROUTER prepends the sender's 0MQ ID as the first frame
    msg = self.sock_in.recv_multipart()
    (zmq_id, source, cdata) = msg
    if source not in self.zmq_id_map:
        logger.debug('recv: setting zmq_id_map[%r] = %r', source, zmq_id)
        self.zmq_id_map[source] = zmq_id
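
A minimal sketch of a variant of this bookkeeping that also reports
when a known name shows up with a different 0MQ ID, something the
snippet above would silently ignore. The class and method names are
hypothetical and not part of the real module, and unlike the snippet
above it keeps the newest ID. Whether IDs ever change in this app is
an assumption to check, not a confirmed cause. No 0MQ calls are
involved:

```python
import logging

logger = logging.getLogger(__name__)


class IdMap(object):
    """Hypothetical helper: maps node names to the last 0MQ ID seen."""

    def __init__(self):
        self.zmq_id_map = {}

    def record(self, source, zmq_id):
        """Store zmq_id for source.

        Returns True if source was already known under a different ID.
        """
        old = self.zmq_id_map.get(source)
        changed = old is not None and old != zmq_id
        if changed:
            # Same node name, different 0MQ ID: worth a loud log line.
            logger.warning('recv: zmq_id for %r changed: %r -> %r',
                           source, old, zmq_id)
        elif old is None:
            logger.debug('recv: setting zmq_id_map[%r] = %r', source, zmq_id)
        # Assumption: keep the most recent ID rather than the first one.
        self.zmq_id_map[source] = zmq_id
        return changed
```

Calling record(source, zmq_id) for every received message would show
in the log whether a send failure is preceded by an ID change.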

Then, when it needs to *send* a message to a client it has already
seen, it looks up the saved ID:

    zmq_id = self.zmq_id_map[dest]
    logger.debug('send to %s: %r', dest, (zmq_id, self.name, data))
    try:
        self.sock_in.send_multipart((zmq_id, self.name, data))
    except zmq.ZMQError as err:
        logger.warning('error sending message to %s: %s (dest = %r, zmq_id_map = %r, zmq_id = %r)',
                       dest, err, dest, self.zmq_id_map, zmq_id)

If it ever tries to send a message to an unknown client, it'll blow up
with KeyError. That's fine, because that would be a bug. What I did
not expect to see was this:

  2013-12-17 11:33:56,085 DEBUG : send to worker2: ('\x00k\x8bEg', 'master1', [...])
  2013-12-17 11:33:56,086 WARNING : error sending message to worker2: No route to host (dest = 'worker2', zmq_name_map = {'worker1': '\x00k\x8bEk', 'master2': '\x00k\x8bEi', 'worker2': '\x00k\x8bEg', 'test-cli': '\x00k\x8bEj'}, zmq_name = '\x00k\x8bEg')
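
The IDs in that log differ only in their last byte. As a debugging
aid, here is a small sketch that turns such IDs into integers so they
are easier to compare and sort. It assumes each ID is one zero byte
followed by a 4-byte big-endian number, which matches the values
logged here but is not confirmed for 0MQ in general. id_to_int is a
hypothetical name:

```python
import struct


def id_to_int(zmq_id):
    """Decode a 5-byte ID (assumed: b'\\x00' + 4-byte big-endian int)."""
    if len(zmq_id) != 5 or zmq_id[:1] != b'\x00':
        raise ValueError('unexpected ID format: %r' % (zmq_id,))
    return struct.unpack('>I', zmq_id[1:])[0]


# The IDs copied from the log line above.
ids = {
    'worker1': b'\x00k\x8bEk',
    'master2': b'\x00k\x8bEi',
    'worker2': b'\x00k\x8bEg',
    'test-cli': b'\x00k\x8bEj',
}

# Print the names ordered by their decoded ID.
for name in sorted(ids, key=lambda n: id_to_int(ids[n])):
    print('%-8s 0x%08x' % (name, id_to_int(ids[name])))
```

Sorting by the decoded value makes gaps between neighbouring IDs easy
to spot.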

I've reproduced this with 0MQ 3.2.2, 3.2.3, and 3.2.4, Python 2.6 and
2.7, pyzmq 13.1.0, on a laptop running Ubuntu 13.10 and a server
running Scientific Linux 6.4.

It's *not* 100% reproducible; in some runs, it doesn't happen at all.
And it pretty much never happens on the first send_multipart() to a
particular client -- it always happens after a few messages have been
successfully exchanged. That's what puzzles me. If it happened every
time, on the first send to a given client, then I would be much less
confused.

I'm about 90% sure this must be a bug in my code, because the old 0MQ
layer that I'm trying to replace never exhibits this error. Yes, it
sets ROUTER_BEHAVIOR to 1 and yes, it logs exceptions from
send_multipart().

Any tips on how to debug this? Thanks --

Greg


