[zeromq-dev] Possible OpenPGM bug in git repo

dv dv at pseudoterminal.org
Sat Nov 6 23:06:12 CET 2010


Hi,

I was trying out publish-subscribe with epgm, and noticed an assert that 
popped up sometimes:

   Assertion failed: rc == 0 (connect_session.cpp:82)

It appears only intermittently, and when it does, it aborts the program.
The rest of the time everything works fine and I can communicate through
epgm nicely.
For example, I can run the program 1000 times with no errors, and then on
the 1001st run the assert fires. Later on it works once, then fails, then
works again, and so on.
So it is random, unfortunately. To reproduce the problem, run the test
code below several times until the assert happens.

I dug through the code, went inside the OpenPGM copy bundled with
zeromq, and found the function parse_interface() at if.c:276. The full
path (relative to the zeromq directory) is
foreign/openpgm/libpgm-5.0.91~dfsg/openpgm/pgm/if.c.
The parse_interface() function is what ultimately causes the assert - it 
looks at the given network interfaces and tries to find a 
multicast-capable one.
For some reason, this does not always work reliably.
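
To make the kind of check concrete, here is a minimal sketch of testing
whether a named interface is up and multicast-capable. This is NOT the
OpenPGM code. It assumes a POSIX system with getifaddrs(), and
flags_ok() and is_up_and_multicast() are hypothetical names made up for
illustration.

```cpp
#include <ifaddrs.h>
#include <net/if.h>
#include <sys/types.h>
#include <string>

// Hypothetical helper (not from OpenPGM): true if the flag word says the
// interface is both UP and capable of multicast.
static bool flags_ok(unsigned int flags)
{
    return (flags & IFF_UP) != 0 && (flags & IFF_MULTICAST) != 0;
}

// Hypothetical helper (not from OpenPGM): walk the system interface list
// and report whether any entry named `name` passes flags_ok().
static bool is_up_and_multicast(const std::string &name)
{
    struct ifaddrs *list = nullptr;
    if (getifaddrs(&list) != 0)
        return false;               // could not enumerate interfaces

    bool found = false;
    for (struct ifaddrs *ifa = list; ifa != nullptr; ifa = ifa->ifa_next) {
        // One interface can appear several times (once per address family).
        if (ifa->ifa_name != nullptr && name == ifa->ifa_name
            && flags_ok(ifa->ifa_flags)) {
            found = true;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}
```

Calling is_up_and_multicast("eth1") before connecting would at least
show whether the interface itself looks usable on a failing run.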

I fixed it by setting check_ifname to TRUE (it is defined and set on 
line 285 in if.c). If I do that, the assert never appears, the code 
always works.
However, I do not fully understand what is going on. Blindly setting a
flag does not sound like a bulletproof bugfix :) Does anybody have an
idea what is going on?


Here is the C++ test code I used. I use the eth1 interface, but the
assert also happened with lo (the loopback interface).
NOTE: do not forget to adjust the interface name to whatever you are 
using. You might not even have an "eth1".



#include <iostream>
#include <stdint.h>
#include <zmq.hpp>


int main()
{
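     // Endpoint format: epgm://<interface>;<multicast group>:<port>
     static char const *url = "epgm://eth1;224.0.0.253:55555";
     // ZMQ_RATE is given in kilobits per second
     int64_t const multicast_rate = 10000;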


     zmq::context_t ctx(1);


     std::cout << "Sender: creating socket" << std::endl;
     zmq::socket_t sender_socket(ctx, ZMQ_PUB);

     int64_t value = multicast_rate;
     std::cout << "Sender: setting multicast rate" << std::endl;
     sender_socket.setsockopt(ZMQ_RATE, &value, sizeof(value));
     std::cout << "Sender: connecting" << std::endl;
     sender_socket.connect(url);      ///// This is where the assert happens
     std::cout << "Sender: connected" << std::endl;


     return 0;
}
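
Since the interface name is embedded in the endpoint string, here is a
small sketch of how that string breaks down into its parts. It is only
an illustration, not how 0MQ or OpenPGM parse it internally. The
pgm_endpoint struct and split_pgm_endpoint() are hypothetical names, and
the sketch assumes the "epgm://<interface>;<group>:<port>" shape used
above (no handling of bracketed IPv6 groups).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical type for illustration; not part of the 0MQ API.
struct pgm_endpoint {
    std::string iface;   // e.g. "eth1"
    std::string group;   // e.g. "224.0.0.253"
    int         port;    // e.g. 55555
};

// Hypothetical helper: split "epgm://<interface>;<group>:<port>"
// (or "pgm://...") into its parts. Returns false on any other shape.
static bool split_pgm_endpoint(const std::string &url, pgm_endpoint &out)
{
    const std::string::size_type scheme = url.find("://");
    if (scheme == std::string::npos)
        return false;
    const std::string proto = url.substr(0, scheme);
    if (proto != "epgm" && proto != "pgm")
        return false;

    const std::string rest = url.substr(scheme + 3);
    const std::string::size_type semi = rest.find(';');
    const std::string::size_type colon = rest.rfind(':');
    if (semi == std::string::npos || colon == std::string::npos || colon < semi)
        return false;

    const std::string iface = rest.substr(0, semi);
    const std::string group = rest.substr(semi + 1, colon - semi - 1);
    const std::string port = rest.substr(colon + 1);
    if (iface.empty() || group.empty() || port.empty())
        return false;

    // The port must be a plain decimal number in the valid range.
    char *end = nullptr;
    const long value = std::strtol(port.c_str(), &end, 10);
    if (*end != '\0' || value <= 0 || value > 65535)
        return false;

    out.iface = iface;
    out.group = group;
    out.port = static_cast<int>(value);
    return true;
}
```

With the url from the test code, iface is the part to change when
adjusting the example to a different machine.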



And FYI, this is the comment that precedes parse_interface() in if.c
inside OpenPGM:


/* parse interface entity into an interface-request structure.
  *
  * e.g.  eth0
  *       1.2.3.4
  *       1.2
  *       abcd::
  *       [abcd::]
  *       <hostname>
  *       <nss network name>
  *
  * special addresses should be ignored:
  *
  * local physical link: 169.254.0.0/16, fe80::/64
  * broadcast: 255.255.255.255
  * multicast: 224.0.0.0/4 (224.0.0.0 to 239.255.255.255), ff00::/8
  *
  * We could use if_nametoindex() but we might as well check that the
  * interface is actually UP and capable of multicast traffic.
  *
  * returns TRUE on success, FALSE on error and sets error appropriately.
  */

static
bool
parse_interface (
     int                            family,  /* AF_UNSPEC | AF_INET | AF_INET6 */
     const char* restrict           ifname,  /* NULL terminated */
     struct interface_req* restrict ir,      /* location to write interface details to */
     pgm_error_t** restrict         error
     )
{ ... }


