[zeromq-dev] Possible OpenPGM bug in git repo
dv
dv at pseudoterminal.org
Sat Nov 6 23:15:11 CET 2010
I forgot to mention some extra info:
Using Ubuntu 10.10 x86-64, ZeroMQ from git (last commit
c0217027ccd2267b05e017af436a842755d044b0 on Sat Nov 6),
4 GB RAM, Intel DG965RY mainboard, NIC is an Intel 82566DC Gigabit
Ethernet Controller (Linux driver is e1000e).
> Hi,
>
> I was trying out publish-subscribe with epgm, and noticed an assert that
> popped up sometimes:
>
> Assertion failed: rc == 0 (connect_session.cpp:82)
>
> It aborts the program, but only intermittently. Most runs work fine and
> I can communicate through epgm nicely.
> Sometimes I run the program 1000 times with no errors, and then the
> 1001st run hits the assert. Later on, it works once, then fails,
> then works again, and so on.
> So it is random, unfortunately. To reproduce the problem, run the test
> code below several times until the assert happens.
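For anyone who would rather not rerun the test by hand, here is a minimal sketch of a runner that starts a program repeatedly and counts how many runs died from SIGABRT. It assumes a POSIX system and assumes the failed assertion ends the process through abort(); the function name count_aborts and the binary name in the usage note are made up for illustration.

```cpp
#include <csignal>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `command` (program path followed by its arguments) `runs` times and
// returns how many of those runs were terminated by SIGABRT.
// Returns -1 if the command is empty or if fork()/waitpid() fails.
int count_aborts(const std::vector<std::string>& command, int runs)
{
    if (command.empty())
        return -1;

    // execv() expects a NULL-terminated array of char*.
    std::vector<char*> argv;
    for (const std::string& arg : command)
        argv.push_back(const_cast<char*>(arg.c_str()));
    argv.push_back(nullptr);

    int aborts = 0;
    for (int i = 0; i < runs; ++i) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv.data());
            _exit(127);  // exec failed; a normal exit is not counted as an abort
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
            ++aborts;
    }
    return aborts;
}
```

If the test program were built as ./epgm_test (a hypothetical name), count_aborts({"./epgm_test"}, 1000) would report how many of 1000 runs hit the assert.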
>
> I dug through the code, went inside the OpenPGM copy bundled with
> zeromq, and found the function parse_interface()
> in if.c:276 . Full path (relative to the zeromq directory) is
> foreign/openpgm/libpgm-5.0.91~dfsg/openpgm/pgm/if.c .
> The parse_interface() function is what ultimately causes the assert - it
> looks at the given network interfaces and tries to find a
> multicast-capable one.
> For some reason, this does not always work reliably.
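To check an interface independently of OpenPGM, something like the following can help. This is not the code from if.c; it is a minimal sketch that assumes a Linux/BSD system with getifaddrs() and only tests whether a named interface is up and has the multicast flag set. The function name is_up_and_multicast is made up for illustration.

```cpp
#include <ifaddrs.h>
#include <net/if.h>
#include <string>
#include <sys/types.h>

// Returns true if an interface called `name` exists, is up, and has the
// multicast flag set. Returns false otherwise, including when the
// interface list cannot be read.
bool is_up_and_multicast(const std::string& name)
{
    struct ifaddrs* list = nullptr;
    if (getifaddrs(&list) != 0)
        return false;

    bool found = false;
    for (struct ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
        if (it->ifa_name == nullptr || name != it->ifa_name)
            continue;
        const unsigned int wanted = IFF_UP | IFF_MULTICAST;
        if ((it->ifa_flags & wanted) == wanted) {
            found = true;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}
```

Calling is_up_and_multicast("eth1") before connecting would at least show whether the flags reported by the kernel are stable between runs.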
>
> I fixed it by setting check_ifname to TRUE (it is defined and set on
> line 285 in if.c). If I do that, the assert never appears, the code
> always works.
> However, I do not fully understand what is going on. Blindly setting a
> flag does not sound like a bulletproof bugfix :) Does anybody have an
> idea what is going on?
>
>
> Here is the C++ test code I used. I use the eth1 interface, but the
> asserts also happened with lo (localhost).
> NOTE: do not forget to adjust the interface name to whatever you are
> using. You might not even have an "eth1".
>
>
>
> #include <iostream>
> #include <stdint.h>
> #include <zmq.hpp>
>
>
> int main()
> {
>     static char const *url = "epgm://eth1;224.0.0.253:55555";
>     int64_t const multicast_rate = 10000;
>
>
>     zmq::context_t ctx(1);
>
>
>     std::cout << "Sender: creating socket" << std::endl;
>     zmq::socket_t sender_socket(ctx, ZMQ_PUB);
>
>     int64_t value = multicast_rate;
>     std::cout << "Sender: setting multicast rate" << std::endl;
>     sender_socket.setsockopt(ZMQ_RATE, &value, sizeof(value));
>     std::cout << "Sender: connecting" << std::endl;
>     sender_socket.connect(url); // This is where the assert happens
>     std::cout << "Sender: connected" << std::endl;
>
>
>     return 0;
> }
>
>
>
> And FYI, this is the comment that precedes parse_interface() in if.c
> inside OpenPGM:
>
>
> /* parse interface entity into an interface-request structure.
>  *
>  * e.g.  eth0
>  *       1.2.3.4
>  *       1.2
>  *       abcd::
>  *       [abcd::]
>  *       <hostname>
>  *       <nss network name>
>  *
>  * special addresses should be ignored:
>  *
>  * local physical link: 169.254.0.0/16, fe80::/64
>  * broadcast: 255.255.255.255
>  * multicast: 224.0.0.0/4 (224.0.0.0 to 239.255.255.255), ff00::/8
>  *
>  * We could use if_nametoindex() but we might as well check that the interface is
>  * actually UP and capable of multicast traffic.
>  *
>  * returns TRUE on success, FALSE on error and sets error appropriately.
>  */
>
> static
> bool
> parse_interface (
>     int family,                          /* AF_UNSPEC | AF_INET | AF_INET6 */
>     const char* restrict ifname,         /* NULL terminated */
>     struct interface_req* restrict ir,   /* location to write interface details to */
>     pgm_error_t** restrict error
>     )
> { ... }
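The "special addresses" rule in the quoted comment can be illustrated for the IPv4 case. This is a minimal sketch written from the ranges listed in the comment, not taken from if.c, and it does not cover the IPv6 ranges (fe80::/64, ff00::/8). The function name is_special_ipv4 is made up for illustration.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <netinet/in.h>

// Returns true if `text` is a dotted-quad IPv4 address in one of the ranges
// the comment lists as special: link-local, broadcast, or multicast.
// Returns false for any other address and for text that is not IPv4.
bool is_special_ipv4(const char* text)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1)
        return false;

    const std::uint32_t host = ntohl(addr.s_addr);
    const bool link_local = (host & 0xFFFF0000u) == 0xA9FE0000u;  // 169.254.0.0/16
    const bool broadcast  = host == 0xFFFFFFFFu;                  // 255.255.255.255
    const bool multicast  = (host & 0xF0000000u) == 0xE0000000u;  // 224.0.0.0/4
    return link_local || broadcast || multicast;
}
```

With this check, the group address from the test URL (224.0.0.253) is classified as multicast, while an ordinary address such as 1.2.3.4 is not special.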