[zeromq-dev] [PATCH] Publisher side filtering

Martin Sustrik sustrik at 250bpm.com
Mon Jan 31 14:52:10 CET 2011


Hi Gerard,

> So what I see happen once in a while:
>
> - SUB is started
> - broker is started
> - PUB is started
> - things work ok
> - restart SUB a couple of times after it has sent the sub request to the broker
> - no messages yet appear in the publisher (the printf debug statement
> you put there about receiving unsub/sub requests).
> - restart broker
> - 1 second later, the PUB displays "couple of times" sub requests + 1.

So the subscriptions do reach the publisher eventually; they are just 
delayed by 1 second, right?

Could it be caused by the PUB socket being invoked (zmq_send) only once a 
second? The code currently processes incoming subscriptions when 
zmq_send() is called, so a subscription may sit unprocessed until the 
next send.
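To illustrate why a once-a-second zmq_send() could delay subscriptions, here is a minimal toy model. It is a hypothetical sketch, not libzmq code: the class `lazy_pub` and its methods are invented for illustration, and it assumes only what is stated above, namely that pending subscriptions are applied when send is called.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>

// Hypothetical toy model of a PUB socket that applies incoming
// subscriptions lazily, only from inside send(). Not libzmq code.
class lazy_pub {
public:
    // Called when a subscription arrives from downstream. It is only
    // queued here; the publisher does not know about it yet.
    void on_subscription_arrived(const std::string& topic) {
        pending_.push_back(topic);
    }

    // Mirrors the behaviour described for zmq_send(): first apply all
    // pending subscriptions, then prefix-match the outgoing message.
    // Returns true if at least one subscription matches.
    bool send(const std::string& msg) {
        while (!pending_.empty()) {
            active_.insert(pending_.front());
            pending_.pop_front();
        }
        for (const auto& prefix : active_)
            if (msg.compare(0, prefix.size(), prefix) == 0)
                return true;
        return false;
    }

    bool is_subscribed(const std::string& topic) const {
        return active_.count(topic) != 0;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;  // arrived but not yet applied
    std::set<std::string> active_;     // subscriptions the PUB acts on
};
```

In this model a subscription that arrives just after a send stays in `pending_` until the next send, so with one send per second it can wait up to a second before the publisher sees it.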

> Because there are as many messages as restarts and both the broker and
> subs were restarted, this looks like the PUB is
> piling up messages for some reason.

Yes. At the moment, subscriptions are simply forwarded upstream. Later 
on we can filter them and *not* send those that have already been sent 
upstream.

Example:

1 publisher, 1 forwarder (the broker), 2 subscribers.

subscriber 1 subscribes to topic "A"
the subscription is forwarded to the broker
the broker forwards it to the publisher
subscriber 2 subscribes to topic "A"
the subscription is forwarded to the broker
the broker realises that the subscription has already been forwarded to 
the publisher and does not forward it again
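The filtering in the example above could be done with a per-topic subscriber count in the forwarder. This is a minimal sketch under that assumption; `subscription_table` is a hypothetical name, and the unsubscribe handling is my own extrapolation, not part of the proposal above.

```cpp
#include <map>
#include <string>

// Hypothetical forwarder-side table that counts downstream subscribers
// per topic, so each subscription is sent upstream only once.
class subscription_table {
public:
    // Returns true if this is the first subscriber for `topic`,
    // i.e. the forwarder should pass the subscription upstream.
    bool subscribe(const std::string& topic) {
        return ++counts_[topic] == 1;
    }

    // Assumption (not in the original proposal): symmetric handling of
    // unsubscriptions. Returns true only when the last subscriber for
    // `topic` leaves, i.e. the unsubscription should go upstream.
    bool unsubscribe(const std::string& topic) {
        auto it = counts_.find(topic);
        if (it == counts_.end())
            return false;  // unknown topic: nothing to forward
        if (--it->second == 0) {
            counts_.erase(it);
            return true;
        }
        return false;
    }

private:
    std::map<std::string, unsigned> counts_;
};
```

With this table, subscriber 1's "A" returns true and is forwarded, while subscriber 2's "A" returns false and is dropped, matching the walkthrough.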

> I noticed that the "pub_t::xsend"
> method doesn't call "has_in" prior to executing
> the recv in the while loop and was wondering whether something may be
> failing over there?

That's OK. It's a non-blocking recv, so it returns EAGAIN when there are 
no subscriptions to process.
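Why no has_in check is needed can be shown with a small drain loop. This is a hypothetical sketch, not the actual pub_t::xsend code: `toy_inbound_pipe` and `drain_subscriptions` are invented names, and the pipe only imitates libzmq's non-blocking error convention (return -1 with errno set to EAGAIN).

```cpp
#include <cerrno>
#include <deque>
#include <string>

// Hypothetical stand-in for the inbound pipe carrying subscriptions.
// recv() never blocks: when nothing is queued it returns -1 and sets
// errno to EAGAIN.
class toy_inbound_pipe {
public:
    void push(const std::string& s) { q_.push_back(s); }

    int recv(std::string& out) {
        if (q_.empty()) {
            errno = EAGAIN;
            return -1;
        }
        out = q_.front();
        q_.pop_front();
        return 0;
    }

private:
    std::deque<std::string> q_;
};

// Keep calling the non-blocking recv until it fails. An empty pipe just
// ends the loop with EAGAIN, so no separate "has_in" check is required.
// Returns the number of subscriptions processed, or -1 on a real error.
int drain_subscriptions(toy_inbound_pipe& pipe) {
    int processed = 0;
    std::string sub;
    while (pipe.recv(sub) == 0)
        ++processed;  // a real publisher would update its filter here
    return errno == EAGAIN ? processed : -1;
}
```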

> When this occurs at some point, it's usually after a broker restart. The
> messages from the sub fail to get to the pub.
> When the broker is restarted again, they all show up together in one go.
> I think this is related to the "has_in" method
> probably?   (as if there's an invalid pipe ahead of the new valid pipe
> that didn't get removed, or something like that).

So you are able to get the system stuck, right? Do you have the test 
programs? How can I reproduce it?

Martin
