[zeromq-dev] Publisher side filtering...

Martin Sustrik sustrik at 250bpm.com
Tue Oct 26 16:59:59 CEST 2010


On 10/25/2010 11:12 AM, Gerard Toonstra wrote:

> There's a document on the 0MQ site which mentions how routing is done
> through an inverted bitmap. It can compile this matrix, because
> the number of possible queries is finite. However, even for finite
> domains, one still has to consider the dimensionality of the matrix for
> practicality.
>
> 1. "Standard" subscription forwarding in my interpretation means
> forwarding messages selectively based on a topic.
>
> 2. A topic can be considered as a single piece of metadata (metadata
> key) attached to a message (rather than thinking of it as a 'channel').
>
> 3. Custom filtering always involves filtering messages on multiple
> metadata keys instead of just one. These keys are generally derived
> from values in the message contents. The bad thing here is that, to do
> this efficiently from a network perspective, this would require 0MQ to
> know about the message format. So, either some complicated
> functionality exists for message inspection or messages have a
> pre-determined format.
>
> 4. Adding more metadata keys to messages is not really an option.
> Because it is assumed that producers have no knowledge of which
> particular messages a subscriber is interested in, the only reasonable
> option here is to add each searchable value into the metadata as a key.
> Taken to the extreme, this means duplicating the message, once as
> metadata and then as application formatted data.
>
>
> So, yes, it sounds like custom filtering *is* a very bad idea and that
> it's a compensation for other things incomplete in the design, or chosen
> poorly.

Agreed with above.



> The power of subscription forwarding however is determined by the
> expressivity of the single metadata key and the different ways in which
> this can be matched to more specific queries, from the perspective of a
> consumer/subscriber.
>  From the perspective of a router/broker, it is more important how fast
> these comparisons can be made, because it is more concerned about
> message volume throughput.
> Those seem competing issues for an implementation.
>
> A couple of things seem necessary:
>
> 1. Come up with a suitable specification for how topics are expressed,
> e.g. a.b.c. Does it allow wildcards such as a.*.c? Wildcards
> significantly increase the complexity.

The original decision was to allow * only at the end of the 
subscription. That allows for arbitrary topic hierarchies, however, it 
doesn't allow for complex SQL-like filtering.
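
The trailing-wildcard rule can be sketched in a few lines of Python. This
is only an illustration: the function name `matches` is hypothetical, and
treating a subscription without `*` as an exact match is an assumption
rather than something specified in this thread.

```python
def matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` satisfies `subscription`.

    A trailing '*' means "any topic that starts with this prefix".
    A '*' anywhere else is rejected, since only trailing wildcards
    are allowed. Without '*' the match is exact (an assumption).
    """
    if "*" in subscription[:-1]:
        raise ValueError("'*' is only allowed at the end of a subscription")
    if subscription.endswith("*"):
        return topic.startswith(subscription[:-1])
    return topic == subscription
```

With this rule `a.b.*` selects the whole subtree under `a.b.`, while a
pattern such as `a.*.c` is refused outright, so matching stays a plain
prefix comparison.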

> 2. Together with 1, come up with a strategy for topic matching. Inverted
> bitmaps were named in the 0MQ docs. I've been looking into bloom
> filters and how these could be used for achieving something similar.
> The advantage of bloom filters is that less absolute knowledge is
> required. Absolute knowledge is knowing that currency=USD is placed in
> column 15 of the inverted bitmap (which has to be consistent across the
> cluster). A bloom filter just needs to use the same hashing functions
> everywhere and it needs to be properly dimensioned. The dimension
> depends on the total number of topics that can exist in the domain and
> the false-positive probability that you are willing to accept.

I would say: Spec first, implement afterwards. Implementation depends on 
what your spec requires.
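
For reference, the dimensioning Gerard mentions follows the standard
Bloom filter sizing formulas: for n topics and an accepted false-positive
probability p, the filter needs about m = -n ln(p) / (ln 2)^2 bits and
k = (m / n) ln 2 hash functions. The sketch below is an illustration
only; the names `bloom_dimensions` and `BloomFilter` are hypothetical,
and deriving the hash functions from salted SHA-256 is an assumption, not
something 0MQ does.

```python
import hashlib
import math


def bloom_dimensions(n_topics: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the standard Bloom filter formulas."""
    bits = math.ceil(-n_topics * math.log(false_positive_rate) / math.log(2) ** 2)
    hashes = max(1, round(bits / n_topics * math.log(2)))
    return bits, hashes


class BloomFilter:
    """Minimal Bloom filter; every node must use the same bits and hashes."""

    def __init__(self, bits: int, hashes: int) -> None:
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # a Python int used as the bit field

    def _positions(self, topic: str):
        # Assumption: k hash functions are derived by salting SHA-256
        # with the hash index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{topic}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, topic: str) -> None:
        for pos in self._positions(topic):
            self.field |= 1 << pos

    def might_contain(self, topic: str) -> bool:
        # False means "definitely not present"; True may be a false positive.
        return all((self.field >> pos) & 1 for pos in self._positions(topic))
```

Unlike the inverted bitmap, nothing here depends on agreeing that
currency=USD lives in a particular column. The nodes only have to share
the hash construction and the two parameters.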

Anyway, the important point here is that for custom filtering algorithms 
each node has to implement the filtering algorithm. Thus, you cannot 
forward a subscription to a node that you haven't first upgraded to have 
the filtering code available.

That in turn makes the solution unscalable. Just think of what would 
happen if you wanted to forward the subscription to a different company. 
They would have to install your filtering code. If it's a bank or 
somesuch, the code would have to pass a security audit. The legal 
department would have to approve it. Your company would have to sign 
some licensing agreement with the other company etc.

That, IMO, makes custom filtering a bad idea.

> 3. Can a single pub/sub channel have many forwarding subscriptions?

Definitely.

> Maybe a 'client device' is handy here, which uses the basic functions
> and groups them together through a zmq_polling device, as the
> mechanism through which messages are retrieved will be very similar.
> The idea is that for each incoming message on a particular channel,
> basically a subscription, a different callback function may be called
> (which has some complexities regarding threading, 100% CPU
> consumption, etc.).
>
> 4. Filters need to be communicated from sub to pub in some kind of
> handshake. If allowing for multiple subscriptions, how does a client
> notify a filtering publisher that some of its subscription interests
> have changed over the course of its lifetime?

Yes.

> 5. If multiple subscriptions are not allowed, then for each topic of
> interest a new physical socket is opened. The broker handling a number
> of clients may then quickly run out of resources, similar to a broker
> connecting to a router?

Multiple subscriptions are a basic requirement. No point in having only 
a single subscription.
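
To illustrate what multiple subscriptions on one channel with per-topic
callbacks could look like, here is a minimal sketch in plain Python. It
does not use the 0MQ API; `SubscriptionTable` and its methods are
hypothetical names, and matching by topic prefix is an assumption carried
over from the trailing-wildcard rule above.

```python
from typing import Callable, Dict

Handler = Callable[[str, bytes], None]


class SubscriptionTable:
    """Prefix subscriptions sharing a single channel (hypothetical helper)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def subscribe(self, prefix: str, handler: Handler) -> None:
        # Interests can be added at any point in the client's lifetime.
        self._handlers[prefix] = handler

    def unsubscribe(self, prefix: str) -> None:
        # Removing an interest does not require opening a new channel.
        self._handlers.pop(prefix, None)

    def dispatch(self, topic: str, body: bytes) -> int:
        """Call every handler whose prefix matches; return how many ran."""
        called = 0
        for prefix, handler in list(self._handlers.items()):
            if topic.startswith(prefix):
                handler(topic, body)
                called += 1
        return called
```

A receive loop would call `dispatch` once per incoming message, so the
set of interests can grow and shrink without the socket per topic that
point 5 worries about.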

> 6. When some forwarding device loses a connection to a client, then its
> own set of subscriptions changes. It is impossible to do this if a new
> channel must be set up each time a subscription list changes.
>
> 7. Custom filtering sounds like a useful addition to the client device.
> A different, configurable callback function that determines if a
> message is passed to the actual message handling function or not.

You mean filtering on the terminal node without forwarding the 
subscriptions any further, right?
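
If that reading is right, the filter never leaves the receiving process,
so it can be an ordinary predicate wrapped around the message handler. A
minimal sketch, assuming messages have already been decoded into a dict
(the names `make_filtered_handler`, `predicate` and `handler` are
hypothetical):

```python
from typing import Callable, Dict

Message = Dict[str, str]


def make_filtered_handler(
    predicate: Callable[[Message], bool],
    handler: Callable[[Message], None],
) -> Callable[[Message], bool]:
    """Wrap `handler` so it only sees messages accepted by `predicate`."""

    def filtered(message: Message) -> bool:
        if predicate(message):
            handler(message)
            return True
        return False  # dropped locally; nothing is sent upstream

    return filtered
```

Because the predicate runs only on the terminal node, no other node has
to install or audit it, which sidesteps the deployment problem described
earlier.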

Martin
