[zeromq-dev] Publisher side filtering...
Gerard Toonstra
gtoonstra at gmail.com
Mon Oct 25 11:12:22 CEST 2010
On Sat, Oct 23, 2010 at 6:44 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:
> Hi Gerard,
>
>>
>> I've read the mails about publisher side filtering here:
>>
>> http://thread.gmane.org/gmane.network.zeromq.devel/3560
>>
>> - Is there now a current ongoing effort to put publisher-side filtering in
>> 0MQ that I may possibly contribute to, which also allows
>> API users to specify their own methods of filtering as they see fit?
>>
> These are two distinct issues:
>
> 1. Publisher-side filtering a.k.a. subscription forwarding.
> 2. Custom filtering algorithms.
>
> The former is a pretty clear functionality that has to be implemented
> sooner or later. If you want to contribute to that, you are welcome.
>
> The latter is something that pops up every now and then but nobody has
> proposed any clear semantics for it yet (especially w.r.t. how it interacts
> with the subscription forwarding). Thus, if you want to contribute this kind
> of functionality, you have to define the intended semantics first.

There's a document on the 0MQ site which describes how routing is done
through an inverted bitmap. This matrix can be compiled up front because
the number of possible queries is finite. However, even for finite domains,
one still has to consider the dimensionality of the matrix for practicality.
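As a minimal sketch of the inverted-bitmap idea, assuming one row per topic and one bit per subscriber (the class and method names are hypothetical, not part of 0MQ):

```python
from collections import defaultdict


class InvertedBitmap:
    """Hypothetical sketch: one bitmask row per topic, one bit per subscriber."""

    def __init__(self) -> None:
        # topic -> integer bitmask; bit i set means subscriber i wants the topic
        self._rows = defaultdict(int)

    def subscribe(self, subscriber: int, topic: str) -> None:
        self._rows[topic] |= 1 << subscriber

    def unsubscribe(self, subscriber: int, topic: str) -> None:
        self._rows[topic] &= ~(1 << subscriber)

    def route(self, topic: str) -> list[int]:
        # A single row lookup yields every interested subscriber at once.
        mask = self._rows.get(topic, 0)
        return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


bitmap = InvertedBitmap()
bitmap.subscribe(0, "currency=USD")
bitmap.subscribe(2, "currency=USD")
bitmap.subscribe(1, "currency=EUR")
print(bitmap.route("currency=USD"))  # [0, 2]
```

Every distinct topic costs one row, so the matrix grows with the size of the query domain, which is the practicality concern mentioned above.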
1. "Standard" subscription forwarding in my interpretation means forwarding
messages selectively based on a topic.
2. A topic can be considered as a single piece of metadata (metadata key)
attached to a message (rather than thinking of it as a 'channel' ).
3. Custom filtering always involves filtering messages on multiple metadata
keys instead of just one. These keys are generally derived from values in the
message contents. The drawback is that doing this efficiently from a network
perspective would require 0MQ to know about the message format. So either
some complicated message-inspection functionality exists, or messages have a
predetermined format.
4. Adding more metadata keys to messages is not really an option either.
Because producers are assumed to have no knowledge of which particular
messages a subscriber is interested in, the only reasonable approach would be
to add each searchable value to the metadata as a key. Taken to the extreme,
this means duplicating the message: once as metadata and once as
application-formatted data.
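To make point 4 concrete, here is a minimal sketch assuming a hypothetical scheme in which a message is a list of metadata keys plus a JSON body; the field names and helper functions are invented for illustration:

```python
import json


def envelope_keys(payload: dict) -> list[str]:
    # Hypothetical scheme: every searchable field is copied into the metadata
    # as a "field=value" key, since the producer cannot know which fields a
    # subscriber will filter on.
    return sorted(f"{name}={value}" for name, value in payload.items())


def encode(payload: dict) -> tuple[list[str], bytes]:
    # The same values travel twice: once as metadata keys, once in the body.
    body = json.dumps(payload, sort_keys=True).encode()
    return envelope_keys(payload), body


def matches(keys: list[str], wanted: dict) -> bool:
    # A router only needs the keys, never the body, to make its decision.
    return all(f"{name}={value}" in keys for name, value in wanted.items())


keys, body = encode({"currency": "USD", "venue": "XNYS", "price": 101.5})
print(keys)  # ['currency=USD', 'price=101.5', 'venue=XNYS']
print(body)
print(matches(keys, {"currency": "USD"}))  # True
```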
So, yes, it sounds like custom filtering *is* a very bad idea, and that it
compensates for other parts of the design that are incomplete or poorly chosen.
The power of subscription forwarding, however, is determined by the
expressiveness of the single metadata key and the different ways in which it
can be matched against more specific queries from the perspective of a
consumer/subscriber.
From the perspective of a router/broker, what matters more is how fast these
comparisons can be made, because it is more concerned with message
throughput.
These two concerns compete with each other in an implementation.
A couple of things seem necessary:
1. Come up with a suitable specification for how topics are expressed,
e.g. a.b.c. Does it allow wildcards, such as a.*.c? Wildcards
significantly increase the complexity.
2. Together with 1, come up with a strategy for topic matching. Inverted
bitmaps were named in the 0MQ docs. I've been looking into
Bloom filters and how these could be used to achieve something
similar. The advantage of Bloom filters is that less absolute
knowledge is required. Absolute knowledge means knowing that currency=USD is
placed in column 15 of the inverted bitmap
(which has to be consistent across the cluster). A Bloom filter just
needs to use the same hash functions everywhere and
be properly dimensioned. The dimension depends on the total
number of topics that can exist in the domain and the
false-positive probability you are willing to accept.
3. Can a single pub/sub channel carry many forwarding subscriptions? Maybe a
'client device' is handy here, which uses the basic functions
and groups them together through a zmq_poll loop, as the mechanism
through which messages are retrieved will be very similar.
The idea is that for each incoming message on a particular channel
(basically a subscription), a different callback function may be called.
This has some complexities regarding threading, 100% CPU consumption,
etc.
4. Filters need to be communicated from sub to pub in some kind of
handshake. If multiple subscriptions are allowed,
how does a client notify a filtering publisher that some of its
subscription interests have changed over the course of its lifetime?
5. If multiple subscriptions are not allowed, a new physical socket has to be
opened for each topic of interest. A broker handling a number
of clients may then quickly run out of resources, similar to a broker
connecting to a router?
6. When a forwarding device loses the connection to a client, its own
set of subscriptions changes. This cannot be handled if a new
channel must be set up each time a subscription list changes.
7. Custom filtering sounds like a useful addition to the client device: a
different, configurable callback function that determines whether a
message is passed to the actual message-handling function.
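For point 1, one possible semantics (an assumption, not an existing 0MQ feature) is that `*` matches exactly one dot-separated segment. For contrast, the existing 0MQ SUB-side subscriptions are plain prefix matches on the start of the message. A minimal sketch of both, with hypothetical function names:

```python
def prefix_matches(subscription: str, topic: str) -> bool:
    # String analogue of the byte-prefix match 0MQ applies to ZMQ_SUBSCRIBE filters.
    return topic.startswith(subscription)


def wildcard_matches(pattern: str, topic: str) -> bool:
    # Hypothetical rule: "*" matches exactly one segment; segment counts must agree.
    pattern_parts = pattern.split(".")
    topic_parts = topic.split(".")
    if len(pattern_parts) != len(topic_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, topic_parts))


print(wildcard_matches("a.*.c", "a.b.c"))  # True
print(wildcard_matches("a.*.c", "a.b.d"))  # False
print(prefix_matches("a.b", "a.bc.d"))     # True: prefixes ignore segment boundaries
```

If patterns are stored in a trie, a literal subscription needs one walk per message, while each `*` forces the matcher to follow both the literal branch and the wildcard branch at that level, which is one way the extra complexity shows up.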
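For point 2, a minimal Bloom filter sketch over topic strings. It uses the standard sizing formulas (m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions) and double hashing from a single SHA-256 digest; the hashing scheme and the `union` helper are assumptions chosen for illustration, not anything 0MQ defines:

```python
import hashlib
import math


class BloomFilter:
    """Sketch of a Bloom filter over topic strings (standard sizing formulas)."""

    def __init__(self, expected_topics: int, false_positive_rate: float) -> None:
        n, p = expected_topics, false_positive_rate
        # m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.m = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = 0  # Python int used as an m-bit array

    def _positions(self, topic: str) -> list[int]:
        # Double hashing from one SHA-256 digest; every node must use the same scheme.
        digest = hashlib.sha256(topic.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, topic: str) -> None:
        for pos in self._positions(topic):
            self.bits |= 1 << pos

    def might_contain(self, topic: str) -> bool:
        # False means "definitely not subscribed"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(topic))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # A forwarding device could OR its clients' filters before sending them upstream.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter.__new__(BloomFilter)
        merged.m, merged.k, merged.bits = self.m, self.k, self.bits | other.bits
        return merged


client_a = BloomFilter(1000, 0.01)
client_a.add("currency=USD")
client_b = BloomFilter(1000, 0.01)
client_b.add("currency=EUR")
upstream = client_a.union(client_b)
print(upstream.m, upstream.k)  # 9586 7
print(upstream.might_contain("currency=USD"))  # True
```

A plain Bloom filter cannot delete entries, so when a client disconnects (point 6) a device using this approach would rebuild its upstream filter from the remaining clients' filters rather than clearing bits.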
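Points 3 and 7 together suggest a client device that dispatches each incoming message to per-subscription callbacks, with an optional predicate acting as the custom filter. A minimal sketch, with all names hypothetical and the socket and zmq_poll loop left out so that only the dispatch logic is shown:

```python
from dataclasses import dataclass
from typing import Callable, Optional

Handler = Callable[[str, bytes], None]
Predicate = Callable[[str, bytes], bool]


@dataclass
class Subscription:
    prefix: str
    handler: Handler
    accept: Optional[Predicate] = None  # custom filter; None accepts everything


class ClientDevice:
    """Hypothetical client-side dispatcher; not a 0MQ API."""

    def __init__(self) -> None:
        self.subscriptions: list[Subscription] = []

    def subscribe(self, prefix: str, handler: Handler,
                  accept: Optional[Predicate] = None) -> None:
        self.subscriptions.append(Subscription(prefix, handler, accept))

    def dispatch(self, topic: str, body: bytes) -> int:
        # Would be called for each message read off the socket(s) in a poll loop.
        delivered = 0
        for sub in self.subscriptions:
            if not topic.startswith(sub.prefix):
                continue
            if sub.accept is not None and not sub.accept(topic, body):
                continue
            sub.handler(topic, body)
            delivered += 1
        return delivered


received = []
device = ClientDevice()
device.subscribe("prices.", lambda t, b: received.append(("all", t)))
device.subscribe("prices.", lambda t, b: received.append(("big", t)),
                 accept=lambda t, b: len(b) > 3)
print(device.dispatch("prices.USD", b"101.5"))  # 2
print(device.dispatch("prices.EUR", b"99"))     # 1
print(device.dispatch("news.x", b"hello"))      # 0
```

Keeping the predicate on the client side means the publisher only forwards by topic and never has to understand message formats, in line with point 3 of the earlier list.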
Feedback welcome,
--
Gerard