[zeromq-dev] Fwd: Exact matching on subscription topics

Staffan Gimåker staffan at spotify.com
Tue Jan 10 13:41:30 CET 2012


Apparently I forgot to reply-all on this.

Looked at the e-mail outlining the proposed wire format; it looks good. That's
essentially equivalent to what I did in my quick-n-dirty prototype, only
with larger cmd-id and algo-id fields. Is there a specific reason to use
16 bits for the cmd-id? Seems like one byte would more than suffice, though
wasting a byte is not a big deal.

How do you propose to solve the problem of intermediary devices not
necessarily being aware of all matching methods? Change XPUB/XSUB
semantics, new socket type?

/S

---------- Forwarded message ----------
From: Staffan Gimåker <staffan at spotify.com>
Date: Mon, Jan 9, 2012 at 11:12 PM
Subject: Re: [zeromq-dev] Exact matching on subscription topics
To: Martin Sustrik <sustrik at 250bpm.com>


On Mon, Jan 9, 2012 at 10:30 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:

> Hi Staffan,
>
> I think this kind of thing should be definitely built into 0MQ, however,
> we should take a look at big picture first.
>
> The big picture being forwarding of subscriptions in large distribution
> trees.
>
> What we need to do is to add the capability to add new matching algorithms
> without disrupting the existing infrastructure (change in wire formats etc.)
>

Makes sense I guess.


> My idea was that subscription matching can be thought of as end-to-end
> feature (topic string is provided on topmost publisher, filtering is going
> in leaf subscribers) with subscription forwarding being an optional
> optimisation.
>

Also makes sense. Does this mean you don't want intermediaries to
explicitly store subscriptions? +1 if so.
It would be really neat if XPUB didn't implicitly add subscriptions to its
mtrie when they are received. That way you could choose to drop
subscriptions if the node is overloaded, etc. It's pretty easy to DoS an
XPUB to OOM-death by just sending it garbage subscriptions, for example.
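
To make the "drop subscriptions when overloaded" idea concrete, here is a
minimal sketch. It is not libzmq code: the class name, the fixed cap and the
single shared table are all assumptions for illustration (a real XPUB tracks
subscriptions per pipe in its mtrie). The point it shows is that once a
subscription has been dropped, the node can no longer filter safely and has
to fall back to passing everything on.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical bounded store of prefix subscriptions for an intermediary.
// Names and the cap policy are assumptions, not part of libzmq.
class bounded_subscriptions_t
{
  public:
    explicit bounded_subscriptions_t (std::size_t max_entries_) :
        max_entries (max_entries_), overflowed (false)
    {
    }

    // Returns true if the subscription is stored, false if it was dropped.
    bool add (const std::string &prefix_)
    {
        if (prefixes.count (prefix_))
            return true;
        if (prefixes.size () >= max_entries) {
            // A dropped subscription means filtering is no longer safe,
            // so from now on every message is forwarded.
            overflowed = true;
            return false;
        }
        prefixes.insert (prefix_);
        return true;
    }

    // Decide whether a message with the given topic is forwarded.
    bool should_forward (const std::string &topic_) const
    {
        if (overflowed)
            return true;
        for (std::set<std::string>::const_iterator it = prefixes.begin ();
             it != prefixes.end (); ++it)
            if (topic_.compare (0, it->size (), *it) == 0)
                return true;
        return false;
    }

  private:
    std::size_t max_entries;
    bool overflowed;
    std::set<std::string> prefixes;
};
```

The memory used by the table is bounded by the cap, at the price of losing
the forwarding optimisation once the cap is hit.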


> That way, new matching algorithms can be added and the only nodes that
> would have to be aware of them are terminal publishers and subscribers.
> Intermediate devices could handle unknown matching algorithms by simply
> passing all the messages on.
>
> The actual wire format I've proposed for this is described here:
>
> http://groups.google.com/group/sp-discuss-group/msg/22ee4d6e9f82857a
>
> Maybe it would make sense to add this to 0MQ now, while 3-1 is still in
> beta.
>
> Having this extra field in the wire format then makes it easy to add new
> matching algos like the one you've implemented. For now we could start with
> following list of algorithms:
>
> 0 - all (any message matches)
> 1 - exact
> 2 - prefix
>
> What do you think?
>
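
As a rough sketch of how an intermediate device might act on the algorithm
list above: the ids 0, 1 and 2 are taken from the quoted list, everything
else (function name, signature, passing the pattern as a string) is a
hypothetical illustration and not the proposed wire format or any libzmq
API. Unknown ids fall through to "pass the message on", as described in the
quoted text.

```cpp
#include <string>

// Algorithm ids from the list quoted above.
enum match_algo_t
{
    match_all = 0,
    match_exact = 1,
    match_prefix = 2
};

// Hypothetical helper: should an intermediary forward a message with
// topic_ to a subscriber that registered pattern_ under algorithm algo_?
bool intermediary_forwards (unsigned algo_,
                            const std::string &pattern_,
                            const std::string &topic_)
{
    switch (algo_) {
        case match_all:
            return true;
        case match_exact:
            return topic_ == pattern_;
        case match_prefix:
            return topic_.compare (0, pattern_.size (), pattern_) == 0;
        default:
            // Unknown matching algorithm: do not filter, pass it on and
            // let the terminal subscriber decide.
            return true;
    }
}
```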

Will have a look at this tomorrow when it's not close to midnight.

>> I did a quick and dirty throw-away prototype that supports both prefix
>> and exact matching:
>> https://github.com/gimaker/libzmq/tree/exact-matching-prototype (given
>> the lack of a hash map in C++03 and my laziness I used a trie for exact
>> subscriptions for now, quick and dirty!)
>>
>> To make exact subscriptions/unsubscriptions you do:
>>   zmq_setsockopt(sock, ZMQ_SUBSCRIBE_EXACT, topic, topic_len); and
>>   zmq_setsockopt(sock, ZMQ_UNSUBSCRIBE_EXACT, topic, topic_len);
>>
>> Prefix matched topics are added and removed as normal with ZMQ_SUBSCRIBE
>> and ZMQ_UNSUBSCRIBE respectively.
>>
>> The cost of mixed prefix/exact matching is two lookups instead of one
>> (one for exact matching, one for prefix matching) but you can have fast
>> paths for when only one kind of matching is used, making the added cost
>> negligible unless you use a mix of prefix and exact matching.
>>
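
The two-lookup scheme with fast paths described above could look roughly
like this. It is only a sketch under stated assumptions: the class and
method names are made up, and std::set with a linear prefix scan stands in
for the tries used in the prototype, so it shows the control flow rather
than the real data structures.

```cpp
#include <set>
#include <string>

// Hypothetical matcher holding exact and prefix subscriptions separately.
class mixed_matcher_t
{
  public:
    void subscribe_exact (const std::string &topic_) { exact.insert (topic_); }
    void subscribe_prefix (const std::string &prefix_)
    {
        prefixes.insert (prefix_);
    }

    bool match (const std::string &topic_) const
    {
        // Fast path: skip the exact lookup when there are no exact
        // subscriptions at all.
        if (!exact.empty () && exact.count (topic_))
            return true;

        // Fast path: skip the prefix lookup when there are no prefix
        // subscriptions at all.
        if (prefixes.empty ())
            return false;

        // Linear scan standing in for a trie walk.
        for (std::set<std::string>::const_iterator it = prefixes.begin ();
             it != prefixes.end (); ++it)
            if (topic_.compare (0, it->size (), *it) == 0)
                return true;
        return false;
    }

  private:
    std::set<std::string> exact;
    std::set<std::string> prefixes;
};
```

When only one kind of subscription is in use, the other lookup reduces to
an emptiness check.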

> Yes. The overhead is checking whether the number of prefix/exact
> subscriptions is zero, i.e. a couple of nanoseconds.
>
> What's "token"? I've got a bit lost...
>

Node or machine. I probably ballsed up my terminology :)


>> This should scale well with regards to memory, but less so with regards
>> to throughput and bandwidth as each intermediary machine still has to
>> process all published messages. Exact matching would allow us to publish
>> to a single token rather than all of them. There are also some potential
>> headaches with having pubsub across multiple data centers that I can
>> elaborate on if anyone is interested.
>>
>
> I would be extremely interested in that. Sounds like it's relevant to
> designing efficient large-scale pub-sub topologies, which is what I am
> focusing on myself.
>

/S