[zeromq-dev] Exact matching on subscription topics

Chuck Remes cremes.devlist at mac.com
Mon Jan 9 22:29:16 CET 2012

On Jan 9, 2012, at 10:56 AM, Staffan Gimåker wrote:

> Hey guys,
> Having exact topic matching built into zmq would be very useful for a
> couple of reasons:
> * Hash maps are less gluttonous for memory than tries, and should be as
> fast or faster.

Interesting assertion on performance. I googled around for this and it looked like the answer was "it depends." So, tries are better sometimes than hashes.

For memory footprint, your statement is probably correct.

> The cost of mixed prefix/exact matching is two lookups instead of one
> (one for exact matching, one for prefix matching) but you can have fast
> paths for when only one kind of matching is used, making the added cost
> negligible unless you use a mix of prefix and exact matching.

I don't understand why this would be true. All topic byte comparisons should be "fail fast" so as soon as a byte mismatches the code can move on to the next filter. For an *exact* match, if the comparison hasn't failed and there are no more topic bytes to examine, the match is exact. If a prefix match gets to the same point, it can skip the test to see if there are more topic bytes to compare. 

So to me it looks like the algo for topic matching is nearly identical except the exact-match algo has a flag that says "if we get to the end without a comparison failure, make one final check that there are no more topic bytes; if none, exact match, else fail." The regular check wouldn't have that flag.

> x
> TL;DR: Our situation is this: we have a large amount of active
> subscriptions (in the range of tens of millions, and counting) that are
> propagated to all publisher instances (of the same type), which results
> in memory usage in the 10-20 GiB range per instance, just for storing
> subscriptions, independent on the how many machines we throw at the
> problem. Thus, we need some way to shard subscription information.
> Sharding subscription information is easy enough -- we're building a
> small pubsub service that sits in between publishers and subscribers
> with the sole purpose of sharding the subscription information. So, with
> N machines each machine handles a fraction 1/N of all subscriptions.
> Subscribers subscribe only to the token responsible for the topic and
> publishers publish to all intermediary machines.
> This should scale well with regards to memory, but less so with regards
> to throughput and bandwidth as each intermediary machine still has to
> process all published messages.

I don't understand this last statement. Why would all N machines have to process all messages? Wouldn't they be processing 1/N messages? And, they would also only have 1/N of the subscription filters taking up memory.

And with 3.1's improved publisher-side filtering, I don't see why any machine would ever get all of the messages.

> Exact matching would allow us to publish
> to a single token rather than all of them. There are also some potential
> headaches with having pubsub across multiple data centers that I can
> elaborate on if anyone is interested.
> Of course, the path of least resistance is to just disregard that zmq
> uses prefix matching and treat it as it was exact, but then we'd lose
> out on the memory savings (and likely better performance) of using hash
> maps. And there are probably cases where prefix matching would be useful
> for us as well, although the bulk of our traffic is better suited for
> exact matching.
> So, is this something someone else would be interested in having in zmq?
> And if we do it, what are the chances of it getting merged?

I think there is interest. And merging in code is really a community process, so unless everyone rises up and rejects it I would think the chances are pretty good. But first, we (the community) need to understand your problem a bit better. I admit I am confused over the difference between prefix matching and exact matching when one is just a subset of the other.


More information about the zeromq-dev mailing list