[zeromq-dev] Exact matching on subscription topics

Staffan Gimåker staffan at spotify.com
Mon Jan 9 23:01:46 CET 2012


On Mon, Jan 9, 2012 at 10:29 PM, Chuck Remes <cremes.devlist at mac.com> wrote:

> > * Hash maps are less gluttonous for memory than tries, and should be as
> > fast or faster.
>
> Interesting assertion on performance. I googled around for this and it
> looked like the answer was "it depends." So, tries are better sometimes
> than hashes.
>
> For memory footprint, your statement is probably correct.
>

Sorry it wasn't meant as an assertion, as the "should" implies, but merely
an educated guess. I'd also guess that it depends on the number of
subscriptions; a hash map should result in less cache misses for big
subscription sets. Which is faster would also on other things like the
ratio between matching and unmatching incoming messages.


> > The cost of mixed prefix/exact matching is two lookups instead of one
> > (one for exact matching, one for prefix matching) but you can have fast
> > paths for when only one kind of matching is used, making the added cost
> > negligible unless you use a mix of prefix and exact matching.
>
> I don't understand why this would be true. All topic byte comparisons
> should be "fail fast" so as soon as a byte mismatches the code can move on
> to the next filter. For an *exact* match, if the comparison hasn't failed
> and there are no more topic bytes to examine, the match is exact. If a
> prefix match gets to the same point, it can skip the test to see if there
> are more topic bytes to compare.
>
> So to me it looks like the algo for topic matching is nearly identical
> except the exact-match algo has a flag that says "if we get to the end
> without a comparison failure, make one final check that there are no more
> topic bytes; if none, exact match, else fail." The regular check wouldn't
> have that flag.
>

Depends on how you implement it. If implemented using a trie, for example,
then what you say is true if I understand correctly. If you use a hash map
for implementing exact matching, then you'll have to do two lookups (again,
you could have fast paths to mitigate this if only prefix/exact matching is
used).


> I don't understand this last statement. Why would all N machines have to
> process all messages? Wouldn't they be processing 1/N messages? And, they
> would also only have 1/N of the subscription filters taking up memory.
>
> And with 3.1's improved publisher-side filtering, I don't see why any
> machine would ever get all of the messages.
>

Assume there are 2 intermediary pubsub machines A and B, a publisher C and
a subscriber D. D subscribes to "foo" which is the responsibility of token
A, and thus only sends the subscription to A. C then wants to publish a
message with topic "foobar", but it can't know which token to send the
message to (correct answer: A) since foo and foobar don't necessarily
belong to the same token. Or at least I'm not aware of any such hash
functions :) I hope that made sense.

/S
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20120109/f83a1896/attachment.htm>


More information about the zeromq-dev mailing list