[zeromq-dev] Proposal for a revised zlist API

Pieter Hintjens ph at imatix.com
Sun Sep 7 19:11:52 CEST 2014


Hi Goswin,

So the upshot of this is, (a) zlist is as-was, and represents a
simple, stupid list; (b) zhash has some of the new stuff like item
destructor and duplicator, at the cost of slight breakage in
zhash_dup; (c) zring is now a hybrid list+hash container that starts
to look like something fun.

Zyre and Zbroker some other projects work correctly on the new API,
without changes.

I switched zcertstore over to zring and it works nicely. The problem
of one item in two containers disappears because we delegate
everything to zring. So no need for that user pointer, and so
container destructors remain compatible and simple.

Further steps? It might be useful to add a custom hash calculator now,
which takes the item contents and key and returns a pre-hashkey (8
bytes?) that zhash can re-hash minimally. Make it the same size as a
void *, and we can do fun things like use the item reference as-is as
pre-hashkey. This is a the second most common use case after access by
string key, IMO.

It is probably profitable now to switch zloop and zsys over to zring,
and to measure before/after performance with many inserts/delete. I
may do that later... for now I am going to leave CZMQ alone for a few
days.

-Pieter


On Sat, Sep 6, 2014 at 2:46 PM, Pieter Hintjens <ph at imatix.com> wrote:
> Also, sorted lists... it makes sense, however I think what we're
> really wanting is a container provides both keyed and sequential
> (sorted) access. Either a tree (there was a ztree class that I killed
> a while back, as it had imported a lot of horrid code from elsewhere),
> or a combination hash/ring.
>
> On Sat, Sep 6, 2014 at 11:27 AM, Pieter Hintjens <ph at imatix.com> wrote:
>> OK, after spending way too long on this, I'd like to revert all
>> changes to zlist, deprecate that class, and make zring the new V3 list
>> container, doing everything properly and cleanly. This makes it very
>> easy for developers to spot old vs new constructs.
>>
>> I think we've gone too far down the rabbit hole. Anyone using lists in
>> anger wants a doubly-linked list.
>>
>> Let's continue the careful and slow evolution of zring then, and use it in CZMQ.
>>
>> I'm still undecided on the void *user argument as it prevents us using
>> standard destructors, which is such a fine thing. I'm going to look
>> for other ways to do this.
>>
>> -Pieter
>>
>> On Fri, Sep 5, 2014 at 7:14 PM, Goswin von Brederlow <goswin-v-b at web.de> wrote:
>>> On Fri, Sep 05, 2014 at 05:49:20PM +0200, Pieter Hintjens wrote:
>>>> On Fri, Sep 5, 2014 at 3:56 PM, Goswin von Brederlow <goswin-v-b at web.de> wrote:
>>>>
>>>> > Just saw your commit.
>>>>
>>>> You only found 17 issues? :-) I was coding till 1am, surprising it
>>>> worked at all.
>>>>
>>>> > 1) Small typos
>>>>
>>>> Fixed.
>>>>
>>>> > 2) zlist_purge does not destroy the cursor allowing zlist_next and
>>>> > zlist_item access to freed memory.
>>>>
>>>> Nice catch, fixed.
>>>>
>>>> > 3) change cursor to "node_t **" but that would be internal.
>>>>
>>>> I'm not yet sure of that change, will try something else first
>>>> (separate head/tail item as we have in zring, I think).
>>>
>>> The insert_before needs access to the previous node->next. You could
>>> have the cursor always point to the previous node and handle the
>>> special cases of an empty list or one with just one item. Using the
>>> 2-stars trick avoids those special cases.
>>>
>>>> > 4) Now there is zlist_detach and zlist_delete. How about
>>>> > zlist_detach_current and zlist_delete_current for the cursor based
>>>> > flavour of the same? (needs 3)
>>>>
>>>> Not needed, you pass NULL as item and that does the job. I don't like
>>>> long method names either... clumsy to use.
>>>
>>> Great, didn't think of that.
>>>
>>>> > 5) zlist_set_comparator for a container wide compare function.
>>>>
>>>> Tried that and decided against it, due to not wanting to break
>>>> zlist_sort and not really seeing the benefit (in fact it's an extra
>>>> line of code and extra risk of errors).
>>>
>>> I didn't add that for sort but for find. :) Find is probably done far
>>> more often than sort and I use find in detach and delete too.
>>> zlist_find can be used in zloop, zmsg, zpoller and zsys (some should
>>> be rewritten to zlist_delete/detach) if that uses the comparator.
>>>
>>>> > 6) zlist_sort could allow NULL as second argument to use the container
>>>> > compare (or compare by address, however useless that is :). (needs 5)
>>>>
>>>> Yes, that was my second idea, except it leaves the method messy and I
>>>> still didn't see the benefit.
>>>
>>> It adds one line to the code if the comparator is initialized with an
>>> physical equality. Didn't seem so bad.
>>>
>>>> > 7) zlist_find can be added without name collision. (needs 5)
>>>>
>>>> So searching on content rather than item value, yes, it could work.
>>>
>>> zloop searches by timer_id and socket. zsys searches on handler.
>>> That's 3 uses already just in czmq itself.
>>>
>>>> > 8) zlist_detach and zlist_delete need to specify their affect on the
>>>> > cursor.
>>>>
>>>> zlist_detach and zlist_delete are works in progress, I left this for
>>>> today as I had other work. They should end up working and looking like
>>>> their zring counterparts.
>>>>
>>>> > 9) I would suggest having zlist_detach and zlist_delete use container
>>>> > compare function.
>>>>
>>>> OK, I'm partially convinced, however I don't have a use case, this is
>>>> engineering because "it can work" which I'd rather avoid. So far
>>>> comparison on the item void * has always worked fine.
>>>
>>> - zsys_close does detach sockets by their handle instead of item.
>>> - zloop deletes reader by their socket, poller by socket or fd
>>>
>>>> > 10) I would suggest having zlist_detach and zlist_delete use zlist_find
>>>> > + zlist_*_current. (needs 7, solves 9 implicitly)
>>>>
>>>> Again, problem statements first, solutions second.
>>>
>>> The problem is code duplication vs. reuse. Both delete and detach
>>> currently implement their own find loop which would be identical to
>>> zlist_find. 3 times the same code is 3 times the work to maintain.
>>>
>>>> > 11) zlist_t still has a "bool autofree;". The old behaviour could be
>>>> > achieved by having zlist_autofree set a duplicator (strdup) and
>>>> > destructor (something that calls free) instead.
>>>>
>>>> Indeed. I'd randomly thought that and didn't do it yet. My goal today
>>>> was the general shape of the API and keeping existing code from not
>>>> breaking.
>>>>
>>>> > 12) Can we add the "void *user" argument to czmq_destructor and
>>>> > czmq_comparator? That would be realy usefull for keeping items in 2
>>>> > container like zcertstore does among other things.
>>>>
>>>> Haven't understood the problem yet enough to evaluate the addition.
>>>> I'll get there eventually, if you can wait.
>>>
>>> I came to this idea because I needed it for my PPPP protocol
>>> implementation. The timeout handler for an item needs access to the
>>> overall peer structure (to send heartbeats to the peers socket) and
>>> protocol structures (to remove dead peers after too many timeouts). I
>>> didn't want to store the same identical pointer in thousands if not
>>> millions of items. Storing it once in the container seemed smarter and
>>> uses far less code.
>>>
>>> Check out how I changed the zcertstore to use a container wide
>>> destructor on the zlist for a simple use case. Certificates are kept
>>> both in a list and a hashtable. The list is setup so that deleting a
>>> cert from the list will automatically delete the cert from the
>>> hashtable too and then destroy it. This can also be setup that
>>> deleting from either container deletes from both.
>>>
>>>> > 13) zlist_first, zlist_next and zlist_last each duplicate zlist_item
>>>> > at the end.
>>>>
>>>> Same in zring. I'll fix that. zlist_head too.
>>>>
>>>> > 14) zlist_insert_before could be added without name collision. (needs 3)
>>>>
>>>> I've added and removed that twice now, and still don't see a use case for it.
>>>
>>> Inserting an item into a sorted list. More importantly merging 2
>>> sorted lists into one:
>>>
>>>     item_t *item = zlist_first (list2);
>>>     while (item) {
>>>         zlist_find (list1, item);
>>>         zlist_insert_before (list1, item);
>>>         item = zlist_next (list2);
>>>     }
>>>
>>> zlist_insert_before can be avoided if there is a zlist_peek_next
>>> (return the next item without advancing the cursor). But that is less
>>> elegant. But we can leave that out. I don't have a use for that for
>>> zlist.
>>>
>>>> > 15) zlist_insert(_after) could be added without name collision.
>>>> > 16) zlist_insert_sorted could be added without name collision. (needs 3+5+7)
>>>>
>>>> OK, enough! :-) It is painful to wrangle APIs, and it gets worse
>>>> whenever we add stuff we do not absolutely need. Please come with
>>>> problems that we cannot reasonably solve using the current APIs, or
>>>> places where we can make valuable improvements.
>>>
>>> Messages can get lost if a peer reconnects. In the PPPP protocol this
>>> gets detected and the lost messages get retransmitted. What happens in
>>> my implementation is that I have a zlist for incoming messages. When a
>>> message is received that isn't the next message it is inserted into
>>> the incoming list instead of passed to the application. Then when the
>>> lost message is retransmitted that is passed to the application and
>>> the incoming list needs to be processed in order of message sequence
>>> number. By using zlist_insert_sorted that is made simple.
>>>
>>> I need zlist_insert_sorted or zlist_insert_before for this. Either one
>>> will do. The reason I made zlist_insert_before was just so I could
>>> implement zlist_insert_sorted trivially.
>>>
>>> Arguably this could use a new class for a sorted list/queue/tree that
>>> uses a heap or binary tree for efficiency. But my use case should
>>> happen rarely and only for a small number of messages in the list so
>>> it didn't seem worth writing a whole new class for it just to be a bit
>>> more efficient.
>>>
>>> If you are set against it though I can switch this over to use a
>>> zhash. Processing would then try to find the next sequence number in
>>> the zhash, if found send to app, delete and repeat. Actually thinking
>>> about it that might actually be better all around. Would have O(1)
>>> complexity instead of O(n) even if n is bound to be small in this case.
>>>
>>> Ok, you have me convinced. I will rewrite my use case for this.
>>> Sometimes just writing about use cases and explaining them gives you
>>> better ideas.
>>>
>>>> > 17) zlist_last is a problem for 4 since it updates the cursor.
>>>> > Consider this stupid way to purge the list in reverse order:
>>>> >
>>>> >     while (zlist_last (list)) zlist_delete_current (list);
>>>>
>>>> zlist_purge (). I don't really care about stupid use cases.
>>>>
>>>> However, zlist_last uses the list tail. Any delete/detach has to
>>>> update the tail.
>>>
>>> Agreed. The list tail is also needed for append. But zlist_last
>>> doesn't need to change the cursor (unless it points to the last item
>>> and then it changes to NULL or head).
>>>
>>>> > If someone needs to process a zlist from the back like that then they
>>>> > should be using a zring instead. So I think this would be safe.
>>>>
>>>> Indeed. There are some basic styles we use lists for, and that's all I
>>>> really care about. Weird constructs don't count as valid use cases
>>>> since they're usually wrong in other subtle ways anyhow.
>>>
>>> Is that an agreement to make zlist_last() leave the cursor alone then?
>>> I can't think of any sane use of zlist_last() changing the cursor,
>>> which would introduce a method do detach/delete the last element in a
>>> list at the cost of O(n) complexity.
>>>
>>> As I see it leaving the cursor alone breaks nothing and keeps things sane.
>>>
>>>> > 18) You've changed the API, the cursor behaviour, slightly already. I
>>>> > wonder if any code relies on the destructive cursor behaviour of
>>>> > zlist_push, zlist_append, zlist_remove (and any others I forgot). It
>>>> > is possible to have code like this:
>>>>
>>>> I'm going to deprecate that notion that zlist_next() makes sense
>>>> without a zlist_first(). It's errorprone and silly to have magical
>>>> assumptions about list state that you can't visually verify, and can't
>>>> copy/paste as a block.
>>>>
>>>> -Pieter
>>>
>>> No complains from me. Just wanted to point out the change. Partly as
>>> argument to allow the change in 17 too.
>>>
>>> MfG
>>>         Goswin
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev



More information about the zeromq-dev mailing list