[zeromq-dev] 0MQ vs TCP sockets - survey
Pieter Hintjens
ph at imatix.com
Wed Oct 27 09:12:44 CEST 2010
Kelly, a great report, thanks.
On Wed, Oct 27, 2010 at 2:38 AM, Kelly Brock <Kerby at inocode.com> wrote:
> Hi Pieter,
>
>> Hi y'All,
>>
>> As a kind of a survey / test to see if you're awake, I'd like to
>> solicit your views, as 0MQ users and contributors, about the most
>> significant differences between 0MQ and traditional TCP sockets. I.e.
>> how would you convince your boss that it was worth using 0MQ rather
>> than just coding your app the old fashioned way...?
>>
>> Feel free to +1 / 'this' other people's comments so we know what
>> tickles you the most. I'll collect the results in a comparison
>> document on the wiki.
>
> For the simple parts of my work I'm definitely able to sell 0mq
> already and I'm actively using it. I've wrapped up the details and hidden
> everything behind some C++ to make it look a little more traditional so some
> hard nosed snobs didn't bitch, but it works like a charm for a simple
> distributed asset processor system.
>
> My initial plans for the tool in question actually had some folks
> very scared about what I was doing. It sounded a hell of a lot more
> complicated than it was and 0mq actually helped sell it given the tiny and
> easy to use api. They all worried about the distributed nature of the tool,
> they all worried about threading issues, they all worried it would be too
> slow even when distributed, etc. 0mq helped in a couple places in
> selling this:
>
> 1. Given that I have to deal with around 1 million files which add up
> to around 30-40 gigs of data, or near a terabyte if I had to process the
> source data, going distributed was an obvious answer. Everyone was thinking
> in terms of traditional sockets and all the problems that involved. I
> showed them 0mq and how easy it was, they dropped that argument.
>
> 2. Going distributed "and" multi-threaded, ack, once again, showing the
> 0mq model of single threaded "services" even if they happen to run in the
> same process, helped win them over.
>
> 3. Time involved in getting things set up. This was even more
> difficult, but they figured we need the tool so give him time. I had it
> basically functional in about a week and very little of that time was
> dealing with networking or anything, it was mostly just getting the SQL
> stuff set up and functional. Needless to say, I used the extra time to clean
> up a lot and make it clean and easy for our contractors to work with when we
> brought them onboard. They were very happy with it being done on time and
> the lack of problems scaling up the team of people working on this.
>
>
> Now, that's the good stuff. I have a couple complaints and I
> realize some of them are repeats and you have work around solutions and
> reasons you don't like them, but I'm going to repeat them anyway:
>
> 1. Lack of connect/disconnect notification. I know, talked about to
> death, but there are simply certain patterns where it is the
> cleanest/easiest and most useful way to approach notifying workers that
> their task master died or has become unavailable for some reason. Some
> reasons and ways I would use this knowledge:
>
> a) For the task master, I would start a single worker on the local
> machine and start processing. If results are not pending for say 1 second,
> bring up another worker on some other machine. (I use zeroconf to find
> other possible worker machines.) If any worker dies/disconnects,
> immediately recycle outstanding work items to other machines and potentially
> bring up another worker to replace the old one. (Down side, would need to
> forcibly disconnect and ignore any outstanding data on other sockets
> associated with the worker. Part of the reason I like shared sockets when
> head of the line is not an issue.)
>
> b) For a worker, any disconnection from the task master causes an
> immediate shutdown as there is no purpose to exist anymore. Again, because
> I use zeroconf and migrate work wherever there may be free CPU/memory,
> this just makes sense for my distribution model. I share a group of
> machines which all run a little controller daemon/service with other tools
> and processes; the controller advertises cpu/memory/disk/etc stats and a
> task master can pick the lowest utilized machine to start a new service on
> from that list. I don't want processes sitting around in memory if the task
> master dies.
>
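A common workaround when the transport does not report disconnects is an application-level heartbeat: each side records when it last heard from a peer and treats a long silence as a disconnect. The sketch below is only an illustration of that idea, not anything 0MQ provides. The name PeerMonitor, the one-second timeout (borrowed from the "say 1 second" figure above) and the injectable clock are all assumptions.

```python
import time


class PeerMonitor:
    """Hypothetical helper: treats a peer as gone after a period of silence."""

    def __init__(self, timeout=1.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # peer name -> time of last heartbeat

    def beat(self, peer):
        """Record that any message (heartbeat or result) arrived from peer."""
        self.last_seen[peer] = self.clock()

    def reap(self):
        """Return peers that have been silent too long and forget them."""
        now = self.clock()
        dead = sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)
        for peer in dead:
            del self.last_seen[peer]
        return dead
```

A task master would call beat() for every message received from a worker and reap() on each pass of its loop, recycling the work of any worker returned. A worker could run the same check against the task master and exit when it is reaped.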
> 2. Transacted messages or at least "acks" and knowledge of where it
> went. Eek, I know, don't like those at all. But, especially in the case of
> the way I'm using 0mq, it would make things "sooooooo" much easier since it
> is exactly what I need for this tool. I.e. I send out a work item, I "need"
> the result of the processing or it invalidates the entire run. I currently
> send the results back on a separate push channel which basically means that
> any worker that fails causes the task master to invalidate the entire run
> since it doesn't know what went where and how to recover and send out repeat
> requests. I "could" add some timeout logic, and probably will; it is just a
> case that this seems like something 0mq should support without user side
> work arounds.
>
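The timeout logic mentioned above could look roughly like the following: the task master keeps a ledger of work items it has sent, removes an item when its result arrives, and resends anything that stays outstanding past a deadline. This is a user-side sketch under assumptions, not a 0MQ feature. WorkLedger, its method names and the five-second timeout are hypothetical.

```python
import time


class WorkLedger:
    """Hypothetical record of work items sent but not yet answered."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout      # seconds to wait for a result (assumed value)
        self.clock = clock          # injectable so the logic can be tested
        self.pending = {}           # item id -> (payload, deadline)

    def sent(self, item_id, payload):
        """Call when a work item is pushed out to the workers."""
        self.pending[item_id] = (payload, self.clock() + self.timeout)

    def acked(self, item_id):
        """Call when a result arrives; False means it was unknown or a duplicate."""
        return self.pending.pop(item_id, None) is not None

    def overdue(self):
        """Return (id, payload) pairs to resend and give each a new deadline."""
        now = self.clock()
        late = [(i, p) for i, (p, d) in self.pending.items() if d <= now]
        for item_id, payload in late:
            self.pending[item_id] = (payload, now + self.timeout)
        return late
```

Because a slow worker may still answer after its item has been resent, results can arrive twice; acked() returning False is the signal to drop the duplicate rather than invalidate the run.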
> 3. I would greatly prefer a reactor pattern over the current
> select-like behavior when dealing with multiple sockets. I know there was some
> talk of this but I have not seen much further discussion lately.
>
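For readers unfamiliar with the distinction: with a select-like call the application asks which sockets are ready and then decides what to do with each, while a reactor lets the application register a handler per socket and does the dispatching itself. The sketch below shows the shape of a reactor using only the Python standard library (selectors and a local socket pair), not 0MQ sockets. The Reactor class and its method names are assumptions for illustration.

```python
import selectors
import socket


class Reactor:
    """Hypothetical minimal reactor: one read handler per registered socket."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def register(self, sock, handler):
        # The handler is stored as the selector key's data and called on readiness.
        self.selector.register(sock, selectors.EVENT_READ, handler)

    def poll_once(self, timeout=None):
        """Wait for ready sockets, dispatch each to its handler, return the count."""
        events = self.selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)

    def close(self):
        self.selector.close()


# Usage: the application only supplies callbacks; the reactor does the polling.
received = []
sender, receiver = socket.socketpair()
reactor = Reactor()
reactor.register(receiver, lambda s: received.append(s.recv(1024)))
sender.send(b"work item")
dispatched = reactor.poll_once(timeout=1.0)
reactor.close()
sender.close()
receiver.close()
```

A real loop would call poll_once() repeatedly and would usually add timers as well, which is where heartbeat or timeout checks would hook in.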
>
> Those are basically the only complaints I have; unfortunately they
> seem pretty significant considering how the discussions go on each of them.
>
>
> As a final note: If you are worried about adding to the header of
> the basic messages in order to supply ack/transaction etc, you might
> consider using a simple bit slicing scheme for the wire format. One of the
> items I was very happy with in my homebrew networking was the low overhead
> but high flexibility of the packet framing. Basically it was the following:
>
> Total Packet Size
> 7 bits per byte of "size" data, high bit designates "end". So, 1
> byte for any number lower than +-64. I.e. small packets. 2 bytes for 16k,
> etc. The value is encoded as: "(abs( length ) << 1) | (length > 0 ? 0 : 1)" in
> order to prevent small negative values from taking 5 bytes.
>
> Optional Data
> The "sign" of the total packet size indicated if there was an
> "optional" field in the packet. If total packet size is negative the next
> byte afterwards designates an options length. If the first byte of the
> options length is terminal and negative i.e. has high and low bit set, then
> it is just 6 bits of "flags" however you want to use them. If not it
> designates that a block of control data exists and its length.
>
> Actual Payload
> Absolute value of "Total Packet Size" - if exists( optional data
> length encoded size + absolute value of optional data length ).
>
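One possible reading of the "Total Packet Size" field described above, as a sketch: the sign is moved into the lowest bit so that small negative lengths stay small (a plain two's-complement 32-bit negative would need five 7-bit groups), and the result is written 7 bits per byte with the high bit set on the final byte. The description does not say which group comes first, so least-significant-first is an assumption here, as are the function names. The optional-data and payload fields are not modelled.

```python
def encode_size(length):
    """Encode a signed length: sign in bit 0, 7 bits per byte, high bit = end."""
    value = (abs(length) << 1) | (0 if length > 0 else 1)
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(group | 0x80)   # high bit marks the final byte
            return bytes(out)
        out.append(group)              # more groups follow, high bit clear


def decode_size(data):
    """Return (length, bytes_consumed) for a size field at the start of data."""
    value = 0
    shift = 0
    for used, byte in enumerate(data, start=1):
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:                # final byte reached
            magnitude = value >> 1
            return (-magnitude if value & 1 else magnitude), used
    raise ValueError("size field is not terminated")
```

Under this reading one byte covers magnitudes up to 63 and two bytes cover magnitudes up to 8191, since one of the 14 available bits carries the sign.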
> For your uses, you could modify the encoding such that the first
> byte of length also encodes the "more" flag or whatever you use it for, and
> that still gives you +-32 byte packages at 1 byte and the optional data
> field available.
>
>
> Sorry to go on about this, you got me to thinking about my "desired
> ultimate" solution, which 0mq is only partially there in my opinion. Again
> though, I have different requirements than what it seems 0mq was written
> for.
>
> KB
>
--
-
Pieter Hintjens
iMatix - www.imatix.com