[zeromq-dev] IOCP

Martin Sustrik sustrik at 250bpm.com
Fri Aug 27 07:30:55 CEST 2010


Hi Mike, Kelly,

>> I would be interested in working on the I/O of zmq, but first I need a
>> comprehensive overview of some of the supported layers. While digging
>> through the guts is possible, it might save some dead end encounters if we
>> got a briefing. There are a couple of key concepts in the locking and
>> threading models that I would like to clarify.
>>
>> From what I have gathered from Martin thus far, to implement the power of
>> Windows IOCP we would need to do a low-level rewrite of the I/O. Without
>> digging for hours, I am not sure how abstracted this is.

Feel free to ask, either by email or on IRC.

Basically, send/recv operations on a socket are encapsulated in 
src/tcp_socket.cpp. You may notice that it's already ifdef'd for the 
Windows platform, so any changes needed can be made there without 
affecting other platforms.

As for polling, you have to create a new poller class (iocp_t). Its API 
should mimic the existing polling classes (select, poll, epoll, etc.):

         //  "poller" concept.
         handle_t add_fd (fd_t fd_, struct i_poll_events *events_);
         void rm_fd (handle_t handle_);
         void set_pollin (handle_t handle_);
         void reset_pollin (handle_t handle_);
         void set_pollout (handle_t handle_);
         void reset_pollout (handle_t handle_);
         void add_timer (struct i_poll_events *events_);
         void cancel_timer (struct i_poll_events *events_);
         int get_load ();
         void start ();
         void stop ();

To plug the new poller class in, have a look at poller.hpp. All it 
requires is a single typedef:

         typedef iocp_t poller_t;

> 	I can definitely agree that there is some learning involved, but I
> usually just dig in and get going.  My initial intentions are mostly just to
> figure out where the hooking points are going to be located.  I was planning
> on starting that probably later tonight once I fix some Oracle crap at work.
>
>> I am going to post some simplified IOCP code on my site soon that
>> presents a simple but complete solution. Most examples I have seen tend
>> to make it hard to see the wood for the trees.
>>
>> http://www.coastrd.com/windows-iocp
>
> 	I've read that article before, very good.  But, as you say, IOCP
> code tends to be very difficult, especially when you start worrying about
> all the little optimizations and details.  My thought right now is to keep
> it as brain-dead simple as possible, probably going so far as a single
> worker thread to start with, just to get rid of the select system and get
> this started.  Heck, it may even be slower than select in that case, but at
> least there would be no more FD_SET limitation. :)

+1

A simple implementation would get the basic infrastructure right. Trying 
to optimise straight away is likely to make the design chaotic.
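A portable model of the "single worker thread" starting point. This is not Win32 code: the queue stands in for the completion port, post() plays the role of PostQueuedCompletionStatus, and the run() loop plays the role of a GetQueuedCompletionStatus loop. The class and member names are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

//  One completion "packet": which socket finished and how many bytes moved.
struct completion_t
{
    std::size_t key;        //  per-socket completion key
    std::size_t bytes;      //  bytes transferred; unused by the stop packet
    bool stop;              //  sentinel telling the worker to exit
};

//  Model of a completion port drained by exactly one worker thread.
class completion_loop_t
{
public:
    typedef std::function<void (std::size_t, std::size_t)> handler_t;

    explicit completion_loop_t (handler_t handler_) :
        handler (handler_),
        worker (&completion_loop_t::run, this)
    {}

    //  Queue a stop packet behind everything already posted, then wait.
    ~completion_loop_t ()
    {
        push (completion_t {0, 0, true});
        worker.join ();
    }

    //  Analogue of PostQueuedCompletionStatus: callable from any thread.
    void post (std::size_t key_, std::size_t bytes_)
    {
        push (completion_t {key_, bytes_, false});
    }

private:
    void push (const completion_t &c_)
    {
        {
            std::lock_guard<std::mutex> lock (sync);
            queue.push_back (c_);
        }
        cond.notify_one ();
    }

    //  Analogue of a GetQueuedCompletionStatus loop with a single worker.
    void run ()
    {
        for (;;) {
            completion_t c;
            {
                std::unique_lock<std::mutex> lock (sync);
                cond.wait (lock, [this] { return !queue.empty (); });
                c = queue.front ();
                queue.pop_front ();
            }
            if (c.stop)
                return;
            handler (c.key, c.bytes);
        }
    }

    handler_t handler;
    std::mutex sync;
    std::condition_variable cond;
    std::deque<completion_t> queue;
    std::thread worker;     //  declared last so it starts after the rest
};
```

With one worker, handlers never run concurrently, so per-socket state needs no extra locking; adding more workers later is exactly where that assumption would need revisiting.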

> 	Some further thoughts regarding your article: I didn't see these
> points mentioned (sorry if you already know all this, but I like to get it
> all out there because IOCP is very much a black art in some areas):
>
> 1.	Any read/write buffers handed to IOCP get pinned and non-swappable
> while outstanding.  If they are not pooled and are scattered around, even
> the smallest pinning will take a 4k chunk out of your 2 GB (XP) or 3 GB
> (Vista/7) VM limit (or larger chunks on some machines).  So, if you intend
> to serve, say, 1000+ sockets, you may not want to feed buffers to IOCP,
> since it can quickly run you out of available VM on 32-bit machines.  But
> you then give up the zero-copy ability and pay for it in extra latency.
>
> 2.	Another source of latency I didn't see mentioned is that, especially
> with zeroed (SO_RCVBUF set to 0) socket buffers, you want to have multiple
> outstanding reads queued up at any given time.  This means reordering on
> the backend, which can be a major hassle and still requires memory copies
> to present a contiguous stream of data to the users unless they can work
> on a chunked stream.  The best number and size of outstanding reads is
> very "experimental"; we never did find a good way to "guess" it that worked
> consistently across different machines.  Hell, sometimes even 2 "identical"
> Dell blades showed notable differences.  (Dell is notorious for using
> different series of chips, which can throw things off.)
>
> 3.	In addition to #2, when you have multiple outstanding reads, you
> have to deal with reordering on the backend as mentioned.  Unfortunately,
> if done incorrectly, this can cause issues and, yet again, memory copies,
> at which point you may as well go back to letting Win32 do the buffering
> and forget about the zero-copy design.  The solution I used for 40K Online
> was a lockless circular queue with acquire/commit/release semantics in
> conjunction with potentially "partial" chunks.  This was actually pretty
> simple but would take up considerable space to describe.  Let me know if
> you want more details.
>
> 	Other than those items, I think your article covered the rest.  As
> a final note, Boost ASIO, as an example of IOCP use, is hugely complicated
> and takes very poor advantage of IOCP.  Yet it is actually pretty fast in
> fairly comprehensive comparisons.  It does not leverage zero-copy,
> multiple outstanding reads or any of the really focused items, but its
> performance is roughly comparable to the kqueue and epoll versions running
> on identical hardware.
>
> 	I'm all for shooting for the moon on this, don't get me wrong.  But
> to start with, I like your suggestion to keep it as simple and clean as
> possible.  We can deal with all the tricks of IOCP from that starting point
> later.
>
> 	For the moment though, I'm going to set up an SVN repository for
> hacking around on this (unless we can get a branch set up somewhere else?)

Go to http://github.com/zeromq/zeromq2 and click the 'fork' button.

You'll have to use git then, which may be a pain in the ass at first if 
you are not familiar with it, but it _really_ is a better collaboration 
tool than SVN.

> while I learn and
> probably make notes on my wiki server.  If you would like access, let me
> know.  Oh, and I also made a Premake4 build for zmq, just because I'm
> Windows/OS X-centric and tend to use Xcode 3 and VS2010 as my primary
> editors and debuggers.  I'll pop that into SVN once I get zmq
> in there.

Martin

PS. I am cc'ing the mailing list so that people know what we are cooking.


