[zeromq-dev] IOCP
Martin Sustrik
sustrik at 250bpm.com
Fri Aug 27 07:30:55 CEST 2010
Hi Mike, Kelly,
>> I would be interested in working on the I/O of zmq, but first I need a
>> comprehensive overview of some of the supported layers. While digging
>> through the guts is possible, it might save some dead end encounters if we
>> got a briefing. There are a couple of key concepts in the locking and
>> threading models that I would like to clarify.
>>
>> From what I have gathered from Martin thus far, to harness the power of
>> Windows IOCP we would need to do a low-level rewrite of the I/O. Without
>> digging for hours, I am not sure how abstracted this is.
Feel free to ask, either by email or on IRC.
Basically, send/recv operations on a socket are encapsulated in
src/tcp_socket.cpp. You may notice that it's already ifdef'd for the Windows
platform, so any changes needed can be made there without affecting
other platforms.
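Here is a minimal sketch of the kind of per-platform branching meant here. It uses a hypothetical helper, write_some(), and is not the actual tcp_socket_t code. The Windows branch checks WSAGetLastError() for WSAEWOULDBLOCK and the POSIX branch checks errno, so a Windows-only change never touches the other path. The return convention (bytes sent, 0 for "would block", -1 for a hard error) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET fd_t;
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int fd_t;
#endif

// Hypothetical helper (not zmq's real API): returns the number of bytes
// sent, 0 if the socket would block (retry later), or -1 on a hard error.
inline int write_some (fd_t fd_, const void *data_, std::size_t size_)
{
#ifdef _WIN32
    // Winsock: send() takes const char* and int; errors via WSAGetLastError().
    int nbytes = send (fd_, (const char *) data_, (int) size_, 0);
    if (nbytes == SOCKET_ERROR) {
        if (WSAGetLastError () == WSAEWOULDBLOCK)
            return 0;
        return -1;
    }
    return nbytes;
#else
    // POSIX: errors are reported through errno.
    ssize_t nbytes = send (fd_, data_, size_, 0);
    if (nbytes == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return 0;
        return -1;
    }
    return (int) nbytes;
#endif
}
```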
As for polling, you have to create a new poller class (iocp_t). The API
should mimic existing polling classes (select, poll, epoll, etc.):
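// "poller" concept.
handle_t add_fd (fd_t fd_, struct i_poll_events *events_);  // register fd_; events_ gets its callbacks
void rm_fd (handle_t handle_);                     // unregister a previously added fd
void set_pollin (handle_t handle_);                // start watching for readability
void reset_pollin (handle_t handle_);              // stop watching for readability
void set_pollout (handle_t handle_);               // start watching for writability
void reset_pollout (handle_t handle_);             // stop watching for writability
void add_timer (struct i_poll_events *events_);    // request a timer event for events_
void cancel_timer (struct i_poll_events *events_); // cancel that timer request
int get_load ();                                   // current load, used to pick the least busy I/O thread
void start ();                                     // launch the poller's worker thread
void stop ();                                      // ask the worker thread to exit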
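To make the concept concrete, here is a hypothetical skeleton that satisfies the interface above using bookkeeping only. The i_poll_events definition is a simplified stand-in, and the class name skeleton_poller_t is invented. The actual wait on a completion port, which is the part an iocp_t would add, is deliberately left out. In this sketch start() and stop() only flip a flag.

```cpp
#include <cassert>
#include <list>

typedef int fd_t;  // assumption: stand-in for zmq's fd_t

// Simplified stand-in for zmq's callback interface (assumed shape).
struct i_poll_events
{
    virtual ~i_poll_events () {}
    virtual void in_event () = 0;
    virtual void out_event () = 0;
    virtual void timer_event () = 0;
};

// Hypothetical skeleton: bookkeeping only, no OS-level waiting.
class skeleton_poller_t
{
public:
    struct entry_t
    {
        fd_t fd;
        i_poll_events *events;
        bool pollin;
        bool pollout;
    };
    typedef entry_t *handle_t;

    handle_t add_fd (fd_t fd_, i_poll_events *events_)
    {
        entry_t e = {fd_, events_, false, false};
        entries.push_back (e);
        return &entries.back ();  // std::list nodes keep stable addresses
    }
    void rm_fd (handle_t handle_)
    {
        for (std::list<entry_t>::iterator it = entries.begin ();
              it != entries.end (); ++it)
            if (&*it == handle_) {
                entries.erase (it);
                return;
            }
    }
    void set_pollin (handle_t handle_) { handle_->pollin = true; }
    void reset_pollin (handle_t handle_) { handle_->pollin = false; }
    void set_pollout (handle_t handle_) { handle_->pollout = true; }
    void reset_pollout (handle_t handle_) { handle_->pollout = false; }
    void add_timer (i_poll_events *events_) { timers.push_back (events_); }
    void cancel_timer (i_poll_events *events_) { timers.remove (events_); }
    int get_load () { return (int) entries.size (); }  // load = registered fds
    void start () { running = true; }   // a real poller would spawn its worker here
    void stop () { running = false; }   // ...and signal the worker to exit here
    bool is_running () const { return running; }

private:
    std::list<entry_t> entries;
    std::list<i_poll_events *> timers;
    bool running = false;
};
```

Keeping registrations in a std::list lets a raw pointer serve as handle_t. A real implementation may choose a different handle type.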
To plug the new poller class in, have a look at poller.hpp. All it
requires is a single typedef:
typedef iocp_t poller_t;
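The point of the single typedef is that everything else refers only to poller_t. The sketch below illustrates this with hypothetical stand-in classes and an assumed selection rule: IOCP on Windows, select() elsewhere. It does not reproduce the real poller.hpp, which chooses among several backends based on platform macros.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins: real classes would implement the whole poller
// concept (add_fd, rm_fd, set_pollin, ...), not just name().
struct select_t { static const char *name () { return "select"; } };
struct iocp_t { static const char *name () { return "iocp"; } };

// Assumed selection rule, for illustration only.
#if defined _WIN32
typedef iocp_t poller_t;
#else
typedef select_t poller_t;
#endif

// Consumer code names only poller_t, so it compiles unchanged
// whichever backend was selected above.
struct io_thread_t
{
    poller_t poller;
    const char *backend () const { return poller_t::name (); }
};
```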
> I can definitely agree that there is some learning involved, but I
> usually just dig in and get going. My initial intentions are mostly just to
> figure out where the hooking points are going to be located. I was planning
> on starting that probably later tonight once I fix some Oracle crap at work.
>
>> I am going to post some simplified IOCP code on my site soon that presents
>> a simple but complete solution. Most examples I have seen tend to make it
>> hard to see the wood for the trees.
>>
>> http://www.coastrd.com/windows-iocp
>
> I've read that article before, very good. But, as you say, IOCP
> code tends to be very difficult, especially when you start worrying about
> all the little optimizations and details. My thought right now is to keep
> it as brain-dead simple as possible, probably going so far as a single worker
> thread to start with, just to get rid of the select-based system and get this
> started. Heck, it may even be slower than select in that case, but at least
> there would be no more FD_SETSIZE limitation. :)
+1
Simple implementation would get the basic infrastructure right. Trying
to optimise straight away is likely to make the design chaotic.
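The single-worker-thread starting point can be modelled portably as shown below. This is an analogy rather than Windows code. post() stands in for a completed overlapped operation being queued (the role PostQueuedCompletionStatus plays), and pop() stands in for GetQueuedCompletionStatus. The key value -1 is a hypothetical shutdown marker. Because only one worker runs, completions are handled one at a time and the handlers need no locking of their own.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Analogue of one dequeued completion (not a Windows structure).
struct completion_t
{
    int key;    // like a completion key, e.g. which socket finished
    int bytes;  // like the bytes-transferred count
};

// Portable stand-in for a completion port: a blocking FIFO.
class completion_queue_t
{
public:
    void post (completion_t c_)
    {
        std::lock_guard<std::mutex> lock (mtx);
        queue.push_back (c_);
        cv.notify_one ();
    }
    // Blocks until a completion is available, then removes it.
    completion_t pop ()
    {
        std::unique_lock<std::mutex> lock (mtx);
        cv.wait (lock, [this] { return !queue.empty (); });
        completion_t c = queue.front ();
        queue.pop_front ();
        return c;
    }
private:
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<completion_t> queue;
};

// The single worker: dispatch completions in arrival order until the
// hypothetical shutdown key (-1) is seen.
inline void run_worker (completion_queue_t &q_,
    std::function<void (const completion_t &)> on_complete_)
{
    for (;;) {
        completion_t c = q_.pop ();
        if (c.key == -1)
            break;
        on_complete_ (c);
    }
}
```

Growing this into a pool of workers later would mean the handlers must become thread-safe. That is exactly the complexity the simple design postpones.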
> Some further thoughts in regards to your article, I didn't see these
> points mentioned (sorry if you already know all this, but I like to get it
> all out there due to the fact that IOCP is a very black art in some areas):
>
> 1. Any read/write buffers handed to IOCP get pinned and non-swappable while
> outstanding. If they are not pooled and are scattered around, even the smallest
> pinned buffer will take a 4K chunk out of your 2 GB (XP) or 3 GB (Vista/7) VM
> limit (or larger chunks on some machines). So, if you intend to handle,
> say, 1000+ sockets, you may not want to feed buffers to IOCP, since it can
> quickly run you out of available VM on 32-bit machines. But you then trade
> away the zero-copy ability and accept extra latency.
>
> 2. Another source of latency I didn't see mentioned: especially with
> zeroed socket buffers (SO_RCVBUF set to 0), you want to have multiple
> outstanding reads queued up at any given time. This means reordering on the
> backend, which can be a major hassle and still requires memory copies to
> present a contiguous stream of data to the users unless they can work on a
> chunked stream. The best number and size of outstanding reads is very
> "experimental"; we never found a way to "guess" it that worked consistently
> across different machines. Hell, even 2 "identical" Dell blades sometimes
> showed notable differences. (Dell is notorious for using different series
> chips, which can throw things off.)
>
> 3. In addition to #2, when you have multiple outstanding reads, you
> have to deal with reordering on the backend as mentioned. Unfortunately, if
> done incorrectly, reordering can cause issues and, yet again, memory copies,
> at which point you may as well go back to letting Win32 do the buffering and
> forget about the zero-copy design. The solution I used for 40K Online was a
> lockless circular queue with acquire/commit/release semantics in
> conjunction with potentially "partial" chunks. This was actually pretty
> simple but would take considerable space to describe. Let me know if you
> want more details.
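Item 1 above is usually mitigated by carving all I/O buffers out of one contiguous allocation, so pinned buffers share pages instead of each locking its own. Here is some rough arithmetic, assuming 4 KB pages. 1000 separately allocated 512-byte buffers could each land on a different page and lock up to 1000 pages (about 4 MB). Packed contiguously, they need only 1000 × 512 / 4096 = 125 pages. The pool below is a hypothetical sketch. Real Windows code would more likely obtain the slab from VirtualAlloc to get page-aligned memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size buffer pool: every buffer lives inside one
// contiguous slab, so buffers given to overlapped reads share pages.
class buffer_pool_t
{
public:
    buffer_pool_t (std::size_t buf_size_, std::size_t count_) :
        buf_size (buf_size_), slab (buf_size_ * count_)
    {
        free_list.reserve (count_);
        for (std::size_t i = 0; i != count_; ++i)
            free_list.push_back (&slab [i * buf_size_]);
    }
    // Returns nullptr when the pool is exhausted; the caller could then
    // fall back to a copying (non-pinned) read.
    char *acquire ()
    {
        if (free_list.empty ())
            return nullptr;
        char *buf = free_list.back ();
        free_list.pop_back ();
        return buf;
    }
    // Returns a buffer once its overlapped operation has completed.
    void release (char *buf_) { free_list.push_back (buf_); }
    std::size_t buffer_size () const { return buf_size; }
    std::size_t available () const { return free_list.size (); }
private:
    std::size_t buf_size;
    std::vector<char> slab;        // single contiguous allocation
    std::vector<char *> free_list; // buffers not currently outstanding
};
```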
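The reordering described in item 2 can be sketched as follows. The sketch assumes each outstanding read is tagged with a sequence number when it is posted; the class and method names are hypothetical. Completions may arrive in any order, and data is handed to the consumer strictly in sequence. The append into the output string is exactly the memory copy the item warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical reorder stage for multiple outstanding reads.
class reorder_buffer_t
{
public:
    // Called when the read tagged seq_ completes with data_.
    void complete (std::uint64_t seq_, const std::string &data_)
    {
        pending [seq_] = data_;
    }
    // Appends every chunk now contiguous with the stream to out_ and
    // returns how many chunks were delivered.
    std::size_t drain (std::string &out_)
    {
        std::size_t n = 0;
        std::map<std::uint64_t, std::string>::iterator it;
        while ((it = pending.find (next_seq)) != pending.end ()) {
            out_ += it->second;  // the copy needed for a contiguous stream
            pending.erase (it);
            ++next_seq;
            ++n;
        }
        return n;
    }
private:
    std::uint64_t next_seq = 0;                    // next chunk owed to the user
    std::map<std::uint64_t, std::string> pending;  // completed but out of order
};
```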
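The 40K Online design from item 3 is not described here. The following is only one possible reading of "a lockless circular queue with acquire/commit/release semantics" plus partial chunks: a single-producer/single-consumer byte ring. The producer acquires contiguous space, possibly less than requested (a partial chunk), and commits what it wrote. The consumer peeks at contiguous readable bytes and releases as many as it consumed. Head and tail are free-running counters, which is why the capacity must be a power of two.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical SPSC byte ring: one producer thread, one consumer thread.
class spsc_ring_t
{
public:
    explicit spsc_ring_t (std::size_t capacity_pow2_) :
        buf (capacity_pow2_), mask (capacity_pow2_ - 1), head (0), tail (0)
    {
        assert (capacity_pow2_ != 0 && (capacity_pow2_ & mask) == 0);
    }
    // Producer: pointer to contiguous writable space; *got_ receives how
    // many bytes are available (may be fewer than want_: a partial chunk).
    char *acquire (std::size_t want_, std::size_t *got_)
    {
        std::size_t h = head.load (std::memory_order_relaxed);
        std::size_t t = tail.load (std::memory_order_acquire);
        std::size_t free_bytes = buf.size () - (h - t);
        std::size_t to_end = buf.size () - (h & mask);
        *got_ = std::min (want_, std::min (free_bytes, to_end));
        return &buf [h & mask];
    }
    // Producer: publish n_ bytes written into space from acquire().
    void commit (std::size_t n_)
    {
        head.store (head.load (std::memory_order_relaxed) + n_,
            std::memory_order_release);
    }
    // Consumer: pointer to contiguous readable bytes, count in *got_.
    const char *peek (std::size_t *got_)
    {
        std::size_t t = tail.load (std::memory_order_relaxed);
        std::size_t h = head.load (std::memory_order_acquire);
        std::size_t to_end = buf.size () - (t & mask);
        *got_ = std::min (h - t, to_end);
        return &buf [t & mask];
    }
    // Consumer: free n_ bytes; may be fewer than peeked (partial chunk).
    void release (std::size_t n_)
    {
        tail.store (tail.load (std::memory_order_relaxed) + n_,
            std::memory_order_release);
    }
private:
    std::vector<char> buf;
    std::size_t mask;
    std::atomic<std::size_t> head;  // total bytes committed by producer
    std::atomic<std::size_t> tail;  // total bytes released by consumer
};
```

Only one thread may call acquire()/commit() and only one other thread may call peek()/release(). Multiple producers or consumers would need a different design.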
>
> Other than those items, I think your article covered the rest. But
> as a final note: as an example of IOCP use, Boost ASIO is hugely complicated
> and takes very poor advantage of IOCP. Yet it is actually pretty fast in
> fairly comprehensive comparisons. It does not leverage zero-copy,
> multiple outstanding reads or any of the really focused techniques, but its
> performance is relatively comparable to the kqueue and epoll versions
> running on identical hardware.
>
> I'm all for shooting for the moon on this, don't get me wrong. But
> to start with, I like your suggestion to keep it as simple and clean as
> possible. We can deal with all the tricks of IOCP from that starting point
> later.
>
> For the moment though, I'm going to set up an SVN repository for hacking
> around on this (unless we can get a branch set up somewhere else?)
Go to http://github.com/zeromq/zeromq2 and click the 'fork' button.
You'll have to use git then, which may be a pain in the ass at first if you
are not familiar with it, but it _really_ is a better collaboration tool
than SVN.
> while I learn and
> probably make notes on my wiki server. If you would like access, let me
> know. Oh, and I also made a Premake4 build for Zmq just because I'm
> Windows/OS X-centric and tend to use Xcode 3 and VS2010 as my primary
> editors and debugging tools. I'll pop that into the SVN repo once I get zmq
> in there.
Martin
PS. I am cc'ing the mailing list so that people know what we are cooking.