[zeromq-dev] Efficiency for very few short messages

Jason Smith jason.nevar.smith at gmail.com
Tue Jan 29 07:58:48 CET 2013

Hi Dan,

I have found the issue with the processing times.

	for (iequation = 0; iequation < nequation; iequation++) {
		zmq_msg_t msg;
		/* zmq_msg_init_size() initialises the message itself, so no
		   separate zmq_msg_init() call is needed first */
		rc = zmq_msg_init_size (&msg, 8);
		memset (zmq_msg_data (&msg), 'A', 8);
		ithread = messageCounter % nthread;  /* <---- RIGHT HERE: fixed round-robin */
		messageCounter++;
		void * socket = socketsSend[ithread];
		rc = zmq_sendmsg (socket, &msg, 0);
	}

The code above doesn't take into account the time it takes to process each
equation. It treats them all as being of equal "work", which they
don't appear to be. This means that some threads will sit around waiting
for a very long time while others are still busy with three or four items
on their queue.
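
To make the imbalance concrete, here is a small sketch in plain C (no ZeroMQ
involved) that compares the fixed modulo assignment with handing each job to
whichever worker frees up first. The function names, the MAX_WORKERS limit and
the cost figures are made up for illustration; they are not taken from Dan's
code, and the model assumes all jobs are available at time zero.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 64  /* hypothetical upper bound for this sketch */

/* Static dispatch, as in the loop above: job i goes to worker i % nworker.
   Returns the time at which the last worker finishes. */
long round_robin_makespan(const long *cost, size_t njob, size_t nworker)
{
    long busy[MAX_WORKERS] = {0};
    long worst = 0;
    assert(nworker > 0 && nworker <= MAX_WORKERS);
    for (size_t i = 0; i < njob; i++)
        busy[i % nworker] += cost[i];
    for (size_t w = 0; w < nworker; w++)
        if (busy[w] > worst)
            worst = busy[w];
    return worst;
}

/* Dynamic dispatch: each job goes to the worker that becomes free first
   (lowest index wins ties). Returns the time the last worker finishes. */
long first_idle_makespan(const long *cost, size_t njob, size_t nworker)
{
    long busy[MAX_WORKERS] = {0};
    long worst = 0;
    assert(nworker > 0 && nworker <= MAX_WORKERS);
    for (size_t i = 0; i < njob; i++) {
        size_t pick = 0;
        for (size_t w = 1; w < nworker; w++)
            if (busy[w] < busy[pick])
                pick = w;
        busy[pick] += cost[i];
    }
    for (size_t w = 0; w < nworker; w++)
        if (busy[w] > worst)
            worst = busy[w];
    return worst;
}
```

With costs {8, 1, 1, 1, 8, 1, 1, 1} and 4 workers, the modulo version puts
both expensive jobs on worker 0 and finishes at time 16, while the first-idle
version finishes at time 9.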

This is where a load balancing pattern would be very handy. Search for the
line with "lbbroker: Load-balancing broker in C" in the zguide for
an explanation and example code. (http://zguide.zeromq.org/page:all).

The short of it is: give your application a req socket and send on
that req socket to a router (frontend) in another thread. All this thread's
job is to work out which worker thread is not busy (first in the list if
need be) and then route the packet to that thread. This is done through
another router (backend) socket connected to each "Worker" thread that you
currently have. These then do the work and message back the result to the
router (backend), which then knows it can pass the result all the way back
to the requesting "client" (frontend). The reason for having the second
thread determine where the work is sent is that, in this case, you won't
know how long something will take until it's being worked on. Predetermining
this is what limits the speed-up to only 3 to 5 times on my machine.
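
The bookkeeping the balancer thread needs is small. Below is a rough sketch
of just the "which worker is free" list, written in plain C with the sockets
left out entirely. The idle_queue type and its functions are hypothetical
names for this illustration; the zguide example keeps a comparable queue of
worker identities but its code differs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 16  /* hypothetical limit for this sketch */

/* FIFO of worker ids that have reported they are idle. */
typedef struct {
    int    ids[MAX_WORKERS];
    size_t head;   /* index of the longest-idle worker */
    size_t count;  /* number of idle workers queued */
} idle_queue;

void idle_init(idle_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Call when a worker says "ready" or returns a result on the backend.
   Returns 0 on success, -1 if the queue is already full. */
int idle_push(idle_queue *q, int worker_id)
{
    assert(q != NULL);
    if (q->count == MAX_WORKERS)
        return -1;
    q->ids[(q->head + q->count) % MAX_WORKERS] = worker_id;
    q->count++;
    return 0;
}

/* Call when a client request arrives on the frontend.
   Returns the id of the worker to route to, or -1 if all are busy
   (in which case the request should stay queued on the frontend). */
int idle_pop(idle_queue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    int id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_WORKERS;
    q->count--;
    return id;
}
```

In a real broker the balancer would only read from the frontend while this
queue is non-empty, so work is never handed to a thread that is still busy.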

The zguide has a wonderful diagram of this. It's very simplistic and doesn't
handle crashes, or overloading, etc. These would have to be worked into the
end solution based on your environment's needs.

If I get a chance tonight I might knock something up using your example.
Depends on how much packing I get done, haha.

The way I found this was the issue was simply by counting the time each
thread spent "waiting" and "processing". I found that some were super busy
processing while others were just sitting around. So your guess was right
about the sockets just sitting there in some threads. The time being
"wasted" however is sadly a design issue at this point, not so much ZeroMQ ;)
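
For anyone wanting to repeat that measurement, something along these lines
is enough. This is only a sketch, not the code I actually ran: the
thread_stats type and function names are invented here, and the timestamps
(in seconds) are assumed to come from whatever wall clock you have available,
taken before and after each receive and each piece of work.

```c
#include <assert.h>

/* Per-thread totals, in seconds. */
typedef struct {
    double waiting;     /* time spent blocked waiting for a message */
    double processing;  /* time spent doing actual work */
} thread_stats;

/* Record one interval spent blocked in a receive call. */
void stats_add_wait(thread_stats *s, double start, double end)
{
    assert(end >= start);
    s->waiting += end - start;
}

/* Record one interval spent working on an equation. */
void stats_add_work(thread_stats *s, double start, double end)
{
    assert(end >= start);
    s->processing += end - start;
}

/* Fraction of the measured time the thread was doing useful work;
   returns 0.0 if nothing has been recorded yet. */
double stats_busy_fraction(const thread_stats *s)
{
    double total = s->waiting + s->processing;
    return total > 0.0 ? s->processing / total : 0.0;
}
```

A thread that waited 3 seconds and worked for 1 second comes out at 0.25.
Comparing that fraction across threads shows the imbalance straight away.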

Hope that helps.

Lastly as a bonus, this load balancing pattern means you would be able to
add as many front ends and back-ends as you saw fit. Only the "balancer" is
static in this design.

- J

On 29 January 2013 16:30, dan smith <dan25311 at gmail.com> wrote:

> Hi Jason,
> Thanks a lot for devoting your time to my problem. My expertise is
> negligible in this area.
> Looks like that symptom might be CPU dependent? I tried it just on a
> quad-core laptop, it has 16G memory though.
> This problem is really important so I started to evaluate alternative
> solutions. I found lock-free queues, more specifically lock-free
> single-producer, single-consumer circular queues. I was impressed by the
> latency: I could send 10 000 000 (ten million) 8-byte messages in one
> second. It is a very simple thing, and there are many versions of it. Latency
> is in the 100 nanosecond range. I do not know the reasons but it looks like
> it is faster for this kind of communication.
> Using it I could reach a 30 % speedup for the real problem, so the parallel
> version is faster by now at least, still not fast enough though...
> Now the problem is how to quickly notify the threads that data is coming.
> I will test both solutions on a better machine with more cores. Maybe if
> we have got just a few messages, they spend some time in a cache or
> something. If this is the case, is there a way to forward them to the CPU
> more quickly? Any further input will be appreciated.
> Thank you again,
> Dan
> On Mon, Jan 28, 2013 at 6:26 PM, Jason Smith <jason.nevar.smith at gmail.com> wrote:
>> Hi Dan,
>> Just tested the debug version and it does drop, but not as much as you
>> listed. Also of note, I have been testing on 64-bit Windows 7, i7-2600 with
>> a large amount of RAM. The next test for me will be to look at where the
>> time is taken up, however I thought I would report on what I have seen so
>> far.
>> - J
>> On 29 January 2013 11:16, Jason Smith <jason.nevar.smith at gmail.com> wrote:
>>> Hi Dan,
>>> Here's something I have found with your code. Testing here I see the
>>> same speed up for all numbers of equations. I am using the release version
>>> of the dll however. About to test the debug version of the dll to see if I
>>> get different behaviour.
>>> - J
>>> On 23 January 2013 13:56, dan smith <dan25311 at gmail.com> wrote:
>>>> Jason,
>>>> Thanks a lot for taking a look at it.
>>>> As for the "while(nfinish > 0)" loop, my experience is that it does not
>>>> have a significant effect on the time. If I remove it and allow the threads
>>>> to die, the difference is negligible. In the real application the threads
>>>> need to remain alive of course, I just tried to check that the thread
>>>> closing is not the reason.
>>>> Closing the sockets in threads might not be the reason either, a
>>>> terminating message is sent back to the main thread before that.
>>>> I use zeromq-3.2.2.
>>>> In the real application I am sending a pointer, here the 8 As simulate
>>>> that.
>>>> I am looking forward to your further comments very much. Hope that I am
>>>> the one who made some mistake and there is a solution for sending a few small
>>>> messages at the latency that I measured for a large number of messages (that
>>>> was under 1 microsecond, which would be cool).
>>>> On Tue, Jan 22, 2013 at 8:13 PM, Jason Smith <
>>>> jason.nevar.smith at gmail.com> wrote:
>>>>> On 23 January 2013 11:42, dan smith <dan25311 at gmail.com> wrote:
>>>>>> while(nfinish > 0)
>>>>> Haven't had a chance to compile this here. For some reason have a
>>>>> linker issue on my work machine.
>>>>> At first glance the "while(nfinish > 0)" loop assumes sequential
>>>>> thread completion for best time. For example, you only learn that thread
>>>>> 7 has finished once threads 1 through 6 have completed. Don't know if this is
>>>>> affecting things drastically or not. Maybe switching to polling here and
>>>>> updating a "completed" vector list might work better.
>>>>> Another area I would look into is the linger of the sockets. It
>>>>> shouldn't affect closing them down within the thread, however it's something
>>>>> to consider.
>>>>> When I get a chance I would be looking to place more asserts in to
>>>>> make sure messages were doing what I thought they were (send and receive
>>>>> calls return values). Then I would be checking the timing of any close down
>>>>> code.
>>>>> Hope this helps in the meantime.
>>>>> _______________________________________________
>>>>> zeromq-dev mailing list
>>>>> zeromq-dev at lists.zeromq.org
>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev