[zeromq-dev] Efficiency for very few short messages
dan smith
dan25311 at gmail.com
Mon Feb 4 00:44:27 CET 2013
Hi Jason and others,
I am trying to implement the load-balancing pattern idea. First I would just
like to get the "lbbroker: Load-balancing broker in C" code from the
Guide to work on 64-bit Windows.
All I changed in it was creating the threads with the Windows API, like this:
int client_nbr;
for (client_nbr = 0; client_nbr < NBR_CLIENTS; client_nbr++)
{
    HANDLE localHandle = (HANDLE) _beginthreadex (NULL, 0, client_task, NULL, 0, NULL);
}

int worker_nbr;
for (worker_nbr = 0; worker_nbr < NBR_WORKERS; worker_nbr++)
{
    HANDLE localHandle = (HANDLE) _beginthreadex (NULL, 0, worker_task, NULL, 0, NULL);
}
On Tue, Jan 29, 2013 at 9:06 AM, dan smith <dan25311 at gmail.com> wrote:
> Jason,
>
> Thanks for the suggestion. I will apply the lbbroker pattern right away
> to that problem and will share the results. To me it is good news that
> this is a design issue...
>
> Dan
>
>
> On Tue, Jan 29, 2013 at 12:58 AM, Jason Smith <jason.nevar.smith at gmail.com
> > wrote:
>
>> Hi Dan,
>>
>> I have found the issue with the processing times.
>>
>>
>> for (iequation = 0; iequation < nequation; iequation++)
>> {
>>     zmq_msg_t msg;
>>     rc = zmq_msg_init_size (&msg, 8);    //  init_size both initializes and sizes the message
>>     memset (zmq_msg_data (&msg), 'A', 8);
>>
>>     ithread = messageCounter % nthread;  //  <---- RIGHT HERE
>>     messageCounter++;
>>
>>     void *socket = socketsSend[ithread];
>>     rc = zmq_sendmsg (socket, &msg, 0);
>>     zmq_msg_close (&msg);
>> }
>>
>> The code above doesn't take into account how long each equation takes to
>> process. It treats them all as equal "work", which they don't appear to be.
>> This means that some threads will sit around waiting for a very long time
>> while others are still busy with three or four items on their queue.
>>
>> This is where a load balancing pattern would be very handy. Search for
>> the line with "lbbroker: Load-balancing broker in C" in the zguide for
>> an explanation and example code. (http://zguide.zeromq.org/page:all).
>>
>> The short of it is: give your application a REQ socket, and send on that
>> REQ socket to a ROUTER (the frontend) in another thread. That thread's only
>> job is to work out which worker thread is not busy (the first in the list,
>> if need be) and route the message to it. It does this through a second
>> ROUTER (the backend) socket connected to each "worker" thread you currently
>> have. The workers do the work and send the result back to the backend
>> ROUTER, which then knows it can pass the result all the way back to the
>> requesting "client" (frontend). The reason a separate thread decides where
>> the work is sent is that, in this case, you won't know how long something
>> will take until it is actually being worked on. Predetermining this is what
>> is limiting you to only a 3 to 5 times speed-up on my machine.
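>>
>> Roughly, the core of that broker thread looks like this (along the lines
>> of the Guide's lbbroker example; s_recv/s_send/s_sendmore come from the
>> Guide's zhelpers.h, workers are assumed to set printable identities with
>> s_set_id as in the Guide, and the ipc endpoints and NBR_WORKERS are just
>> placeholders):
>>
>> #include "zhelpers.h"
>> #define NBR_WORKERS 3
>>
>> int main (void)
>> {
>>     void *context  = zmq_ctx_new ();
>>     void *frontend = zmq_socket (context, ZMQ_ROUTER);
>>     void *backend  = zmq_socket (context, ZMQ_ROUTER);
>>     zmq_bind (frontend, "ipc://frontend.ipc");   //  clients (REQ) connect here
>>     zmq_bind (backend,  "ipc://backend.ipc");    //  workers (REQ) connect here
>>
>>     //  Queue of idle worker identities
>>     char *worker_queue [NBR_WORKERS];
>>     int available_workers = 0;
>>
>>     while (1) {
>>         zmq_pollitem_t items [] = {
>>             { backend,  0, ZMQ_POLLIN, 0 },
>>             { frontend, 0, ZMQ_POLLIN, 0 }
>>         };
>>         //  Only poll the frontend while at least one worker is idle
>>         if (zmq_poll (items, available_workers ? 2 : 1, -1) == -1)
>>             break;                               //  Interrupted
>>
>>         if (items [0].revents & ZMQ_POLLIN) {
>>             //  Worker reports in: queue its identity as "idle"
>>             worker_queue [available_workers++] = s_recv (backend);
>>             free (s_recv (backend));             //  Empty delimiter frame
>>             char *client_id = s_recv (backend);
>>             if (strcmp (client_id, "READY") != 0) {
>>                 //  Not a READY signal, so it is a reply: relay it
>>                 free (s_recv (backend));         //  Empty delimiter frame
>>                 char *reply = s_recv (backend);
>>                 s_sendmore (frontend, client_id);
>>                 s_sendmore (frontend, "");
>>                 s_send     (frontend, reply);
>>                 free (reply);
>>             }
>>             free (client_id);
>>         }
>>         if (items [1].revents & ZMQ_POLLIN) {
>>             //  Client request: route it to the next idle worker
>>             char *client_id = s_recv (frontend);
>>             free (s_recv (frontend));            //  Empty delimiter frame
>>             char *request = s_recv (frontend);
>>
>>             char *worker_id = worker_queue [--available_workers];
>>             s_sendmore (backend, worker_id);
>>             s_sendmore (backend, "");
>>             s_sendmore (backend, client_id);
>>             s_sendmore (backend, "");
>>             s_send     (backend, request);
>>
>>             free (worker_id);
>>             free (client_id);
>>             free (request);
>>         }
>>     }
>>     zmq_close (frontend);
>>     zmq_close (backend);
>>     zmq_ctx_destroy (context);
>>     return 0;
>> }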
>>
>> The zguide has a wonderful diagram of this. It's very simplistic and
>> doesn't handle crashes, overloading, etc. These would have to be worked
>> into the end solution based on your environment's needs.
>>
>> If I get a chance tonight I might knock something up using your example.
>> Depends on how much packing I get done, haha.
>>
>> The way I found that this was the issue was simply by counting the time
>> each thread spent "waiting" and "processing": some were super busy
>> processing while others were just sitting around. So your guess was right
>> about the sockets just sitting there in some threads. The time being
>> "wasted", however, is sadly a design issue at this point, not so much
>> ZeroMQ ;)
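>>
>> The counting itself was nothing fancy; something like this inside each
>> worker loop (s_clock gives milliseconds and comes from zhelpers.h;
>> do_work and worker_socket are stand-ins for your solver code and socket):
>>
>> int64_t wait_ms = 0, busy_ms = 0;
>>
>> while (1) {
>>     int64_t t0 = s_clock ();
>>
>>     zmq_msg_t msg;
>>     zmq_msg_init (&msg);
>>     if (zmq_recvmsg (worker_socket, &msg, 0) == -1)
>>         break;                      //  context terminated
>>
>>     int64_t t1 = s_clock ();
>>     do_work (&msg);                 //  whatever the equation solver does
>>     int64_t t2 = s_clock ();
>>
>>     wait_ms += t1 - t0;             //  blocked waiting for a message
>>     busy_ms += t2 - t1;             //  actually processing
>>     zmq_msg_close (&msg);
>> }
>> printf ("waiting: %lld ms, processing: %lld ms\n",
>>         (long long) wait_ms, (long long) busy_ms);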
>>
>> Hope that helps.
>>
>> Lastly, as a bonus, this load-balancing pattern means you can add as many
>> front-ends and back-ends as you see fit. Only the "balancer" is static in
>> this design.
>>
>> - J
>>
>>
>> On 29 January 2013 16:30, dan smith <dan25311 at gmail.com> wrote:
>>
>>>
>>> Hi Jason,
>>>
>>> Thanks a lot for devoting your time to my problem. My expertise is
>>> negligible in this area.
>>>
>>> Looks like that symptom might be CPU-dependent? I tried it just on a
>>> quad-core laptop, though it has 16 GB of memory.
>>>
>>> This problem is really important, so I started to evaluate alternative
>>> solutions. I found lock-free queues, more specifically lock-free
>>> single-producer/single-consumer circular queues. I was impressed by the
>>> performance: I could send 10,000,000 (ten million) 8-byte messages in one
>>> second, and the latency is in the 100-nanosecond range. It is a very
>>> simple thing and there are many versions of it. I do not know the reasons,
>>> but it looks like it is faster for this kind of communication.
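>>>
>>> The queue itself is tiny; roughly something like this (a minimal
>>> single-producer/single-consumer ring for 8-byte items, written here with
>>> C11 atomics just for illustration; QSIZE and the names are arbitrary):
>>>
>>> #include <stdatomic.h>
>>> #include <stdbool.h>
>>> #include <stdint.h>
>>>
>>> #define QSIZE 1024                    /*  must be a power of two  */
>>>
>>> typedef struct {
>>>     uint64_t buffer [QSIZE];          /*  8-byte payloads, e.g. pointers  */
>>>     _Atomic size_t head;              /*  advanced only by the consumer  */
>>>     _Atomic size_t tail;              /*  advanced only by the producer  */
>>> } spsc_queue_t;
>>>
>>> /*  Producer side: returns false if the queue is full  */
>>> static bool spsc_push (spsc_queue_t *q, uint64_t value)
>>> {
>>>     size_t tail = atomic_load_explicit (&q->tail, memory_order_relaxed);
>>>     size_t head = atomic_load_explicit (&q->head, memory_order_acquire);
>>>     if (tail - head == QSIZE)
>>>         return false;                 /*  full  */
>>>     q->buffer [tail & (QSIZE - 1)] = value;
>>>     atomic_store_explicit (&q->tail, tail + 1, memory_order_release);
>>>     return true;
>>> }
>>>
>>> /*  Consumer side: returns false if the queue is empty  */
>>> static bool spsc_pop (spsc_queue_t *q, uint64_t *value)
>>> {
>>>     size_t head = atomic_load_explicit (&q->head, memory_order_relaxed);
>>>     size_t tail = atomic_load_explicit (&q->tail, memory_order_acquire);
>>>     if (head == tail)
>>>         return false;                 /*  empty  */
>>>     *value = q->buffer [head & (QSIZE - 1)];
>>>     atomic_store_explicit (&q->head, head + 1, memory_order_release);
>>>     return true;
>>> }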
>>>
>>> Using it I could reach a 30% speedup on the real problem, so the parallel
>>> version is at least faster now, though still not fast enough...
>>>
>>> Now the problem is how to quickly notify the threads that data is coming.
>>>
>>> I will test both solutions on a better machine with more cores. Maybe when
>>> there are just a few messages they spend some time in a cache or something.
>>> If that is the case, is there a way to forward them to the CPU more
>>> quickly? Any further input will be appreciated.
>>>
>>> Thank you again,
>>>
>>> Dan
>>>
>>>
>>>
>>>
>>> On Mon, Jan 28, 2013 at 6:26 PM, Jason Smith <
>>> jason.nevar.smith at gmail.com> wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> Just tested the debug version and it does drop, but not as much as you
>>>> listed. Also of note, I have been testing on 64-bit Windows 7, i7-2600,
>>>> with a large amount of RAM. The next test for me will be to look at where
>>>> the time is taken up; however, I thought I would report on what I have
>>>> seen so far.
>>>>
>>>> - J
>>>>
>>>>
>>>> On 29 January 2013 11:16, Jason Smith <jason.nevar.smith at gmail.com> wrote:
>>>>
>>>>> Hi Dan,
>>>>>
>>>>> Here's something I have found with your code. Testing here I see the
>>>>> same speed-up for all numbers of equations. I am using the release
>>>>> version of the DLL, however. About to test the debug version of the DLL
>>>>> to see if I get different behaviour.
>>>>>
>>>>> - J
>>>>>
>>>>>
>>>>> On 23 January 2013 13:56, dan smith <dan25311 at gmail.com> wrote:
>>>>>
>>>>>> Jason,
>>>>>>
>>>>>> Thanks a lot for taking a look at it.
>>>>>>
>>>>>> As for the "while (nfinish > 0)" loop, my experience is that it does
>>>>>> not have a significant effect on the time. If I remove it and allow the
>>>>>> threads to die, the difference is negligible. In the real application
>>>>>> the threads need to remain alive, of course; I just wanted to check
>>>>>> that thread closing is not the reason.
>>>>>>
>>>>>> Closing the sockets in the threads might not be the reason either; a
>>>>>> terminating message is sent back to the main thread before that.
>>>>>>
>>>>>> I use zeromq-3.2.2.
>>>>>>
>>>>>> In the real application I am sending a pointer; here the 8 'A's
>>>>>> simulate that.
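>>>>>>
>>>>>> Roughly like this, for what it is worth (work_t and the socket names
>>>>>> are just stand-ins; this only works over inproc, where both sides
>>>>>> share the address space):
>>>>>>
>>>>>> typedef struct { double coeffs [16]; } work_t;   //  stand-in payload
>>>>>>
>>>>>> //  Sender: copy the pointer value itself into an 8-byte message
>>>>>> work_t *work = malloc (sizeof (work_t));
>>>>>> zmq_msg_t msg;
>>>>>> zmq_msg_init_size (&msg, sizeof (work));         //  8 bytes on Win64
>>>>>> memcpy (zmq_msg_data (&msg), &work, sizeof (work));
>>>>>> zmq_sendmsg (socket_to_worker, &msg, 0);
>>>>>>
>>>>>> //  Receiver: copy the pointer back out and use the shared object
>>>>>> zmq_msg_t req;
>>>>>> zmq_msg_init (&req);
>>>>>> zmq_recvmsg (worker_socket, &req, 0);
>>>>>> work_t *received;
>>>>>> memcpy (&received, zmq_msg_data (&req), sizeof (received));
>>>>>> zmq_msg_close (&req);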
>>>>>>
>>>>>> I am very much looking forward to your further comments. I hope that I
>>>>>> am the one who made a mistake and there is a way to send a few small
>>>>>> messages at the latency I measured for a large number of messages
>>>>>> (that was under 1 microsecond, which would be cool).
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 22, 2013 at 8:13 PM, Jason Smith <
>>>>>> jason.nevar.smith at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 23 January 2013 11:42, dan smith <dan25311 at gmail.com> wrote:
>>>>>>>
>>>>>>>> while(nfinish > 0)
>>>>>>>
>>>>>>>
>>>>>>> Haven't had a chance to compile this here. For some reason I have a
>>>>>>> linker issue on my work machine.
>>>>>>>
>>>>>>> At first glance, the "while (nfinish > 0)" loop assumes sequential
>>>>>>> thread completion for the best time. For example, you only learn that
>>>>>>> thread 7 has finished once threads 1 through 6 have completed. I don't
>>>>>>> know whether this is affecting things drastically or not. Maybe
>>>>>>> switching to polling here and updating a "completed" vector might work
>>>>>>> better, along the lines of the sketch below.
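>>>>>>>
>>>>>>> Something like this is what I have in mind (untested; socketsRecv,
>>>>>>> nthread and MAX_THREADS are stand-ins for however your per-thread
>>>>>>> result sockets are actually declared):
>>>>>>>
>>>>>>> #define MAX_THREADS 64
>>>>>>>
>>>>>>> int completed [MAX_THREADS] = { 0 };
>>>>>>> int nfinish = nthread;
>>>>>>> zmq_pollitem_t items [MAX_THREADS];
>>>>>>>
>>>>>>> for (int i = 0; i < nthread; i++) {
>>>>>>>     items [i].socket  = socketsRecv [i];   //  one socket per thread
>>>>>>>     items [i].fd      = 0;
>>>>>>>     items [i].events  = ZMQ_POLLIN;
>>>>>>>     items [i].revents = 0;
>>>>>>> }
>>>>>>>
>>>>>>> //  Mark each thread complete as its message arrives, in any order
>>>>>>> while (nfinish > 0) {
>>>>>>>     zmq_poll (items, nthread, -1);
>>>>>>>     for (int i = 0; i < nthread; i++) {
>>>>>>>         if ((items [i].revents & ZMQ_POLLIN) && !completed [i]) {
>>>>>>>             zmq_msg_t msg;
>>>>>>>             zmq_msg_init (&msg);
>>>>>>>             zmq_recvmsg (items [i].socket, &msg, 0);
>>>>>>>             zmq_msg_close (&msg);
>>>>>>>             completed [i] = 1;             //  thread i has finished
>>>>>>>             nfinish--;
>>>>>>>         }
>>>>>>>     }
>>>>>>> }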
>>>>>>>
>>>>>>> Another area I would look into is the linger setting on the sockets.
>>>>>>> It shouldn't affect closing them down within the thread, but it's
>>>>>>> something to consider.
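>>>>>>>
>>>>>>> Setting it is a one-liner per socket before closing, e.g.:
>>>>>>>
>>>>>>> int linger = 0;     //  drop any unsent messages on close
>>>>>>> zmq_setsockopt (socket, ZMQ_LINGER, &linger, sizeof (linger));
>>>>>>> zmq_close (socket);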
>>>>>>>
>>>>>>> When I get a chance I will put more asserts in, to make sure the
>>>>>>> messages are doing what I think they are (checking the return values
>>>>>>> of the send and receive calls). Then I would check the timing of any
>>>>>>> close-down code.
>>>>>>>
>>>>>>> Hope this helps in the meantime.
>>>>>>>
>>>>>>>