[zeromq-dev] Efficiency for very few short messages

dan smith dan25311 at gmail.com
Mon Feb 4 00:49:01 CET 2013


Hi Jason and others,

I am trying to implement the load-balancing pattern. As a first step I
would just like to get the "lbbroker: Load-balancing broker in C" code from
the Guide working on 64-bit Windows with VC2010.

The only thing I changed was to create the threads with the Windows API,
like this:

int client_nbr;
for (client_nbr = 0; client_nbr < NBR_CLIENTS; client_nbr++)
{
    HANDLE localHandle = (HANDLE) _beginthreadex(NULL, 0, client_task, NULL,
                                                 0, NULL);
}
int worker_nbr;
for (worker_nbr = 0; worker_nbr < NBR_WORKERS; worker_nbr++)
{
    HANDLE localHandle = (HANDLE) _beginthreadex(NULL, 0, worker_task, NULL,
                                                 0, NULL);
}

For some reason it hangs in select() inside zmq_poll().

What could be the reason for that?


On Sun, Feb 3, 2013 at 5:44 PM, dan smith <dan25311 at gmail.com> wrote:

>
> Hi Jason and others,
>
> I am trying to implement the load-balancing pattern. As a first step I
> would just like to get the "lbbroker: Load-balancing broker in C" code from
> the Guide working on 64-bit Windows.
>
> The only thing I changed was to create the threads with the Windows API,
> like this:
>
> int client_nbr;
> for (client_nbr = 0; client_nbr < NBR_CLIENTS; client_nbr++)
> {
>     HANDLE localHandle = (HANDLE) _beginthreadex(NULL, 0, client_task, NULL,
>                                                  0, NULL);
> }
> int worker_nbr;
> for (worker_nbr = 0; worker_nbr < NBR_WORKERS; worker_nbr++)
> {
>     HANDLE localHandle = (HANDLE) _beginthreadex(NULL, 0, worker_task, NULL,
>                                                  0, NULL);
> }
>
>
>
> On Tue, Jan 29, 2013 at 9:06 AM, dan smith <dan25311 at gmail.com> wrote:
>
>> Jason,
>>
>> Thanks for the suggestion. I will apply the lbbroker pattern right away
>> to that problem and will share the results. To me it is good news that
>> this is a design issue...
>>
>> Dan
>>
>>
>> On Tue, Jan 29, 2013 at 12:58 AM, Jason Smith <
>> jason.nevar.smith at gmail.com> wrote:
>>
>>> Hi Dan,
>>>
>>> I have found the issue with the processing times.
>>>
>>>
>>> 	for(iequation = 0 ; iequation < nequation ; iequation++)
>>> 	{
>>> 		zmq_msg_t msg;
>>> 		/* a message must be initialised exactly once */
>>> 		rc = zmq_msg_init_size(&msg, 8);
>>> 		memset (zmq_msg_data (&msg), 'A', 8);
>>>
>>> 		ithread = messageCounter % nthread ;  /* <---- RIGHT HERE */
>>>
>>> 		messageCounter++ ;
>>> 		void * socket = socketsSend[ithread];
>>> 		rc = zmq_sendmsg (socket, &msg, 0);
>>> 		zmq_msg_close(&msg);
>>> 	}
>>>
>>> The code above doesn't take into account how long each passed equation
>>> takes to process. It treats them all as equal amounts of "work", which
>>> they don't appear to be. This means that some threads will sit around
>>> waiting for a very long time while others are still busy with three or
>>> four items on their queue.
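
To make the imbalance concrete, here is a minimal sketch in plain C with no
ZeroMQ calls. The function names, the thread count, and the task costs used
with it are made up for illustration and do not come from the code under
discussion. It only compares the finishing time of a fixed "i % nthread"
assignment with handing each task to whichever thread has the least work
queued, which is roughly what a pull-based load balancer achieves.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREAD 4  /* illustrative thread count, not taken from the thread */

/* Largest per-thread load, i.e. the time at which the last thread finishes. */
static int max_load(const int load[NTHREAD])
{
    int max = 0;
    for (int t = 0; t < NTHREAD; t++)
        if (load[t] > max)
            max = load[t];
    return max;
}

/* Fixed assignment: task i always goes to thread i % NTHREAD. */
static int finish_time_round_robin(const int *cost, size_t ntask)
{
    int load[NTHREAD] = {0};
    assert(cost != NULL);
    for (size_t i = 0; i < ntask; i++)
        load[i % NTHREAD] += cost[i];
    return max_load(load);
}

/* Dynamic assignment: each task goes to the thread with the least work queued. */
static int finish_time_least_loaded(const int *cost, size_t ntask)
{
    int load[NTHREAD] = {0};
    assert(cost != NULL);
    for (size_t i = 0; i < ntask; i++) {
        int best = 0;
        for (int t = 1; t < NTHREAD; t++)
            if (load[t] < load[best])
                best = t;
        load[best] += cost[i];
    }
    return max_load(load);
}
```

With the hypothetical costs {9, 1, 1, 1, 9, 1, 1, 1}, the fixed assignment
finishes at time 18 because thread 0 receives both expensive tasks, while
the dynamic assignment finishes at time 10.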
>>>
>>> This is where a load balancing pattern would be very handy. Search for
>>> the line with "lbbroker: Load-balancing broker in C" in the zguide for
>>> an explanation and example code. (http://zguide.zeromq.org/page:all).
>>>
>>> The short of it is: give your application a req socket and send on it
>>> to a router (frontend) in another thread. All this thread has to do is
>>> work out which worker thread is not busy (the first in the list if need
>>> be) and then route the packet to that thread. This is done through
>>> another router (backend) socket connected to each "Worker" thread that
>>> you currently have. The workers do the work and message the result back
>>> to the router (backend), which then knows it can pass the result all the
>>> way back to the requesting "client" (frontend). The reason for having a
>>> second thread decide where the work is sent is that, in this case, you
>>> won't know how long something will take until it's being worked on.
>>> Predetermining the assignment is what limits the speed-up to only 3 to 5
>>> times on my machine.
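
The bookkeeping behind "work out which thread is not busy" can be sketched
without any sockets. The following is only an illustration under assumed
names (idle_queue_t, idle_queue_push, idle_queue_pop and MAX_WORKERS are
hypothetical, not ZeroMQ API): the balancer keeps a FIFO of idle worker ids,
appends a worker when it reports ready or returns a result, and takes the
oldest idle worker when a client request arrives.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 16  /* illustrative upper bound */

/* FIFO of idle worker ids, stored as a small circular buffer. */
typedef struct {
    int ids[MAX_WORKERS];
    size_t head;   /* index of the oldest idle worker */
    size_t count;  /* number of idle workers currently queued */
} idle_queue_t;

static void idle_queue_init(idle_queue_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Call when a worker signals it is ready or has returned a result.
   Returns 0 on success, -1 if the queue is already full. */
static int idle_queue_push(idle_queue_t *q, int worker_id)
{
    if (q->count == MAX_WORKERS)
        return -1;
    q->ids[(q->head + q->count) % MAX_WORKERS] = worker_id;
    q->count++;
    return 0;
}

/* Call when a client request arrives.
   Returns the longest-idle worker id, or -1 if every worker is busy. */
static int idle_queue_pop(idle_queue_t *q)
{
    int id;
    if (q->count == 0)
        return -1;
    id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_WORKERS;
    q->count--;
    return id;
}
```

When idle_queue_pop returns -1 the balancer should simply leave the request
on the frontend until a worker reports back; a real broker would use worker
socket identities rather than plain integers.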
>>>
>>> The zguide has a wonderful diagram of this. It's very simplistic and
>>> doesn't handle crashes, overloading, etc. These would have to be worked
>>> into the end solution based on your environment's needs.
>>>
>>> If I get a chance tonight I might knock something up using your example.
>>> Depends on how much packing I get done, haha.
>>>
>>> The way I found that this was the issue was simply by measuring the time
>>> each thread spent "waiting" and "processing": some were super busy
>>> processing while others were just sitting around. So your guess was right
>>> about the sockets just sitting there in some threads. The time being
>>> "wasted" however is sadly a design issue at this point, not so much ZeroMQ
>>> ;)
>>>
>>> Hope that helps.
>>>
>>> Lastly as a bonus, this load balancing pattern means you would be able
>>> to add as many front ends and back-ends as you saw fit. Only the "balancer"
>>> is static in this design.
>>>
>>> - J
>>>
>>>
>>> On 29 January 2013 16:30, dan smith <dan25311 at gmail.com> wrote:
>>>
>>>>
>>>> Hi Jason,
>>>>
>>>> Thanks a lot for devoting your time to my problem. My expertise is
>>>> negligible in this area.
>>>>
>>>> It looks like that symptom might be CPU-dependent? I only tried it on a
>>>> quad-core laptop, though it has 16 GB of memory.
>>>>
>>>> This problem is really important, so I started to evaluate alternative
>>>> solutions. I found lock-free queues, more specifically lock-free
>>>> single-producer, single-consumer circular queues. I was impressed by the
>>>> latency: I could send 10 000 000 (ten million) 8-byte messages in one
>>>> second. It is a very simple thing and there are many versions of it.
>>>> Latency is in the 100-nanosecond range. I do not know the reasons, but it
>>>> looks like it is faster for this kind of communication.
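
For readers unfamiliar with the structure, here is a minimal sketch of a
single-producer, single-consumer circular queue carrying 8-byte values. It
is not the implementation measured above; the names (spsc_ring_t, spsc_push,
spsc_pop, RING_CAPACITY) are hypothetical, and it assumes a compiler that
provides C11 <stdatomic.h>, which VC2010 does not, so that toolchain would
need its own atomic or barrier primitives instead.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 1024  /* must be a power of two */

typedef struct {
    uint64_t slots[RING_CAPACITY];
    _Atomic size_t head;  /* next slot to read; written only by the consumer */
    _Atomic size_t tail;  /* next slot to write; written only by the producer */
} spsc_ring_t;

static void spsc_init(spsc_ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer thread only. Returns false when the ring is full. */
static bool spsc_push(spsc_ring_t *r, uint64_t value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false;
    r->slots[tail & (RING_CAPACITY - 1)] = value;
    /* release: the slot write becomes visible before the new tail does */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer thread only. Returns false when the ring is empty. */
static bool spsc_pop(spsc_ring_t *r, uint64_t *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & (RING_CAPACITY - 1)];
    /* release: the slot read completes before the slot is handed back */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Neither side ever blocks or takes a lock, which is where the low per-message
cost comes from. The price is that a consumer finding the ring empty has to
spin or be woken by some other mechanism.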
>>>>
>>>> Using it I reached a 30% speedup on the real problem, so the parallel
>>>> version is at least faster now, though still not fast enough...
>>>>
>>>> Now the problem is how to quickly notify the threads that data is
>>>> coming.
>>>>
>>>> I will test both solutions on a better machine with more cores. Maybe
>>>> when there are just a few messages, they spend some time in a cache or
>>>> something. If this is the case, is there a way to forward them to the CPU
>>>> more quickly? Any further input will be appreciated.
>>>>
>>>> Thank you again,
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jan 28, 2013 at 6:26 PM, Jason Smith <
>>>> jason.nevar.smith at gmail.com> wrote:
>>>>
>>>>> Hi Dan,
>>>>>
>>>>> I just tested the debug version and it does drop, but not as much as you
>>>>> listed. Also of note: I have been testing on 64-bit Windows 7 on an
>>>>> i7-2600 with a large amount of RAM. My next test will be to look at
>>>>> where the time is taken up, but I thought I would report what I have
>>>>> seen so far.
>>>>>
>>>>> - J
>>>>>
>>>>>
>>>>> On 29 January 2013 11:16, Jason Smith <jason.nevar.smith at gmail.com> wrote:
>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> Here's something I have found with your code. Testing here I see the
>>>>>> same speed up for all numbers of equations. I am using the release version
>>>>>> of the dll however. About to test the debug version of the dll to see if I
>>>>>> get different behaviour.
>>>>>>
>>>>>> - J
>>>>>>
>>>>>>
>>>>>> On 23 January 2013 13:56, dan smith <dan25311 at gmail.com> wrote:
>>>>>>
>>>>>>> Jason,
>>>>>>>
>>>>>>> Thanks a lot for taking a look at it.
>>>>>>>
>>>>>>> As for the "while(nfinish > 0)" loop, my experience is that it does
>>>>>>> not have a significant effect on the time. If I remove it and allow the
>>>>>>> threads to die, the difference is negligible. In the real application the
>>>>>>> threads need to remain alive, of course; I just tried to check that
>>>>>>> thread closing is not the reason.
>>>>>>>
>>>>>>> Closing the sockets in threads might not be the reason either, a
>>>>>>> terminating message is sent back to the main thread before that.
>>>>>>>
>>>>>>> I use zeromq-3.2.2.
>>>>>>>
>>>>>>> In the real application I am sending a pointer, here the 8 As
>>>>>>> simulate that.
>>>>>>>
>>>>>>> I am looking forward to your further comments very much. I hope that I
>>>>>>> am the one who made a mistake and that there is a way to send a few
>>>>>>> small messages at the latency I measured for a large number of messages
>>>>>>> (that was under 1 microsecond, which would be cool).
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 22, 2013 at 8:13 PM, Jason Smith <
>>>>>>> jason.nevar.smith at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On 23 January 2013 11:42, dan smith <dan25311 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> while(nfinish > 0)
>>>>>>>>
>>>>>>>>
>>>>>>>> Haven't had a chance to compile this here. For some reason have a
>>>>>>>> linker issue on my work machine.
>>>>>>>>
>>>>>>>> At first glance, the "while(nfinish > 0)" loop assumes sequential
>>>>>>>> thread completion for the best time. For example, you only learn that
>>>>>>>> thread 7 has finished once threads 1 through 6 have completed. I don't
>>>>>>>> know whether this is affecting things drastically. Maybe switching to
>>>>>>>> polling here and updating a "completed" vector might work better.
>>>>>>>>
>>>>>>>> Another area I would look into is the linger setting of the sockets. It
>>>>>>>> shouldn't affect closing them down within the thread, but it's something
>>>>>>>> to consider.
>>>>>>>>
>>>>>>>> When I get a chance I will add more asserts to make sure the messages
>>>>>>>> are doing what I think they are (checking the return values of the send
>>>>>>>> and receive calls). Then I will check the timing of any close-down
>>>>>>>> code.
>>>>>>>>
>>>>>>>> Hope this helps in the meantime.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> zeromq-dev mailing list
>>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>>>>>
>>>>>>>>