[zeromq-dev] ZMQ hanged :(

Ivan Pechorin ivan.pechorin at gmail.com
Thu Apr 11 14:19:21 CEST 2013


Hi Gonzalo,

I have a small app that accepts web service requests (Apache CXF inside a
Jetty servlet container) and uses JeroMQ to pass the requests to a backend
server.

There is a single dedicated thread that runs a ZLoop with the following
zeromq sockets:
 1) an inproc PULL socket to receive requests from the multiple webservice
threads that process requests;
 2) a tcp DEALER socket to communicate with the backend server completely
asynchronously;
 3) an inproc PULL socket to receive the "stop" signal (sent from some other
thread when it's time to stop).

There is one more socket for use by the multiple webservice threads:
 4) inproc PUSH socket (connected to socket #1 above).

Every webservice worker thread processes each request as follows:
 1) get a unique Id for the request;
 2) register the Id in a map of pending requests (basically, put an
instance of a dummy "Waiter" class into the map, using the Id as the key);
 3) push the request to the loop using the inproc PUSH socket #4 (access
to the socket is protected by a "synchronized" block, of course);
 4) wait for a reply on the waiter object (standard Java wait() is used).
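
As a rough illustration of the registration and waiting steps above, here
is a minimal sketch of a pending-request map built on wait()/notifyAll().
It uses only the JDK. The names Waiter, PendingRequests, register,
awaitReply and deliver are made up for this example and are not taken from
the actual application, and the timeout is an added assumption. The
synchronized push to the inproc socket is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** One pending request: the worker thread blocks here until a reply is delivered. */
final class Waiter {
    private byte[] reply;   // guarded by "this"
    private boolean done;   // guarded by "this"

    /** Called by the loop thread when the backend reply arrives. */
    synchronized void complete(byte[] data) {
        reply = data;
        done = true;
        notifyAll();
    }

    /** Called by the worker thread; returns null on timeout or interruption. */
    synchronized byte[] await(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (!done) {                      // loop guards against spurious wakeups
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return null;                 // timed out
                }
                wait(left);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return null;
        }
        return reply;
    }
}

/** Map of request Id -> Waiter, shared by the worker threads and the loop thread. */
final class PendingRequests {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, Waiter> pending = new ConcurrentHashMap<>();

    /** Allocate a unique Id and register a waiter under it. */
    long register() {
        long id = nextId.incrementAndGet();
        pending.put(id, new Waiter());
        return id;
    }

    /** Worker side: block until the reply arrives, then unregister the Id. */
    byte[] awaitReply(long id, long timeoutMillis) {
        Waiter w = pending.get(id);
        if (w == null) {
            return null;
        }
        try {
            return w.await(timeoutMillis);
        } finally {
            pending.remove(id);
        }
    }

    /** Loop side: hand a reply to its waiter; false if the Id is unknown. */
    boolean deliver(long id, byte[] data) {
        Waiter w = pending.get(id);
        if (w == null) {
            return false;
        }
        w.complete(data);
        return true;
    }

    int size() {
        return pending.size();
    }
}
```

A worker would call register(), push the request together with the Id, and
then call awaitReply(id, timeout); the loop thread calls deliver(id, reply)
when the backend answers. Removing the Id in a finally block keeps the map
from growing when a request times out.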

The loop works as follows:
 1) forwards every request from the inproc PULL socket (#1) through the TCP
DEALER socket (#2) to the backend server (including the request Id);
 2) every reply received from the backend server carries the request Id;
this Id is used to find the "waiter", pass the reply to it and notify() it;
 3) plus some heartbeating with the backend server.
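
How the request Id travels with the message is not spelled out here. One
possible framing, purely as an assumption, is to prefix the payload with
the Id as an 8-byte header, so that the loop can recover the Id from each
backend reply. The Envelope class below is hypothetical and uses only the
JDK.

```java
import java.nio.ByteBuffer;

/** A request or reply frame: 8-byte big-endian request Id followed by the payload. */
final class Envelope {
    final long id;
    final byte[] body;

    Envelope(long id, byte[] body) {
        this.id = id;
        this.body = body;
    }

    /** Prefix the payload with the request Id. */
    static byte[] encode(long id, byte[] body) {
        return ByteBuffer.allocate(Long.BYTES + body.length)
                .putLong(id)   // ByteBuffer is big-endian by default
                .put(body)
                .array();
    }

    /** Split a frame back into Id and payload. */
    static Envelope decode(byte[] frame) {
        if (frame.length < Long.BYTES) {
            throw new IllegalArgumentException("frame is shorter than the Id header");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        long id = buf.getLong();
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return new Envelope(id, body);
    }
}
```

With a DEALER socket the Id could just as well go into a separate message
frame; a single prefixed frame is only the simplest thing to show.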

So, there are just 4 zeromq sockets used to process multiple concurrent
webservice requests.

P.S. I know that using a socket from multiple threads is contrary to the
ZeroMQ way, but the solution described above works fine for me, and I don't
think that using one hundred or a few hundred inproc sockets (one inproc
socket for each worker thread in the servlet container) would be more
efficient.
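
For what it's worth, the "one socket guarded by synchronized" approach
boils down to the pattern below. This is a JDK-only sketch: SerializedSender
is a hypothetical name, and a plain Consumer<byte[]> stands in for the send
call on the shared PUSH socket, so no ZeroMQ API is shown.

```java
import java.util.function.Consumer;

/** Serializes access to a sink that is not safe for concurrent use. */
final class SerializedSender {
    private final Object lock = new Object();
    private final Consumer<byte[]> sink;   // stand-in for the shared socket's send

    SerializedSender(Consumer<byte[]> sink) {
        this.sink = sink;
    }

    /** Any worker thread may call this; only one send runs at a time. */
    void send(byte[] frame) {
        synchronized (lock) {
            sink.accept(frame);
        }
    }
}
```

The trade-off is contention: every worker queues on the same lock for the
duration of a send, which is cheap as long as the send itself does not
block.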



2013/4/11 Gonzalo Vasquez <gvasquez at altiuz.cl>

> Thanks Min,
>
> Ok, understood. But as this code is part of a webservice (i.e. each
> request runs concurrently in a separate thread), opening a single
> socket is not an option, so I'm guessing a socket pool would help. Any
> hints/patterns/links on achieving this in Java?
>
> Regards,
>
>   Gonzalo Vásquez Sáez
> Research and Development (R&D) Manager
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes - CP 7550099
> +56 2 335 2461
>   gvasquez at altiuz.cl
> http://www.altiuz.cl
> http://www.altiuzreports.com
>   <https://www.facebook.com/altiuz>  <http://twitter.com/altiuz> <http://www.linkedin.com/company/altiuz>
>
> On 11-04-2013, at 2:44, Yu Dongmin <miniway at gmail.com> wrote:
>
> Hi,
>
> I was able to reproduce your case.
>
> Basically, creating and closing a ZMQ socket for every message is probably
> not good practice.
>
> I would recommend connecting once at program start (e.g. on the line right
> after ZMQ.context).
>
> Even so, jeromq seems to have a bug in handling frequent socket
> creation.
>
> Under stress, jzmq also blocked for 4~5 secs but then resumed, whereas
> jeromq didn't resume after blocking.
>
> I'll look into this issue further.
>
> Thanks
> Min
>
> On Apr 11, 2013, at 5:14 AM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>
> I'm using JeroMQ. I've updated the code to a much tidier one:
>
> private byte[] getByte(final String table, final String name,
>         final int doc_off, final int doc_len, final int comp_off,
>         final int comp_len, final char compressionType) throws Exception {
>     File file = new File(cacheRoot, table.substring(0, 3) + "/DOC/" + name); //$NON-NLS-1$
>     // Context ctx = ZMQ.context(1);
>     Socket req = null;
>     byte[] data = null;
>     try {
>         // A new REQ socket is created and connected on every call
>         req = ctx.socket(ZMQ.REQ);
>         req.connect(ENDPOINT);
>
>         // TODO Create a POJO instead of a Map
>         Map<String, String> params = new HashMap<String, String>();
>         params.put("path", file.getAbsolutePath());
>         params.put("dOff", String.valueOf(doc_off));
>         params.put("dLen", String.valueOf(doc_len));
>         params.put("cOff", String.valueOf(comp_off));
>         params.put("clen", String.valueOf(comp_len));
>         params.put("cType", String.valueOf(compressionType));
>
>         // Serialize the parameter map into a byte array
>         ByteArrayOutputStream baos = null;
>         ObjectOutputStream oos = null;
>         try {
>             baos = new ByteArrayOutputStream();
>             oos = new ObjectOutputStream(baos);
>             oos.writeObject(params);
>         } finally {
>             params.clear();
>             if (oos != null) {
>                 oos.close();
>             }
>             if (baos != null) {
>                 baos.close();
>             }
>         }
>
>         LOG.info("Sending Request");
>         req.send(baos.toByteArray(), NO_FLAGS);
>         LOG.info("Request sent");
>         data = req.recv();
>         LOG.info("Response received");
>     } finally {
>         // Always release the socket, even if send/recv fails
>         if (req != null) {
>             req.disconnect(ENDPOINT);
>             req.close();
>         }
>     }
>     // ctx.term();
>     return data;
> }
>
>
> But the same problem arises :(
>
> On 10-04-2013, at 17:11, Eric Hill <eric at ijack.net> wrote:
>
> Sorry I missed the code in the first email.  Are you using zmq with the
> jni binding, or jzmq?
>
> Your code looks fine to me.  Can you strip that section of code out into a
> separate jar for testing outside of WAS?
>
>
> On Wed, Apr 10, 2013 at 2:37 PM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>
>> Dear Eric,
>>
>> 1.- Process 2472 actually is IBM WAS, where my client code runs.
>> 2.- I attached the code in the first email; nevertheless, here is the
>> client side:
>>
>> private byte[] getByte(final String table, final String name,
>>         final int doc_off, final int doc_len, final int comp_off,
>>         final int comp_len, final char compressionType) throws Exception {
>>     File file = new File(cacheRoot, table.substring(0, 3) + "/DOC/" + name); //$NON-NLS-1$
>>     // Context ctx = ZMQ.context(1);
>>     // A new REQ socket is created and connected on every call
>>     Socket req = ctx.socket(ZMQ.REQ);
>>     req.connect(ENDPOINT);
>>
>>     // TODO Create a POJO instead of a Map
>>     Map<String, String> params = new HashMap<String, String>();
>>     params.put("path", file.getAbsolutePath());
>>     params.put("dOff", String.valueOf(doc_off));
>>     params.put("dLen", String.valueOf(doc_len));
>>     params.put("cOff", String.valueOf(comp_off));
>>     params.put("clen", String.valueOf(comp_len));
>>     params.put("cType", String.valueOf(compressionType));
>>
>>     // Serialize the parameter map into a byte array
>>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>     ObjectOutputStream oos = new ObjectOutputStream(baos);
>>     oos.writeObject(params);
>>     oos.close();
>>     params.clear();
>>     baos.close();
>>
>>     LOG.info("Sending Request");
>>     req.send(baos.toByteArray(), NO_FLAGS);
>>     LOG.info("Request sent");
>>     byte[] data = req.recv();
>>     LOG.info("Response received");
>>     req.close();
>>     // ctx.term();
>>     return data;
>> }
>>
>>
>> I'm now moving the close invocation into a finally block, just in case
>> something goes wrong in between.
>>
>> 3.- Yes, I'm creating a new socket from the context on each request, but
>> closing it (using the close() method) upon completion. Do I have to use
>> the disconnect() method too?
>>
>> On 10-04-2013, at 16:14, Eric Hill <eric at ijack.net> wrote:
>>
>> Process 2472 at the time of this running looks like it has a large number
>> of open sockets.  Since I don't have your code, I can only guess that
>> you're connecting new sockets for every request?  I've got a fairly large
>> system going that has at most a few dozen sockets open at any given time.
>>  The system is most likely being slow because it's running out of IP ports.
>>  Realize that there are only about 65,000 local ports for making outgoing
>> connections...
>>
>> Eric
>>
>>
>>
>> On Wed, Apr 10, 2013 at 1:30 PM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>>
>>> Please see attached file for "netstat -ano" output
>>>
>>> On 10-04-2013, at 15:03, Eric Hill <eric at ijack.net> wrote:
>>>
>>> "netstat -ano" would be an interesting metric to look at.
>>>
>>>
>>> On Wed, Apr 10, 2013 at 1:02 PM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>>>
>>>> No antivirus is installed on the server. I suspect a socket-exhaustion
>>>> issue (something like hitting ulimit on Unix/Linux), as I even get
>>>> disconnected from the Remote Desktop in this scenario, but I'm able to
>>>> log in again immediately.
>>>>
>>>>
>>>> On 10-04-2013, at 14:50, Eric Hill <eric at ijack.net> wrote:
>>>>
>>>> 100% unresponsive and extremely slow are mutually exclusive.  I've seen
>>>> problems with antivirus programs attempting to scan inbound and outbound
>>>> network connections for possible threats.  Are you running any form of
>>>> antivirus on the server?
>>>>
>>>>
>>>> On Wed, Apr 10, 2013 at 12:14 PM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>>>>
>>>>> Nope, neither high CPU nor high memory usage detected :(
>>>>>
>>>>> On 10-04-2013, at 13:36, Wolfgang Richter <wolf at cs.cmu.edu> wrote:
>>>>>
>>>>> So you've noticed 100% CPU usage on this server or high memory usage
>>>>> when it's running (thus, the unresponsiveness?)?
>>>>>
>>>>> --
>>>>> Wolf
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2013 at 11:38 AM, Gonzalo Vasquez <gvasquez at altiuz.cl> wrote:
>>>>>
>>>>>> Wolf,
>>>>>>
>>>>>> Yes, almost 100% unresponsive; even closing windows is extremely
>>>>>> slow.
>>>>>>
>>>>>> The server component is terminated by a single CTRL-C, i.e. it's
>>>>>> interrupted, as the main is invoked in a black cmd window.
>>>>>>
>>>>>> I've also realized that I had to terminate the client side as well
>>>>>> to recover 100% responsiveness; this part of the code is running as a
>>>>>> webapp in IBM WAS Server.
>>>>>>
>>>>>> It's virtualized on an ESXi server.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On 10-04-2013, at 12:34, Wolfgang Richter <wolf at cs.cmu.edu> wrote:
>>>>>>
>>>>>> What do you mean by:
>>>>>>
>>>>>>> the server gets really "stuck" until I terminate the server
>>>>>>> component.
>>>>>>
>>>>>> Do you mean your Windows Server becomes almost unresponsive?
>>>>>>
>>>>>> Other processes can't work properly?
>>>>>>
>>>>>> How do you terminate the server component?
>>>>>>
>>>>>> Also, is this in a virtualized/cloud environment, or bare metal
>>>>>> Windows Server?
>>>>>>
>>>>>> --
>>>>>> Wolf
>>>>>> _______________________________________________
>>>>>> zeromq-dev mailing list
>>>>>> zeromq-dev at lists.zeromq.org
>>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev