[zeromq-dev] frequent ZeroMQ crashes - how to diagnose?
Nick Kravitz
nick at dymcapital.com
Thu Jul 8 21:23:15 CEST 2010
ZeroMQ Assert Fail details
To diagnose, I captured the value of WSAGetLastError () just before the assert fired. The value is 10014, which is WSAEFAULT (Bad address).
Before the assert failed, my application had successfully sent approximately 6,000 messages on this socket over roughly 15 minutes. The socket transport is tcp, and the address/port is 192.168.1.7:1234 (local network).
From http://msdn.microsoft.com/en-us/library/ms740668%28VS.85%29.aspx:
WSAEFAULT 10014
Bad address. The system detected an invalid pointer address in attempting to use a pointer argument of a call. This error occurs if an application passes an invalid pointer value, or if the length of the buffer is too small. For instance, if the length of an argument, which is a sockaddr structure, is smaller than the sizeof(sockaddr).
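Looking the number up by hand works once, but if the assert fires again it helps to have the symbolic name in the log directly. Below is a minimal sketch, assuming a hypothetical helper `wsa_error_name` (not part of 0MQ). The numbers are the standard Winsock codes, hard-coded so the sketch compiles off Windows; on Windows the `WSAE*` constants from `<winsock2.h>` would normally be used instead. Only the codes that appear in this thread are listed.

```cpp
#include <string>

// Hypothetical helper: map a Winsock error number (as returned by
// WSAGetLastError ()) to its symbolic name. Only codes discussed in
// this thread are covered; anything else is reported as "unknown".
const char *wsa_error_name (int code)
{
    switch (code) {
    case 10014: return "WSAEFAULT";        // bad address
    case 10035: return "WSAEWOULDBLOCK";   // operation would block
    case 10050: return "WSAENETDOWN";      // network is down
    case 10052: return "WSAENETRESET";     // network dropped connection on reset
    case 10053: return "WSAECONNABORTED";  // software caused connection abort
    case 10054: return "WSAECONNRESET";    // connection reset by peer
    case 10060: return "WSAETIMEDOUT";     // connection timed out
    case 10065: return "WSAEHOSTUNREACH";  // no route to host
    default:    return "unknown";
    }
}

// Format a code for a log line, e.g. "10014 (WSAEFAULT)".
std::string describe_wsa_error (int code)
{
    return std::to_string (code) + " (" + wsa_error_name (code) + ")";
}
```

Logging `describe_wsa_error (WSAGetLastError ())` next to the assert would have shown the failing code by name without a trip to MSDN.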
Is it possible to fix this by simply adding WSAEFAULT to the error trapping list below?
int zmq::tcp_socket_t::write (const void *data, int size)
{
    int nbytes = send (s, (char*) data, size, 0);

    //  If not a single byte can be written to the socket in non-blocking mode
    //  we'll get an error (this may happen during the speculative write).
    if (nbytes == SOCKET_ERROR && WSAGetLastError () == WSAEWOULDBLOCK)
        return 0;

    //  Signalise peer failure.
    if (nbytes == -1 && (
          WSAGetLastError () == WSAENETDOWN ||
          WSAGetLastError () == WSAENETRESET ||
          WSAGetLastError () == WSAEHOSTUNREACH ||
          WSAGetLastError () == WSAECONNABORTED ||
          WSAGetLastError () == WSAETIMEDOUT ||
          WSAGetLastError () == WSAECONNRESET))
        return -1;

    //  Diagnostic line added to capture the error code before the assert fires.
    int testWSAGetLastError = WSAGetLastError ();
    wsa_assert (nbytes != SOCKET_ERROR);

    return (size_t) nbytes;
}
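The branching in `write ()` above amounts to a three-way classification of the error code: "try again later", "peer failed", and "anything else", which falls through to the assert. The sketch below restates that logic portably; `classify_send_error` and `send_outcome` are hypothetical names for illustration, not 0MQ code. It also makes one point about the question explicit: per the MSDN text quoted above, WSAEFAULT means an invalid pointer or buffer length was passed to the call, which points at the caller's memory rather than at the network peer, so adding it to the peer-failure list may mask the underlying bug rather than fix it.

```cpp
// Hypothetical classification of the error code returned by
// WSAGetLastError () after a failed send (). Numeric values are the
// standard Winsock codes, written out so the sketch compiles off Windows.
enum class send_outcome { would_block, peer_failure, unexpected };

send_outcome classify_send_error (int err)
{
    const int would_block = 10035;  // WSAEWOULDBLOCK
    const int peer_failures [] = {
        10050,  // WSAENETDOWN
        10052,  // WSAENETRESET
        10065,  // WSAEHOSTUNREACH
        10053,  // WSAECONNABORTED
        10060,  // WSAETIMEDOUT
        10054   // WSAECONNRESET
    };

    if (err == would_block)
        return send_outcome::would_block;       // write () returns 0
    for (int code : peer_failures)
        if (err == code)
            return send_outcome::peer_failure;  // write () returns -1

    // Anything else, including WSAEFAULT (10014), is not a network
    // condition and is left to the assertion instead of being
    // reported as a disconnect.
    return send_outcome::unexpected;
}
```

Keeping the "unexpected" branch fatal is a design choice: it surfaces an invalid buffer immediately, at the cost of crashing where a lenient version would silently drop the connection.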
-----Original Message-----
From: pieterh at gmail.com [mailto:pieterh at gmail.com] On Behalf Of Pieter Hintjens
Sent: Sunday, June 20, 2010 8:04 AM
To: Nick Kravitz
Subject: Re: [zeromq-dev] frequent ZeroMQ crashes - how to diagnose?
Hi Nick,
I'm glad you're finding 0MQ useful. I'd suggest just logging the
error at that assertion, i.e. modify the code to print the error
code when nbytes is SOCKET_ERROR. Most likely it is some
error that 0MQ is not yet handling properly.
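One way to follow this suggestion is to log the failed expression, the numeric error code, and the call site immediately before aborting. The sketch below is a minimal version under stated assumptions: `format_failure` and `LOG_ASSERT` are hypothetical names, not 0MQ's `wsa_assert`, and the error value is passed in by the caller so the same macro works with `WSAGetLastError ()` on Windows or `errno` on POSIX.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: build the diagnostic line for a failed check.
std::string format_failure (const char *expr, int err,
                            const char *file, int line)
{
    return std::string ("Assertion failed: ") + expr + " (error " +
           std::to_string (err) + ") at " + file + ":" +
           std::to_string (line);
}

// Hypothetical macro in the spirit of wsa_assert: when COND is false,
// print the expression, the error code and the location to stderr,
// then abort. ERR is evaluated only when the check fails.
#define LOG_ASSERT(cond, err)                                          \
    do {                                                               \
        if (!(cond)) {                                                 \
            const int log_assert_err_ = (err);                         \
            std::fprintf (stderr, "%s\n",                              \
                format_failure (#cond, log_assert_err_,                \
                                __FILE__, __LINE__).c_str ());         \
            std::fflush (stderr);                                      \
            std::abort ();                                             \
        }                                                              \
    } while (false)
```

At the call site in `write ()` this would read `LOG_ASSERT (nbytes != SOCKET_ERROR, WSAGetLastError ());`, so a failure prints the Winsock code (10014 in this case) instead of only the expression.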
Date: Fri, 18 Jun 2010 18:54:55 +0200
From: Martin Sustrik <sustrik at 250bpm.com>
Subject: Re: [zeromq-dev] frequent ZeroMQ crashes - how to diagnose?
To: 0MQ development list <zeromq-dev at lists.zeromq.org>
Message-ID: <4C1BA4DF.5010409 at 250bpm.com>
Hi Nick,
> We are a small financial startup using messaging for communication
> between our various applications.
>
> We chose ZeroMQ because of speed and flexibility - both the set of
> languages and number of distributed systems we have is growing and yet
> to be determined.
Understood.
> What should our diagnosis strategy be to chase this difficult bug down?
The only problem here seems to be that Windows returns an error we don't expect. All that needs to be done is to find out what the error is and add it to the list (the long if statement in the code you've sent).
wsa_assert should print the error to stderr -- can you check it in the console?
Let me know what the error was so that I can fix it in the trunk.
Thanks!
Martin
Best regards
--
Pieter Hintjens
CEO, iMatix