[zeromq-dev] frequent ZeroMQ crashes - how to diagnose?

Martin Sustrik sustrik at 250bpm.com
Sat Jun 19 09:04:23 CEST 2010


Nick,

> ZeroMQ crashed today.
> 
> This is a Win32 build of both ZMQ and myApp.
> myApp was running fine with several thousand messages, when the memcpy code line below threw the following exception. 
> 
> "Unhandled exception at 0x6404edd6 (msvcr90d.dll) in myApp.exe: 0xC0000005: Access violation reading location 0xfeeefeee."
> 
> debugging shows the following values:
> buffer      0x00d9b570 "%"         unsigned char *
> pos         2                      unsigned int
> write_pos   0xfeeefeee <Bad Ptr>   unsigned char *
> to_copy     8190                   unsigned int
> 
> looks like a bad pointer.
> 
> encoder.hpp
> 
>                 //  If there are no data in the buffer yet and we are able to
>                 //  fill whole buffer in a single go, let's use zero-copy.
>                 //  There's no disadvantage to it as we cannot stuck multiple
>                 //  messages into the buffer anyway. Note that subsequent
>                 //  write(s) are non-blocking, thus each single write writes
>                 //  at most SO_SNDBUF bytes at once not depending on how large
>                 //  is the chunk returned from here.
>                 //  As a consequence, large messages being sent won't block
>                 //  other engines running in the same I/O thread for excessive
>                 //  amounts of time.
>                 if (!pos && !*data_ && to_write >= buffersize) {
>                     *data_ = write_pos;
>                     *size_ = to_write;
>                     write_pos = NULL;
>                     to_write = 0;
>                     return;
>                 }
> 
>                 //  Copy data to the buffer. If the buffer is full, return.
>                 size_t to_copy = std::min (to_write, buffersize - pos);
> =======>        memcpy (buffer + pos, write_pos, to_copy); 
>                 pos += to_copy;
>                 write_pos += to_copy;
>                 to_write -= to_copy;
>                 if (pos == buffersize) {
>                     *data_ = buffer;
>                     *size_ = pos;
>                     return;
>                 }

It looks like memory corruption, either a stray overwrite or a use of
already freed memory, in 0MQ or in the application. Do you have a test
program that reproduces the problem?

> Let me know what the error was so that I can fix it in the trunk.

Have you managed to find out what the error code is?

Martin


