[zeromq-dev] Corruption problem

Brad Taylor cbradtaylor at gmail.com
Mon Dec 31 23:04:02 CET 2012


I am having a problem with what looks like a double free in zmq. I assume it is something I am doing wrong; however, I have looked fairly carefully and cannot find the problem. I will document this working backward from the results to the source code.

During execution I get:

==============> error during execution <==============================================

*** glibc detected *** /home/brad/mavenSDK/lz_serviceRegistry/Debug/lz_serviceRegistry:double free or corruption (!prev): 0x00000000006096e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x36b287c80e]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x180b3)[0x7ffff7bcd0b3]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2f9c3)[0x7ffff7be49c3]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2fe7b)[0x7ffff7be4e7b]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2637e)[0x7ffff7bdb37e]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x21512)[0x7ffff7bd6512]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x15194)[0x7ffff7bca194]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x1429e)[0x7ffff7bc929e]
/home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2aba6)[0x7ffff7bdfba6]

================>Environment<======================

The service registry is a "daemon" that listens for beacons from service providers and records those services. It also listens for "find" requests, which ask for the endpoint of a registered service.

It is during this request processing, while constructing and sending the reply, that the problem occurs. The two routines below are involved. The second is the driving routine; it receives control when a "find" message is received in an upper-level polling loop.

The sendResponse routine is invoked because the requested service was found. The only unusual thing I can think of is that the subroutine is the one that calls zmq_msg_init_size(), while the zmq_msg_send() and zmq_msg_close() for that message are done after it returns (on the next call to sendResponse, or by the caller for the final frame).

I have numbered the statements in the order in which they are actually executed. Any thoughts on how to debug this problem would be appreciated.

============>Source code<==============


typedef struct MSGREPLY {
	zmq_msg_t		*pPriorMessage;
	zmq_msg_t		sPriorMessage;
}MSGREPLY;

/**
 * This routine will send a response to the find message.  This is a "delayed" response because
 * we do not know whether there will be any "more" responses to be sent.
 * We use the multi-part message facility of zmq to do this.  Each part of the message
 * is a service that matches the requestor's specification; that is, for a find request
 * there may be multiple service providers for the requested service.  For a list request, multiple services may exist.
 */
static int sendResponse(void * context, void *pReplySocket,
						LZSERVICE *pServiceEntry,
						MSGREPLY *pMsgReply,
						LZSERVICEMSGS msgID) {

	int					msgSize;
	LZSERVICEREPLY		*pReply;

====>8	if (pMsgReply->pPriorMessage) {
			zmq_msg_send(pMsgReply->pPriorMessage,pReplySocket,ZMQ_SNDMORE);
			zmq_msg_close(pMsgReply->pPriorMessage);
		}
====>9	pMsgReply->pPriorMessage = &pMsgReply->sPriorMessage;
====>10	msgSize = sizeof(LZSERVICEREPLYHDR)+pServiceEntry->serviceLen;

====>11    zmq_msg_init_size (pMsgReply->pPriorMessage, msgSize);
====>12    pReply = zmq_msg_data(pMsgReply->pPriorMessage);
====>13    LZSERVICE_SET_REPLY(pReply);
====>14    pReply->replyHdr.rc = 0;
====>15    pReply->replyHdr.msghdr.msgID = msgID;
====>16    memcpy(&pReply->findReply.service,pServiceEntry,pServiceEntry->serviceLen);
====>17    pReply->findReply.service.pServiceName = NULL;
====>18    pReply->findReply.service.pServiceURI = NULL;

====>19	return 0;
}

/**
 * This routine will find the requested service.
 * This is a "request" message sent on a request socket; that is, it requires a response.
 * Since there may be multiple instances of a given service running on multiple blades, or multiple processes on a blade,
 * duplicate service name entries may exist.  We return all of the instances that match the requested name.
 * When there is more than one, we send the subsequent service responses in separate message frames.
 */
static int processFindService(void * context, LZSERVICEMESSAGE *pFindMessage, void *pReplySocket) {

	int					msgSize;
	LZSERVICEREPLY		*pFindReply;
	LZSERVICE			*pServiceEntry;
	char				*pServiceName;
	GList 				*iterator = NULL;
	MSGREPLY			sReply;
	zmq_msg_t			sReplyMessage;
	char				routineName[] = "lzServiceRegistry:processFindService";

===>1	pServiceName = pFindMessage->findService.aServiceName;
===>2	sReply.pPriorMessage = NULL;

	/*--------------------------------------------------------------------------+
	 * Look for matching service names in the global service table             	|
	 * for each found entry, construct a response and send it					|
	 *-------------------------------------------------------------------------*/
====>3	for (iterator = pGlobalServices; iterator; iterator = iterator->next) {
====>4		pServiceEntry = iterator->data;
====>5		if (pServiceEntry->serviceNameLen ==  pFindMessage->findService.serviceNameLen) {
====>6			if (!memcmp(pServiceEntry->pServiceName,pServiceName,pServiceEntry->serviceNameLen)) {
====>7				sendResponse(context,pReplySocket,pServiceEntry,&sReply, LZSERVICEMSGS_FINDSERVICE);
				if (options.debugMode) {
					fprintf(stderr,"%s: service %s was found\n",routineName,pServiceName);
				}
			}
		}
	}
====>20	if (sReply.pPriorMessage) {
====>21		zmq_msg_send(sReply.pPriorMessage,pReplySocket,0);     =============This is where I get the exception ===================
		zmq_msg_close(sReply.pPriorMessage);
	}
	else {
		msgSize = sizeof(LZSERVICEREPLY);
	    zmq_msg_init_size (&sReplyMessage, msgSize);
	    pFindReply = zmq_msg_data(&sReplyMessage);
	    LZSERVICE_SET_REPLY(pFindReply);
	    pFindReply->replyHdr.msghdr.msgID = LZSERVICEMSGS_FINDSERVICE;
	    pFindReply->replyHdr.rc = LZSERVICERC_SERVICENOTFOUND;
		zmq_msg_send(&sReplyMessage,pReplySocket,0);
		zmq_msg_close(&sReplyMessage);
		if (options.debugMode) {
			fprintf(stderr,"%s: service name, %s, not found\n",routineName,pServiceName);
		}
	}


	return 0;
}