[zeromq-dev] Disk backed distributed logger

Pieter Hintjens ph at imatix.com
Mon Aug 17 12:14:02 CEST 2015


Hi Ehud,

If you are getting partial message delivery, it's a bug, yes. This is
likely a bug already fixed in libzmq and not yet fixed in JeroMQ. Can you
reproduce it reliably? If so, make an issue on the JeroMQ git repo with
your test code.
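
A minimal skeleton along these lines (placeholder endpoints, fixed-size
frames; kill -9 the sender while it is running) is usually enough to show
the truncation, and would make a good starting point for the test case:

import org.zeromq.ZMQ;

// Receiver: every frame must be exactly PAYLOAD bytes; anything shorter
// demonstrates the truncation you describe.
public class PullCheck {
    static final int PAYLOAD = 1024;

    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
        pull.bind("tcp://*:5555");
        while (true) {
            byte[] frame = pull.recv(0);
            if (frame != null && frame.length != PAYLOAD) {
                System.err.println("Truncated frame: " + frame.length + " bytes");
            }
        }
    }
}

// Sender: run in a separate JVM and kill -9 it while it is sending.
class PushFlood {
    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
        push.connect("tcp://localhost:5555");
        byte[] payload = new byte[PullCheck.PAYLOAD];
        while (true) {
            push.send(payload, 0);
        }
    }
}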

-Pieter

On Wed, Jul 22, 2015 at 12:34 PM, Ehud Eshet <Ehud.Eshet at imperva.com> wrote:

> Hi all,
>
>
>
> I hope my original mail did not break any mailing list policy.
>
>
>
> As far as I understand, ZeroMQ guarantees that multi-part messages are
> delivered either in full (all parts) or not delivered at all.
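>
> For reference, the multi-part contract as I understand it, in JeroMQ terms
> (a sketch; sender and receiver are shown in one process for brevity, though
> in my application they run in separate JVMs):
>
> import org.zeromq.ZMQ;
>
> public class MultipartSketch {
>     public static void main(String[] args) {
>         ZMQ.Context ctx = ZMQ.context(1);
>
>         ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
>         pull.bind("tcp://*:5556");
>
>         ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
>         push.connect("tcp://localhost:5556");
>
>         // The three frames form one atomic message: the receiver should
>         // get all of them or none of them.
>         push.send("header".getBytes(), ZMQ.SNDMORE);
>         push.send("body".getBytes(), ZMQ.SNDMORE);
>         push.send("trailer".getBytes(), 0);
>
>         // hasReceiveMore() stays true until the last frame is read.
>         do {
>             byte[] frame = pull.recv(0);
>             System.out.println(new String(frame));
>         } while (pull.hasReceiveMore());
>
>         push.close();
>         pull.close();
>         ctx.term();
>     }
> }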
>
> My JeroMQ application uses PUSH-PULL sockets to send messages between Java
> processes.
>
> When the producer Java process is terminated abruptly (kill -9), the
> consumer Java process pulls a truncated message.
>
> This happens even with single-part messages.
>
>
>
> Is this a bug in JeroMQ 0.3.4?
>
> I am not sure whether message transfer completeness is the responsibility
> of JeroMQ or something the application should verify itself to achieve
> reliability.
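>
> If it turns out to be the application's responsibility, the check I have in
> mind is a simple length + CRC envelope around every payload, roughly like
> this (a sketch, not our production code):
>
> import java.nio.ByteBuffer;
> import java.util.zip.CRC32;
>
> // Envelope layout: [int length][long crc32][payload].
> // unframe() returns null if the frame arrived truncated or corrupted.
> final class Envelope {
>     static byte[] frame(byte[] payload) {
>         CRC32 crc = new CRC32();
>         crc.update(payload);
>         ByteBuffer buf = ByteBuffer.allocate(12 + payload.length);
>         buf.putInt(payload.length);
>         buf.putLong(crc.getValue());
>         buf.put(payload);
>         return buf.array();
>     }
>
>     static byte[] unframe(byte[] frame) {
>         if (frame == null || frame.length < 12) return null;
>         ByteBuffer buf = ByteBuffer.wrap(frame);
>         int length = buf.getInt();
>         long expected = buf.getLong();
>         if (buf.remaining() != length) return null;         // truncated
>         byte[] payload = new byte[length];
>         buf.get(payload);
>         CRC32 crc = new CRC32();
>         crc.update(payload);
>         return crc.getValue() == expected ? payload : null; // corrupted
>     }
> }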
>
>
>
> It would be great if someone could review my disk-backed distributed logger
> design.
>
> I would rather get insulting feedback than no feedback at all.
>
>
>
> Thanks,
>
> Ehud.
>
>
>
> From: zeromq-dev-bounces at lists.zeromq.org
> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Ehud Eshet
> Sent: Monday, July 13, 2015 10:59 AM
> To: zeromq-dev at lists.zeromq.org
> Subject: [zeromq-dev] Disk backed distributed logger
>
>
>
> Hi all,
>
>
>
> I need to deliver audit events from dozens of producers to a cluster of
> several consuming nodes.
>
>
>
> Producers may generate up to 100,000 events/s (1 KB each) per producer node.
>
> Each producer node must be able to save events locally in case of network
> failure.
>
> Each event must be consumed by exactly one consumer, with no affinity or
> ordering requirements.
>
>
>
> I built a Local Store and Forward (LSAF) process as follows:
>
> - A writer thread that receives messages from a PULL socket and writes them
> to disk.
>
> - A reader thread that reads messages from disk and sends them to a PUSH
> socket.
>
> - The disk-backed queue is implemented as a cyclic file using Java NIO (a
> sketch follows this list).
>
> - When the actual queue size fits into memory, the reader thread uses cached
> buffers instead of doing real I/O.
>
> - Each of the sockets can bind and/or connect, depending on network security
> constraints.
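>
> As a rough illustration of the cyclic file (heavily simplified; a real
> implementation would also track and persist the read offset and avoid
> overwriting unread records):
>
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.file.Paths;
> import java.nio.file.StandardOpenOption;
>
> // Ring file with a fixed capacity and length-prefixed records; the write
> // position wraps back to 0 when a record would run past the end.
> final class RingFile {
>     private final FileChannel channel;
>     private final long capacity;
>     private long writePos = 0;
>
>     RingFile(String path, long capacity) throws IOException {
>         this.channel = FileChannel.open(Paths.get(path),
>                 StandardOpenOption.CREATE, StandardOpenOption.READ,
>                 StandardOpenOption.WRITE);
>         this.capacity = capacity;
>     }
>
>     synchronized void append(byte[] record) throws IOException {
>         int needed = 4 + record.length;
>         if (writePos + needed > capacity) {
>             writePos = 0;                          // wrap around
>         }
>         ByteBuffer buf = ByteBuffer.allocate(needed);
>         buf.putInt(record.length);
>         buf.put(record);
>         buf.flip();
>         channel.write(buf, writePos);
>         writePos += needed;
>     }
>
>     synchronized byte[] read(long pos) throws IOException {
>         ByteBuffer len = ByteBuffer.allocate(4);
>         channel.read(len, pos);
>         len.flip();
>         ByteBuffer rec = ByteBuffer.allocate(len.getInt());
>         channel.read(rec, pos + 4);
>         return rec.array();
>     }
> }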
>
>
>
> One LSAF process runs on every producer node and on every consumer node.
>
> That is, the LSAF PUSH socket on a producer node is connected to the LSAF
> PULL socket of every consumer node.
>
>
>
> The end-to-end flow will be:
>
> 1. Multiple producers on the producer node use PUSH sockets to send
> messages to the LSAF process on that node.
>
> 2. The LSAF writer on the producer node (single threaded) serializes all
> incoming messages from its PULL socket to disk.
>
> 3. The LSAF reader on the producer node deserializes messages read from
> disk and sends them to its PUSH socket, round robin to one of the consumer
> nodes (see the sketch after this list).
>
> 4. The LSAF writer on the consumer node (single threaded) serializes all
> incoming messages from its PULL socket to disk.
>
> 5. The LSAF reader on the consumer node deserializes messages read from
> disk and sends them to its PUSH socket, to be picked up by the local
> consumer threads.
>
> 6. Multiple consumer threads receive messages from their PULL socket and
> process them.
>
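> The round robin in step 3 relies on ZeroMQ's PUSH behaviour of
> load-balancing across all endpoints the socket is connected to, so the
> reader simply connects once per consumer node. A sketch with placeholder
> host names and port:
>
> import org.zeromq.ZMQ;
>
> // Producer-side LSAF reader fan-out: one PUSH socket, one connect() per
> // consumer node. Each message is delivered to exactly one of the
> // connected peers, round robin over those ready to receive.
> public class FanOutSketch {
>     public static void main(String[] args) {
>         ZMQ.Context ctx = ZMQ.context(1);
>         ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
>         push.connect("tcp://consumer-1:6000");
>         push.connect("tcp://consumer-2:6000");
>         push.connect("tcp://consumer-3:6000");
>
>         byte[] record = new byte[1024]; // stands in for a record read from disk
>         push.send(record, 0);           // goes to exactly one consumer node
>
>         push.close();
>         ctx.term();
>     }
> }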
>
>
>
>
>
> All nodes are known and managed by a management console.
>
> Each LSAF process needs the management console only during startup and on
> the rare occasion that a new node to connect to is introduced.
>
> If the management console is unavailable, all nodes keep working with the
> last known configuration.
>
>
>
> The entire project was developed in Java (JeroMQ 0.3.4).
>
>
>
> I have a few questions and issues:
>
> 1. Can you point out major pitfalls in this design?
>
> 2. Am I reinventing the wheel? (Can you point me to an implementation of a
> similar or better design?)
>
> 3. When a producer is terminated abruptly (kill -9), the consumer gets
> truncated messages without any exception.
>
> 4. Graceful shutdown of the JeroMQ context is quite delicate.
> If one of the threads is blocked in a wait() call, it cannot close its
> socket.
> If the main thread uses interrupt(), the interrupted thread fails to close
> its socket because the underlying NIO channels were already closed by the
> interrupt.
> In both cases, closing the context hangs forever.
> Where can we find a detailed description of how a JeroMQ context should be
> managed and terminated in a multi-threaded application?
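>
> Would a receive timeout plus a stop flag be the recommended pattern, so
> that the thread owning the socket is never blocked forever, closes its own
> socket, and only then does the main thread call term()? A sketch of what I
> mean (assuming setReceiveTimeOut behaves as expected in JeroMQ 0.3.4):
>
> import org.zeromq.ZMQ;
>
> // The worker owns its socket: it receives with a timeout instead of
> // blocking forever, closes the socket itself, and only then does the
> // main thread call ctx.term() (which otherwise hangs while sockets are
> // still open).
> public class WorkerShutdownSketch {
>     private static volatile boolean running = true;
>
>     public static void main(String[] args) throws InterruptedException {
>         final ZMQ.Context ctx = ZMQ.context(1);
>
>         Thread worker = new Thread(new Runnable() {
>             public void run() {
>                 ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
>                 pull.setReceiveTimeOut(100); // ms; recv returns null on timeout
>                 pull.bind("tcp://*:7000");
>                 while (running) {
>                     byte[] msg = pull.recv(0);
>                     if (msg == null) continue; // timed out: re-check the flag
>                     // ... handle msg ...
>                 }
>                 pull.close();                  // closed by the owning thread
>             }
>         });
>         worker.start();
>
>         // ... later, from the main thread:
>         running = false;
>         worker.join();                         // socket is closed by now
>         ctx.term();                            // safe: no open sockets remain
>     }
> }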
>
>
>
>
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
[Attachment: image002.jpg (image/jpeg, 111389 bytes):
<https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20150817/f7ad00a1/attachment.jpg>]

