[zeromq-dev] Disk backed distributed logger
Ehud Eshet
Ehud.Eshet at imperva.com
Mon Jul 13 09:59:08 CEST 2015
Hi all,
I need to deliver audit events from dozens of producers to a cluster of several consuming nodes.
Producers may generate up to 100,000 1KB events/s per producer node.
Each producer node must be able to save events locally in case of network failure.
Each event must be consumed by one consumer with no affinity or special order.
I built a Local Store and Forward (LSAF) process as follows:
* An LSAF writer thread, which receives messages from a PULL socket and writes them to disk.
* An LSAF reader thread, which reads messages from disk and sends them to a PUSH socket (a simplified sketch of both threads follows below).
* The disk-backed queue is implemented as a cyclic file using Java NIO.
* When the actual queue size fits into memory, the reader thread uses cached buffers instead of doing real I/O.
* Each of the sockets can bind and/or connect, depending on network security constraints.
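Roughly, the writer and reader threads look like the sketch below. It is simplified so that it is self-contained: a BlockingQueue stands in for the real cyclic-file queue, and the ports and host names are made up.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.zeromq.ZMQ;

// Simplified sketch of the LSAF process on a producer node.
// A BlockingQueue stands in for the real cyclic-file queue so the
// sketch is self-contained; ports and host names are made up.
public class LsafSketch {

    public static void main(String[] args) {
        final ZMQ.Context ctx = ZMQ.context(1);
        final BlockingQueue<byte[]> diskQueue = new LinkedBlockingQueue<>();

        // Writer thread: PULL from the local producers, append to the queue
        // (the real code writes to the cyclic file here).
        new Thread(() -> {
            ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
            pull.bind("tcp://*:5557");            // local producers connect here
            while (!Thread.currentThread().isInterrupted()) {
                byte[] event = pull.recv(0);      // blocks until a message arrives
                diskQueue.offer(event);
            }
            pull.close();
        }).start();

        // Reader thread: read from the queue, PUSH to the consumer nodes.
        // A PUSH socket round-robins across all connected peers.
        new Thread(() -> {
            ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
            push.connect("tcp://consumer-1:5558"); // example consumer nodes
            push.connect("tcp://consumer-2:5558");
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    push.send(diskQueue.take(), 0);
                }
            } catch (InterruptedException e) {
                // fall through and close the socket
            }
            push.close();
        }).start();
    }
}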
One LSAF process runs on every producer node and on every consumer node.
I.e., the LSAF PUSH socket on a producer node is connected to the LSAF PULL socket of every consumer node.
The end-to-end flow will be:
1. Multiple producers on the producer node use a PUSH socket to send messages to the LSAF process on that node.
2. The LSAF writer on the producer node (single threaded) serializes all incoming messages from its PULL socket to disk.
3. The LSAF reader on the producer node de-serializes messages read from disk and sends them to its PUSH socket (round robin to one of the consumer nodes).
4. The LSAF writer on the consumer node (single threaded) serializes all incoming messages from its PULL socket to disk.
5. The LSAF reader on the consumer node de-serializes messages read from disk and sends them to its PUSH socket (round robin to the local consumer threads).
6. Multiple consumer threads receive messages from a PULL socket and process them (the consumer-node side is sketched below).
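The consumer-node side of the flow (steps 4 to 6), under the same simplifications (the BlockingQueue again stands in for the disk-backed queue, and the inproc endpoint name is made up):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.zeromq.ZMQ;

// Simplified sketch of the consumer node (steps 4 to 6 above).
// The BlockingQueue again stands in for the disk-backed queue.
public class ConsumerNodeSketch {

    public static void main(String[] args) {
        final ZMQ.Context ctx = ZMQ.context(1);
        final BlockingQueue<byte[]> diskQueue = new LinkedBlockingQueue<>();

        // Step 4: LSAF writer pulls from the producer-node LSAFs and persists.
        new Thread(() -> {
            ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
            pull.bind("tcp://*:5558");           // producer-node LSAFs connect here
            while (true) {
                diskQueue.offer(pull.recv(0));
            }
        }).start();

        // Step 5: LSAF reader pushes to the local consumer threads.
        // The inproc socket is bound here, before any consumer connects,
        // and is then used only by the reader thread.
        final ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
        push.bind("inproc://events");
        new Thread(() -> {
            try {
                while (true) {
                    push.send(diskQueue.take(), 0);  // fair-distributed to the consumers
                }
            } catch (InterruptedException e) {
                push.close();
            }
        }).start();

        // Step 6: local consumer threads receive and process the events.
        for (int i = 0; i < 4; i++) {
            new Thread(() -> {
                ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
                pull.connect("inproc://events");
                while (true) {
                    byte[] event = pull.recv(0);
                    // ... application-specific processing of 'event' ...
                }
            }).start();
        }
    }
}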
[Flow diagram attached as image003.jpg]
All nodes are known and managed by a management console.
Each LSAF process needs the management console only during startup and on the rare occasion that a new node to connect to is introduced.
If the management console is unavailable, all nodes keep working with the last known configuration (roughly as sketched below).
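The fallback itself is simple. Roughly (illustrative only; the cache path and the console call are placeholders):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Illustrative sketch of the "last known configuration" fallback.
// fetchFromConsole() is a placeholder for the real management-console call.
public class NodeConfig {

    private static final File CACHE = new File("/var/lsaf/nodes.cache"); // made-up path

    public static Properties load() {
        try {
            Properties fresh = fetchFromConsole();
            try (OutputStream out = new FileOutputStream(CACHE)) {
                fresh.store(out, "last known configuration");  // refresh the local cache
            }
            return fresh;
        } catch (IOException consoleUnavailable) {
            // Console unreachable: keep working with the last known configuration.
            Properties cached = new Properties();
            try (InputStream in = new FileInputStream(CACHE)) {
                cached.load(in);
            } catch (IOException e) {
                throw new IllegalStateException("no console and no cached configuration", e);
            }
            return cached;
        }
    }

    private static Properties fetchFromConsole() throws IOException {
        // Placeholder: in the real process this talks to the management console.
        throw new IOException("console not reachable in this sketch");
    }
}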
The entire project was developed in Java (JeroMQ 0.3.4).
I have a few questions and issues:
1. Can you point out major pitfalls in this design?
2. Am I reinventing the wheel? (Can you point to an implementation of a similar or better design?)
3. When a producer is brutally terminated (kill -9), the consumer gets truncated messages without any exception.
4. Graceful shutdown of the JeroMQ context is quite delicate.
If one of the threads is blocked in a wait() call, it will not be able to close its socket.
If the main thread uses interrupt(), the interrupted thread will fail to close the socket because its NIO channels were already closed by the interrupt.
In both cases, closing the context hangs forever.
Where can we find a detailed description of how a JeroMQ context should be managed and terminated in a multi-threaded application?
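For question 4, what I am currently trying is sketched below: each socket is closed by the thread that owns it, blocking receives use a receive timeout plus a stop flag instead of Thread.interrupt(), and the context is terminated only after all worker threads have joined. Is this the intended pattern, or is there a better one? (The names, port and timeout value are placeholders.)

import org.zeromq.ZMQ;

// Sketch of the shutdown pattern I am currently trying.
public class ShutdownSketch {

    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        final ZMQ.Context ctx = ZMQ.context(1);

        Thread worker = new Thread(() -> {
            ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
            pull.setReceiveTimeOut(500);      // never block forever in recv()
            pull.bind("tcp://*:5557");
            while (running) {
                byte[] msg = pull.recv(0);    // returns null when the timeout expires
                if (msg != null) {
                    // ... handle the message ...
                }
            }
            pull.setLinger(0);
            pull.close();                     // the socket is closed by its owning thread
        });
        worker.start();

        // ... later, on shutdown:
        running = false;   // no Thread.interrupt(), so the NIO channels stay usable
        worker.join();     // wait until every thread has closed its own sockets
        ctx.term();        // term() no longer hangs on a socket that was never closed
    }
}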