[zeromq-dev] TCP Based Message Bus

Doron Somech somdoron at gmail.com
Sun May 12 11:36:24 CEST 2013


I also thought about repairers (I called them message stores), but I don't
want the design to become overcomplicated; I think the discovery services
can also take on the job of the repairers.

Anyway, thanks for your thoughts.


On Sun, May 12, 2013 at 5:50 AM, Ian Barber <ian.barber at gmail.com> wrote:

> Sounds pretty sensible. You might want to consider having repairers
> separate from the publishers, particularly if you have a bursty source of
> messages. Then if a subscriber can't keep up, it can go to the repairer
> without affecting the publisher.
>
> Being smart about batching can also make the system behave more smoothly
> in failure modes. If a subscriber is failing to keep up and dropping
> occasional messages, it may be best for it to disconnect until its backlog
> is processed, pull a large batch in recovery mode, then reconnect to the
> stream.
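A minimal sketch of the catch-up idea, using only the Python standard library. The class, `fetch_batch` (a stand-in for a request to a repairer or to the publisher's message store), the batch size and `start_seq` are all hypothetical. Sockets are not modelled: a real subscriber would close and reopen its stream connection where the `live` flag flips.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, str]  # (sequence number, payload)


class CatchUpSubscriber:
    """Hypothetical subscriber that leaves the live stream to recover in bulk."""

    def __init__(self, fetch_batch: Callable[[int, int], List[Message]],
                 batch_size: int = 500, start_seq: int = 0):
        self.fetch_batch = fetch_batch  # assumed repairer call: (first_seq, max_count) -> messages
        self.batch_size = batch_size
        self.next_seq = start_seq       # next sequence number we expect
        self.live = True                # False while catching up from the repairer
        self.delivered: List[str] = []

    def on_live_message(self, seq: int, payload: str) -> None:
        if seq < self.next_seq:
            return                      # duplicate of something already delivered
        if seq > self.next_seq:
            self.recover(up_to=seq)     # gap: fill it in batches before continuing
        self._deliver(seq, payload)

    def recover(self, up_to: int) -> None:
        self.live = False               # a real subscriber would disconnect from the stream here
        while self.next_seq < up_to:
            count = min(self.batch_size, up_to - self.next_seq)
            before = self.next_seq
            for seq, payload in self.fetch_batch(self.next_seq, count):
                if seq == self.next_seq and seq < up_to:
                    self._deliver(seq, payload)
            if self.next_seq == before:  # repairer made no progress: give up loudly
                raise RuntimeError(f"message {self.next_seq} cannot be recovered")
        self.live = True                # ...and reconnect to the stream here

    def _deliver(self, seq: int, payload: str) -> None:
        self.delivered.append(payload)
        self.next_seq = seq + 1
```

Pulling `batch_size` messages per request, rather than one request per missing message, is what keeps the recovery cheap when the backlog is large.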
>
> Ian
>
>
> On Wed, May 8, 2013 at 12:18 AM, Doron Somech <somdoron at gmail.com> wrote:
>
>> Hi All,
>>
>> We usually use zeromq with pgm as our message bus, which we use to
>> publish events between server-side services.
>>
>> The issue is that we need to support environments where multicast is not
>> available (such as the Amazon cloud).
>>
>> I'm working on a design for a TCP-based message bus and would like to get
>> your thoughts on it.
>>
>> There are three major requirements: services should be able to come and
>> go without the need to reconfigure the system, the design should be
>> brokerless, and we want to be able to recover messages lost between a
>> publisher and a subscriber (because of connection problems), as pgm does.
>>
>> We have three types of components: a discovery service, a publisher and a
>> subscriber.
>>
>> The discovery service is a standalone service that holds the list of all
>> the subscribers in the network. Each subscriber pings the discovery service
>> every X seconds; when a specific subscriber hasn't pinged the service for
>> more than Y seconds, it is considered dead. For every new subscriber, the
>> discovery service publishes a message to all the publishers. For high
>> availability there is more than one discovery service (probably 3).
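The liveness rule can be sketched as a small in-memory registry. This is an illustration under assumptions: `SubscriberRegistry` and its methods are hypothetical, `notify_new` stands in for publishing the "new subscriber" message to the publishers, and time is passed in explicitly so the logic can be tested without a real clock. The design leaves X and Y open, so the 15-second value in the test is arbitrary.

```python
from typing import Callable, Dict, List


class SubscriberRegistry:
    """Hypothetical in-memory state of one discovery service."""

    def __init__(self, dead_after: float, notify_new: Callable[[str], None]):
        self.dead_after = dead_after           # Y: seconds of silence before a subscriber is dead
        self.notify_new = notify_new           # stands in for publishing to all publishers
        self.last_ping: Dict[str, float] = {}  # subscriber address -> time of its last ping

    def on_ping(self, address: str, now: float) -> None:
        is_new = address not in self.last_ping
        self.last_ping[address] = now
        if is_new:
            self.notify_new(address)           # announce each subscriber once, on first ping

    def expire(self, now: float) -> List[str]:
        """Drop and return subscribers silent for more than `dead_after` seconds."""
        dead = [a for a, t in self.last_ping.items() if now - t > self.dead_after]
        for address in dead:
            del self.last_ping[address]
        return dead

    def snapshot(self) -> List[str]:
        """What a starting publisher would receive: all currently live subscribers."""
        return sorted(self.last_ping)
```

The caller is expected to run `expire` periodically. A subscriber that was expired and then pings again is announced as new a second time, which the publishers have to tolerate.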
>>
>> When a publisher starts, it asks the discovery service for all of the
>> subscribers and subscribes to notifications of new subscribers (it asks
>> all configured discovery services and takes the first answer, but
>> subscribes to all of them). After getting the list, the publisher connects
>> to all of the subscribers, and it also connects to every new subscriber.
>> The publisher ignores dead subscribers (mostly because I don't know how to
>> handle them: a subscriber can be reported dead by one of the discovery
>> services but still be alive on the others).
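One possible way to express "ask every discovery service, take the first answer" together with a connect-once rule. `first_answer`, `PublisherPeers`, `query` and `connect` are hypothetical; the two callables stand in for the real request and socket connect. Because the publisher subscribes to every discovery service, it will usually hear about each new subscriber once per discovery service, so connecting has to be idempotent.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Set


def first_answer(services: List[str], query: Callable[[str], List[str]],
                 timeout: float) -> List[str]:
    """Query every discovery service in parallel; return the first successful answer."""
    pool = ThreadPoolExecutor(max_workers=len(services))
    try:
        futures = [pool.submit(query, service) for service in services]
        # as_completed raises concurrent.futures.TimeoutError if nothing finishes in time.
        for fut in as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return fut.result()
        raise RuntimeError("every discovery service failed")
    finally:
        pool.shutdown(wait=False, cancel_futures=True)  # don't wait for slower services


class PublisherPeers:
    """Hypothetical helper that connects to each subscriber address exactly once."""

    def __init__(self, connect: Callable[[str], None]):
        self.connect = connect
        self.connected: Set[str] = set()

    def on_subscriber(self, address: str) -> None:
        # The same announcement may arrive from several discovery services.
        if address not in self.connected:
            self.connected.add(address)
            self.connect(address)
```

The initial list from `first_answer` and later announcements can both be fed through `on_subscriber`, so the two paths share the same deduplication.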
>>
>> All the messages the publisher sends are numbered, and the publisher
>> keeps the last X messages it sent to support recovery of lost messages.
>> Each publisher has a unique random id.
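A sketch of the publisher side, assuming a `collections.deque` with `maxlen` as the "last X messages" store and `uuid.uuid4()` for the random id (the design does not say how the id is generated). `RecoveryBuffer`, `stamp` and `replay` are hypothetical names. Generating a fresh id on every start presumably lets subscribers distinguish a restarted publisher, whose numbering starts over, from a gap.

```python
import uuid
from collections import deque
from itertools import islice
from typing import Deque, List, Optional, Tuple


class RecoveryBuffer:
    """Hypothetical publisher-side numbering plus a bounded history for replays."""

    def __init__(self, capacity: int):
        self.publisher_id = uuid.uuid4().hex  # new random id every time the publisher starts
        self.next_seq = 0
        # deque(maxlen=...) silently drops the oldest entry once `capacity` is reached.
        self.history: Deque[Tuple[int, bytes]] = deque(maxlen=capacity)

    def stamp(self, payload: bytes) -> Tuple[str, int, bytes]:
        """Number an outgoing message and remember it; returns (publisher_id, seq, payload)."""
        seq = self.next_seq
        self.next_seq += 1
        self.history.append((seq, payload))
        return self.publisher_id, seq, payload

    def replay(self, first: int, last: int) -> Optional[List[Tuple[int, bytes]]]:
        """Return messages first..last (inclusive), or None if any are no longer held."""
        if first > last:
            return []
        if not self.history or first < self.history[0][0] or last >= self.next_seq:
            return None
        start = first - self.history[0][0]  # history holds consecutive sequence numbers
        return list(islice(self.history, start, start + last - first + 1))
```

Returning `None` for an evicted range gives the request-response handler an unambiguous way to tell the subscriber that recovery is impossible.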
>>
>> If a publisher doesn't send a message for X seconds, it sends a keep-alive
>> message to all subscribers.
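The keep-alive rule amounts to a timer that any send resets. Below is a sketch with the clock passed in explicitly; `KeepAliveTimer` is a hypothetical helper. The design does not say what a keep-alive carries, so including the last sequence number is an assumption. It would let a subscriber notice a lost final message even after the publisher goes quiet.

```python
from typing import Callable, Tuple

KeepAliveMsg = Tuple[str, int]  # ("KEEPALIVE", last sequence number): assumed format


class KeepAliveTimer:
    """Hypothetical helper: tells the publisher when a keep-alive is due."""

    def __init__(self, interval: float, now: float):
        self.interval = interval  # the X seconds from the design
        self.last_sent = now

    def record_send(self, now: float) -> None:
        """Call after every data message; any traffic resets the timer."""
        self.last_sent = now

    def seconds_until_due(self, now: float) -> float:
        """Usable as the timeout of the publisher's blocking wait for outgoing work."""
        return max(0.0, self.last_sent + self.interval - now)

    def maybe_send(self, now: float, send: Callable[[KeepAliveMsg], None],
                   last_seq: int) -> bool:
        """Send a keep-alive if nothing went out for `interval` seconds."""
        if now - self.last_sent < self.interval:
            return False
        send(("KEEPALIVE", last_seq))
        self.last_sent = now
        return True
```

In a publisher loop, `seconds_until_due` would bound how long the loop blocks waiting for the next outgoing message before it calls `maybe_send`.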
>>
>> As mentioned, each subscriber pings the discovery services every X
>> seconds. When a subscriber gets a message from a publisher for the first
>> time, it saves the message number. From then on, if the subscriber detects
>> a gap in the messages, it connects directly to the publisher (using
>> request-response) and asks for the missing messages. The only problem is
>> that while messages are missing, the subscriber stops handling new
>> messages from all publishers until the missing messages are restored.
>>
>> If the publisher no longer has those messages, the subscriber should
>> raise an exception or restart the entire service.
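The subscriber's gap handling might look like the following sketch. `GapDetectingSubscriber` and `MessageLost` are hypothetical, and so is `request_missing`, which stands in for the request-response call to the publisher and returns `None` when the publisher no longer holds the range. Raising `MessageLost` corresponds to "raise an exception or restart". Because `on_message` blocks during recovery, nothing from any publisher is processed until the gap is filled, which is the limitation described above.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MessageLost(Exception):
    """The publisher no longer holds the messages needed to fill a gap."""


class GapDetectingSubscriber:
    """Hypothetical per-publisher sequence tracking with synchronous recovery."""

    def __init__(
        self,
        request_missing: Callable[[str, int, int], Optional[List[Tuple[int, str]]]],
        handle: Callable[[str, str], None],
    ):
        self.request_missing = request_missing  # stands in for the REQ/REP call; None = evicted
        self.handle = handle                    # application callback
        self.expected: Dict[str, int] = {}      # publisher id -> next expected sequence number

    def on_message(self, publisher_id: str, seq: int, payload: str) -> None:
        expected = self.expected.get(publisher_id)
        if expected is None:
            expected = seq                      # first message from this publisher: baseline
        if seq < expected:
            return                              # duplicate, e.g. already recovered
        if seq > expected:
            # Blocking call: no message from any publisher is handled until it returns.
            missing = self.request_missing(publisher_id, expected, seq - 1)
            if missing is None or [s for s, _ in missing] != list(range(expected, seq)):
                raise MessageLost(f"{publisher_id}: messages {expected}..{seq - 1} unavailable")
            for _, old_payload in missing:
                self.handle(publisher_id, old_payload)
        self.handle(publisher_id, payload)
        self.expected[publisher_id] = seq + 1
```

Keying the state by publisher id means a restarted publisher (new id) simply starts a new baseline instead of looking like a huge gap.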
>>
>> The only things the subscriber and publisher need to know are the
>> addresses of the discovery services.
>>
>> The reason I want the publisher to connect to the subscriber is to make
>> sure that when the connection drops, the publisher will be able to
>> recognize it and reconnect (the subscriber may not be able to recognize it,
>> because it doesn't send any data to the publishers).
>>
>> Thanks, I would very much appreciate your comments.
>>
>> Doron
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>

