[zeromq-dev] max_app_threads = 512
matt_weinstein at yahoo.com
Wed Jun 16 15:31:47 CEST 2010
Actually, I want to do this without barrier ops; ideally I'd have a
single-cycle 128-bit-wide load/store from/to the Lx processor cache.
Presumably QPI transports a line at a time between CPUs, so I just
have to be properly aligned.
If I've only got a 64-bit wide internal bus, I'll do some bit munching
to get what I need out of the address bits :-)
I'll look around at the Intel books after my current crunch is over.
On Jun 16, 2010, at 9:15 AM, Martin Sustrik wrote:
>> I've got to look at the cache models to see what's being enforced
>> between processors, the notion is to transmit state between
>> processors using cache-aligned writes that combine vector clocks and
>> pointers in the same line, essentially creating a micro-packet, and letting the
>> other end of the pipe handle version and consistency detection. It's
>> an old approach applied to yet another network (SMP cache). It
>> depends on having an atomic cache write that's long enough to hold
>> <vector clock, pointer> pair.
> Atomic ops on x86/64 manipulate 32/64 bit entities (=sizeof pointer).
> Thus I don't believe it's possible to have atomic <vclock,ptr> entity.
> Correct me if I'm wrong.
>> Not sure yet how to model the protocol
>> (my toolbox is a bit rusty), and I have to dig into the hardware
>> manuals to see what the guarantees are.
>> I'm assuming that QPI and HyperTransport are where I should be
>> starting; those seem to be the x86 inter-processor links these days?
> I have only a very rough understanding of the CPU microarchitecture myself.
> Others may help here...
>> Yes, ironically the area is not well synchronized, but will become
>> consistent ... eventually :-)
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org