[zeromq-dev] max_app_threads = 512

Matt Weinstein matt_weinstein at yahoo.com
Wed Jun 16 15:31:47 CEST 2010

Actually, I want to do this without barrier ops; ideally I'd have a
single-cycle 128-bit-wide load/store from/to the Lx processor cache.
Presumably QPI transports a line at a time between CPUs, so I just
have to be properly aligned.

If I've only got a 64-bit-wide internal bus, I'll do some bit munging
to get what I need out of the address bits :-)

I'll look around in the Intel books after my current crunch is over (a
few days).

On Jun 16, 2010, at 9:15 AM, Martin Sustrik wrote:

> Matt,
>> I've got to look at the cache models to see what's being enforced
>> between processors. The notion is to transmit state between
>> processors using cache-aligned writes that combine vector clocks and
>> pointers in the same line, essentially creating a micro-packet, and
>> letting the other end of the pipe handle version and consistency
>> detection. It's an old approach applied to yet another network (SMP
>> cache). It depends on having an atomic cache write that's long
>> enough to hold the <vector clock, pointer> pair.
> Atomic ops on x86/64 manipulate 32/64-bit entities (= sizeof(pointer)).
> Thus I don't believe it's possible to have an atomic <vclock, ptr> entity.
> Correct me if I'm wrong.
>> Not sure yet how to model the protocol
>> (my toolbox is a bit rusty), and I have to dig into the hardware
>> manuals to see what the guarantees are.
>> I'm assuming that QPI and HyperTransport are where I should be
>> starting, those seem to be the x86 inter-processor links these days?
> I have only a very rough understanding of the CPU microarchitecture myself.
> Others may help here...
>> Yes, ironically the area is not well synchronized, but will become
>> consistent ... eventually :-)
> :)
> Martin
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
