[zeromq-dev] Very Small Messages/Manual

Ben Kloosterman bklooste at gmail.com
Sun Jul 25 13:13:08 CEST 2010


Interesting. I have been playing with this using inproc. I got gains up to
128 bytes, at which point there was greater degradation on the
reference-based messages and noticeable memory pressure.

The test is on Windows Vista, using VC++ 10, -Ox, SSE2 turned on, 32 bit,
on an old 2GHz Core 2, using the .NET wrapper to write and read.
Results for messages smaller than 8 were too variable.


Original (size 30 message, sizeof 42, or 46 on 64 bit)

size 8 average throughput is 1825111 [msg/s] 1460088 [Mb/s]
size 16 average throughput is 1808771 [msg/s] 1447016 [Mb/s]
size 32 average throughput is 1434060 [msg/s] 1147248 [Mb/s]
size 64 average throughput is 1363001 [msg/s] 1090400 [Mb/s]
size 128 average throughput is 1247290 [msg/s] 997832 [Mb/s]
size 256 average throughput is 1162912 [msg/s] 930329 [Mb/s]
size 512 average throughput is 967984 [msg/s] 774387 [Mb/s]
size 1024 average throughput is 600926 [msg/s] 480740 [Mb/s]
size 2048 average throughput is 351106 [msg/s] 280884 [Mb/s]
size 4096 average throughput is 174943 [msg/s] 139954 [Mb/s]
size 8192 average throughput is 105295 [msg/s] 84236 [Mb/s]
size 16384 average throughput is 51080 [msg/s] 40864 [Mb/s]
size 32768 average throughput is 25969 [msg/s] 20775 [Mb/s]


All of the following tests remove the separate pointer for VSMs by using a
union, with a flag to indicate a VSM.
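
As a rough illustration of the union layout described above, here is a
minimal C++ sketch. The names (Msg, kMaxVsmSize, kFlagVsm, msg_init,
msg_is_vsm) are hypothetical, and this is not the actual zmq_msg_t
definition or the code used for these tests. It only shows the content
pointer and the inline buffer sharing storage, with a flag to tell them
apart. With pack(1) and a 30 byte inline buffer the struct should come to
32 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; not the libzmq message definition.
constexpr std::size_t kMaxVsmSize = 30;    // inline payload capacity
constexpr std::uint8_t kFlagVsm = 0x01;    // set when payload is stored inline

#pragma pack(push, 1)
struct Msg {
    std::uint8_t flags;      // kFlagVsm set => u.vsm_data is the active member
    std::uint8_t vsm_size;   // number of inline bytes in use
    union {
        void *content;                        // reference-based message
        unsigned char vsm_data[kMaxVsmSize];  // very small message, by copy
    } u;
};
#pragma pack(pop)

// 1 + 1 + 30 bytes with no padding under pack(1).
static_assert(sizeof(Msg) == 32, "expected a 32 byte message under pack(1)");

// Copies the payload inline when it fits, otherwise keeps only the pointer.
inline void msg_init(Msg &m, void *data, std::size_t size) {
    if (size <= kMaxVsmSize) {
        m.flags = kFlagVsm;
        m.vsm_size = static_cast<std::uint8_t>(size);
        std::memcpy(m.u.vsm_data, data, size);
    } else {
        m.flags = 0;
        m.vsm_size = 0;
        m.u.content = data;
    }
}

inline bool msg_is_vsm(const Msg &m) { return (m.flags & kFlagVsm) != 0; }
```

Dropping the pack pragma lets the compiler align the pointer member, which
changes sizeof and is the difference between the pack(1) and default pack
runs.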

pack(1), size 30 message, sizeof(32)
size 8 average throughput is 1888471 [msg/s] 1510776 [Mb/s]
size 16 average throughput is 1852586 [msg/s] 1482068 [Mb/s]
size 32 average throughput is 1447779 [msg/s] 1158223 [Mb/s]
size 64 average throughput is 1373549 [msg/s] 1098839 [Mb/s]
size 128 average throughput is 1356071 [msg/s] 1084856 [Mb/s]
size 256 average throughput is 1177227 [msg/s] 941781 [Mb/s]
size 512 average throughput is 943028 [msg/s] 754422 [Mb/s]
size 1024 average throughput is 598832 [msg/s] 479065 [Mb/s]
size 2048 average throughput is 343946 [msg/s] 275156 [Mb/s]
size 4096 average throughput is 168994 [msg/s] 135195 [Mb/s]
size 8192 average throughput is 100576 [msg/s] 80460 [Mb/s]
size 16384 average throughput is 51109 [msg/s] 40887 [Mb/s]
size 32768 average throughput is 26530 [msg/s] 21224 [Mb/s]

The impact of making sizeof 32 bytes with no payload reduction is noticeable:
about a 1-4% improvement for smaller messages.


Default pack, size 24 message, sizeof(32)
using defaults inproc://1 30 100000
size 8 average throughput is 1967415 [msg/s] 1573932 [Mb/s]
size 16 average throughput is 1849839 [msg/s] 1479871 [Mb/s]
size 32 average throughput is 1470676 [msg/s] 1176540 [Mb/s]
size 64 average throughput is 1378668 [msg/s] 1102934 [Mb/s]
size 128 average throughput is 1299887 [msg/s] 1039909 [Mb/s]
size 256 average throughput is 1071963 [msg/s] 857570 [Mb/s]
size 512 average throughput is 855707 [msg/s] 684565 [Mb/s]
size 1024 average throughput is 557740 [msg/s] 446192 [Mb/s]
size 2048 average throughput is 331296 [msg/s] 265036 [Mb/s]
size 4096 average throughput is 172082 [msg/s] 137665 [Mb/s]
size 8192 average throughput is 103603 [msg/s] 82882 [Mb/s]
size 16384 average throughput is 50132 [msg/s] 40105 [Mb/s]
size 32768 average throughput is 26589 [msg/s] 21271 [Mb/s]

Faster still, but the payload is 24, so messages of size 24-30 will suffer a
20-30% penalty.

Default pack, size 56 message, sizeof(64)

using defaults inproc://1 30 100000
size 8 average throughput is 1905016 [msg/s] 1524012 [Mb/s]
size 16 average throughput is 1772675 [msg/s] 1418140 [Mb/s]
size 32 average throughput is 1851930 [msg/s] 1481544 [Mb/s]  ++
size 64 average throughput is 1476649 [msg/s] 1181319 [Mb/s]
size 128 average throughput is 1391343 [msg/s] 1113074 [Mb/s]
size 256 average throughput is 1103520 [msg/s] 882816 [Mb/s]
size 512 average throughput is 892191 [msg/s] 713752 [Mb/s]
size 1024 average throughput is 604081 [msg/s] 483264 [Mb/s]
size 2048 average throughput is 324861 [msg/s] 259888 [Mb/s]
size 4096 average throughput is 173994 [msg/s] 139195 [Mb/s]
size 8192 average throughput is 104288 [msg/s] 83430 [Mb/s]
size 16384 average throughput is 51346 [msg/s] 41076 [Mb/s]
size 32768 average throughput is 26013 [msg/s] 20810 [Mb/s]

Noticeable fall-off. A good compromise, as most messages are small.

Default pack, size 120 message, sizeof(128)
using defaults inproc://1 30 100000
size 8 average throughput is 1775882 [msg/s] 1420705 [Mb/s]
size 16 average throughput is 1615587 [msg/s] 1292469 [Mb/s]
size 32 average throughput is 1896023 [msg/s] 1516818 [Mb/s]
size 64 average throughput is 1741777 [msg/s] 1393421 [Mb/s] ++
size 128 average throughput is 1296761 [msg/s] 1037408 [Mb/s]
size 256 average throughput is 997648 [msg/s] 798118 [Mb/s]
size 512 average throughput is 883168 [msg/s] 706534 [Mb/s]
size 1024 average throughput is 571834 [msg/s] 457467 [Mb/s]
size 2048 average throughput is 337935 [msg/s] 270348 [Mb/s]
size 4096 average throughput is 167076 [msg/s] 133660 [Mb/s]
size 8192 average throughput is 105142 [msg/s] 84113 [Mb/s]
size 16384 average throughput is 50039 [msg/s] 40031 [Mb/s]
size 32768 average throughput is 25475 [msg/s] 20380 [Mb/s]

The size penalty is becoming noticeable. That said, it covers a lot of common
messages. It would probably be worse on machines with a smaller cache line size.



Using the union appeared to work well (and I'm primarily interested here as
I want to introduce a variable-size by-copy message for my queue, which a
union makes easier).
The pack(1) was interesting and had a greater cost than I expected. That
being said, it made messages smaller, which provided a benefit to the smaller
ref-based tests.

A better option may be to just use an int for a combined flags and size
field: use the first 8 bits for size and just cast the int to a byte to read
it, while the flags work on the higher bits. Alternatively you can make the
size 24 bits and the flags 8 bits, removing the need for a separate size field
in the ref-based message (unless anyone can see a reason for > 24 bit
messages?).

You then have small messages up to size 28 that still fit in 32 bytes (even
on 64 bit machines). I may try that now and force 16 byte alignment.
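
A minimal sketch of the 24 bit size / 8 bit flags idea, assuming a 32 bit
header word followed by 28 bytes of storage. All names (PackedMsg,
make_header, header_size, header_flags, packed_init, packed_data,
kFlagInline) are hypothetical. Storing the pointer of a ref-based message
inside the same 28 bytes via memcpy, rather than through a union member, is
an assumption made here so the struct stays at 32 bytes on 64 bit without
any pack pragma.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: low 24 bits = size, high 8 bits = flags.
constexpr std::uint32_t kSizeBits = 24;
constexpr std::uint32_t kSizeMask = (1u << kSizeBits) - 1;  // 0x00FFFFFF
constexpr std::size_t kInlineCapacity = 28;
constexpr std::uint8_t kFlagInline = 0x01;  // payload is stored in body

// Sizes that need more than 24 bits are truncated by the mask.
constexpr std::uint32_t make_header(std::uint32_t size, std::uint8_t flags) {
    return (static_cast<std::uint32_t>(flags) << kSizeBits) | (size & kSizeMask);
}
constexpr std::uint32_t header_size(std::uint32_t h) { return h & kSizeMask; }
constexpr std::uint8_t header_flags(std::uint32_t h) {
    return static_cast<std::uint8_t>(h >> kSizeBits);
}

struct alignas(16) PackedMsg {
    std::uint32_t header;                 // size and flags in one word
    unsigned char body[kInlineCapacity];  // inline payload, or a stored pointer
};

static_assert(sizeof(PackedMsg) == 32, "4 byte header + 28 bytes of storage");
static_assert(alignof(PackedMsg) == 16, "forced 16 byte alignment");

// Copies small payloads inline; otherwise stores only the pointer in body.
inline void packed_init(PackedMsg &m, const void *data, std::uint32_t size) {
    if (size <= kInlineCapacity) {
        m.header = make_header(size, kFlagInline);
        std::memcpy(m.body, data, size);
    } else {
        m.header = make_header(size, 0);
        std::memcpy(m.body, &data, sizeof data);
    }
}

// Returns a pointer to the payload bytes for either kind of message.
inline const void *packed_data(const PackedMsg &m) {
    if (header_flags(m.header) & kFlagInline) return m.body;
    const void *p;
    std::memcpy(&p, m.body, sizeof p);
    return p;
}
```

A 24 bit size field caps a single message at 16777215 bytes, which is the
trade-off raised in the question above.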


Regards, 

Ben 
 



