[zeromq-dev] PGM performance test utility for Win64 non-blocking sockets & IOCP
steven.mccoy at miru.hk
Thu Nov 11 17:26:22 CET 2010
I've uploaded two snapshots of the 5.1.92 build of the pgmping performance
test utility to compare the performance of non-blocking sockets and IOCP with
blocking sockets, and the potential of ØMQ.
*Non-blocking socket version:*
*IOCP (send-side only) socket version:*
Uses only one IOCP thread on the sender side and unlimited buffer space, so
excessive growth in memory highlights a performance bottleneck with the
Unfortunately the process is manual as there is no reliable congestion
control method to auto-tune to the network's capability. Testing is
generally performed by sending from one host to the network and having
another host on the network reflect packets back to the sender. This removes
the problem of clock skew. If you play with host times you may be able to
test unidirectional performance but it is frequently non-trivial.
A basic test that should always work is 1,000 packets per second (pps).
Replace the interface IPv4 address 10.6.28.35 as appropriate, and run the
receiver first. You may wish to restart the receiver between tests to clear
out the state of the previous PGM transport.
pgmping -n "10.6.28.31;22.214.171.124" -r 800000000 -p 3065 -e
pgmping -n "10.6.28.35;126.96.36.199" -r 800000000 -p 3065 -m 1000
Then increase the rate until you see packet loss or stalled progress:
pgmping -n "10.6.28.35;188.8.131.52" -r 800000000 -p 3065 -m 10000
pgmping -n "10.6.28.35;184.108.40.206" -r 800000000 -p 3065 -m 20000
pgmping -n "10.6.28.35;220.127.116.11" -r 800000000 -p 3065 -m 30000
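The rate sweep above can be scripted rather than typed by hand. The following is only a minimal sketch: it builds sender command lines using just the flags shown in this message (-n, -r, -p, -m), and the network string is a placeholder you must replace with your own interface address and multicast group. The function name and defaults are illustrative and not part of pgmping.

```python
def sender_commands(network, rates_pps, rate_limit=800000000, port=3065):
    """Build one pgmping sender command line per -m value.

    Only flags that appear in the commands quoted in this message are used.
    The -m value is the packets-per-second figure being stepped up.
    """
    return [
        f'pgmping -n "{network}" -r {rate_limit} -p {port} -m {pps}'
        for pps in rates_pps
    ]


if __name__ == "__main__":
    # Placeholder network string: substitute your interface and group.
    network = "10.6.28.35;<multicast-group>"
    for line in sender_commands(network, [1000, 10000, 20000, 30000]):
        print(line)
```

Run each printed command in turn against a receiver that is already started, and stop at the first rate that shows packet loss or stalled progress.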
pgmping -n "10.6.28.31;18.104.22.168" -r 800000000 -p 3065 -l
pgmping -n "10.6.28.35;22.214.171.124" -r 800000000 -p 3065 -m 1000 -o
Windows is configured by default for 100 Mb/s networks and needs registry
changes for 1000 Mb/s or faster networks. First, you may wish to disable the
multimedia scheduler, which restricts sockets to 10,000 pps when a multimedia application
- Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows
- NetworkThrottlingIndex, type REG_DWORD32, value set to 0xffffffff.
Two settings define an IP stack path for datagrams. By default, datagrams
above 1024 bytes go through a slow, locked double buffer; increase this
threshold to the network MTU size, i.e. 1500 bytes.
- FastSendDatagramThreshold, type REG_DWORD32, value set to 1500.
- FastCopyReceiveThreshold, type REG_DWORD32, value set to 1500.
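As an illustration, the two threshold values can be turned into `reg add` command lines. This is a sketch under assumptions: the message does not say which registry key holds these values, so the AFD parameters key used below is my assumption and should be verified before applying anything. The helper names are hypothetical, and the script only prints the commands; it does not change the registry.

```python
# Assumption: the key is not stated in this message. The AFD service
# parameters key is assumed here; verify it before running the commands.
AFD_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters"


def reg_add_command(key, name, value):
    """Return a `reg add` line that writes a DWORD value, overwriting (/f)."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'


def datagram_threshold_commands(mtu=1500, key=AFD_KEY):
    """Build commands for both datagram thresholds, set to the network MTU."""
    names = ["FastSendDatagramThreshold", "FastCopyReceiveThreshold"]
    return [reg_add_command(key, name, mtu) for name in names]


if __name__ == "__main__":
    for line in datagram_threshold_commands():
        print(line)
```

The printed lines would need an elevated command prompt, and registry changes of this kind typically need a reboot before they take effect.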
Unless you have hardware acceleration for timestamps, incoming packets will
be tagged with the receipt time at the expense of processing time.
- Disable the time stamps on both sides. This is performed by means of
the following command:
"netsh int tcp set global timestamps=disabled"
A firewall will intercept all packets, causing increased latency and
processing time. Disable the filtering and firewall services to ensure
direct network access:
- Disable the following services:
- "Base Filtering Engine (BFE)"
- "Windows Firewall (MpsSvc)"
If you have a multi-core box you might want to enable Receive Side Scaling
- Enabling Receive Side Scaling (RSS) is performed by means of the
following command:
"netsh int tcp set global rss=enabled"
Also consider forcing Direct Cache Access (DCA), also known as NetDMA 2.0;
this probably requires a modern high-end NIC.
- Enable DCA via command line:
"netsh int tcp set global dca=enabled"
- Or, enable DCA via registry,
- EnableDCA, type REG_DWORD32, value set to 1.
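The netsh settings mentioned in this message (timestamps, RSS, DCA) can be collected into one small script so they are applied consistently on both hosts. This is only a sketch: the function names are hypothetical, it defaults to a dry run that just prints the commands, and whether each setting is accepted depends on your Windows version and NIC.

```python
import subprocess

# Settings taken from the netsh commands quoted in this message.
TCP_GLOBAL_SETTINGS = [
    ("timestamps", "disabled"),
    ("rss", "enabled"),
    ("dca", "enabled"),
]


def netsh_commands(settings=TCP_GLOBAL_SETTINGS):
    """Return one argv list per `netsh int tcp set global name=state`."""
    return [
        ["netsh", "int", "tcp", "set", "global", f"{name}={state}"]
        for name, state in settings
    ]


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Requires an elevated prompt on Windows.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply(netsh_commands())  # dry run: prints the commands only
```

Pass `dry_run=False` only after checking the printed commands against what your system supports.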
You can set a socket buffer multiplier for increased bandwidth, but latency
will very likely suffer significantly, as pgmping already sets large send
and receive buffers (except IOCP send, which uses 0 for zero-copy):
- BufferMultiplier, type REG_DWORD32, value set to 0x400.