Latency is a fascinating word which always gets me onto my pulpit. In the English language, latency means a hidden potential delay, not a fact of life. Applied to networks (and other resources), latency should mean the time taken to complete a transmission across the network, not an overhead to be avoided.

In IT, latency is normally defined as the total time taken to perform a unit of work, usually a transaction of some sort; this is often referred to as the response time. The latency of a unit of work is usually made up of sub-units such as processor work, disk I/Os and transit times across a network, each of which has a latency of its own. While not disagreeing with this common concept of latency, I feel there are advantages in dividing the latency time into two components.


Latency components

Let us break the time into the following two components - the native latency and the extra latency. The native latency is the time taken for the unit of work to execute in an unconstrained environment, that is, without queuing, retransmission, retries and other items which might add to this time. This is fixed in the absence of upgrades or, in the case of a network, techniques such as data compression. Thus, for a disk access this native time would be equal to seek time plus search time plus data transfer time or, for a network link, the data size (payload + control data) divided by the line speed.
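To make the native component concrete, here is a small Python sketch of those two calculations. The seek, search, transfer and line-speed figures in it are purely illustrative assumptions, not measurements from any particular device or link.

def disk_native_latency(seek_ms, search_ms, transfer_ms):
    """Native disk latency = seek time + search time + data transfer time (ms)."""
    return seek_ms + search_ms + transfer_ms

def network_native_latency(payload_bytes, control_bytes, line_speed_bps):
    """Native network latency = (payload + control data) / line speed (seconds)."""
    total_bits = (payload_bytes + control_bytes) * 8
    return total_bits / line_speed_bps

# Assumed example figures: 4 ms seek, 2 ms search, 1.5 ms transfer;
# 1 MB payload plus 40 KB of control data over a 10 Mbit/s link.
print("Disk native latency:    %.1f ms" % disk_native_latency(4.0, 2.0, 1.5))
print("Network native latency: %.2f s" % network_native_latency(1_000_000, 40_000, 10_000_000))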

The extra latency comes into play when the traffic of work units (transactions) grows to the point where queuing occurs, along with other time elements such as data retransmission, disk retries and so on. Queuing, for example, adds a wait (or queuing) time to the native latency, and this time depends on the utilization (busyness) of the resource in question. For a single resource, this can be expressed as:

T_latency = T_native + T_extra
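As a rough illustration of how the extra component behaves, the sketch below uses the classic single-server (M/M/1) queuing approximation, wait = service time x utilization / (1 - utilization). The model and the figures are my own assumptions for illustration, not part of the formula above.

def queuing_wait(service_time_s, utilization):
    """Approximate wait time at a single resource (M/M/1 queuing model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s * utilization / (1 - utilization)

native = 0.5  # assumed native (unconstrained) service time, seconds
for busy in (0.3, 0.5, 0.7, 0.9):
    extra = queuing_wait(native, busy)
    print("utilization %.0f%%: native %.2fs + extra %.2fs = latency %.2fs"
          % (busy * 100, native, extra, native + extra))

The point to note is how quickly the extra element comes to dominate as the resource approaches saturation.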

To my way of thinking, a delay is something we have come to expect but which should not really happen in a well-designed, balanced system. It is like a traffic jam adding 30 minutes to a journey which normally takes 20 minutes. Hence:

Normal (native) time = 20 minutes, delay (extra) = 30 minutes, total time (latency) = 50 minutes

Why this breakdown? Isn't a single latency entity sufficient? Not if you want to reduce that latency, and hence the response time, of some unit of work, usually a transaction.

If, for example, you have a response time of 3 seconds (latency = 3 secs. in old currency), you will obviously want to reduce it. However, if the native part of this response is, taking a network as an example, (payload + control data)/line speed = 2.2 secs., you can try until you are blue in the face and you will never get below that figure unless you employ network enhancement techniques such as WAN acceleration or data compression. If you separate the 3-second response (latency) into native (unchangeable in the short term) and extra, you can concentrate on the extra portion, given by extra = (response time - native time). You then know what you are tackling: break this extra time down into its components, assess which ones contribute most to it and tackle those. They could be wait times at various nodes in the network, transmission errors and so on.
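Expressed as a couple of lines of Python, using the same example figures, the decomposition is simply:

response_time = 3.0  # measured latency (response time), seconds
native_time = 2.2    # (payload + control data) / line speed, seconds

extra_time = response_time - native_time
print("extra (addressable) latency = %.1f s" % extra_time)        # 0.8 s
print("floor without compression/acceleration = %.1f s" % native_time)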

Cache: Is Bigger Better?

There is a myth that the bigger the buffer, wherever data is read or written, the better the response time, throughput and file transfer times. This is not generally true, although in some cases it may be. Here we cast an eye over just one area where buffering can be tuned for optimum performance: the network. Buffers are also key to the performance of individual components within a network, such as routers, and of databases.

Networks: As the glue between systems and users, networks can be sources of slowdown due to sub-optimal buffering. The most common network protocol used on the internet is the Transmission Control Protocol, or TCP. For maximum throughput, it is critical to use optimal TCP socket buffer sizes for the link employed. If the buffers are too small, the TCP congestion window will never open up fully, so the sender will be throttled. If the buffers are too large, the sender can overrun the receiver (like drinking from a fire hose), which will cause the receiver to drop packets and the TCP congestion window to shut down.
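By way of illustration, a common rule of thumb (discussed in the papers referenced below) is to size the socket buffers at roughly the bandwidth-delay product of the link. The Python sketch below applies that rule; the 100 Mbit/s bandwidth and 50 ms round-trip time are assumed figures, and any real tuning should be driven by measurement rather than this sketch.

import socket

def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: the data that can be 'in flight' on the link."""
    return int(bandwidth_bps * rtt_s / 8)

bdp = bandwidth_delay_product(bandwidth_bps=100_000_000, rtt_s=0.05)
print("Suggested socket buffer size: %d bytes" % bdp)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
# The operating system may clamp or adjust these values to its own limits.
print("Effective send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
sock.close()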

NFS and DNS buffers will probably benefit from some scrutiny and optimization.

For more information, including how to compute the buffer size, see these readable papers:

How TCP Works

Buffer Sizing in the Internet

There are similar considerations for buffers/caches in routers and for processor memory cache. I have seen processors with 32K of cache and of almost identical power outperform 64K-cache processors. It all depends on the cache-handling algorithms employed; in the case I observed, the algorithms were not the same.

Terry Critchley is the author of High Availability IT Services, ISBN 9781498769198 (CAT# K29288).