In the book Life, the Universe and Everything, author Douglas Adams devised a form of invisibility, or cloaking device, dubbed the SEP field – SEP meaning ‘somebody else’s problem’.

[Image – Gerd Altmann, Pixabay]

“An SEP is something we can't see, or don't see, or our brain doesn't let us see, because we think that it's somebody else's problem. That’s what SEP means. Somebody Else’s Problem. The brain just edits it out, it's like a blind spot,” explains Ford Prefect, one of the book’s key characters.

Or, as the accompanying Hitchhiker’s Guide to the Galaxy narration puts it: “The Somebody Else's Problem field... relies on people's natural predisposition not to see anything they don't want to, weren't expecting, or can't explain.”

For data center management, few things are more likely to create an SEP field for analysts than being bombarded with wrong, repetitive or trivial information from their monitoring tools, or regularly receiving alerts irrelevant to their particular job function.

In other words, the tools need to be concise and precise, and to lead analysts directly to the causes of outages or other issues, regardless of whether those issues lead back to servers, networks or any other item of data center infrastructure.

Too many data center management tools, however, fail to do this.

“Event noise is the bane of the industry. If the system is providing alerts that the vast majority of operators will ignore, it’s wasting everyone’s time. And, if most of what is presented to the operators is ignored then, psychologically, they stop looking seriously at everything, and the data and alerts that really matter get hidden among the great mass of stuff that simply doesn’t,” says John Diamond, a senior solutions architect at data center performance monitoring specialist Entuity.

Managing data center infrastructure is a challenging task because even a modest corporate network and IT set-up, these days, can be complex and subject to minute-by-minute change.

“Networks are dynamic; they change. Virtually nobody builds a network and leaves it unchanged until it’s decommissioned,” says Diamond. “They are always changing: people are adding equipment, or equipment is being removed, hardware is reconfigured.

“Then, you’ve also got reconfigurations in terms of interconnections: all these devices are interconnected and those connections change. Servers or hypervisors might be moved from one switch to another. You may have end users connected on the access layer of the network – and those end users tend to move around,” says Diamond.

[Image – Gerd Altmann, Pixabay]

Assets and analysts

What this means is that data center management systems must be equally dynamic; able to map all assets, first of all, but equally able to track changes as and when they occur, to identify genuine anomalies, and to consolidate ‘flapping conditions’ – in which, say, a device or port might be up, then down, then up again – into a single alert, so that analysts are not overwhelmed by irrelevancies.
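
For illustration, the kind of flap consolidation described above can be sketched in a few lines of Python. This is a generic example, not Entuity’s implementation; the event model, device names and thresholds are assumptions made purely for the demonstration.

```python
# Generic sketch of flap consolidation: collapse a storm of up/down events
# for the same interface into a single "flapping" alert.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    source: str       # e.g. "switch-07:Gi1/0/24" (hypothetical identifier)
    state: str        # "up" or "down"
    timestamp: float  # seconds since epoch

def consolidate_flapping(events, window_s=300, flap_threshold=3):
    """Group events per source; if a source changes state flap_threshold or
    more times within window_s, emit one 'flapping' alert instead of one
    alert per transition."""
    by_source = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_source[ev.source].append(ev)

    alerts = []
    for source, evs in by_source.items():
        # Keep only the moments where the state actually changed.
        transitions = [e for prev, e in zip(evs, evs[1:]) if e.state != prev.state]
        if (len(transitions) >= flap_threshold
                and transitions[-1].timestamp - transitions[0].timestamp <= window_s):
            alerts.append(f"{source}: flapping ({len(transitions)} state changes "
                          f"in under {window_s}s)")
        else:
            # Stable enough: report only the latest known state, once.
            alerts.append(f"{source}: {evs[-1].state}")
    return alerts
```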

Furthermore, they must be able to provide all the data analysts need, when they need it; to highlight looming issues via trend analysis, as well as pressing ones; but to sound the alarm only over genuine problems or threats.

That is far easier said than done.

“One of the underpinning principles, philosophically, behind all of Entuity’s design is that you’re dealing with a dynamic world, a world in which what you discover on day one may be different on day two. And it has to cope with it dynamically without a human having to sweat tears over getting all of that accurate and keeping it accurate,” says Diamond.

Indeed, on day one the Entuity system maps not just the network but all connected devices, where they connect and how they interact, and it continues doing this week after week. That means logging equipment right down to vendor and specification, as well as configuration and interconnection, and learning all the routes traffic takes so that, in the event of an outage, “root cause logic” can be deployed. That learning is continually updated, of course.
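
Entuity’s own root cause logic isn’t spelled out here, but the general principle – use the learned topology to blame the upstream failure and suppress the symptom alerts behind it – can be sketched as follows. The topology model and device names are hypothetical.

```python
# Generic sketch of topology-based root-cause suppression: if a device that
# others depend on for reachability is down, report it as the cause and
# suppress alerts for everything behind it.
def root_cause(down_devices, upstream_of):
    """upstream_of maps device -> the device it is reached through
    (a deliberately simplified, assumed topology model)."""
    causes, suppressed = set(), set()
    for dev in down_devices:
        hop, blamed = upstream_of.get(dev), False
        # Walk towards the core; if any upstream hop is also down,
        # this device's alert is a symptom, not a cause.
        while hop is not None:
            if hop in down_devices:
                suppressed.add(dev)
                blamed = True
                break
            hop = upstream_of.get(hop)
        if not blamed:
            causes.add(dev)
    return causes, suppressed

# Example: two access switches behind a failed distribution switch.
topology = {"access-1": "dist-1", "access-2": "dist-1", "dist-1": "core-1"}
print(root_cause({"dist-1", "access-1", "access-2"}, topology))
# -> ({'dist-1'}, {'access-1', 'access-2'})
```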

Learning patterns of traffic, hour by hour and week by week, allows a dynamic threshold to be generated – a picture of a typical hour’s, day’s or week’s traffic across the infrastructure – so that significant deviations can be automatically highlighted, while still allowing for the swings in traffic and activity you’d expect during a normal working day.

For example, if backup operations occur overnight at a particular time, saturating particular circuits, that’s fine; it’s anticipated. But if that’s happening at 3pm on a Wednesday afternoon, that’s a problem that could require intervention. “Entuity learns those patterns and adapts accordingly,” he adds.

That auto-tuning feature is based on data that can also be manually queried to determine the causes of unusual or unexpected events.
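
As a rough, generic sketch of how a learned time-of-week baseline can separate the expected overnight backup from the unexpected mid-afternoon saturation (this is not Entuity’s auto-tuning algorithm, and the sample data is invented):

```python
# Generic sketch: build a per-hour-of-week baseline from historical link
# utilisation, then flag samples that deviate far from what is normal
# for that hour of that day.
import statistics
from collections import defaultdict

def build_baseline(samples):
    """samples: list of (day_of_week, hour, utilisation_percent)."""
    buckets = defaultdict(list)
    for dow, hour, util in samples:
        buckets[(dow, hour)].append(util)
    return {k: (statistics.mean(v), statistics.pstdev(v)) for k, v in buckets.items()}

def is_anomalous(baseline, dow, hour, util, sigmas=3.0):
    mean, stdev = baseline.get((dow, hour), (None, None))
    if mean is None:
        return False  # no history for this hour yet, so don't alert
    return abs(util - mean) > sigmas * max(stdev, 1.0)  # floor avoids noise on flat links

# Invented history: saturated at 02:00 (backups), quiet at 15:00.
history = [("Wed", 2, u) for u in (92, 95, 97)] + [("Wed", 15, u) for u in (20, 25, 22)]
base = build_baseline(history)
print(is_anomalous(base, "Wed", 15, 95))  # True: saturation mid-afternoon is unusual
print(is_anomalous(base, "Wed", 2, 95))   # False: normal overnight backup window
```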

“If you’ve got, for example, a link between your Sydney and Melbourne offices and your link between Australia and London is absolutely saturated, do you increase the bandwidth on the London link? Or, do you analyze the traffic more closely and realize that a significant proportion of the traffic that is supposed to be hopping from Sydney to Melbourne is, instead, going via London?” asks Diamond.

Having that information at a networking analyst’s fingertips highlights a routing issue that can be fixed, saving the cost of an unnecessary bandwidth upgrade.
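
In principle, that check is a simple question to ask of flow records, as the sketch below suggests; the data model, site names and exporter names are hypothetical rather than drawn from any real product.

```python
# Generic sketch of flow-path analysis: find Sydney-to-Melbourne traffic whose
# flow records were exported by routers outside the expected path.
def misrouted_flows(flows, expected_path):
    """flows: list of dicts with 'src', 'dst', 'exporter', 'bytes'.
    expected_path: the routers the src->dst traffic should transit."""
    suspect = [f for f in flows
               if (f["src"], f["dst"]) == ("sydney", "melbourne")
               and f["exporter"] not in expected_path]
    return sum(f["bytes"] for f in suspect), suspect

flows = [
    {"src": "sydney", "dst": "melbourne", "exporter": "syd-core-1", "bytes": 4_000_000},
    {"src": "sydney", "dst": "melbourne", "exporter": "lon-core-1", "bytes": 9_000_000},
]
detour_bytes, records = misrouted_flows(flows, {"syd-core-1", "mel-core-1"})
print(detour_bytes)  # 9000000 bytes of intercity traffic taking the London detour
```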

The data Entuity collects and stores provides insight right down to individual connected hosts, whether they are workstations, servers or anything else connected to the access layer of the network.

“An analyst can, straightaway, query where something is connected to. As things move around the network, Entuity stays on top of those addresses and learns where they are. So, if someone wants to know where something is connected, it can be found right away,” says Diamond.
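
A minimal sketch of that idea – keeping track of where each address was last seen as forwarding tables are re-read – might look like this. The class, switch names and MAC address are invented for illustration and are not Entuity code.

```python
# Generic sketch: record which access-layer switch port each MAC address was
# last seen on, updated as forwarding tables are periodically re-read.
class HostLocator:
    def __init__(self):
        self._last_seen = {}  # mac -> (switch, port)

    def update(self, switch, port, macs):
        """Call for each (switch, port) with the MACs currently learned there."""
        for mac in macs:
            self._last_seen[mac] = (switch, port)

    def locate(self, mac):
        return self._last_seen.get(mac)

locator = HostLocator()
locator.update("access-sw-3", "Gi1/0/12", {"aa:bb:cc:dd:ee:01"})
locator.update("access-sw-9", "Gi1/0/02", {"aa:bb:cc:dd:ee:01"})  # the host has moved
print(locator.locate("aa:bb:cc:dd:ee:01"))  # ('access-sw-9', 'Gi1/0/02')
```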

It has also stayed on top of the shift from ‘north-south’ to ‘east-west’ data center traffic: with the rise of virtualization, processing has moved from travelling up and down the layers of the hierarchy to being performed at the same level, spread across multiple processing nodes rather than just one.

The data Entuity collects can also be exported for use in other, specialist tools, such as security software, via a well-provisioned RESTful API that can provide access to that data on an as-needed basis.
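
Pulling that data into another tool typically amounts to an authenticated HTTP request. The snippet below is only a sketch of the pattern: the endpoint path, parameters and authentication scheme shown are hypothetical placeholders, not Entuity’s documented API.

```python
# Generic sketch of consuming monitoring data over a RESTful API.
# The URL, endpoint, fields and token are hypothetical examples.
import requests

def fetch_inventory(base_url, token):
    resp = requests.get(
        f"{base_url}/api/devices",                 # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
        params={"fields": "name,vendor,model"},    # hypothetical parameters
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# devices = fetch_inventory("https://monitoring.example.internal", token="...")
```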

Modern architecture

This kind of all-encompassing data center network management requires the right core architecture to be workable, partly for the purpose of simply collecting all the data required to genuinely monitor and manage the network, and partly to ensure that it can scale.

“The way we approach it is that every server that gets deployed in our multi-server architecture runs exactly the same software suite. That's important. You can imagine an architecture where you put a database over here on that server, you put ‘pollers’ over there on those servers, a user interface front end on that server over there, and maybe some NetFlow monitoring on the server over there.

“It becomes a Franken monster very quickly, just to try and maintain it; a real nightmare,” explains Diamond.

He continues: “So we've got exactly the same software on every single server, and they interlink. Then you access it from just one node and that gives you visibility to everything within the limit of your permissioning.

“An administrator will be able to see everything, but then you can have users with more limited access, who aren't necessarily permitted to see everything, but they can see what they're permitted to see across all of the servers. The fact that it is implemented on multiple servers becomes somewhat invisible and irrelevant. It is one large virtual monitoring system that just happens to be implemented across multiple VMs.”
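
The effect Diamond describes – identical nodes, one point of access, results trimmed to the user’s permissions – can be illustrated with a toy fan-out query. The node names, record structure and permission model below are assumptions for the example, not the product’s internals.

```python
# Generic sketch: a query entered on any node fans out across identical peer
# nodes, and results are filtered to what the user is permitted to see.
def query_all_nodes(nodes, user_scope, predicate):
    """nodes: mapping of node name -> list of managed-object records.
    user_scope: the set of groups this user may view."""
    results = []
    for node_name, objects in nodes.items():
        for obj in objects:
            if obj["group"] in user_scope and predicate(obj):
                results.append({**obj, "node": node_name})
    return results

nodes = {
    "poller-eu": [{"name": "fw-paris-1", "group": "network", "status": "down"}],
    "poller-us": [{"name": "db-nyc-2", "group": "database", "status": "down"}],
}
# A network-only analyst sees just the firewall, wherever it is managed from.
print(query_all_nodes(nodes, {"network"}, lambda o: o["status"] == "down"))
```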

Ultimately, too, Entuity’s data center management software dovetails perfectly with owner Park Place’s core business – third-party maintenance. The information Entuity conveys can help identify not only when a device, such as a server, is going wrong, but also which parts are failing, so that when a Park Place technician rocks up at the data center, he or she not only knows what’s wrong but has the replacement parts on hand to install straightaway.

“And we’re not nickel and diming our customers by licensing in terms of ports, either. We license by device because we want to encourage our customers to manage every port on their devices, not just to cherry pick the most important. We feel that it’s important to monitor everything,” says Diamond.

Of course, the real intelligence test for any data center network management software isn’t simply monitoring everything, essential though that is, but doing so without overloading the team of analysts whose role is to use it to keep everything running smoothly.

“We’ve had managed service provider clients who used to have to bring in an additional FTE [full-time equivalent] for every five or so typical-sized customers. Then they switched to Entuity and they can’t remember the last time they took on another FTE,” says Diamond.