
OpEx: Minimizing operational expenditure
Strategies for reducing operational expenditure whilst maximizing operational resiliency and efficiency
Power problems are no longer the top cause of outages
As many of us continue to work remotely, outages have become more frequent and more acute
Although the system is resilient, there have been some alarming staff shortages
The big dangers - and how to manage them
As Internet traffic has increased and databases have become more outdated, the importance of routing validation has increased
The US Government Accountability Office recently issued a report documenting 34 IT outages from 2015 through 2017 that affected 11 of the 12 domestic US airlines it examined
There is a range of claims about BGP visibility or monitoring out there, and the terms themselves are quite vague
Explaining Cloudflare's major outage
Three approaches to determine the impact of changing a script
When data is unavailable, lost or unprotected, there’s a huge price to pay
A simple error can dramatically alter the service delivery landscape in the Internet
What we saw when Google went down, and how you should think about resiliency and visibility as you move into the cloud
Linking data center failures to organizational dysfunction and a lack of diversity
MySpace doesn’t make headlines too often these days — and its most recent appearance in the news was for all the wrong reasons
An effective disaster recovery plan is all about the details
The world’s most successful companies all have something in common – agility
Don’t underestimate the power of using OCPD fuses in your data center
When a service goes down, the impact can vary from disastrous to so-what
The most effective disaster recovery plan is one that caters to the business’s specific needs
ARP creates a control plane where decisions about routing electrical power can be taken and enacted, based on a real-time assessment of the actual needs of the data center. Peter Judge talks to Ed Ansett about an invention due to be revealed at DCD>London
Bad weather sparked last week's Azure outage. In the long term, AI and analytics may save us from this sort of incident - but for now, you need to get ready for the next storm
Equipment, expertise, and management are so much more advanced than they were five or 10 years ago. How could failures possibly be more common?
The causes of data center downtime are diverse, but there is an important trend that puts the ultimate responsibility on generators
The biggest lesson from the meltdown at British Airways is this: don’t have a single point of failure