Ransomware is big business. Cybercriminals know that companies hate to see operations grind to a halt, and compromising one machine can quickly and easily bring down entire networks.

The likes of CyrusOne, iNSYNQ, A2 Hosting, SmarterASP.NET, and DataResolution.net were all hit with ransomware during 2019, and many of them saw their customers affected.

Last year Equinix became the largest data center victim of ransomware in recent memory. But unlike many of its peers, it managed to contain the attack quickly and keep its customers’ operations unaffected by the incident.

“The investments that we made as a company over the past several years paid dividends in our own ransomware attack because that lateral movement into our IBX facilities was not possible thanks to a lot of the things that the company has done over the past several years,” Equinix’s Chief Security Officer (CSO) Michael Montoya tells DCD.

The attack on Equinix begins

On September 9, 2020, Equinix posted a statement saying it was ‘investigating a security incident that involves ransomware on some of our internal systems.’

The company moved quickly to contain and remediate the incident. After a month of investigation, it found that although some of the information accessed included internal references to Equinix customers, the data contained no sensitive information on customer operations or other material customer information.

Montoya said that during previous stints at FireEye and Microsoft he had helped clients deal with thousands of security incidents over the years. But this was his first as CSO.

“It definitely feels very different when you're on this side of the table. There's no doubt about that,” he said.

Whereas CyrusOne’s 2019 ransomware incident reportedly affected six managed service customers served primarily from its New York data center, at no point were customer operations within Equinix facilities affected.

Equinix hasn’t released much information publicly, but according to a report by BleepingComputer a few days after the attack, the company was hit with the NetWalker strain of ransomware, and the attackers demanded $4.5 million in ransom. The publication reported that the affected systems held financial information, payroll, accounting, audits, and data center reports.

What is NetWalker ransomware?

According to CrowdStrike, the NetWalker ransomware variant was created by a Russian-speaking cybercrime group known as ‘Circus Spider’ in 2019. The malware is sold to criminals via an ‘as-a-Service’ model where buyers rent the capabilities from the creators for a fee or percentage of profits.

NetWalker encrypts files on the local system, maps network shares, and enumerates the network for additional shares, attempting to access them using the security tokens from all logged-in users on the victim’s system. Attackers often follow up the initial encryption of data with the threat to release information publicly.

The University of California, San Francisco (UCSF) was another victim of NetWalker and revealed that it paid roughly $1.14 million to recover its data. The Australian logistics giant Toll Group, Pakistani power utility K-Electric, and a number of healthcare organizations have also been hit by the ransomware.

Chainalysis said it had tracked more than $46 million worth of NetWalker ransom payments, across hundreds of victim organizations, since the strain first came on the scene in August 2019.

In January, the US Department of Justice, alongside law enforcement in Bulgaria, announced that Canadian national Sebastien Vachon-Desjardins of Gatineau had been arrested and that the portals NetWalker ransomware affiliates used to provide payment instructions and communicate with victims had been taken down. According to the Canadian press, Vachon-Desjardins has been denied bail while he awaits extradition to the US. He was described as a ‘key player’ in NetWalker’s operations and a likely flight risk if released.

Equinix falls victim

Montoya says the attacker was able to get into Equinix’s network through a ‘configuration management deviation’ in one of its cloud environments, which allowed a threat actor to get in via a Remote Desktop Protocol (RDP) session.


“That RDP session was misconfigured, and in a cloud environment that maybe didn't have the right level of management oversight. And, as a result, they were able to get into our environment and make some pivots. The biggest impact that we had is they were able to get access to some file servers that were in the process of being moved to the cloud.”

Montoya says the information the threat actors accessed was in the process of being moved to a file storage system and was not business- or mission-critical, but he also acknowledges they were able to gain access to “some of our IT management systems, like our software distribution systems and a few other systems.

“But once they moved from this cloud environment into our core environment, we were able to detect very quickly within a few hours. So our defense mechanisms worked with precision.”

Montoya said the company was able to recover all the encrypted data from its own backups, and at no point did users lose access to their email, the corporate environment, sales information, or customer information.

“The business operated with no interruption. The biggest impact to our user base is we did force a password change to our users, and that's just more of a hygiene-related activity. We were able to operate businesses as normal because the blast radius of this impact was pretty small and it was pretty focused on some file systems,” he explained.

Equinix’s ransom response

According to IBM, the average time to identify a breach in 2020 was 228 days, while the average time to contain a breach was 80 days. It’s not uncommon for firms to learn of incidents from third parties.

Though the attacker was present for almost three weeks, Montoya says Equinix was able to detect the incident itself within six hours of the threat actor weaponizing their attack.

“We were able to basically detect this within 19 days, which is still horrible because there's somebody who potentially had access for 19 days,” says Montoya. “But the material time where that attack was being weaponized was less than six hours before we detected it. So I think our response time from an industry perspective was pretty amazing.”

Equinix uses the OODA Loop (Observe, Orient, Decide, Act), a four-step process taken from the military and commonly used in cybersecurity. So rather than acting instantly, the data center firm’s security team first observed the attacker before responding.

“Within 15 minutes of that detection we had our full CERT (computer emergency response team) on the incident, observing what movements are they making, where they are trying to do some stuff. We did a lot of observation because we tried to figure out what type of attack is this, what are they doing and wanted to learn something about the threat actor here. And in that learning, we observed for some time and oriented ourselves in a way that we could make the right decisions on how we respond.”

“Within an hour, we started our response to that. Once we started our response after we did the right level of observation, we were able to, one, get a live body on the other side, [and] force them into moving faster on what they were trying to accomplish.”

Initial containment was achieved that day, within around eight hours of initial detection. And full containment was declared within a few days.

“We then pulled in our crisis plan, which included bringing in a third party Incident Response provider to help us do a formal investigation and response, pulled in obviously outside counsel, activated our full executive team and our board, so that we could properly respond to this from not necessarily the technical perspective but all the things that come with a breach right in terms of communication, etc.”

The company also created a private GitHub repository where it shared intelligence with its customers and the security community.

Montoya says he is a “big believer” in intelligence sharing, and that sharing it directly was a way to help partners while ‘intelligence engines caught up’.

Never let a good crisis go to waste

The relative lack of impact was a testament to previous investments and proof that the current security strategy was working, but as with any incident, there were lessons to be learned.

“The biggest issue for us honestly was just more urgency. We had the right strategy, we just needed to get more urgency around some key areas,” says Montoya. “Even though our segmentation was a big proactive thing for us, we're taking even additional controls and efforts around segmentation to even give us further degrees of protection and capabilities there.

“We've improved our backup strategy; although our backups were a big advantage because we were able to restore from backups, we recognized that we want to even have greater levels of resiliency in our backups and we took additional measures there as well.”

Retiring those file servers more quickly was a key lesson, and though Equinix was already on a roadmap to adopting zero trust, more speed was needed there as well: “We've accelerated all of our technology refresh plans around zero trust from a multi-year strategy to a fraction of that time as a result of this event because we realized that we had the right strategy, but we needed to move faster.”

In addition, the company decided to bring its engineering function and security operations under the same structure so the company can take more of a continuous engineering approach to security operations and perform continuous instrumentation on its technology stack.

“That's been of huge value for us to sort of in our approach to automation; creating this more continuous engineering loop between our security engineers, and our security operators.”

Overall, the attack on Equinix could have been far worse; that it wasn’t was largely due to the efforts and decisions made before the incident, not during or after it.

“The defenses we built over the years, not just months or days, were critical in helping us respond I think very effectively,” reflects Montoya. “We know our assets, and we've done a lot of segmentation and hardening to restrict lateral movement.”
