Microsoft-CrowdStrike Outage Highlights Feeble Nature of Tech, Pressing Need for Cyber Resilience and Diversity in IT

Martin Dale Bolima July 22, 2024


A single mistake.

That, apparently, is all it takes to disrupt the world.

On Friday, 19 July 2024, businesses around the world faced substantial service interruptions after Microsoft experienced a widespread outage that left users staring in shock at the notorious Blue Screen of Death (BSOD)—a critical error prompt that appears on Windows PCs and stops operations whenever the system encounters a serious issue.

Multiple reports indicate the severity of this issue, which appears to have disrupted entire industries globally—grounding flights of major airlines, impeding payment processing in large supermarkets and retail stores, shutting down billboards in New York, and hindering critical services of major banks in the Philippines.

“The outage that affected computer systems worldwide was severe. It affected critical systems, such as those in hospitals, airports, financial institutions, and more,” Satnam Narang, Senior Staff Research Engineer at Tenable, told Cybersecurity Asia (CSA).

Narang also clarified that the fault for this worldwide blip lay not with Microsoft per se but with security software installed on millions of Windows computers from Tonga to Turkey to Tennessee.

“Because this is a security software, it requires a higher level of privileges to the underlying operating system, so a bad or faulty security update can result in a catastrophic impact,” Narang added. “This event is unprecedented, and the ramifications of it are still developing.”

The security software in this case is from—drum roll, please!—CrowdStrike, the Texas-based cybersecurity firm that provides cloud workload protection, endpoint security, and more.

The Blue Screen of Irony: How a Security Tool Caused Cyber Attack-Like Chaos

So, what really happened?

Evidently, CrowdStrike’s Falcon Sensor, engineered specifically to protect companies from cyber attacks, received an update just prior to the outage. But this update, according to multiple sources, contained a faulty file that caused Windows machines to crash once they received it.

Ah, the irony! Software designed to prevent a cyber attack just caused a string of cyber attack-like incidents all around the globe—highlighting the world’s dependency on tech to deliver important services.

“These outages are increasing in volume due to the sheer increase in online users and traffic. After witnessing the Blue Screen of Death, many people were quick to suspect a cyber attack or find similarities to Netflix’s Leave the World Behind, but this can often add to the confusion. It highlights the importance of these services and the millions of people they serve,” noted Jake Moore, Global Security Advisor at ESET.

“The inconvenience caused by the loss of access to services for thousands of people serves as a reminder of our dependence on Big Tech such as Microsoft in running our daily lives and businesses. Upgrades and maintenance to systems and networks can unintentionally include small errors, which can have wide-reaching consequences as experienced today by CrowdStrike’s customers.”

That’s exactly what happened, and, worse still, rectifying the issue proved to be tricky. In many cases, IT professionals had to be called in to fix it. One of the known fixes entails users deleting the offending file and keying in a unique 48-digit BitLocker recovery key, a process that may be relatively simple for tech-savvy individuals but probably not for the everyday employee. Barring that, reboots could solve the problem, according to a Microsoft update, but with a catch: in most cases the machine had to be rebooted multiple times, with some users claiming to have done so up to 15 times.

All that, again, was caused by a single mistake.

Is Tech a House of Cards? One Error and Our World Collapses

One fault. That’s all it takes to disrupt this tech-dependent world. That’s all it takes to stop work for thousands of people. That’s all it takes to strand millions of passengers in airports. That’s all it takes to impede payment processing and financial transactions across six continents.

By now, we should’ve known better. And yet here we are, just a couple of days removed from an outage that affected millions of people worldwide. If anything, this needs to be both a stark reminder of the follies of tech and a wake-up call to prepare for this feebleness.

For all the advancements in modern technology, nothing is ever truly foolproof. In fact, even the most hi-tech systems are just one error away from a critical failure, something CrowdStrike and Microsoft users found out the hard way recently. Nor is this an isolated incident. A single error has caused similar chaos before, as when UniSuper nearly lost $125 billion worth of pension fund data because of human error, or when a “technical network fault” left millions of Optus subscribers in Australia without Internet for hours.


And yet it seems we never learn. It has to stop now.

“Businesses must test their infrastructure and have multiple fail-safes in place, however large the company is. This is typically referred to as a cyber resilience plan,” Moore pointed out. “But as often it is with the case, it is simply impossible to simulate the size and magnitude of the issue in a safe environment without testing the actual network.”

Moore also touched on the tendency to over-rely on the same infrastructures, platforms, and tech names, the digital equivalent of putting all your eggs in one basket. This, according to Moore, is counterproductive at best.

“Another aspect of this incident relates to diversity in the use of large-scale IT infrastructure. This applies to critical systems like operating systems, cybersecurity products and other globally deployed (scaled) applications,” he noted. “Where diversity is low, a single technical incident, not to mention a security issue, can lead to global-scale outages with subsequent knock-on effects.”

And yet, again, we keep having these major outages. It’s like we never learn.

This latest incident must be a wake-up call. Technology, at its core, is feeble—and we need to accept it. We ought to embrace it. We must prepare with that in mind.

Maybe then, one single mistake won’t bring the world to a grinding halt.