
“This is Not a Security Incident or Cyberattack”: Microsoft and CrowdStrike Scramble to Patch ‘Largest IT Outage in History’

At approximately 3 AM ET, reports started crossing the transom of a global IT outage impacting a broad range of industries, causing airlines, banks, media broadcasters, and shipping lines to shut down operations. Boston’s Logan Airport was shut down this morning, Washington D.C.’s Metrorail has been impacted, and planes were grounded at many airports around the world. This post is a quick and dirty tick-tock of the incident and the response from Microsoft and CrowdStrike. For CISOs in mitigation mode, we have compiled some technical links here as well.

Background

Some in the IT community have a recalcitrant, legacy attitude towards Microsoft as the “Evil Empire.” Here at OODA Loop, we consider Microsoft Security Threat Intelligence and Cyber Signals intelligence resources best-in-class, and a member of the research team has done specific research on Microsoft’s strategic acumen in positioning the company for leadership in AI (starting with research on Microsoft M&A activity in the AI space dating back to 2014). And while some still discount his perspective and insights, we even track Bill Gates as a thought leader (although his loss of credibility – based on his 1995 congressional testimony during the “browser wars” and one too many Winsock.dll IT troubleshoots – was completely warranted).

With that, the severity of this outage adds insult to injury, as the company has been under severe scrutiny and harsh criticism recently vis-à-vis its role in the 2020 SolarWinds attack, following high-impact reporting by ProPublica (released throughout June and July of this year).

As for CrowdStrike: CNBC is reporting that CrowdStrike shares tanked 15% in premarket trading after the major outage hit businesses worldwide:

  • Shares of cybersecurity company CrowdStrike plunged 15% after an update affecting one of its key products caused a major outage that led to ripple effects in IT systems around the world.
  • Microsoft, which also reported issues affecting its Azure cloud services and Microsoft 365 suite of apps, fell 2% in premarket trading.
  • Addressing the incident Friday, CrowdStrike CEO George Kurtz said the issues were caused by “a defect found in a single content update for Windows hosts.”

Timeline of Global Impacts (as of 9 AM ET, Friday, July 19, 2024)

Widespread Microsoft outage disrupts flights, banks, media outlets, and companies around the world

WELLINGTON, New Zealand (AP) — A global technology outage grounded flights, knocked banks and hospital systems offline, and media outlets off air on Friday in a massive disruption that affected companies and services around the world and highlighted dependence on software from a handful of providers…hours after the problem was first detected, the disarray continued — and escalated…long lines formed at airports in the U.S., Europe and Asia as airlines lost access to check-in and booking services at a time when many travelers are heading away on summer vacations. News outlets in Australia — where telecommunications were severely affected — were pushed off air for hours. Hospitals and doctor’s offices had problems with their appointment systems, while banks in South Africa and New Zealand reported outages to their payment system or websites and apps.

In the U.S., the FAA said the airlines United, American, Delta and Allegiant had all been grounded.

Airlines and railways in the U.K. were also affected, with longer than usual waiting times.

Some athletes and spectators descending on Paris ahead of the Olympics were delayed, but Games organizers said disruptions were limited and didn’t affect ticketing or the torch relay.

With athletes and spectators arriving from around the world for the Paris Olympics, the city’s airport authority said its computer systems were not affected by the outage, but that disruptions to airline operations were causing delays at two major Paris airports. The Paris Olympics organizers said the outage affected their computer systems and the arrival of some delegations and their uniforms and accreditations had been delayed.

But the impact was limited, the organizers said, and the outages had not affected ticketing or the torch relay.

Downdetector, which tracks user-reported disruptions to internet services, recorded that airlines, payment platforms, and online shopping websites across the world were affected — although the disruption appeared piecemeal and was apparently related to whether the companies used Microsoft cloud-based services.

Microsoft 365 posted on social media platform X that the company was “working on rerouting the impacted traffic to alternate systems to alleviate impact” and that they were “observing a positive trend in service availability.”

Major U.S. air carriers ground flights as mass IT outage hits Windows users

Businesses around the world reported experiencing issues with Microsoft Windows overnight Thursday into Friday, with users reporting “blue screen of death” (BSOD) errors. At least some resumed functioning shortly afterward.

Taipei Taoyuan International Airport, the largest airport in Taiwan, reported computer service disruptions that affected some airlines using a Microsoft cloud system, according to a Facebook post from the airport. Delta and United Airlines have suspended flight departures from the Taoyuan airport. Six budget airlines — AirAsia, Hong Kong Express, Jeju Air, Jetstar, Scoot, and Tigerair Taiwan — resorted to manual check-in for all flights. At least two major hospitals in Taipei experienced internet outages for up to an hour in the early afternoon, and services are now back to normal, local media reported.

In Germany, Axel Schmidt, a spokesman for Berlin-Brandenburg BER Airport, told The Post that flights resumed shortly after 10 a.m. CET (4 a.m. Eastern) after being briefly suspended. “We now have a backlog of flights to work through,” he said. “We’re trying to get everyone to their destinations as quickly as possible.” The outage hit shortly after most schools began their summer break.

Earlier Friday, Washington’s Metrorail said it was “affected by a known issue impacting computer systems across the globe.” Train information was not showing up on screens in some stations, and some WMATA websites appeared to be down. However, “all Metrorail stations opened on time & service is running as scheduled,” Metrorail said in a post on X. It also wrote that the Metro Transit Police Department “can still be reached at (202) 962-2121 or by texting MYMTPD (696873).”

Mass Microsoft outage: Flights grounded, systems offline globally

Passengers queued at Gatwick Airport amid a global IT outage on Friday in Crawley, United Kingdom.  (Image Source:  Boston Globe)

Travelers lined up at the check-in counters of the Hong Kong International Airport on Friday.  (Image Source:  Boston Globe)

Federal agencies affected by worldwide IT outage

A flaw in CrowdStrike software has impacted Microsoft products, with malfunctions resulting in problems for government services.

  • President Joe Biden has been briefed on the matter and his team is in touch with CrowdStrike and “impacted entities,” a White House official told a pool reporter.
  • The Social Security Administration, meanwhile, has closed all offices Friday because of a “global IT outage,” according to the agency’s website. SSA said that individuals should expect longer call wait times for its national 800 number and that “some online services are unavailable.”
  • The Justice Department was also affected and has alerted users, according to an emailed statement from a spokesperson.
  • “The DOJ Office of the Chief Information Officer (OCIO) is actively troubleshooting possible workarounds with Component CIOs and technical teams while CrowdStrike, the vendor, is attempting to resolve the problem,” the spokesperson said.
  • An agency manager within the Department of Homeland Security reported to FedScoop that some of their staff had trouble logging into desktop computers and had to spend the morning working on phones or through virtual desktop or web pages applications.
  • The Enterprise Service Desk at the Department of Veterans Affairs is also down, according to a person familiar with the matter, though it’s not yet clear if it’s related to the CrowdStrike flaw.
  • The Federal Aviation Administration said it was “closely monitoring a technical issue impacting IT systems at U.S. airlines” but did not say whether the issue had impacted government-operated systems. The agency later added that current “FAA operations are not impacted by the global IT issue” but that it is monitoring the situation.
  • The Energy Department’s website also appears to be offline. Several attempts to visit the energy.gov domain resulted in a 503 error noting that a backend fetch failed. The Energy Department did not respond to a request for comment about whether the CrowdStrike issue is related to the outage.
  • A Nuclear Regulatory Commission spokesperson told FedScoop in an email Friday that the agency is operating normally. Additionally, the spokesperson said that U.S. commercial nuclear facilities reported that they are operating safely.
  • In 2021, CISA brought on CrowdStrike to provide endpoint detection and response technology to enhance its Continuous Diagnostics and Mitigation (CDM) program and protect federal civilian agencies’ networks.

Global IT outage affects ports in US, Europe and Africa

Ports confirmed to be affected, according to Everstream Analytics

  • Port of Gothenburg, Sweden: Operational disruptions confirmed, but operations reported resumed as of 11:20 CET
  • Port of Felixstowe, UK: Operational disruptions confirmed
  • Port of Dover, UK: Operators expect longer wait times due to technical issues
  • Port of Gdansk, Poland: Vessels have been asked to refrain from sending containers to the port
  • Port of Genoa, Italy: Trucks are currently unable to access the port gate
  • Port of Aarhus, Denmark: Container gate-in and gate-out operations temporarily suspended, but operations reported resumed as of 11:15 CET
  • Port of Valencia, Spain: Unable to open gate doors at APM terminals
  • Port of Los Angeles, US: APM terminals Pier 400 cancelled parts of the second shift on 18th July, resumption expected on 19th July
  • Port of Mobile, US: Operators warn of operational delays as terminals may need longer to start on 19th July
  • Port Elizabeth, South Africa: APM terminals have delayed opening on 19th July

For a very regional perspective from the EU and UK, see:

What Next?  Remediation, Service Status, and Technical Resources for CISOs

Microsoft Service Health Status: https://status.cloud.microsoft/

Azure Status:  https://azure.status.microsoft/en-us/status

CrowdStrike: Statement on Falcon Content Update for Windows Hosts

CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified and isolated, and a fix has been deployed. We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.

Update 9:22am ET, July 19, 2024:

We are working hard to provide comprehensive and continuous updates with our global customers as quickly as possible. Below is the latest CrowdStrike Tech Alert with more information about the issue and workaround steps organizations can take. We will keep this page updated with new information as it’s available.

Summary

  • CrowdStrike is aware of reports of crashes on Windows hosts related to the Falcon Sensor.

Details

  • Symptoms include hosts experiencing a bugcheck/blue screen error related to the Falcon Sensor.
  • Windows hosts which have not been impacted do not require any action as the problematic channel file has been reverted.
  • Windows hosts which are brought online after 0527 UTC will also not be impacted
  • Hosts running Windows 7/2008 R2 are not impacted
  • This issue is not impacting Mac- or Linux-based hosts
  • Channel file “C-00000291*.sys” with timestamp of 0527 UTC or later is the reverted (good) version.
  • Channel file “C-00000291*.sys” with timestamp of 0409 UTC is the problematic version.
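For teams triaging hosts by hand, the timestamp rule in the Tech Alert above can be sketched in a few lines of Python. This is an illustration only, not a CrowdStrike tool. It assumes the 0409 UTC and 0527 UTC times refer to July 19, 2024 (the alert as quoted gives only the times), the file name in the usage note is hypothetical, and the labels returned by the hypothetical `classify_channel_file` helper are our own.

```python
from datetime import date, datetime, time, timezone
from fnmatch import fnmatch

# File pattern and times are taken from the Tech Alert quoted above.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"
PROBLEMATIC_TIME = time(4, 9)   # 0409 UTC: problematic version
REVERTED_TIME = time(5, 27)     # 0527 UTC or later: reverted (good) version

# Assumption: the alert states times only; we take the date of the incident.
INCIDENT_DATE = date(2024, 7, 19)


def classify_channel_file(name: str, modified: datetime) -> str:
    """Label a file by name and timezone-aware modification time."""
    if modified.tzinfo is None:
        raise ValueError("modified must be timezone-aware")
    if not fnmatch(name, CHANNEL_FILE_PATTERN):
        return "not-applicable"

    ts = modified.astimezone(timezone.utc)
    reverted_from = datetime.combine(INCIDENT_DATE, REVERTED_TIME, tzinfo=timezone.utc)

    if ts >= reverted_from:
        return "reverted"
    if ts.date() == INCIDENT_DATE and (ts.hour, ts.minute) == (
        PROBLEMATIC_TIME.hour,
        PROBLEMATIC_TIME.minute,
    ):
        return "problematic"
    # Any other timestamp is not covered by the alert, so we do not guess.
    return "unknown"
```

For example, `classify_channel_file("C-00000291-example.sys", datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc))` returns `"problematic"`. A file's on-disk modification time can be converted with `datetime.fromtimestamp(mtime, tz=timezone.utc)` before being passed in. Timestamps that the alert does not mention are deliberately reported as `"unknown"`.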

Current Action

  • CrowdStrike Engineering has identified a content deployment related to this issue and reverted those changes.
  • If hosts are still crashing and unable to stay online to receive the Channel File Changes, the following steps can be used to work around this issue: go to this link.

SC Magazine, Bleeping Computer, and Information Week provide reports that also have technical explanations and resources intertwined with general coverage.

Additional OODA Loop Resources:  The Uptick in Global IT Supply Chain Breaches (Frequency and Specific Targeting)

This all comes fast on the heels of the eerily prescient discussion in the June 2024 OODA Network Monthly Meeting on The Uptick in Global IT Supply Chain Breaches (Frequency and Specific Targeting):

At the June 2024 OODA Network Member Meeting – held on Friday, June 21, 2024 – the network discussed The Uptick in Global IT Supply Chain Breaches (Frequency and Specific Targeting), amongst other topics.

The central discussion at the June 2024 OODA Network Monthly Meeting revolved around the increasing frequency and specific targeting of supply chain breaches, with concerns raised about the rising risk associated with these attacks. Participants highlighted the supply chain as a major target for cyberattacks and emphasized the importance of addressing vulnerabilities in the supply chain to mitigate risks. The discussion also touched on the significance of supply chain attacks as a means to exploit systems beyond just ransomware, referencing previous notable incidents like Log4J and SolarWinds. The meeting emphasized the significance of supply chain security, with one participant noting that supply chains are among the most targeted in the world, underscoring the evolving threat landscape and the need for robust defenses to combat the growing menace of supply chain attacks.

Topics and themes discussed by the OODA Network which apply to this global IT outage include:

  • A CISO Playbook:  For the CISOs on the call or in the larger OODA Network, an emphasis was put on the need for transparency or “covering oneself” to avoid a perception or narrative that the entire onus of a supply chain breach falls internally on a CISO (and not a provider or vendor or XaaS platform).
  • The significance of vendor reporting and the evolving landscape of community defense in dealing with supply chain vulnerabilities.
  • “What does outsourcing really mean?  What does offshoring really mean?  What does a vendor agreement mean?  Are the survey-based approaches that we’ve used in the past for doing due diligence and counterparty risk still valid?”
  • The lack of effective third-party vendor and software security measures and tools for monitoring networks:  A speaker on the call pointed out the strategic nature of such attacks, indicating that they serve as a pathway to achieving larger objectives beyond mere ransomware incidents.
  • Why doesn’t the market react to this increased IT supply chain activity?
    • One member highlighted that such cyber incidents are not perceived as material, thus not impacting stock prices significantly.
    • He also cautioned against taking advice from politicians on private sector matters, emphasizing the market’s autonomy – with insights into the market’s behavior and the significance of cybersecurity in the evolving digital landscape.
    • He highlighted the challenge of determining the threshold for an event to influence the market.
  • The discussion ended with a network member reinforcing the importance of documentation and raising concerns about the lack of focus on remediation and basic controls in cybersecurity incidents – emphasizing the need for companies to remember and address these issues to differentiate between survival and failure in such incidents.

https://oodaloop.com/archive/2024/07/18/the-june-2024-ooda-network-monthly-meeting-the-uptick-in-global-it-supply-chain-breaches-frequency-and-specific-targeting/

Daniel Pereira

About the Author

Daniel Pereira is research director at OODA. He is a foresight strategist, creative technologist, and an information communication technology (ICT) and digital media researcher with 20+ years of experience directing public/private partnerships and strategic innovation initiatives.