
2024 CrowdStrike-related IT outages
Global computer systems outage
On 19 July 2024, the American cybersecurity company CrowdStrike distributed a faulty update to its Falcon Sensor security software that caused widespread problems with Microsoft Windows computers running the software. As a result, roughly 8.5 million systems crashed and were unable to properly restart in what has been called the largest outage in the history of information technology and "historic in scale".
The outage disrupted daily life, businesses, and governments around the world. Many industries were affected—airlines, airports, banks, hotels, hospitals, manufacturing, stock markets, broadcasting, gas stations, retail stores, and governmental services, such as emergency services and websites. The worldwide financial damage has been estimated to be at least US$10 billion.
Within hours, the error was discovered and a fix was released, but because many affected computers had to be fixed manually, outages lingered for many services.
Background
CrowdStrike produces a suite of security software products for businesses, designed to protect computers from cyberattacks. Falcon, CrowdStrike's endpoint detection and response agent, works at the operating system kernel level on individual computers to detect and prevent threats. Patches are routinely distributed by CrowdStrike to its clients to enable their computers to address new threats.
CrowdStrike's own post-incident investigation identified several errors that led to the release of a faulty update to the "Crowdstrike Sensor Detection Engine".
Outage
On 19 July 2024 at 04:09 UTC, CrowdStrike distributed a faulty configuration update for its Falcon sensor software running on Windows PCs and servers. A modification to a configuration file which was responsible for screening named pipes, Channel File 291, caused an out-of-bounds memory read in the Windows sensor client that resulted in an invalid page fault. The update caused machines to either enter into a bootloop or boot into recovery mode.
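As a rough illustration of the general class of defect named above, an out-of-bounds read, the following Python sketch shows a guarded field lookup. It is not CrowdStrike's code: the function and parameter names are hypothetical, and the sensor itself is native kernel-mode software. Python raises an exception on an out-of-range index, whereas native code without an equivalent check can read an unmapped address and trigger a page fault.

```python
def read_input_field(supplied_values, field_index):
    """Return the value at field_index, refusing indexes outside the data.

    Hypothetical helper for illustration only; it does not model the
    actual Falcon sensor or the format of Channel File 291.
    """
    # Bounds check: the index must fall inside the supplied values.
    if not 0 <= field_index < len(supplied_values):
        raise ValueError(
            f"field index {field_index} is outside the "
            f"{len(supplied_values)} supplied values"
        )
    return supplied_values[field_index]
```

Rejecting the index before the read turns a potential crash into an error that the caller can handle.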
Almost immediately, Windows virtual machines on the Microsoft Azure cloud platform began rebooting and crashing, and at 06:48 UTC, Google Compute Engine also reported the problem. The problem affected systems running Windows 10 and Windows 11 with the CrowdStrike Falcon software installed. Most personal Windows PCs were unaffected, since CrowdStrike's software was primarily used by organisations. The CrowdStrike software did not provide a way for subscribers to delay the installation of its content files. Computers running macOS and Linux were unaffected, as the problematic content file was only for Windows, but similar problems had affected Linux distributions of CrowdStrike software in April 2024.
CrowdStrike reverted the content update at 05:27 UTC, and devices that booted after the revert were not affected.
At 07:15 UTC, Google stated that the CrowdStrike update was at fault. Within hours, CrowdStrike CEO George Kurtz confirmed that CrowdStrike's faulty kernel configuration file update had caused the problem. At 09:45 UTC, Kurtz confirmed that the fix was deployed and that the problem was not the result of a cyberattack.
The impact to companies in the Central United States was exacerbated by an unrelated outage with Microsoft Azure the previous day. On 18 July, the Azure platform had an outage that blocked some companies' access to their storage and to Microsoft 365 applications in Azure's Central United States region.
Remedy
Affected machines could be restored by rebooting while connected to the network, ideally over Ethernet, which gave them the opportunity to download the reverted channel file; multiple reboots were reportedly required.
If crashes persisted, remediation required booting into safe mode or the Windows Recovery Environment and deleting any .sys file whose name begins with C-00000291- and whose timestamp is 04:09 UTC in the %windir%\System32\drivers\CrowdStrike\ directory. As this process had to be carried out locally, with technical staff rebooting and manually repairing each affected computer, it was "expected to take days" for affected businesses to restore all systems.
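The file-selection rule described above can be sketched in Python as follows. This is a minimal illustration, not an official remediation tool. The function names are hypothetical, and treating the file's modification time as the "timestamp" is an assumption. The sketch defaults to a dry run that only reports matches.

```python
from datetime import datetime, timezone
from pathlib import Path

CHANNEL_FILE_GLOB = "C-00000291-*.sys"  # name pattern given in the text
FAULTY_HOUR_MINUTE_UTC = (4, 9)         # 04:09 UTC, as given in the text


def modified_at_utc(path):
    """Return the file's modification time as an aware UTC datetime."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def find_faulty_channel_files(directory):
    """List files matching the name pattern whose mtime is 04:09 UTC.

    Assumption: the 'timestamp' in the guidance is the modification time.
    """
    matches = []
    for path in sorted(Path(directory).glob(CHANNEL_FILE_GLOB)):
        if not path.is_file():
            continue
        stamp = modified_at_utc(path)
        if (stamp.hour, stamp.minute) == FAULTY_HOUR_MINUTE_UTC:
            matches.append(path)
    return matches


def remove_faulty_channel_files(directory, dry_run=True):
    """Return the matching files; delete them only when dry_run is False."""
    matches = find_faulty_channel_files(directory)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

On an affected machine the directory argument would be the CrowdStrike drivers directory named above, reached from safe mode or the Windows Recovery Environment. Filtering on the 04:09 UTC timestamp is meant to leave a later, reverted channel file in place.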
On devices with Windows' BitLocker disk encryption enabled, which corporations often use to increase security, the problem was exacerbated because the 48-digit numeric BitLocker recovery key (unique to each system) had to be entered manually, and supplying recovery keys to end users working remotely posed additional challenges. Additionally, several organisations that stored BitLocker recovery keys on local servers could not retrieve them because those servers had themselves crashed.
Microsoft also recommended restoring a backup from before 18 July to fix the issue.
Impact
Outages were experienced worldwide, reflecting the wide use of Microsoft Windows and CrowdStrike software by global corporations in numerous business sectors. At the time of the incident, CrowdStrike said it had more than 24,000 customers, including nearly 60% of Fortune 500 companies and more than half of the Fortune 1000. On 20 July, Microsoft estimated that 8.5 million devices were affected by the update, which it said was less than one percent of all Windows devices.
Widespread outages were immediately reported across multiple countries, with major global disturbances experienced by the general public. At 04:09 UTC on 19 July, the time when the faulty update was issued, it was the middle of the business day in Oceania and Asia, the early morning hours in Europe, and midnight in much of the Americas.
Some countries were less affected. China, which has striven toward self-sufficiency in IT, saw no impact on its daily services including airlines and banks, although some foreign branch companies and luxury hotels in the country were affected. Russia and Iran—both restricted by international sanctions from using the services of American high-tech companies—reported no disruptions.
Cyber risk quantification company Kovrr calculated that the total cost to the UK economy will likely fall between £1.7 and £2.3 billion ($2.18 and $2.96 billion).
Specialist cloud outage insurance firm Parametrix estimated that the top 500 US companies by revenue, excluding Microsoft, faced nearly $5.4bn (£4.1bn) in financial losses because of the outage, but that only between $540m (£418m) and $1.08bn (£840m) of those losses would be insured.
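The insured share implied by these figures can be worked out directly. The short Python sketch below uses only the US dollar amounts quoted above; the function and variable names are illustrative.

```python
TOTAL_LOSSES_USD = 5_400_000_000   # "near $5.4bn"
INSURED_LOW_USD = 540_000_000      # "$540m"
INSURED_HIGH_USD = 1_080_000_000   # "$1.08bn"


def insured_share_range(total, insured_low, insured_high):
    """Return the insured fraction of total losses as a (low, high) pair."""
    return insured_low / total, insured_high / total


low, high = insured_share_range(
    TOTAL_LOSSES_USD, INSURED_LOW_USD, INSURED_HIGH_USD
)
print(f"Insured share: {low:.0%} to {high:.0%}")  # prints "Insured share: 10% to 20%"
```

On these estimates, roughly 80% to 90% of the losses would fall on the companies themselves.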
CrowdStrike liability
CrowdStrike's own terms and conditions for their Falcon software limit liability to "fees paid", effectively a refund. Larger customers may have negotiated different terms.
In the EU, it is possible that CrowdStrike will be held liable under a GDPR regulation related to the impact of security incidents on user data. The regulation is best known in relation to data leaks but also applies to data destruction. It is unclear whether temporary loss of access to data is enough to trigger liability, or whether GDPR applies to all incidents related to security or only unauthorised access.
Further, the incident could be classed as a "personal data breach" as defined in Article 4, "Definitions", paragraph 12 of the GDPR. On 19 July 2024, a data-protection expert reported a breach of Article 32, "Security of processing".
Air transport
Globally, 5,078 flights, 4.6% of those scheduled that day, were cancelled. An unrelated Microsoft Azure outage, affecting services such as Microsoft 365, compounded airlines' problems.
Oceania
Australian airlines Qantas, Virgin Australia, and Jetstar were affected. A Sydney Airport spokesperson said that the outage had affected some operations and that "there may be some delays throughout the evening". Melbourne Airport saw check-in procedures disrupted; officials advised passengers to consult with their airlines. The Adelaide, Brisbane, Canberra, Darwin, Hobart, Launceston, and Perth airports were also affected. In New Zealand, Christchurch Airport also had problems.
Asia
Hong Kong International Airport experienced delays during check-in, primarily for passengers of the local budget carrier Hong Kong Express, whose staff members used handwritten signs to direct passengers to check-in counters. The Hong Kong Airport Authority activated an emergency response after airline websites and automatic check-in malfunctioned. The booking systems of local airlines Cathay Pacific, Hong Kong Express, and Hong Kong Airlines were unavailable. Hong Kong Express cancelled some flights on 20 July. Jeju Air and Spring Japan experienced problems. Jetstar Japan cancelled many (mostly domestic) flights. Some of the self-check-in kiosks at Singapore Changi Airport were affected, causing delays and forcing airlines to switch to manual check-in, and Singapore Airlines and Scoot reported service difficulties on 19 July. Cebu Pacific and Philippines AirAsia flights were delayed, and long queues formed at Ninoy Aquino International Airport. In Taiwan, airline system disruptions were reported at Taoyuan International Airport. In Indonesia, disruptions were reported for the check-in systems of AirAsia and Citilink. In Thailand, Thai AirAsia's reservation and check-in systems were affected.
Content sourced from Wikipedia under CC BY-SA 4.0