Global IT Outage: The Full Story

Published Friday, July 19, 2024

By E247

A global IT outage affecting Microsoft systems caused significant disruptions worldwide early on Friday. The incident led to flight cancellations, bank outages, and media broadcast interruptions, impacting businesses and services across the globe.

Initially, the exact cause, nature, and scope of the outage were unclear. Microsoft's posts on X indicated an improving situation, yet service interruptions continued globally.

Several hours later, George Kurtz, CEO of cybersecurity firm CrowdStrike, said the issue had been "identified." He disclosed that a defect had been found in a content update for Windows hosts, stressing that it was not a security incident or a cyberattack.

What Happened?

The global technological outage halted flights, interrupted banking services, and stopped media broadcasts on Friday, highlighting the heavy reliance on software from a few service providers.

The issue affected Microsoft 365 applications and services, with disruptions persisting even after the tech company announced gradual repairs.

Downdetector, a site tracking user-reported internet outages, noted increased service disruptions at companies such as Visa, ADT Security, and Amazon, and at airlines including Delta and American Airlines.

Australian media reported that airlines, telecom providers, banks, and media outlets were affected due to loss of access to computer systems. Airlines in the UK, Europe, and India also reported issues, with some New Zealand banks going offline.

Microsoft 365 said on X that the company was redirecting affected traffic to alternative systems to mitigate the impact more quickly, and noted a "positive trend in service availability." The company did not respond to requests for comment and did not clarify the cause of the outage.

New Zealand's Acting Prime Minister, David Seymour, stated on X that officials were "moving swiftly to understand the potential impacts" of the global issue, noting that there were no indications of malicious cyber activity and that the problem was causing "inconvenience" to the public and businesses.

In the US, the Federal Aviation Administration reported that United, American, Delta, and Allegiant airlines were all affected. Passengers at Los Angeles International Airport slept on a jetway floor, using backpacks and other luggage as pillows, because of a delayed United flight to Dulles International Airport early Friday.

Airlines, railways, and television stations in the UK were disrupted by computer issues, affecting low-cost airline Ryanair, train operators TransPennine Express and Govia Thameslink Railway, and Sky News. Ryanair advised passengers to arrive at the airport at least three hours before their scheduled departure due to a network-wide outage caused by a third-party global IT issue.

Edinburgh Airport reported longer-than-usual wait times due to the outage. Stansted Airport in London stated that some airline check-in services were being completed manually, but flights continued to operate.

In Australia, widespread issues were reported, with long lines and some passengers stranded due to disrupted online check-in and self-service kiosk functions. Passengers in Melbourne waited over an hour to check in, despite flights operating.

India's airline operations were also disrupted, inconveniencing thousands of travelers. Private airline IndiGo informed passengers on X that the Microsoft outage had affected airline operations in India. Many airlines issued statements on X saying they were using manual check-in and boarding procedures and warned of delays due to technical issues.

Hong Kong Airport Authority reported that some airlines were affected at the city's airport, leading to manual check-in. Amsterdam's Schiphol Airport noted that the outage had a "significant impact on flights" to and from the busy European hub, occurring at the start of the summer holiday season for many people.

In Germany, Berlin Airport announced delays in check-in due to a technical fault, with flights suspended until 10 a.m. local time. Rome's Leonardo da Vinci Airport experienced delays for some US-bound flights, while others were unaffected.

Australia was particularly hard-hit, with reported outages affecting banks NAB, Commonwealth, and Bendigo, airlines Qantas and Virgin Australia, and internet and phone providers such as Telstra.

Hospitals in the UK and Germany also reported issues. Several NHS trusts in England said the outage affected their clinical computer systems containing medical records, hindering appointment scheduling and information access. In northern Germany, Schleswig-Holstein University Hospital, with branches in Kiel and Lübeck, canceled all elective surgeries scheduled for Friday, though patient care and emergencies were unaffected.

Australian media outlets, including ABC and Sky News, were unable to broadcast on their TV and radio channels, reporting sudden shutdowns of their Windows-based computers. Some broadcasters streamed live from darkened offices, with computers displaying the "blue screen of death."

In South Africa, at least one major bank reported "nationwide service outages," with customers unable to make payments using bank cards at grocery stores and gas stations. New Zealand banks ASB and Kiwibank announced service disruptions.

A user on X posted a screenshot of an alert from CrowdStrike indicating the company was aware of "outage reports on Windows servers related to its Falcon Sensor platform."

Identifying the Problem

CrowdStrike's CEO announced that the issue causing the IT outage, which crippled many companies worldwide on Friday, had been "identified" and was being "corrected."

George Kurtz posted on X and LinkedIn that CrowdStrike was actively working with customers affected by a defect found in a content update for Windows hosts, confirming it was not a security incident or cyberattack. He added, "The issue has been identified, isolated, and a fix has been deployed."

About CrowdStrike

CrowdStrike is a US-based cybersecurity technology company headquartered in Austin, Texas, with an estimated value of £65 billion. A technical issue related to the company caused the global disruption of Microsoft systems.

CrowdStrike helps companies manage security in their IT environments, protecting against data breaches, ransomware, and cyberattacks. Its major clients include global investment banks, universities, and the Australian betting agency Tabcorp.

The cybersecurity landscape has rapidly evolved recently, with an increasing presence of threat actors targeting major companies like Ticketmaster, Medibank, and Optus. As a result, more businesses are turning to firms like CrowdStrike to protect their customer information.

What is CrowdStrike Used For?

One of the company's key products is CrowdStrike Falcon, described on its website as providing "real-time attack indicators, high-precision detection, and automated protection" from potential cybersecurity threats.

Thousands of companies worldwide use CrowdStrike Falcon to safeguard their data, and a fault in the product is believed to have caused Friday's global disruption of Microsoft systems. Earlier this week, CrowdStrike announced an update to its Falcon product, promising "unprecedented speed and precision" in detecting security breaches.

A CrowdStrike spokesperson said in a statement posted on its website following the outage that a potential issue with the Falcon product likely caused the incident.

Who Owns CrowdStrike?

CrowdStrike was co-founded in 2011 by former McAfee employee George Kurtz. Its ownership structure comprises a mix of individual, institutional, and retail investors. Institutional investors hold about 40% of the company's shares, while public companies and individual investors own roughly 57%.

The largest shareholder is Vanguard Group, an American investment fund, with approximately a 6.79% stake in the company.

How to Prevent Similar Outages

To prevent similar global IT outages in the future, several proactive steps can be implemented. These steps focus on improving system resilience, enhancing security measures, and ensuring rapid response capabilities. Here are some key recommendations:

1. Enhanced Monitoring and Incident Response

24/7 Monitoring: Implement continuous monitoring of systems and networks to detect issues early.
Incident Response Plans: Develop and regularly update comprehensive incident response plans, including clear roles, communication strategies, and procedures for different types of incidents.
Automated Alerts: Use automated systems to alert IT teams of anomalies or potential issues immediately.
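
As a rough illustration of automated alerting, the sketch below flags a metric reading that jumps far above its recent baseline. The function name, the window size, the threshold, and the sample numbers are all assumptions made for this example; they do not describe any particular monitoring product.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples far above the average of the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        # Use at least 1.0 as the spread so a flat baseline does not
        # trigger alerts on tiny fluctuations.
        sigma = max(stdev(baseline), 1.0)
        if samples[i] > mu + threshold * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical per-minute counts of hosts that stopped reporting in.
offline_hosts = [2, 3, 2, 4, 3, 2, 3, 250, 900]

for i in find_anomalies(offline_hosts):
    print(f"ALERT: minute {i}: {offline_hosts[i]} hosts offline")
```

In practice the alert would page an on-call engineer rather than print, but the idea is the same: compare each new reading with recent history and raise the alarm on a sharp jump.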

2. Regular Updates and Patch Management

Timely Updates: Ensure all systems and software are regularly updated with the latest patches and updates.
Automated Patch Management: Use automated patch management tools to deploy patches quickly across all systems.
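
Automated patching does not have to mean updating every machine at the same moment. The sketch below assumes a staged rollout, a common practice that is not described in the recommendations above: patch a small share of machines first, check their health, and only then widen the rollout. The stage percentages and the `apply_patch` and `is_healthy` callbacks are hypothetical placeholders.

```python
def staged_rollout(hosts, apply_patch, is_healthy, stages=(1, 10, 100)):
    """Patch hosts in growing waves and halt if any patched host is unhealthy.

    `stages` lists the cumulative percentage of hosts to cover in each wave.
    """
    patched = []
    for percent in stages:
        # Cumulative number of hosts that should be patched after this wave.
        target = max(1, len(hosts) * percent // 100)
        for host in hosts[len(patched):target]:
            apply_patch(host)
            patched.append(host)
        if not all(is_healthy(host) for host in patched):
            return {"status": "halted", "patched": patched}
    return {"status": "complete", "patched": patched}


# Hypothetical fleet and a patch that breaks every host it touches.
fleet = [f"host-{n}" for n in range(200)]
result = staged_rollout(fleet, apply_patch=lambda h: None, is_healthy=lambda h: False)
print(result["status"], len(result["patched"]))  # halted 2
```

With this approach a faulty update stops after the first wave, here 2 of 200 machines, instead of reaching the whole fleet.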

3. Redundancy and Failover Mechanisms

Redundant Systems: Implement redundant systems and components to ensure continuity in case of failure.
Failover Mechanisms: Establish failover mechanisms that automatically switch to backup systems when primary systems fail.
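
A failover mechanism can be sketched in a few lines: keep an ordered list of systems and use the first one that passes a health check. The endpoint names and the health table below are invented for illustration, and a real health check would probe the system rather than read a dictionary.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first healthy endpoint, checking in priority order."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


# Hypothetical endpoints and health state, for illustration only.
endpoints = ["primary.example.internal", "backup.example.internal"]
health = {"primary.example.internal": False, "backup.example.internal": True}

active = pick_endpoint(endpoints, lambda name: health[name])
print(active)  # backup.example.internal
```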

4. Comprehensive Backup Strategies

Regular Backups: Conduct regular backups of critical data and systems.
Offsite Storage: Store backups in multiple locations, including offsite and cloud-based solutions, to protect against localized disasters.

5. Security Enhancements

Vulnerability Management: Continuously scan for and remediate vulnerabilities in systems and software.
Security Training: Provide regular training for employees on cybersecurity best practices and emerging threats.
Multi-Factor Authentication (MFA): Implement MFA to add an extra layer of security to critical systems.

6. Testing and Simulation

Regular Testing: Conduct regular testing of backup systems, failover mechanisms, and incident response plans to ensure they work as intended.
Disaster Recovery Drills: Perform disaster recovery drills and simulations to prepare for real-world scenarios and improve response times.

7. Collaboration and Information Sharing

Industry Collaboration: Collaborate with other organizations, industry groups, and government agencies to share information about threats and best practices.
Threat Intelligence: Use threat intelligence services to stay informed about new and emerging threats.

8. Third-Party Vendor Management

Vendor Assessment: Regularly assess the security practices and resilience of third-party vendors.
Service Level Agreements (SLAs): Ensure SLAs with vendors include provisions for uptime, security, and incident response.
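
When reviewing an SLA, it helps to convert an uptime percentage into the downtime it actually permits. The short calculation below does that for a 30-day month; the function name and the example percentages are illustrative and are not taken from any vendor's contract.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


for uptime in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime allows {minutes:.1f} minutes of downtime per 30 days")
# 99.0% uptime allows 432.0 minutes of downtime per 30 days
# 99.9% uptime allows 43.2 minutes of downtime per 30 days
# 99.99% uptime allows 4.3 minutes of downtime per 30 days
```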

9. Regulatory Compliance

Compliance Audits: Regularly audit systems and practices to ensure compliance with relevant regulations and standards.
Adherence to Best Practices: Follow industry best practices and guidelines for cybersecurity and IT resilience.

10. Public Communication Strategy

Transparent Communication: Develop a communication strategy for keeping stakeholders informed during outages, including status updates and expected resolution times.
Customer Support: Provide robust customer support to address concerns and provide assistance during disruptions.

By implementing these steps, organizations can enhance their resilience against IT outages, minimize disruption, and ensure quicker recovery in case of incidents.

The page was last updated on: 19 July 2024 17:17