Resiliency Is Key to Surviving a CDN Outage
Akamai Incident Highlights Risks of Relying on a Single Provider
A short-lived outage at the content delivery network supplier Akamai on Thursday, which briefly knocked offline many corporate websites, is another indicator that companies need resiliency built into their systems. That means they should avoid relying on just one CDN provider, security experts say.
The websites briefly knocked offline midday Thursday included those of Delta Air Lines, Amazon Web Services and AT&T.
Akamai says its rollout of a new software configuration for its Edge DNS service triggered a bug in the DNS system, which disrupted the availability of some customers' websites.
The disruption lasted up to an hour. Once the company rolled back the software configuration update, the services resumed normal operations, Akamai says.
"While service providers like Akamai and Cloudflare have built and deployed more resilient and secure versions of DNS, it is nearly impossible to completely prevent software bugs, operational mishaps or an unanticipated cascade of hardware failures from temporarily taking down such services," says Oliver Tavakoli, CTO at security firm Vectra.
To minimize the impact of a vendor's outage, organizations must take steps to ensure resiliency, cybersecurity pros say (see: Not So Fastly: Global Outage Highlights Cloud Challenges).
"There is an art to building your application in a way that is resilient to any one outage or even any two outages," Tavakoli says. "Many services are delivered from a single AWS availability zone because that is less complicated - but the inability to automatically switch to another AZ when your primary one fails exposes you to more frequent service delivery disruptions."
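The automatic switching Tavakoli describes can be illustrated with a minimal Python sketch. It is not drawn from any vendor's tooling. It assumes each provider or zone exposes a health-check URL, and the names `http_probe` and `pick_provider`, along with the example.com endpoints, are hypothetical.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so any network-level failure counts as "unhealthy".
        return False


def pick_provider(
    providers: Sequence[str],
    is_healthy: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first provider, in priority order, that passes its check."""
    for provider in providers:
        if is_healthy(provider):
            return provider
    return None  # every provider failed its health check


if __name__ == "__main__":
    # Hypothetical endpoints, listed in priority order.
    providers = [
        "https://cdn-primary.example.com/health",
        "https://cdn-secondary.example.com/health",
    ]
    # Simulate a primary outage instead of touching the network.
    down = {"https://cdn-primary.example.com/health"}
    chosen = pick_provider(providers, is_healthy=lambda url: url not in down)
    print(chosen)
```

In practice this decision is usually made at the DNS or load-balancer layer rather than in application code, but the logic is the same: check the primary, and fall through to the next option when it fails.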
Because a single point of dependency equates to a single point of weakness, companies must endeavor to spread out the potential danger as much as possible, security experts say.
"If you rely on a single organization to provide a service that's integral to your platform and there's an issue, you're going to have downtime; there's no getting around that," says Shawn Smith, director of infrastructure at the application security firm nVisium.
In March, the National Security Agency and U.S. Cybersecurity and Infrastructure Security Agency released guidance on choosing and deploying a Protective Domain Name System service to strengthen security.
In June, Akamai experienced an outage for one of its Prolexic DDoS services that impacted about 500 customers.
On Aug. 30, 2020, the network and cloud solutions firm CenturyLink was offline for part of the day, which in turn led to the websites of Cloudflare, Discord, Feedly, Hulu, PlayStation Network, Xbox Live and others going down.
An October 2016 DDoS attack on the DNS provider Dyn Inc. knocked the websites of Amazon, PayPal, Spotify, Twitter and others offline.