At around 8 AM EST on June 25th, 2019, when the Elastos main chain reached height 408476, most of the top 24 active supernodes participating in DPoS consensus became inactive. As a result, the standby supernodes ranked 25-96 took over the consensus mechanism. Although the root cause is still being investigated by the core Elastos team, it is believed that a widespread BGP (Border Gateway Protocol) routing leak, which affected a number of Internet services and a portion of traffic to Cloudflare, was responsible.
To understand why this happened, we should first understand what Cloudflare is and how it fits into all of this. To read more on the outage caused by a Verizon BGP error, see https://www.datacenterdynamics.com/news/bgp-route-leak-causes-cloudflare-outages-aws-issues/
According to the article, the intermittent outage lasted roughly 1 hour and 42 minutes, briefly taking down services including the popular chat service Discord, Reddit, Twitch, and others. In addition, cloud provider Amazon Web Services (AWS), which serves its sites through the Cloudflare CDN, also suffered issues.
The BGP error affected internet connectivity in multiple AWS regions. Most of the Elastos DPoS supernodes participating in blockchain consensus use AWS as their back-end infrastructure to run the services required to validate blocks on the Elastos blockchain. Although the outage affected only certain AWS regions and disrupted network connectivity only briefly, it was enough that the affected top 24 supernodes could not sign blocks on time and became inactive, at which point the candidate supernodes stepped in to take over the active supernode spots.
The role BGP plays here is explained by this quote from the Cloudflare article: “BGP acts as the backbone of the Internet, routing traffic through Internet transit providers and then to services like Cloudflare. There are more than 700k routes across the Internet.”
As it turns out, BGP route leaks are not that uncommon. “This incident is yet another example of how incredibly easy it is to dramatically alter the service delivery landscape in the Internet. The deeply interconnected nature of the Internet means that a glitch in one part of the infrastructure can very easily have cascading effects on another.”
This event highlights two important points for both the Elastos blockchain and the internet in general. First, it shows how fragile the internet really is: one small routing error can have an enormous effect across the internet, causing several sites and servers to be disconnected entirely.
Second, it highlights the robustness of the Elastos blockchain: as soon as some of the top 24 supernodes stopped responding, the standby supernodes took over and DPoS consensus continued to function well.
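The takeover described above can be illustrated with a minimal Python sketch. It is not the actual Elastos selection logic; it only assumes that the 24 active spots go to the highest-ranked supernodes that are still responsive. The `Supernode` class, the `select_active` function, and the node names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

ACTIVE_SLOTS = 24  # number of active supernode spots mentioned in the article


@dataclass(frozen=True)
class Supernode:
    name: str         # hypothetical identifier
    rank: int         # 1 = highest-ranked supernode
    responsive: bool  # assumption: True if the node can currently sign blocks


def select_active(nodes, slots=ACTIVE_SLOTS):
    """Return the highest-ranked responsive nodes, at most `slots` of them."""
    ranked = sorted(nodes, key=lambda n: n.rank)
    return [n for n in ranked if n.responsive][:slots]


# 96 ranked supernodes; as an example, ranks 1-20 lose network connectivity.
nodes = [Supernode(f"node-{r}", r, responsive=r > 20) for r in range(1, 97)]
active = select_active(nodes)
print([n.rank for n in active])
```

In this simplified model, the four remaining responsive nodes from the original top 24 keep their spots, and standby nodes ranked 25 and below fill the rest, so all 24 spots stay occupied.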
On such a peer-to-peer network, as long as the base criteria are met, the blockchain will keep functioning: AuxPoW miners will keep producing blocks and DPoS supernodes will keep validating and signing them.
Thus, the more supernodes there are on the Elastos blockchain network, the more robust the blockchain becomes.