In the banner, Roblox has said it should come back slowly: Roblox will be a bit sluggish for the first couple of hours or days, but then it will regain its strength!
For those wondering, here is an official statement on what happened.
In short, this is what caused the issues:
The outage lasted 73 hours.
The root cause was two issues. Enabling a relatively new streaming feature on Consul under unusually high read and write load led to excessive contention and poor performance. In addition, our particular load conditions triggered a pathological performance issue in BoltDB. The open-source BoltDB system is used within Consul to manage write-ahead logs for leader election and data replication.
A single Consul cluster supporting multiple workloads exacerbated the impact of these issues.
Challenges in diagnosing these two primarily unrelated issues buried deep in the Consul implementation were largely responsible for the extended downtime.
Critical monitoring systems that would have provided better visibility into the cause of the outage relied on affected systems, such as Consul. This combination severely hampered the triage process.
We were thoughtful and careful in our approach to bringing Roblox up from an extended fully-down state, which also took notable time.
We have accelerated engineering efforts to improve our monitoring, remove circular dependencies in our observability stack, and speed up our bootstrapping process.
We are working to move to multiple availability zones and data centers.
We are remediating the issues in Consul that were the root cause of this event.