What Happened
Around 8:30 a.m. on Monday, Feb. 26, the university experienced a major outage that affected all systems on campus, as well as authentication (login) to cloud-hosted services such as Gmail, Box and Canvas.
Our teams traced the cause of the outage to our storage layer (more about that in a bit). We immediately involved our vendor, who identified the issue as a bug in the software that manages our storage.
To get systems up and running again quickly, we applied a short-term fix. So by midday Monday, you may have noticed that you could once again access Canvas, course rosters and the other services you needed.
That fix didn’t stick, though. The software bug reappeared later that afternoon, at which point we worked with the vendor once again on an upgrade to resolve the problem for good. Once the system was patched, services began to come back online and were restored in full by 9:30 p.m. It was a very long day.
Why This Happened
The technology you use in your day-to-day life works in layers. When you check your grades through Banner, you're using an application that runs on a server, and that server keeps its data on a storage layer.
On Monday, that storage layer failed. So everything that relies on that storage — including our authentication systems — failed, too.
Of course, we plan for failure, as do our vendors. The storage infrastructure has redundancies built in so that it can keep running if something goes wrong. This time, though, the software bug overwhelmed those safeguards.
What We're Doing
Outages will happen from time to time. Even the biggest companies with the largest staffs and the most comprehensive resources sometimes run into trouble.
Fortunately, they don't happen here often. And each time they do, we work as a team and with our vendors to learn from them and guard against future issues.
Although our campus systems are stable again, our investigation of the bug with our vendor is still open. We want to make sure this particular failure never happens again, and we've already identified steps to take within our infrastructure to lessen the impact of any future problems.
We want to thank the St. Edward's community for its patience. Mondays are hard enough without a technology "event." We'll do better.