Grades & Progress export failure

Incident Report for Xen Education

Postmortem

What happened?

Our DevOps engineers performed an upgrade on the AWS services managing this job on April 12.

During the update of the VPC network settings, the latest code was overwritten with an older commit. This older code did not fetch the latest grades.

On Saturday 20 April, a senior engineer identified the root cause and re-deployed the correct code. On Monday 22 April, the impacted customer confirmed that the latest data were being ingested again.

Did the monitoring fail?

No. The build was working and pushing the data to the customer's systems, and the files were well-formed and not corrupted.
Since the issue was with the quality of the data itself and not with a lack of generated data, all the monitoring checks passed, as expected.

How to prevent this from happening again?

New steps were added to the deployment pipeline for QA to check the code in staging. These steps will remain in place until a fully automated check can be finalised.
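
As an illustration only, an automated check of this kind could look at the content of the export rather than its presence or format, for example by verifying that the newest grade in the file is recent enough. The sketch below is not the check used by Xen; the column name `graded_at`, the CSV layout and the 24-hour threshold are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timedelta


def newest_timestamp(csv_text, column="graded_at"):
    """Return the most recent timestamp in `column`, or None if there are no rows.

    `graded_at` is a hypothetical column name; timestamps are assumed to be
    ISO 8601 strings with a UTC offset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    stamps = [datetime.fromisoformat(row[column]) for row in reader]
    return max(stamps) if stamps else None


def is_fresh(csv_text, now, max_age=timedelta(hours=24), column="graded_at"):
    """Return True if the newest grade is no older than `max_age` (assumed threshold)."""
    newest = newest_timestamp(csv_text, column)
    return newest is not None and now - newest <= max_age
```

A well-formed file whose newest grade is several days old would pass a format check but fail `is_fresh`, which is the situation described in this incident. The right threshold would depend on how often new grades are expected to appear.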

Are there additional learnings?

In addition to the actions already implemented, we identified a gap in the reporting process: the remote recipient system had been out of sync since April 11, but this was not reported to Xen until April 18, in the middle of the night. An earlier escalation would have significantly shortened the incident, especially if it had been raised during business hours in Victoria, Australia.

Posted Jun 25, 2024 - 18:31 AEST

Resolved

Latest Grades are not included in the export of Grades and Progress files.
Posted Apr 20, 2024 - 00:00 AEST