Advancing failure prediction and mitigation—introducing Narya
“This post continues our Advancing Reliability series highlighting initiatives underway to constantly improve the reliability of the Azure platform. In 2018 we shared steps we’re taking to improve virtual machine (VM) resiliency using live migration. In 2019 we shared how we’re further improving virtual machine resiliency with Project Tardigrade, which identifies host failures and recovers from…