Disaster recovery exercise … with an early Saturday morning ‘restart’

Originally published on Adur & Worthing Councils — Our Stories, Your Councils, 25th June 2018

It’s been another busy (and longer than usual) week for the Digital team. Last Saturday we carried out a planned disaster recovery exercise on our IT systems. It was an early 7:30am start with us assembled in the lobby of Worthing Town Hall. Who decided it was a good idea to start that early on a Saturday morning?

Despite the seriousness of the exercise, working on the weekend with the team had a slightly more relaxed feel. Our team recently merged with IT Operations, so it was a great opportunity for me to learn more about how they work, see more of the ‘behind the scenes’ hardware (yes, I’m a geek!) and have a more sociable experience with the team. Having six large pizzas delivered to the Town Hall to fuel us through the day was a nice treat.

Photo: Inside one of our server rooms

The test involved switching off the mains power to our data centre and running on battery backup while over 200 systems were shut down. We then moved to generator power, restarted all the systems and switched back to mains power. Finally, we checked that every system had come back online and was correctly configured, ready for Monday morning. This was no mean feat, considering that some of these systems are rarely shut down, and it placed a huge responsibility on the shoulders of some of our team.

Photo: Power countdown on a battery backup

Running exercises like this is a great way to identify weaknesses and build resilience in our IT infrastructure. Resilience is something we are working really hard to improve, and one of our current projects involves moving a large percentage of our systems to ‘the cloud’ using Amazon Web Services. We have already set up the framework in the cloud and will be moving systems over in the coming months.

Earlier this year I was able to really expand my knowledge by completing three days of training on Amazon Web Services. One of my first tasks will be to set up our main public website across several servers, so that if one server fails, our customers will see no loss of service.
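For fellow geeks, here is a rough idea of why spreading a website across several servers helps. It is a minimal Python sketch, not a description of our actual AWS setup: the `ServerPool` class and the server names `web-1`, `web-2` and `web-3` are made up for illustration. Requests are handed to each server in turn (round-robin), and any server marked as down is simply skipped, so visitors are still served as long as at least one server is healthy.

```python
import itertools


class ServerPool:
    """Hypothetical pool that rotates requests across healthy servers only."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # Called when a (hypothetical) health check finds a server has failed.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Bring a recovered server back into rotation.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try at most one full rotation before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Example: one of three servers fails, but requests keep being served.
pool = ServerPool(["web-1", "web-2", "web-3"])
pool.mark_down("web-2")
served = [pool.next_server() for _ in range(4)]
print(served)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

In practice a cloud provider's managed load balancer would normally do this job, including the health checks; the sketch just shows the idea behind “no loss of service”.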

If you’re not a geek like me, then you’ll be pleased to hear that next week’s blog will feature much less tech talk! I’ll be writing about how we motivate the team with a great mixture of work events like our recent Hackathon day and more sociable events like payday drinks and murder mystery evenings.