Program

Tutorials and Workshop take place at the University of Michigan.

Thursday 15 June

Speaker

Title

Dan Hyde & Steve Simmons, University of Michigan

Distributed Backup And Disaster Recovery for AFS

The umich.edu AFS cell is currently 9.4 TB and is doubling in size every 18 months. Backup and disaster recovery have become major issues. We are currently implementing a disk-based backup system that should allow the nightly full and incremental backups to complete in a small number of hours without ever impinging on production time. This is done by distributing backups across a number of systems so that every server can (if needed) have a dedicated backup host, with all servers backing up in parallel. We expect this implementation to be in production by June 1, 2006.

Dan and Steve have been designing a disaster recovery system for AFS servers based on shadow volumes. There are two core parts to this work: tightening up the definition of shadows and their interaction with the rest of AFS (and writing the code to support that definition), and implementing the hardware and processes necessary to actually build the disaster recovery system.