Introduction
- replication technique used in the HARP filesystem
- HARP (Highly Available Reliable Persistent) filesystem
- with high probability, data in files is not lost
- modifications to files recorded at several nodes atomically
- primary-copy replication technique:
- clients write to a single server (the primary), which doesn't reply until the operation has been recorded at the other nodes
- records the effects of modifications in a log; they are applied to the FS in the background
- volatile log + UPS protects against sudden power failures? (see the sketch after this section)
- what about other hardware faults?
- implemented beneath the VFS interface
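
A minimal sketch of the volatile-log-plus-UPS idea, assuming a hypothetical `VolatileLog` class and a UPS monitor that delivers SIGUSR1 on power failure (both illustrative, not from the paper): appends stay purely in memory on the fast path, and the battery window is used to flush the log to disk.

```python
import json
import signal

class VolatileLog:
    """In-memory log of file modifications (illustrative, not Harp's code)."""
    def __init__(self, path="harp.log"):
        self.path = path
        self.records = []              # volatile: fast appends, no disk I/O

    def append(self, record):
        self.records.append(record)    # the fast path never touches disk

    def flush_to_disk(self):
        # Called when line power is lost: the UPS battery buys enough
        # time to make the volatile log durable before shutdown.
        with open(self.path, "w") as f:
            json.dump(self.records, f)

log = VolatileLog()
# Assumption: the UPS monitor signals power failure via SIGUSR1 (hypothetical).
signal.signal(signal.SIGUSR1, lambda signo, frame: log.flush_to_disk())
```

Note the open question above still stands: the UPS covers power loss, but not other hardware faults such as disk failures.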
Related work
- primary-copy replication technique
- witness schemes
- Harp ensures consistency unlike Coda/Locus
- HA-NFS technique for reliability:
- primary writes to a shared disk which the backup server can read if the primary fails
Implementation
- meant to run on networked nodes
- typical processors with network that can deliver packets
- synchronized with < 100ms skew
- server nodes handle requests
- clients send requests
- servers should be equipped with a UPS
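
The environment assumptions above can be captured as a small config check; this is a sketch only, and the names (`NodeConfig`, `check_environment`) are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class NodeConfig:
    role: str                      # "server" handles requests, "client" sends them
    max_clock_skew_ms: int = 100   # clocks assumed loosely synchronized
    has_ups: bool = True           # servers need a UPS to protect the volatile log

def check_environment(cfg: NodeConfig) -> None:
    assert cfg.max_clock_skew_ms <= 100, "design assumes < 100 ms clock skew"
    if cfg.role == "server":
        assert cfg.has_ups, "servers must be UPS-equipped"

check_environment(NodeConfig(role="server"))
```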
Replication
- all requests make changes at primary first, then replicate
- modifications (writes) need a 2-phase protocol
- first inform backups about the operations
- return to client, then let backups proceed to commit in background
- changes are not actually applied to the FS until they have committed in the log
- on failure or recovery, a failover protocol is run: the view change
- essentially a new primary is selected, clients must be redirected to the new primary
- store n+1 copies of the data to survive n failures (the group has 2n+1 members, the extra n being witnesses)
- organize system into groups so each node is the primary of one group, backup of another, and witness of a 3rd group
- normal case processing
- primary log stored in volatile memory
- distinguish committed from uncommitted records with a commit point (CP): the index of the latest committed op
- the primary advances the CP past the new op, then returns results to the client
- maintain a lower bound pointer (LB) to track which committed ops have actually been persisted to disk
- maintain a global lower bound (GLB) across primary + backups; records below the GLB can be discarded
- upon failure, the log is a "redo" log (see the sketch after this section)
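
A sketch of the normal-case write path and the log pointers, under the assumptions above; class and method names (`PrimaryLog`, `handle_write`, etc.) are illustrative, not Harp's actual interfaces, and the backup ack is modeled as a synchronous call.

```python
class Backup:
    """Backup's side of phase 1: log the op and ack (illustrative)."""
    def __init__(self):
        self.log = {}

    def record(self, index, op):
        self.log[index] = op           # backup logs the op; returning = ack

class PrimaryLog:
    def __init__(self, backups):
        self.backups = backups
        self.records = []              # volatile log of modification ops
        self.cp = -1                   # commit point: latest committed op
        self.lb = -1                   # lower bound: latest op persisted to local disk
        self.glb = -1                  # global lower bound across primary + backups

    def handle_write(self, op):
        """Two-phase protocol: phase 1 records the op at the backups, then the
        primary commits (advances CP) and replies; phase 2 (backups commit) and
        application to the FS happen in the background."""
        index = len(self.records)
        self.records.append(op)
        for b in self.backups:         # phase 1: inform backups, wait for acks
            b.record(index, op)
        self.cp = index                # commit: op is now below the commit point
        return ("ok", index)           # reply to the client here

    def apply_in_background(self, apply_to_fs):
        # Committed ops are applied to the file system lazily; LB advances
        # as their effects reach disk.
        while self.lb < self.cp:
            self.lb += 1
            apply_to_fs(self.records[self.lb])

    def advance_glb(self, backup_lbs):
        # GLB = min LB over all group members; entries at or below it are
        # on disk everywhere and can be discarded from the volatile log.
        self.glb = min([self.lb] + backup_lbs)

# usage: one primary, one backup; "applying to the FS" is just a print here
primary = PrimaryLog(backups=[Backup()])
print(primary.handle_write({"op": "write", "file": "f", "data": "x"}))
primary.apply_in_background(lambda op: print("applied", op))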
View changes
- correctness: operations must appear in order
- reliability: effects of committed operations must survive all single failures
- availability: must provide service whenever any two group members are up
- timeliness: failovers must be achieved in a timely manner
- view numbers determine which view the group is currently in
- the designated witness is promoted to act as a backup when the primary or a backup is missing
- witness takes part in processing file operations when promoted
- witness does not have a copy of the FS, so it cannot apply committed ops
- never discards entries from log
- witness promotion -> the witness receives all log records not yet on the disks of the primary and backup
- a recovering node is brought up to date by replaying the witness's log (as long as there was no disk failure)
- state of new view reflects all committed operations from previous views
- every committed op is at two servers
- if a view with a promoted witness lasts a long time, unneeded log entries can be pruned
- node starting the view change is the coordinator
- coordinator communicates with other members to form a new view
- once all nodes agree, move to phase 2
- the coordinator writes the new view number to disk and informs all nodes of the new view (see the sketch after this section)
- on a system crash, state preserved from volatile memory (via the UPS-backed log) is used to recover lost FS state
- client switching:
- IP multicast
- insert changes into the NFS client code so clients switch to the new primary when the view changes
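
A sketch of the two-phase view change described above; `Member`, `invite`, `start_view`, and the length-based log merge are all illustrative simplifications (a real implementation merges record by record and persists the view number durably).

```python
class Member:
    """One group member's view-change state (illustrative, not Harp's code)."""
    def __init__(self, name, view_number=0, log=None):
        self.name = name
        self.view_number = view_number   # persisted on disk in the real system
        self.log = log if log is not None else []

    def invite(self, proposed_view):
        # Phase 1: agree to join the proposed view and report local state.
        return {"member": self.name, "view": proposed_view, "log": self.log}

    def start_view(self, new_view, merged_log):
        # Phase 2: record the new view number (to disk, really) and adopt
        # the merged log, which reflects every committed op of prior views.
        self.view_number = new_view
        self.log = merged_log

def view_change(coordinator, others):
    """Two-phase view change run by the node that noticed the failure."""
    proposed = coordinator.view_number + 1
    # Phase 1: collect state from reachable members. Because every committed
    # op is logged at >= 2 servers, the longest log here contains them all
    # (simplification: merging by log length stands in for per-record merging).
    replies = [m.invite(proposed) for m in others]
    replies.append(coordinator.invite(proposed))
    merged = max((r["log"] for r in replies), key=len)
    # Phase 2: write the new view number and inform all members; clients are
    # then redirected to the new primary (IP multicast or NFS client changes).
    for m in others + [coordinator]:
        m.start_view(proposed, merged)
    return proposed

# usage: the backup coordinates a view change with the promoted witness
backup = Member("backup", log=[("op0",), ("op1",)])
witness = Member("witness", log=[("op0",), ("op1",)])
print("new view:", view_change(backup, [witness]))
```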