M. Ji, E. Felten, R. Wang, and J. P. Singh.
Archipelago: An Island-Based File System for Highly Available
and Scalable Internet Services.
Proc. Fourth USENIX Windows Systems Symposium.
August 2000. Best Student Paper Award.
Maintaining availability in the face of failures is a critical
requirement for Internet services. Existing approaches in
cluster-based data storage rely on redundancy to survive a small
number of failures, but the system becomes entirely unavailable if
more failures occur. We describe an approach that allows a cluster
file server to isolate failures so that the system can continue to
serve most clients. Our approach is complementary to existing
redundancy-based methods: redundancy can mask the first few failures,
and failure isolation can take over and maintain availability for the
majority of clients if more failures occur.
The building blocks of our design are self-contained, load-balanced file servers called islands. The main idea underlying island-based design is the one-island principle: as many operations as possible should involve exactly one island. The one-island principle provides failure isolation because each island can keep functioning when other islands fail. It also helps the file system scale with system and workload size because it reduces communication and synchronization across islands. We implemented a prototype island-based file system called Archipelago on a cluster of PCs running Windows NT 4.0 connected by Ethernet. Microbenchmark measurements show that Archipelago adds little overhead to NTFS and Win32 RPC performance, while measurements of operation mixes based on NTFS traces show a speedup of 15.7 on 16 islands.
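The following is a minimal sketch of why the one-island principle isolates failures. It is not Archipelago's implementation. The abstract does not say how files are assigned to islands, so the sketch assumes a hypothetical placement that hashes a file's parent directory, which keeps each read or write on exactly one island. `Island`, `IslandCluster`, `island_for`, and `availability` are illustrative names, and redundancy is deliberately not modeled. When one island goes down, only the files it owns become unreadable, and the rest of the namespace stays available.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Island:
    """A self-contained file server: it stores its own files and fails alone."""
    island_id: int
    up: bool = True
    files: dict = field(default_factory=dict)


class IslandCluster:
    def __init__(self, num_islands: int) -> None:
        self.islands = [Island(i) for i in range(num_islands)]

    def island_for(self, path: str) -> Island:
        # Hypothetical placement (an assumption, not the paper's scheme):
        # hash the parent directory so a file and its siblings share one island.
        parent = path.rsplit("/", 1)[0] or "/"
        digest = hashlib.sha1(parent.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.islands)
        return self.islands[index]

    def _owner(self, path: str) -> Island:
        island = self.island_for(path)
        if not island.up:
            # Only operations that map to this island see the failure.
            raise ConnectionError(f"island {island.island_id} is down")
        return island

    def write(self, path: str, data: bytes) -> None:
        # One-island operation: touches only the owning island.
        self._owner(path).files[path] = data

    def read(self, path: str) -> bytes:
        return self._owner(path).files[path]


def availability(cluster: IslandCluster, paths: list[str]) -> float:
    """Fraction of paths that can still be read."""
    ok = 0
    for p in paths:
        try:
            cluster.read(p)
            ok += 1
        except ConnectionError:
            pass
    return ok / len(paths)


if __name__ == "__main__":
    cluster = IslandCluster(16)
    paths = [f"/home/user{u}/file{f}.txt" for u in range(100) for f in range(10)]
    for p in paths:
        cluster.write(p, b"data")
    print(f"all islands up: {availability(cluster, paths):.2%} readable")
    cluster.island_for(paths[0]).up = False  # fail a single island
    print(f"one island down: {availability(cluster, paths):.2%} readable")
```

Because no operation in this sketch spans two islands, a failed island cannot block requests that map elsewhere. A design in which a single lookup touched several islands would instead lose every file whose lookup path crossed the failed one.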