Next: Eager-Writing Disk Arrays
Up: Configuring and Scheduling an
Previous: Configuring and Scheduling an
Introduction
Transaction processing applications such as those exemplified by
TPC-C [27] tend to pose more difficult challenges to storage
systems than office workloads. These
applications exhibit little locality or sequentiality;
a large percentage of the I/O requests are writes,
many of which are synchronous; and there may be little idle time.
Traditional techniques
that work well for office workloads tend to be
less effective for these transaction processing applications. Memory
caching provides little relief in the presence of poor locality and a
small read-to-write ratio. As disk areal density improves at
60-100% annually [9], as memory density improves at
only 40% per year, and as the amount of data in transaction
processing systems continues to grow, one can expect little
improvement in cache hit rates in the near future. Delayed write
techniques become less applicable in the presence of a large number of
synchronous writes that must satisfy strict reliability
requirements, requirements that are sometimes not met by even
expensive NVRAM-based solutions [13]. Even when it is possible
to buffer delayed writes in faster storage levels, such as an NVRAM,
the poor write locality implies that there are very few overwrites before
the buffered data reaches disks.
Furthermore, high throughput requirements in conjunction with
scarce idle time make it difficult to schedule background activities,
such as de-staging from NVRAM [13,20], garbage
collection in log-structured
solutions [13,21,22,24], and data
relocation [19], without impacting foreground I/O activities.
The net effect of these challenges is that transaction
processing applications tend to be more closely limited by disk
latency, a performance characteristic that has seen an annual
improvement of only about 10% [9].
Although the traditional caching and asynchronous I/O techniques
have not been very successful, a number of other techniques have proven
promising. One is mirroring: a mirrored system can improve read
latency by sending a read request to the disk whose head is closest to a
target replica [2,5], and it can improve throughput
by intelligently scheduling the requests in a load-balanced manner.
Mirroring, unfortunately, is not without its challenges. Chief among
them is the cost of replica propagation--each write request is turned
into multiple physical writes that compete for I/O bandwidth with
normal I/O operations. High update rates, the lack of idle time for
masking replica propagation, and poor locality
only make matters worse.
While mirroring is more effective for improving a read-dominant
workload, a technique called eager-writing is more effective for
improving a write-dominant workload. Eager-writing refers to the
technique of allocating a free block that is closest to the current
disk head position to satisfy a write
request [4,6,28]. Under the right conditions,
by eliminating almost all of seek and rotational delay, eager-writing
can deliver very fast write performance without compromising reliability
guarantees, even for workloads that comprise of synchronous I/Os and
have poor locality. What eager-writing does not address,
however, is read performance.
Since data replication in a mirrored system improves read performance,
and since eager-writing improves write performance, reduces the cost of
replica propagation, and ensures a high degree of data reliability, it
is only natural to integrate these two techniques so that we may
harvest the best of both worlds. We call the result of this
integration an eager-writing array or an EW-Array: in the
simplest form, an EW-Array is just a mirrored system that supports
eager-writing.
This integration, however, is not without its own tension. In order
to achieve good write performance under eager-writing, one must
reserve enough disk space to ensure that an empty block can be
located close to the current disk head position. At the same time,
to achieve good read performance under mirroring,
the system needs to devote disk space to store a sufficient number
of replicas so that it can choose a conveniently located replica to read.
Given a fixed budget of disks, one
must resolve this tension by carefully balancing the number of disks
devoted to each of these two dimensions. To further complicate the
matter, striping can improve both read and write performance, so one
must also consider this third dimension of the number of disks devoted
to striping. Although configuring a storage system based on the
number of disk heads instead of capacity for TPC-C-like workloads is a
common practice, and some previous studies such as the ``Doubly
Distorted Mirror'' have incorporated ``write-anywhere'' elements in a
disk array [19], what is not well understood is how to
balance the number of disks devoted to each one of the mirroring,
eager-writing, and striping dimensions to get the most out of a given
number of disks.
While properly configuring an EW-Array along these three
dimensions presents one challenge, request scheduling on such a
disk array presents another. In the request queue of a traditional
update-in-place storage system, the locations of all the queued
requests are known. Although the scheduler can sometimes choose among
several mirrored replicas to satisfy a request, the degree of freedom
is limited. This is no longer the case for an
EW-Array: while the locations of the read requests are known, the
scheduler has the freedom of choosing any combination
of free blocks to satisfy
the write requests. Although disk scheduling is a well-studied
problem in conventional systems, what is not well understood is how a
good scheduler can exploit this large degree of freedom to optimize
throughput.
The main contributions of this paper are:
-
- a disk array design that integrates eager-writing with mirroring
in a balanced configuration to provide the best read and write
performance for a transaction processing workload,
-
- a number of disk array scheduling algorithms that can
effectively exploit the flexibility afforded by the
location-independent nature of eager writing, and
-
- evaluation of a number of alternative strategies that share the
common goal of improving performance by
introducing extra disk capacity.
We have designed and implemented a prototype EW-Array.
Our experimental results demonstrate that the EW-Array can
significantly outperform conventional systems.
For example, under the TPC-C workload, a properly
configured EW-Array delivers 1.4 to 1.6 times
lower latency than that achieved on highly optimized striping and
mirroring systems. The same EW-Array achieves approximately 2
times better sustainable throughput.
The remainder of this paper is organized as
follows. Section 2 motivates the integration of
eager-writing with mirroring in an EW-Array.
Section 3 explores different EW-Array
configurations as we change the way the extra disk space is
distributed. Section 4 analyzes a number of new disk
scheduling algorithms that exploit the location independent nature of
eager-writing. Section 5 describes the integrated
simulator and prototype EW-Array.
The experimental results of Section 6 evaluate a wide
range of disk array configuration alternatives.
Section 7 describes some of the related work.
Section 8 concludes.
Next: Eager-Writing Disk Arrays
Up: Configuring and Scheduling an
Previous: Configuring and Scheduling an
Chi Zhang
2001-11-16