
How to achieve 1 GByte/sec I/O throughput with commodity IDE disks

Jens Mache, Joshua Bower-Cooley, Jason Guchereau, Paul Thomas, and Matthew
Wilkinson
Lewis & Clark College, Portland, OR 97219

The Problem

In order to compete with custom-made systems, PC clusters have to provide not
only fast computation and communication, but also high-performance disk access.
I/O performance can play a critical role in the completion times of many
applications that transfer large amounts of data to and from secondary storage,
for example simulations, computer graphics, file serving, data mining or

An I/O throughput of 1 GByte/sec was first achieved on ASCI Red with I/O
hardware costing over one million dollars. We set out to achieve similar I/O
performance on our PC cluster by harnessing the power of commodity IDE disks on
remote nodes.

The Approach

We set out to achieve an I/O throughput of 1 GByte/sec on a PC cluster that (1)
has as few as 32 nodes and (2) uses less than ten thousand dollars' worth of I/O
hardware. To reach this goal, each node must be able to access data at a rate of
at least 32 MBytes/sec (1 GByte/sec divided evenly across 32 nodes).
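The per-node target follows from dividing the aggregate goal evenly across the nodes. A minimal sketch of that arithmetic, assuming 1 GByte = 1024 MBytes and an even split (the function name is ours, for illustration only):

```python
def per_node_rate_mb(total_mb_per_sec: float, nodes: int) -> float:
    """Throughput each node must sustain if the load is split evenly."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_mb_per_sec / nodes


# Assumption: 1 GByte/sec is taken as 1024 MBytes/sec.
TARGET_MB_PER_SEC = 1024

print(per_node_rate_mb(TARGET_MB_PER_SEC, 32))  # 32.0
```

With fewer nodes the per-node requirement rises accordingly, which is why the two-disk RAID 0 configuration on each node matters.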

The novelty of our approach is (A) to use two commodity IDE disks (not SCSI
disks) on each node in a software RAID configuration and (B) to configure the
parallel file system such that each node acts as both I/O node and compute node.

In our first experiment, we measured the local read and write performance of our
two IDE drives (IBM 20GB ATA100 7200rpm costing $112 each), configured as a
software RAID 0. Using the Bonnie disk benchmark, we measured up to 68.23

In our second experiment, we measured the performance of a concurrent read/write
test program that sits on top of PVFS, an open-source parallel file
system. Parallel file systems allow transparent access to disks on remote nodes.
We configured each machine as both an I/O and a compute node to best make use of
our limited number of nodes. Using MPI and the native PVFS API, I/O throughputs
were well above 1 GByte/sec. We achieved up to 2007.199 MBytes/sec read
throughput and 1698.896 MBytes/sec write throughput (with appropriate file view
and stripe size such that most disk accesses were local).
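Why file view and stripe size matter can be illustrated with a simplified round-robin striping model. The sketch below is not PVFS code and does not use the PVFS API: `io_node_for_offset`, `local_fraction`, the contiguous-block file view, and the 64 KiB stripe size are all hypothetical, and the model assumes stripe units are dealt out to I/O nodes in round-robin order starting at node 0.

```python
def io_node_for_offset(offset: int, stripe_size: int, num_nodes: int) -> int:
    """I/O node holding the byte at `offset` under round-robin striping."""
    return (offset // stripe_size) % num_nodes


def local_fraction(rank: int, block_size: int, stripe_size: int,
                   num_nodes: int) -> float:
    """Fraction of process `rank`'s contiguous block that is stored on node `rank`."""
    pos = rank * block_size
    end = pos + block_size
    local_bytes = 0
    while pos < end:
        # Walk the block one stripe unit (or partial unit) at a time.
        next_boundary = (pos // stripe_size + 1) * stripe_size
        chunk_end = min(end, next_boundary)
        if io_node_for_offset(pos, stripe_size, num_nodes) == rank:
            local_bytes += chunk_end - pos
        pos = chunk_end
    return local_bytes / block_size


NODES = 32
STRIPE = 64 * 1024  # hypothetical stripe size, chosen only for illustration

# Block size equal to the stripe size: every process reads from its own disks.
matched = sum(local_fraction(r, STRIPE, STRIPE, NODES) for r in range(NODES)) / NODES

# Block size twice the stripe size: almost every access goes over the network.
mismatched = sum(local_fraction(r, 2 * STRIPE, STRIPE, NODES) for r in range(NODES)) / NODES

print(matched, mismatched)
```

In this toy model the matched layout keeps all accesses local, while the mismatched one leaves only a small fraction local, so the network rather than the disks would bound throughput.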

In additional experiments, we measured the I/O performance of a ray tracing
application and studied how I/O performance is sensitive to configuration and
programming choices.

Our conclusions are as follows:

· High-performance I/O is now possible on PC clusters with commodity IDE disks.

· Compared to ASCI Red, price/performance for I/O improved by over a factor of
100. (To achieve 1 GByte/sec, we used 64 IDE drives costing $112 each, whereas
ASCI Red had 18 SYMBIOS RAIDs costing $60,000 each.)

· In contrast to ASCI Red, I/O nodes in our cluster have a higher throughput
than the interconnect. (Using ttcp, we measured 38 to 46 MBytes/sec network
throughput for our copper Gigabit Ethernet Foundry switch and Intel cards. ASCI
Red’s SYMBIOS RAID can write data at 70 MBytes/sec, while the custom-made
network can transfer data at 380 MBytes/sec.)
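The figures quoted in these conclusions can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above. It compares storage hardware cost at roughly equal throughput (about 1 GByte/sec on both systems), and it divides the best measured read throughput across 32 nodes, which assumes two drives per node and an even split; the variable names are ours.

```python
# Storage hardware cost, from the figures quoted above.
IDE_DRIVES, IDE_PRICE = 64, 112        # our cluster
RAID_UNITS, RAID_PRICE = 18, 60_000    # ASCI Red SYMBIOS RAIDs

cluster_cost = IDE_DRIVES * IDE_PRICE  # dollars
asci_cost = RAID_UNITS * RAID_PRICE    # dollars
cost_ratio = asci_cost / cluster_cost  # at comparable throughput

# Per-node share of the best measured read throughput (MBytes/sec).
NODES = 32                   # assumption: 64 drives, two per node
BEST_READ = 2007.199
NETWORK_MAX = 46             # upper end of the measured ttcp range

per_node_read = BEST_READ / NODES
exceeds_network = per_node_read > NETWORK_MAX

print(cluster_cost, asci_cost, round(cost_ratio, 1))
print(round(per_node_read, 1), exceeds_network)
```

The cost ratio comes out well above 100, in line with the stated improvement, and a per-node read rate above the measured network throughput is consistent with most disk accesses being local.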

Impact, Importance, Interest, Audience

Interest in cluster computing is at an all-time high. While there is not yet an
I/O category in the top500 ranking (nor in the SC awards), I/O performance is
getting more and more attention (“the I/O bottleneck”).

The impact of our work is

(A) showing how commodity IDE disks on remote nodes can be harnessed,

(B) reporting I/O performance sensitivities, and

(C) reporting extremely good price/performance (a factor of 100 better).

Thus, parallel I/O now seems affordable, even for small businesses and colleges.

Our sensitivity results are highly valuable

(1) to give performance recommendations for application development,

(2) as a guide to I/O benchmarking (which will play an important role in
compiling the new “clusters @ top500” ranking), and

(3) as a guide to further improvement of parallel file systems.

Visual Presentation

First, we’ll have a traditional color poster display (32″ x 40″), describing the
problem, our approach (IDE disks in RAID configuration, PVFS with overlapped
nodes), our experiments (graphs and tables) and our conclusions.

Second, we plan to show the performance of application and benchmark runs “on
demand”. (It only takes a laptop and an Internet connection for us to start
programs on our cluster from Denver and get the performance results back.)

2009-06-25 17:17
