Why pay millions for new big-iron storage and related networking software
when you've already got it? From individual power users to enterprises,
today's wasted disk space can be harnessed to tomorrow's dynamically
scaling storage networks, retiring the appliance and monolithic
approaches for all but high-end and specialty needs, researchers say.
Topologies for such a network might include a decentralized file sharing program
that is more akin to the Nullsoft Inc.-founded Gnutella network than to
Napster Inc.'s original peer-to-peer file sharing service. They also
might look like an in-house version of the more-serious-sounding grid-computing notion.
Whatever the look and feel, the possibilities of P2P storage are
attracting the interest of everyone from government-funded university
researchers to startups such as PeerStor Inc. to corporate giants such
as Hewlett-Packard Co., EMC Corp., IBM and Microsoft Corp.
"For certain direct-attached architectures, I could leverage
underutilized storage assets on other servers without having to develop
a SAN [storage area network]" that was initially used for replication
and backup, said analyst Tony Prigmore, of Enterprise Storage Group
Inc., in Milford, Mass. "There's no reason to believe that for certain
small-to-[midsize] enterprises, peer storage couldn't be successful."
The concept begins in laboratories such as that of Randolph Wang, a
computer science professor at Princeton University. Wang's graduate
students are working on a project called PersonalRAID to ensure data
availability for mobile professionals. It's an alternative to simply
carrying all data locally: rather than constantly synchronizing and
acquiring the latest storage gadgets, such as IBM's Microdrive, users
could have their storage linked automatically over Internet
connections or via Bluetooth or 802.11 wireless transmissions, said
Wang, in Princeton, N.J. When a device needs data, it asks the user's
PersonalRAID to find it, first checking the local storage, then nearby
devices via Bluetooth, the local network with 802.11 and finally
wide-area resources through the Internet. The query method replaces a
central data table, he said.
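The lookup order Wang describes can be sketched in a few lines of Python. This is only an illustration, not PersonalRAID's code: the tier names, the `find_data` function and the in-memory dictionaries standing in for devices are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Each tier is (name, store). The dicts are hypothetical stand-ins for
# real devices; the order is nearest-first, as described in the article.
Tier = Tuple[str, Dict[str, bytes]]


def find_data(key: str, tiers: List[Tier]) -> Optional[Tuple[str, bytes]]:
    """Ask each tier in turn and return (tier_name, data) for the first hit."""
    for name, store in tiers:
        if key in store:
            return name, store[key]
    return None  # no tier holds the data


tiers: List[Tier] = [
    ("local", {"notes.txt": b"draft"}),
    ("bluetooth", {"photo.jpg": b"jpeg-bytes"}),
    ("wlan_802_11", {"report.doc": b"q2 report"}),
    ("internet", {"report.doc": b"older q2 report", "archive.zip": b"zip-bytes"}),
]

hit = find_data("report.doc", tiers)
```

Because the tiers are queried in order, the copy on the local wireless network is returned before the wide-area copy is ever consulted, and no central table of locations is needed.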
"We basically have a pretty real prototype, [but] at this stage, we
don't have any relationship with any company. We just hacked it up on
our own," Wang said. The patent-pending technology is years from
commercialization. The next step is to make it transparent. "From the
user's perspective, it's a single-disk drive," Wang said.
A more imminent application of P2P storage is for midsize companies.
PeerStor is a New York startup taking the ad hoc, miniature-SAN
approach. "What our product allows a user to do is install the software
and set up the drive location anywhere on the WAN or LAN to ... perform
automatic failover in real time with open files," said Joe Pennino, CEO
of PeerStor. Only the data changes will be mirrored, but to multiple
locations, he said.
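Mirroring only the changes to several locations might look roughly like the following Python sketch. PeerStor has not described its method, so the fixed-size block comparison, the tiny `BLOCK_SIZE` and the function names here are assumptions chosen for illustration.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical and deliberately tiny, to keep the example readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes) -> Dict[int, bytes]:
    """Return {block_index: new_block} for every block that differs from old."""
    old_b, new_b = split_blocks(old), split_blocks(new)
    delta = {}
    for i, block in enumerate(new_b):
        if i >= len(old_b) or old_b[i] != block:
            delta[i] = block
    return delta


def apply_delta(replica: bytes, delta: Dict[int, bytes], new_len: int) -> bytes:
    """Patch a replica with the changed blocks, then trim to the new length."""
    blocks = split_blocks(replica)
    for i, block in sorted(delta.items()):
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_len]


old = b"aaaabbbbcccc"
new = b"aaaaXbbbcccc"
delta = changed_blocks(old, new)  # only the second block travels
mirrors = [apply_delta(copy, delta, len(new)) for copy in (old, old, old)]
```

Here a one-byte edit sends a single block to each of the three mirrors rather than the whole file.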
The software will cost about $150 per node and will work with a
company's networked storage and unused server space, Pennino said. The
technology was first developed in DOS 10 years ago but was never made
into a product. Now, "we'd like to have this out in the next month or
two," he said.
It will scale to "anybody but the Fortune 1000," Pennino said.
Large-enterprise implementation of P2P storage is the furthest away,
as stated in various papers presented at the FAST Conference on File
and Storage Technologies, held earlier this year, and at last month's
related Usenix Technical Conference,
both in Monterey, Calif. A project called Cooperative Backup System, or
CBS, is similar to PeerStor's approach, but it's Internet-based. CBS
uses a technology called Reed-Solomon erasure codes, invented at the
Massachusetts Institute of Technology's Lincoln Laboratory in 1960, to
rebuild data from its parts, even if some parts are destroyed or
missing. Because it works best across multiple sites, CBS is most
cost-effective for individual users and for very large enterprises,
said lead researcher Mark Lillibridge, of HP's Systems Research Center,
in Palo Alto, Calif.
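The idea of rebuilding data from its parts can be shown with a much simpler scheme than the one CBS uses. The Python sketch below is not Reed-Solomon: it uses a single XOR parity shard, which can recover from the loss of any one shard, whereas Reed-Solomon codes can tolerate several losses. The function names and shard layout are assumptions for the example.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = present[0]
        for shard in present[1:]:
            rebuilt = xor_bytes(rebuilt, shard)
        shards[missing[0]] = rebuilt
    # Drop the parity shard and the padding.
    return b"".join(shards[:-1])[:original_len]


data = b"backup me across peers"
stored = encode(data, 4)   # five shards, one per partner site
stored[2] = None           # one partner disappears
restored = recover(stored, len(data))
```

Spreading the shards across more sites means any single site's failure costs only one shard, which is why such schemes favor users with many partners.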
While the CBS project lacks speed (it could take two weeks to restore
data from it, Lillibridge said), it excels at security. "You
challenge your partners occasionally to make sure they still have your
data. If they don't, then you drop their data in retaliation," after a
short buffer period in case the partner machine has crashed, he said.
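One way such a challenge could work is sketched below in Python. The article does not say how CBS builds its challenges, so the nonce-and-hash check, the `Partner` class and the seven-day `GRACE_PERIOD_DAYS` are all assumptions made for illustration.

```python
import hashlib
import os
from typing import Optional

GRACE_PERIOD_DAYS = 7  # hypothetical; the article says only "a short buffer period"


def proof(nonce: bytes, data: bytes) -> str:
    """Hash a fresh nonce with the data, so old answers cannot be replayed."""
    return hashlib.sha256(nonce + data).hexdigest()


class Partner:
    """Stand-in for a remote machine that claims to hold our backup."""

    def __init__(self, stored: Optional[bytes]):
        self.stored = stored

    def respond(self, nonce: bytes) -> Optional[str]:
        if self.stored is None:
            return None  # crashed, offline, or data discarded
        return proof(nonce, self.stored)


def audit(my_copy: bytes, partner: Partner, failed_days: int) -> int:
    """Run one daily challenge; return the updated count of failed days."""
    nonce = os.urandom(16)
    if partner.respond(nonce) == proof(nonce, my_copy):
        return 0
    return failed_days + 1


def should_drop(failed_days: int) -> bool:
    """Retaliate only once the partner has failed past the grace period."""
    return failed_days > GRACE_PERIOD_DAYS


honest = Partner(b"my backup")
cheater = Partner(None)
```

A partner that answers correctly resets the counter, while one that keeps failing is dropped only after the grace period, which leaves room for an ordinary crash.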
It's unclear when CBS could become a product, though it's been a
prototype since the summer of 2000. "I haven't talked to the HP product
folks, so I don't know what the story is now. ... It might become an
open-source project," Lillibridge said. "It's been on hold due to
various chaos with the merger [of HP and Compaq Computer Corp.]," he
said.
With related file system research being funded by companies such as
IBM and Microsoft, P2P storage, especially for backup and recovery
purposes, could become a significant technology before the decade's
out, said Peter Christy, an analyst and co-founder of NetsEdge Research
Group, in Los Altos, Calif. "The Internet makes data a lot more
valuable because you can get to it anywhere, any time," Christy said.
"Will people be using random disk drives all over their company for the
storage of important business assets? At the moment I think the answer
is no, [but] there's a lot of room for creativity for making that
happen."