Anne Butman [AB] (convener), Judy Gardner [JG], Michael Giarlo
[MG] (recorder), Nick Gonzaga [NG], Dave Hoover [DH]
Agenda:
- Discussion of IBM's StorageTank technology as a digital repository solution
- Presentation of enterprise mass storage solutions by ADIC sales team
Attachment
A quick and dirty schematic I drew of a simple SAN setup
Action Items
- AB will contact IBM and have a representative come discuss ST
(functions, price, configuration options, differences from ADIC's StorNext
software) with the DAWG-I subgroup so we can learn more about the product.
- All will think further about and discuss specific storage
needs in regard to ADIC's questions, such as how much of the RUL digital
repository will need to be immediately accessible, and how we expect it to
grow over a 5-year span.
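As a starting point for the growth question, here is a minimal sketch of a compound-growth projection. The 5TB starting figure comes from the ADIC discussion; the 15% annual growth rate and the function name `project_capacity` are purely illustrative assumptions, to be replaced once we have real estimates.

```python
def project_capacity(start_tb, annual_growth_rate, years):
    """Return the projected capacity (in TB) at year 0 through `years`,
    assuming the repository grows by a fixed percentage each year."""
    return [
        round(start_tb * (1 + annual_growth_rate) ** year, 2)
        for year in range(years + 1)
    ]


# 5.0 TB is the figure discussed with ADIC; 0.15 (15% per year) is an assumption.
projection = project_capacity(5.0, 0.15, 5)
for year, tb in enumerate(projection):
    print(f"year {year}: {tb} TB")
```

Changing the growth rate shows quickly how sensitive the 5-year figure is to our estimate, which is the number ADIC will need from us.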
Meeting convened at 2:30pm.
- IBM's StorageTank (ST) technology -- 2:30pm -> 3:30pm
- Is remote, automatic replication going to be possible using ST as
it would with the EMC Centera solution? Based on the ST documentation we
have found thus far, it seems to function as part of a SAN environment. By
its nature, a SAN does not span WAN links; it is more of a LAN-like
technology. ST does not seem to support remote, automatic replication,
according to the resources we have read to this point.
- AB posed a question about the Snapshot function of the ST
software. The documentation is a bit convoluted, and seems to suggest that
multiple copies of files may be kept, which would be an administrative
nightmare when filesystem maintenance needs to be performed.
- Unlike the EMC Centera technology, ST enables easy deletion of
files stored within (which would be especially useful when we use the mass
storage solution as backup, so we can intelligently delete old backups
based on whatever backup policies we devise).
- The group discussed at a general, high level the pros and cons of
using EMC versus those of using ST in a SAN environment:
- EMC Centera is proprietary, and a 'black box' to us. SANs are
standardized, set up with well-known protocols, hardware, and network
technologies, and can be upgraded or added to using a wide variety of vendors.
- EMC Centera is a horribly expensive solution. SANs aren't
inexpensive either, but should cost much less than a Centera.
- EMC Centera requires minimal time and effort to set up, since
EMC handles setup and maintenance. SANs are quite complex and take time
and expertise to set up and maintain.
- EMC Centera handles remote, automatic replication natively,
though you pay extra for this software. SANs will use off-site backups to
accomplish the same thing, requiring tape-swapping and paying for an
off-site tape storage service like the one Systems currently uses.
- EMC Centera integration with existing applications and operating
systems requires installing a client and/or potentially writing to the
API. Files on a SAN, however, appear native to the servers attached to
them. (More on this in the ADIC discussion below.)
- There are more, but these are all the ones I have written down
and can remember at this time.
- The group concluded that a digital repository solution running ST
on a SAN -is- most definitely a viable option.
- ADIC presentation and discussion -- 3:30pm -> 5:15pm
- The ADIC solution, simplified, goes like this: we purchase
everything through them, including fibrechannel (FC) switch, FC host bus
adapters (HBAs), SAN administration software ("StorNext"), tape library,
mass disk storage, setup, and SAN training/documentation. That is, they
handle not only tapes or disks, but also setup of SANs. They help us get a
SAN up and running, but we are -not- required to use them in the future for
any SAN upgrades.
- With the ADIC/SAN solution, mass disk storage could be shared among
all servers connected to the SAN, rather than carved up into LUNs and
assigned individually. Additionally, this shared storage can be shared by
Windows, UNIX, Linux, etc. without requiring any special setup. The same
files and folders could be seen from Windows and Linux without any need to
write to an API. (E.g. if sallie is attached to the SAN, DSpace and
Fedora can store their files, and even the applications themselves, on the
mass storage of a SAN without any special configurations!)
- The ADIC/SAN solution solves backup needs through the StorNext
software, which makes the SAN appear as a single resource to the servers on
the SAN. When a server stores a file to the SAN -- e.g. a user on a
workstation stores a file on a mapped network drive, which is a logical
connection to the SAN via the server which provides said mapped drive -- it
is placed on the mass disk storage so that it may be immediately retrieved
by any clients making requests for it, then an automatic backup to the tape
library is performed, and potentially another backup to tapes designated
for "off-site" which are ejected nightly (for instance). This method is
one of many ways we can use the StorNext software, since it supports robust
policies, allowing us to handle storage and backup how we want when we
want. Under this method, though, nightly backups to tape are unnecessary,
since all files are immediately backed up to tape!
- Additionally, we can use the SAN for standard backups since most
existing backup software will see it with no problem.
- Assuming this configuration and the need to have 5TB, ADIC
recommends against buying 5TB of disk storage, since 1) it's expensive, 2)
it does not account for a backup solution (e.g. tape), and 3) much of that
5TB may not need to be immediately accessible. For instance, we may need
to store 2TB of original TIFs, but these files are basically stored and
forgotten about. Why store that 2TB on expensive disk? The argument is
usually that tape is too slow, and requires manual swapping of tapes. With
ADIC's tape libraries, however, files can be retrieved off tape in under a
minute and swapping of tapes is controlled by a robotic process much like
in a jukebox. We would ultimately control (via StorNext policies) which
files stay on disk and which live exclusively on tape, and could raise our
disk or tape capacities whenever we needed to, so this isn't a bad
recommendation on ADIC's part.
- In order to discuss price and further configuration options, we
should meet with ADIC again in the near future after discussing the issue
above: namely, how much of the 5-10TB of storage do we want IMMEDIATELY
accessible? How much of it will be accessed by end-users via the web, and
how much will be accessed only by staff members (who presumably can deal
with a 30-second wait every now and again)? Once we can answer these
questions, we can get a price quote from ADIC and a better understanding of
what their solution would look like in our environment.
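To make the disk-versus-tape question concrete, the sketch below splits a set of collections into a disk tier and a tape-only tier. This is only a back-of-the-envelope aid, not StorNext configuration: the `Collection` class, the `needs_immediate_access` flag, and every size except the 2TB of original TIFs are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    size_tb: float
    needs_immediate_access: bool  # hypothetical flag, e.g. True if served via the web


def split_tiers(collections):
    """Return (disk_tb, tape_only_tb) for a list of collections."""
    disk_tb = sum(c.size_tb for c in collections if c.needs_immediate_access)
    tape_only_tb = sum(c.size_tb for c in collections if not c.needs_immediate_access)
    return disk_tb, tape_only_tb


# Only the 2TB of original TIFs comes from the discussion; the rest is assumed.
collections = [
    Collection("original TIFs", 2.0, needs_immediate_access=False),
    Collection("web derivatives", 1.5, needs_immediate_access=True),      # assumed
    Collection("staff working files", 1.5, needs_immediate_access=True),  # assumed
]

disk_tb, tape_only_tb = split_tiers(collections)
print(f"disk: {disk_tb} TB, tape only: {tape_only_tb} TB")
```

Filling in our real collections and access needs would give us the disk and tape figures to bring to the next ADIC meeting.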
Meeting adjourned at 5:15pm.