Interview with Grace Agnew about NSF grant-supported Virtual Data Collaboratory
Rutgers University Libraries is a key part of a team that won a $4 million grant to establish a regional data-sharing network called the Virtual Data Collaboratory. This is a huge grant that involves other departments at Rutgers University, as well as several regional university partners. We shared a press release about this initiative on our website in October, but I recently sat down with Grace Agnew, who is coordinating the Libraries’ participation, to get a better sense of what it means for the Libraries and for Rutgers.
Jessica Pellien: You are part of a team that has won a multimillion-dollar grant from the National Science Foundation. What is the grant for?
Grace Agnew: The grant will build an infrastructure where research data created at Rutgers and other collaborating universities can be stored, discovered, and reused. Rutgers is among the nation's top 20 public universities in terms of obtaining research grants and number 7 among Big Ten universities, yet the university lacks a cohesive strategy for efficiently managing research data. Research data often ends up siloed in individual departments where it is not easily discovered and reused. Also, because we do not have a shared infrastructure that can be easily repurposed, financial and personnel resources that could be dedicated to the research itself are instead expended on duplicating infrastructure that exists in silos around Rutgers. A large-scale research data infrastructure is critical for Rutgers to continue to advance as a research institution, which is part of the university's three-fold mission.
JP: This grant involves many units at Rutgers and other regional universities. What role will Rutgers University Libraries play?
GA: The Libraries are uniquely positioned because we engage with and support Rutgers users across the spectrum, from incoming first-year students to faculty members engaged in groundbreaking research. What we bring to the table is understanding and representing user needs. We are tasked with designing the data services layer, which is the user-facing part of the project. Our design encompasses adding, discovering, and reusing data. We took a unique approach to ensuring the discoverability and reuse of data by designing an interface that links data with the person who created it, the tools used to analyze it, and the intermediate research products (analyses, reports, etc.) that are created around the data before the peer-reviewed publications begin. In other words, we designed a strategy that not only supports the workflow of the researcher but also helps other researchers, perhaps in other disciplines, understand the context of the data and how it is used, as part of the discovery process. We will work with the lead department, Rutgers Discovery Informatics Institute (RDI2), to implement the data services layer according to our design. In addition to myself, Ron Jantz is helping to design the architecture for the data services layer, and Ryan Womack will be working closely with the two use cases, one on the Protein Data Bank with Helen Berman (Center for Integrative Proteomics Research at Rutgers) and Vasant Honavar (Penn State), and one with Jie Gong (Civil and Environmental Engineering at Rutgers), to ensure that the design of the data services layer meets their research and workflow needs. Other librarians involved in the data services design are Karen Estlund at Penn State and Joe Lucia at Temple University.
JP: So, what is the Virtual Data Collaboratory?
GA: The Virtual Data Collaboratory is intended, ultimately, as a "one stop shop" for the storage, discovery, and reuse of data. It is immediately collaborative because we are building parallel facilities at Rutgers and Penn State. Other participating universities in Pennsylvania include Drexel and Temple. The VDC will ultimately be available to other universities in both states through the Internet2 high-speed networking facilities, KINBER in Pennsylvania and NJEdge in New Jersey. The term collaboratory references both the universities involved in the design and the opportunities for collaboration that the data services layer will promote. The VDC is also designed to bridge to existing collaboratories, such as the Protein Data Bank, so much of the data in the VDC will be "virtual" because it exists in other collaboratories but is accessible via the VDC.
JP: There are existing places to store data. What will distinguish our effort from others?
GA: Other universities have collaboratories. We believe the VDC is unusual in combining robust storage and preservation of data with a user focus on multidisciplinary discovery and reuse of data. Also, the existing places are largely single-university or single-discipline initiatives. They are very well designed and very supportive of their users, particularly those with a disciplinary focus. The VDC will work with existing facilities and, through bridges to those facilities, will bring them new users and increased impact from other disciplines.
JP: You note that the VDC will integrate with other regional and national efforts. Can you paint a picture of what this actually means for your average researcher? If I am a scientist doing research on X, how would VDC help me?
GA: VDC is leveraging the technologies already funded in the NSF DIBBs initiative, so the design is inherently collaborative with other large-scale data facilities. What the VDC will provide is an infrastructure that the researcher can use to ensure her data is preserved, is accessible, and can be analyzed and reused by the researcher and by others. Currently, researchers at Rutgers must either build their own infrastructure to meet granting agency requirements that data be preserved and made openly available to others, or deposit their data in disciplinary repositories. Once data is deposited in a disciplinary repository, the researcher generally cannot continue to work with it unless it is downloaded for use. VDC is envisioned as a workflow-oriented repository with a suite of tools for reusing data and the ability to store and link data products, such as analyses, which otherwise reside on the researcher's local server or desktop. So the VDC is somewhat unusual in fully integrating storage and working space for the active scientist.
JP: Will faculty and researchers at non-participating universities have access to the VDC?
GA: It is open to everyone for discovery of data. I don't think policies for membership in the collaboratory have been developed yet. Membership will enable you to upload your data, use tools, etc. The Advisory Board will assist with the development of policies for membership.
JP: When will the VDC be available?
GA: This is a four-year grant that began in September 2016. The goal is to use agile methodologies to build a prototype and layer on functionality, so hopefully there will be something real to show early in 2018.
So there you have it: the team behind the VDC is already hard at work. Currently, their focus is on designing a collaboratory for the sciences, though Grace was quick to point out that the social sciences and humanities wouldn’t be turned away if they were interested.
When it is completed, the VDC will meet or exceed granting agencies' requirements for open access data management and will be a tremendous accomplishment for Rutgers.