Preservation and Transition of NCSTRL Using an OAI-Based Architecture

H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair
Old Dominion University, Norfolk, Virginia, USA
{anan,liu_x,maly,nelso_m,zubair}@cs.odu.edu

J. C. French
University of Virginia, Charlottesville, Virginia, USA
french@cs.virginia.edu

E. Fox, P. Shivakumar
Virginia Tech, Blacksburg, Virginia, USA
{fox,pshivaku}@vt.edu

Abstract

NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework in which participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on the principles of the OAI (Open Archives Initiative), which provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.

Introduction

NCSTRL (http://www.ncstrl.org), organized and supported at Cornell University, was a successful digital library (DL), in operation from 1994 to 2001 with over 100 international participants and over 20,000 digital objects [1]. However, recent changes in the publication paradigm for scientific material and a realignment of Cornell's DL research interests have caused Cornell to cease coordinating NCSTRL operations. This fact, along with the widening acceptance of OAI [2], motivated us to look at an alternative architecture to preserve and sustain NCSTRL. Besides the immediate goal of preserving the old NCSTRL collection, we had a long-term goal of supporting existing NCSTRL collections by making them OAI compliant, possibly joined by new large collections at the department/organization level based on
e-prints software (www.eprints.org), and by individual publishers using Kepler software (http://kepler.cs.odu.edu; [4]) to create small OAI-compliant repositories (Figure 1).

Preserving Existing Collections

We first extracted both the metadata and the data from the existing Dienst servers and FTP sites. This process, including cleaning of the metadata, was automated with scripts. Next, we provided an OAI wrapper around the extracted metadata, enabling it to be harvested by the new NCSTRL search service. The extracted documents and their metadata are currently kept at Virginia Tech, while the NCSTRL search/browse service is hosted at Old Dominion University.

[Figure 1: OAI-based NCSTRL vision. Old NCSTRL collections with an OAI layer; new NCSTRL collections at large organizations (university OAI-compliant repositories built with Eprints software, registered through a manual registration service for NCSTRL); OAI-compliant repositories of individual publishers (Kepler archivelets, registered through an automated LDAP-based registration service); and the NCSTRL search service (an Arc-like service).]

Search Service

We implemented the NCSTRL search service based on the architecture of the Java servlet-based Arc (http://arc.cs.odu.edu; [3]), with an Oracle database in the back end. The architecture is platform independent and can work with any web server. Moreover, only minimal changes are required to work with other relational databases such as MySQL. The search service provides the means to retrieve documents by their metadata. It supports both simple and advanced search, as well as sorting results by archive or by discovery date. Simple search allows users to search free text across archive contents. Advanced search allows users to search in specific metadata fields. Users can also search/browse specific archives and/or archive partitions if they are familiar with particular data providers. Author, title, and abstract search are based on user input, which can include Boolean operators (Figure 2).

Repository Service

The
repository stores the metadata for the documents. Currently, Dublin Core (DC) is used to represent the metadata. The actual documents are stored independently in the providers' archives, and their URLs are given in the metadata records. The metadata fields are stored in an indexed Oracle database that provides fast search across the metadata sets.

[Figure 2: NCSTRL search service]

Harvester Service

When harvesting metadata, we found that some of the archives, such as CNRI (Corporation for National Research Initiatives) and LTRS (NASA Langley Technical Reports Server), were already OAI compliant, which made harvesting and collecting the metadata straightforward. However, most of the archives were not OAI compliant. On these archives, other protocols such as Dienst were available for collecting the metadata and, where available, the actual documents. To provide a historical snapshot of NCSTRL at the time of conversion from the Dienst-based operation, we developed a system for collecting data from these archives, together with transformation and filtering tools. The result was then established as an OAI data provider that is harvested by the Arc-powered search service.

Conclusion and Future Work

We have begun the initial steps of converting the NCSTRL digital library, replacing Dienst with an OAI infrastructure. We have completed the capture and preservation of the content that was embedded within the Dienst installations. We believe the OAI framework of NCSTRL will lead more individuals and institutions to participate in NCSTRL, as well as make for a simpler DL that is easier to maintain. During the first phase of the NCSTRL project, we moved the old NCSTRL collection into a new OAI-based architecture. The more difficult phase, converting the existing publication paradigms used by NCSTRL data providers (for serving their ongoing publication collections), lies ahead. The issues we face can be partitioned into technical and logistic ones. The technical issues include handling
metadata that is richer than DC, providing mirror sites, archiving, metadata normalization, caching, and handling of web crawlers. We have clear ideas on solving these problems; it is a matter of implementing the solutions. The logistic issues involve site management, getting faculty to accept publication tools, finding funding for maintenance, and managing code evolution. These are mostly unexplored and open for comment from the DL community.

References

1. Davis, J. R., & Lagoze, C. (2000). NCSTRL: design and deployment of a globally distributed digital library. Journal of the American Society for Information Science, 51(3), 273-280.
2. Lagoze, C., & Van de Sompel, H. (2001). The Open Archives Initiative: building a low-barrier interoperability framework. Proceedings of the First ACM/IEEE Joint Conference on Digital Libraries (pp. 54-62), Roanoke, VA.
3. Liu, X., Maly, K., Zubair, M., & Nelson, M. L. (2001). Arc - An OAI service provider for digital library federation. D-Lib Magazine, 7(4).
4. Maly, K., Zubair, M., & Liu, X. (2001). Kepler: An OAI data/service provider for the individual. D-Lib Magazine, 7(4).
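The OAI wrapper described under Preserving Existing Collections exposes the extracted Dienst metadata so that a harvester can collect it. The Python sketch below shows one possible core for such a wrapper: turning a single cleaned metadata record into an OAI-PMH <record> that carries unqualified Dublin Core (oai_dc). It assumes OAI-PMH 2.0 namespaces (the 2001-2002 NCSTRL wrapper may well have targeted the earlier OAI 1.1 protocol), assumes the cleaned metadata arrives as a dict mapping DC element names to lists of values, and uses a hypothetical function name, to_oai_record. It is not the actual NCSTRL conversion code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed OAI-PMH 2.0 / oai_dc namespaces; an OAI 1.1 wrapper would use different URIs.
OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The fifteen unqualified Dublin Core elements, in their conventional order.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
]

# Register prefixes so serialized XML reads <dc:title> rather than <ns2:title>.
ET.register_namespace("", OAI)
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_record(oai_id: str, datestamp: str, fields: Dict[str, List[str]]) -> ET.Element:
    """Hypothetical helper: wrap one extracted record as an OAI-PMH <record> with oai_dc metadata."""
    record = ET.Element(f"{{{OAI}}}record")
    header = ET.SubElement(record, f"{{{OAI}}}header")
    ET.SubElement(header, f"{{{OAI}}}identifier").text = oai_id
    ET.SubElement(header, f"{{{OAI}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI}}}metadata")
    dc = ET.SubElement(metadata, f"{{{OAI_DC}}}dc")
    for name in DC_ELEMENTS:
        # DC elements are repeatable: emit one XML element per non-blank value.
        # Keys that are not DC element names are silently ignored.
        for value in fields.get(name, []):
            if value.strip():
                ET.SubElement(dc, f"{{{DC}}}{name}").text = value.strip()
    return record
```

For example, ET.tostring(to_oai_record("oai:ncstrl:example-1", "2002-01-15", {"title": ["A Report"]}), encoding="unicode") produces the XML fragment that a ListRecords or GetRecord response would embed; the identifier and date here are made-up placeholders. Request handling (verbs, resumption tokens, OAI error codes) is deliberately left out of the sketch.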

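The Harvester Service collects metadata from archives that are already OAI compliant, such as CNRI and LTRS, and from the OAI layer placed over the preserved Dienst collection. A minimal Python sketch of the OAI side of that loop, assuming OAI-PMH 2.0 and the oai_dc format, issues a ListRecords request, extracts the Dublin Core fields of each record, and follows resumptionToken values until the provider returns an empty token. The harvest function and its fetch parameter (which lets the HTTP layer be swapped out, for example in tests) are illustrative assumptions rather than the Arc harvester's actual code, and the Dienst-based collection path is not covered.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, List

# Assumed OAI-PMH 2.0 and oai_dc namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def http_get(url: str) -> str:
    """Fetch a URL over HTTP; OAI-PMH responses are UTF-8 encoded XML."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str,
            fetch: Callable[[str], str] = http_get) -> Iterator[Dict[str, List[str]]]:
    """Hypothetical harvester: yield each record's DC fields, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        for record in root.iterfind(".//oai:record", NS):
            dc = record.find(".//oai_dc:dc", NS)
            if dc is None:  # deleted records carry a header but no metadata
                continue
            fields: Dict[str, List[str]] = {}
            for elem in dc:
                # Drop the "{namespace}" prefix, e.g. "{...}title" -> "title".
                name = elem.tag.rsplit("}", 1)[-1]
                fields.setdefault(name, []).append((elem.text or "").strip())
            fields["oai_identifier"] = [
                record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
            ]
            yield fields
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
        if not token:  # missing or empty token: the list is complete
            break
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A call such as list(harvest("http://example.org/oai")) would walk a whole repository, where the base URL is a placeholder. A production harvester would also need incremental harvesting with the from argument, handling of OAI error responses, and respect for HTTP 503 Retry-After flow control.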
