PARADISEC background, current structures, and thoughts on international collaborations

PARADISEC structure

PARADISEC rationale

  • prioritises Asia-Pacific region materials not otherwise catered for;

  • provides a rational framework for prioritising and managing University research recordings using international archival formats and standards;

  • implements IP arrangements tailored to University needs and practices;

  • involves researchers in specialist description of resources;

  • streamlines consortium processes to salvage important recordings and make them available for research in a timely and cost-effective way

Research applications

  • Making Australian research available internationally

  • Fieldwork - use for elicitation and documentation, and for language learning in preparation for fieldwork

  • Return of materials to communities

  • Digital tools for optimal transcription and analysis

  • Comparative studies - historical recordings give time depth for area language and music studies

  • Better understanding of diversity - data from some languages only in older recordings

  • Incorporation of primary data in presentations and, ultimately, publications

Staged approach

  • Metadata - 1623 records, to make resources discoverable even if not yet digitised

  • Persistent identifiers (PIs) and content metadata need to be assigned before digitisation (some refinement during the process)

  • Repository - 807 items digitised to date, some complex e.g. fieldnotes (page images) or transcripts accompanying tapes

Metadata November 2004

  • 1623 records in the metadata repository with data from 24 countries in Asia-Pacific (Australia, Chile, Cook Islands, Fiji, French Polynesia, Hong Kong, Indonesia, India, Japan, Korea, Lao, Malaysia, Federated States of Micronesia, Myanmar (Burma), New Zealand, Palau, Papua New Guinea, Reunion, Singapore, Solomon Islands, Taiwan, Tonga, Vanuatu, Vietnam)

Metadata OLAC harvest

Repository contents

  • Repository totals 26 November 2004

    • total files: 2582
    • total items: 807
    • total size: 1.0TB
    • total hours audio: 627.3 hours
    • file types: .wav, .mp3 (1040); .tif (179), .jpg (46), .pdf (34), .txt (3), .rtf (8), .xml (32)

Repository Collections

Regional links

  • Institute of Papua New Guinea Studies

  • Vanuatu Kaljoral Senta

  • Archive of Maori and Pacific Music, U. Auckland

  • University of Hawai’i

  • New Caledonia - Tjibaou Cultural Centre

  • Indonesia - UIN, Jakarta

  • Malaysia - Universiti Malaya

  • Rapa Nui - Museo Antropológico P. Sebastián Englert

  • Micronesia - Historical Preservation Office, Yap

Audio Ingest

  • Initially ingested as raw WAV on AudioCube 5 Dell 670 workstations running WaveLab (2005 will add remote Pyramix workstations)

    • Masters: 24-bit, 96 kHz Broadcast Wave Format (BWF; uncompressed audio with encapsulated metadata)
    • Some at lower rates if the original is digital (e.g. 16-bit, 48 kHz from DAT)
  • WAV converted to BWF by Quadriga software

    • Derivatives produced by batch processing - CD-audio quality (16-bit, 44.1 kHz) and mp3 quality (128 kbps)
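
The storage cost of these formats can be estimated from the sample rate, bit depth and channel count. The sketch below is illustrative only: it assumes stereo (two-channel) recordings, which the slides do not state, takes the mp3 rate as 128 kbps, and ignores file headers and embedded metadata.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size in bytes of one hour of uncompressed PCM audio (headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


def mp3_bytes_per_hour(bitrate_kbps: int) -> int:
    """Size in bytes of one hour of constant-bitrate mp3 audio."""
    return bitrate_kbps * 1000 // 8 * 3600


# Assumption: 2 channels (stereo). The slides give only bit depth and sample rate.
master = pcm_bytes_per_hour(96_000, 24, 2)
cd_quality = pcm_bytes_per_hour(44_100, 16, 2)
mp3 = mp3_bytes_per_hour(128)

for label, size in [("master", master), ("CD-quality", cd_quality), ("mp3", mp3)]:
    print(f"{label}: {size / 1e9:.2f} GB per hour")
```

Under these assumptions a 24-bit, 96 kHz master takes roughly 2.07 GB per hour, a little over three times the CD-quality derivative.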

Digital preservation

  • “Azoulay” server with a working-files partition and an archive partition for sealed masters - current capacity 750GB (>3TB in 2005)

  • Sealed masters archived to 100GB data tapes on University of Sydney LTO Mass Data Storage System (high-low watermark script) - duplicate data tapes kept at 2 locations on campus

  • Sealed masters mirrored to APAC national Store facility (Canberra) nightly - nearline storage

  • Password-protected online access to Store facility
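
The slides do not describe how the high-low watermark script works. As a rough sketch of the general technique: nothing is migrated until disk usage passes a high threshold, and then files are moved to tape until usage drops to a low threshold. The thresholds (90% and 70%), the oldest-first ordering and the `StoredFile` structure below are all hypothetical.

```python
from typing import NamedTuple


class StoredFile(NamedTuple):
    name: str
    size_gb: float
    age_days: int  # days since the file was last accessed


def select_for_migration(files, capacity_gb, high=0.90, low=0.70):
    """Return names of files to move to tape, oldest first.

    Nothing is selected until usage exceeds the high watermark. Once it
    does, files are selected until usage would fall to the low watermark.
    """
    used = sum(f.size_gb for f in files)
    if used <= high * capacity_gb:
        return []
    selected = []
    for f in sorted(files, key=lambda f: f.age_days, reverse=True):
        if used <= low * capacity_gb:
            break
        selected.append(f.name)
        used -= f.size_gb
    return selected


# Placeholder files on a 750 GB disk (the server capacity quoted above).
files = [
    StoredFile("a", 300, 400),
    StoredFile("b", 250, 30),
    StoredFile("c", 150, 200),
]
to_tape = select_for_migration(files, capacity_gb=750)
print(to_tape)
```

Using two thresholds rather than one avoids triggering a migration every time a single new file arrives on a nearly full disk.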

PDSC data flow


  • Main campuses (University of Sydney, University of Melbourne, Australian National University) connected by GrangeNet (next-generation research network, 10 Gbps connections)

    • Pay subscription, not traffic costs
  • Satellite campus UNE connected by AARNet (Australian research and education network - currently billed by traffic, 155 Mbps connection)

  • Both with connections to the APAN community (Asia-Pacific Advanced Network) - potential for linking to regional and international R&E networks - potential traffic costs an issue
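
To give a sense of scale for the two link speeds, the sketch below estimates how long it would take to move 1.0TB (the repository total reported for November 2004) at each nominal rate. It assumes the full link is available and ignores protocol overhead, so real transfers would be slower.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to move size_tb terabytes at a link's nominal rate."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


aarnet_hours = transfer_hours(1.0, 155)         # 155 Mbps connection
grangenet_hours = transfer_hours(1.0, 10_000)   # 10 Gbps connection

print(f"155 Mbps: {aarnet_hours:.1f} hours")
print(f"10 Gbps: {grangenet_hours * 60:.1f} minutes")
```

At nominal rates the 155 Mbps link needs a little over 14 hours for 1.0TB, against about 13 minutes on a 10 Gbps link.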


  • Australian Partnership for Advanced Computing National Facility Mass Data Storage System - Hierarchical Storage Manager system

  • Tape robot system - can handle 1.2PB

    • PARADISEC will add 2-3TB per year once satellite ingest commissioned
    • Current horizon of facility is 2008 - PARADISEC collection projected to reach up to 9TB by then
    • Will need to apply to host material/share data from other DELAMAN collections
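
The growth figures can be checked with simple arithmetic. The sketch below starts from the 1.0TB held in November 2004 and applies the stated 2-3TB per year. The number of years of full-rate ingest before 2008 is an assumption (three is used here), as is the conversion 1PB = 1000TB.

```python
def projected_range_tb(start_tb, low_rate, high_rate, years):
    """Return (low, high) collection size in TB after `years` of ingest."""
    return (start_tb + low_rate * years, start_tb + high_rate * years)


# Assumption: three years of ingest at the full 2-3 TB/year rate.
low, high = projected_range_tb(1.0, 2.0, 3.0, 3)
print(f"Projected size: {low:.1f} to {high:.1f} TB")

# Share of the 1.2 PB tape robot capacity, taking 1 PB = 1000 TB.
share = high / 1200
print(f"Share of tape capacity at the high estimate: {share:.2%}")
```

With these inputs the range is 7 to 10TB, which brackets the 9TB figure quoted above and stays under 1% of the tape system's stated capacity.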


  • GrangeNet streaming server currently in trial mode - only available within network

  • Soon to have automatic copying of main collection to streaming server

  • Foresee higher demand for access when scaled streaming access to excerpts available; but also greater resources needed to mount and manage

    • Will depend on researchers’ provision of timecoded transcripts/glosses
    • Access and authentication protocols yet to be developed
    • Testbed for citation/integration into e-publications


  • Initial metadata database in FileMaker Pro 6 with periodic XML dumps for OLAC static harvesting

  • Currently being ported to MySQL/PHP to allow dynamic harvesting and other functionality

  • Python software for managing repository and website (Stuart Hungerford, ANU)

  • Developing Java-based geographic search interface (TimeMap)

  • All based on Open Source tools
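
A periodic XML dump for static harvesting amounts to serialising catalogue records into a file that a harvester can fetch. The sketch below shows the idea with Dublin Core elements, on which OLAC metadata is based. It does not reproduce the actual OLAC schema or the PARADISEC export: the `records` and `record` wrapper elements and the sample values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)


def record_to_xml(record):
    """Build one <record> element with a Dublin Core child per field."""
    elem = ET.Element("record")  # hypothetical wrapper element
    for field, value in record.items():
        child = ET.SubElement(elem, f"{{{DC_NS}}}{field}")
        child.text = value
    return elem


def dump_records(records):
    """Serialise a list of records (dicts) to one XML string."""
    root = ET.Element("records")  # hypothetical root element
    for record in records:
        root.append(record_to_xml(record))
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not taken from the PARADISEC catalogue.
sample = [
    {
        "identifier": "EXAMPLE-001",
        "title": "Example field recording",
        "format": "audio/x-wav",
    }
]
xml_text = dump_records(sample)
print(xml_text)
```

A static dump like this has to be regenerated whenever the catalogue changes, which is one reason a database-backed service that answers harvesting requests directly is attractive.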


  • Implementations will change over time - foundation for cooperation must be agreements and alignment of strategic objectives

    • Minimal shared standards needed on formats, ethics, description, rights - what else?
  • Possibility of staged modular approach

    • federated discovery platform
    • proof-of-concept pilot studies/trials
      • targeted data sets for exchange
      • dark hosting/mirroring
      • tools development and testing


  • Transnational projects - how to identify and coordinate international funding opportunities?

  • Projections of international traffic & storage charges - funding implications

  • Sustainability of our collections - how to cost overheads and source long-term funding commitments

  • DELAMAN governance and administration structures? How to resource and support them without duplication, reinventing the wheel, or adding to the administrative burden?

  • How to involve all stakeholders (including local/national bodies of originating communities)?

APAN Bangkok 2005

  • E-science workshop: Toward a semantic web for digital data archives (convenor V. Balaji, Princeton)

  • Immense quantities of digital data and images are now archived and publicly available through the web. These include domain-specific data archives, covering such domains as weather and climate, seismology and geophysics, astronomy and particle physics, as well as images and digital copies of non-textual human cultural production. Describing, cataloguing, searching and locating information within digital data and image archives is one of the grand technological challenges of the semantic web era. This session will draw together participants from diverse fields of science and the humanities to share their experience on metadata, standards and techniques for access to large digital archives.  

  • Tentative Titles of presentations:

    • 1) The Hierarchical Data Format for EOS (HDF-EOS), Richard Ullman, NASA Goddard Space Flight Center (Invited)
    • 2) Metadata Requirements for Global Climate Models, V. Balaji, NOAA Geophysical Fluid Dynamics Laboratory 
    • 3) DELAMAN?? Remote presentation…

PARADISEC gratefully acknowledges support from:

  • Partner Universities (Sydney, Melbourne, ANU, UNE)

  • Australian Research Council LIEF scheme

  • Australian Partnership for Sustainable Repositories (SORRT testbed)

  • Australian Partnership for Advanced Computing

  • GrangeNet

  • ANU Internet Futures

Contact us


  • (Director)

  • (Project Manager)

Relevant URLs

  • PARADISEC website

  • PARADISEC repository login

  • PARADISEC streaming trial

  • Transcript page image trial

  • TimeMap digitiser tool proof of concept
