Summary of 1998 Science Archive Workshop Presentations

Section 1: Gemini Background

Science Rationale

Jean-Rene Roy presented the science rationale for a Gemini Archive. He outlined the three categories of research opportunities: 1) old data, new science, 2) old data, new technique, and 3) collective effects of data sets from multiple runs. He argued that the motivation for a Gemini science archive is 1) increased quality & quantity of Gemini science, 2) added value to the partnership since all partners have effective access to 100% of the telescope time, 3) increased ability to characterize and monitor the instruments, 4) improvement in education and public outreach, and 5) preservation of scientifically valuable data.

Overview of Planned Gemini Capabilities I

Phil Puxley presented an overview of planned Gemini capabilities. He covered the rationale and context for queue observing, the observing infrastructure (e.g. observing tool, phase I and phase II concepts, atmospheric monitoring), the initial instrument complement, and the Gemini Engineering Archive (GEA). He also discussed proprietary data rights, data rates, and calibration issues.

It was pointed out that Gemini may be overlooking atmospheric science applications for the archive. For instance, IPAC has had a number of requests for atmospheric data derived from their 2MASS project.

Overview of Planned Gemini Capabilities II

Kim Gillies gave a technical overview of the Gemini Engineering Archive. He motivated the engineering archive by noting that it is necessary for commissioning the telescopes and instruments and for monitoring their health. He also noted that the engineering archive and on-site operations are closely integrated.

It was asked whether the Gemini North and South engineering archives would be unified. The answer was no: they would have to be queried independently. A discussion ensued in which it was agreed that the GEA had to feed the science archive, rather than have the science archive make requests of the GEA.

The availability of engineering data as the GEA migrates to storage was discussed, and it was generally agreed that some form of data product, e.g. paper graphs, ought to be compiled for some of the most useful data.

Section 2: Perspectives on Capabilities, Options, Requirements and Specifications for an Effective Science Archive

Canadian Astronomy Data Centre (CADC)

David Schade presented the CADC perspective on requirements for a Gemini science archive. He broke down the requirements into fundamental requirements and advanced capabilities. Many of the fundamental requirements are already being built into the Gemini observing system. David concluded that an effective Gemini science archive is viable, that it must be viewed as part of the operations environment, and that the GEA is its foundation.

Daniel Durand continued with a more technical discussion by breaking the requirements into those which were basic, scientific, operational, or policy oriented. These included the topics of data storage, catalogue access, proposal and observation information, auxiliary information, instrument and telescope information, calibration, data distribution, catalogue content, search mechanisms, data retrieval, costs, and planned evolution.

The question arose whether CADC’s model implies that PIs can only recover their data from the archive. A long discussion ensued, and it was generally agreed that there were many advantages to having PIs retrieve their data from the archive and that this could be encouraged with calibration, processing, and distribution aids, but that classical observers would be able to take home their data on the then-supported media if they so chose.

Calibration was also a topic of discussion. The CADC perspective was that all calibration files be immediately in the public domain. It was pointed out that the science observations themselves are often part of the calibration (e.g. sky flats) so the subject can become tricky. There was general agreement that obvious calibration files should not have a proprietary period.

Space Telescope Science Institute (STScI)

Marc Postman presented the STScI archive perspective and argued that the mission of HST has been greatly enhanced by their archive. He pointed out that HST archives a similar data volume to that expected at Gemini, and that many times more data are recovered from the archive than ingested. He discussed the HST view on data retrieval times and the general need to plan for regular equipment upgrades and archival medium migrations. He also discussed the need for complete and reliable header information, database links between all relevant quality assessment information, reliable calibration data, links to remote archives, media options and costs, and costs and speeds of electronic data distribution.

There was a general discussion of media migration costs and how these need to be built into the overall budget, although each media migration usually saves money within a period of less than 5 years.

Infrared Processing and Analysis Center (IPAC)

Roc Cutri presented the IPAC archive perspective with 2MASS as his case study. He noted that internal connectivity (e.g. linking spectra with their images) and external connectivity (e.g. interfaces to other archives) were very important. He noted that the archive should be scalable, that we should understand the limitations of visitor instruments or non-standard calibration procedures, that the archive should allow various means of visualizing the data, that the user load needs to be anticipated, and that the processing pipeline requires rigorous quality assurance. He also expressed the view that archive budgets are always underscoped.

There were further discussions on how all major archives should be interconnected.

Joint Astronomy Centre (JAC)

Remo Tilanus presented the JAC archive perspective. He discussed some useful data visualization tools, problems with naive uses of archive data, how raw data may eventually be of no use when an instrument or technique is outdated, and how careful final calibration may be necessary to preserve such data.

Discussion followed on the topic of naive uses of archival data. There were mixed views on whether it is necessary to guard against them.

Australia

Michael Drinkwater presented archive suggestions from the Australian perspective. He mentioned that the Australians are particularly concerned about the speed of their internet access and were interested in possible work-arounds such as cache-sites. He also discussed the need for complete headers, observing and weather logs, links to proposals, and the need for an easy-to-use web interface. He suggested reduced data be part of the archive.

Discussions of headers and the nature of reduced data followed. Both of these topics had been discussed throughout the day and there was general agreement that good header information is important, although header information often necessarily evolves as the instrument is better understood. There was less agreement on what reduced data should be in the archive, or even what reduced data meant.

Section 3: Discussion

Following the talks a general discussion ensued in which we tried to determine prioritized specifications and requirements for the Gemini Science Archive. The Archive Requirements I & II presented by David Schade were used as a starting point. We first isolated those items that Gemini will be providing irrespective of a Gemini science archive, those items that are necessarily closely collaborative between Gemini and the Archive, and those items that are largely the responsibilities of the Archive. Following this we agreed on elements in the three categories of Fundamental Requirements, Key Capabilities, and Advanced Capabilities (see list in the Summary, below).

A number of additional subjects were also discussed during this period. Perhaps the most important and difficult topic was whether a standard calibration would be imposed on classical observers. Although Phil Puxley noted that such a standard calibration was not currently in the Gemini plan, nearly everyone present strongly agreed that it should be. Participants noted that it would be more difficult to impose a standard calibration once operations commence, that observers often under-calibrate, and that approximately 50% of the Gemini Science Archive would be at risk if standard calibrations were not imposed on all programs. It was agreed that the standard calibrations would depend on the instrument mode, would be the same as those used in queue observing, and would typically impose an overhead of no more than 5 to 10%. It was agreed that this overhead should be built into the phase I tools, although enough flexibility should be incorporated into the process that PIs with a compelling scientific case could drop the standard calibration procedure.
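As a purely illustrative sketch of the arithmetic, and not a description of any actual Gemini phase I tool, the effect of such an overhead on a time request might look like the following. The function name, its parameters, and the waiver flag are hypothetical. Only the 5 to 10% range and the idea that a PI with a compelling case could drop the standard calibration come from the discussion above.

```python
def requested_time_hours(science_hours, overhead_fraction=0.10,
                         waive_standard_calibration=False):
    """Return the total time to request, in hours.

    Hypothetical helper: 'overhead_fraction' is the standard-calibration
    overhead (the workshop discussed typical values of 0.05 to 0.10), and
    'waive_standard_calibration' models a PI with an approved scientific
    case for dropping the standard calibration.
    """
    if science_hours < 0:
        raise ValueError("science_hours must be non-negative")
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0 and 1")
    if waive_standard_calibration:
        # No calibration overhead is added when the standard set is waived.
        return float(science_hours)
    # Otherwise scale the science time by the calibration overhead.
    return science_hours * (1.0 + overhead_fraction)


if __name__ == "__main__":
    for fraction in (0.05, 0.10):
        total = requested_time_hours(20.0, overhead_fraction=fraction)
        print(f"20.0 h of science at {fraction:.0%} overhead: {total:.2f} h")
```

Under these assumptions, a 20-hour science program would request between 21 and 22 hours in total, depending on where in the 5 to 10% range the instrument mode falls.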

All present felt that an effective Gemini science archive was technically and scientifically viable, and that this was due in large part to the progressive planning and operations inherent in the Gemini queue, the phase I & II observation planning process, the engineering archive, and the data processing pipelines. The workshop participants also prioritized the requirements for a Gemini science archive into three levels: Fundamental Requirements, Key Capabilities, and Advanced Capabilities.

Attendees:

Gemini: Kim Gillies, Fred Gillett, Phil Puxley, Ted von Hippel, Steve Wampler

Committee of Gemini Offices: Gary Da Costa (Australian Project Scientist), Hugo Levato (Argentinian Project Manager), Pat Roche (UK Project Scientist), Jean-Rene Roy (Canadian Project Scientist), Maria Teresa Ruiz (Chilean Project Scientist), Adrian Russell (UK Project Manager), Thaisa Storchi Bergmann (Brazilian Project Manager), Andrew Woodsworth (Canadian Project Manager)

Invitees: Roc Cutri (IPAC), John Davies (JAC), Megan Donahue (STScI), Michael Drinkwater (Australia), Daniel Durand (HIA), Marc Postman (STScI), David Schade (HIA), Remo Tilanus (JAC), Gillian Wright (UKATC)