The Gemini Science Archive Prototype: Released !!
November 19, 2003
The prototype of the Gemini Science Archive (GSAp), shown in the picture below, was recently released to the astronomical community and the general public. This is a significant event in Gemini's maturation since it adds an end-point to the development sequence of high-level software systems. Now a Gemini user can seamlessly move from the application process via the Phase I Tool (the PIT), to observation definition using the Observing Tool (the OT), to observation execution using the Observatory Control System (the OCS), through data handling and storage (via the DHS), and finally into a permanent data archive (via the GSA).
The release of the GSAp is an auspicious event and marks the first availability of Gemini science data not just to a project's principal investigator (PI) but to anyone who wants to make use of it. Many ground-based observatories and all space-based telescopes have previously developed data archives (for example, the HST data archive) but few, if any, ground-based archives have been designed from the ground up to be as comprehensive, far-reaching and state-of-the-art as the GSA. In order to fully appreciate the GSA, let's consider the purpose of a science archive and the limitations often imposed by the design process.
A typical ground-based astronomical science archive contains data obtained from that facility over many years of its operation. It is generally accepted that data in a science archive is not made available to the general astronomical community or to the public before the expiration of a fixed period of time, known as the "proprietary period". This allows the researcher who requested the data to be the first to analyze it, extract the scientifically interesting information, and publish the results. The proprietary period varies from facility to facility and, in the case of Gemini, is 18 months from the date of data acquisition. For the length of the proprietary period, therefore, the data is accessible solely to the project PI. However, the same dataset is often very interesting to other astronomers with slightly different research goals. Even if they cannot use it until the proprietary period is over, it is useful for them to know that the data exists, since they may then not have to waste time duplicating the observations elsewhere. Without the release of the data through a science archive, this additional future usage is difficult.
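As a rough illustration of the 18-month rule described above, the following Python sketch computes the date on which an observation would become public. The function names, and the choice to clamp to the last day of a shorter month, are assumptions made for this example; the article does not say how the archive itself handles such edge cases.

```python
import calendar
from datetime import date

# Gemini's proprietary period as stated in the article.
PROPRIETARY_MONTHS = 18


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption for this sketch: if the target month is shorter than the
    starting day (e.g. 31 August + 18 months), clamp to its last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def release_date(acquired: date) -> date:
    """Date on which the proprietary period of an observation ends."""
    return add_months(acquired, PROPRIETARY_MONTHS)


def is_public(acquired: date, today: date) -> bool:
    """True once the proprietary period has expired on `today`."""
    return today >= release_date(acquired)


if __name__ == "__main__":
    acquired = date(2002, 5, 13)
    print(release_date(acquired))
    print(is_public(acquired, date(2003, 11, 13)))
```

Under these assumptions, data acquired on or before May 13, 2002 would be public on November 13, 2003, the date quoted for the example search later in this article.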
Because scientific archives are not always complete enough to allow data from the facility to be used by other researchers, a primary aim of the GSA is to make the data from the Gemini telescopes immediately useful once the proprietary period has expired. A typical science archive developed over the last decade has generally concentrated on giving access to raw data. As is always the case with astronomical data, considerable effort has to be expended to turn raw science data coming straight from the instrument into something scientifically useful (or even a pretty picture). This process is also not always possible when only raw data is available, with little association to the calibration data and other "meta-data" necessary for data reduction. Meta-data is an important concept for science archives: although it is not necessarily scientifically useful by itself, it enables researchers to assess the quality of the science dataset more precisely. Examples of meta-data include the weather maps and conditions during the acquisition period of the observations, the observing sequences executed to acquire the data, and the instrument setup during the acquisition. An example follows of a typical weather map stored in the meta-data database.
A key reason that Gemini has been able to implement these features now is that plans for the development of the GSA were initiated in the late 1990s during the design period of the Gemini telescopes themselves. The Canadian Astronomical Data Center (CADC) in Victoria, BC, was selected to host the GSA and to develop the software required for its operation. Since then, Gemini staff members have worked closely with CADC staff to produce a design that makes this a world-class archive. The release of the prototype GSA marks both the start of the operational phase of the GSA at CADC and the incremental release of the GSA and its many advanced features to the community.
Even at this early stage, the GSA is a considerable improvement over many existing science archives. The prototype will lead, within the next year, to the "basic archive," which will be fully functional for the searching and retrieval of data, and will only lack some of the more advanced features that will make future versions of the GSA an extremely powerful tool for data mining.
Using the prototype requires users to have a CADC account, as is the case for all CADC-hosted archives. Accounts are given freely to everyone upon completion of a simple registration form on the CADC homepage. The interface of the GSA prototype is web-based and uses a number of forms (see the above picture) to search for data based upon a variety of observation parameters. For example, one could search for all data taken with Gemini's facility near-infrared imager (NIRI) between the wavelengths of 2 and 3 microns on sources between Right Ascension 9 and 10 hours and Declination 60 through 70 degrees. Such a search currently produces 12 observations available from Gemini whose proprietary periods have expired as of November 13th, 2003 (see example below). The same search can be performed for all data, even those whose proprietary period has not yet expired. This latter feature is important since it allows researchers to find all data that has been taken and determine when it will be publicly available. However, only data whose proprietary period has expired will be available for download.
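The logic of the example search above can be sketched in a few lines of Python. This is only an illustration of the selection criteria (instrument, wavelength range, coordinate box, proprietary status) applied to an in-memory list. It is not the GSA's actual interface or software, and the record fields, the `search` function and the sample records are all invented for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    # Hypothetical record layout; the real archive schema is not described here.
    name: str
    instrument: str
    wavelength_um: float   # central wavelength in microns
    ra_hours: float        # Right Ascension in hours
    dec_deg: float         # Declination in degrees
    release_date: date     # end of the proprietary period


def search(records, instrument, wavelength_um, ra_hours, dec_deg,
           as_of, include_proprietary=False):
    """Return records matching the instrument and falling inside each range.

    Ranges are inclusive (low, high) tuples. Proprietary records are only
    listed when include_proprietary is True.
    """
    matches = []
    for obs in records:
        if obs.instrument != instrument:
            continue
        if not wavelength_um[0] <= obs.wavelength_um <= wavelength_um[1]:
            continue
        if not ra_hours[0] <= obs.ra_hours <= ra_hours[1]:
            continue
        if not dec_deg[0] <= obs.dec_deg <= dec_deg[1]:
            continue
        if not include_proprietary and obs.release_date > as_of:
            continue
        matches.append(obs)
    return matches


# Invented sample records, for illustration only.
SAMPLE = [
    Observation("obs-a", "NIRI", 2.2, 9.5, 65.0, date(2003, 6, 1)),
    Observation("obs-b", "NIRI", 2.2, 9.7, 62.0, date(2004, 9, 1)),
    Observation("obs-c", "NIRI", 1.25, 9.5, 65.0, date(2003, 6, 1)),
    Observation("obs-d", "GMOS", 0.6, 9.5, 65.0, date(2003, 6, 1)),
]

if __name__ == "__main__":
    criteria = dict(instrument="NIRI", wavelength_um=(2.0, 3.0),
                    ra_hours=(9.0, 10.0), dec_deg=(60.0, 70.0),
                    as_of=date(2003, 11, 13))
    for obs in search(SAMPLE, include_proprietary=True, **criteria):
        status = "public" if obs.release_date <= criteria["as_of"] else "proprietary"
        print(obs.name, status, obs.release_date)
```

Listing proprietary records together with their release dates mirrors the behavior described above: a researcher can see that the data exists and when it becomes available, while only the public records would be offered for download.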
Another very useful feature of the results page is the set of hyperlinks that take you either to all the data acquired on the same night as a specific entry or to a view of the calibration data acquired during that night. This enables off-line processing of raw data to take place with the correct and most appropriate calibration files.
The next step in the development of the GSA will be the release of the complete basic archive, which is anticipated during the first half of 2004. The complete basic archive will contain, among other features, links to appropriate meta-data including weather information, links to the science programs under which the data was taken, augmented FITS header information, data-set catalogs of related observations and archived On-Line Data Processing (OLDP) results for instant preview.
We are currently working on an additional contract with CADC to implement advanced capabilities for the GSA that will allow the execution of more complex tasks and assure compatibility of the GSA with the Virtual Observatory (VO) world. Advanced capabilities include on-the-fly reprocessing of raw data to provide fully reduced datasets for download, cross-referencing archive content with other GSA content as well as with other archives (e.g., the HST archive), and advanced catalog creation (e.g., source catalogs with fluxes cross-referenced to other similar catalogs, perhaps containing fluxes at other wavelengths).
We hope you are convinced that the release of the GSA prototype is a major step in the development of the Gemini Observatory and of the usefulness of archives in general. You are welcome to try out the GSAp and make use of the growing archive of Gemini data!
The Gemini Science Archive prototype is powered by software developed by CONICYT and CADC, and contains data and meta-data provided by the Gemini Observatory.