Downloading and Understanding the Data

Content owned by bill.vacca

1.1 New data notification Emails

We send automatic emails the morning after observing to notify PIs and/or Co-Is that new data were taken for their program. This service is provided by our internal data handling systems. The list of notification addresses is refreshed daily from the Observing Database. To add recipients, enter their email addresses in the "PI / PC Email" field at the top level of your Gemini Science Program in the OT and store your program to the database.

Email notifications for data taken during the night go out at 8:00 am on the morning following the observing night. The email contains a summary table of all the relevant data files.

These Emails are triggered purely on the presence of science data files from night time observing. These data will not generally have been Quality Assessed, or their metadata validated, at the time the Email notifications go out. Data Quality and FITS header information in these files will be updated as QA proceeds. Data should be retrieved from the Gemini Observatory Archive in the normal manner.

1.2 Data Quality Assessment Process

Gemini Data Quality Assessment

Gemini science and calibration data go through a quality assessment (QA) procedure. This is a two-step process, with real-time QA performed by the observer at the summit followed by off-line QA done at the relevant base facility by the data analyst (DA) on duty, usually the day after the observations are taken. The QA determines whether the files can be passed or need to be repeated.

For all queue observations, the PI will have specified four observing condition constraints: image quality, cloud cover, water vapour content and sky background. Each constraint is expressed as one of several percentile bins. By comparing the requested conditions with the actual conditions at the time of the observation, the QA process identifies whether the PI's criteria were met.

Unchecked Data

As of semester 2013A, we are no longer able to review all the data. All band 1 data will be checked, however, as well as any programs deemed to be high priority by the QC (up to 30% of the night's data in total).  Other programs, including band 4 and classical programs, are not checked, and may be left with their QA state set to UNDEFINED if the night-time observer was unable to review them in real-time. The DA will however monitor the automatic ingestion of files into the archive.

If you are a Gemini PI and you find issues with your data (e.g. the data were left unchecked but did not meet requirements, or they were checked but you disagree with the quality assessment), please follow the usual route for requesting a repeat observation, outlined below, provided we are still in the same semester and the target is still reachable:

  • Contact the head of science operations for the Gemini telescope in question.
  • Please supply information on what does not meet the requirements - be sure to include the program and observation ID.
  • CC your email to the contact scientists.

Real-time Data QA

The observer continuously monitors the observing conditions and chooses observations from the various observing plans provided by the queue coordinator (QC). For example, if the observer starts the night with a program requiring IQ20 conditions (the best seeing), they will keep monitoring the seeing and, if conditions deteriorate, stop the ongoing program and switch to the next available observation in one of the IQ70 (or other appropriate) plans. The same is done for cloud cover and water vapour. The sky background is calculated by the queue planning software and taken into account when the plans are prepared, so it is only checked by the observer when deviating from the plan.

The observer will also check nighttime data, including calibrations such as flats, arcs and standard stars, for saturation, obvious program setup errors, telescope/instrument problems and other possible issues. Other "sanity checks" are also performed. For instance, if the IR spectrum of a high-redshift galaxy requiring a blind offset acquisition is not detected in individual sky-subtracted pairs, this will not necessarily raise any red flags. If a faint spectrum is seen for a telluric standard, though, the observer may choose to troubleshoot, leave a note in the nightlog requesting daytime followup, or abandon the observation and move to one using a different instrument, depending on the circumstances.

At the end of the night the observer will queue the requested daytime calibrations as defined in the observing tool (OT).

Off-line Data Processing

While the observer will have made every effort to ensure that good data were taken during the night, this is done in parallel with acquiring targets, anticipating and reacting to the weather, and sometimes dealing with faults, in the middle of the night and (on Mauna Kea) at >4000 m above sea level. The daytime DA on duty is specifically focused on data checking and has the benefit of knowing the evolution of the observing conditions during the night.

The off-line data QA is split into three parts: 

  • Checking that the observing condition constraints were met
  • Looking for issues such as telescope/instrument problems, incorrect observational setups, standard stars with low counts, etc.
  • Checking that the necessary calibrations were obtained, with the correct setup

Some of the tools used can be seen below. Figure 1 shows seqplot, a quick-look tool that lets the observer or DA rapidly view a sequence of data frames, as well as check the most important header keywords, saturation levels, etc.

Figure 1: The "seqplot" quick-look tool displaying GMOS-N longslit data.

Gemini's QAP (Quality Assessment Pipeline) can be set to run automatically at night, immediately reducing new images as soon as they are written to disk. It outputs its results to a web-based GUI (as shown in figure 2), enabling the user to quickly access the measured seeing, cloud cover and sky brightness values for suitable data.

Figure 2: The Gemini Quality Assessment Pipeline (QAP)'s GUI, showing measured seeing, extinction, and sky brightness values.

view_wfs is another tool to help assess the data quality. It displays the guide counts as well as the wavefront sensor's seeing estimates over any desired range of frames.  It is useful for extrapolating seeing values (since we don't always take imaging data) and for checking for the existence of clouds.  Figure 3 shows an example from a rather cloudy night, using GMOS and its on-instrument wavefront sensor (OIWFS) for longslit spectroscopy. The gradual drop in counts was caused by thin - and subsequently, thick - cirrus moving through the telescope's field of view. The various colored lines represent the extinction corresponding to the CC bins.

Figure 3: The view_wfs tool displays the guide counts and seeing estimates for a specified range of files.
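
The link between a drop in guide counts and extinction can be sketched with the standard magnitude relation. This is only an illustration and not how view_wfs itself is implemented: the function name and the clear-sky reference count are assumptions, and in practice the reference level would have to come from a photometric stretch on the same guide star.

```python
import math

def extinction_mag(counts, clear_counts):
    """Extinction in magnitudes implied by a drop in guide counts.

    clear_counts is an assumed reference level for the same guide star
    under photometric conditions (a hypothetical input, not a view_wfs value).
    """
    if counts <= 0 or clear_counts <= 0:
        raise ValueError("counts must be positive")
    return -2.5 * math.log10(counts / clear_counts)

# Guide counts falling to half of the clear-sky level:
print(round(extinction_mag(5000, 10000), 2))  # 0.75
```

A value computed this way could then be compared against the extinction limits of the CC bins, which are not listed here.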

The DA will leave notes in the Observing Log section in the OT about any unusual issues and include their actual measurements of the IQ if applicable. They will then set the QA flag in the OT to PASS, USABLE or FAIL. FAIL is quite unusual and is only applied if the file is not at all useful, for example a badly saturated flat field. FAILed data are not ingested into the archive.

Data flagged as USABLE generally do not meet the PI's requirements. Time for these files is not charged to the program or partner country in the OT. For this reason, the USABLE flag is sometimes used for time accounting purposes. For instance, if an observer repeated a standard star observation with a longer exposure time to increase the counts, but this was not in fact necessary, the "extra" files might be given a QA state of USABLE. In these cases an explanatory note will usually be left in the OT.

The DA will report any issues found during data checking to the QC. The QC may work together with the DA and the program's Contact Scientist(s) to make a decision. They will sometimes contact the PI to ask about data taken in borderline conditions and tentatively set to PASS, or for advice on other issues with the data, instrument configuration, instructions, etc.

In addition to the entries in the OT, the QA state is reflected in the FITS headers by the following keywords:

REQIQ   = '85-percentile'      / Requested Image Quality
REQCC   = '50-percentile'      / Requested Cloud Cover
REQBG   = 'Any     '           / Requested Background
REQWV   = 'Any     '           / Requested Water Vapour

RAWIQ   = '70-percentile'      / Raw Image Quality
RAWCC   = '50-percentile'      / Raw Cloud Cover
RAWWV   = 'Any     '           / Raw Water Vapour/Transparency
RAWBG   = 'Any     '           / Raw Background

RAWPIREQ= 'YES     '           / PI Requirements Met
RAWGEMQA= 'USABLE  '           / Gemini Quality Assessment

The RAWIQ, RAWCC, RAWWV, and RAWBG keywords reflect the actual observing conditions as set by the observer and checked, where possible, by the DA. The RAWPIREQ keyword, with values of YES|NO|UNKNOWN, shows whether the PI-specified observing conditions were all met (YES) or at least one condition was worse than requested (NO). This keyword is set to UNKNOWN by default, and is left at this value for unchecked programs.

RAWGEMQA, with a value of BAD|USABLE|UNKNOWN, is a more globally applicable parameter. It is set to USABLE if the data are useful in any way. For example, a field observed in poorer seeing than requested by the PI may still be useful for future users of the archive. A completely saturated flat field, on the other hand, would be set to BAD.
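
The comparison behind RAWPIREQ can be sketched as below. This is a minimal illustration, not Gemini's own QA code: it assumes the header values follow the 'NN-percentile' / 'Any' pattern shown above, treats 'Any' as the 100th percentile, and uses a plain dictionary in place of a real FITS header. The function names are hypothetical.

```python
CONDITIONS = ("IQ", "CC", "WV", "BG")

def percentile(value):
    """Convert '85-percentile' or 'Any' to a number; return None if unparseable."""
    text = str(value).strip().lower()
    if text == "any":
        return 100  # assumption: 'Any' behaves like the 100th percentile
    if text.endswith("-percentile"):
        return int(text.split("-")[0])
    return None

def pi_requirements_met(header):
    """Return 'YES', 'NO' or 'UNKNOWN' by comparing REQxx with RAWxx values."""
    met = True
    for cond in CONDITIONS:
        requested = percentile(header.get("REQ" + cond, ""))
        actual = percentile(header.get("RAW" + cond, ""))
        if requested is None or actual is None:
            return "UNKNOWN"
        if actual > requested:  # actual conditions worse than requested
            met = False
    return "YES" if met else "NO"

# Values copied from the example header above.
header = {
    "REQIQ": "85-percentile", "REQCC": "50-percentile",
    "REQBG": "Any     ", "REQWV": "Any     ",
    "RAWIQ": "70-percentile", "RAWCC": "50-percentile",
    "RAWWV": "Any     ", "RAWBG": "Any     ",
}
print(pi_requirements_met(header))  # YES
```

The value stored in RAWPIREQ is set by Gemini staff during QA, so a recomputed result like this is only a cross-check, and it is meaningless for unchecked data whose keywords are still at their defaults.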

This table shows how the QA-related keywords in the headers relate to the QA states in the OT:


Repeating Observations 

Observations will be scheduled to be repeated if there was a technical problem, the observing conditions did not meet the PI's constraints, or if a significant error was made by Gemini staff. When an observation is scheduled to be repeated, the data from the original observation will be distributed to the PI and to the science archive with the normal proprietary period. The original observation does not count towards the PI's time allocation, and the re-scheduled observation will have the same weighting in the queue as the original observation. An observation that was defined incorrectly by the PI or included insufficient information to execute the program as desired by the PI (inadequate finding charts, for example) may be repeated, but the time will usually be charged to the program.

1.3 Gemini Observatory Archive

The Gemini Observatory Archive (GOA) is now the primary conduit for obtaining data from Gemini Observatory. The Help and About pages, linked at the top of the archive page, give more information.

The GOA provides a simple yet powerful interface to search for and download science data, calibrations and observation logs both for Gemini program PIs and Co-Is, and also for people searching for Gemini data by observation details such as instrument configuration, sky coordinates, observing date etc. Paul Hirst is the GOA project lead.

Data Flow and Distribution

All Gemini data distribution to users is via the archive web interface. Data are transferred to the archive automatically during observing, and each file is typically available for download from the archive within a minute or two of the observation completing. Data Quality Assessment proceeds over the following days and the archive is updated if QA states or other metadata are updated.

The archive will automatically find all the calibration data associated with search results, and presents this in the associated calibrations tab in the search form. Additionally, it's possible to search for calibration data directly in the search form simply by setting the search fields appropriately.

The archive contains mostly raw data. Some data is processed by the observatory and this processed data is uploaded to the archive. Notably, this includes GMOS bias frames and imaging twilight flat fields, along with MOS preimaging data. Other processed data is produced and archived on a somewhat ad-hoc basis.

Gemini metadata, including sky coordinates, target name and instrument configuration, are in general world public immediately, and the full FITS headers of proprietary data files can be viewed by any archive user. This is a deliberate policy that lets people avoid writing proposals that would duplicate observations whose data are still proprietary. The actual pixel values are proprietary and are only made available to the project they were observed for, typically for 12 months (18 months prior to semester 2016A). A few projects (for example some of the exoplanet campaign projects with NICI and GPI) have been granted a proprietary metadata status, under which certain items of metadata (sky coordinates and target name) and the full FITS header text are subject to the same proprietary restrictions as the pixel values.
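
As a rough illustration of the semester rule above, the typical proprietary period can be looked up from the semester embedded in a program ID. This is a sketch under stated assumptions: it assumes program IDs contain the semester as four digits followed by A or B (for example the hypothetical ID GN-2015B-Q-10), and it only returns the typical value, so it cannot account for programs granted a different period.

```python
import re

def typical_proprietary_months(program_id):
    """Typical proprietary period in months for the semester in a program ID."""
    match = re.search(r"(\d{4})([AB])", program_id)
    if match is None:
        raise ValueError("no semester found in " + repr(program_id))
    semester = (int(match.group(1)), match.group(2))
    # 12 months from semester 2016A onwards, 18 months before that.
    return 12 if semester >= (2016, "A") else 18

print(typical_proprietary_months("GN-2015B-Q-10"))  # 18
print(typical_proprietary_months("GS-2016A-Q-3"))   # 12
```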

Downloading proprietary data

The archive is open to the public to search for data and to download any data that are no longer proprietary. If you are a Gemini PI or Co-I and would like to access proprietary data for your program, you will need to complete the following steps:

  • Create an account on the archive
  • Register your Gemini program with your archive account. To do this you need the program ID and the ODB key you were given in your time award notification email.

More details on how to do this are provided in the User Accounts section of the archive help page.

Getting data automatically (e.g., as part of reduction pipeline) 

Please read point 11 of the GOA help. You can also find an example script from the AAT Observational Techniques Workshop 2016.
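
A pipeline might obtain a file listing along the following lines. This is a hedged sketch rather than the documented procedure: the base URL, the jsonfilelist path and the "name" field of each entry are assumptions that should be checked against the GOA help, the program ID is hypothetical, and downloading proprietary data would additionally require the account session described above, which is not shown.

```python
import json
from urllib.request import urlopen

ARCHIVE = "https://archive.gemini.edu"  # assumed base URL; confirm in the GOA help

def filelist_url(*selection):
    """Build a file-list query URL from selection terms such as a program ID."""
    # Assumption: selection terms are appended as path components.
    return "/".join([ARCHIVE, "jsonfilelist", *selection])

def file_names(json_text):
    """Extract file names from a JSON listing (assumes each entry has a 'name')."""
    return [entry["name"] for entry in json.loads(json_text)]

def fetch_file_names(*selection):
    """Query the archive over the network and return the matching file names."""
    with urlopen(filelist_url(*selection)) as response:
        return file_names(response.read().decode("utf-8"))

# Building the query URL needs no network access:
print(filelist_url("GN-2015B-Q-10", "GMOS-N"))
```

Keeping URL construction and response parsing separate from the network call makes the parsing easy to test against a saved listing before running it on the live archive.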

Transition from the Gemini Science Archive

Gemini science data were initially stored in the Gemini Science Archive (GSA), operated by the Canadian Astronomy Data Centre (CADC). Since November 2015, all data have instead been stored in, and accessed via, the Gemini Observatory Archive, which is operated by the Observatory.

All Gemini data has been transferred into the new archive. However, user accounts and program registrations are not transferable - you will need to create an account on the new archive and register any programs from which you would like to download proprietary data with that new account.

Unlike the GSA, the GOA does not have the concept of "PI packages". PIs simply search for and download their data (and optionally the associated calibrations) by project ID using the main search form.