Advanced Workflow Examples Using FDSN Webservices

Data requests using webservices in a workflow

Two templates for an example workflow using webservices are presented. The first template uses direct HTTP requests explicitly, whereas the second uses the abstracted ObsPy FDSN Client implementation to manage data requests. Data is requested using FDSNWS-Dataselect, FDSNWS-Station, and FDSNWS-Event. All example code is written in Python.

Using Direct HTTP Requests

This example concerns multiple requests for station metadata, event metadata, and raw waveform data using FDSN standards. The finished source code for this example can be downloaded.

As setup, the required standard library modules and third-party libraries are imported at the top of the script.

import os
import tempfile
import urllib.request

import numpy as np
import pandas as pd
from obspy.core import UTCDateTime, read

I. Station Metadata

The query for station information from networks FR and RA with channels HN? (the wildcard character "?" matches any single character) is stored in the variable sta_request. The Python function urlretrieve (from the urllib.request module) transfers the content of the web request to the text file RESIF_station_file.

RESIF_station_file = 'RESIF_station_file.txt'
# request station metadata at channel level
# (RESIF FDSNWS-Station endpoint; base URL assumed)
sta_request = 'http://ws.resif.fr/fdsnws/station/1/query?network=FR,RA&channel=HN?&format=text&level=channel'
urllib.request.urlretrieve(sta_request, RESIF_station_file)
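As an aside, the query string can also be assembled from a parameter dictionary with the standard library's urllib.parse, which handles escaping automatically. This is a sketch, not part of the original script; the helper name and the written-out base URL are illustrative only:

```python
from urllib.parse import urlencode

def build_fdsn_query(base_url, **params):
    """Assemble an FDSN webservice query URL from keyword parameters."""
    # keep FDSN wildcard and list characters ('?', '*', ',') unescaped
    return base_url + '?' + urlencode(params, safe='?*,')

# the same channel-level station query as above
url = build_fdsn_query('http://ws.resif.fr/fdsnws/station/1/query',
                       network='FR,RA', channel='HN?',
                       format='text', level='channel')
```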

A few columns from RESIF_station_file are selected and stored in variable RAP_sta, with the corresponding column header names in RAP_col. Function read_sta_file uses read_csv from the pandas package.

RESIF_sta_names = list(['NET', 'STA', 'LOCID', 'CHA', 'LAT', 'LON', 'ELEV',
                        'DEP', 'AZ', 'DIP', 'INSTRUM', 'SCALE', 'SFREQ',
                        'SUNIT', 'SAMPRATE', 'START', 'END'])


def read_sta_file(filename):

    pd_sta = pd.read_csv(filename, sep='|', header=0, names=RESIF_sta_names)
    short_list = list(['NET', 'STA', 'LOCID', 'LAT', 'LON', 'INSTRUM',
                       'SAMPRATE', 'START', 'END'])
    values = pd_sta[short_list].values
    # remove duplicate rows (a list, not a set, is passed to vstack)
    values_unique = np.vstack(list({tuple(row) for row in values}))
    ista = short_list.index('STA')

    # strip whitespace from station codes
    for v in values_unique:
        v[ista] = v[ista].strip()

    return values_unique, short_list


RAP_sta, RAP_col = read_sta_file(RESIF_station_file)
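The set-of-tuples trick inside read_sta_file removes duplicate metadata rows. Its effect can be seen on a toy array (a standalone sketch with made-up values):

```python
import numpy as np

# toy metadata rows: the first and third rows are duplicates
rows = np.array([['FR', 'STA1', '45.0'],
                 ['FR', 'STA2', '44.5'],
                 ['FR', 'STA1', '45.0']], dtype=object)

# collapsing to a set of tuples drops duplicate rows; sorting makes
# the row order deterministic before stacking back into an array
unique_rows = np.vstack(sorted({tuple(r) for r in rows}))
```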

To transfer information between the various requests, a Python class called SeismicStation is created, and station objects are instantiated from it. Every station object has attributes assigned to it that can be accessed from anywhere in the program, for example self.lat and self.start_times in the code below. The word self refers to the station object itself. All station objects are gathered in the dictionary sta_dict.

END_TIME = UTCDateTime(2017, 1, 1)


class SeismicStation(object):

    def __init__(self, sta_lines, sta_col_names):
        nr, nc = sta_lines.shape
        ista = sta_col_names.index('STA')
        ilat = sta_col_names.index('LAT')
        ilon = sta_col_names.index('LON')
        istart = sta_col_names.index('START')
        iend = sta_col_names.index('END')

        self.sta = np.unique(sta_lines[:, ista])[0]
        lats = np.empty(nr, dtype=float)
        lons = np.empty(nr, dtype=float)
        start_times = np.empty(nr, dtype=object)
        end_times = np.empty(nr, dtype=object)

        for i in range(nr):
            line = sta_lines[i]
            lats[i] = float(line[ilat])
            lons[i] = float(line[ilon])
            start_times[i] = UTCDateTime(line[istart])
            try:
                end_times[i] = UTCDateTime(line[iend])
            except TypeError:
                # case of missing end time
                end_times[i] = END_TIME
            if end_times[i] > END_TIME:
                end_times[i] = END_TIME

        self.lat = np.mean(lats)
        self.lon = np.mean(lons)
        self.n_periods = len(start_times)
        time_indexes = np.argsort(start_times)
        self.start_times = start_times[time_indexes]
        self.end_times = end_times[time_indexes]


def create_station_objects(X, names_list):
    ista = names_list.index('STA')
    stations = X[:, ista]
    unique_stations = np.unique(stations)
    stadict = {}
    for sta in unique_stations:
        lines = X[X[:, ista] == sta, :]
        stadict[sta] = SeismicStation(lines, names_list)
    return stadict


# create station objects using class SeismicStation
sta_dict = create_station_objects(RAP_sta, RAP_col)
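Since start_times and end_times are stored sorted, checking whether a station was operating at a given time reduces to scanning the period pairs. The helper below is hypothetical and not part of the original class; plain numbers stand in for UTCDateTime objects, which compare the same way:

```python
def was_active(start_times, end_times, t):
    """Return True if t falls inside any [start, end] operating period."""
    return any(s <= t <= e for s, e in zip(start_times, end_times))

# two operating periods, expressed as plain numbers for illustration
starts = [0, 100]
ends = [50, 200]
```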

Station Metadata at Response Level

A station request at response level yields more detailed metadata than the first station request at channel level. In the URL, 'level=channel' is changed to 'level=response'. The networks (FR, RA) and channels (HN?) are the same as before. The StationXML file RESIF_response_file is produced by urlretrieve.

RESIF_response_file = 'RESIF_response_file.xml'

# request station metadata at response level
# (RESIF FDSNWS-Station endpoint; base URL assumed)
resp_request = 'http://ws.resif.fr/fdsnws/station/1/query?network=FR,RA&channel=HN?&level=response'
urllib.request.urlretrieve(resp_request, RESIF_response_file)

II. Event Webservice

For every station object, data of events within a 50 km radius (event_radius) of that station are requested.

RENASS_TSV_NAMES = list(['ID', 'OTIME', 'LAT', 'LON', 'MAG', 'MAGTYPE',
                         'DEPTH', 'DEPTYPE', 'CAT', 'EVAL1', 'NSTA', 'NPHA',
                         'AZGAP', 'EVAL2', 'TOWN', 'DIST'])

event_radius = 50

for sta in sta_dict.values():
    # get event metadata
    sta.get_ReNaSS_events(event_radius)
    # get event waveform data
    sta.get_RESIF_data(nmax=None)

Event Metadata

In the class method get_ReNaSS_events, event metadata is retrieved and processed. A web request is set up, similar to the previous one, requesting event metadata from ReNaSS (Réseau National de Surveillance Sismique), since RESIF does not support FDSNWS-Event. The data format is TSV (tab-separated values), which can be read by read_csv from the pandas package. Useful metadata is passed to attributes of the station objects.

# class SeismicStation(object):

    def get_ReNaSS_events(self, radius_km):
        # set up web request
        # (ReNaSS FDSNWS-Event endpoint; base URL and format parameter assumed)
        request = 'https://renass.unistra.fr/fdsnws/event/1/query?format=text'
        request = request + '&longitude=%.2f' % self.lon
        request = request + '&latitude=%.2f' % self.lat
        request = request + '&starttime=%s' % self.start_times[0].isoformat()
        request = request + '&endtime=%s' % self.end_times[-1].isoformat()
        request = request + '&maxradius=%.2f' % (radius_km / 111.25)

        # retrieve catalog data to a temporary file
        f_, fname = tempfile.mkstemp()
        os.close(f_)
        urllib.request.urlretrieve(request, fname)

        # read data and remove the file
        pd_ev = pd.read_csv(fname, sep='\t', names=RENASS_TSV_NAMES)
        os.unlink(fname)

        # get only the info we want to keep
        short_list = list(['ID', 'OTIME', 'MAG'])
        values = pd_ev[short_list].values
        iid = short_list.index('ID')
        itime = short_list.index('OTIME')
        imag = short_list.index('MAG')
        nr, nc = values.shape

        # clean data
        ids = values[:, iid]
        otimes = np.empty(nr, dtype=object)
        ev_mags = np.empty(nr, dtype=float)
        for i in range(nr):
            v = values[i]
            otimes[i] = UTCDateTime(v[itime])
            ev_mags[i] = float(v[imag])

        # store results sorted by origin time
        self.n_events = len(otimes)
        time_indexes = np.argsort(otimes)
        self.ev_ids = ids[time_indexes]
        self.otimes = otimes[time_indexes]
        self.ev_mags = ev_mags[time_indexes]
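Note the division by 111.25 in the maxradius line: the event service expects the search radius in degrees, while event_radius is given in kilometres, and one degree of latitude spans roughly 111 km. As a standalone sketch (the constant is taken from the code above, the helper name is illustrative):

```python
KM_PER_DEGREE = 111.25  # approximate km per degree of latitude

def km_to_degrees(radius_km):
    """Convert a search radius from kilometres to degrees."""
    return radius_km / KM_PER_DEGREE

# the 50 km station radius used in this workflow is about 0.45 degrees
radius_deg = km_to_degrees(50)
```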

Event Waveform Data

For the actual waveform data, an FDSNWS-Dataselect query is constructed in the class method get_RESIF_data. This request gathers miniSEED waveform data for the events identified in the event metadata query. Waveform data will not be available for every event, since the station may not have been active in the requested time frame. The read function from ObsPy is used here to read the file obtained from the web request into a Stream object (for more information, see the ObsPy workflow below) and write it to a SAC file. The SAC file can subsequently be plotted and processed in SAC. Writing to other formats, such as miniSEED (format="MSEED"), is also possible.

# class SeismicStation(object):

    def get_RESIF_data(self, nmax=None):
        if nmax is not None:
            n_ev = nmax
        else:
            n_ev = self.n_events
        data = np.empty(n_ev, dtype=object)
        n_data = 0

        # get data from 1 minute before (t1) to 10 minutes after (t2)
        # the origin times
        for otime in self.otimes[0:n_ev]:
            print(otime)
            t1 = otime - 60
            t2 = otime + 10 * 60
            # (RESIF FDSNWS-Dataselect endpoint; base URL assumed)
            request = ('http://ws.resif.fr/fdsnws/dataselect/1/query?'
                       'network=FR,RA&channel=HN?'
                       '&station=%s&starttime=%s&endtime=%s'
                       % (self.sta, t1.isoformat(), t2.isoformat()))
            f_, fname = tempfile.mkstemp()
            os.close(f_)
            urllib.request.urlretrieve(request, fname)
            try:
                st = read(fname)
                os.unlink(fname)
                if len(st) > 0:
                    data[n_data] = st
                    n_data += 1
                    st.write("data/%s_%s.sac" % (self.sta, otime), format="SAC")
                    print('Waveform data found!')
            except TypeError:
                os.unlink(fname)
                print('no RESIF data in this time window')
                continue

        data.resize(n_data, refcheck=False)
        self.data = data

Figure showing raw waveform data of an event requested from RESIF FDSNWS-Dataselect.

ObsPy FDSN Client

An alternative way of requesting waveform data and station/event metadata is through ObsPy, a Python framework for seismology. The obspy.clients.fdsn module is in many cases the best option, because it supports a large number of data centers and modern data formats. See the ObsPy tutorial for documentation and more information on this module. With this module, the user can obtain station metadata, event metadata, and waveform data, and instantly plot the results. The complete source code for this template can be found here.

Start by initializing a client object, specifying either the name of the webservice, e.g. "RESIF", or its base URL, e.g. "http://ws.resif.fr".

from obspy.clients.fdsn import Client
import numpy as np
from obspy.core import UTCDateTime, read, Stream

# specify webservice
client = Client("RESIF")

The client module supports three request types: get_stations, get_events, and get_waveforms. Not every data center implements all three; as mentioned before, RESIF does not support get_events, which is why the client is switched to IRIS for that part of the workflow.

I. Station metadata at channel level

To get station information at channel level, use the get_stations method. An Inventory object is created, which has a network-station-channel hierarchy. Station codes and coordinates, among other fields, are attributes of the station objects that this request yields.

# request station metadata at channel level
station_chan = client.get_stations(network="FR,RA", station="*",
                                   channel="HN?", level="channel")
# plot stations
station_chan.plot()

# number of stations for FR network
nsta_FR = station_chan[0].selected_number_of_stations
# number of stations for RA network
nsta_RA = station_chan[1].selected_number_of_stations

# get coordinates of every station
FR_coords = np.empty([nsta_FR, 2])
RA_coords = np.empty([nsta_RA, 2])
for k in range(nsta_FR):
    FR_coords[k, 0] = station_chan[0][k].latitude
    FR_coords[k, 1] = station_chan[0][k].longitude

# get station names
FR_sta = ["" for x in range(nsta_FR)]
for p in range(nsta_FR):
    FR_sta[p] = station_chan[0][p].code

II. Station metadata at response level

To get metadata at response level, simply change the level argument of get_stations from "channel" to "response". Instrument responses can be plotted with plot_response (see figure below).

# request station metadata at response level
client = Client("RESIF")
station_resp = client.get_stations(network="FR,RA", station="*",
                                   channel="HN?", level="response")
# plot responses for network FR, station GRN
station_resp.plot_response(min_freq=0.001, network="FR", station="GRN")

III. Event data

The get_events method in the code below finds events from the first of January until the first of March 2002 and requests their metadata, creating a Catalog object. This information can be written to file in, for example, XML format. Similar to the previous workflow example, events can be requested within a certain distance of stations using the maxradius parameter. To reduce computational time, only stations that are active in the requested time window are taken into consideration. For the events that are found, waveform data is requested using get_waveforms, creating Stream objects containing one or more Trace objects. Common data processing operations, such as removing instrument response, filtering and resampling, can be performed on Stream objects and their traces. For an overview of built-in ObsPy methods, refer to the ObsPy documentation.

# get event metadata of events within radius=0.5 degrees of stations
starttime = UTCDateTime("2002-01-01")
endtime = UTCDateTime("2002-03-01")
event_radius = 0.5  # degrees

for p in range(nsta_FR):
    # create empty Stream object
    st = Stream()
    # check if station was active during requested time frame
    act = station_chan[0][p].is_active(starttime=starttime, endtime=endtime)
    if act is True:
        lat = FR_coords[p, 0]  # station latitude
        lon = FR_coords[p, 1]  # station longitude
        client = Client("IRIS")
        try:
            # produces a Catalog object
            FR_events = client.get_events(starttime=starttime,
                                          endtime=endtime,
                                          latitude=lat, longitude=lon,
                                          maxradius=event_radius)
            FR_events.write("%s_events.xml" % FR_sta[p], format="QUAKEML")
            n_events = FR_events.count()  # number of events found
            print('%i event(s) found for station %s' % (n_events, FR_sta[p]))
            # request waveform data from origin time minus 1 minute
            # to origin time plus 10 minutes:
            client = Client("RESIF")
            for j in range(n_events):
                # event origin time
                otime = UTCDateTime(FR_events[j].origins[0].time)
                try:
                    st += client.get_waveforms("FR", FR_sta[p], "*", "HN?",
                                               otime - 60, otime + 10 * 60)
                    print('Waveform data found for station %s event %i!'
                          % (FR_sta[p], j + 1))
                except Exception:
                    print('No waveform data found for station %s event %i...'
                          % (FR_sta[p], j + 1))
                    continue
            FR_events.plot(projection="local",
                           title="Event locations %s" % FR_sta[p])
            st.plot()
            # write data to file
            st.write("%s_data.MSEED" % FR_sta[p], format="MSEED")
        except Exception:
            print('No events found for station %s...' % FR_sta[p])
            continue
    else:
        print('Station %s was not active in requested time frame' % FR_sta[p])
a) Waveform data (miniSEED)
b) Event locations
c) Instrument Frequency Response

Figures showing a) raw waveform data, b) event locations and magnitudes, and c) instrument frequency response for three components.