Observatories and Research Facilities for EUropean Seismology
Volume 6, no 2 December 2004 Orfeus Newsletter

Conversion of SEED format to XML representation for a new standard of seismic waveform exchange

Seiji Tsuboi1 and Shin'ya Morino2

1Institute for Frontier Research on Earth Evolution (IFREE), JAMSTEC, Japan
2Hakusan Corporation, Japan

Abstract - Introduction - Seed Format - Conversion rules for headers -
Data - Programs - XML-SEED for synthetic databases - Summary -
References

Abstract

We present an XML (eXtensible Markup Language) representation of the header structure of the Standard for the Exchange of Earthquake Data (SEED), an international standard format for the exchange of digital seismological data. We show that this representation allows the header content to be extended without any modification to the existing mini-SEED waveform data structure.

Introduction

The Standard for the Exchange of Earthquake Data (SEED) was designed as an international standard format for the exchange of digital seismological data (SEED Manual, 1993). It is now widely used by the community that maintains broadband seismograph networks and is recognized as a standard format for data exchange. A SEED volume consists of header records and data records. The format for data records is called mini-SEED and is closely related to the format recorded in data loggers. Each header is made up of a sequence of blockettes. Because SEED blockettes are defined as collections of named fields with fixed length, the data structures are difficult to extend. And because a huge amount of waveform data already exists in mini-SEED format, fully revising the current SEED format to allow flexible future extensions would be a major undertaking. Although the need to revise the SEED format has been recognized, this difficulty has prevented any major revision since the release of Ver. 2.3 in February 1993. Here we propose an XML representation of the SEED header structure and show that it provides both a flexible design and robust validation of data models. Introducing XML into the SEED data description will also reduce the technical difficulties of constructing a network-based system. We also describe the extension of the XML-SEED format to synthetic seismogram databases.

Seed Format

SEED was adopted as a standard format for international digital seismic data exchange in 1987 by the Federation of Digital Seismograph Networks (FDSN), which was formed under the International Association of Seismology and Physics of the Earth's Interior (IASPEI). Before the SEED format was adopted, digital seismic data exchange was complicated by the variety of data logger formats. SEED was designed to accommodate the differences in data format that arise from the type of data logger. A SEED volume consists of one logical volume, which contains two format objects: (1) control headers and (2) time series. The control headers are formatted in ASCII and contain auxiliary information about the volume. The time series contain raw binary data, the digital seismograms. Control headers are categorized as (1) volume index control headers, (2) abbreviation dictionary control headers, (3) station control headers and (4) time span control headers. These headers provide information such as the definitions of abbreviations used in the control headers, the operating characteristics of a station and its channels, and the time span of the data. Because the control headers describe the SEED volume so comprehensively, the SEED format can be used to provide digital seismograms recorded by almost any kind of data logger. Each control header consists of a series of blockettes, each of which contains a sequence of data fields specific to that blockette type. Because blockettes are defined as collections of named fields with fixed length, the data structures are difficult to extend.
On the other hand, the header structure is designed to be modular, which makes it similar to XML. This similarity motivated us to represent the SEED format structure in XML. The data structures of an XML document are very flexible, because the lengths of the fields are not fixed. Defining new fields and blockettes requires only a new tag name and a hierarchy specification. To describe data types, XML has its own schema language, called XML Schema, which is also used to validate XML documents. Introducing XML into SEED will therefore provide both a flexible design and robust validation of data models.

Conversion rules for headers

We have set up the following rules for converting control headers of SEED format into XML representation.

Document type

The document type is defined with a root node named 'xseed'. This root node has four child nodes, one for each type of control header (volume index, abbreviation dictionary, station, and time span). Every header node stores the XML representations of the corresponding blockettes.
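
The document skeleton described above can be sketched in Python with the standard library. This is only an illustration: the root name 'xseed' comes from the text, but the four child node names used here are assumptions, since the actual XML-SEED tag names are not given in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical child-node names, one per SEED control header type.
# Only the root name 'xseed' is taken from the article.
HEADER_NODES = (
    "volume_index_control_header",
    "abbreviation_dictionary_control_header",
    "station_control_header",
    "timespan_control_header",
)


def make_xseed_skeleton():
    """Return an 'xseed' root element with four empty header nodes."""
    root = ET.Element("xseed")
    for name in HEADER_NODES:
        ET.SubElement(root, name)
    return root


root = make_xseed_skeleton()
print(ET.tostring(root, encoding="unicode"))
```

Converted blockettes would then be appended to the header node that matches their control header type.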

Conversion of blockettes

All control header blockettes have a 'Blockette type' field and a 'Length' field to represent their type and size. Either the blockette type or the blockette name can serve as a unique identifier for a kind of blockette; current SEED volumes use only the type. In the XML representation the blockette name is used as the identifier, because it is more descriptive and conveys the role of the blockette by itself. The blockette type is added as an attribute for the convenience of people familiar with blockette type numbers. A blockette with the name 'blockette_name' and the type '555' is thus represented by an element named blockette_name that carries the type 555 as an attribute.

The 'Length' field is not required in the XML representation. A blockette begins with a starting tag named blockette_name and ends with an ending tag of the same name. The data for every field can have a variable length. Each field value is represented as the content of an element placed between the starting and ending tags of the blockette:

<field_name>Value</field_name>

where 'field_name' represents the name of the field, and 'Value' represents the value of the field.
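
As a minimal sketch of these conversion rules, the following Python function turns one blockette into an XML element. The attribute name 'blockette_type' is an assumption made for illustration (the article does not state the attribute name), and the function is not the authors' converter.

```python
import xml.etree.ElementTree as ET


def blockette_to_xml(name, blockette_type, fields):
    """Convert one blockette to an XML element.

    name           -- blockette name, used as the tag name
    blockette_type -- numeric blockette type, kept as an attribute
    fields         -- mapping of field name -> value, in field order
    """
    # The attribute name 'blockette_type' is hypothetical.
    elem = ET.Element(name, {"blockette_type": str(blockette_type)})
    for field_name, value in fields.items():
        # No 'Length' field is written: element content is variable length.
        child = ET.SubElement(elem, field_name)
        child.text = str(value)
    return elem


example = blockette_to_xml("blockette_name", 555, {"field_name": "Value"})
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Because the element content carries no fixed width, a longer value or an additional field changes only the text of the document, not its structure.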

References and sequence number

An important characteristic of the SEED format is that some fields of particular blockettes refer to other blockettes. We chose the XPath language to describe these references. Referenced blockettes always have an identifier field, which is converted to a string identifier. Sequence numbers of logical records are ignored in the 'volume index control header', because XML documents are not stored in logical records. However, the 'time span control header' describes the location of waveform data by means of sequence numbers, and these are written as zero-filled numbers of 6 digits.
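
The following Python sketch illustrates how a reference between blockettes might be resolved with an XPath expression, and how a sequence number is zero-filled to 6 digits. All tag names, field names and values in the sample document are hypothetical; the article does not specify how XML-SEED stores the XPath expressions themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, field names and values are illustrative.
DOC = """
<xseed>
  <abbreviation_dictionary_control_header>
    <generic_abbreviation blockette_type="33">
      <lookup_code>net_ABC</lookup_code>
      <description>Example network</description>
    </generic_abbreviation>
  </abbreviation_dictionary_control_header>
  <station_control_header>
    <station_identifier blockette_type="50">
      <network_identifier_code>net_ABC</network_identifier_code>
    </station_identifier>
  </station_control_header>
</xseed>
"""

root = ET.fromstring(DOC)

# The referring field holds the string identifier of the referenced blockette.
ref = root.find(".//station_identifier/network_identifier_code").text

# XPath predicate: select the blockette whose identifier field equals 'ref'.
target = root.find(f".//generic_abbreviation[lookup_code='{ref}']")
description = target.find("description").text


def format_sequence_number(n):
    """Write a logical record sequence number as 6 zero-filled digits."""
    return f"{n:06d}"


print(description, format_sequence_number(42))
```

The standard library supports only a subset of XPath, which is sufficient for this kind of identifier lookup.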

Data

When we represent the SEED header structure in XML we do not modify anything in the format of the time series data. To include binary mini-SEED digital seismograms in XML-represented SEED volumes we consider two scenarios. The first scenario separates the header file from the data. The data can be located in other data files on data servers connected via networks. One can obtain a stand-alone header file to learn about an event, the properties of the stations and the data locations. This is the same concept as dataless-SEED volumes. To retrieve the complete seismic waveform data one combines the two separate files: the data file is fetched from the data server by following the description in the header file.
The second scenario is the same as the current full-SEED volume: the XML-SEED volume includes both the SEED header represented in XML and the binary seismic wave data. The header specifies the location of the data, which is stored in the same file. One possible layout is as follows. The first line gives the length of the header and is followed by a blank line. The header XML document starts at the third line. The header part is plain ASCII and is not based on logical records. The data part starts at the position specified in the first line. The data are stored in logical records, as in current SEED volumes. With this scheme a reader program can determine the location of the seismic wave data from the value given in the first line and the sequence numbers in the time span control headers.
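
A minimal Python sketch of such a combined file is given below. It rests on several assumptions that the article does not settle: the first line is taken to hold the byte length of the XML header, so that the data begin directly after the header; the logical record length is a free parameter; and sequence numbers are assumed to count data records starting at 000001. The function names are hypothetical.

```python
def pack_volume(header_xml, data):
    """Join an XML header and binary data records into one volume."""
    header = header_xml.encode("ascii")  # the header part is plain ASCII
    first_line = str(len(header)).encode("ascii") + b"\n"
    # Line 1: header length. Line 2: blank. Line 3 onward: XML header.
    return first_line + b"\n" + header + data


def unpack_volume(volume):
    """Split a volume back into (header_xml, data)."""
    first_line, rest = volume.split(b"\n", 1)
    header_length = int(first_line)
    if not rest.startswith(b"\n"):
        raise ValueError("expected a blank line after the length line")
    body = rest[1:]
    return body[:header_length].decode("ascii"), body[header_length:]


def record_at(data, sequence_number, record_length=4096):
    """Return the logical record addressed by a zero-filled sequence number."""
    index = int(sequence_number) - 1  # assumed to start at 000001
    start = index * record_length
    return data[start:start + record_length]


header_xml = "<xseed></xseed>"
payload = bytes(range(8)) * 2  # two tiny 8-byte "records" for demonstration
volume = pack_volume(header_xml, payload)
restored_header, restored_data = unpack_volume(volume)
```

Since the header length is read before any XML is parsed, a reader can seek to the data part without processing the header at all.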
So far, we have not changed the current SEED header structure. We now propose one possible extension of a SEED volume by introducing a new tag that has no corresponding blockette in the current SEED format: the <data_record> tag. Data records are split into two parts, the 'Fixed Section of Data Header' (FSDH) and the Base64-encoded seismic wave data. The field names and values of the FSDH members are expanded in the same style as in the control headers. The FSDH part is placed under the <data_header> tag. Seismic wave data are encoded as Base64 and placed under the <chunk> tag. The 'data_record_length' attribute of the <chunk> tag gives the byte length of the decoded data. In this way both header and data are represented in one XML-SEED volume. Base64 encoding increases the size of the waveform data by about one third, which we do not consider excessive.
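
The <data_record> extension can be sketched in Python as follows. The tag names <data_record>, <data_header> and <chunk> and the 'data_record_length' attribute are taken from the text; the nesting of <data_header> and <chunk> directly under <data_record>, the FSDH field names and the sample values are assumptions made for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def data_record_to_xml(fsdh_fields, waveform):
    """Build a <data_record> element from FSDH fields and raw data bytes."""
    record = ET.Element("data_record")
    header = ET.SubElement(record, "data_header")
    for name, value in fsdh_fields.items():
        ET.SubElement(header, name).text = str(value)
    # The attribute gives the byte length of the decoded data.
    chunk = ET.SubElement(
        record, "chunk", {"data_record_length": str(len(waveform))}
    )
    chunk.text = base64.b64encode(waveform).decode("ascii")
    return record


def xml_to_waveform(record):
    """Decode the waveform bytes and check them against the stated length."""
    chunk = record.find("chunk")
    data = base64.b64decode(chunk.text)
    if len(data) != int(chunk.get("data_record_length")):
        raise ValueError("decoded length does not match data_record_length")
    return data


# Hypothetical FSDH field names and values.
fsdh = {"station_identifier_code": "ABC", "channel_identifier": "BHZ"}
waveform = bytes(range(256)) * 12  # 3072 bytes of stand-in binary data
record = data_record_to_xml(fsdh, waveform)
encoded_length = len(record.find("chunk").text)
print(len(waveform), encoded_length)
```

Base64 maps every 3 bytes to 4 characters, so the 3072 bytes of sample data become 4096 characters, which is the one-third growth mentioned above.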

Programs

Programs to convert current full-SEED volumes to XML-SEED volumes and to read XML-SEED volumes and extract seismograms are available. We currently provide digital broadband seismograms from the Ocean Hemisphere Project geophysical network in XML-SEED format through the IFREE data center. The web page is shown in Figure 1.


Figure 1. Web site for distribution of XML-SEED formatted broadband seismograms.

XML-SEED for synthetic databases

Recently, we have demonstrated that we can calculate global theoretical seismograms for realistic 3D Earth models by combining a precise numerical technique (the spectral-element method) with a sufficiently fast supercomputer (the Earth Simulator) (Tsuboi et al., 2003). It has now become possible to routinely calculate synthetic seismograms for earthquakes above a certain magnitude. Starting in 2003, we selected earthquakes with magnitudes greater than 6.5 from the Harvard CMT catalog and calculated theoretical seismograms for the stations of the Global Seismographic Network. To distribute this synthetic seismogram database to the seismological community we modified XML-SEED to include metadata entries that are characteristic of a synthetic seismogram database, such as the numerical technique used to generate the synthetic seismograms (Tsuboi et al., 2004). We distribute these theoretical seismograms through IFREE/JAMSTEC and Caltech (select "synthetic seismograms"). The advantage of using XML for the exchange of both observations and synthetics is illustrated in Figure 2. We are now developing software that allows users to retrieve both synthetics and observations at the same time through the same user interface, based on web services. For this software to work efficiently it is important that both data and synthetics are in XML.


Figure 2. Concept of web-service-based software to retrieve both data and synthetics using the same user interface. The user interface and data transfer are described in Web Service Description Language (WSDL). This figure was created by Takuya Arai of Fujitsu Corp., Japan.

Summary

We have shown that the current SEED format can be directly translated to an XML representation without introducing any modifications to the current format. The advantages of the XML representation of SEED are that a) XML is a text-based language and easy to extend, b) XML documents support hierarchical data structures, c) XML is platform independent, and d) XML suits network-based technologies. It is straightforward to add any necessary information at a later stage by defining tag names and including them in the schema. Although we have not modified the current SEED control headers, there should be various ways to extend SEED by taking full advantage of XML. One example is the status report of the data logger. If the data logger reports its status or parameter settings in XML format along with its digital seismograms, this information can be incorporated directly into the database directory at the data center. This should greatly simplify the data quality checks done at the data center. Another example is data distribution through web services. Because the data exchange protocols for web services are based on XML, if SEED data are in XML format we can use the control header content described in XML for data exchange and distribution. We have developed a network data center system based on Java RMI (Takeuchi et al., 2002). We may distribute our XML-SEED formatted digital seismograms through this network data center system to make full use of the XML-represented header structure.

References

  • Federation of Digital Seismograph Networks: Standard for the Exchange of Earthquake Data, Reference Manual, SEED Format Version 2.3, Incorporated Research Institutions for Seismology, 1993.
  • Takeuchi, N., Watada, S., Tsuboi, S., Fukao, Y., Kobayashi, M., Matsuzaki, Y., and Nakashima, T., 2002. Application of distributed object technology to seismic waveform distribution, Seismological Research Letters, 73-2, 166-172.
  • Tsuboi, S., Komatitsch, D., Ji, C., and Tromp, J., 2003. Broadband modeling of the 2002 Denali Fault earthquake on the Earth Simulator, Physics of the Earth and Planetary Interiors, 139, 305-312.
  • Tsuboi, S., Tromp, J., and Komatitsch, D., 2004. An XML-SEED Format for the Exchange of Synthetic Seismograms, EOS Transactions of American Geophysical Union, suppl., SF31B-03.
Copyright © 2004. Orfeus. All rights reserved.