Near Real Time Data at NORSAR for CTBT Monitoring
J. Fyen and K. Iranpour
NORSAR, Instituttveien 25, 2007 Kjeller, Norway
The Norwegian National
Data Center (NDC) is responsible for the design, installation, operation
and maintenance of NORSAR's field installations, the data recording and transmission,
and the processing and analysis of data. NORSAR's field installations
include seismic array stations, three-component seismic stations and radionuclide
stations. An infrasound station is planned for installation in 2005. The NDC also maintains
databases of digital seismic data from earthquakes and from nuclear
and non-nuclear explosions, going back to around 1970.
NORSAR performs Norway's technical
duties relating to the Comprehensive Test Ban Treaty (CTBT). The NDC section at NORSAR
is responsible for constructing, maintaining and operating the six Norwegian stations
of the International Monitoring System (IMS), which was established to verify
compliance with the treaty. Figure 1 shows a map of the IMS stations in
the Nordic countries, including the six Norwegian stations. The treaty's
provisions, as described in the IMS operational manual, set out a number of requirements
that the stations must meet. These requirements mostly concern data quality
and communication between IMS stations and the CTBTO in Vienna, in particular
what the IMS operational manual defines as data timeliness, data availability
and data reliability. The purpose of this report is to describe NORSAR's
solution to these issues.
Figure 1. Nordic / Arctic IMS stations. Station codes signify the station type and its
number in the IMS network. "PS" and "AS" represent primary and auxiliary
seismic stations, "R" represents radionuclide stations and "IS" is an infrasound station.
NORSAR operates three IMS seismic arrays: the large teleseismic NORSAR array
(NOA, PS27)* with a diameter of 60 km, the regional ARCES array (ARCES, PS28) with a diameter of 3 km,
and the small Spitsbergen array (SPITS, AS72) with a diameter of 1 km.
The NOA array consists of 42 sites with a total of 63 instruments,
organized in 7 subarrays. It is the largest array in
the IMS network. The ARCES array has 25 sites with 36 instruments, and the
SPITS array has 9 sites with 12 instruments. The SPITS array represents the minimum
size required for an IMS array. Figure 2 shows the approximate
design of the three arrays and their relative sizes.
Figure 2. Schematic plot of Norway's IMS seismic arrays.
On the left the ARCES array in Karasjok and the SPITS array on the island
of Spitsbergen. On the right the NOA array with its group of seven subarrays
near the town of Hamar. The ARCES and SPITS arrays are included for comparison.
Near real time data may take several paths from the provider* (station or NDC) to the
consumer (IDC). Common to all of them is the first phase of data transmission:
the transmission of data from each individual array element (digitizer) to
the Central Recording Facility (CRF). From there the data is either forwarded directly
to the IDC, in what is termed the basic topology, or sent to the NDC and then
on to the IDC, in what is termed the independent subnetwork. The latter is the
implementation chosen for ARCES and SPITS. For NOA, the CRF is located at the
NDC.
Data from primary stations
arriving at the IDC must be in the Continuous Data Format, either CD1.0 or the more
recent version CD1.1. In addition, all IMS data destined for the IDC must
be authenticated (signed). For stations built after 1 January 2000, data must be signed at the digitizer.
The Continuous Data Format
CD1.0 is a straightforward program-to-program protocol over TCP/IP sockets
and is used to send binary formatted data (frames) from the provider to the
IDC or, conversely, from the IDC to the provider. Once the initial connection
has been established between the sending application and the receiver at the IDC,
the station sends a Station Identification Frame and receives from the other end the
designated port for transmission of further data. Next, a Data Format
Frame is sent to the IDC identifying the station channels to which the subsequent
data belongs. Finally, a continuous stream of data frames is sent. Each frame
consists of a header containing the nominal time of the channel data, followed by a number
of subframes. Each subframe contains the data for one channel (normally 10 seconds) together with
its time, the number of samples in the subframe, some state of health data
and authentication information (signature). Table 1 shows how a CD1.0 frame
is constructed. The status field in each subframe is used for additional
state of health data such as power on/off, tamper switch and vault open/closed
status. The samples are compressed using Canadian compression. The Alpha library,
developed by Science Applications International Corporation (SAIC)
for the prototype IDC (pIDC) in the early 1990s, is an example of an application
used for data exchange between the provider and the IDC. Use of the Alpha
library is briefly discussed later in this report.
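The session order described above can be captured as a small state machine. The sketch below only illustrates the ordering rules stated in the text; the state and event names are hypothetical, and it models neither the actual CD1.0 frame contents nor the switch to the designated port.

```python
from enum import Enum, auto

class State(Enum):
    CONNECTED = auto()       # TCP connection to the receiver is open
    ID_SENT = auto()         # Station Identification Frame sent
    PORT_ASSIGNED = auto()   # designated data port received
    FORMAT_SENT = auto()     # Data Format Frame sent
    STREAMING = auto()       # data frames are flowing

# (state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.CONNECTED, "send_station_id"): State.ID_SENT,
    (State.ID_SENT, "receive_port"): State.PORT_ASSIGNED,
    (State.PORT_ASSIGNED, "send_data_format"): State.FORMAT_SENT,
    (State.FORMAT_SENT, "send_data_frame"): State.STREAMING,
    (State.STREAMING, "send_data_frame"): State.STREAMING,
}

def replay(events):
    """Check that a sequence of session events follows the CD1.0 order."""
    state = State.CONNECTED
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            raise ValueError(f"{event!r} is not allowed in state {state.name}")
        state = nxt
    return state
```

For example, a data frame sent before the Data Format Frame is rejected, because the receiver would not yet know which channels the data belongs to.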
| Part | length | signature | time stamp | # samples | compressed data samples |
|---|---|---|---|---|---|
| Data frame header (20) | | | | | |
| Subframe 1 | 4 | 40 | 8 | 4 | 10 seconds |
| Subframe 2 | 4 | 40 | 8 | 4 | 10 seconds |
| Subframe 3 | 4 | 40 | 8 | 4 | 10 seconds |
| Subframe 4 | 4 | 40 | 8 | 4 | 10 seconds |
| ... | | | | | |
| Subframe N | 4 | 40 | 8 | 4 | 10 seconds |
Table 1. CD1.0 Data Frame. The number in each cell represents the length of the field in bytes.
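Assuming the field order of Table 1, one channel subframe can be packed and a frame split into subframes with Python's `struct` module. The big-endian byte order, the 8-byte IEEE double for the time stamp, and reading the 4-byte length as the total subframe length are assumptions made for this sketch, not details taken from the CD1.0 specification; the 40-byte signature is treated as opaque bytes.

```python
import struct

# Fixed part of a subframe as listed in Table 1: length (4), signature (40),
# time stamp (8), number of samples (4). Byte order and encodings are assumed.
SUBFRAME_FIXED = struct.Struct(">I40sdI")   # 56 bytes; ">" means no padding

def pack_subframe(signature: bytes, time_stamp: float, nsamples: int,
                  compressed: bytes) -> bytes:
    """Build one channel subframe (hypothetical helper)."""
    if len(signature) != 40:
        raise ValueError("signature must be exactly 40 bytes")
    total = SUBFRAME_FIXED.size + len(compressed)   # assumed: length covers the whole subframe
    return SUBFRAME_FIXED.pack(total, signature, time_stamp, nsamples) + compressed

def split_frame(frame: bytes, header_len: int = 20):
    """Yield (time_stamp, nsamples, compressed) for each subframe after the header."""
    offset = header_len
    while offset < len(frame):
        total, _sig, ts, n = SUBFRAME_FIXED.unpack_from(frame, offset)
        if total < SUBFRAME_FIXED.size:
            raise ValueError("corrupt subframe length")
        yield ts, n, frame[offset + SUBFRAME_FIXED.size: offset + total]
        offset += total
```

Because every subframe starts with its own length, a reader can skip from one channel to the next without decompressing the samples.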
Continuous Data Format CD1.1 offers several improvements
over its predecessor CD1.0. In CD1.0, channel identification is done once, immediately
after the connection is established, through a frame designed specifically
for that purpose; in CD1.1 it happens at the subframe level. Each subframe
carries its own channel information, making it easier to detect errors in the data.
Another feature unique to CD1.1 is the application-level acknowledgement built
into the design. If the communication protocol does not guarantee error-free
data transmission, the CD1.1 application acknowledgement compensates
for that deficiency. A third addition in CD1.1 is the ability to issue
commands, e.g. for generating public keys.
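The idea of an application-level acknowledgement can be illustrated with a sender that keeps every frame until the receiver confirms it. This is a generic sketch, not the CD1.1 acknowledgement frame format: the class name and the single "highest contiguous sequence number" acknowledgement are simplifying assumptions.

```python
class AckingSender:
    """Hold frames until the receiver acknowledges them (generic sketch)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked = {}            # sequence number -> frame bytes

    def send(self, frame: bytes) -> int:
        seq = self.next_seq
        self.unacked[seq] = frame    # kept until acknowledged
        self.next_seq += 1
        return seq                   # a real sender would also transmit the frame here

    def acknowledge(self, highest_contiguous: int) -> None:
        """Receiver reports that it holds every frame up to this sequence number."""
        for seq in [s for s in self.unacked if s <= highest_contiguous]:
            del self.unacked[seq]

    def pending(self) -> list:
        """Sequence numbers still awaiting acknowledgement, i.e. candidates for resending."""
        return sorted(self.unacked)
```

Frames lost by an unreliable transport simply remain in `pending()` and can be resent, which is the deficiency the application acknowledgement is meant to cover.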
At the time of writing, CD1.1 is implemented at only two IMS stations.
The remaining stations still use CD1.0.
SAIC's "Public Software Bundle" library offers a
comprehensive solution to the whole problem of intra-station communication,
storage and data transmission to the consumer, based on CD1.1 formatted
data. The solution is built around the concept of Framestores. A Framestore
is basically a set of directories and files for buffering CD1.1 formatted
data. Using applications that form part of the public bundle, data can
be retrieved and transported to a similar Framestore, or to larger Framestores
formed on the receiving end (CRF/NDC/IDC) by multiplexing single ones.
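Multiplexing several single-station Framestores into a larger one amounts to merging time-ordered streams. The sketch below uses `heapq.merge` on in-memory lists; it is not the Public Software Bundle's Framestore interface, and the `(time, station, payload)` tuple is a hypothetical stand-in for the directories and files a real Framestore uses.

```python
import heapq
from typing import Iterable, List, Tuple

Frame = Tuple[float, str, bytes]   # (nominal start time, source station, payload)

def multiplex(*stores: Iterable[Frame]) -> List[Frame]:
    """Merge several time-ordered frame stores into one time-ordered store.

    heapq.merge is stable, so frames with equal times keep the order of the
    stores as they were passed in.
    """
    return list(heapq.merge(*stores, key=lambda frame: frame[0]))
```

Since each input is already sorted by time, the merge never needs to load or sort a whole store at once.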
Within an array, each individual
site (digitizer) communicates with the CRF using some error-free protocol.
A packet in a vendor-specific format is sent to
the CRF over an asynchronous,
synchronous, UDP or TCP/IP link. The data is then converted to CD format
and authenticated (signed). The individual site sends the signature
and status information to the CRF separately from the sample data. At the CRF the
CD1 subframes are recreated and, together with the corresponding signatures, form
a 10-second CD1 frame, which is then sent to the NDC or the IDC. Current
systems that use CD1.1 send signed data from the digitizer/authenticator in CD1.1
subframes to the CRF.
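At the CRF, sample data and signatures that arrive separately must be matched up again before a 10-second frame can be formed. The sketch below pairs them by a hypothetical `(channel, start time)` key and, as a simplifying assumption, returns nothing until every channel of the frame has both parts; how a real CRF handles missing pieces is not described here.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]   # (channel name, frame start time in epoch seconds), hypothetical

def assemble_frame(start: int, channels: List[str],
                   samples: Dict[Key, bytes],
                   signatures: Dict[Key, bytes]) -> Optional[List[Tuple[str, bytes, bytes]]]:
    """Pair each channel's 10-second samples with its separately received signature.

    Returns None until every channel has both parts for this start time.
    """
    frame = []
    for ch in channels:
        key = (ch, start)
        if key not in samples or key not in signatures:
            return None
        frame.append((ch, samples[key], signatures[key]))
    return frame
```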
The arrays operated by
NORSAR were built before 1 January 2000 and are therefore not subject to the requirement,
imposed later, that data be signed at the individual sites rather
than centrally. The data arrives at the CRF in a vendor-specific format.
At PS27 this is Science Horizon's AIM24 one-second packets of compressed
data. The transmission protocol is synchronous SDLC (ADCCP). Each frame
is synchronized to the start of a second. A Communication Interface Module
(CIM) connects to the digitizers through an RS422 interface, buffers the data and
delivers one-second data frames over a SCSI interface to a Sun
Solaris workstation.
At PS28, Nanometrics
HRD24 digitizers pack 17 bytes of compressed data into each frame and transmit
it over asynchronous links to RM4 multiplexers at the center.
The RM4 multiplexers are connected to the central Sun Solaris workstation
over the local Ethernet network. The RM4 acts as a server that can deliver
15 of these 17-byte data frames at a time using UDP.
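Unpacking an RM4 UDP payload into the individual 17-byte HRD24 frames is simple slicing, assuming the payload is nothing more than up to 15 frames concatenated back to back. That layout and the function name are assumptions for illustration; the actual RM4 packet may carry additional header fields.

```python
FRAME_LEN = 17    # bytes per HRD24 frame, as stated in the text
MAX_FRAMES = 15   # frames delivered per batch, as stated in the text

def split_rm4_payload(payload: bytes) -> list:
    """Cut a UDP payload into 17-byte frames (assumes no extra header)."""
    if len(payload) % FRAME_LEN != 0:
        raise ValueError("payload is not a whole number of 17-byte frames")
    if len(payload) > FRAME_LEN * MAX_FRAMES:
        raise ValueError("more than 15 frames in one payload")
    return [payload[i:i + FRAME_LEN] for i in range(0, len(payload), FRAME_LEN)]
```

A full batch is therefore 15 × 17 = 255 bytes of frame data.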
In both cases,
NORSAR applications collect the frames from the AIM24 or the RM4 and
store the data in a circular disk buffer. Another application then reads
the disk buffer, reformats the data into CD1 subframes and records the data
in a NORSAR-style, time-indexed disk loop. A third application, which keeps
track of the last transmitted CD1 frame, then sends the newly formed frames to
the NDC. At the NDC, a receiving application takes the CD1 frames and writes them
to a corresponding CD1 indexed diskloop. The concept of NORSAR diskloops
is discussed later in this document. Figures 3 and 4 are schematic
illustrations of the various steps in the data communication of the NORSAR arrays.
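The third application's bookkeeping can be sketched as a persistent pointer to the last transmitted frame: on each run it selects the frames newer than the pointer, sends them oldest first, and advances the pointer after each one. The JSON state file and the function names are hypothetical; the report does not describe NORSAR's actual implementation.

```python
import json
import os

def load_pointer(path: str):
    """Start time of the last transmitted frame, or None if nothing has been sent."""
    try:
        with open(path) as fh:
            return json.load(fh)["last_sent"]
    except FileNotFoundError:
        return None

def save_pointer(path: str, last_sent: int) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"last_sent": last_sent}, fh)
    os.replace(tmp, path)   # rename over the old file so it is never left half-written

def send_new(available, path: str, send) -> list:
    """Send frames newer than the stored pointer, oldest first, advancing the pointer."""
    last = load_pointer(path)
    sent = []
    for t in sorted(available):
        if last is None or t > last:
            send(t)
            save_pointer(path, t)
            sent.append(t)
    return sent
```

Saving the pointer only after a frame has been sent means a crash can at worst cause one frame to be sent twice, never skipped.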
Figure 3. Schematic plot of NORSAR data acquisition from
individual sites to the CRF. Data packets are written to a frameloop and converted
to CD format before being dispatched to the NDC.
Figure 4. From CRF to NDC. Socket communication is used to transmit CD1 frames from a
diskloop at the CRF to the corresponding diskloop at the NDC over a VSAT link.
At the NDC, the CD1 frames are forwarded to the IDC using the Alpha library over the GCI
link (at NORSAR this is a frame relay link; VSAT is more commonly used).
AlphaRead reads samples from the disk loop and calls an alphalib subroutine to create CD1-formatted
data, which it writes to the heap file.
AlphaSend empties the heap file by sending CD1 frames to the IDC in LIFO (last in, first out)
order.
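The LIFO order used by AlphaSend can be illustrated with a simple stack: the most recently written frame is sent first, which presumably favours data timeliness when a backlog has built up, with older frames following behind. The class below is a hypothetical in-memory stand-in for the heap file, not the Alpha library's interface.

```python
class HeapFileStack:
    """In-memory stand-in for the heap file, drained last in, first out."""

    def __init__(self) -> None:
        self._frames = []

    def write(self, frame) -> None:
        """AlphaRead side: append a newly created CD1 frame."""
        self._frames.append(frame)

    def drain(self):
        """AlphaSend side: yield frames newest first until the heap is empty."""
        while self._frames:
            yield self._frames.pop()
```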
The NORSAR diskloop is
simply a UNIX file system consisting of as many files as there are hours
of data in the disk loop. Thus a week-long diskloop has 168 files.
Each file contains one record slot per frame, so the number of slots per file is one hour
divided by the frame length. Frames of 10 seconds thus give files containing 360
slots. Indexing into the diskloop for reading or writing is then a matter
of simple arithmetic, given the time of the first sample of a record. This
structure is independent of the data format.
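The arithmetic can be made concrete. Assuming, for illustration, that files are selected by the epoch hour modulo the loop length and that each frame occupies one fixed-size slot, a frame's start time maps to a file and a slot as follows; the function names are hypothetical and the actual NORSAR file naming is not given in this report.

```python
SECONDS_PER_HOUR = 3600

def slots_per_file(frame_len: int) -> int:
    """One slot per frame, so an hour-long file holds 3600 / frame_len slots."""
    if SECONDS_PER_HOUR % frame_len:
        raise ValueError("frame length must divide one hour evenly")
    return SECONDS_PER_HOUR // frame_len

def diskloop_position(t: int, loop_hours: int = 168, frame_len: int = 10):
    """Map a frame start time (integer epoch seconds) to (file index, slot index)."""
    slots_per_file(frame_len)                               # validates frame_len
    file_index = (t // SECONDS_PER_HOUR) % loop_hours       # hourly files reused cyclically
    slot = (t % SECONDS_PER_HOUR) // frame_len              # frame position within its hour
    return file_index, slot
```

With fixed-size records, the byte offset within the file is simply the slot index multiplied by the record size, so no search is ever needed.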
At the NDC,
the data, now in a NORSAR-style diskloop, is converted to continuous CSS3.0 format.
The CSS formatted data is written to a file system organized by the
date of the data and the station of origin. This file system is the input to the automatic
array processing tasks and also to the archiving process.
A tape robot with a capacity of 30 terabytes reads the data from the CSS file
system and writes an indexed copy to tape, from which the data can easily be
retrieved if needed. NORSAR keeps a comprehensive data archive dating
back to 1971. Segmented data from that period is recorded on half-inch magnetic tapes. Since September 1982, all continuous
data has been recorded on tape. The archive grows by 2.5 gigabytes of data
every day.
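A file system organized by date and station can be addressed with a simple path rule. The year/day-of-year/station layout below is a hypothetical example chosen for illustration; the report does not specify NORSAR's actual directory naming.

```python
from datetime import datetime, timezone

def css_day_dir(root: str, station: str, epoch_seconds: float) -> str:
    """Directory holding one UTC day of CSS3.0 data for one station (hypothetical layout)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{root}/{day:%Y}/{day:%j}/{station}"
```

Both the processing tasks and the archiving process can then locate any station-day from a time and a station code alone.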
In addition
to the Norwegian stations, NORSAR collects data from the Finnish IMS primary
station FINES (PS17) over the internet and from the Swedish HAGFORS array (AS101)
over a TDMA VSAT link. Data from AS101 is then forwarded to Stockholm over
the internet. Figure 5 illustrates the various paths taken by data from
the different sources to the NDC, and from there to its final destination at the IDC.
Figure 5. NORSAR independent subnetwork
Solutions to the challenges
of data communication and data processing for seismic arrays have evolved
over more than 30 years at NORSAR. The principal idea throughout these years
has been to develop solutions that are simple and at the same time robust. The acquisition
and processing tasks have been divided into smaller, more manageable modules,
each concerned with only one part of the entire process. The key
link between the modules is the time of the latest processed data. All
the applications, both those communicating directly with the hardware and
those managing the processing, analysis and storage tasks, are time sequential.
To ensure that the system recovers
when a problem causes one of the subtasks to stop,
UNIX crontabs are used extensively. Because all the subtasks are time sequential,
the crontab processes can easily locate the error and recover from it.
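Because each module records the time of its latest processed data, a check run from cron only needs to compare those times with the current time. The sketch below flags modules that have fallen behind by more than a chosen threshold; the module names, the threshold and the subsequent restart step are hypothetical.

```python
import time
from typing import Dict, List, Optional

def stalled_modules(latest: Dict[str, float], max_lag: float,
                    now: Optional[float] = None) -> List[str]:
    """Names of modules whose latest processed data time lags 'now' by more than max_lag seconds."""
    now = time.time() if now is None else now
    return sorted(name for name, t in latest.items() if now - t > max_lag)
```

A crontab entry could run such a check every few minutes and restart any flagged module, which then resumes from its own latest processed time.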
* The code is the station's designation in the IMS station network. Seismic stations are designated
either PS (primary station) or AS (auxiliary station).
* These terms are used in the IDC manual.