[lug] MEAS Data file format

Fri Aug 11 11:43:17 MDT 2000

A typical problem here at NIST is the exchange of measurement data between
various measurement systems.  Typically, there are a large number of
researchers generating data, most of whom want to spend as little time
as possible coding.  This usually means that each researcher creates
"flat" row and column formatted data and simply assigns some filename that
is supposed to have some meaning.  Unfortunately, filenames convey very
little information about the measurement, its conditions, or purpose, and
to make things worse they can easily get changed.  This results in
collections of files simply containing sets of numbers without any memory
of how or why these data were gathered.

As if we really need yet another file format, I've created the following
file description loosely based on Hewlett Packard's CITIFILE format, and
have tried to make this simple so that it has a chance of being used by
someone who would normally just create a row and column set of data.  The
idea is that the file can be used without modification by most graphing
programs such as gnuplot, but can also convey the information needed for
more careful analysis or report generation.  I figured I'd post this here
to see if anyone has comments, good or bad.  What do you think?

- Wayde
  (wallen at boulder.nist.gov)
----------------------------------------------------------------------------

        Measurement Exchange and Storage (MEAS) File Format
        ===================================================

The Measurement Exchange and Storage (MEAS) file format is designed to be
a simple data exchange and storage mechanism based on the following
principles:

   - the data file should be self-documenting so that you can figure out
     the file's contents without needing external documentation,

   - the structure needs to be highly extensible to deal with changing
     data and contents,

   - the resulting file and the code needed to read it should not be
     "brittle".  In other words, if someone creates an extension to the
     file, this should NOT invalidate old legacy code,

   - and finally, the structure should be as simple to implement as
     possible.

Two key observations make these goals achievable.  The first is that the
most common, and hence "simplest", representation of file data is the
so-called row and column format.  The second is that it
is common practice to write data parsing programs to ignore lines that
start with a special character.  A typical character to use for these
so-called "comment" lines is the '#' symbol.  This structure alone allows
us to write data files in a "natural" form, and to embed extra information
in the form of comment lines.
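
To make the comment-line idea concrete, here is a minimal Python sketch of
the kind of reader described above.  The function name read_columns and
the assumption that every data field is a whitespace-separated number are
mine, not part of the format description.

```python
def read_columns(lines, comment_char="#"):
    """Return data rows as lists of floats, skipping blank and comment lines."""
    rows = []
    for line in lines:
        stripped = line.strip()
        # Blank lines and lines starting with the comment character carry no data.
        if not stripped or stripped.startswith(comment_char):
            continue
        rows.append([float(field) for field in stripped.split()])
    return rows


sample = """\
# Freq(GHz) Magnitude Phase
0.010   0.9996  179.89

0.020   1.0001  179.75
"""
rows = read_columns(sample.splitlines())
print(rows)
```

A reader like this knows nothing about keywords, which is the point: any
extra information placed on '#' lines is invisible to it.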

However, we often want to be able to extract some of the information in
the comment lines for use in processing these data.  This is easily done
by introducing keywords that begin with the # symbol.  For instance, using
a keyword such as '#DATE:' allows us to enter a line in this file that
identifies the location of the date information while still complying with
the comment line idea.  This means that programs written to specifically
look for the #keyword data can do something useful with this information.
Any other program simply treats such lines as comments and ignores them.
This has the added advantage that keywords for additional data can be
added at any time without causing programs not specifically looking for
these keywords to fail.  The current list of established keywords is: 

   #              - Any line beginning with # followed by whitespace is
                    an undefined keyword.  This allows for comments
                    internal to the measurement file itself.

   #BEGIN_TEST    - Marks the beginning of a test set.  This allows more
                    than one test to be included in a single file if
                    desired.

   #END_TEST      - Marks the end of a test block

   #BEGIN_DATA    - Marks the beginning of a data block so more than one
                    set of data can be included for a given test.

   #END_DATA      - Marks the end of a data block

   #COMMENT:      - Comments used to describe the measurement.  These
                    are cumulative, and subsequent comment lines are
                    appended to those that came before in the file.

   #CUSTOMER:     - Who these data belong to

   #DATATYPE:     - Whether the numbers are represented as COMPLEX or
                    MAGPHASE

   #DATE:         - Date the data were taken

   #DEVICE:       - The relevant device ID(s)

   #FILENAME:     - The name of this data file

   #FOLDER:       - The ID number of the measurement folder

   #FREQSCALE:    - The frequency units (Hz, kHz, MHz, GHz, etc.)

   #MANUFACTURER: - The device manufacturer

   #OPERATOR:     - Who was running the equipment that took these data

   #STANDARDS:    - Space separated list of calibration standard IDs used

   #SYSTEM:       - What measurement system was used to obtain these data

   #VERSION:      - The file revision number
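
As a rough illustration of how a program might pick out the "#KEYWORD:"
lines listed above, here is a small Python sketch.  The name read_keywords,
the regular expression, and the "last occurrence wins" rule for repeated
keywords other than #COMMENT: are my assumptions; the description above
only says that comments are cumulative.

```python
import re

# Matches "#KEYWORD: value"; the keyword is upper-case letters and underscores.
KEYWORD_LINE = re.compile(r"^#([A-Z_]+):\s*(.*)$")


def read_keywords(lines):
    """Collect '#KEYWORD: value' lines into a dict.

    COMMENT values accumulate in file order.  For any other keyword the
    last occurrence wins (an assumption, the format does not say).
    """
    header = {}
    comments = []
    for line in lines:
        match = KEYWORD_LINE.match(line.strip())
        if match is None:
            continue  # data, blank line, plain comment, or block marker
        keyword, value = match.group(1), match.group(2).strip()
        if keyword == "COMMENT":
            comments.append(value)
        else:
            header[keyword] = value
    header["COMMENT"] = " ".join(comments)
    return header


sample = """\
#BEGIN_TEST
#DEVICE: 813592
#FREQSCALE: GHz
#COMMENT:  First line of the description,
#COMMENT:  continued on a second line.
# an ordinary comment
0.010   0.9996  179.89
#END_TEST
"""
header = read_keywords(sample.splitlines())
print(header["DEVICE"], header["FREQSCALE"])
print(header["COMMENT"])
```

Block markers such as #BEGIN_TEST have no colon, so this sketch simply
passes over them along with ordinary comments and data rows.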

The following is a sample MEAS data file:

#BEGIN_TEST
#VERSION: HighPower 1.0.0
#DEVICE: 813592
#DATE: Tuesday, April 18, 2000
#FILENAME: stdsdat
#STANDARDS:   814211, 814212, 814214
#CUSTOMER: NIST
#MANUFACTURER:  Hewlett Packard
#OPERATOR: Wayde Allen
#SYSTEM: 6-port
#DATATYPE: MAGPHASE
#FREQSCALE: GHz
#COMMENT:  This file contains measurement data for the gamma_g
#COMMENT:  program.  These data are the result of a compilation
#COMMENT:  of measurements done on the devices by both the 6-port
#COMMENT:  and low frequency impedance labs.
#
#BEGIN_DATA
# Freq(GHz) Short(Magnitude, Phase)  Open(Magnitude,Phase) Load(Magnitude,Phase)
0.010   0.9996  179.89  0.9998  -0.13   0.0007  123.48
0.020   1.0001  179.75  1.0000  -0.31   0.0006  125.05
0.030   1.0000  179.61  1.0000  -0.46   0.0005  88.07
0.040   1.0000  179.48  1.0000  -0.62   0.0007  62.40
0.050   1.0000  179.36  0.9999  -0.78   0.0009  52.96
0.060   0.9996  179.24  0.9999  -0.93   0.0011  45.63
0.070   0.9994  179.10  0.9998  -1.09   0.0011  47.02
#END_DATA
#END_TEST
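
A program that needs complex values rather than the magnitude and phase
pairs of a MAGPHASE file could convert each row roughly as sketched below
in Python.  Note the assumptions: the format description does not state
the phase units, so degrees are assumed here (the sample values near 180
suggest it), and the helper names and the "frequency first, then pairs"
column layout are mine.

```python
import cmath
import math


def magphase_to_complex(magnitude, phase_degrees):
    """Convert a magnitude/phase pair to a complex number (phase in degrees)."""
    return cmath.rect(magnitude, math.radians(phase_degrees))


def row_to_complex(row):
    """Split a row into its frequency and a list of complex values.

    Assumes column 0 is frequency and the rest are (magnitude, phase) pairs.
    """
    freq, rest = row[0], row[1:]
    if len(rest) % 2:
        raise ValueError("expected an even number of magnitude/phase columns")
    pairs = zip(rest[0::2], rest[1::2])
    return freq, [magphase_to_complex(mag, ph) for mag, ph in pairs]


# First data row of the sample file: frequency, then Short, Open, Load.
freq, (short, open_, load) = row_to_complex(
    [0.010, 0.9996, 179.89, 0.9998, -0.13, 0.0007, 123.48]
)
print(freq, short, open_, load)
```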

As can be seen, this structure is human readable, but can also be
easily parsed by a computer program.  The keywords provide standardized
documentation for both the subsequent computer program and the user.  Also,
the computer code needed to read this only needs to look for known keywords
and to discard blank lines or lines beginning with '#'.  Anything else
will be treated as data.  This means that you can add any number of
arbitrary lines or additional keywords without affecting the code used to
read the file.  Only code written to use any new keywords will care.
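
The reading strategy just described can be sketched in Python as follows.
This is only one possible reader, not a reference implementation: the
function name parse_meas, the returned dictionary layout, and the choice
to start a data block implicitly when numbers appear without a
#BEGIN_DATA are all my assumptions.  The #HUMIDITY: keyword in the sample
is hypothetical and stands in for a later extension.

```python
def parse_meas(lines):
    """Parse MEAS text into a list of tests.

    Each test is a dict with 'keywords' (keyword -> list of values, in file
    order) and 'data' (a list of blocks, each a list of rows of floats).
    """
    tests = []
    test = None    # the test currently being filled in
    block = None   # the data block currently being filled in

    def ensure_test():
        nonlocal test
        if test is None:
            test = {"keywords": {}, "data": []}
            tests.append(test)
        return test

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "#BEGIN_TEST":
            test, block = None, None
            ensure_test()
        elif line == "#END_TEST":
            test, block = None, None
        elif line == "#BEGIN_DATA":
            block = []
            ensure_test()["data"].append(block)
        elif line == "#END_DATA":
            block = None
        elif line.startswith("#"):
            name, sep, value = line[1:].partition(":")
            # "#NAME: value" is a keyword; anything else is a plain comment.
            if sep and name.isidentifier() and name.isupper():
                ensure_test()["keywords"].setdefault(name, []).append(value.strip())
        else:
            # Data outside an explicit #BEGIN_DATA still starts a block.
            if block is None:
                block = []
                ensure_test()["data"].append(block)
            block.append([float(field) for field in line.split()])
    return tests


sample = """\
#BEGIN_TEST
#DEVICE: 813592
#HUMIDITY: 40
#BEGIN_DATA
0.010   0.9996  179.89
0.020   1.0001  179.75
#END_DATA
#BEGIN_DATA
0.010   0.9998  -0.13
#END_DATA
#END_TEST
#BEGIN_TEST
#DEVICE: 813593
0.010   0.0007  123.48
#END_TEST
"""
tests = parse_meas(sample.splitlines())
for number, entry in enumerate(tests, start=1):
    print(number, entry["keywords"]["DEVICE"], len(entry["data"]))
```

Code that only cares about #DEVICE: never has to know that #HUMIDITY:
exists, which is the non-brittle behavior the format is after.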