[lug] MEAS Data file format

Fri Aug 11 12:16:32 MDT 2000

Wow.  You've obviously put a lot of thought into this format and it
does seem to achieve your objective, which is the important point.
My question, which is intended for clarity, not criticism, is why you
didn't choose XML.  It would seem that you could create a set of tags
in XML to achieve the same objective, though the file size would be a
bit larger to accommodate all the tags.  My goal in asking is simply
to find out why someone would choose to create a new format over a
standard: XML in this case, perhaps some other standard in another.

Forgive me, I'm a data storage and transfer nut.

--Kevin
kevin at precisonline.com
http://www.precisonline.com
http://www.precisonline.com/gold.htm

-----Original Message-----
From: Wayde Allen <wallen at boulder.nist.gov>
To: List: Boulder Linux User's Group <lug at lug.boulder.co.us>
Cc: List: Front Range Pythoneers <FRPythoneers at lists.tummy.com>;
Cleland, Bob <cleland at afmetcal.af.mil>; Hough, Cristopher
<HOUGH at wylemail.afmetcal.af.mil>
Date: Friday, August 11, 2000 11:49 AM
Subject: [lug] MEAS Data file format

>
>A typical problem here at NIST is the exchange of measurement data
>between various measurement systems.  Typically, there are a large
>number of researchers generating data, most of whom want to spend as
>little time as possible coding.  This usually means that each
>researcher creates "flat" row and column formatted data and simply
>assigns some filename that is supposed to have some meaning.
>Unfortunately, filenames convey very little information about the
>measurement, its conditions, or purpose; and to make things worse,
>they can easily get changed.  This results in collections of files
>simply containing sets of numbers without any memory of how or why
>these data were gathered.
>
>As if we really need yet another file format, I've created the
>following file description loosely based on Hewlett Packard's CITIFILE
>format, and have tried to make this simple so that it has a chance of
>being used by someone who would normally just create a row and column
>set of data.  The idea is that the file can be used by most graphing
>programs such as gnuplot without modification, but can also be used to
>convey information needed for more careful analysis or report
>generation.  I figured I'd post this here to see if anyone has
>comments, good or bad.  What do you think?
>
>
>- Wayde
>  (wallen at boulder.nist.gov)
>----------------------------------------------------------------------------
>
>
>        Measurement Exchange and Storage (MEAS) File Format
>        ===================================================
>
>The Measurement Exchange and Storage (MEAS) format is designed to be a
>simple data exchange and storage mechanism based on the following
>principles:
>
>   - the data file should be self documenting so that you can figure
>     out the file's contents without needing external documentation,
>
>   - the structure needs to be highly extensible to deal with changing
>     data and contents,
>
>   - the resulting file and code needed to read it should not be
>     "brittle".  In other words, if someone creates an extension to
>     the file this should NOT invalidate old legacy code,
>
>   - and finally, the structure should be as simple to implement as
>     possible.
>
>In order to achieve these goals, there are two key observations.  The
>first is that the most common, and hence "simplest", representation of
>file data is the so-called row and column format.  The second is that
>it is common practice to write data parsing programs to ignore lines
>that start with a special character.  A typical character to use for
>these so-called "comment" lines is the '#' symbol.  This structure
>alone allows us to write data files in a "natural" form, and to embed
>extra information in the form of comment lines.
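
A minimal Python sketch of the comment-line idea described above,
assuming whitespace-separated numeric columns.  It shows what a generic
reader (one that knows nothing about keywords) would do: drop blank
lines and anything starting with '#', and keep the rest as rows.  The
function name data_rows and the sample lines are made up for
illustration and are not part of the format description.

```python
def data_rows(lines, comment_char="#"):
    """Return numeric rows, skipping blank lines and comment lines."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith(comment_char):
            continue  # blank line or comment/keyword line: ignore it
        rows.append([float(field) for field in text.split()])
    return rows


# Hypothetical input: two data rows mixed with comment and blank lines.
sample = [
    "#DATE: Tuesday, April 18, 2000",
    "# Freq(GHz) Short(Magnitude, Phase)",
    "0.010   0.9996  179.89",
    "",
    "0.020   1.0001  179.75",
]
rows = data_rows(sample)
```

Because keyword lines also begin with '#', this reader keeps working
no matter how many keywords a file carries.
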
>
>However, we often want to be able to extract some of the information
>in the comment lines for use in processing these data.  This is easily
>done by introducing keywords that begin with the # symbol.  For
>instance, using a keyword such as '#DATE:' allows us to enter a line
>in this file that identifies the location of the date information
>while still complying with the comment line idea.  This means that
>programs written to specifically look for the #keyword data can do
>something useful with this information.  Any other program simply
>treats such lines as comments and ignores them.  This has the added
>advantage that keywords for additional data can be added at any time
>without causing programs not specifically looking for these keywords
>to fail.  The current list of established keywords is:
>
>
>
>   #              - Any line beginning with # followed by whitespace
>                    is an undefined keyword.  This allows for comments
>                    internal to the measurement file itself.
>
>   #BEGIN_TEST    - Marks the beginning of a test set.  This allows
>                    more than one test to be included in a single file
>                    if desired.
>
>   #END_TEST      - Marks the end of a test block
>
>   #BEGIN_DATA    - Marks the beginning of a data block so more than
>                    one set of data can be included for a given test.
>
>   #END_DATA      - Marks the end of a data block
>
>   #COMMENT:      - Comments used to describe the measurement.  These
>                    are cumulative, and subsequent comment lines are
>                    appended to those that came before in the file.
>
>   #CUSTOMER:      - Who these data belong to
>
>   #DATATYPE:      - Are the numbers represented as COMPLEX or MAGPHASE
>
>   #DATE:          - Date the data was taken
>
>   #DEVICE:        - The relevant device ID(s)
>
>   #FILENAME:      - The name of this data file
>
>   #FOLDER:        - The ID number of the measurement folder
>
>   #FREQSCALE:     - The frequency units (Hz, KHz, MHz, GHz, etc.)
>
>   #MANUFACTURER:  - The device manufacturer
>
>   #OPERATOR:      - Who was running the equipment that took these
>                     data
>
>   #STANDARDS:     - Space separated list of calibration standard IDs
>                     used
>
>   #SYSTEM:        - What measurement system was used to obtain these
>                     data
>
>   #VERSION:       - This describes the file revision number
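
To make the keyword convention above concrete, here is a rough Python
sketch of how a keyword-aware program might pull the information out.
The assumptions are mine, not part of the description: keyword names
are taken to be uppercase letters and underscores, and the names
parse_keyword and collect_header are hypothetical helpers.

```python
import re

# "#NAME" alone, or "#NAME: value".  NAME is assumed to be uppercase
# letters and underscores, as in the keyword list above.
KEYWORD_RE = re.compile(r"^#([A-Z_]+)(?::\s*(.*)|\s*)$")


def parse_keyword(line):
    """Return (keyword, value) for a keyword line, otherwise None."""
    match = KEYWORD_RE.match(line.strip())
    if match is None:
        return None  # plain "# ..." comment, blank line, or data row
    return match.group(1), (match.group(2) or "").strip()


def collect_header(lines):
    """Gather keyword values; #COMMENT: lines are appended in order."""
    header = {}
    for line in lines:
        parsed = parse_keyword(line)
        if parsed is None:
            continue
        key, value = parsed
        if key == "COMMENT" and key in header:
            header[key] += " " + value  # cumulative comments
        else:
            header[key] = value
    return header


# Hypothetical input lines in the style of the keyword list.
sample = [
    "#OPERATOR: Wayde Allen",
    "#COMMENT:  This file contains measurement data",
    "#COMMENT:  for the gamma_g program.",
    "# a plain comment, not a keyword",
    "0.010   0.9996  179.89",
]
header = collect_header(sample)
```

A program that only cares about, say, OPERATOR can look up that one key
and never notice keywords added later.
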
>
>The following is a sample meas datafile:
>
>#BEGIN_TEST
>#VERSION: HighPower 1.0.0
>#DEVICE: 813592
>#DATE: Tuesday, April 18, 2000
>#FILENAME: stdsdat
>#STANDARDS:   814211, 814212, 814214
>#CUSTOMER: NIST
>#MANUFACTURER:  Hewlett Packard
>#OPERATOR: Wayde Allen
>#SYSTEM: 6-port
>#DATATYPE: MAGPHASE
>#FREQSCALE: GHz
>#COMMENT:  This file contains measurement data for the gamma_g
>#COMMENT:  program.  These data are the result of a compilation
>#COMMENT:  of measurements done on the devices by both the 6-port
>#COMMENT:  and low frequency impedance labs.
>#
># Freq(GHz) Short(Magnitude, Phase)  Open(Magnitude,Phase)  Load(Magnitude,Phase)
>0.010   0.9996  179.89  0.9998  -0.13   0.0007  123.48
>0.020   1.0001  179.75  1.0000  -0.31   0.0006  125.05
>0.030   1.0000  179.61  1.0000  -0.46   0.0005  88.07
>0.040   1.0000  179.48  1.0000  -0.62   0.0007  62.40
>0.050   1.0000  179.36  0.9999  -0.78   0.0009  52.96
>0.060   0.9996  179.24  0.9999  -0.93   0.0011  45.63
>0.070   0.9994  179.10  0.9998  -1.09   0.0011  47.02
>#END_DATA
>#END_TEST
>
>As can be seen, this structure is human readable, but can also be
>simply parsed by a computer program.  The keywords provide
>standardized documentation for both the subsequent computer program
>and the user.  Also, the computer code needed to read this only needs
>to look for known keywords, and to discard blank lines or lines
>beginning with '#'.  Anything else will be treated as data.  This
>means that you can add any number of arbitrary lines or additional
>keywords without affecting the code used to read the file.  Only code
>written to use any new keywords will care.
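
The reading rule just described (look for known keywords, discard blank
and '#' lines, treat everything else as data) might be sketched in
Python roughly as follows.  This is only an illustration under my own
assumptions: read_meas and the returned dict layout are hypothetical,
data fields are assumed to be whitespace-separated numbers, and since
the sample file has #END_DATA without a matching #BEGIN_DATA, the
sketch opens a data block implicitly when the first row appears.

```python
# Single-valued keywords from the established list.
VALUE_KEYWORDS = {
    "CUSTOMER", "DATATYPE", "DATE", "DEVICE", "FILENAME", "FOLDER",
    "FREQSCALE", "MANUFACTURER", "OPERATOR", "STANDARDS", "SYSTEM",
    "VERSION",
}


def read_meas(lines):
    """Return a list of tests, each {"header": dict, "blocks": list}."""
    tests = []
    test = None   # test currently being filled
    block = None  # data block currently being filled

    def current_test():
        # Open an implicit test if content appears outside #BEGIN_TEST.
        nonlocal test
        if test is None:
            test = {"header": {}, "blocks": []}
            tests.append(test)
        return test

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are discarded
        if line.startswith("#"):
            word, _, value = line[1:].partition(":")
            value = value.strip()
            if word == "BEGIN_TEST":
                test, block = None, None
                current_test()
            elif word == "END_TEST":
                test, block = None, None
            elif word in ("BEGIN_DATA", "END_DATA"):
                block = None  # the next data row opens a fresh block
            elif word == "COMMENT":
                header = current_test()["header"]
                previous = header.get("COMMENT", "")
                header["COMMENT"] = (previous + " " + value).strip()
            elif word in VALUE_KEYWORDS:
                current_test()["header"][word] = value
            # Anything else: plain comment or unknown keyword, ignored.
            continue
        # Not blank and not a '#' line: treat it as a row of numbers.
        if block is None:
            block = []
            current_test()["blocks"].append(block)
        block.append([float(field) for field in line.split()])
    return tests


# Shortened version of the sample file, plus one made-up keyword.
sample = """#BEGIN_TEST
#VERSION: HighPower 1.0.0
#DATATYPE: MAGPHASE
#COMMENT:  This file contains measurement data for the gamma_g
#COMMENT:  program.
#NEWKEY: a keyword this reader does not know
#
# Freq(GHz) Short(Magnitude, Phase)
0.010   0.9996  179.89
0.020   1.0001  179.75
#END_DATA
#END_TEST
"""
tests = read_meas(sample.splitlines())
```

The made-up #NEWKEY: line is skipped without any special handling,
which is the non-brittle behavior the format is after.
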
>
>