[teqc] helpful tip of week 1954

Lou Estey lou at unavco.org
Wed Jun 21 07:20:04 MDT 2017


This week's tip: using '+diag' output to split raw binary data files

I've shared this tip, off and on through the years, with various teqc users.
It is needed only if, for some reason, you have to split up a raw binary data
file into smaller binary data files or extract a part of a raw data file as
another raw data file.  (If you only need to split up the
raw data into RINEX or BINEX, then review tip of week 1917 using the '-tbin'
and '-ast' options, http://postal.unavco.org/pipermail/teqc/2016/002152.html, and
tip of week 1919 using the global time window options '-st', '-e', and the
associated delta options, http://postal.unavco.org/pipermail/teqc/2016/002159.html.)

As most of you probably know by now, teqc generally cannot output raw binary data
formats, unless you want to consider BINEX as a raw binary format.  (There is one
exception and that is Canadian Marconi Corp. (CMC) binary, where at one time we
had to have a streamlined way of decimating 1-second CMC raw data to 10-second CMC
raw data.)  However, with a little effort (and some extra tools), you can learn how
to use teqc's '+diag' output to split or chop up a raw data file into smaller raw
data files.  This can readily be done for most raw formats that teqc can read,
but not all of them.

The basic idea is to make use of the '+diag' option which, as discussed in tip of
week 1939 http://postal.unavco.org/pipermail/teqc/2017/002235.html, has been
generalized (since 12 Aug 2016) to be:

[752] teqc +help | grep diag
         +diag[nostics] .     output parsing and other diagnostics to stdout
         +diag[nostics] ..    output parsing and other diagnostics to stderr
         +diag[nostics] name  output parsing and other diagnostics to file 'name'
         ++diag[nostics] name append parsing and other diagnostics to file 'name'
         -diag[nostics]       don't output parsing and other diagnostics (default)

... and then use that output to break up the file using another tool that can read
a file and write an arbitrary number of bytes, starting from a specified offset
in that file.

For example, suppose you have a Leica MDB file myfile.m00 and execute:

teqc -lei mdb +diag . -O.obs - myfile.m00 > parse.txt

... and then open up the file parse.txt.  You should see lines in it
that start out something like:

Leica MDB frame 0x9cae @ 0o00000000 = 0x00000000 = 00000000   type= 0x65 = 101
Leica MDB frame 0x9cae @ 0o00000212 = 0x0000008a = 00000138   type= 0x66 = 102
Leica MDB frame 0x9cae @ 0o00000740 = 0x000001e0 = 00000480   type= 0x68 = 104
Leica MDB frame 0x9cae @ 0o00001131 = 0x00000259 = 00000601   type= 0x82 = 130
...

These lines show the start positions of the various MDB records as
they are found and CRC/checksum verified.  The position of the start of each
record is shown in octal, hexadecimal, and decimal notation, counted from the
beginning of the file (where the first byte in the file is at location "zero").
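
If you want to convince yourself that the three notations agree, a quick shell
check is enough.  This is only a sketch: the function name frame_offsets is
made up for this illustration, and it assumes a POSIX shell whose printf
accepts C-style octal and hexadecimal constants.

```shell
# Hypothetical helper (not part of teqc): print the octal and hexadecimal
# offset fields of one '+diag' frame line as decimal numbers.
frame_offsets() {
    # $1 = octal field, e.g. 0o00000212;  $2 = hex field, e.g. 0x0000008a
    oct=${1#0o}                       # drop the "0o" prefix used in the output
    printf '%d %d\n' "0$oct" "$2"     # a leading 0 marks octal, 0x marks hex
}
```

Calling frame_offsets 0o00000212 0x0000008a prints "138 138", which matches the
decimal field 00000138 of the second frame line above.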

Now, the '+diag' option does not prevent teqc from converting the file into
RINEX, dumped as usual to stdout, so later in parse.txt you will see some RINEX
lines, for example from the 1-hour long, 1-second sampling MDB file I just ran:

...
Leica MDB frame 0x9cae @ 0o00023607 = 0x00002787 = 00010119   type= 0x71 = 113
Leica MDB frame 0x9cae @ 0o00023756 = 0x000027ee = 00010222   type= 0x78 = 120
      2.11           OBSERVATION DATA    M (MIXED)           RINEX VERSION / TYPE
teqc  2017Jun19     Lou Estey           20170621 13:00:23UTCPGM / RUN BY / DATE
Solaris 5.10|UltraSparc IIIi|cc -m64 SS12.1|=+|*Sparc       COMMENT
BIT 2 OF LLI FLAGS DATA COLLECTED UNDER A/S CONDITION       COMMENT
-Unknown-                                                   MARKER NAME
-Unknown-           -Unknown-                               OBSERVER / AGENCY
451114              LEICA GRX1200GGPRO  4.00/3.010          REC # / TYPE / VERS
                     LEIAT504        NONE                    ANT # / TYPE
   4263867.7193   722560.9803  4672990.9819                  APPROX POSITION XYZ
         0.0000        0.0000        0.0000                  ANTENNA: DELTA H/E/N
      1     1                                                WAVELENGTH FACT L1/2
      0                                                      # / TYPES OF OBSERV
      1.0000                                                 INTERVAL
Default                                                     COMMENT
Project creator:                                            COMMENT
  SNR is mapped to RINEX snr flag value [0-9]                COMMENT
   L1 & L2: min(max(int(snr_dBHz/6), 0), 9)                  COMMENT
   2006     5    10     3     0    0.0000000     GPS         TIME OF FIRST OBS
     14                                                      LEAP SECONDS
                                                             END OF HEADER
  06  5 10  3  0  0.0000000  0 13G07G21G16G15G10G30G06G25R19R20R04R05
                                 R21
Leica MDB frame 0x9cae @ 0o00024756 = 0x000029ee = 00010734   type= 0x78 = 120
...

In the MDB format, each record 120 (hexadecimal 0x78) is a GNSS data record for one epoch,
and upon finding the first one, teqc outputs the RINEX header and the information for
the first epoch.  So we find from the combined output that the first MDB record 120
starts at byte 10222 in my MDB file (where, again, the first byte is at a byte location
of 0 by definition).
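
Rather than scanning parse.txt by eye, you could pull out that first offset
with awk.  This is a minimal sketch, assuming the frame lines have exactly the
layout shown above (decimal offset in the 10th whitespace-separated field,
decimal record type in the last field); the function name first_record_offset
is hypothetical.

```shell
# Hypothetical helper: print the decimal byte offset of the first Leica MDB
# frame of a given record type found in a '+diag' parse file.
first_record_offset() {
    # $1 = parse file, $2 = decimal record type (120 in this example)
    awk -v rec="$2" '
        /^Leica MDB frame/ && $NF + 0 == rec + 0 {
            print $10 + 0     # "+ 0" strips the leading zeros
            exit
        }
    ' "$1"
}
```

For the example output shown above, first_record_offset parse.txt 120 would
print 10222.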

The option '-O.obs -', which is only usable with a '+diag' option, suppresses
output of all GNSS observables.  This doesn't need to be used, but for this particular
application there is no need to see the actual observables; we just want to see the
epoch times mixed in with the parsing results, and using '-O.obs -' reduces the
overall size of the file parse.txt.

Now, let's suppose I want to break off the first 10 minutes' worth of MDB records
into a separate file.  Then in this example I search for epoch 03:10:00:

...
Leica MDB frame 0x9cae @ 0o01111733 = 0x000493db = 00299995   type= 0x78 = 120
  06  5 10  3 10  0.0000000  0 13G07G21G16G15G10G30G06G25R19R20R04R05
                                 R21
...

So the epoch for 03:10:00 is from record 120 which starts at byte 299995.
Therefore I need to split off the first 299995 bytes of my 1-hour MDB file
to make the first 10-minute MDB file from it, which will go from the beginning
of the file at 03:00:00 through to and including the epoch 03:09:59.
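
The same search can be sketched with awk.  This assumes the Leica MDB frame
lines and RINEX 2 epoch lines look exactly like the ones quoted above, and it
matches on the time of day only (it ignores the date, which is fine for a
1-hour file but not for one spanning several days).  The function name
epoch_offset is hypothetical.

```shell
# Hypothetical helper: print the byte offset of the record 120 that
# immediately precedes the RINEX epoch line for a given time of day.
epoch_offset() {
    # $1 = parse file, $2 = hour, $3 = minute, $4 = second (whole numbers)
    awk -v hh="$2" -v mm="$3" -v ss="$4" '
        # remember where the most recent data record (type 120) started
        /^Leica MDB frame/ { if ($NF + 0 == 120) offset = $10 + 0; next }
        # RINEX 2 epoch line: yy mm dd hh mm ss.sssssss flag numsat...
        /^ *[0-9][0-9] +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+\.[0-9]+ +[0-9] / &&
        $4 + 0 == hh + 0 && $5 + 0 == mm + 0 && $6 + 0 == ss + 0 {
            print offset; exit
        }
    ' "$1"
}
```

With the lines quoted above, epoch_offset parse.txt 3 10 0 would print 299995.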

Now, what do you use to do the actual splitting?  Here you're going to have to do
a little digging around or write your own code.  (I wrote a simple C program
to do this years ago for our group.)  On UNIX/Linux, you can try the `dd` command
using options 'bs=1' and then 'skip=...' and/or 'count=...' set to the appropriate
values.
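
As an illustration, the `dd` approach might look like the following.  This is
a sketch and not the C program mentioned above; the function names split_head
and split_tail and the output file names used below are made up.  With 'bs=1'
the block size is one byte, so 'skip=' and 'count=' are counted in bytes.

```shell
# Hypothetical helper: copy the first N bytes of a file (bytes 0 .. N-1).
split_head() {
    # $1 = input file, $2 = output file, $3 = number of bytes to keep
    dd if="$1" of="$2" bs=1 count="$3" 2>/dev/null
}

# Hypothetical helper: copy everything from byte offset N (counting from 0)
# to the end of the file.
split_tail() {
    # $1 = input file, $2 = output file, $3 = number of bytes to skip
    dd if="$1" of="$2" bs=1 skip="$3" 2>/dev/null
}
```

For the example above, split_head myfile.m00 first10min.m00 299995 would write
the part of the file before the 03:10:00 record, and
split_tail myfile.m00 rest.m00 299995 would write the remainder.  Copying one
byte at a time can be slow on large files.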

With a little practice, you'll get the idea.  Our data engineers who handle
data for the UNAVCO archive have been trained to do this for several different
formats and we routinely use this approach if the data file to be archived spans
more than 7 days -- i.e. the file is split into smaller files that are 1 week or
less in length.  We also sometimes have cases where we need to trim off a
problematic part of a raw data file.

Now, even if you split the file perfectly, is this a perfect solution?  No,
because there are other records, like initial metadata records, which are not
going to be part of the file that you've split off in this way.  In my 1-hour
example file, the first 10222 bytes of the file (i.e. everything before the
start of the first record 120) are initial metadata and/or ephemeris records.

And there are also several formats where this splitting cannot easily be done.
Two that come readily to mind are both formats of Ashtech receivers: the U-file
format and the B-file format.  The problem with these is that more than just
file splitting is required to recover usable files.

Happy teqc-ing!

cheers,
--lou

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Louis H. Estey, Ph.D.              office:  [+001] 303-381-7456
UNAVCO, 6350 Nautilus Drive           FAX:  [+001] 303-381-7451
Boulder, CO  80301-5554            e-mail:  lou at unavco.org

"If the universe is the answer, what is the question?"
                                                -- Leon Lederman
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Past helpful tips:

week 1894: using teqc config files - http://postal.unavco.org/pipermail/teqc/2016/002067.html
week 1895: qc of high-rate data - http://postal.unavco.org/pipermail/teqc/2016/002071.html
week 1896: UNIX/Linux shells for Windows - http://postal.unavco.org/pipermail/teqc/2016/002072.html
week 1897: '-' vs. '+' teqc options - http://postal.unavco.org/pipermail/teqc/2016/002076.html
week 1898: auto-identification of formats - http://postal.unavco.org/pipermail/teqc/2016/002092.html
week 1899: auto-identification vs. format flags - http://postal.unavco.org/pipermail/teqc/2016/002096.html
week 1900: square brackets in options - http://postal.unavco.org/pipermail/teqc/2016/002105.html
week 1901: using option '+mds' - http://postal.unavco.org/pipermail/teqc/2016/002108.html
week 1902: qc results w/ problematic nav messages - http://postal.unavco.org/pipermail/teqc/2016/002113.html
week 1903: '-no_orb[it]' and '-no_pos[ition]' options - http://postal.unavco.org/pipermail/teqc/2016/002115.html
week 1904: '-week' option - http://postal.unavco.org/pipermail/teqc/2016/002117.html
week 1905: using '+bcf' for XYZ/geodetic conversion - http://postal.unavco.org/pipermail/teqc/2016/002126.html
week 1906: the '+v[erify]' option - http://postal.unavco.org/pipermail/teqc/2016/002128.html
week 1907: '+C2', '+L5', '+L6', '+L7', '+L8', '+all' options - http://postal.unavco.org/pipermail/teqc/2016/002130.html
week 1908: getting RINEX doppler and L2 - http://postal.unavco.org/pipermail/teqc/2016/002131.html
week 1909: using paths w/ file names - http://postal.unavco.org/pipermail/teqc/2016/002132.html
week 1910: the (un)importance of file names - http://postal.unavco.org/pipermail/teqc/2016/002133.html
week 1911: notices, warnings, and errors - http://postal.unavco.org/pipermail/teqc/2016/002134.html
week 1912: the '-max_rx_SVs' option - http://postal.unavco.org/pipermail/teqc/2016/002137.html
week 1913: the end of '++igs' and '+igs' - http://postal.unavco.org/pipermail/teqc/2016/002140.html
week 1914: splicing together RINEX files - http://postal.unavco.org/pipermail/teqc/2016/002144.html
week 1915: using '-O.int' and '-O.dec' - http://postal.unavco.org/pipermail/teqc/2016/002145.html
week 1916: '+doy' option - http://postal.unavco.org/pipermail/teqc/2016/002146.html
week 1917: '-tbin' and '-ast' options - http://postal.unavco.org/pipermail/teqc/2016/002152.html
week 1918: mp12 RMS before/after Oct 2013 - http://postal.unavco.org/pipermail/teqc/2016/002158.html
week 1919: the global windowing options - http://postal.unavco.org/pipermail/teqc/2016/002159.html
week 1920: '-M.dec' and '-N.dec' options - http://postal.unavco.org/pipermail/teqc/2016/002163.html
week 1921: combining time filtering options - http://postal.unavco.org/pipermail/teqc/2016/002176.html
week 1922: helping me (or someone else on the list) help you - http://postal.unavco.org/pipermail/teqc/2016/002187.html
week 1923: the "build" line - http://postal.unavco.org/pipermail/teqc/2016/002190.html
week 1924: the qc '-w[idth]' option - http://postal.unavco.org/pipermail/teqc/2016/002193.html
week 1925: qc with explicit time windowing - http://postal.unavco.org/pipermail/teqc/2016/002194.html
week 1926: the '+rx_state' option - http://postal.unavco.org/pipermail/teqc/2016/002200.html
week 1927: the '-O.sum' option - http://postal.unavco.org/pipermail/teqc/2016/002204.html
week 1928: the '+meta' and '+mds' options - http://postal.unavco.org/pipermail/teqc/2016/002206.html
week 1930: more on '-O.sum' - http://postal.unavco.org/pipermail/teqc/2017/002207.html
week 1931: the '-O.s[ystem]' option - http://postal.unavco.org/pipermail/teqc/2017/002208.html
week 1932: leap seconds - http://postal.unavco.org/pipermail/teqc/2017/002215.html
week 1936: what you can (and shouldn't) do in a RINEX obs file - http://postal.unavco.org/pipermail/teqc/2017/002229.html
week 1938: the '+psp' option - http://postal.unavco.org/pipermail/teqc/2017/002231.html
week 1939: the '+diag' option - http://postal.unavco.org/pipermail/teqc/2017/002235.html
week 1951: '-n_<system>' and SV filtering options - http://postal.unavco.org/pipermail/teqc/2017/002277.html
week 1953: more with '+diag' option - http://postal.unavco.org/pipermail/teqc/2017/002287.html
