[lug] REGEX DEVELOPMENT TOOL

Anthony Foiani tkil at scrye.com
Tue May 14 14:05:29 MDT 2013


Gordon Golding <gordongoldin at aim.com> writes:

> Legacy systems send info - client info, transactions, info about
> suppliers for the transaction, etc.. in packed text files.  Basic
> structure is known, but there are so many variants, even from one
> client.

I agree with another poster earlier in the thread that you probably
want to split this into two halves: first, normalize all messages into
a single unified format (can be XML or whatever); second, process that
unified format.

That is, don't try to embed parsing logic into your processor:
structure your solution into two pieces (parser/normalizer and then
business logic).

(In the heavy iron world, the first step is known as "ETL": extract,
transform/translate, load.  The idea is to take whatever particular
input you're given, transform it to a standard format, then load it
into a database representation of that standard format.  After the
data is loaded, the business logic works only with the data in the
DB.)
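
Here is a rough sketch of that split in Python.  I haven't seen your
data, so the two record layouts, the field names, and the function
names (normalize, total_cents) are all made up for illustration:

```python
import re

# Hypothetical variant A: "CLIENT|<id>|<dollars>.<cents>"
VARIANT_A = re.compile(
    r"CLIENT\|(?P<client_id>\d+)\|(?P<dollars>\d+)\.(?P<cents>\d{2})"
)
# Hypothetical variant B: "C", a 6-digit id, then the amount as 8 digits of cents
VARIANT_B = re.compile(r"C(?P<client_id>\d{6})(?P<cents>\d{8})")


def normalize(line):
    """Stage 1: turn any known variant into one unified record (a dict)."""
    m = VARIANT_A.fullmatch(line)
    if m:
        cents = int(m["dollars"]) * 100 + int(m["cents"])
        return {"client_id": int(m["client_id"]), "cents": cents}
    m = VARIANT_B.fullmatch(line)
    if m:
        return {"client_id": int(m["client_id"]), "cents": int(m["cents"])}
    raise ValueError(f"unrecognized record: {line!r}")


def total_cents(records):
    """Stage 2: business logic only ever sees the unified format."""
    return sum(r["cents"] for r in records)
```

A new variant then means one more branch in normalize(); nothing in
total_cents() (or any other business logic) has to change.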

> So there needs to be a flexible library of regexs which can be
> re-used and extended.

It's not clear exactly what you mean by "extended" here.

As for "re-use", most regex-wielding languages allow for one regex to
be embedded in another, or for two regexes to be concatenated.
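
In Python, for instance, patterns are just strings until you compile
them, so both operations are ordinary string handling.  A small
sketch (the DATE and AMOUNT formats and the "TXN[...]" layout are
invented for the example):

```python
import re

# Two small building blocks (hypothetical formats).
DATE = r"\d{4}-\d{2}-\d{2}"
AMOUNT = r"\d+\.\d{2}"

# Concatenation: join the pattern strings, here with a literal comma between.
DATE_THEN_AMOUNT = re.compile(DATE + r"," + AMOUNT)

# Embedding: substitute the smaller patterns into a larger one,
# wrapping each in a named group so the pieces can be pulled out later.
TRANSACTION = re.compile(rf"TXN\[(?P<date>{DATE})\] (?P<amount>{AMOUNT})")

m = TRANSACTION.fullmatch("TXN[2013-05-14] 12.50")
```

After this runs, m["date"] is "2013-05-14" and m["amount"] is
"12.50".  One thing to watch with f-strings: a quantifier such as
{4} written directly in the template (rather than inside a
substituted variable) has to be doubled to {{4}}.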

> Are there any tools better than regex?  Like a powerful parser tool
> - like the front end of a compiler?

lex or flex, then yacc or bison.

Both steps are substantially more complicated than "just regex"
(although lex and flex are themselves built on regexes: their token
rules are written as regular expressions.)
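
To give a feel for what the lexing half does, here is a toy
tokenizer in Python.  This is not flex, only an illustration of the
idea of "a list of regex rules in, a stream of tokens out"; the
token names and rules are invented:

```python
import re

# Each rule is (token name, regex).  Earlier rules win when several match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("EQUALS", r"="),
    ("SKIP", r"\s+"),  # whitespace: matched, then thrown away
]
# One big alternation, with a named group per rule.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token name, matched text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

tokenize("qty = 12") returns [("NAME", "qty"), ("EQUALS", "="),
("NUMBER", "12")].  A yacc/bison-style grammar would then consume
that token stream instead of raw characters.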

> What about the best tool to develop and manage a tree of Regexs?

If you really can organize all your input as a tree of regexes, then
you might very well want to try the flex/bison route.

> Like the way a code management system gives you the tree - I could
> see the parents and siblings and easily see differences, so I could
> easily visualize and grab "this from branch A and this from C and
> quickly create my hybrid".

I don't know of any GUI that will do what you want.

As a start, I would recommend creating a simple library of regex
"atoms" that match the lowest-level items in your data stream; then
you can concatenate them to handle a particular stream of data.
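
Something along these lines, again in Python.  The atom names, their
formats, the two layouts, and the build() helper are all guesses on
my part, since I don't know what your fields actually look like:

```python
import re

# Library of lowest-level "atoms" (hypothetical field formats).
ATOMS = {
    "client_id": r"\d{6}",
    "date": r"\d{8}",            # e.g. YYYYMMDD
    "amount": r"\d+\.\d{2}",
    "supplier": r"[A-Z]{3}\d{3}",
}


def build(fields, sep=""):
    """Concatenate the named atoms, in order, into one compiled pattern.

    Each atom becomes a named group; sep is a literal separator string.
    """
    parts = [f"(?P<{name}>{ATOMS[name]})" for name in fields]
    return re.compile(re.escape(sep).join(parts))


# Two made-up record layouts assembled from the same atoms.
LAYOUT_A = build(["client_id", "date", "amount"])
LAYOUT_B = build(["date", "supplier", "amount"], sep="|")

record = LAYOUT_B.fullmatch("20130514|ABC123|12.50").groupdict()
```

Here record ends up as {"date": "20130514", "supplier": "ABC123",
"amount": "12.50"}.  Fixing an atom (say, allowing longer client
ids) fixes every layout that uses it, and a new variant is just a
new list of atom names.
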

If this is still not clear, a few examples of the kind of input (and
variation in input) that you're seeing would probably let us help you
better.

Good luck,
Tony

p.s. No need to shout... all caps is rarely necessary.


