[lug] parsing tool for linux

Bear Giles bgiles at coyotesong.com
Sun Apr 7 21:51:10 MDT 2002


> Well, yes, I realize it is a tokenizer; but, I want what a *C* compiler
> thinks is a token, not what I think a C compiler thinks is a token. 

This is what confuses me.  The C tokens were documented in K&R, and in
countless documents since then.  There's not much room for uncertainty
here, and there are plenty of tokenizers out there.  For instance, look
at any of the decent code colorizers.

> I would really
> like to know what C is treating as names of variables and functions for a
> given source file, and that means that C has to know what is and is not a
> function, variable, operator, etc.

You could always compile the source with "-g" and examine the object
file....

With the tokenizer approach, once you've decided what to do with the
preprocessor, the rest is pretty straightforward - handle strings,
character literals and comments, then everything else is either an
operator, a keyword, a numeric constant, or an identifier.  Operators
and keywords are well documented, but figuring out whether an
identifier is a function or a variable is non-trivial.
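
To make the "pretty straightforward" part concrete, here is a rough
sketch of such a tokenizer.  It assumes the input has already been run
through the preprocessor, uses a deliberately partial keyword list,
and treats every operator as a single character.  The names
(next_token, tok_kind, skip_blank) are made up for illustration and
don't come from any existing tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes produced by this sketch. */
enum tok_kind { TOK_EOF, TOK_KEYWORD, TOK_IDENT, TOK_NUMBER,
                TOK_STRING, TOK_CHAR, TOK_OPERATOR };

/* Partial keyword list, for illustration only. */
static const char *const keywords[] = {
    "char", "else", "for", "if", "int", "return", "struct", "void", "while"
};

static int is_keyword(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Skip whitespace and both comment styles. */
static const char *skip_blank(const char *p)
{
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;             /* step over the closing delimiter */
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p && *p != '\n')
                p++;
        } else {
            return p;
        }
    }
}

/* Copy the next token from *src into buf (at most n-1 characters,
   n must be at least 1) and advance *src past it. */
enum tok_kind next_token(const char **src, char *buf, size_t n)
{
    const char *p = skip_blank(*src);
    const char *start = p;
    enum tok_kind kind;
    size_t len;

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        kind = TOK_IDENT;           /* may become TOK_KEYWORD below */
    } else if (isdigit((unsigned char)*p)) {
        while (isalnum((unsigned char)*p) || *p == '.')
            p++;                    /* crude: covers 42, 0x1f, 1.5, 10UL */
        kind = TOK_NUMBER;
    } else if (*p == '"' || *p == '\'') {
        char quote = *p++;
        while (*p && *p != quote) {
            if (*p == '\\' && p[1])
                p++;                /* skip the escaped character */
            p++;
        }
        if (*p)
            p++;                    /* closing quote */
        kind = (quote == '"') ? TOK_STRING : TOK_CHAR;
    } else {
        p++;                        /* single-character operator */
        kind = TOK_OPERATOR;
    }

    len = (size_t)(p - start);
    if (len >= n)
        len = n - 1;
    memcpy(buf, start, len);
    buf[len] = '\0';
    if (kind == TOK_IDENT && is_keyword(buf))
        kind = TOK_KEYWORD;
    *src = p;
    return kind;
}
```

Call next_token() in a loop until it returns TOK_EOF.  Note that all
it can ever say about a name is TOK_IDENT; telling a function from a
variable needs an actual parser that tracks declarations.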

Consider this code:

  struct a { int a; };

  struct a * a(struct a *a) { return a; }

What's "a"?  Yet this is legal code (try it!), and fairly simple at
that since I didn't have nested structures.
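
For anyone who wants to try it, here is the same code with a small
caller added.  The demo() function is a hypothetical addition for
illustration; the comments mark which "a" is which.  It works because
C keeps structure tags, structure members, and ordinary identifiers
(functions, variables, parameters) in separate name spaces, and the
parameter simply hides the function name inside the function body.

```c
struct a { int a; };        /* tag "a", and a member named "a" */

/* A function "a" that takes a parameter "a" and returns a pointer
   to struct "a".  Inside the body, "a" means the parameter. */
struct a *a(struct a *a) { return a; }

/* Hypothetical caller, not part of the original example. */
int demo(void)
{
    struct a obj = { 42 };  /* "a" as a tag */
    struct a *p = a(&obj);  /* "a" as a function */
    return p->a;            /* "a" as a member */
}
```

A tokenizer sees the same identifier token in every one of those
positions; only the surrounding grammar says which kind of "a" it is.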

Bear


