[lug] parsing tool for linux
Bear Giles
bgiles at coyotesong.com
Sun Apr 7 21:51:10 MDT 2002
> Well, yes, I realize it is a tokenizer; but, I want what a *C* compiler
> thinks is a token, not what I think a C compiler thinks is a token.
This is what confuses me. The C tokens were documented in K&R, and in
countless documents since then. There's not much room for uncertainty
here, and there are countless tokenizers out there. For instance, look
at any of the decent code colorizers.
> I would really
> like to know what C is treating as names of variables and functions for a
> given source file, and that means that C has to know what is and is not a
> function, variable, operator, etc.
You could always compile the source with "-g" and examine the object
file....
With the tokenizer approach, once you've decided what to do with the
preprocessor, the rest is pretty straightforward - handle strings,
character literals and comments, then everything else is either an
operator, a keyword, or an identifier. Operators and keywords are
well documented, but figuring out whether an identifier is a function
or variable is non-trivial.
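To make that concrete, here is a rough sketch of such a tokenizer. It is
only an illustration under some assumptions: the input is already
preprocessed, only /* */ comments are handled, operators are treated as
single characters, the keyword list is partial, and I've added a number
class that I didn't mention above. All the names (tok_kind, next_token,
count_kind) are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes for the sketch (names are illustrative only). */
enum tok_kind { TOK_EOF, TOK_COMMENT, TOK_STRING, TOK_CHAR, TOK_NUMBER,
                TOK_KEYWORD, TOK_IDENT, TOK_OPERATOR };

/* Partial keyword list, just enough for the example. */
static const char *const keywords[] = {
    "char", "else", "for", "if", "int", "return", "struct", "void", "while"
};

int is_keyword(const char *s, size_t n)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    return 0;
}

/* Length of a quoted literal starting at s[0] == quote, honoring backslashes. */
size_t quoted_len(const char *s, char quote)
{
    size_t i = 1;
    while (s[i] != '\0' && s[i] != quote) {
        if (s[i] == '\\' && s[i + 1] != '\0')
            i++;                      /* skip the escaped character */
        i++;
    }
    return s[i] == quote ? i + 1 : i; /* unterminated: stop at end of input */
}

/*
 * Scan one token from *p after skipping whitespace. On return, *start
 * points at the token, *len is its length, and *p is advanced past it.
 */
enum tok_kind next_token(const char **p, const char **start, size_t *len)
{
    const char *s = *p;
    enum tok_kind kind;
    size_t n;

    while (isspace((unsigned char)*s))
        s++;
    *start = s;

    if (*s == '\0') {
        kind = TOK_EOF; n = 0;
    } else if (s[0] == '/' && s[1] == '*') {
        const char *end = strstr(s + 2, "*/");
        n = end ? (size_t)(end - s) + 2 : strlen(s);
        kind = TOK_COMMENT;
    } else if (*s == '"') {
        n = quoted_len(s, '"'); kind = TOK_STRING;
    } else if (*s == '\'') {
        n = quoted_len(s, '\''); kind = TOK_CHAR;
    } else if (isdigit((unsigned char)*s)) {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '.')
            n++;
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
        kind = is_keyword(s, n) ? TOK_KEYWORD : TOK_IDENT;
    } else {
        n = 1; kind = TOK_OPERATOR;   /* single-character operators only */
    }
    *len = n;
    *p = s + n;
    return kind;
}

/* Count the tokens of one kind in src. */
size_t count_kind(const char *src, enum tok_kind want)
{
    const char *start;
    size_t len, count = 0;
    enum tok_kind k;

    while ((k = next_token(&src, &start, &len)) != TOK_EOF)
        if (k == want)
            count++;
    return count;
}
```

Calling count_kind(src, TOK_IDENT) tells you how many identifiers are in
a piece of source, but nothing at this level says whether a given one is
a function, a variable, a tag or a member. That part needs a parser.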
Consider this code:
struct a { int a; };
struct a * a(struct a *a) { return a; }
What's "a"? Yet this is legal code (try it!), and fairly simple at
that since I didn't have nested structures.
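The reason it compiles is that C keeps struct tags, struct members, and
ordinary identifiers (functions, variables, parameters) in separate name
spaces, and a parameter is allowed to hide a function of the same name
inside the function body. Here is the same code with each "a" labelled,
plus a small caller. The roundtrip function is something I made up just
to exercise it.

```c
/* "a" as a struct tag, and "a" as a member of that struct. */
struct a { int a; };

/*
 * "a" as a function name, and "a" as a parameter. Inside the body the
 * parameter hides the function, so "return a" returns the pointer.
 */
struct a *a(struct a *a) { return a; }

/* Hypothetical caller: uses the tag, the function, and the member. */
int roundtrip(int v)
{
    struct a s;          /* tag name space */
    s.a = v;             /* member name space */
    return a(&s)->a;     /* function call, then member access */
}
```

A tokenizer reports every one of those as the identifier "a". Only the
surrounding grammar tells you which role each one plays.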
Bear