[lug] XEmacs quoting madness!
Tkil
tkil at scrye.com
Mon Aug 20 23:49:11 MDT 2001
>>>>> "dajo" == dajo <David> writes:
dajo> (concat
dajo>  ;; beginning of line + initial quote
dajo>  (bas-regcomp-leading-anchor) ; "^"
dajo>  (bas-lispify-string (bas-codestring-quote)) ; "\\\"" see below
dajo>  ;; software bit
dajo>  (bas-regcomp-group-open) ; "\\("
dajo>  (bas-make-character-set (list ; "[^\\\"]"
dajo>                           (bas-regcomp-diy-not)
dajo>                           (bas-regcomp-quote)))
dajo>  (bas-regcomp-one-or-many) ; "+"
dajo>  (bas-regcomp-group-close) ; "\\)"
dajo>  ;; intervening commas and quotes
dajo>  (bas-make-character-set (list ; "[\\\",]"
dajo>                           (bas-regcomp-quote)
dajo>                           (bas-codestring-comma)))
dajo>  (bas-regcomp-one-or-many) ; "+"
dajo>  ;; description
dajo>  (bas-regcomp-group-open) ; "\\("
dajo>  (bas-make-character-set (list ; "[^\\\"]"
dajo>                           (bas-regcomp-diy-not)
dajo>                           (bas-regcomp-quote)))
dajo>  (bas-regcomp-one-or-many) ; "+"
dajo>  (bas-regcomp-group-close) ; "\\)"
dajo>  ;; rest of line.
dajo>  (bas-regcomp-any-character) ; "."
dajo>  (bas-regcomp-zero-or-many) ; "*"
dajo>  (bas-regcomp-trailing-anchor) ; "$"
dajo>  )
letting my mind wander, i see two things i'd change about this.
first, i'd use scoping (let ...) to use shorter names (i like
terseness!). (having full namespaces available would be nice, too;
see below.)
second, i'd use function calls of some sort to handle the groups and
possibly the quantifiers. you already do this for creating charsets;
i'd just suggest extending it for creating groups, and any other
constructs that have to be "balanced". [1] it might also make the
quantifiers a little more "readable", in a way:
| (one-or-more (charset comma quote))
this flipping of the postfix to a prefix notation seems to fit better
for reading code out loud (which, oddly enough, i've often found to be
a good measure of self-documenting code).
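a minimal sketch of that prefix style, assuming one-argument wrappers around the postfix operators. the names tkil-re-one-or-more and tkil-re-zero-or-more are made up for illustration; the charset helper has the same shape as the one defined further down.

```elisp
;; hypothetical helper names, not part of the original code.
(defun tkil-re-one-or-more (re)
  "Return RE followed by the `+' quantifier."
  (concat re "+"))

(defun tkil-re-zero-or-more (re)
  "Return RE followed by the `*' quantifier."
  (concat re "*"))

(defun tkil-re-charset (&rest chars)
  "Return a bracket expression matching any one of CHARS."
  (concat "[" (mapconcat 'identity chars "") "]"))

;; reads out loud as "one or more of the charset comma, quote"
(tkil-re-one-or-more (tkil-re-charset "," "\""))
```

one caveat: the quantifier only binds to the last item of RE, so anything longer than a single character or charset would need to be wrapped in a group first.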
sick little monkey: i could even use this idea for a whole-line
matcher. [2] Jeffrey Friedl also talks about standard patterns for
dealing with matched delimiters such as quoted material; this type of
functional building technique should be a near-perfect match for
that.
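as a rough illustration, here is one common shape for a delimited-text pattern (opening delimiter, then "normal" characters interleaved with escaped characters, then closing delimiter), built from small functions. this is my reading of the general idiom, not a quote from friedl. tkil-re-shy-group and tkil-re-delimited are hypothetical names, and the sketch assumes a regexp engine with shy groups "\\(?:...\\)", as current GNU Emacs has.

```elisp
;; hypothetical helpers, for illustration only.
(defun tkil-re-shy-group (re)
  "Wrap RE in a group that does not record a submatch."
  (concat "\\(?:" re "\\)"))

(defun tkil-re-delimited (delim escape)
  "Return a regexp matching text enclosed in a pair of DELIM characters.
ESCAPE may precede any character inside, including DELIM.
Both arguments are assumed to be ordinary one-character strings."
  (let ((normal (concat "[^" delim escape "]"))       ; neither delim nor escape
        (special (concat (regexp-quote escape) "."))) ; escape plus any one char
    (concat (regexp-quote delim)
            normal "*"
            (tkil-re-shy-group (concat special normal "*")) "*"
            (regexp-quote delim))))

;; a double-quoted string with backslash escapes
(tkil-re-delimited "\"" "\\")
```

because the nesting of the calls mirrors the nesting of the pattern, the closing "\\)" and the trailing delimiter cannot be forgotten.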
further magic would make use of macros for this, but my lisp is a bit
too rusty to venture there. i suspect that most of the "mapconcat"
stuff, in particular, could go away.
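short of macros, one possible way to drop the mapconcat calls: since every fragment is already a string, (apply 'concat ...) does the same job as (mapconcat 'identity ... ""). tkil-re-seq is a hypothetical name used only for this sketch.

```elisp
;; hypothetical helper: glue regexp fragments together in order.
(defun tkil-re-seq (&rest res)
  "Concatenate the regexp fragments RES."
  (apply #'concat res))

;; the charset builder, written without mapconcat.
(defun tkil-re-charset (&rest chars)
  "Return a bracket expression matching any one of CHARS."
  (concat "[" (apply #'concat chars) "]"))

(tkil-re-seq "^" (tkil-re-charset "\"" ",") "$")
```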
taken together, this would allow for an interesting representation of
the slightly tree-like nature of this particular regex.
| (defun tkil-re-group (re)
|   (concat "\\(" re "\\)"))
|
| (defun tkil-re-charset (&rest chars)
|   (concat "[" (mapconcat 'identity chars "") "]"))
|
| (defun tkil-re-inverse-charset (&rest chars)
|   (concat "[^" (mapconcat 'identity chars "") "]"))
|
| (defun tkil-re-quantify (re min max)
|   (concat re
|           (cond ((and (equal min 0) (equal max 1)) "?")
|                 ((and (equal min 0) (equal max 'infinity)) "*")
|                 ((and (equal min 1) (equal max 1)) "")
|                 ((and (equal min 1) (equal max 'infinity)) "+")
|                 ;; general case: min and max are numbers (or 'infinity),
|                 ;; so format them instead of passing them to concat
|                 (t (format "\\{%d,%s\\}" min
|                            (if (eq max 'infinity) "" max))))))
|
| (defun tkil-re-silly ()
|   (interactive)
|   (insert
|    (let ((quote "\\\"")
|          (comma ",")
|          (beginning-of-line "^")
|          (end-of-line "$")
|          (any-char "."))
|      (concat beginning-of-line
|              quote
|              (tkil-re-group
|               (tkil-re-quantify
|                (tkil-re-inverse-charset quote) 1 'infinity))
|              (tkil-re-quantify
|               (tkil-re-charset quote comma) 0 'infinity)
|              (tkil-re-group
|               (tkil-re-quantify
|                (tkil-re-inverse-charset quote) 1 'infinity))
|              quote
|              (tkil-re-quantify any-char 0 'infinity)
|              end-of-line))))
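to see what that builds, here is the text tkil-re-silly should insert, worked out by hand from the definitions above rather than captured from a real run, and tried against a made-up sample line. tkil-re-silly-expected and tkil-re-silly-fields are hypothetical names.

```elisp
;; hypothetical constant: the regexp tkil-re-silly is expected to insert,
;; written out by hand as a lisp string.
(defconst tkil-re-silly-expected
  "^\\\"\\([^\\\"]+\\)[\\\",]*\\([^\\\"]+\\)\\\".*$")

(defun tkil-re-silly-fields (line)
  "Return the two captured fields of LINE as a list, or nil if no match."
  (when (string-match tkil-re-silly-expected line)
    (list (match-string 1 line) (match-string 2 line))))

;; made-up sample line with two quoted, comma-separated fields
(tkil-re-silly-fields "\"emacs\",\"an editor\"")
```

note that inside [...] emacs regexps treat a backslash as an ordinary character, so the charsets built from the quote fragment exclude backslashes as well as double quotes.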
if we don't care about namespace pollution, we could strip the
"tkil-re-" off the front of all the functions, and it becomes even
more readable:
| (concat beginning-of-line
|         quote
|         (group
|          (quantify
|           (inverse-charset quote) 1 'infinity))
|         (quantify
|          (charset quote comma) 0 'infinity)
|         (group
|          (quantify
|           (inverse-charset quote) 1 'infinity))
|         quote
|         (quantify any-char 0 'infinity)
|         end-of-line)
trying to go one step further, i hit an interface problem. i tried to
create:
| (defun tkil-re-match-one-of (&rest res)
|   (mapconcat 'identity res "\\|"))
but should it force a grouping? to really match its name, it should;
doing that would lose the one-to-one correspondence between
"tkil-re-group" and submatches that we'd otherwise have. note that
there's a workaround for this in perl: the (?:...)
group-but-don't-save option.
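for what it's worth, the same workaround can be sketched in emacs lisp, assuming a regexp engine with shy groups "\\(?:...\\)" (current GNU Emacs has them). tkil-re-shy-group is a hypothetical helper; the alternation is wrapped so that it groups without using up a submatch number.

```elisp
;; hypothetical helper: group without recording a submatch.
(defun tkil-re-shy-group (re)
  "Wrap RE in a non-capturing (shy) group."
  (concat "\\(?:" re "\\)"))

(defun tkil-re-match-one-of (&rest res)
  "Return a regexp matching any one of RES, without adding a submatch."
  (tkil-re-shy-group (mapconcat #'identity res "\\|")))

;; the capturing group after the alternation is still submatch 1
(let ((s "dogs"))
  (when (string-match
         (concat "^" (tkil-re-match-one-of "cat" "dog") "\\(s\\)$") s)
    (match-string 1 s)))
```

with this, only explicit calls to "tkil-re-group" create numbered submatches, so the one-to-one correspondence is kept.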
in a related vein, take a look at what Abigail did to generate a
regular expression that matches (a superset of all) URLs. the
background is at:
http://www.foad.org/~abigail/Perl/url2.html
the program is at:
http://www.foad.org/~abigail/Perl/url3.pl
and the actual output is at:
http://www.foad.org/~abigail/Perl/url3.regex
t.
[1] this lets lisp scoping close things properly for you, instead of
relying on the programmer to remember the appropriate closing tag.
as a similar example, the perl CGI.pm module allows you to build
up nested HTML with a similar trick:
    $Q->table( { -columns => 2,
                 -border => 1,
                 -rules => 'rows' },
               $Q->Tr( { -valign => 'top' },
                       [ map $Q->td($_),
                         [ $arch_label, $arch_control ],
                         [ $model_label, $model_control ],
                         [ $product_label, $product_control ] ] ) ),
i haven't quite managed to get Tr(td([...])) to do what i want it
to, but i think i'm just being dense. also, note that "Tr" is
capitalized oddly because "tr" is already taken by perl's
transliteration operator (tr///).
[2] using this idea for matching lines, we add:
| (defun tkil-re-whole-line (&rest res)
|   (concat "^" (mapconcat 'identity res "") "$"))
and then change the main building code to be:
| (tkil-re-whole-line
|  quote
|  (tkil-re-group
|   (tkil-re-quantify
|    (tkil-re-inverse-charset quote) 1 'infinity))
|  (tkil-re-quantify
|   (tkil-re-charset quote comma) 0 'infinity)
|  (tkil-re-group
|   (tkil-re-quantify
|    (tkil-re-inverse-charset quote) 1 'infinity))
|  quote
|  (tkil-re-quantify any-char 0 'infinity))