[lug] XEmacs quoting madness!

Tkil tkil at scrye.com
Mon Aug 20 23:49:11 MDT 2001


>>>>> "dajo" == dajo  <David> writes:

dajo> (concat
dajo>  ;; beginning of line + initial quote
dajo>  (bas-regcomp-leading-anchor)                         ; "^"
dajo>  (bas-lispify-string  (bas-codestring-quote))         ; "\\\"" see below
 
dajo>  ;; software bit
dajo>  (bas-regcomp-group-open)                             ; "\\("
dajo>  (bas-make-character-set  (list                       ; "[^\\\"]"
dajo>                            (bas-regcomp-diy-not)
dajo>                            (bas-regcomp-quote)))
dajo>  (bas-regcomp-one-or-many)                            ; "+"
dajo>  (bas-regcomp-group-close)                            ; "\\)"
 
dajo>  ;; intervening commas and quotes
dajo>  (bas-make-character-set  (list                       ; "[\\\",]"
dajo>                            (bas-regcomp-quote)
dajo>                            (bas-codestring-comma)))
dajo>  (bas-regcomp-one-or-many)                            ; "+"
 
dajo>  ;; description
dajo>  (bas-regcomp-group-open)                             ; "\\("
dajo>  (bas-make-character-set  (list                       ; "[^\\\"]"
dajo>                            (bas-regcomp-diy-not)
dajo>                            (bas-regcomp-quote)))
dajo>  (bas-regcomp-one-or-many)                            ; "+"
dajo>  (bas-regcomp-group-close)                            ; "\\)"
 
dajo>  ;; rest of line.
dajo>  (bas-regcomp-any-character)                          ; "."
dajo>  (bas-regcomp-zero-or-many)                           ; "*"
dajo>  (bas-regcomp-trailing-anchor)                        ; "$"
dajo>  )

letting my mind wander, i see two things i'd change about this.
first, i'd use scoping (let ...) to use shorter names (i like
terseness!).  (having full namespaces available would be nice, too;
see below.)

second, i'd use function calls of some sort to handle the groups and
possibly the quantifiers.  you already do this for creating charsets;
i'd just suggest extending it for creating groups, and any other
constructs that have to be "balanced". [1] it might also make the
quantifiers a little more "readable", in a way:

|  (one-or-more (charset comma quote))

this flipping of the postfix to a prefix notation seems to fit better
for reading code out loud (which, oddly enough, i've often found to be
a good measure of self-documenting code).
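
a minimal sketch of what those prefix-style quantifiers could look like.  the names (tkil-re-one-or-more and friends) are made up for illustration and are not part of dajo's code; each one just appends the quantifier to a regex fragment passed in as a string:

```elisp
;; hypothetical prefix-style quantifiers; each takes a regex fragment
;; (a string) and returns it with the quantifier appended.
(defun tkil-re-one-or-more (re)
  "Return RE followed by the `+' quantifier."
  (concat re "+"))

(defun tkil-re-zero-or-more (re)
  "Return RE followed by the `*' quantifier."
  (concat re "*"))

(defun tkil-re-optional (re)
  "Return RE followed by the `?' quantifier."
  (concat re "?"))

;; reads as "one or more of: comma or quote"
(tkil-re-one-or-more "[,\"]")
```

note that this is only safe when the fragment is a single character, a charset, or a group; a longer fragment would need to be grouped first, since the quantifier binds to the last item only.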

sick little monkey: i could even use this idea for a whole-line
matcher.  [2] Jeffrey Friedl also talks about standard patterns for
dealing with matched delimiters such as quoted material; this type of
functional building technique should be a near-perfect match for
that.

further magic would make use of macros for this, but my lisp is a bit
too rusty to venture there.  i suspect that most of the "mapconcat"
stuff, in particular, could go away.

taken together, this would allow for an interesting representation of
the slightly tree-like nature of this particular regex.

| (defun tkil-re-group (re)
|   (concat "\\(" re "\\)"))
| 
| (defun tkil-re-charset (&rest chars)
|   (concat "[" (mapconcat 'identity chars "") "]"))
| 
| (defun tkil-re-inverse-charset (&rest chars)
|   (concat "[^" (mapconcat 'identity chars "") "]"))
| 
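| (defun tkil-re-quantify (re min max)
|   ;; min is an integer; max is an integer or the symbol 'infinity
|   (concat re
|           (cond ((and (equal min 0) (equal max 1))         "?")
|                 ((and (equal min 0) (equal max 'infinity)) "*")
|                 ((and (equal min 1) (equal max 1))         "")
|                 ((and (equal min 1) (equal max 'infinity)) "+")
|                 ;; general case: \{min,max\}, or \{min,\} for no upper bound
|                 (t (concat "\\{" (number-to-string min) ","
|                            (if (numberp max) (number-to-string max) "")
|                            "\\}")))))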
| 
| (defun tkil-re-silly ()
|   (interactive)
|   (insert
|    (let ((quote             "\\\"")
|          (comma             ",")
|          (beginning-of-line "^")
|          (end-of-line       "$")
|          (any-char          "."))
|      (concat beginning-of-line
|              quote
|              (tkil-re-group
|               (tkil-re-quantify
|                (tkil-re-inverse-charset quote) 1 'infinity))
|              (tkil-re-quantify
|               (tkil-re-charset quote comma) 0 'infinity)
|              (tkil-re-group
|               (tkil-re-quantify
|                (tkil-re-inverse-charset quote) 1 'infinity))
|              quote
|              (tkil-re-quantify any-char 0 'infinity )
|              end-of-line))))
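
to see what a pattern of this shape actually captures, here is a small sketch that applies it with string-match.  the regex is a hand-written copy of the shape built above, and the names tkil-re-sample and tkil-re-sample-fields are made up for illustration:

```elisp
;; hand-written copy of the pattern shape built above.
;; `tkil-re-sample' and `tkil-re-sample-fields' are hypothetical names.
(defvar tkil-re-sample
  "^\\\"\\([^\\\"]+\\)[\\\",]*\\([^\\\"]+\\)\\\".*$"
  "Regex with two groups: the software name and its description.")

(defun tkil-re-sample-fields (line)
  "Return the two captured fields of LINE as a list, or nil if no match."
  (when (string-match tkil-re-sample line)
    (list (match-string 1 line)
          (match-string 2 line))))

(tkil-re-sample-fields "\"emacs\",\"editor\"")   ; => ("emacs" "editor")
```

one thing this makes visible: because the quote string carries its own backslash, the charsets built from it exclude the backslash character as well as the double quote.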

if we don't care about namespace pollution, we could strip the
"tkil-re-" off the front of all the functions, and it becomes even
more readable:

|      (concat beginning-of-line
|              quote
|              (group
|               (quantify
|                (inverse-charset quote) 1 'infinity))
|              (quantify
|               (charset quote comma) 0 'infinity)
|              (group
|               (quantify
|                (inverse-charset quote) 1 'infinity))
|              quote
|              (quantify any-char 0 'infinity )
|              end-of-line)
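
one way to get the short names without the pollution is to bind them locally.  this is only a sketch, assuming an emacs that ships cl-lib (so that cl-flet is available); the function name tkil-re-silly-short and the variable name dquote are made up for illustration:

```elisp
(require 'cl-lib)

(defun tkil-re-silly-short ()
  "Build the regex using short helper names that are local to this function."
  ;; cl-flet binds these names only within its body, so nothing
  ;; called `group' or `charset' leaks into the global namespace.
  (cl-flet ((group (re) (concat "\\(" re "\\)"))
            (one-or-more (re) (concat re "+"))
            (zero-or-more (re) (concat re "*"))
            (charset (&rest chars)
              (concat "[" (apply #'concat chars) "]"))
            (inverse-charset (&rest chars)
              (concat "[^" (apply #'concat chars) "]")))
    (let ((dquote "\\\"")
          (comma  ","))
      (concat "^" dquote
              (group (one-or-more (inverse-charset dquote)))
              (zero-or-more (charset dquote comma))
              (group (one-or-more (inverse-charset dquote)))
              dquote
              ".*" "$"))))
```

the helpers here also use (apply #'concat ...) in place of the mapconcat-with-identity idiom; the two give the same string when every argument is a string.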

trying to go one step further, i hit an interface problem.  i tried to
create:

| (defun tkil-re-match-one-of (&rest res)
|   (mapconcat 'identity res "\\|"))

but should it force a grouping?  to really match its name, it should,
since "\\|" binds more loosely than anything else and would otherwise
swallow the neighboring pieces of the regex.  but doing that would
lose the one-to-one correspondence between "tkil-re-group" and
submatches that we'd otherwise have.  note that there's a workaround
for this in perl: the (?:...) group-but-don't-save option.
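
emacs regexes have an equivalent, the "shy" group \(?: ... \).  a sketch of the alternation helper built on it, assuming an emacs whose regex engine supports shy groups (GNU Emacs 21 and later do; i have not checked which XEmacs versions do).  the name tkil-re-match-one-of-shy is made up for illustration:

```elisp
;; hypothetical variant: wrap the alternatives in a shy group so the
;; alternation is contained but no submatch number is consumed.
(defun tkil-re-match-one-of-shy (&rest res)
  "Return a non-capturing group matching any one of the regexes RES."
  (concat "\\(?:" (mapconcat 'identity res "\\|") "\\)"))

(tkil-re-match-one-of-shy "foo" "bar")   ; => "\\(?:foo\\|bar\\)"
```

with this, every call to "tkil-re-group" still lines up with exactly one numbered submatch, because the shy group is skipped in the numbering.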

in a related vein, take a look at what Abigail did to generate a
regular expression that matches (a superset of all) URLs.  the
background is at:

   http://www.foad.org/~abigail/Perl/url2.html

the program is at:

   http://www.foad.org/~abigail/Perl/url3.pl

and the actual output is at:

   http://www.foad.org/~abigail/Perl/url3.regex

t.

[1] this lets lisp scoping close things properly for you, instead of
    relying on the programmer to remember the appropriate closing tag.
    as a similar example, the perl CGI.pm module allows you to build
    up nested HTML with a similar trick:

           $Q->table( { -columns => 2,
                        -border  => 1,
                        -rules   => 'rows' },
                      $Q->Tr( { -valign => 'top' },
                              [ map $Q->td($_),
                                [ $arch_label,    $arch_control    ],
                                [ $model_label,   $model_control   ],
                                [ $product_label, $product_control ] ] ) ),

    i haven't quite managed to get Tr(td([...])) to do what i want it
    to, but i think i'm just being dense.  also, note that "Tr" is
    capitalized oddly because "tr" is already perl's transliteration
    operator.

[2] using this idea for matching lines, we add:

    | (defun tkil-re-whole-line (&rest res)
    |   (concat "^" (mapconcat 'identity res "") "$"))

    and then change the main building code to be:

    |      (tkil-re-whole-line
    |        quote
    |        (tkil-re-group
    |         (tkil-re-quantify
    |          (tkil-re-inverse-charset quote) 1 'infinity))
    |        (tkil-re-quantify
    |         (tkil-re-charset quote comma) 0 'infinity)
    |        (tkil-re-group
    |         (tkil-re-quantify
    |          (tkil-re-inverse-charset quote) 1 'infinity))
    |        quote
    |        (tkil-re-quantify any-char 0 'infinity ))


