Tuesday, February 12, 2008

Scripting in CL

CL has been overlooked as a scripting language, even though scripting in CL is faster than it is in traditional scripting languages. CL's interactivity and macro system make it ideal for quick-and-dirty hacking.

For one thing, its syntax is so simple that almost anything can be typed in at the CL command-line:
CL-USER> 23
23
That's a valid CL "program". It returns 23. So say that you forgot how part of the language works; resolve the problem by typing some test code at the command-line without breaking your concentration. For example, say you forget how division works in CL; you may have forgotten whether CL has support for rationals. Don't look it up, just try it out immediately:
CL-USER> (/ 23 45)
23/45
Sure enough, it doesn't return a float! This interactivity isn't just useful for exploring (or remembering) basic language features. It's also how you script in CL.

For example, here is how to write a CL version of the popular Unix utility grep, which you might then use as the basis for a log analyzer or some other script. Start by working interactively on a small part of the problem, such as how to see if a string contains a regexp. Of course our goal is to pick lines out of a file, but let's ignore the file part for now, we have to walk before we run. First define some sample data:
CL-USER> (defparameter *lines* "hi there
my name is bob
I live in Kalamazoo")
Now google for how to use regular expressions in CL. All roads lead to CL-PPCRE. To install it, do this:
CL-USER> (require :asdf-install) ; if you haven't already
CL-USER> (asdf-install:install :cl-ppcre) ; if you don't have it already
To use it, do this:
CL-USER> (require :cl-ppcre)
CL-USER> (use-package :cl-ppcre) ; to save typing cl-ppcre: a lot
Next use SLIME's autocomplete to figure out what functions are available:
CL-USER> (cl-ppcre:<TAB>
Flags: boundp fboundp generic-function class macro special-operator package

Completion: Flags: Score:
---------------------------------- ------- --------
cl-ppcre: b------ 98.44
cl-ppcre-test: b------ 90.11
cl-ppcre:*allow-named-registers* b------ 0.98
cl-ppcre:*allow-quoting* b------ 0.98
cl-ppcre:*regex-char-code-limit* b------ 0.98
cl-ppcre:*use-bmh-matchers* b------ 0.98
cl-ppcre:all-matches -f----- 0.98
cl-ppcre:all-matches-as-strings -f----- 0.98
cl-ppcre:create-scanner -fg---- 0.98
cl-ppcre:define-parse-tree-synonym -f--m-- 0.98
cl-ppcre:do-matches -f--m-- 0.98
cl-ppcre:do-matches-as-strings -f--m-- 0.98
cl-ppcre:do-register-groups -f--m-- 0.98
cl-ppcre:do-scans -f--m-- 0.98
cl-ppcre:parse-tree-synonym -f----- 0.98
cl-ppcre:ppcre-error ---c--- 0.98
...etc...
Although this technique may seem haphazard, in practice one finds what one wants in about a minute or two (at most), typically without losing one's concentration. Most of the functions can be trivially filtered out simply based on their names (e.g., PARSE-TREE-SYNONYM is obviously not what I'm looking for). Of the rest, you can either quickly check their documentation (C-c C-d d if you're using SLIME) or else, heck, just try them out immediately. In this particular case I settled on SCAN.
CL-USER> (scan "Kalamazoo" *line*)
34
43
#()
#()
It doesn't matter what those return values mean (probably the offset of the match is in there somewhere), the key point is that SCAN would have returned NIL if there hadn't been a match. Or so we hope. We had better verify that (quickly; don't look at the documentation, just try it!):
CL-USER> (scan "snord" *line*)
NIL

Ok, now how do we read in a line from a file? Well, first we have to open the file:
CL-USER> (open "/etc/motd")
#<SB-SYS:FD-STREAM for "file /etc/motd" {A78C2E9}>
That looks promising. A little more research on Google shows that it's best to use WITH-OPEN-FILE because that will automatically close the file (even if an exception is thrown).
CL-USER> (with-open-file (stream "/etc/motd"))
NIL
How do you read a line from a file? Turns out there's a function READ-LINE:
CL-USER> (with-open-file (stream "/etc/motd") (read-line stream nil nil))
"Linux D830 2.6.22-14-generic #1 SMP Fri Feb 1 04:59:50 UTC 2008 i686"
Ok, so we can read lines from a file, now what? Well, let's play around with the LOOP macro to print out all the lines of the file (don't worry about the regexp matching just yet):
CL-USER> (defparameter *motd* (open "/etc/motd"))
*MOTD*
CL-USER> (loop for line = (read-line *motd* nil nil)
while line
do (format t "~A~%" line))

Linux D830 2.6.22-14-generic #1 SMP Fri Feb 1 04:59:50 UTC 2008 i686

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Ok, that worked. Now let's scan for all the lines that contain the word "Ubuntu".
CL-USER> (defparameter *motd* (open "/etc/motd"))
*MOTD*
CL-USER> (loop for line = (read-line *motd* nil nil)
while line
do (when (scan "Ubuntu" line)
(format t "~A~%" line)))

The programs included with the Ubuntu system are free software;
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
Ok, that looks good. Now let's write a function with everything we just did:
(defun grep (regexp filename)
(with-open-file (stream filename)
(loop for line = (read-line stream nil nil)
while line
do (when (scan regexp line)
(format t "~A~%" line)))))
CL-USER> (grep "Ubuntu" "/etc/motd")
The programs included with the Ubuntu system are free software;
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
At this point you've seen what a typical CL scripting session is like, working from the bottom up.

Now the above was a pretty simplistic example, and it used (apart from CL-PPCRE) good old fashioned CL, which hasn't changed since the mid-1980s. Since then, scripting languages like Perl have been invented by smart people like Larry Wall, having many built-in features that are missing from stock CL. Why suffer without them? It would seem that you have to either choose between a general-purpose, compiled language like CL or purpose-built, interpreted Perl. Fortunately, this is not the case! I've found that each piece of Perl functionality I've wanted has taken on the order of 45 minutes to 1 hour to implement using CL's powerful DEFMACRO facility.

Let's take, for example, Perl's magic <> operator. In Perl, you can write:
#!/usr/bin/env perl
# file: "foo.pl"

for $line (<>) {
print $line;
}
Which will slurp up all the filenames on the command-line and let you iterate over each line. E.g.,
% foo.pl /etc/motd /etc/resolv.conf
would print out those files, sort of like cat. Wouldn't it be nice to have a similar feature in CL? Let's start by taking our GREP function and adding the ability to customize the action taken on each line:
(defun do-lines (fn filename)
(with-open-file (stream filename)
(loop for line = (read-line stream nil nil)
while line
do (when (scan regexp line)
(funcall fn line)))))
DO-LINES is almost exactly the same as GREP, except that it just calls a function on each line of the file rather than assume grep-like behavior. To verify that we can implement GREP in terms of DO-LINES we try the following at the command-line:
CL-USER> (do-lines (lambda (line)
(when (scan "Ubuntu" line) (format t "~A~%" line))) "/etc/motd")

The programs included with the Ubuntu system are free software;
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
Isn't that great? Baaaarf. Not really, eh? Kind of verbose compared to the Perl code, and it doesn't even handle multiple files like the Perl code does!

What we'd really like to be able to do is something CL-esque yet approximately equal to the Perl code in brevity. This sounds like a good use for a CL macro. Let's start by just fantasizing about what we'd like:
(with-lines-from-files (line "/etc/motd" "/etc/resolv.conf") 
(format t "~A~%" line)) ; just a simple example, could put anything here
Well, before going further, we need to enhance DO-LINES to take more than one file:
(defun do-all-lines (fn &rest filenames)
(dolist (filename-or-list filenames)
;; For convenience we allow you also to pass in lists of filenames
(dolist (filename (typecase filename-or-list
(cons filename-or-list)
(t (list filename-or-list))))
(with-open-file (stream filename)
(loop
for line = (read-line stream nil nil)
while line
do (funcall fn line))))))
Now we just need to write a macro so that you don't have to write all that crazy LAMBDA stuff:
(defmacro with-lines-from-files ((var &rest filenames) &body body)
`(do-all-lines (lambda (,var) ,@body) ,@filenames))
Armed with this macro, we can re-write GREP with way fewer lines (and the ability to grep across multiple files):
(defun grep (regexp &rest filenames)
(with-lines-from-files (line filenames)
(when (scan regexp line)
(format t "~A~%" line))))
CL-USER> (grep "Ubuntu|name" "/etc/motd" "/etc/resolv.conf")
The programs included with the Ubuntu system are free software;
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
nameserver 68.87.69.146
nameserver 68.87.85.98
If it weren't for macros, we'd be stuck implementing grep with the clunky use of LAMBDA:
(defun clunky-grep (regexp &rest filenames)
(do-all-lines (lambda (line)
(when (scan regexp line)
(format t "~A~%" line)))
filenames))
The LAMBDA-free version of GREP is much easier to read, it almost reads like English: "with lines from the files, when lines scan the regexp, format the line". Whereas the LAMBDA version reads "with lines, LAMBDA (huh!!!?), when lines scan the regexp, format the line, across the filenames". No contest!

Similarly, other Perl operations can be easily implemented with macros:
;; The following macro is mostly like Perl's foreach, in the sense
;; that you can pass in as many references to sequences or "scalars"
;; as you want and it will iterate over them and allow you to modify
;; them. Unlike the Perl code, it sets the variable IT to each
;; element rather than $_. Also, you have to just pass in the hash
;; table directly, not a flattened list of hash keys.
(defmacro perl-foreach ((&rest refs) &body body)
(let* ((gensyms (loop repeat (length refs) collect (gensym))))
(list*
'let
(mapcar #'list gensyms refs)
(loop
for ref in refs
and indirect-ref in gensyms
collect
`(typecase ,indirect-ref
(hash-table
(maphash #'(lambda (key value)
(declare (ignore value))
(symbol-macrolet ((it (gethash key ,indirect-ref)))
,@body))
,indirect-ref))
((and (or vector list) (not string))
(map-into ,indirect-ref
#'(lambda (it)
,@body
it)
,indirect-ref))
(t
(symbol-macrolet ((it ,ref))
,@body)))))))

;; trim whitespace in the scalar, the list, the array, and all the
;; values in the hash
(perl-foreach (scalar my-list my-array my-hash)
(setf it (regex-replace "^\\s+" it ""))
(setf it (regex-replace "\\s+$" it "")))
Or:
(defmacro perl-splice (sequence-place &optional (offset 0) length replacement-sequence)
(let* ((seq (gensym "SEQUENCE-PLACE-"))
(off-arg (gensym "OFFSET-ARG-"))
(off (gensym "OFFSET-"))
(len (gensym "LENGTH-"))
(end (gensym "END-"))
(rep (gensym "REPLACEMENT-SEQUENCE-"))
(left-part (list `(subseq ,seq 0 ,off)))
(right-part (when length
(list `(subseq ,seq ,end)))))
`(let* ((,seq ,sequence-place)
(,off-arg ,offset)
(,off (if (minusp ,off-arg)
(+ (length ,seq) ,off-arg)
,off-arg))
(,len ,length)
(,end (when ,len
(if (minusp ,len)
(+ (length ,seq) ,len)
(+ ,off ,len))))
(,rep ,replacement-sequence))
(prog1 (subseq ,seq ,off ,end)
(when (or ,rep (not (eql ,off ,end)))
(setf ,sequence-place (concatenate (typecase ,seq
(cons 'list)
(t 'vector))
,@left-part
,rep
,@right-part)))))))

;; Now the syntax is almost exactly the same.
(setf front (perl-splice my-array 0 n))
(setf end (perl-splice my-array 0 (- n)))

Avoiding fallback in distributed systems

As previously mentioned , I was recently able to contribute to the Amazon Builders' Library . I'd also like to share another post t...