*sometimes*need to do to achieve excellent performance. While the paper concentrates on C, I was wondering if some of the same effects could be observed in a high-level language like CL (for the record, I think CL is

*both*high- and low-level, but whatever...) I did a quick experiment which

*suggests*that at least one of Ulrich's optimizations works in CL. [Note: the following is

*not*an attempt to produce mathematically correct results, as Ulrich did; I just wanted to see if the order in which memory was accessed seemed to matter.]:

(declaim (optimize (speed 3) (safety 0)))Note the only difference between the two functions is the order in which the elements of the 2nd array are accessed. After compiling the above file I did the following from the CL prompt:

(defun matrix-multiply-fast? ()

(let ((m1 (make-array '(1000 1000) :element-type 'fixnum))

(m2 (make-array '(1000 1000) :element-type 'fixnum))

(m3 (make-array '(1000 1000) :element-type 'fixnum)))

(loop repeat 100

do

(loop for i upto 1000

do

(loop for j upto 1000

do

(setf (aref m3 i j) (* (aref m1 i j) (aref m2 i j))))))))

(defun matrix-multiply-slow? ()

(let ((m1 (make-array '(1000 1000) :element-type 'fixnum))

(m2 (make-array '(1000 1000) :element-type 'fixnum))

(m3 (make-array '(1000 1000) :element-type 'fixnum)))

(loop repeat 100

do

(loop for i upto 1000

do

(loop for j upto 1000

do

(setf (aref m3 i j) (* (aref m1 i j) (aref m2 j i))))))))

CL-USER> (time (matrix-multiply-fast?))Assuming the above is all correct, it sort-of duplicates Ulrich's result where accessing an array in a column-major order is slower (see figure 6.1 for an illustration). Now, I have to admit that, for lack of time, I didn't read Ulrich's paper in great depth and kind of lazily jumped to section six. I'd be curious to know if this sort of optimization is already well-known in the CL world (and/or for other "high-level" programming languages). Some casual googling turned up an excellent paper about optimizing CL via type annotations, but I didn't see anyone directly addressing Ulrich's points. I'd be curious if any readers of this blog know more.Evaluation took:

2.776 seconds of real time

2.508157 seconds of user run time

0.048003 seconds of system run time

[Run times include 0.028 seconds GC run time.]

0 calls to %EVAL

0 page faults and

12,000,160 bytes consed.

CL-USER> (time (matrix-multiply-slow?))Evaluation took:

3.926 seconds of real time

3.632227 seconds of user run time

0.016001 seconds of system run time

0 calls to %EVAL

0 page faults and

12,008,280 bytes consed.