This article covers fallback strategies and why we almost never use them at Amazon. In the world of distributed systems, fallback strategies are among the most difficult challenges to handle, especially for time-sensitive services. Compounding this difficulty is that bad fallback strategies can take a long time (even years) to leave repercussions, and the difference between a good strategy and a bad strategy is subtle. In this article, the focus will be on how fallback strategies can cause more problems than they fix. We’ll include examples of where fallback strategies have caused problems at Amazon. Finally, we’ll discuss alternatives to fallback that we use at Amazon.
A Nickel's Worth
Thursday, December 19, 2019
Avoiding fallback in distributed systems
Challenges with distributed systems
Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s impossible always to know whether something failed. This article reviews the concepts that contribute to why distributed computing is so, well, weird.
Thursday, May 26, 2016
Please Don't Write Bruby
This doc is about how to write Ruby code instead of writing hybrid yuck Bruby code. Ruby code, is, well, Ruby code; i.e., code written in Ruby and not some other language. I strongly believe that once you have chosen to write code in Ruby, you should try to keep writing code in Ruby. Specifically, please don't write Bruby, which is an unholy mishmash of Bash and Ruby. Bruby is hard to read, flaky, slower (usually), and always harder to reason about. It's even harder to write. There is almost no reason to shell out to Bash from Ruby.
Here's a canonical example of Bruby code:
output = `somecommand | egrep -v '^snord[123]' | sort | uniq` if $? == 0 # ...do stuff with 'output' such as: foo output end
There are a number of problems with this code. For one thing, it's unnecessary to use Bash in the first place. Ruby is just as good at grepping, regular expressions, sorting and uniqifying. The more you mix different programming languages, the more the reader of the code has to switch gears mentally. As a frequent code-reviewer, I implore you: please don't make your readers switch gears mentally (unless there's a really good reason), it causes brain damage after a while. Also, Bash pipelines easily hide bugs and mask failures. By default, Bash ignores errors from an "interior" command in a pipeline, as the following code illustrates:
def lower_nuclear_plant_control_rods # The following line will *not* raise an exception, even though it # will barf. rods = `cat /etc/plants/springfield/control_rods | sort --revesre | uniq` # 'rods' is now an empty string and $? will be 0, indicating that # everything is ok, when it isn't. if $? != 0 # This will never happen. Millions of lives will be lost in the # ensuing calamity. page_operator "Springfield Ops Center", :sev_1, "Meltdown Imminent" else rods.split.each do |rod| lower rod end end end
Despite the subtly bad arguments to sort
, the above code won't raise
an exception, and will fail to lower the control rods and also fail
to notify the operators (because $?
will be 0
). Thus the town of
Springfield will be wiped off the map. Do you want the same type of
errors to accidentally blow away all of your production databases from
an errant invocation of some admin script?
One way to fix the above would be to add the pipefail
option into
the string of Bash code:
rods = `set -o pipefail; cat /etc/plants/springfield/control_rods | sort --revesre | uniq`
But that's easy to forget, difficult to mechanistically catch, and ugly to read. The better solution is to remember that you can write Ruby in Ruby:
def lower_nuclear_plant_control_rods File.readlines('/etc/plants/springfield/control_rods').sort.reverse.uniq do |rod| lower rod end rescue page_operator "Springfield Ops Center", :sev_1, "Meltdown Imminent" raise end
The above code is shorter, easier to read (no switching mental gears) and is guaranteed to raise exceptions if something is wrong. The specific changes are:
cat
- Ruby's good at opening files, just use
File.readlines
if you want to read a file line-by-line. sort
- Ruby Enumerables have
sort
built in. --reverse
- Use Enumerable's built-in
reverse
method. uniq
- Use Enumerable's
uniq
method. - Error-handling
- Any errors in the invocation (for example
misspelling
reverse
asrevesre
) will result in an exception being raised).
Tips
The rest of this document contains a series of tips to help you write Ruby instead of Bruby.
Tip 1: Google for Pure Ruby Bash Equivalents
Whenever you're tempted to shell out in a Ruby script, stop, flagellate yourself 23 times with a birch branch, and then Google for an alternative in Pure Ruby. For example, imagine your script must create a directory, and not fail if the directory already exists, and also create any intermediary directories. But you don't want to write that code yourself because it's yucky and complicated and you know you'll fail to handle some edge condition properly. If you're familiar with Unix, your first inclination might be to write Bruby:
system "mkdir -p '#{ROOT_DIR}'" if $? != 0 raise "Unable to create directory #{ROOT_DIR}" end
The above is clunky. It's also easy to forget to check the value of
$?
, in which case your code will silently continue even though the
directory was not created. Fortunately, this is easy to fix. Just
Google for it (after flagellating yourself).
You will see that Ruby has a built-in version of mkdir -p
.
require 'fileutils' FileUtils.mkdir_p ROOT_DIR
This version is shorter and easier to read, and, most importantly, raises an exception if anything went wrong:
irb(main):666:0> FileUtils.mkdir_p "/bogon/adfadf" Errno::EACCES: Permission denied @ dir_s_mkdir - /bogon from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:253:in `mkdir' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:253:in `fu_mkdir' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:227:in `block (2 levels) in mkdir_p' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:225:in `reverse_each' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:225:in `block in mkdir_p' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:211:in `each' from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:211:in `mkdir_p' from (irb):666 from /usr/local/bin/irb:11:in `<main>'
This obnoxious error might be obnoxious, but it also might save your life.
Tip 2: Keep unavoidable Bash usage to a minimum
Sometimes it does make sense to shell out. For example, it is hard to
find a Ruby equivalent of the command-line dig
DNS utility. In these
situations, don't throw out the baby with the bathwater; keep the Bash
to a minimum. For example, in Bruby, you would write:
dig_command = "dig +qr www.springfield.us any -x 127.0.0.1 isc.org ns +noqr" mail_server = `#{dig_command} | egrep -w MX | awk '{print $6}'`.chomp
Whereas in Ruby, by contrast, you should use Bash only to run dig
,
everything else (such as processing the standard output of dig
)
should be done in Ruby. The built-in Open3
module makes this
straightforward in many cases:
require 'open3' output, error, status = Open3.capture3 dig_command
Open3.capture3
returns a stream containing the standard output of
running dig_command
, as well as the status of running the command.
The next tip covers how to actually replicate the rest of the above
pipeline that populated the mail_server
variable.
Tip 3: Use Enumerable to replace Bash pipelines
A distinguishing feature of Bash is its ability to chain commands together using pipes, as illustrated in the previous tip:
mail_server = `#{dig_command} | egrep -w MX | awk '{print $6}'`.chomp
Most of the time you can replicate pipelines using Ruby's built-in
Enumerable
module. Almost everything you think might be an
Enumerable
actually is an Enumerable
: arrays, strings, open
files, etc. In particular, if you're trying to convert Bruby to Ruby,
you can use methods like IO.popen
, or better yet the methods in
Open3
, to get an Enumerable
(or else a string, which can be
converted into one with split
) over the standard output of that
process. From there, you can take advantage of methods such as
Enumerable.grep
(which, for example, seamlessly handles regular
expressions).
In those cases where Enumerable
itself doesn't immediately solve the
problem, you have the Ruby programming language itself at your beck
and call. For example, many of the features of Awk can be found
directly in Ruby (if you did enough programming language research
you'd probably dig up some indirect connection between the two
languages, but that shall be left as an exercise for the reader).
Here are all the equivalents of each part of the above Bash pipeline.
Bruby | Ruby |
---|---|
`#{dig_command}` |
Open3.capture3(dig_command) |
egrep -w MX |
split("\n").grep(/\WMX\W/) |
awk '{print $6}' |
split[5] |
Putting it all together:
require 'open3' dig_command = "dig +qr www.springfield.us any -x 127.0.0.1 isc.org ns +noqr" o, e, s = Open3.capture3 dig_command mail_server = if s.success? o.split("\n").grep(/\WMX\W/)[0].split[5] else raise "Failed: #{dig_command}\n#{e}" end
Here are some other common mappings from Bash to Enumerable
methods:
Bash | Enumerable |
---|---|
head -n 2 |
take(2) |
tail -n 2 |
reverse_each.take(2) |
sort |
sort |
uniq |
to_a.uniq |
grep |
grep |
grep -v |
grep_v |
Thursday, April 14, 2016
A Guide to Naming Variables
Names Considered Useful
Software is written for people to understand; variable names should be chosen accordingly. People need to comb through your code and understand its intent in order to extend or fix it. Too often, variable names waste space and hinder comprehension. Even well-intentioned engineers often choose names that are, at best, only superficially useful. This document is meant to help engineers choose good variable names. It artificially focuses on code reviews because they expose most of the issues with bad variable names. There are, of course, other reasons to choose good variable names (such as improving code maintenance). This document is also a work-in-progress, please send me any constructive feedback you might have on how to improve it.
Why Name Variables?
The primary reason to give variables meaningful names is so that a human can understand them. Code written strictly for a computer could just as well have meaningless, auto-generated names1:
int f1(int a1, Collection<Integer> a2) { int a5 = 0; for (int a3 = 0; a3 < a2.size() && a3 < a1; a3++) { int a6 = a2.get(a3); if (a6 >= 0) { System.out.println(a6 + " "); } else { a5++; } } System.out.println("\n"); return a5; }
All engineers would recognize the above code is needlessly difficult
to understand, as it violates two common guidelines: 1) don't
abbreviate, and 2) give meaningful names. Perhaps surprisingly, these
guidelines can be counter-productive. Abbreviation isn't always bad,
as will be discussed later. And meaningful is vague and subject to
interpretation. Some engineers think it means that names should always
be verbose (such as MultiDictionaryLanguageProcessorOutput
). Others
find the prospect of coming up with truly meaningful names daunting,
and give up before putting in much effort. Thus, even when trying to
follow the above two rules, a coder might write:
int processElements(int numResults, Collection<Integer> collection) { int result = 0; for (int count = 0; count < collection.size() && count < numResults; count++) { int num = collection.get(count); if (num >= 0) { System.out.println(num + " "); } else { result++; } } System.out.println("\n"); return result; }
Reviewers could, with effort, understand the above code more easily than the first example. The variable names are accurate and readable. But they're unhelpful and waste space, because:
processElements
- most code "processes" things (after all, code
runs on a "processor"), so
process
is seven wasted characters that mean nothing more that "compute".Elements
isn't much better. While suggestive that the function is going to operate on the collection, that much was already obvious. There's even a bug in the code that this name doesn't help the reader spot. numResults
- most code produces "results" (eventually); so, as
with
process
,Results
is seven wasted characters. The full variable name,numResults
is suggestive that it might be intended to limit the amount of output, but is vague enough to impose a mental tax on the reader. collection
- wastes space; it's obvious that it's a collection
because the previous tokens were
Collection<Integer>
. num
- simply recapitulates the type of the object (
int
) result
,count
- are coding cliches; as with
numResults
they waste space and are so generic they don't help the reader understand the code.
However, keep in mind the true purpose of variable names: the reader is trying to understand the code, which requires both of the following:
- What was the coder's intent?
- What does the code actually do?
To see how the longer variable names that this example used are actually a mental tax on the reader, here's a re-write of the function showing what meaning a reader would actually glean from those names:
int doSomethingWithCollectionElements(int numberOfResults, Collection<Integer> integerCollection) { int resultToReturn = 0; for (int variableThatCountsUp = 0; variableThatCountsUp < integerCollection.size() && variableThatCountsUp < numberOfResults; variableThatCountsUp++) { int integerFromCollection = integerCollection.get(count); if (integerFromCollection >= 0) { System.out.println(integerFromCollection + " "); } else { resultToReturn++; } } System.out.println("\n"); return resultToReturn; }
The naming changes have almost made the code worse than the auto-generated names, which, at least, were short. This rewrite shows that coder's intent is still mysterious, and there are now more characters for the reader to scan. Code reviewers review a lot of code; poor names make a hard job even harder. How do we make code reviewing less taxing?
On Code Reviews
There are two taxes on code reviewers' mental endurance: distance and boilerplate. Distance, in the case of variables, refers to how far away a reviewer has to scan, visually, in order to remind themselves what a variable does. Reviewers lack the context that coders had in mind when they wrote the code; reviewers must reconstruct that context on the fly. Reviewers need to do this quickly; it isn't worth spending as much time reviewing code as it took to write it2. Good variable names eliminate the problem of distance because they remind the reviewer of their purpose. That way they don't have to scan back to an earlier part of the code.
The other tax is boilerplate. Code is often doing something complicated; it was written by someone else; reviewers are often context-switching from their own code; they review a lot of code, every day, and may have been reviewing code for many years. Given all this, reviewers struggle to maintain focus during code reviews. Thus, every useless character drains the effectiveness of code reviewing. In any one small example, it's not a big deal for code to be unclear. Code reviewers can figure out what almost any code does, given enough time and energy (perhaps with some follow-up questions to the coder). But they can't afford to do that over and over again, year in and year out. It's death by 1,000 cuts.
A Good Example
So, to communicate intent to the code reviewer, with a minimum of characters, the coder could rewrite the code as follows:
int printFirstNPositive(int n, Collection<Integer> c) { int skipped = 0; for (int i = 0; i < c.size() && i < n; i++) { int maybePositive = c.get(i); if (maybePositive >= 0) { System.out.println(maybePositive + " "); } else { skipped++; } } System.out.println("\n"); return skipped; }
Let's analyze each variable name change to see why they make the code easier to read and understand:
printFirstNPositive
- unlike
processElements
, it's now clear what the coder intended this function to do (and there's a fighting chance of noticing a bug) n
- obvious given the name of the function, no need for a more complicated name
c
collection
wasn't worth the mental tax it imposed, so at least trim it by 9 characters to save the reader the mental tax of scanning boilerplate characters; since the function is short, and there's only one collection involved, it's easy to remember thatc
is a collection of integersskipped
- unlike
results
, now self-documents (without a comment) what the return value is supposed to be. Since this is a short function, and the declaration ofskipped
as anint
is plain to see, calling itnumSkipped
would have just wasted 3 characters i
- iterating through a
for
loop usingi
is a well-established idiom that everyone instantly understands. Give thatcount
was useless anyway,i
is preferable since it saves 4 characters maybePositive
num
just meant the same thingint
did, whereasmaybePositive
is hard to misunderstand and may help one spot a bug
It's also easier, now, to see there are two bugs in the code. In the
original version of the code, it wasn't clear if that the coder
intended to only print positive integers. Now the reader can notice
that there's a bug, because zero isn't positive (so n
should be
greater than 0, not greater-than-or-equals). (There should also be
unit tests). Furthermore, because the first argument is now called
maxToPrint
(as opposed to, say, maxToConsider
), it's clear the
function won't always print enough elements if there are any
non-positive integers in the collection. Rewriting the function
correctly is left as an exercise for the reader.
Naming Tenets (Unless You Know Better Ones)
- As coders our job is to communicate to human readers, not computers.
- Don't make me think. Names should communicate the coder's intent so the reader doesn't have to try to figure it out.
- Code reviews are essential but mentally taxing. Boilerplate must be minimized, because it drains reviewers' ability to concentrate on the code.
- We prefer good names over comments but can't replace all comments.
Cookbook
To live up to these tenets, here are some practical guidelines to use when writing code.
Don't Put the Type in the Name
Putting the type of the variable in the name of the variable imposes a mental tax on the reader (more boilerplate to scan over) and is often a poor substitute for thinking of a better name. Modern editors like Eclipse are also good at surfacing the type of a variable easily, making it redundant to add the type into the name itself. This practice also invites being wrong; I have seen code like this:
Set<Host> hostList = hostSvc.getHosts(zone);
The most common mistakes are to append Str
or String
to the name,
or to include the type of collection in the name. Here are some
suggestions:
Bad Name(s) | Good Name(s) |
---|---|
hostList, hostSet | hosts, validHosts |
hostInputStream | rawHostData |
hostStr, hostString | hostText, hostJson, hostKey |
valueString | firstName, lowercasedSKU |
intPort | portNumber |
More generally:
- Pluralize the variable name instead of including the name of a collection type
- If you're tempted to add a scalar type (int, String, Char) into your
variable name, you should either:
- Explain better what the variable is
- Explain what transformation you did to derive the new variable (lowercased?)
Use Teutonic Names Most of The Time
Most names should be Teutonic, following the spirit of languages like Norwegian, rather than the elliptical vagueness of Romance languages like English. Norwegian has more words like tannlege (literally "tooth doctor") and sykehus (literally "sick house"), and fewer words like dentist and hospital (which don't break down into other English words, and are thus confusing unless you already know their meaning). You should strive to name your variables in the Teutonic spirit: straightforward to understand with minimal prior knowledge.
Another way to think about Teutonic naming is to be as specific as
possible without being incorrect. For example, if a function is
hard-coded to only check for CPU overload, then name it
overloadedCPUFinder
, not unhealthyHostFinder
. While it may be used
to find unhealthy hosts, unhealthyHostFinder
makes it sound more
generic that it actually is.
// GOOD Set<Host> overloadedCPUs = hostStatsTable.search(overCPUUtilizationThreshold); // BAD Set<Host> matches = hostStatsTable.search(criteria); // GOOD List<String> lowercasedNames = people.apply(Transformers.toLowerCase()); // BAD List<String> newstrs = people.apply(Transformers.toLowerCase()); // GOOD Set<Host> findUnhealthyHosts(Set<Host> droplets) { } // BAD Set<Host> processHosts(Set<Host> hosts) { }
The exceptions to the Teutonic naming convention are covered later on in this section: idioms and short variable names.
It's also worth noting that this section isn't suggesting to never
use generic names. Code that is doing something truly generic should
have a generic name. For example, transform
in the below example is
valid because it's part of a generic string manipulation library:
class StringTransformer { String transform(String input, TransformerChain additionalTransformers); }
Move Simple Comments Into Variable Names
As illustrated earlier, variable names cannot (and should not) replace all comments. But if a short comment can fit into the variable name, that's probably where it should go. This is because:
- It's less visual clutter for the code reviewer to wade through (comments are a mental tax, so should provide true value)
- If a variable is used a long distance from the comment, the code reviewer doesn't have to break their focus and scroll back to the comment to understand the variable
For example,
// BAD String name; // First and last name // GOOD String fullName; // BAD int port; // TCP port number // GOOD int tcpPort; // BAD // This is derived from the JSON header String authCode; // GOOD String jsonHeaderAuthCode;
Avoid Over-used Cliches
In addition to not being Teutonic, the following variable names have been so horribly abused over the years that they should never be used, ever.
val
,value
result
,res
,retval
tmp
,temp
count
str
The moratorium on these names extends to variations that just add in a
type name (not a good idea anyway), such as tempString
, or intStr
,
etc.
Use Idioms Where Meaning is Obvious
Unlike the cliches, there are some idioms that are so widely-understood and unabused that they're safe to use, even if by the letter of they law they're too cryptic. Some examples are (these are Java/C specific examples, but the same principles apply to all languages):
- use of
i
,j
andk
in straightforwardfor
loops - use of
n
for a limit/quantity when it's obvious what it would do - use of
e
for an Exception in acatch
clause
// OK for (int i = 0; i < hosts.size(); i++) { } // OK String repeat(String s, int n);
Warning: idioms should only be used in cases where it's obvious what they mean.
May Use Short Names Over Short Distances When Obvious
Short names, even one-letter names, are preferable in certain circumstances. When reviewers see a long name, they tend to think they should pay attention to them, and then if the name turns out to be completely useless, they just wasted time. A short name can convey the idea that the only useful thing to know about the variable is its type. So it is okay to use short variable names (one- or two-letter), when both of the following are true:
- The distance between declaration and use isn't great (say within 5 lines, so the declaration is within the reader's field of vision)
- There isn't a better name for the variable than its type
- The reader doesn't have too many other things to remember at that point in the code (remember, studies show people can remember about 7 things).
Here's an example:
void updateJVMs(HostService s, Filter obsoleteVersions) { // 'hs' is only used once, and is "obviously" a HostService List<Host> hs = s.fetchHostsWithPackage(obsoleteVersions); // 'hs' is used only within field of vision of reader Workflow w = startUpdateWorkflow(hs); try { w.wait(); } finally { w.close(); } }
You could also write:
void updateJVMs(HostSevice hostService, Filter obsoleteVersions) { List<Host> hosts = hostService.fetchHostsWithPackage(obsoleteVersions); Workflow updateWorkflow = startUpdateWorkflow(hosts); try { updateWorkflow.wait(); } finally { updateWorkflow.close(); } }
But this takes up more space with no real gain; the variables are all
used so close to their source that the reader has no trouble keeping
track of what they mean. Furthermore, the long name updateWorkflow
signals to the reviewer that there's something significant about the
name. The reviewer then loses mental energy determining that the name
is just boilerplate. Not a big deal in this one case, but remember,
code reviewing is all about death by 1,000 cuts.
Remove Thoughtless One-Time Variables
One-time variables (OTVs), also known as garbage variables, are those variables used passively for intermediate results passed between functions. They sometimes serve a valid purpose (see the next cookbook items), but are often left in thoughtlessly, and just clutter up the codebase. In the following code fragment, the coder has just made the reader's life more difficult:
List<Host> results = hostService.fetchMatchingHosts(hostFilter); return dropletFinder(results);
Instead, coders should make a pass through their code and simplify to:
return dropletFinder(hostService.fetchMatchingHosts(hostFilter));
Use Short OTVs to Break Long Lines
Sometimes you need an OTV in order to break up a really long line:
List<Host> hs = hostService.fetchMatchingHosts(hostFilter); return DropletFinderFactoryFactory().getFactory().getFinder(region).dropletFinder(hs);
This is ok, although since the distance is so short, you can give
the OTV and one- or two-letter name (hs
) to save visual clutter.
Use Short OTVs to Break up Complicated Expressions
It can be difficult to read code like this:
return hostUpdater.updateJVM(hostService.fetchMatchingHosts(hostFilter), JVMServiceFactory.factory(region).getApprovedVersions(), RegionFactory.factory(region).getZones());
So it's ok to write:
List<Host> hs = hostService.fetchMatchingHosts(hostFilter); Set<JVMVersions> js = JVMServiceFactory.factory(region).getApprovedVersions(), List<AvailabilityZone> zs = RegionFactory(region).getZones(); return hostUpdater.updateJVM(hs, js, zs);
Again, because the distances are all short and the meanings are all obvious, short names can be used.
Use Longer OTVs to Explain Confusing Code
Sometimes you need OTVs simply to explain what code is doing. For example, sometimes you have to use poorly-named code that someone else (ahem) wrote, so you can't fix the name. But you can use a meaningful OTV to explain what's going on. For example,
List<Column> pivotedColumns = olapCube.transformAlpha(); updateDisplay(pivotedColumns);
Note in this case you should not use a short OTV:
List<Column> ps = olapCube.transformAlpha(); updateDisplay(ps);
That just adds too much visual clutter without helping to explain the code sufficiently.
Updates
- 26-May-2016
- fixed some of the issues brought up in the comments - thanks for the feedback!
Tuesday, April 28, 2009
Six Audacious Goals for Your System
- Make your entire system runnable (for debugging/development) on a single machine, via a single command,
- ...without manual configuration,
- ...without an Internet connection.
- Demonstrate that your system works flawlessly even when you permanently and unexpectedly unplug the power cable from any one machine (even your most precious database).
- Make one of your N-tier architectures M-tier, where M < N.
- Optimize every user-visible action so that it is perceived to complete in less than one second.
Wednesday, May 21, 2008
Ulrich Drepper's Memory Paper and CL
(declaim (optimize (speed 3) (safety 0)))Note the only difference between the two functions is the order in which the elements of the 2nd array are accessed. After compiling the above file I did the following from the CL prompt:
(defun matrix-multiply-fast? ()
(let ((m1 (make-array '(1000 1000) :element-type 'fixnum))
(m2 (make-array '(1000 1000) :element-type 'fixnum))
(m3 (make-array '(1000 1000) :element-type 'fixnum)))
(loop repeat 100
do
(loop for i upto 1000
do
(loop for j upto 1000
do
(setf (aref m3 i j) (* (aref m1 i j) (aref m2 i j))))))))
(defun matrix-multiply-slow? ()
(let ((m1 (make-array '(1000 1000) :element-type 'fixnum))
(m2 (make-array '(1000 1000) :element-type 'fixnum))
(m3 (make-array '(1000 1000) :element-type 'fixnum)))
(loop repeat 100
do
(loop for i upto 1000
do
(loop for j upto 1000
do
(setf (aref m3 i j) (* (aref m1 i j) (aref m2 j i))))))))
CL-USER> (time (matrix-multiply-fast?))Assuming the above is all correct, it sort-of duplicates Ulrich's result where accessing an array in a column-major order is slower (see figure 6.1 for an illustration). Now, I have to admit that, for lack of time, I didn't read Ulrich's paper in great depth and kind of lazily jumped to section six. I'd be curious to know if this sort of optimization is already well-known in the CL world (and/or for other "high-level" programming languages). Some casual googling turned up an excellent paper about optimizing CL via type annotations, but I didn't see anyone directly addressing Ulrich's points. I'd be curious if any readers of this blog know more.Evaluation took:
2.776 seconds of real time
2.508157 seconds of user run time
0.048003 seconds of system run time
[Run times include 0.028 seconds GC run time.]
0 calls to %EVAL
0 page faults and
12,000,160 bytes consed.
CL-USER> (time (matrix-multiply-slow?))Evaluation took:
3.926 seconds of real time
3.632227 seconds of user run time
0.016001 seconds of system run time
0 calls to %EVAL
0 page faults and
12,008,280 bytes consed.
Sunday, March 23, 2008
Optimizing CL
$soundex_nocode = undef;We do not implement the auxiliary behavior of the Perl version, namely we do not optionally accept a list of strings, nor do we support overriding the NIL return case. Both would be easy to implement but would be tangential to this blog entry. We port the rest of the functionality as faithfully as possible, though, so that the Perl can serve as a useful performance benchmark.
sub soundex
{
local (@s, $f, $fc, $_) = @_;
push @s, '' unless @s; # handle no args as a single empty string
foreach (@s)
{
$_ = uc $_;
tr/A-Z//cd;
if ($_ eq '')
{
$_ = $soundex_nocode;
}
else
{
($f) = /^(.)/;
tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
($fc) = /^(.)/;
s/^$fc+//;
tr///cs;
tr/0//d;
$_ = $f . $_ . '000';
s/^(.{4}).*/$1/;
}
}
wantarray ? @s : shift @s;
}
Perl's soundex uses regular expressions and string substitution operators heavily, some having analogues in CL and some not. For example, CL lacks Perl's tr/// operator, so we implement a crude version:
(defparameter *ascii-table* (let ((table (make-array '(256) :element-type 'character)))Our TR supports only the limited case needed by SOUNDEX (i.e., mapping one set of characters to another). The Perl version can do more, such as removing letters that don't appear in the first set, and removing duplicates. In fact, the Perl soundex relies on that ability to remove duplicates, so we implement a function to do that as well.
(loop
for i below 256
do (setf (aref table i) (code-char i)))
table))
(defun tr (string from-table to-table)
"Crude version of Perl's tr/// operator."
(let ((table (copy-seq *ascii-table*)))
(loop
for from-char across from-table
and to-char across to-table
do (setf (aref table (char-code from-char)) to-char))
(map 'string
#'(lambda (c) (aref table (char-code c)))
string)))
(defun uniq! (seq)UNIQ! coalesces adjacent duplicate items into one item. CL's built-in DELETE-DUPLICATES doesn't work because it coalesces all duplicates, not just adjacent ones.
(cond
((> (length seq) 1)
(do* ((cur 0)
(cur-elt (elt seq cur) (elt seq cur))
(next 1 (1+ next)))
((>= next (length seq)) (subseq seq 0 (1+ cur)))
(let ((next-char (elt seq next)))
(unless (eql cur-elt next-char)
(incf cur)
(setf (elt seq cur) next-char)))))
(t seq)))
These two utility functions make porting the rest of the soundex easy. The following shows the CL equivalents of each important line of soundex:
Perl | What It Does | CL Equivalent |
---|---|---|
$_ = uc $_; | Uppercase the string | (string-upcase ...) |
tr/A-Z//cd; | Remove any non-upper-alpha characters | (remove-if-not 'alpha-char-p string) |
($f) = /^(.)/; | Gets the first character of the string | (char s 0) |
tr/AE.../00.../; | Map letters to digits values | (tr s "AE..." "00...") |
s/^$fc+//; | Remove leading copies of the character in $fc | (string-left-trim (vector fc) s2) |
tr///cs; | Remove adjacent duplicates | (uniq! ...) |
tr/0//d; | Delete any '0' characters | (delete #\0 ...) |
$_ = $f . $_ . '000'; | Concatenate, plus ensure length of at least 4 | (concatenate 'string ...) |
s/^(.{4}).*/$1/; | Strip off all but the first 4 characters | (subseq ... 0 4) |
Here is the actual code:
(defun soundex (string)
(let ((s (string-upcase (remove-if-not 'alpha-char-p string))))
(when (plusp (length s))
(let ((f (char s 0)))
(let* ((s2 (tr s "AEHIOUWYBFPVCGJKQSXZDTLMNR" "00000000111122222222334556"))
(fc (char s2 0)))
(setf s2 (delete #\0
(uniq! (string-left-trim (vector fc) s2))))
(subseq (concatenate 'string (vector f) s2 "000") 0 4))))))
Now let's see if it works. Using SLIME, type C-c C-k to compile the file. Then try SOUNDEX at the CL prompt:
CL-USER> (soundex "supercalifrag")If you try the Perl version, you'll find it returns the same thing (I tried other test cases, as well, but they are not relevant to this blog)."S162"
At this point you might be feeling rather proud of yourself, after all you ported that Perl code pretty quickly, right? And I bet it even performs better already; after all Perl is interpreted and CL is compiled! Let's verify that assumption using the TIME macro built in to CL:
CL-USER> (time (dotimes (i 100000) (soundex "supercalifrag")))Now let's compare that to the Perl code's performance:Evaluation took:
2.644 seconds of real time
2.640165 seconds of user run time
0.012001 seconds of system run time
[Run times include 0.076 seconds GC run time.]
0 calls to %EVAL
0 page faults and
207,987,480 bytes consed.
$ time ./soundex-bench.pl 100000D'oh! The Perl code kicked our ass! It's more than twice as fast as our CL! How could this have happened to us? Maybe we forgot to turn on some optimizations? We add:real 0m1.069s
user 0m1.064s
sys 0m0.008s
(declaim (optimize (speed 3) (safety 0)))
to the top of our file and recompile. Now let's see the results:CL-USER> (time (dotimes (i 100000) (soundex "supercalifrag")))Ok, that's a little better, but come on, it is still approximately twice as slow as the Perl! And look at how much memory is allocated ("consed") in the course of doing only 100,000 calls: that's approximately 175 megabytes (admittedly not all at once, but still, that's just plain embarrassing!)Evaluation took:
2.061 seconds of real time
2.040128 seconds of user run time
0.020001 seconds of system run time
[Run times include 0.1 seconds GC run time.]
0 calls to %EVAL
0 page faults and
183,988,752 bytes consed.
Now before diving into optimization, let us review a good approach to optimizing CL.
- Measure first.
- Avoid guessing!
- Fix your algorithm(s) first (not shown in this blog entry).
- Fix memory consumption next.
- Then go after CPU consumption, primarily by adding type information.
Following step one from the above strategy, start by profiling. Define a function, MANY-SOUNDEX that performs our TIME loop from above. Also define a function PROFILE-SOUNDEX that employs SBCL's sb-profile package to profile the functions involved in implementing SOUNDEX, including some built-ins that it calls. To profile a function, pass its name to SB-PROFILE:PROFILE. After exercising the code, call SB-PROFILE:REPORT, which prints timing information. Call SB-PROFILE:RESET between runs unless you want the results to accumulate (we don't, in this case).
(defun many-soundex ()If you notice some unfamiliar functions there, don't worry, they're going to be defined later; for the purposes of this blog I just want to define this function once, even though in real life I modified it many times.
(time
(dotimes (i 100000)
(soundex "supercalifrag"))))
(defun profile-soundex ()
;; Don't accumulate results between runs.
(sb-profile:reset)
;; Calling this every time through in case any of the user-defined
;; functions was recompiled.
(sb-profile:profile soundex soundex-tr uniq!
concatenate make-string subseq
string-left-trim delete-if-not
tr delete string-upcase nsubseq)
(many-soundex)
(sb-profile:report))
Let's see what it produces:
CL-USER> (profile-soundex)This shows that we probably should optimize TR for memory consumption first. What's wrong with TR? Every call to TR copies *ASCII-TABLE* into the TABLE local variable, which will later be used to map each character to a (potentially) different one. The inefficiency is that TABLE only varies based on the 2nd and 3rd arguments (FROM-TABLE and TO-TABLE). Since SOUNDEX always passes in the same thing for those two arguments every time, it is wasteful to continually recreate the same TABLE. To fix it use a closure that "closes over" a single instance of TABLE:3.713 seconds of real time
3.160197 seconds of user run time
0.548034 seconds of system run time
[Run times include 0.08 seconds GC run time.]
0 calls to %EVAL
0 page faults and
200,022,056 bytes consed.
seconds | consed | calls | sec/call | name
-----------------------------------------------------------
1.394 | 140,115,928 | 100,000 | 0.000014 | TR
0.618 | 26,230,456 | 100,000 | 0.000006 | CONCATENATE
0.210 | 661,840 | 100,000 | 0.000002 | REMOVE-IF-NOT
0.170 | 3,864,248 | 100,000 | 0.000002 | DELETE
0.164 | 9,865,784 | 100,000 | 0.000002 | SOUNDEX
0.131 | 0 | 100,000 | 0.000001 | UNIQ!
0.130 | 9,274,680 | 100,000 | 0.000001 | STRING-UPCASE
0.058 | 10,012,888 | 100,003 | 0.000001 | SUBSEQ
0.000 | 0 | 7 | 0.000000 | STRING-LEFT-TRIM
-----------------------------------------------------------
2.878 | 200,025,824 | 800,010 | | Total
(defun make-tr-fn (from-table to-table)This is an example of the sort of rote transformations you often need to do in order to speed up performance.
(let ((table (copy-seq *ascii-table*)))
(loop
for from-char across from-table
and to-char across to-table
do (setf (aref table (char-code from-char)) to-char))
(lambda (string)
(declare ((simple-array character) string))
(map-into string
#'(lambda (c) (aref table (char-code c)))
string))))
(defparameter *soundex-tr-fn* (make-tr-fn "AEHIOUWYBFPVCGJKQSXZDTLMNR" "00000000111122222222334556"))
(defun soundex-tr (string)
(funcall *soundex-tr-fn* string))
- Separate out the part of the function that varies dynamically into a LAMBDA.
- Return the LAMBDA instead of the original value so that it can be reused over and over again.
- Create a class.
- Have the constructor do the stuff that only needs to happen once (equivalent to the part outside of the LAMBDA.)
- Create a method that does the dynamic part.
You may notice another difference between MAKE-TR-FN and TR; it's now calling MAP-INTO instead of MAP, which avoids making a copy of the result, thus reducing the memory consumption further. Now this particular optimization does change the semantics of the function to modify its argument. In this case it's ok because we've already made a copy of the string passed to SOUNDEX, but do not use this technique without thinking through the consequences. Also, the DECLARE statement fixes a minor compiler warning caused by the change. DECLARE is discussed later.
When originally doing this work, I re-ran PROFILE-SOUNDEX after making each change, to see if it helped. That is not shown here in order to save space, but obviously you'd want to do the same thing. Let's now move on to the next biggest offender in the list, CONCATENATE. It's called towards the end of SOUNDEX in order to ensure that the return value is at least 4 characters (right-padded with '0' characters). This was a direct port of the Perl code (or at least as direct as you can get without using CL-PPCRE), and it's inefficient. Since the return value is always a 4-character string, we need only allocate a fixed-size string (using MAKE-STRING), prefilled with '0', and then copy in the (up to) 4 characters we need. We no longer need SUBSEQ which allocates a copy of its result (note: most of the memory consumed by SUBSEQ comes from it being called by UNIQ!).
In fact, since SUBSEQ is next on our list of memory hogs, let's find a way to fix it. When UNIQ! calls it, it's operating on a so-called "garbage" sequence (S2 from SOUNDEX); i.e., an intermediate result that is not used outside of the bowels of SOUNDEX. As such, we could modify it, so it is a shame to use SUBSEQ on it within UNIQ!. CL has a loosely-followed convention that "destructive" (non-copying) version of functions begin with the letter N. There is no built-in NSUBSEQ but a quick Google search finds one that works for our purposes:
;; From http://darcs.informatimago.com/lisp/common-lisp/utility.lispWe'll call this from UNIQ! and that should take care of that 10MB of allocations. The last bit of memory we're going to take care of is that allocated by STRING-UPCASE and REMOVE-IF-NOT. Now we have to be careful here because we want one copy of the argument to SOUNDEX; it would be rude to transform the caller's argument unexpectedly. So we actually relying on either STRING-UPCASE or REMOVE-IF-NOT making a copy of STRING. As such, we have to pick one to optimize. Since REMOVE-IF-NOT has a destructive equivalent, DELETE-IF-NOT, we will use that instead. Unfortunately, it's not as simple as just dropping in DELETE-IF-NOT. If you read the CLHS entry you'll discover that STRING-UPCASE is allowed to return the same string it was passed in (i.e., not a copy) if it doesn't need to change it (e.g., all the letters are already uppercase). We account for this by calling REMOVE-IF-NOT in the case where STRING-UPCASE does this by seeing if the return value is EQ (same object) to the original string. You can see this in the new code listing, which has certain key optimizations highlighted for your convenience:
(defun nsubseq (sequence start &optional (end nil))
"
RETURN: When the SEQUENCE is a vector, the SEQUENCE itself, or a displaced
array to the SEQUENCE.
When the SEQUENCE is a list, it may destroy the list and reuse the
cons cells to make the subsequence.
"
(if (vectorp sequence)
(if (and (zerop start) (or (null end) (= end (length sequence))))
sequence
(make-array (- (if end
(min end (length sequence))
(length sequence))
start)
:element-type (array-element-type sequence)
:displaced-to sequence
:displaced-index-offset start))
(let ((result (nthcdr start sequence)))
(when end
(setf (cdr (nthcdr (- end start -1) sequence)) nil))
result)))
(defparameter *ascii-table* (let ((table (make-array '(256) :element-type 'character)))As mentioned above, in the course of actually doing this work I ran PROFILE-SOUNDEX many times, but I don't show the intermediate results here in order to save space. Now that we've completed a major chunk of work, however, let's see how we're doing:
(loop
for i below 256
do (setf (aref table i) (code-char i)))
table))
(defun make-tr-fn (from-table to-table)
(let ((table (copy-seq *ascii-table*)))
(loop
for from-char across from-table
and to-char across to-table
do (setf (aref table (char-code from-char)) to-char))
(lambda (string)
(declare ((simple-array character) string))
(map-into string
#'(lambda (c) (aref table (char-code c)))
string))))
(defparameter *soundex-tr-fn* (make-tr-fn "AEHIOUWYBFPVCGJKQSXZDTLMNR" "00000000111122222222334556"))
(defun soundex-tr (string)
(funcall *soundex-tr-fn* string))
(defun uniq! (seq)
(cond
((> (length seq) 1)
(do* ((cur 0)
(cur-elt (elt seq cur) (elt seq cur))
(next 1 (1+ next)))
((>= next (length seq)) (nsubseq seq 0 (1+ cur)))
(let ((next-char (elt seq next)))
(unless (eql cur-elt next-char)
(incf cur)
(setf (elt seq cur) next-char)))))
(t seq)))
(defun soundex (string)
(let ((s (let ((maybe-a-copy (string-upcase string)))
;; STRING-UPCASE can return original string if no changes
;; were needed.
(if (eq maybe-a-copy string)
;; REMOVE-IF-NOT makes a copy
(remove-if-not 'alpha-char-p maybe-a-copy)
;; DELETE-IF-NOT doesn't
(delete-if-not 'alpha-char-p maybe-a-copy)))))
(when (plusp (length s))
(let ((f (char s 0)))
(let* ((s2 (soundex-tr s))
(fc (char s2 0))
(result (make-string 4 :initial-element #\0)))
(setf s2 (delete #\0 (uniq! (string-left-trim (vector fc) s2))))
(setf (char result 0) f)
(loop
for i from 0 below (min (length s2) 4)
do (setf (char result (1+ i)) (char s2 i)))
result)))))
CL-USER> (many-soundex)We run MANY-SOUNDEX here in a fresh instance of SBCL to remove any injected profiling code or other stuff that might affect the results. It also has speed optimizations turned on. As you can see, the benefit of simply removing memory allocations is significant. The new code is a whopping 4.4 times faster! It's also now more than twice as fast as the Perl code.Evaluation took:
0.464 seconds of real time
0.460029 seconds of user run time
0.0 seconds of system run time
[Run times include 0.008 seconds GC run time.]
0 calls to %EVAL
0 page faults and
22,402,176 bytes consed.
We could continue to to whittle away at the memory allocation, as you can see we're still at around 21MB of memory allocated. However, since we have to copy the argument to SOUNDEX no matter what, 21MB is only about twice the minimum (if you look at the profiling results, we can't do better than the memory allocated by STRING-UPCASE). To keep this blog entry to a reasonable length we shall deem this acceptable.
On to CPU optimization. SBCL automatically detects certain performance problems when you up SPEED to 3. Fix these first, as it makes little sense to manually search out problems when the compiler has already found some. Using SLIME makes it easy:
- Put
(declaim (optimize (speed 3) (debug 0) (safety 0)))
at the top of the file. - Type C-c C-k to compile the file.
- Hit M-n and M-p to get SLIME to highlight the next (or previous) compiler warning.
(defun soundex-tr (string)This example seems a little strange at first, if, say, you're used to primitive type systems such as Java's. CL types can be defined in sophisticated ways that deserve a blog entry of their own. In this particular case, the compiler has inferred the type of *SOUNDEX-TR-FN* to be (OR FUNCTION SYMBOL), meaning it isn't sure if it could sometimes be null (figuring that out would require "global" analysis to ensure that *SOUNDEX-TR-FN* is never modified by any function anywhere, which is still, apparently, beyond the scope of most compilers). We can fix the warning with a THE expression. THE is one of CL's ways of adding manual static typing (which you may be familiar with from more primitive languages such as Java) into your code. Although CL implementations such as SBCL perform type inference, the compiler occasionally needs your help:
(funcall *soundex-tr-fn* string)); note: unable to
; optimize away possible call to FDEFINITION at runtime
; due to type uncertainty:
; The first argument is a (OR FUNCTION SYMBOL), not a FUNCTION.
(funcall (the function *soundex-tr-fn*) ...)
Since we're sure that *SOUNDEX-TR-FN* is effectively constant (it's all private code under our own control, after all), it is safe to add in this THE.Another interesting set of warnings comes from NSUBSEQ:
(defun nsubseq (sequence start &optional (end nil))Those warnings go on for a while. What they all boil down to, though, is that the compiler doesn't know the type of START or END. Now here is a key point. Should you find yourself in this sort of situation, always remember what the great computer scientist Igor Stravinsky once said, "Lesser artists borrow, great artists steal." In other words, don't try to figure out the type declarations; steal them from SBCL. Using SLIME makes this easy. Just type M-. and then type SUBSEQ, because obviously it should take the same parameters. This will take you to the source code (you should always have the SBCL source code handy) for SUBSEQ:
(if (vectorp sequence)
(if (and (zerop start) (or (null end) (= end (length sequence))))
sequence
(make-array (- (if end
(min end (length sequence))
(length sequence))
start)
:element-type (array-element-type sequence)
:displaced-to sequence
:displaced-index-offset start))
(let ((result (nthcdr start sequence)))
(when end
(setf (cdr (nthcdr (- end start -1) sequence)) nil))
result))); in: DEFUN NSUBSEQ
; (ZEROP START)
; ==>
; (= START 0)
;
; note: unable to
; open-code FLOAT to RATIONAL comparison
; due to type uncertainty:
; The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
...blah blah blah...
(defun subseq (sequence start &optional end)Since we only care about the vector case here, move the cursor to VECTOR-SUBSEQ* and hit M-. again, to see how it declares its arguments:
#!+sb-doc
"Return a copy of a subsequence of SEQUENCE starting with element number
START and continuing to the end of SEQUENCE or the optional END."
(seq-dispatch sequence
(list-subseq* sequence start end)
(vector-subseq* sequence start end)
(sb!sequence:subseq sequence start end)))
(defun vector-subseq* (sequence start end)Ah! So what is the type INDEX? Use M-. once more and you'll discover this:
(declare (type vector sequence))
(declare (type index start)
(type (or null index) end))
;; blah blah blah
(def!type index () `(integer 0 (,sb!xc:array-dimension-limit)))This makes sense. The type obviously has to be an integer, cannot be negative, and cannot be more than the maximum array length allowed by your CL implementation. This particular type definition is internal to SBCL and cannot be used directly, but we create our own. Improved NSUBSEQ looks like:
(deftype index () `(integer 0 ,array-dimension-limit))Note we use LOCALLY only when the argument is actually a vector; we can't make the same assumptions about other types of sequences. Plus we don't care about the list case (in fact a compiler warning for that case remains, but since we're not using lists in this example we'll skip the fix for it).
;; From http://darcs.informatimago.com/lisp/common-lisp/utility.lisp
(defun nsubseq (sequence start &optional (end nil))
(if (vectorp sequence)
(locally
(declare (index start)
((or null index) end))
(if (and (zerop start) (or (null end) (= end (length sequence))))
sequence
(make-array (- (if end
(min end (length sequence))
(length sequence))
start)
:element-type (array-element-type sequence)
:displaced-to sequence
:displaced-index-offset start)))
(let ((result (nthcdr start sequence)))
(when end
(setf (cdr (nthcdr (- end start -1) sequence)) nil))
result)))
Let's look at optimizing UNIQ!. When we compile it, we get a number of warnings such as:
(defun uniq! (seq)Here the compiler is telling us that it could make some optimizations if it knew for sure that the SEQ were a SIMPLE-ARRAY. What the heck is a simple array? Clicking link (or using C-c C-d h in SLIME), shows:
(cond
((> (length seq) 1)
(do* ((cur 0)
(cur-elt (elt seq cur) (elt seq cur))
(next 1 (1+ next)))
((>= next (length seq)) (nsubseq seq 0 (1+ cur)))
(let ((next-char (elt seq next)))
(unless (eql cur-elt next-char)
(incf cur)
(setf (elt seq cur) next-char)))))
(t seq))); in: DEFUN UNIQ!
; (LENGTH SEQ)
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a SEQUENCE, not a (SIMPLE-ARRAY * (*)).
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a SEQUENCE, not a VECTOR.
; (ELT SEQ CUR)
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a SEQUENCE, not a (SIMPLE-ARRAY * (*)).
...blah blah blah...
The type of an array that is not displaced to another array, has no fill pointer, and is notIn case you're wondering, "displaced" means an array slice (as in Perl or Python), and the other two concepts refer to various ways an array can be (or appear to be) extensible. Is it safe for UNIQ! to assume it is receiving a simple array? Yes; its input comes from STRING-LEFT-TRIM. Although STRING-LEFT-TRIM is allowed to return its input when there are no changes to be made, we know that it will always make a change because we always remove at least the first character of S2, and thus its return value will always be simple. So let's add a function declaration for UNIQ!:
expressly adjustable is a subtype of type simple-array. The concept of a simple array exists to
allow the implementation to use a specialized representation and to allow the user to declare that
certain values will always be simple arrays.
(declaim (ftype (function (simple-array) string) uniq!))
This tells the compiler that UNIQ!'s single argument is a SIMPLE-ARRAY and that it returns a STRING. The reason we say it returns a STRING (instead of a "simple" sequence type) is that UNIQ! uses NSUBSEQ, which does return an array slice. Unfortunately, we still get compiler warnings after making this change (albeit fewer):(cur-elt (elt seq cur) (elt seq cur))This is telling us that it could generate better code if it knew the sequence was a string. ELT is a generic accessor that works on any sequence type; presumably if the compiler knows the sequence type is a string it can take advantage of the fact that each element is the same size. Anyway, this is easy to fix. SOUNDEX only deals in strings, so we can safely change the declaration from SIMPLE-ARRAY to SIMPLE-STRING. Sure enough, this makes the compiler warnings disappear.; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a (SIMPLE-ARRAY * (*)), not a SIMPLE-STRING.
...blah blah blah...
The remaining changes made were generally of the same nature as the above so are not covered here in detail, but you can see them in the following complete code listing that highlights all the CPU-related changes:
(declaim (optimize (speed 3) (debug 0) (safety 0)))
(defparameter *ascii-table* (let ((table (make-array '(256) :element-type 'character)))
(loop
for i below 256
do (setf (aref table i) (code-char i)))
table))
(defun make-tr-fn (from-table to-table)
(declare (simple-string from-table to-table)
(simple-array *ascii-table*))
(let ((table (the (simple-array character) (copy-seq *ascii-table*))))
(loop
for from-char across from-table
and to-char across to-table
do (setf (aref table (char-code from-char)) to-char))
(lambda (string)
(declare ((simple-array character) string))
(map-into string
#'(lambda (c) (aref table (char-code c)))
string))))
(defparameter *soundex-tr-fn* (make-tr-fn "AEHIOUWYBFPVCGJKQSXZDTLMNR" "00000000111122222222334556"))
(declaim (ftype (function (simple-string) simple-string) soundex-tr))
(defun soundex-tr (string)
(funcall (the function *soundex-tr-fn*) string))
(deftype index () `(integer 0 ,array-dimension-limit))
;; From http://darcs.informatimago.com/lisp/common-lisp/utility.lisp
(defun nsubseq (sequence start &optional (end nil))
"
RETURN: When the SEQUENCE is a vector, the SEQUENCE itself, or a displaced
array to the SEQUENCE.
When the SEQUENCE is a list, it may destroy the list and reuse the
cons cells to make the subsequence.
"
(if (vectorp sequence)
(locally
(declare (index start)
((or null index) end))
(if (and (zerop start) (or (null end) (= end (length sequence))))
sequence
(make-array (- (if end
(min end (length sequence))
(length sequence))
start)
:element-type (array-element-type sequence)
:displaced-to sequence
:displaced-index-offset start)))
(let ((result (nthcdr start sequence)))
(when end
(setf (cdr (nthcdr (- end start -1) sequence)) nil))
result)))
(declaim (ftype (function (simple-string) string) uniq!))
(defun uniq! (seq)
(let ((seq-len (length seq)))
(cond
((> seq-len 1)
(do* ((cur 0)
(cur-elt (elt seq cur) (elt seq cur))
(next 1 (1+ next)))
((>= next seq-len) (nsubseq seq 0 (1+ cur)))
(let ((next-char (elt seq next)))
(unless (eql cur-elt next-char)
(incf cur)
(setf (elt seq cur) next-char)))))
(t seq))))
(defun soundex (string)
(let ((s (the simple-string
(let ((maybe-a-copy (string-upcase string)))
;; STRING-UPCASE can return original string if no changes
;; were needed.
(if (eq maybe-a-copy string)
;; REMOVE-IF-NOT makes a copy
(remove-if-not 'alpha-char-p maybe-a-copy)
;; DELETE-IF-NOT doesn't
(delete-if-not 'alpha-char-p maybe-a-copy))))))
(when (plusp (length s))
(let ((f (char s 0)))
(let* ((s2 (soundex-tr s))
(fc (char s2 0))
(result (make-string 4 :initial-element #\0)))
(setf s2 (the string (delete #\0 (uniq! (string-left-trim (vector fc) s2)))))
(setf (char result 0) f)
(let ((end (min (length s2) 4)))
(loop
for i from 0 below end
do (setf (char result (1+ i)) (char s2 i))))
result)))))
Here's the effect of the CPU optimizations:
CL-USER> (many-soundex)Although the results vary from run to run, 0.357 seconds is a pretty typical on my hardware. This is approximately a 23% reduction in time compared to before. Not nearly as big of a gain, which is why one should optimize memory first. Still, 23% is nothing to sneeze at for a relatively small number of changes, most of which were pointed out to us by the compiler! If you're interested in learning more about optimizing CL, I recommend checking out SBCL's SB-SPROF package, which is more sophisticated than the SB-PROFILE package used here.
Evaluation took:
0.357 seconds of real time
0.360022 seconds of user run time
0.0 seconds of system run time
[Run times include 0.008 seconds GC run time.]
0 calls to %EVAL
0 page faults and
22,406,384 bytes consed.
Avoiding fallback in distributed systems
As previously mentioned , I was recently able to contribute to the Amazon Builders' Library . I'd also like to share another post t...
-
Names Considered Useful Software is written for people to understand; variable names should be chosen accordingly. People ...
-
What's the connection between soda and the failure of programming languages? I would have thought "zero" had I not recently s...
-
Tip #0: Use Emacs 22 Emacs 22 is super stable. About half of my .emacs file (before I cleaned it up) was loading stuff that's now part ...