Thursday, May 26, 2016

Please Don't Write Bruby

This doc is about how to write Ruby code instead of writing hybrid yuck Bruby code. Ruby code is, well, Ruby code, i.e., code written in Ruby and not some other language. I strongly believe that once you have chosen to write code in Ruby, you should try to keep writing code in Ruby. Specifically, please don't write Bruby, which is an unholy mishmash of Bash and Ruby. Bruby is hard to read, flaky, usually slower, and always harder to reason about. It's even harder to write. There is almost never a good reason to shell out to Bash from Ruby.

Here's a canonical example of Bruby code:

output = `somecommand | egrep -v '^snord[123]' | sort | uniq`
if $? == 0
  # stuff with 'output' such as:
  foo output
end

There are a number of problems with this code. For one thing, it's unnecessary to use Bash in the first place. Ruby is just as good at grepping, regular expressions, sorting and uniqifying. The more you mix different programming languages, the more the reader of the code has to switch gears mentally. As a frequent code-reviewer, I implore you: please don't make your readers switch gears mentally unless there's a really good reason. It causes brain damage after a while. Also, Bash pipelines easily hide bugs and mask failures. By default, the exit status of a pipeline is the exit status of its last command, so Bash ignores errors from an "interior" command, as the following code illustrates:

def lower_nuclear_plant_control_rods
  # The following line will *not* raise an exception, even though
  # sort rejects the misspelled option and exits with an error.
  rods = `cat /etc/plants/springfield/control_rods | sort --revesre | uniq`
  # 'rods' is now an empty string and $? will be 0, indicating that
  # everything is ok, when it isn't.
  if $? != 0
    # This will never happen. Millions of lives will be lost in the
    # ensuing calamity.
    page_operator "Springfield Ops Center", :sev_1, "Meltdown Imminent"
  else
    rods.split.each do |rod|
      lower rod
    end
  end
end

Despite the subtly bad arguments to sort, the above code won't raise an exception, and will fail to lower the control rods and also fail to notify the operators (because $? will be 0). Thus the town of Springfield will be wiped off the map. Do you want the same type of errors to accidentally blow away all of your production databases from an errant invocation of some admin script?
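The masking is easy to reproduce in isolation. The following is a minimal sketch, assuming a POSIX /bin/sh with sort and uniq on the PATH; the method name masked_pipeline_failure and the input rod1 are made up for illustration.

```ruby
# Runs a pipeline whose middle command fails and reports what Ruby sees.
# Assumption: a POSIX shell plus the sort and uniq utilities are available.
def masked_pipeline_failure
  # sort rejects the misspelled option and exits non-zero, but uniq
  # (the last command in the pipeline) reads empty input and succeeds.
  output = `echo rod1 | sort --revesre 2>/dev/null | uniq`
  [output, $?.success?]
end

output, ok = masked_pipeline_failure
puts "output=#{output.inspect} success=#{ok}"
```

On a typical system the output should be empty while the status still reports success, which is exactly the combination that lets the bug slip through.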

One way to fix the above would be to add the pipefail option into the string of Bash code:

rods = `set -o pipefail; cat /etc/plants/springfield/control_rods | sort --revesre | uniq`

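But that's easy to forget, difficult to catch mechanically, and ugly to read. It also depends on the shell: Ruby runs backtick commands through /bin/sh, which is not Bash on every system and may not support pipefail. The better solution is to remember that you can write Ruby in Ruby: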

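def lower_nuclear_plant_control_rods
  File.readlines('/etc/plants/springfield/control_rods').sort.reverse.uniq.each do |rod|
    lower rod
  end
rescue StandardError
  # Any failure above (unreadable file, mistyped method name) lands here.
  page_operator "Springfield Ops Center", :sev_1, "Meltdown Imminent"
end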

The above code is shorter, easier to read (no switching mental gears), and raises an exception if something goes wrong, such as an unreadable file or a mistyped method name. The specific changes are:

Ruby's good at opening files; just use File.readlines if you want to read a file line-by-line.
Ruby Enumerables have sort built in.
Use Array's built-in reverse method (File.readlines returns an Array).
Use Array's uniq method.
Any error in the invocation (for example, misspelling reverse as revesre) will result in an exception being raised.
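The same treatment works for the canonical example near the top of this document. The following is a sketch only: filter_output is a hypothetical helper name, the sample text stands in for whatever somecommand would print, and grep_v needs Ruby 2.3 or later.

```ruby
# Pure-Ruby stand-in for: egrep -v '^snord[123]' | sort | uniq
# 'text' is the raw standard output of the command, as a String.
def filter_output(text)
  text.lines.map(&:chomp).grep_v(/^snord[123]/).sort.uniq
end

sample = "zeta\nsnord1 x\nalpha\nzeta\n" # hypothetical command output
p filter_output(sample) # => ["alpha", "zeta"]
```

Note that the result is an Array of lines rather than one newline-separated String, which is usually what the calling code wanted anyway.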


The rest of this document contains a series of tips to help you write Ruby instead of Bruby.

Tip 1: Google for Pure Ruby Bash Equivalents

Whenever you're tempted to shell out in a Ruby script, stop, flagellate yourself 23 times with a birch branch, and then Google for an alternative in Pure Ruby. For example, imagine your script must create a directory, not fail if the directory already exists, and also create any intermediate directories. But you don't want to write that code yourself because it's yucky and complicated and you know you'll fail to handle some edge condition properly. If you're familiar with Unix, your first inclination might be to write Bruby:

system "mkdir -p '#{ROOT_DIR}'"
if $? != 0
  raise "Unable to create directory #{ROOT_DIR}"
end

The above is clunky. It's also easy to forget to check the value of $?, in which case your code will silently continue even though the directory was not created. Fortunately, this is easy to fix. Just Google for it (after flagellating yourself). You will see that Ruby has a built-in version of mkdir -p.

require 'fileutils'
FileUtils.mkdir_p ROOT_DIR

This version is shorter and easier to read, and, most importantly, raises an exception if anything went wrong:

irb(main):666:0> FileUtils.mkdir_p "/bogon/adfadf"
Errno::EACCES: Permission denied @ dir_s_mkdir - /bogon
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:253:in `mkdir'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:253:in `fu_mkdir'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:227:in `block (2 levels) in mkdir_p'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:225:in `reverse_each'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:225:in `block in mkdir_p'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:211:in `each'
 from /usr/local/Cellar/ruby/2.3.0/lib/ruby/2.3.0/fileutils.rb:211:in `mkdir_p'
 from (irb):666
 from /usr/local/bin/irb:11:in `<main>'

This error might be obnoxious, but it also might save your life.
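To see both behaviors side by side, here is a small sketch that runs inside a throwaway temporary directory. The method name mkdir_p_demo and the file name blocker are made up for illustration; the exact Errno class raised may vary between Ruby versions, so the code only relies on it being a SystemCallError.

```ruby
require 'fileutils'
require 'tmpdir'

# Creates a nested directory twice, then tries to create a directory
# underneath a regular file. Returns the class of the error raised.
def mkdir_p_demo
  Dir.mktmpdir do |root|
    nested = File.join(root, 'a', 'b', 'c')
    FileUtils.mkdir_p nested # creates a, a/b and a/b/c
    FileUtils.mkdir_p nested # already exists: no error
    blocker = File.join(root, 'blocker')
    File.write(blocker, '')  # a regular file, not a directory
    begin
      FileUtils.mkdir_p File.join(blocker, 'child')
      nil
    rescue SystemCallError => e
      e.class                # some Errno::* class
    end
  end
end

puts "mkdir_p under a regular file raised: #{mkdir_p_demo}"
```

The second call succeeding silently is the "don't fail if it already exists" half of mkdir -p; the exception is the half that the Bruby version makes you remember to check by hand.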

Tip 2: Keep unavoidable Bash usage to a minimum

Sometimes it does make sense to shell out. For example, it is hard to find a Ruby equivalent of the command-line dig DNS utility. In these situations, don't throw out the baby with the bathwater; keep the Bash to a minimum. For example, in Bruby, you would write:

dig_command = "dig +qr any -x ns +noqr"
mail_server = `#{dig_command} | egrep -w MX | awk '{print $6}'`.chomp

In Ruby, by contrast, you should use Bash only to run dig; everything else (such as processing the standard output of dig) should be done in Ruby. The built-in Open3 module makes this straightforward in many cases:

require 'open3'
output, error, status = Open3.capture3 dig_command

Open3.capture3 returns three values: a string containing the standard output of dig_command, a string containing its standard error, and the exit status of the command. The next tip covers how to actually replicate the rest of the above pipeline that populated the mail_server variable.
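A quick way to get a feel for the three return values is to run a command you control. The sketch below uses the current Ruby interpreter as the child process (an arbitrary stand-in for dig) and a made-up method name, capture_demo.

```ruby
require 'open3'
require 'rbconfig'

# Runs a tiny child Ruby process that writes to both streams and exits 3.
# Passing the program and its arguments separately avoids the shell.
def capture_demo
  Open3.capture3(RbConfig.ruby, '-e', 'puts "out"; warn "err"; exit 3')
end

stdout, stderr, status = capture_demo
puts "stdout: #{stdout.inspect}"
puts "stderr: #{stderr.inspect}"
puts "exit status: #{status.exitstatus}, success? #{status.success?}"
```

Because the status comes back as an ordinary return value, it is much harder to forget than a check of $? on the following line.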

Tip 3: Use Enumerable to replace Bash pipelines

A distinguishing feature of Bash is its ability to chain commands together using pipes, as illustrated in the previous tip:

mail_server = `#{dig_command} | egrep -w MX | awk '{print $6}'`.chomp

Most of the time you can replicate pipelines using Ruby's built-in Enumerable module. Almost everything you think might be an Enumerable actually is one: arrays, hashes, open files and pipes, etc. Strings are the notable exception, but each_line, lines, and split turn a string into an Enumerable of lines. In particular, if you're trying to convert Bruby to Ruby, you can use methods like IO.popen, or better yet the methods in Open3, to get an Enumerable (or else a string, which can be converted into one with split) over the standard output of that process. From there, you can take advantage of methods such as Enumerable#grep (which, for example, seamlessly handles regular expressions).
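As a small illustration of grepping a process's output directly, here is a sketch using IO.popen. The child process is the current Ruby interpreter printing two invented lines, chosen only so the example runs anywhere; grep_child_output is a hypothetical helper name.

```ruby
require 'rbconfig'

# Greps the standard output of a child process line by line.
# IO includes Enumerable, so grep works directly on the pipe.
def grep_child_output(pattern)
  child = [RbConfig.ruby, '-e', 'puts "a IN MX 10 m1", "a IN A 1.2.3.4"']
  IO.popen(child) { |io| io.grep(pattern).map(&:chomp) }
end

p grep_child_output(/\bMX\b/) # => ["a IN MX 10 m1"]
```

Passing the command as an array means no shell is involved at all, so there is no quoting to get wrong.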

In those cases where Enumerable itself doesn't immediately solve the problem, you have the Ruby programming language itself at your beck and call. For example, many of the features of Awk can be found directly in Ruby (if you did enough programming language research you'd probably dig up some indirect connection between the two languages, but that shall be left as an exercise for the reader).
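For instance, two common Awk one-liners translate roughly as follows. This is a sketch over invented, whitespace-separated sample records; the helper names are hypothetical.

```ruby
# Hypothetical input: one "name count" record per line.
SAMPLE_RECORDS = "alice 3\nbob 5\ncarol 2\n"

# Roughly: awk '{print $1}'
def first_fields(text)
  text.each_line.map { |line| line.split[0] }
end

# Roughly: awk '{sum += $2} END {print sum}'
def sum_second_field(text)
  text.each_line.map { |line| line.split[1].to_i }.reduce(0, :+)
end

p first_fields(SAMPLE_RECORDS)     # => ["alice", "bob", "carol"]
p sum_second_field(SAMPLE_RECORDS) # => 10
```

String#split with no arguments splits on runs of whitespace and ignores leading whitespace, which is close to how Awk separates fields by default. Remember that Awk numbers fields from 1 and Ruby indexes from 0.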

Here is the Ruby equivalent of each part of the above Bash pipeline:

Bruby                   Ruby
`#{dig_command}`        Open3.capture3(dig_command)
egrep -w MX             split("\n").grep(/\bMX\b/)
awk '{print $6}'        split[5]

Putting it all together:

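require 'open3'
dig_command = "dig +qr any -x ns +noqr"
o, e, s = Open3.capture3 dig_command
mail_server = if s.success?
                # Keep the MX lines and take the sixth field of each.
                o.split("\n").grep(/\bMX\b/).map { |line| line.split[5] }.join("\n")
              else
                raise "Failed: #{dig_command}\n#{e}"
              end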
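One side benefit of doing the parsing in Ruby is that it can be pulled into a method and exercised without running dig or touching the network. The following is a sketch: mail_servers is a hypothetical helper, and the sample text is invented to resemble dig's answer format.

```ruby
# Extracts mail server names from dig-style answer lines.
# In an MX answer line the sixth field (index 5) is the exchange host.
def mail_servers(dig_output)
  dig_output.split("\n").grep(/\bMX\b/).map { |line| line.split[5] }
end

# Hypothetical dig output, for illustration only.
SAMPLE_DIG_OUTPUT = [
  ";; ANSWER SECTION:",
  "example.com.\t3600\tIN\tMX\t10 mail.example.com.",
  "example.com.\t3600\tIN\tA\t192.0.2.1"
].join("\n")

p mail_servers(SAMPLE_DIG_OUTPUT) # => ["mail.example.com."]
```

The method returns an Array, so the caller decides whether it wants the first server, all of them, or an error when there are none.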

Here are some other common mappings from Bash to Enumerable methods:

Bash            Enumerable
head -n 2       take(2)
tail -n 2       to_a.last(2)
sort            sort
uniq            to_a.uniq
grep            grep
grep -v         grep_v
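Here is how a few of these behave on a small array. This is a sketch with invented data; FRUIT_LINES and pipeline_equivalents are hypothetical names, and grep_v needs Ruby 2.3 or later.

```ruby
# Hypothetical input lines.
FRUIT_LINES = %w[pear apple fig apple kiwi]

# Applies Ruby counterparts of common pipeline stages to the given lines.
def pipeline_equivalents(lines)
  {
    head:          lines.take(2),              # head -n 2
    tail:          lines.to_a.last(2),         # tail -n 2, original order
    tail_reversed: lines.reverse_each.take(2), # last two, newest first
    sorted_uniq:   lines.sort.uniq,            # sort | uniq
    grep:          lines.grep(/^a/),           # grep '^a'
    grep_v:        lines.grep_v(/^a/)          # grep -v '^a'
  }
end

pipeline_equivalents(FRUIT_LINES).each { |name, result| puts "#{name}: #{result.inspect}" }
```

Note the difference between the two tail variants: last(2) keeps the lines in their original order, as tail does, while reverse_each.take(2) returns them in reverse.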