Saturday, August 25, 2007

Macrophobia

As a fledgling CL programmer, I've found macros to be one of its most enduring charms. Yet they are also one of its most criticized (or at least feared) features. Much as threads have become the bogeyman of the programming world, so macros would be too, if more than a handful of programmers actually knew about them. Among this handful, though, it would be nice to dispel the myth that macros are too powerful.

I mean, obviously, they are, in some sense, too powerful. The common complaint about them is that junior programmers (especially) can create what appear to be language-level constructs that confuse the heck out of everyone else, impairing the code's "readability". As with most broadly-accepted maxims (in this case, broadly accepted by the 15 people in the world who actually know about the issue in the first place), it is well-intended, intuitively reasonable, and incorrect, for the simple reason that if, at your company, junior programmers are running amok with no mentors to teach them, the code will be unreadable whether or not it uses macros.

Frankly, I'd like to dispel the whole myth that there's great import to be placed on the "readability" of a programming language. I see blog after blog, each about as scientifically sound as my 10th grade Chemistry report (in case you're wondering, I'm not a chemist), about whether Perl is more readable than C, or vice-versa, or whether Python is more readable than CL, etc. This tiresome, endless debate stirs the fanboy in all of us, but is missing not just the forest for the trees, but the solar system for the planet.

I've read a lot of code in my day, I'll have you know. What matters most to readability is, first, comments, followed by identifier names, followed by there being as little code as possible, followed by program structure. Ok, I can hear the chorus of "boos" in the background at the preceding statement. All I can say is, most programmers have no idea how to write good comments (even though, as they say, the "force" is with them; they just have no idea, and lack a Yoda-like figure to explain it to them). When comments are good, there is nothing quite like them. The code could be written in TECO for all I care. Maybe blog, will I, about it, some day.

So macros cannot really get in the way of readability. In fact, they have the potential to greatly enhance readability, used judiciously (and as pointed out earlier, if your company is not run judiciously, you have larger issues to work out first). How many of you out there have worked somewhere that had some kind of coding guidelines? I'd warrant that a fair number of you have. I certainly have. I'm talking about stuff like Ellemtel's Programming in C++, Rules and Recommendations.

Give those rules a once-over, if you haven't already. It's pretty complicated stuff. Use this feature this way, use that feature that other way, don't do this, do do that, etc, etc. Yawn. I've worked at companies that successfully enforced such rules (at least 80% of the time), and some that failed utterly. In both cases, it was wickedly hard to enforce these rules (successfully or not). All programmers had to read the coding guidelines document, understand it, remember it, and periodically re-read it when it was updated. Senior programmers had to enforce the guidelines during code reviews (which was sometimes easier than other times, depending on the stubbornness of the reviewee).

At more sophisticated shops, one could imagine, at least, that some of the rules would be enforced by build scripts. A poor solution if ever there was one. For one thing, build scripts tend to be written in a different language (e.g., Perl), and usually resort to crude regexp-matching in order to catch offending code. Only a handful of programmers at your company are brave (or foolish) enough to grok the build scripts. These poor souls find themselves, before too long, at the bottom of a pit of despair from which they never return (save by leaving for another team or company). Eventually they label themselves with self-deprecating monikers such as "build bitch".

Macros afford the possibility, at least, that some of these conventions be enforced within the language itself, rather than by a bunch of hacky Perl scripts (or no automated process at all). Take, for instance, the use of CL's DEFCLASS. Now DEFCLASS has roughly one grillion options for how to do this, that, or the other thing. It's also remarkably lax about, well, almost everything. If you want a member variable ("slot") called FOO but wish for the accessor to it to be called BAR, you can do that.

If you wanted to prevent such wanton creativity on the part of your less-trustworthy programmers, you could do so by writing a macro wrapping DEFCLASS which might both limit its complexity and enforce constraints such as the regular naming of accessors and the like. You could prevent the use of multiple inheritance if you found it too frightening (I'm not suggesting you go out and do this, just pointing out that you could). These rules would be enforced using code written in the same language as everything else, making them easier to write (in that they can harness the reflective power of the language itself, i.e., no hacky regexps) and making it easier to find volunteers to maintain and develop them.
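
To make that a bit more concrete, here is a minimal sketch. It is in C rather than CL, so it is only an analogy: the C preprocessor can generate code but cannot inspect or reject it the way a CL macro can. The macro name DEFINE_GETTER, the struct point, and the naming rule itself are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical house rule: the getter for a field is always named
 * <struct>_<field>, and nobody writes getters by hand. Because the macro
 * builds the name from its arguments, the naming cannot drift. */
#define DEFINE_GETTER(struct_name, field_type, field)                    \
    static inline field_type struct_name##_##field(                      \
        const struct struct_name *self)                                  \
    {                                                                    \
        assert(self != NULL);                                            \
        return self->field;                                              \
    }

struct point {
    int x;
    int y;
};

/* These expand to point_x() and point_y(); no other spelling is possible. */
DEFINE_GETTER(point, int, x)
DEFINE_GETTER(point, int, y)
```

With this in place, a caller would write point_x(&p) and point_y(&p). A getter called, say, get_the_x_please would have to be written by hand, which is easy to spot in a code review.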

I could go on and on about this, but it's getting late so I'll just wrap it up there. One last thing, though: I urge the readers of this blog (all three of you) to re-evaluate the various maxims you may have assimilated over the years. Are threads really that bad if there is a problem facing you that is genuinely concurrent? Are you really optimizing prematurely? Is there really such a thing as a "scripting language"? Should operating systems and programming languages keep each other at arm's length? The list goes on.

8 comments:

Paul Prescod said...

Sorry, J4G, I couldn't disagree more.

I've read a lot of code in my day, I'll have you know. What matters most to readability is, first, comments, followed by identifier names, followed by there being as little code as possible, followed by program structure.

Those four things all matter, but they aren't the only things that matter. There is a ton of anecdotal evidence that programmers who had trouble reading their own Perl code later had no such problem with other languages. Now please do not bother to criticize an anecdotal approach, because a) it is all we have, b) you've proceeded from a sample size of 1 (yourself) so I'm just trying to generalize a bit.


If you wanted to prevent such wanton creativity on the part of your less-trustworthy programmers, you could do so by writing a macro wrapping DEFCLASS which might both limit its complexity and enforce constraints such as the regular naming of accessors and the like.

Great. And then all of your programs are in a dialect that is specific to your company, that cannot be learned from off-the-shelf books. Now your new employees must learn Common Lisp, AND your dialect AND your program structures. That's supposed to be a net improvement in readability?

The funny thing is that I do agree that macros can improve readability. But not by making a new dialect of core features like DEFCLASS. If you feel a need to wrap or reinvent core features in your dialect then you're just demonstrating that your underlying language is not a good fit for your goals.

Furthermore, you're contradicting yourself. On the one hand, you say that the way to police your junior programmers is to just watch them, don't depend on the tools to do it. Then you talk about how macros are great as a tool to do it.

shadytrees said...

There is a ton of anecdotal evidence that programmers who had trouble reading their own Perl code later had no such problem with other languages.

All the bad Perl code I've seen breaks one of those four rules. I don't see how your experience contradicts the author's statement. And the author never stated that those were the only four things that mattered.

Now your new employees must learn Common Lisp, AND your dialect AND your program structures.

A company looking to hire non-junior developers to work on a CL project is already hiring CL people anyway. And macros are part of the program structure: People who think otherwise will ultimately struggle with CL anyway.

If you feel a need to wrap or reinvent core features in your dialect then you're just demonstrating that your underlying language is not a good fit for your goals.

Library, not language. Anybody's welcome to use another Common Lisp OOP system. And just because a library has to appeal to everybody (big companies, big projects, small companies, small projects) doesn't mean you're stuck with the extremely generalized version if you indeed want to restrict the features.

On the one hand, you say that the way to police your junior programmers is to just watch them, don't depend on the tools to do it.

In my opinion, the author said not to use bad tools. Macros are certainly anything but a bad tool if you want to check CL code. They're probably the best tool.

denis bider said...

What matters most to readability is, first, comments, followed by identifier names, followed by there being as little code as possible, followed by program structure.

I seriously disagree with this statement.

I believe that what matters most are first and foremost the identifier names and program structure; good structure includes that there needs to be as little code as possible; then, to the extent that the meaning of the program is still not plainly evident, this needs to be addressed by comments.

As long as we're sticking to anecdotal support, I've seen some very thoroughly commented C++ code that had poor structure, poorly chosen identifier names, and was as difficult to understand as reading someone else's disassembly. The comments did help, but they couldn't make up for the filthy program. On the other hand, I've seen some complex C++ code that was very well written and well-structured and so was intelligible even though it had almost no comments. That doesn't mean it couldn't have used some, but my satisfaction was magnitudes greater with the uncommented, well-structured code than with the well-commented, filthy garbage.

Comments don't make up for a lack of program structure and useful identifiers. However, great program structure and great identifiers do make up for a lack of comments.

This is why I believe that it's important for a programming language to encourage and support a good programming practice from which the meaning of the program can be understood. To the extent that the programming language cannot capture and express the meaning of the program, it fails in its task, which is to make programs easier to create and its maintenance more manageable.

After all, if comments are all it takes to have readable software, you can have comments in the margin of binary-encoded machine code. The whole purpose of programming languages, starting with evocative instruction mnemonics (assembly) and continuing all the way through Fortran and C and Smalltalk and Lisp, is to make programs easier to write and understand.

I would go so far as to say that whole extent to which a programming language is better than another can be measured in how well it expresses programmer's intent without requiring the use of comments. Comments aren't part of the structure of a programming language; they are a free-form addition that compensates for the lack of it. Comments can be written in any language.

denis bider said...

And I could actually rephrase that to say that the extent to which a programmer is better than another can be measured in how well he or she can express the program's intent before resorting to comments. Anyone can wave around in free-form English about the meaning they are intending to express; but not everyone can express that intent concisely through their program. To the extent that the programming language does not permit the expression of your intent - i.e. to the extent that it requires you to remind future maintainers of something instead of expressing it in a way that the compiler can enforce - that is the extent to which the language sucks.

Jacob Gabrielson said...

For the record, what I meant about comments was not that they are normally the greatest aid to readability, just that good comments would be. I've seen good comments in about 0.01% of all code I've ever looked at, so it's not surprising that nobody thinks they're all that important.

Also, my DEFCLASS example perhaps wasn't the best (I'm pretty sure I wouldn't actually recommend doing it). I mainly wanted to point out the irony in the fact that using macros one could implement coding standards in a way that is arguably more readable (overall).

Mike Dunlavey said...

Say, J4G, I actually agree with much of what you said :-)

I love the term "Macrophobia", and the lack of a simple macro preprocessor is probably my biggest complaint with C# and Java. Of course, macros can be misused, as can everything else, but there are some very useful things that can't really be done without them. Yes they make it harder for kiddoes to read, but I think the solution is to grow the kids up, not throw out the good stuff.

If I could make coders think like me (which they never will) this is some of what they would do:

- understand that data structure, with classes, pointers, container classes, messages, and so on is not the greatest thing since sliced cheese. Rather it is costly busywork stuff that puffs up your code volume and multiplies both your development effort and your buglist. Understand that if a little data structure is good, a lot is worse. Always ask, regarding any "objects" - do we Really Really need this? because it's gonna cost, big time.

- Get a sense of perspective about performance. Machines these days are unbelievably fast. There is no harm in using an O(N) algorithm if N is not too big and you only do it once in a blue millisecond. Keep it simple.

- Learn to write simple parsers. It's not hard. It's easy. Recursive-descent parsers are almost no-brainers, and they can really come in handy.

- Don't be so d-mn religious about what your teachers or some authors told you. They're often just kids themselves. Few of them ever had a real programming job. Learn to rely on your own common sense. If something is a good idea or a bad idea, it is so for a reason, not just "that's what I was taught" or "everybody thinks so". Computer Science is a Science, remember? It is based on questioning, not believing. Software Engineering is Engineering, remember? It is the home of invention, not conformity.
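
To give the remark about recursive-descent parsers something concrete to hang on, here is a minimal sketch in C. The grammar, the function names, and the decision to evaluate while parsing are all assumptions made for illustration, and error handling is deliberately thin.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Grammar (hypothetical, chosen for illustration):
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 * Each rule becomes one function, and the cursor advances as input is
 * consumed. Malformed input is not diagnosed: a missing number simply
 * evaluates to 0. */

static const char *cursor;

static long parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

static long parse_factor(void)
{
    long value = 0;
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                     /* consume '(' */
        value = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;                 /* consume ')' */
        return value;
    }
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

static long parse_term(void)
{
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            value *= parse_factor();
        } else if (*cursor == '/') {
            cursor++;
            long divisor = parse_factor();
            /* Division by zero yields 0 here instead of crashing. */
            value = divisor != 0 ? value / divisor : 0;
        } else {
            return value;
        }
    }
}

static long parse_expr(void)
{
    long value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            value += parse_term();
        } else if (*cursor == '-') {
            cursor++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Evaluate an arithmetic expression held in a NUL-terminated string. */
long evaluate(const char *text)
{
    assert(text != NULL);
    cursor = text;
    return parse_expr();
}
```

Calling evaluate("(1 + 2) * 3") walks the three rules in turn. Operator precedence falls out of which function calls which, with no table needed.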

OK, I've flamed enough.

Mike Dunlavey said...

More on what macros are good for...

There are times when the structure of an application is such that the coding is really tedious, and you start wondering why you can't get a little automatic help in writing it. In other words - This program is so dumb that a program could write it - and probably do a better job than I am. Well, A.I. researchers never got very far with automatic programming, but that's because they tried to solve it in general. The world isn't in general, it's in specific. In fact, writing a program to write a specific program can be pretty simple, and very useful.

So what is a macro preprocessor? Simply the poor person's automatic programmer. Think of it - it's got conditionals in the form of #if... It's got function definitions in the form of #define... with arguments. #include is also a lot like a function call. You want three copies of something? Include it three times.

You can do something like loops this way (pardon my C-ness).
First define a macro that expands into a list of macro calls:
#define MYLIST DEF(a) DEF(b) DEF(c) DEF(d) ...
Then use that list for various purposes, like to define a bunch of variables:
#define DEF(x) int x;
MYLIST
#undef DEF
Write a routine to look up a variable by name (strcmp and NULL come from <string.h>):
#include <string.h>
int* Lookup(const char* nm){
    /* the empty "if (0);" lets every generated branch start with "else if" */
    if (0);
#define DEF(x) else if (strcmp(nm, #x)==0) return &x;
    MYLIST
#undef DEF
    return NULL;
}
Then if you want to add or remove a variable, just add or remove it in the list.
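
For anyone who wants to compile the idea, here is a self-contained sketch of the same pattern. The variable names width, height and depth are placeholders invented for the example, and each generated branch is a plain "if" that returns instead of an "else if" chain, which behaves the same way here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single list that everything else is generated from.
 * width, height and depth are hypothetical placeholder names. */
#define VARIABLE_LIST \
    DEF(width)        \
    DEF(height)       \
    DEF(depth)

/* Expansion 1: define one int variable per list entry. */
#define DEF(x) int x;
VARIABLE_LIST
#undef DEF

/* Expansion 2: build the name-to-address lookup from the same list.
 * Returns NULL when the name is not in the list. */
int *lookup(const char *name)
{
    assert(name != NULL);
#define DEF(x) if (strcmp(name, #x) == 0) return &x;
    VARIABLE_LIST
#undef DEF
    return NULL;
}
```

Adding a fourth variable means adding one DEF(...) line to VARIABLE_LIST. The definition and the lookup branch both follow automatically.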

I'm not claiming that Stroustrup didn't have a point when he saw macros being used for things that really belong in the language, like manifest constants and templates. Rather I blame Gosling for taking his advice and leaving them out, thereby spawning a generation of coders who consider macros every bit as evil as goto.

And while we're at it, what exactly is wrong with Hungarian notation? For example, what is so great about having a variable called "rows" and not being able to tell if that means the number of rows or the set of rows?

Grump, grump...

Mike Dunlavey said...

Sorry, I keep going back to this thread and wanting to add something. Maybe that means it's a pretty good thread. Maybe it means I need a life :-) Anyway...

Bider said:

"I would go so far as to say that whole extent to which a programming language is better than another can be measured in how well it expresses programmer's intent without requiring the use of comments. Comments aren't part of the structure of a programming language; they are a free-form addition that compensates for the lack of it. Comments can be written in any language."

I think the comment-language relationship leaves out a third thing, the model (think of it as a functional spec, though it is probably mental) of what the program is desired to accomplish. I think the relationship should be model-comment-program. The model, especially if it is mental, is probably stated in a good language for its domain, with all the right nouns and verbs. If you were explaining to another good programmer what you wanted, this is the language you would use.

The key thing about the model-comment-program relationship is that the model changes over time, in ways that, in model space, amount to single-point or one-line or fairly local changes. Like if it is a list of features, you can add a feature or remove a feature, generally without breaking the rest of the model. When such a small change is made in the model, ideally it should be just as easy to change the program, and the comments should act as a map between parts of the model and parts of the program so that these changes can be carried out correctly. Now it's nice if the code is basically self-commenting, but that's an ideal I've almost never seen. But anyway, the language you're really using is not just the base language, like [name your favorite language], but the nouns and verbs and structure you build in it so that your program has pieces that connect, via comments, to your model, so that making changes is easy. So I define the redundancy of a language as the number of local edits necessary to change the program to implement small changes in the model, on average, and the goal is to minimize that redundancy.
