Sunday, February 1, 2009

make "|Step 1 of a logo interpreter| "read

An interpreter at the highest level looks like this:

loop until done
  read input
  evaluate the input
  print the evaluated input to the user


This set of steps is commonly referred to as the read-eval-print loop.
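
As a rough illustration, the loop above might be sketched in Python like this. This is not the interpreter's actual code: `repl`, `make_reader`, and `run_demo` are names made up for the example, `None` is an assumed "done" signal, and `str.upper` is only a stand-in for a real evaluator.

```python
def repl(read, evaluate, write):
    """Run a read-eval-print loop until `read` signals end of input."""
    while True:
        source = read()            # read input
        if source is None:         # None is this sketch's "done" signal
            break
        result = evaluate(source)  # evaluate the input
        write(result)              # print the evaluated input to the user


def make_reader(lines):
    """Return a hypothetical reader that hands out `lines` one at a time."""
    pending = iter(lines)
    return lambda: next(pending, None)


def run_demo(lines):
    """Feed `lines` through the loop and collect what would be printed."""
    printed = []
    # str.upper is a placeholder evaluator, not real Logo evaluation.
    repl(make_reader(lines), str.upper, printed.append)
    return printed
```

Passing the three steps in as functions keeps the loop itself tiny, which fits the point that nearly all of the real work ends up inside eval.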

At a quick glance, one might think that each step takes about the same amount of time to implement. That is very far from the truth. If we were to compare the implementation time for each step in the read-eval-print loop to the thickness of the layers of a deli sandwich, the read and print steps would be thin slices of rye, and eval would be 20 different layers of delicious meats and cheeses squished between those two slices.

I am currently working on test cases for the Logo lexer, which is the core of the read step. Although it's certainly not the meat of the interpreter (TO SELF: oh god, give up with this analogy), it is important to get right. In coming up with a comprehensive set of test cases, I came across a few syntactic features that you don't really find in other languages.

For example, in UCBLogo, a variable name can be any valid word. Since a word can be almost any sequence of text, this can lead to some interesting, if not obfuscated, ways of writing code. Here are some interesting, and perfectly legal, Logo code snippets (note: don't try to copy these into your Logo interpreter, since the browser doesn't render tabs):

;; test a space followed by a tab. Creates a variable named " " with value "hello"
? make "\  "hello
;; test a tab followed by a space. Creates a variable named "\t" with value "goodbye"
? make "\  "goodbye
;; print the value of the space variable
? print :\ 
;; print the value of the tab variable
? print :\ 


The above code prints hello, and then goodbye. UCBLogo actually gives you several ways to do this. You could also write the code above with vertical bars like this:

;; print the value of the space variable
? print :| |
;; print the value of the tab variable
? print :| |
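
To make the two quoting forms concrete, here is a small Python sketch of how a lexer might split a line into words while honoring backslash escapes and vertical bars. It is an assumption-laden toy, not UCBLogo's real tokenizer: `split_words` is a made-up name, and the sketch ignores brackets, comments, line continuation, and backslashes inside bars.

```python
def split_words(line):
    """Split one line of Logo source into words.

    Whitespace separates words unless it is escaped with a backslash
    or enclosed in vertical bars.
    """
    words = []
    current = []      # characters of the word being built
    in_word = False   # True once the current word has started
    in_bars = False   # True while between a pair of vertical bars
    chars = iter(line)
    for ch in chars:
        if in_bars:
            if ch == "|":
                in_bars = False      # closing bar ends the quoted run
            else:
                current.append(ch)   # everything inside bars is literal
        elif ch == "\\":
            # Take the next character literally, whatever it is.
            current.append(next(chars, ""))
            in_word = True
        elif ch == "|":
            in_bars = True
            in_word = True
        elif ch.isspace():
            if in_word:              # unescaped whitespace ends a word
                words.append("".join(current))
                current, in_word = [], False
        else:
            current.append(ch)
            in_word = True
    if in_word:
        words.append("".join(current))
    return words
```

With this sketch, `split_words('make "|my greeting| "hello')` yields three words: `make`, `"my greeting`, and `"hello`. The leading quote stays attached to the word here; a fuller lexer would presumably turn it into a separate token type.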


This is only a taste of the kind of obfuscation this syntactic feature gives you. Imagine reading and maintaining code that combined spaces and tabs to create all sorts of weird variable names. It's a scary thought.

However, it does offer the power to come up with some more practical naming conventions that allow for white space, something most other languages don't offer. Take for example:

? make "|my greeting| "hello
? make "|my farewell| "goodbye
? print :|my greeting|
hello
? print :|my farewell|
goodbye


That's a little more understandable than camel hump notation... or it would be if those darn vertical bars weren't there. While it doesn't look like there's an easy way around this, something I may do once I've implemented the core language is create an editor that hides the vertical bars but still makes it clear which words are part of the variable.

To me, this flexibility in variable naming is a neat part of Logo that really shows it was a language designed to be accessible to everyone. Although Logo gives you the power to write messed-up code with variable names dependent on spaces and carriage returns, it also allows kids to write variables in plain English. Actually, since UCBLogo has UTF-8 support, kids can write programs that read well in any language.

The more I delve into this language, the more interesting it gets.
