Sunday, February 22, 2009

Well, I haven't been updating this blog recently. The reasons for this are twofold:

1) The updates that I have been making aren't interesting enough to make a blog post about.

2) My work on this project has been slowed down by my day job and other interests.

As per the law of fire and motion, this project won't get done unless I keep making progress each week. I didn't want to set concrete deadlines since I don't really know how long this is going to take and it's demotivating to fail to meet a deadline once you've set one.

Instead, I'll do something I should have done formally from the beginning. I'm going to schedule time each week for working on this project. Here are the times:

M: 9:30-11p
T: 9:30-11p
Sa: 10a-12p
Su: 9-11a

There you go: 7 hours total per week. Since it takes about 15 minutes to really get started in each of the four sessions, that's about 6 hours of focused work per week. That should get this project rolling.

Saturday, February 7, 2009

Logo development update: Test driven development and Logo Lexer

Test driven development is relatively new to me in the sense that, while I've known about it for some years, I've never used it in any nontrivial project. This deficiency is something that I regret.

To be more accurate, not using test driven development is something I feel that I should regret, but deep down inside I do not. Most working developers understand that things need to get done by a deadline. To state the obvious, certain development methods take longer than others to initially deliver a result. If the final product is something with loose requirements or requirements that are hard to test, test driven development is probably too expensive in terms of the time it will take to push out a change. However, if you have a tightly defined spec with well defined requirements, then test driven development may work for you.

Note that while I do not think test driven development is the bee's knees, I do believe well defined tests are essential. Most projects are better for having rigorous tests, but you have to think carefully about where writing tests fits into your development process. For some projects, it's better to write tests before you've written a line of code, for some it's better after you've written the entire thing, and then there's all that stuff in between. Who really cares? It's all very domain specific, and it's been written about before and written better. We're done with this topic and we're moving to part two:

Logo development will be test driven. The test driven development model appeals to this project since ucblogo is a (mostly) well defined language whose core concepts rarely change over time. It's important to write the tests before writing the code since the tests will essentially write the spec for interpreter implementation.

So far I've written several tests for the logo lexer. Each test is separated into two files: a test file with a sample logo program, and an expected results file with the logo lists written as strings. Each test file will be read and executed by a JUnit test in the lexer module, and the output will be compared to the corresponding expected file.

Here is the sample test and expected file:

sample.test
;; a simple print statement
print "simple

sample.expected
[print quoted simple]
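
To make the file-pair idea concrete, here is a minimal Java sketch of how lexer output could be compared against an expected file. It is not the project's actual lexer or its JUnit test: the names ToyLexer and FileBackedLexerCheck are hypothetical, the reading of "quoted" as the marker for a word that was prefixed with a double quote is a guess based on the sample above, and plain Java stands in for JUnit so the snippet has no dependencies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the real lexer; it only handles comments and quoted words. */
final class ToyLexer {
    /** Renders the tokens of a program as one bracketed list, e.g. [print quoted simple]. */
    static String lex(String program) {
        List<String> tokens = new ArrayList<>();
        for (String line : program.split("\\R")) {
            // Assumption: everything from the first ';' to the end of the line is a comment.
            int comment = line.indexOf(';');
            String code = (comment >= 0 ? line.substring(0, comment) : line).trim();
            if (code.isEmpty()) {
                continue;
            }
            for (String word : code.split("\\s+")) {
                if (word.startsWith("\"")) {
                    // Assumption: a leading double quote is reported as the token "quoted".
                    tokens.add("quoted");
                    tokens.add(word.substring(1));
                } else {
                    tokens.add(word);
                }
            }
        }
        return "[" + String.join(" ", tokens) + "]";
    }
}

/** Hypothetical helper that a JUnit test could call once per test/expected file pair. */
final class FileBackedLexerCheck {
    /** Returns true when lexing the .test file reproduces the .expected file. */
    static boolean matches(Path testFile, Path expectedFile) throws IOException {
        String program = new String(Files.readAllBytes(testFile), StandardCharsets.UTF_8);
        String expected =
            new String(Files.readAllBytes(expectedFile), StandardCharsets.UTF_8).trim();
        return ToyLexer.lex(program).equals(expected);
    }
}
```

A JUnit test would wrap a call to FileBackedLexerCheck.matches for each pair of files in an assertion. Because the expected output lives in plain text files, the same pairs could be reused by a harness written in another language.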


As per usual, let me first state why this method of testing is bad (and by association why it makes me a bad developer):

This approach sucks because it separates the core part of the test from the code itself and makes it more difficult for me to read my test files. Now I have to create a separate conf file to hold path variables that Java doesn't know about natively, since they're not embedded in the code. This conf file makes the build process more complicated, since I now need to include this file if I'm testing, but not include it if I'm running the app as normal, since I don't want the dependencies in the final application.

So why did I choose this method to develop the Lexer tests?

This approach is nice because it separates the core part of the test from the code itself and allows me to switch the code base I will be using if I need to. Since this project is in its early stage, if I decide it would be better to write the interpreter in, say, ActionScript or C#, I could switch without losing everything I wrote. Java is a nice platform and all, but in reality, it's not an ideal language for this kind of development. If I find out there's a better multi-platform (in terms of both browser and OS) language for this kind of work, then I want to be able to quickly and painlessly jump ship.

Sunday, February 1, 2009

make "|Step 1 of a logo interpreter| "read

An interpreter at the highest level looks like this:

loop until done
  read input
  evaluate the input
  print the evaluated input to the user


This set of steps is commonly referred to as the read-eval-print loop.
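
As a rough illustration, the loop above could be sketched in Java like this. The class name Repl, the helper runOn, and the idea of passing the eval step in as a function are assumptions made for the example, not the interpreter's actual design.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.function.Function;

/** Hypothetical skeleton of a read-eval-print loop with a pluggable eval step. */
final class Repl {
    /** Reads lines until end of input, evaluates each one, and prints the result. */
    static void run(BufferedReader in, PrintWriter out, Function<String, String> eval) {
        try {
            String line;
            while ((line = in.readLine()) != null) { // read; null means "done"
                String result = eval.apply(line);    // eval
                out.println(result);                 // print
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper for trying the loop on a fixed script held in a string. */
    static String runOn(String script, Function<String, String> eval) {
        StringWriter buffer = new StringWriter();
        run(new BufferedReader(new StringReader(script)), new PrintWriter(buffer), eval);
        return buffer.toString();
    }
}
```

In this sketch the read and print steps are one line each, while all of the real work would hide behind the eval function.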

At a quick glance, one might think that each step takes about the same amount of time to implement. That is very far from the truth. If we were to relate the implementation time for each step in the read-eval-print loop to the thickness of the components of a deli sandwich, the read and print steps would be thin slices of rye, and eval would be 20 different layers of delicious meats and cheeses squished in between those two slices.

I am currently working on test cases for the logo lexer, which is the core of the read step. Although it's certainly not the meat of the interpreter (TO SELF: oh god, give up with this analogy), it is important to get right. In coming up with a comprehensive set of test cases, I came across a few syntactical features that you don't really find in other languages.

For example, in ucblogo, you can set a variable name to be any valid word value. Since a word can be almost any sequence of text, this can lead to some interesting, if obfuscated, ways of writing code. Here are some interesting, and perfectly legal, logo code snippets (note: don't try to copy these into your logo interpreter, since the browser doesn't render tabs):

;; test a space followed by a tab. Creates a variable named " " with value "hello"
? make "\  "hello
;; test a tab followed by a space. Creates a variable named "\t" with value "goodbye"
? make "\  "goodbye
;; print the value of the space variable
? print thing "\ 
;; print the value of the tab variable
? print thing "\ 


The above code prints hello, and then goodbye. UCBLogo actually gives you several ways to do this. You could also write the code above with vertical bars like this:

;; print the value of the space variable
? print thing "| |
;; print the value of the tab variable
? print thing "| |
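
To show what these two escaping styles mean for the read step, here is a small Java sketch of splitting one line into words. It is a simplification written for this post, not UCBLogo's real tokenizer: the class name WordSplitter is hypothetical, and it assumes that a backslash makes the next character an ordinary part of the word and that everything between vertical bars belongs to the word. It ignores comments, brackets and line continuation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical word splitter that honors backslash escapes and |vertical bars|. */
final class WordSplitter {
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean started = false; // true once the current word has begun
        boolean inBars = false;  // true between an opening and a closing '|'
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                // Escaped character: take it literally, even if it is whitespace.
                current.append(line.charAt(++i));
                started = true;
            } else if (c == '|') {
                // Bars toggle "everything is literal" mode and are not kept in the word.
                inBars = !inBars;
                started = true;
            } else if (Character.isWhitespace(c) && !inBars) {
                // Unescaped whitespace outside bars ends the current word.
                if (started) {
                    words.add(current.toString());
                    current.setLength(0);
                    started = false;
                }
            } else {
                current.append(c);
                started = true;
            }
        }
        if (started) {
            words.add(current.toString());
        }
        return words;
    }
}
```

With this sketch, a line such as print |my greeting| splits into the two words print and my greeting, and a backslash followed by a space produces a word that contains the space.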


This is only a taste of the kind of obfuscation this syntactic feature gives you. Imagine reading and maintaining code that combines spaces and tabs to create all sorts of weird variable names. It's a scary thought.

However, it does offer the power to come up with some more practical naming conventions that allow for white space, something most other languages don't offer. Take for example:

? make "|my greeting| "hello
? make "|my farewell| "goodbye
? print thing "|my greeting|
hello
? print thing "|my farewell|
goodbye


That's a little more understandable than camel hump notation... or it would be if those darn vertical bars weren't there. While it doesn't look like there's an easy way around this, something I may do once I've implemented the core language is create an editor that hides the vertical bars but still makes it clear which words are part of the variable.

To me, this flexibility in variable naming is a neat part of logo that really shows it was a language designed to be accessible to everyone. Although Logo gives you the power to write messed up code with variable names dependent on spaces and carriage returns, it also allows kids to write variables in plain English. Actually, since ucblogo has UTF-8 support, kids can write programs that read well in any language.

The more I delve into this language, the more interesting it gets.