Sunday, March 1, 2009

Logo list evaluation

Here's a quick update on the problem that I'm considering right now.

Consider the following code in logo:

? print first [3+5]
? print if 1=1 [3+5]

For the first statement, the output is:

3+5

For the second statement, the output is:

8

Why is this significant to Logo? Well, it turns out these are special cases in UCBLogo for handling input inside a list. The list [3+5] is a list with one word, '3+5'. If you run the command first [3+5], the word '3+5' is returned. However, if you evaluate the list [3+5] (e.g. if 1=1 [3+5]), the list is interpreted as having 3 symbols, "3", "+", and "5", and is evaluated as 8.

The solution to this problem looks simple enough: if you evaluate an instruction list and come across a word containing any of the symbols +, - (except when it's the first symbol of the word), /, <, (, ), =, or >, then you split the word into separate words around these delimiters. You then evaluate the list.
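Here is a rough sketch of that splitting step in Java. The class and method names are made up for illustration, the delimiter set is just the one listed above, and this is not how UCBLogo itself is implemented:

```java
import java.util.ArrayList;
import java.util.List;

class InfixSplitter {
    // Delimiters copied from the list above (an assumption, not UCBLogo's full rule set).
    private static final String DELIMITERS = "+-/<()=>";

    /** Splits one word into separate words around the delimiter characters. */
    static List<String> splitWord(String word) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // A minus sign at the very start of the word stays attached to the word.
            boolean leadingMinus = (c == '-' && i == 0);
            if (DELIMITERS.indexOf(c) >= 0 && !leadingMinus) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add(String.valueOf(c));   // the delimiter becomes its own word
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /** Applies splitWord to every word of an instruction list before evaluation. */
    static List<String> splitList(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            result.addAll(splitWord(word));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitWord("3+5"));
    }
}
```

With this sketch, first [3+5] would never call splitList and so keeps the single word, while evaluating the list would see three words.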

Fun stuff!

Sunday, February 22, 2009

Well, I haven't been updating this blog recently. The reasons for this are twofold:

1) The updates that I have been making aren't interesting enough to make a blog post about.

2) My work on this project has been slowed down by my day job and other interests.

As per the law of fire and motion, this project won't get done unless I keep making progress each week. I didn't want to set concrete deadlines since I don't really know how long this is going to take and it's demotivating to fail to meet a deadline once you've set one.

Instead, I'll do something I should have done formally from the beginning. I'm going to schedule time each week for working on this project. Here are the times:

M: 9:30-11p
T: 9:30-11p
Sa: 10a-12p
Su: 9-11a

There you go. 7 hours total per week. Since it takes about 15 minutes to really get started in each of the four sessions, that's 6 hours of actual work per week. That should get this project rolling.

Saturday, February 7, 2009

Logo development update: Test driven development and Logo Lexer

Test driven development is relatively new to me in the sense that, while I've known about it for some years, I've never used it in any non-trivial project. This deficiency is something that I regret.

To be more accurate, not using test driven development is something I feel that I should regret, but deep down inside I do not. Most working developers understand that things need to get done by a deadline. To state the obvious, certain development methods take longer than others to initially deliver a result. If the final product is something with loose requirements or requirements that are hard to test, test driven development is probably too expensive in terms of the time it will take to push out a change. However, if you have a tightly defined spec with well defined requirements, then test driven development may work for you.

Note that while I do not think test driven development is the bee's knees, I do believe well defined tests are essential. Most projects are better for having rigorous tests, but you have to think carefully about where writing tests fits into your development process. For some projects, it's better to write tests before you've written a line of code, for some it's better after you've written the entire thing, and then there's all that stuff in between. Who really cares? It's all very domain specific, and it's been written about before and written better. We're done with this topic and we're moving to part two:

Logo development will be test driven. The test driven development model appeals to this project since UCBLogo is a (mostly) well defined language whose core concepts rarely change over time. It's important to write the tests before writing the code since the tests will essentially write the spec for the interpreter implementation.

So far I've written several tests for the Logo lexer. Each test is separated into two files: a test file with a sample Logo program, and an expected results file with the Logo lists written as strings. Each test file will be read and executed by a JUnit test in the lexer module, and the output will be compared to the corresponding expected file.

Here is a sample test file and its expected file:

;; a simple print statement
print "simple

[print quoted simple]
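To make the file-pair idea concrete, here is a minimal sketch of the comparison step in plain Java. It deliberately avoids JUnit so that it stands alone, and the class name, the method names, and the lexer parameter are all placeholders of mine; the real lexer and its output format may well differ:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

class LexerFileCheck {
    /** Compares lexer output for a program against the expected text, ignoring outer whitespace. */
    static boolean matches(String program, String expected, UnaryOperator<String> lexer) {
        return lexer.apply(program).strip().equals(expected.strip());
    }

    /** Same check, reading the program and the expected output from two files. */
    static boolean matchesFiles(Path testFile, Path expectedFile, UnaryOperator<String> lexer) {
        try {
            return matches(Files.readString(testFile), Files.readString(expectedFile), lexer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder lexer: a stand-in that always returns the expected text for the sample.
        UnaryOperator<String> fakeLexer = program -> "[print quoted simple]";
        String program = ";; a simple print statement\nprint \"simple\n";
        System.out.println(matches(program, "[print quoted simple]\n", fakeLexer));
    }
}
```

Because the check only sees strings and files, the same test and expected files could be reused by a harness written in another language.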

As per usual, let me first state why this method of testing is bad (and by association why it makes me a bad developer):

This approach sucks because it separates the core part of the test from the code itself and makes it more difficult for me to read my test files. Now I have to create a separate conf file to hold path variables that Java doesn't know about natively, since they're not embedded in the code. This conf file makes the build process more complicated, since I now need to include it when I'm testing, but leave it out when I'm running the app as normal, since I don't want the dependencies in the final application.

So why did I choose this method to develop the Lexer tests?

This approach is nice because it separates the core part of the test from the code itself and allows me to switch the code base I will be using if I need to. Since this project is in its early stage, if I decide it would be better to write the interpreter in, say, ActionScript or C#, I could switch without losing everything I wrote. Java is a nice platform and all, but in reality, it's not an ideal language for this kind of development. If I find out there's a better multi-platform (in terms of both browser and OS) language for this kind of work, then I want to be able to quickly and painlessly jump ship.

Sunday, February 1, 2009

make "|Step 1 of a logo interpreter| "read

An interpreter at the highest level looks like this:

loop until done
  read input
  evaluate the input
  print the evaluated input to the user

This set of steps is commonly referred to as the read-eval-print loop.
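As a bare-bones illustration, the loop might look like this in Java. The eval step is passed in as a function because the real evaluator doesn't exist yet, and treating the word bye as the "done" condition is my assumption for this sketch:

```java
import java.io.PrintStream;
import java.util.Scanner;
import java.util.function.UnaryOperator;

class Repl {
    /** Runs a read-eval-print loop until input ends or the user types "bye". */
    static void run(Scanner in, PrintStream out, UnaryOperator<String> eval) {
        while (in.hasNextLine()) {              // loop until done (end of input)
            String line = in.nextLine();        // read input
            if (line.trim().equals("bye")) {    // assumed exit word for this sketch
                break;
            }
            String result = eval.apply(line);   // evaluate the input
            out.println(result);                // print the evaluated input to the user
        }
    }
}
```

Calling Repl.run(new Scanner(System.in), System.out, line -> line) gives a loop that simply echoes each line back, which is about as thin as the two slices of rye get.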

At a quick glance, one might think that each step takes an equally long amount of time to implement. That thought is very far from the truth. If we were to relate the implementation time for each step in the read-eval-print loop to the thickness of the components of a deli sandwich, the read and print steps would be thin slices of rye, and eval would be 20 different layers of delicious meats and cheeses squished in between those two slices.

I am currently working on test cases for the logo lexer, which is the core of the read step. Although it's certainly not the meat of the interpreter (TO SELF: oh god, give up with this analogy), it is important to get right. In coming up with a comprehensive set of test cases, I came across a few syntactical features that you don't really find in other languages.

For example, in UCBLogo, you can set a variable name to be any valid word value. Since a word can be almost any sequence of text, this can lead to some interesting, if obfuscated, ways of writing code. Here are some interesting, and perfectly legal, Logo code snippets (note: don't try to copy this into your Logo interpreter since the browser doesn't render tabs):

;; test a space followed by a tab. Creates a variable named " " with value "hello"
? make "\  "hello
;; test a tab followed by a space. Creates a variable named "\t" with value "goodbye"
? make "\  "goodbye
;; print the value of the space variable
? print \ 
;; print the value of the tab variable
? print \ 

The above code prints hello, and then goodbye. UCBLogo actually gives you several ways to do this. You could also write the code above with vertical bars like this:

;; print the value of the space variable
? print | |
;; print the value of the tab variable
? print | |

This is only a taste of the kind of obfuscation this syntactic feature gives you. Imagine reading and maintaining code which combined spaces and tabs to create all sorts of weird variable names. It's a scary thought.

However, it does offer the power to come up with some more practical naming conventions that allow for white space, something most other languages don't offer. Take for example:

? make "|my greeting| "hello
? make "|my farewell| "goodbye
? print |my greeting|
? print |my farewell|

That's a little more understandable than camel hump notation... or it would be if those darn vertical bars weren't there. While it doesn't look like there's an easy way around this, something I may do once I've implemented the core language is create an editor that hides the vertical bars but still makes it clear which words are part of the variable.
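On the lexer side, the backslash and vertical bar forms could be handled roughly like this. This is a simplified sketch with names I made up, not the actual UCBLogo rules, which have more cases (comments, line continuation, and so on):

```java
class WordReader {
    /** Reads the first word of the text, honoring backslash escapes and vertical bars. */
    static String readWord(String text) {
        StringBuilder word = new StringBuilder();
        boolean inBars = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length()) {
                word.append(text.charAt(++i));   // backslash: take the next character literally
            } else if (c == '|') {
                inBars = !inBars;                // bars toggle quoting and are not part of the word
            } else if (!inBars && Character.isWhitespace(c)) {
                break;                           // unquoted whitespace ends the word
            } else {
                word.append(c);
            }
        }
        return word.toString();
    }
}
```

So the text |my greeting| would come out as the single word my greeting, and a backslash followed by a space would come out as a word containing just that space.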

To me, this flexibility in variable naming is a neat part of Logo that really shows it was a language designed to be accessible to everyone. Although Logo gives you the power to write messed up code with variable names dependent on spaces and carriage returns, it also allows kids to write variables in plain English. Actually, since UCBLogo has UTF-8 support, kids can write programs that read well in any language.

The more I delve into this language, the more interesting it gets.

Saturday, January 24, 2009

Logo in Java: why do it at all?

The plan: implement a version of the UCBLogo interpreter in Java.

To begin, let me just say that I start and abandon a lot of projects. It starts with me choosing a project that I have few doubts about. However, soon after I start, there's an inner skeptic in me that says I should forget about the whole thing and do something useful. Usually, my mentality is to ignore my skeptical thoughts and just go on with the project. "Screw you doubts, I'm going to go and do it anyway!" These doubts, unresolved, then fester and grow as I'm working out the project details. As soon as I hit a tough point, the doubts take over and soon I abandon what I'm working on and start the cycle anew by searching for another doubt-free project.

This time, I'm going to take a different approach. I'm going to get the doubts out of the way now, so I can push through the initial development phase and end up with something of value.

So here they are, the three main reasons why this project is a giant waste of time:

1) It's been done and it's been done better.

There are already many Logo implementations out there. According to Wikipedia, there are 187 working implementations of Logo. If someone wants to write a program in Logo, they can pick any one of these implementations. Moreover, there are several rock solid implementations, like UCBLogo, FMSLogo, and Microworlds, which are better than anything I could ever come up with. There's even a really cool web based one being developed!

2) Logo is old, boring, and dying

Logo is over 40 years old. Its main purpose was to be an educational programming language for kids. In recent years, there have been other exciting developments in educational programming languages. Alice lets you create programs in your own 3D environment. Scratch lets you snap together blocks and share your programs on the web. Logo is just turtle graphics, which no kid thinks is exciting. Logo is dead, long live Alice and Scratch.

3) Java is the wrong language

Let's face it, Java is not the ideal language for implementing a Logo interpreter. The biggest problem with choosing Java as my language is the lack of built in continuations. You see, Logo (or actually UCBLogo) requires tail call elimination. The easiest way to implement tail call elimination is with a language that supports continuations. In Brian Harvey's C based version of UCB Logo, this problem was solved with an evaluation procedure that used GOTO statements to avoid explicit recursion and a stack to store the execution state. Java doesn't have GOTO, so tail call elimination will need to be implemented by either inventing continuations on top of the JVM or some other more convoluted way. Why not implement the language in something that already has continuations so I don't have to deal with this stuff?
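For the record, one of those "more convoluted" ways is a trampoline: instead of making a tail call directly, a procedure returns a description of the next call, and a plain loop keeps running those until a final value shows up. The sketch below is a generic illustration of that technique with made-up names; it is not what UCBLogo does and not necessarily what I'll end up using:

```java
import java.util.function.Supplier;

class Trampoline {
    /** One step of a computation: either a final value or a thunk for the next step. */
    static final class Step<T> {
        final T value;                  // meaningful only when next is null
        final Supplier<Step<T>> next;   // non-null when more work remains

        private Step(T value, Supplier<Step<T>> next) {
            this.value = value;
            this.next = next;
        }

        static <T> Step<T> done(T value) {
            return new Step<>(value, null);
        }

        static <T> Step<T> call(Supplier<Step<T>> next) {
            return new Step<>(null, next);
        }
    }

    /** Runs steps in a loop, so a chain of tail calls never grows the Java stack. */
    static <T> T run(Step<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }

    /** A tail-recursive counter written in trampolined style. */
    static Step<Integer> countUp(int remaining, int total) {
        if (remaining == 0) {
            return Step.done(total);
        }
        return Step.call(() -> countUp(remaining - 1, total + 1));
    }

    public static void main(String[] args) {
        // A million "recursive" calls without a StackOverflowError.
        System.out.println(run(countUp(1_000_000, 0)));
    }
}
```

The price is an extra object allocation per call and code that reads less naturally than plain recursion, which is exactly the kind of overhead a language with built in continuations would spare me.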

To summarize the above: Why make a cement wheel with a hammer and chisel when you can get a better one, made of rubber, at the hardware store for less?

I chose Logo because I'm interested in Computer Science education, and what better way to learn about the ideas behind CS education than by learning every detail of a language rich in those ideas?

I chose Java, or more precisely the JVM, because it can run in a browser, it runs on all platforms, and because it's popular. When I'm done, I want someone to be able to use it easily for their own purposes. As for continuations, I'll deal with them when I get there.

As for the point that it's been done and it's been done better. Well, erhm... this point is rather embarrassing. You see, I've never written a compiler or interpreter before that wasn't for a class, and I've certainly never written one that was any good. Many people say that writing a compiler or interpreter yourself is a rite of passage as a programmer. I know it's doubtful anybody will call me a better programmer after writing this thing, but it's even more doubtful that they will if I don't write it. This is one rite of passage that I need to take.