Esoteric Haskell
Posted on 2021-01-17
One of my favorite parts of being a programmer has always been learning new languages. Each language I've learned has introduced me to new programming concepts and ideas which have made me a better programmer. Python (my first language) taught me all of the basics of programming and inspired me to learn more, C taught me how computers really work at a low level, and Rust taught me the joys of safety and immutability. However, the one language that has caused me to rethink how I program the most has definitely been Haskell
The biggest difference between Haskell and other programming languages I've worked with is its emphasis on pure functions and immutability. Pure functions are functions which have no side-effects. This means that if you give a function the same inputs, it will always return the same outputs. It also means that we have complete immutability, so if we wanted to add one element to a list, we would have to return a completely new list.
While Haskell is a strange language by most developers' standards, there is one family of languages which takes this to the next level. These are the esoteric languages. Esoteric programming languages push the bounds of computing, and there have been languages which encode programs in pictures, create programs out of Shakespearean plays, or even consist entirely of whitespace. While these programming languages are both amazing and terrifying, my favorite esoteric language is definitely Brainf*ck.
In this post, we will explore the strange worlds of Brainf*ck and Haskell, and use Haskell to create a Brainf*ck interpreter so that we can write programs in some of the strangest languages around.
Brainf*ck Background
I was first introduced to Brainf*ck language through its cousin, P''. P'' was originally written as a minimal Turing-complete language, and contains only 6 symbols used to manipulate the contents of its memory and perform basic looping. Brainf*ck extends these capabilities by adding two new characters, which allow the language to read read things into or write things out from memory.
There are four things that we need to keep track of in order to understand what our Brainf*ck program is doing. These are:
- Input Stream: This is similar to other programming languages and text-based
programs. This stream is
STDIN
on Linux systems, and is the same place that the text from a Pythoninput()
statement comes from. - Output Stream: This is also like any other programming language. It is
STDOUT
on Linux systems, and is accessed by theprint()
statement in Python. - Program Tape: We can think of this as an array of instructions. Each instruction
in Brainf*ck is a single character, like
<
,>
, or+
. These instructions are stored in an array, and can be indexed by a "Program Counter" which we control. - Memory Tape: This is simply an array of characters which our program can manipulate. It is similar to memory in other programming languages, except with fewer operations we can perform on it. This array is accessed by a "Data Pointer", which we control.
Next, let's go over the instructions that make up the Brainf*ck language.
>
: This instruction increments the data pointer, which allows us to access the next element in the memory tape.<
: This instruction decrements the data pointer, which allows us to access the previous element in the memory tape.+
: This instruction increments the current item in the memory tape. The memory tape consists of 8-bit ASCII characters, so incrementing "a" would turn it into "b".-
: This instruction does the reverse of+
, decrementing the value of the current character..
: This instruction outputs the current character in the memory tape to the output stream.,
: This instruction reads in a character from the input stream and writes it to the current character in the memory tape.[
: This instruction checks the current item in the memory tape. If that item is zero, it will jump forward to the matching]
.]
: This instruction does the opposite of[
. If the item on the memory tape is non-zero, it jumps back to the matching[
. Otherwise, it does nothing.
These instructions are nicely summarized in this table from the Brainf*ck Wikipedia page, which shows the corresponding C commands for each Brainf*ck instruction.
Brainf*ck Command | C Equivalent |
---|---|
(Program Start) | char ptr[30000] = {0}; |
> |
++ptr; |
< |
--ptr; |
+ |
++(*ptr); |
- |
--(*ptr); |
. |
putchar(*ptr); |
, |
*ptr = getchar(); |
[ |
while (*ptr) { |
] |
} |
Building a Brainf*ck Interpreter
Our goal for this post is to build a Brainf*ck interpreter in Haskell. An interpreter is a program which reads in another program written in some language and directly executes that program. In our case, the interpreter will be written in Haskell, and the target language will be Brainf*ck.
Haskell provides an interesting challenge for writing this kind of interpreter. Haskell is inherently immutable, while Brainf*ck is inherently built on the premise of mutable state. This is a worthwhile exercise, however, as it allows us to gain insight to into several parts of Haskell, as well as functional programming more generally.
Getting Started: Outlining the Main Function
Before we get lost in the functional weeds, let's define at a high level
what we want our program to do. We should be able to pass in a Brainf*ck
program as an argument and have the interpreter execute that program. This
would give the following usage, where PROG_NAME
is the name of the input
program.
./brainfk PROG_NAME
We will start writing our main file, brainfk.hs
, by adding some
necessary package imports.
import System.Exit
import System.Environment
import Control.Monad
import Control.Monad.State
At a high level, our program will need to be able to read a file name
from the command line, open and parse that file, and then run the parsed
program. This gives us the following main
function.
main :: IO ()
main = do
-- Read information from the environment
args <- getArgs
prog_name <- getProgName
-- Validate arguments
when ((length args) == 0) $ do
putStrLn $ usage prog_name
exitFailure
-- Read in the input program
let fname = args !! 0
program_contents <- readFile fname
-- TODO: Execute the program
Let's walk through this line by line. First, we declare our main
function
as an IO ()
monad. A
monad
is one of the more abstract parts of Haskell, and allows us to do things not
usually possible in a pure function, like read input and write output. We
start our declaration of main
with the keyword do
, which allows us to
chain multiple commands together. Next, we read in our arguments and the name
of the program using getArgs
and getProgName
.
After reading our arguments, we validate them to ensure that we actually got a
file to read. The when
function is similar to an if
statement in other languages,
and only executes the lines within its do
block if the condition evaluates
to True
. In this case, the do
block checks if we received no arguments, and
calls the usage
function if so. This function prints a help message to the user,
and is defined as follows:
usage :: String -> String
usage prog_name = "usage: " ++ prog_name ++ " PROG_NAME"
The last part of this initial stage is to read in the Brainf*ck file and bind
it to the program_contents
variable. This variable will contain a string with
the entire contents of the input file inside.
Defining Our Command Interface
Now that we've got our setup code established, we can begin to define our data
interface. Haskell has a really nice system for defining custom types, which
allows us to use the data
keyword to define our own types. For example,
the type Command
, as defined below, allows us to encapsulate the possible
commands in a Brainf*ck program.
data Command = PointerIncr | PointerDecr
| ByteIncr | ByteDecr
| ByteOutput | ByteInput
| JumpForward | JumpBackward
deriving (Show, Eq)
The deriving (Show, Eq)
means that our Command
data type can be printed
and compared. For example:
PointerIncr == PointerDecr -- returns False
putStrLn $ show PointerIncr -- print a PointerIncr
We can then define a couple helper functions to make using our Command
data
type easier.
readCommand :: Char -> Maybe Command
readCommand x = case x of '>' -> Just PointerIncr
'<' -> Just PointerDecr
'+' -> Just ByteIncr
'-' -> Just ByteDecr
'.' -> Just ByteOutput
',' -> Just ByteInput
'[' -> Just JumpForward
']' -> Just JumpBackward
_ -> Nothing
readProgram :: String -> [Command]
readProgram = mapMaybe readCommand
invertCommand :: Command -> Command
invertCommand PointerDecr = PointerIncr
invertCommand PointerIncr = PointerDecr
invertCommand ByteDecr = ByteIncr
invertCommand ByteIncr = ByteDecr
invertCommand ByteInput = ByteOutput
invertCommand ByteOutput = ByteInput
invertCommand JumpBackward = JumpForward
invertCommand JumpForward = JumpBackward
We start by defining the readCommand
function. This takes in a character
x
and attempts to read it into a Command
, based on the definition
of Brainf*ck defined above. The Maybe
wrapper allows for us to encapsulate
the uncertainty of whether a value will exist. If our input character x
is not valid, then we return Nothing
.
The readProgram
extends readCommand
by reading a string (which is equivalent
to a list of Char
) into a list of commands. The mapMaybe
function is
defined in Data.Maybe
and maps over a list. It would be the equivalent
to doing the following in Python.
out_list = []
for my_char in in_list: # Iterate through an input list
x = f(my_char) # Apply the function to each element
if x != None: # None is like Nothing in Python
out_list.append(x) # Add to the output list if not Nothing
Finally, the invertCommand
function is a convenience functio which will
be useful later on. Each command in Brainf*ck has an opposite. For example,
.
outputs a byte, while ,
reads in a byte. The invertCommand
formalizes
all of these pairs of outputs.
Defining Our Program State
Now that we have a way to represent individual commands, we will work on
representing the state of the program as a whole. As we defined earlier,
a Brainf*ck state consists of both a program tape and a memory tape,
and a pointer pointing to each of them. We can then create a type
ProgramState
as follows to encapsulate this.
data InstructionTape = InstructionTape [Command] Int
data MemoryTape = MemoryTape [Char] Int
type ProgramState = (InstructionTape, MemoryTape)
The InstructionTape
class represents our list of instructions, while the MemoryTape
class represents our memory contents. Additionally, both contain the pointers
within themselves. The ProgramState
is just a type synonym, stating that the
program state consists of both an instruction and memory tape.
Now that we have our program state defined, we need to define some functions to help us manipulate that state. First, we need a function to initialize the state.
initializeProgramState :: [Command] -> ProgramState
initializeProgramState instrs = (ins_tape, mem_tape)
where ins_tape = InstructionTape instrs 0
mem_tape = MemoryTape (repeat '\0') 0
As this function shows, we initialize both pointers to 0
and the instruction tape
to an inputted list of commands. One interesting thing to note here is the definition
of mem_tape
. The repeat
function creates an infinite list by repeating its input.
Thanks to Haskell's
lazy evaluation,
we can create (theoretically) infinite data structures which are generated as-needed
by the program. Now that we've initialized our program state, we need to implement
each of the operations of Brainf*ck.
Incrementing the Program Counter
The functions below are used to increment the program counter.
updateInstructionPointer :: Int -> InstructionTape -> InstructionTape
updateInstructionPointer amt (InstructionTape xs pc) = InstructionTape xs (pc+amt)
incrementProgramCounter :: ProgramState -> ProgramState
incrementProgramCounter (instrs, mems) = (updateInstructionPointer 1 instrs, mems)
They work by simply unpacking the arguments, adding one to the programming counter, and stitching the arguments back together. We can run this function after every other function to increment the program counter between instructions.
Moving the Data Pointer
The >
and <
Brainf*ck functions both change the data pointer by one position.
Both of their behavior can be encapsulated in a function which takes in an Int
and a ProgramState
and changes the data pointer by the amount given.
moveDataPointer :: Int -> ProgramState -> ProgramState
moveDataPointer amt (instrs, MemoryTape mems dp) = (instrs, MemoryTape mems (max 0 (dp+amt)))
This function works similarly to updateInstructionPointer
, except it has
logic to ensure that the data pointer never moves off the left side of the tape.
Returning max 0 (dp+amt)
ensures that the result can never decrease below 0
.
Editing Data
There are three commands which can change the contents of a memory cell in a
Brainf*ck program. The +
and -
commands increment and decrement the current
value, while the ,
command reads in data from the input stream and writes it
to memory. We can emulate this behavior with a couple functions which operate
on program state.
changeData :: Int -> ProgramState -> ProgramState
changeData amt (instrs, MemoryTape mems dp) = (instrs, MemoryTape mems' dp)
where mems' = updateListElem dp (updateChar amt) mems
writeCharacter :: Char -> ProgramState -> ProgramState
writeCharacter c (instrs, MemoryTape mem ptr) = (instrs, MemoryTape (updateList ptr c mem) ptr)
safeIndex :: Int -> [a] -> Maybe a
safeIndex n xs
| n < 0 = Nothing
| otherwise = Just (xs !! n)
updateListElem :: Int -> (a -> a) -> [a] -> [a]
updateListElem n f xs = front ++ val ++ back
where front = take n xs
back = drop (n+1) xs
val = maybeToList . (fmap f) . (safeIndex n) $ xs
updateList :: Int -> a -> [a] -> [a]
updateList idx val xs = updateListElem idx (\_ -> val) xs
updateChar :: Int -> Char -> Char
updateChar n c = chr $ (n + (ord c)) `mod` 256
The changeData
function is used to increment or decrement the current value in
memory, while the writeCharacter
allows for the insertion of arbitrary characters.
These functions both rely on several helper functions. The updateListElem
function
takes an index n
, a function f
, and a list xs
. It essentially applies the
function f
to the n
th element in xs
. The one complication is that since
Haskell lists are immutable, we need to cheat the system a little by duplicating
the list and changing the indexed element. This is done with the take
and
drop
functions, as well as the safeIndex
function. This function ensures
that the index given is greater than or equal to zero, and will return Nothing
if this is not the case.
There are a couple other functions, updateList
and updateChar
, which help to
extend this further. The updateList
function simply wraps updateListElem
with a function which sets the result to a chosen value, allowing for a
single list element to be modified. The updateChar
function ensures
that any updates to a character are done modulo 256, in order to ensure
that the value stays within the 1-byte limitation.
Outputting Characters
In order to output characters from our ProgramState
, we again need to rely on
the IO
monad. The function below takes in a ProgramState
and outputs the
value currently pointed to by the data pointer.
outputCharacter :: ProgramState -> IO ()
outputCharacter (_, mem) = case (readMemory mem) of Nothing -> return ()
Just c -> putChar c
readMemory :: MemoryTape -> Maybe Char
readMemory (MemoryTape xs ptr) = safeIndex ptr xs
This function relies on the readMemory
function, which simply unwraps
a MemoryTape
data type and applies the safeIndex
function to it. The
result is that a character is outputted to the screen if the ProgramState
is valid, but nothing happens otherwise.
Implementing Looping Instructions
The trickiest instructions to implement are the [
and ]
instructions.
These instructions are used to control program flow and allow for arbitrary
looping. We write a function which takes a ProgramState
and returns
another ProgramState
, such that the branching conditions are appropriately
carried out.
jumpProgram :: ProgramState -> ProgramState
jumpProgram (instrs, mem) = (instrs', mem)
where f :: Int -> Command -> InstructionTape -> InstructionTape
f levels target cmds
| new_symbol == Nothing || (new_symbol == Just target && levels == 0) = cmds'
| new_symbol == Just target_inverse = f (levels+1) target cmds'
| new_symbol == Just target = f (levels-1) target cmds'
| otherwise = f levels target cmds'
where dir = case target of JumpBackward -> 1
JumpForward -> (-1)
_ -> 1
target_inverse = invertCommand target
cmds' = updateInstructionPointer dir cmds
new_symbol = readInstruction cmds'
curr_symbol = readInstruction instrs
instrs' = case curr_symbol of Just x -> f 0 (invertCommand x) instrs
Nothing -> instrs
The jumpProgram
function looks complicated, but it can be broken down
into a few simpler steps. First, we define an internal function f
. This
function will carry out the actual computation, and will recursively
search in a direction until it finds the target symbol. At each function
call, the function will be given a "nested level" and a target symbol.
The function f
can be thought of climbing as climbing a mountain. To
make things more explicit, consider the following text.
[[[]]]
I would start at the leftmost [
, seeking the rightmost ]
. As I traverse
right, I would continue to find more [
characters and climb further up the
mountain. As I find more ]
characters, I climb back down the mountain, until
I eventually reach an elevation of zero at the target bracket. The f
function
implements this logic, taking different steps depending on what symbol is
encountered.
At a high level, the function f
just reads in the current symbol and calls
the function f
with the appropriate arguments, and finally returns the
result. Putting all of these pieces together, we get a function which
recursively traverses a list of instructions until it finds the correct
matching bracket.
Implementing a Functional Emulation Loop
We've now written functions which can perform every possible operation
on our ProgramState
. All that's left is to chain them all together
into something that can take our ProgramState
and give it life. To
do this, we will again turn to the magic that is Haskell monads. We've
already used the IO
monad a few times to handle inputs and outputs.
Something else we need is a way to encapsulate the ProgramState
as
a mutable state which can be evolved over time. The solution I chose
to use for this was the StateT
monad. This monad allows us to
combine a concept of state with our I/O manipulation.
If you look at Haskell tutorials on higher-level concepts like monads
and functors, you will often hear the analogy of a box used. I think
that analogy is especially relevant for the StateT
monad. We can
use the put
command to put something, in this case our ProgramState
,
into the box. We can also use the get
command to take that thing
back out of the box. We can also define arbitrary functions
to operate on whatever is currently inside the box, giving us a
programming style which is fairly similar to imperative languages.
Our first step will be wrapping some of our previous functions in the
StateT
context. Most of these functions revolved around taking some
ProgramState
as input and emitting it as output. To lift these into
the StateT
context, we can use the following function:
liftStateFunction :: (a -> a) -> StateT a IO ()
liftStateFunction f = do
my_state <- get
put $ f my_state
return ()
This function works by using get
to retrieve the current program
state, then using put
to set the new state to the result of
applying f
to that state. Some applications of this function
can be seen below.
incrementProgramCounterM :: StateT ProgramState IO ()
incrementProgramCounterM = liftStateFunction incrementProgramCounter
moveDataPointerM :: Int -> StateT ProgramState IO ()
moveDataPointerM x = liftStateFunction $ moveDataPointer x
changeDataM :: Int -> StateT ProgramState IO ()
changeDataM x = liftStateFunction $ changeData x
We need some slightly more specialized functions to handle input
and output, although they still share the same general formula.
Additionally, they make use of the liftIO
function. Since the StateT
monad we are using combines a state with the capability to perform
I/O, we need a way of interfacing between the two. The liftIO
function achieves this objective, allowing us to wrap functions
like putStrLn
or getChar
and elevate them to the correct level.
writeCharacterM :: StateT ProgramState IO ()
writeCharacterM = do
my_state <- get
c <- liftIO getChar
put $ writeCharacter c my_state
return ()
outputCharacterM :: StateT ProgramState IO ()
outputCharacterM = do
my_state <- get
liftIO $ outputCharacter my_state
return ()
Next, to implement jumping, we combine the jumpProgram
function
we wrote earlier with a check that we either have a JumpForward
instruction and a 0
value on the memory tape, or a JumpBackward
instruction and a non-zero value on the memory tape. This is
implemented below.
isNull :: ProgramState -> Bool
isNull (_, mem)
| c == Just '\0' = True
| otherwise = False
where c = readMemory mem
jumpProgramM :: StateT ProgramState IO ()
jumpProgramM = do
my_state <- get
curr_instr <- getStateInstructionM
if ((isNull my_state) && curr_instr == JumpForward) || ((not (isNull my_state)) && curr_instr == JumpBackward)
then put $ jumpProgram my_state
else put my_state
return ()
Finally, we have a couple functions which return information
about the current state of the program. The getStateInstructionM
function
returns the current instruction, while the running
function checks
whether the program counter has exceeded the length of the instruction tape.
getStateInstructionM :: StateT ProgramState IO Command
getStateInstructionM = do
my_state <- get
return . fromJust . getStateInstruction $ my_state
running :: StateT ProgramState IO Bool
running = do
my_state <- get
return . not . isFinished $ my_state
isFinished :: ProgramState -> Bool
isFinished (InstructionTape commands ptr, _) = ptr >= (length commands)
Now that all the pieces are in place, we can construct the actual emulation
loop. This looks similar to an imperative program. It makes use of the
whileM_
function, which continues to execute as long as its first argument
is True
, similar to how a while
loop works in most programming languages.
emulate :: StateT ProgramState IO ()
emulate = do
whileM_ running (do
curr_cmd <- getStateInstructionM
case curr_cmd of PointerDecr -> moveDataPointerM (-1)
PointerIncr -> moveDataPointerM 1
ByteDecr -> changeDataM (-1)
ByteIncr -> changeDataM 1
ByteOutput -> outputCharacterM
ByteInput -> writeCharacterM
JumpForward -> jumpProgramM
JumpBackward -> jumpProgramM
incrementProgramCounterM
)
We start by binding the current command to curr_cmd
. We then use a
case statement to perform different actions based on the possible commands.
Lastly, we increment the program counter. This loop repeats until the program
executes all of its instructions, at which point it terminates.
Updated Main Function
Now that we have a more comprehensive picture of the complete state of
the emulation loop, we can finish our main
function. The full
function body can be seen below.
main :: IO ()
main = do
-- Read information from the environment
args <- getArgs
prog_name <- getProgName
-- Validate arguments
when ((length args) == 0) $ do
putStrLn $ usage prog_name
exitFailure
-- Read in the input program
let fname = args !! 0
program_contents <- readFile fname
-- Execute the program
let instructions = readProgram program_contents
program = initializeProgramState instructions
runStateT emulate program >> return ()
The runStateT
function is used to run a stateT
monadic function
inside a different monad. Here, we give it an internal state
which matches the output of initializeProgramState
.
Playing Around With Brainf*ck Code
Now that we've created our Brainf*ck interpreter, we can write a couple of (relatively) simple programs to play around with the interpreter and ensure that everything is working correctly.
Hello World
Our first program, which is usually one of the simplest to write, is actually surprisingly complicated in Brainf*ck. Wikipedia has a great example program which prints out the text "Hello World!" They make use of looping to write the entire program in just 106 characters. In order to keep things simple, we will just write a program that prints out "Hi!".
The first step to writing this program is looking up the ASCII values of the desired characters. These are:
Character | ASCII |
---|---|
'H' | 72 |
'i' | 106 |
'!' | 33 |
'\n' | 10 |
Next, we can structure our program. The simplest way to write the program
would be to exclusively use the +
, -
, >
, and .
commands. However, that
program would require 155 commands. We can shorten this a little by using
a loop. The full program looks like this:
++++++++ Set Cell #0 to 8 (loop counter)
[
>++++ Set Cell #1 to 4 (will be set to this each loop)
[
>++ Add 2 to Cell #2
>+++ Add 3 to Cell #3
>+ Add 1 to Cell #4
<<<- Decrement the inner loop counter
]
>+ Add 1 to Cell #2
>+ Add 1 to Cell #3
>>+ Add 1 to Cell #5
[<] Move back to Cell #1
<- Decrement the outer loop counter
]
After running this we have the following memory contents:
| Cell | 0 | 1 | 2 | 3 | 4 | 5 |
| Value | 0 | 0 | 72 | 104 | 32 | 8 |
>>. Print out 'H'
>+. Print out 'i'
>+. Print out '!'
>++. Print out '\n'
Parrot
Now, we will make a simpler program which just parrots whatever someone says back at them. This will let us play with both inputs and outputs. The program looks like the following:
+ Set Cell #0 to 1
[ Enter an infinite loop
>, Shift to Cell #1 and read in a character
.< Output that character and shift back to Cell #0
]
Our program acts like a simplified version of the cat
program. You can
input as many characters as you want, and when you hit Return/Enter, it
parrots all of those characters back to you.
Conclusion
Thank you for joining me for a wild ride through two of the strangest programming languages I've ever had the pleasure of working with. The full code for this project can be found in this Git repository.
Credit to this repository for the title image.