Esoteric Haskell

Posted on 2021-01-17

One of my favorite parts of being a programmer has always been learning new languages. Each language I've learned has introduced me to new programming concepts and ideas which have made me a better programmer. Python (my first language) taught me all of the basics of programming and inspired me to learn more, C taught me how computers really work at a low level, and Rust taught me the joys of safety and immutability. However, the one language that has caused me to rethink how I program the most has definitely been Haskell

The biggest difference between Haskell and other programming languages I've worked with is its emphasis on pure functions and immutability. Pure functions are functions which have no side-effects. This means that if you give a function the same inputs, it will always return the same outputs. It also means that we have complete immutability, so if we wanted to add one element to a list, we would have to return a completely new list.

Haskell Logo

While Haskell is a strange language by most developers' standards, there is one family of languages which takes this to the next level. These are the esoteric languages. Esoteric programming languages push the bounds of computing, and there have been languages which encode programs in pictures, create programs out of Shakespearean plays, or even consist entirely of whitespace. While these programming languages are both amazing and terrifying, my favorite esoteric language is definitely Brainf*ck.

In this post, we will explore the strange worlds of Brainf*ck and Haskell, and use Haskell to create a Brainf*ck interpreter so that we can write programs in some of the strangest languages around.

Brainf*ck Background

I was first introduced to Brainf*ck language through its cousin, P''. P'' was originally written as a minimal Turing-complete language, and contains only 6 symbols used to manipulate the contents of its memory and perform basic looping. Brainf*ck extends these capabilities by adding two new characters, which allow the language to read read things into or write things out from memory.

There are four things that we need to keep track of in order to understand what our Brainf*ck program is doing. These are:

  • Input Stream: This is similar to other programming languages and text-based programs. This stream is STDIN on Linux systems, and is the same place that the text from a Python input() statement comes from.
  • Output Stream: This is also like any other programming language. It is STDOUT on Linux systems, and is accessed by the print() statement in Python.
  • Program Tape: We can think of this as an array of instructions. Each instruction in Brainf*ck is a single character, like <, >, or +. These instructions are stored in an array, and can be indexed by a "Program Counter" which we control.
  • Memory Tape: This is simply an array of characters which our program can manipulate. It is similar to memory in other programming languages, except with fewer operations we can perform on it. This array is accessed by a "Data Pointer", which we control.

Next, let's go over the instructions that make up the Brainf*ck language.

  • >: This instruction increments the data pointer, which allows us to access the next element in the memory tape.
  • <: This instruction decrements the data pointer, which allows us to access the previous element in the memory tape.
  • +: This instruction increments the current item in the memory tape. The memory tape consists of 8-bit ASCII characters, so incrementing "a" would turn it into "b".
  • -: This instruction does the reverse of +, decrementing the value of the current character.
  • .: This instruction outputs the current character in the memory tape to the output stream.
  • ,: This instruction reads in a character from the input stream and writes it to the current character in the memory tape.
  • [: This instruction checks the current item in the memory tape. If that item is zero, it will jump forward to the matching ].
  • ]: This instruction does the opposite of [. If the item on the memory tape is non-zero, it jumps back to the matching [. Otherwise, it does nothing.

These instructions are nicely summarized in this table from the Brainf*ck Wikipedia page, which shows the corresponding C commands for each Brainf*ck instruction.

Brainf*ck Command C Equivalent
(Program Start) char ptr[30000] = {0};
> ++ptr;
< --ptr;
+ ++(*ptr);
- --(*ptr);
. putchar(*ptr);
, *ptr = getchar();
[ while (*ptr) {
] }

Building a Brainf*ck Interpreter

Our goal for this post is to build a Brainf*ck interpreter in Haskell. An interpreter is a program which reads in another program written in some language and directly executes that program. In our case, the interpreter will be written in Haskell, and the target language will be Brainf*ck.

Haskell provides an interesting challenge for writing this kind of interpreter. Haskell is inherently immutable, while Brainf*ck is inherently built on the premise of mutable state. This is a worthwhile exercise, however, as it allows us to gain insight to into several parts of Haskell, as well as functional programming more generally.

Getting Started: Outlining the Main Function

Before we get lost in the functional weeds, let's define at a high level what we want our program to do. We should be able to pass in a Brainf*ck program as an argument and have the interpreter execute that program. This would give the following usage, where PROG_NAME is the name of the input program.

./brainfk PROG_NAME

We will start writing our main file, brainfk.hs, by adding some necessary package imports.

import System.Exit
import System.Environment
import Control.Monad
import Control.Monad.State

At a high level, our program will need to be able to read a file name from the command line, open and parse that file, and then run the parsed program. This gives us the following main function.

main :: IO ()
main = do
  -- Read information from the environment
  args <- getArgs
  prog_name <- getProgName

  -- Validate arguments
  when ((length args) == 0) $ do
    putStrLn $ usage prog_name
    exitFailure

  -- Read in the input program
  let fname = args !! 0
  program_contents <- readFile fname

  -- TODO: Execute the program

Let's walk through this line by line. First, we declare our main function as an IO () monad. A monad is one of the more abstract parts of Haskell, and allows us to do things not usually possible in a pure function, like read input and write output. We start our declaration of main with the keyword do, which allows us to chain multiple commands together. Next, we read in our arguments and the name of the program using getArgs and getProgName.

After reading our arguments, we validate them to ensure that we actually got a file to read. The when function is similar to an if statement in other languages, and only executes the lines within its do block if the condition evaluates to True. In this case, the do block checks if we received no arguments, and calls the usage function if so. This function prints a help message to the user, and is defined as follows:

usage :: String -> String
usage prog_name = "usage: " ++ prog_name ++ " PROG_NAME"

The last part of this initial stage is to read in the Brainf*ck file and bind it to the program_contents variable. This variable will contain a string with the entire contents of the input file inside.

Defining Our Command Interface

Now that we've got our setup code established, we can begin to define our data interface. Haskell has a really nice system for defining custom types, which allows us to use the data keyword to define our own types. For example, the type Command, as defined below, allows us to encapsulate the possible commands in a Brainf*ck program.

data Command = PointerIncr | PointerDecr
             | ByteIncr    | ByteDecr
             | ByteOutput  | ByteInput
             | JumpForward | JumpBackward
             deriving (Show, Eq)

The deriving (Show, Eq) means that our Command data type can be printed and compared. For example:

PointerIncr == PointerDecr  -- returns False
putStrLn $ show PointerIncr -- print a PointerIncr

We can then define a couple helper functions to make using our Command data type easier.

readCommand :: Char -> Maybe Command
readCommand x = case x of '>' -> Just PointerIncr
                          '<' -> Just PointerDecr
                          '+' -> Just ByteIncr
                          '-' -> Just ByteDecr
                          '.' -> Just ByteOutput
                          ',' -> Just ByteInput
                          '[' -> Just JumpForward
                          ']' -> Just JumpBackward
                          _   -> Nothing

readProgram :: String -> [Command]
readProgram = mapMaybe readCommand

invertCommand :: Command -> Command
invertCommand PointerDecr = PointerIncr
invertCommand PointerIncr = PointerDecr
invertCommand ByteDecr = ByteIncr
invertCommand ByteIncr = ByteDecr
invertCommand ByteInput = ByteOutput
invertCommand ByteOutput = ByteInput
invertCommand JumpBackward = JumpForward
invertCommand JumpForward = JumpBackward

We start by defining the readCommand function. This takes in a character x and attempts to read it into a Command, based on the definition of Brainf*ck defined above. The Maybe wrapper allows for us to encapsulate the uncertainty of whether a value will exist. If our input character x is not valid, then we return Nothing.

The readProgram extends readCommand by reading a string (which is equivalent to a list of Char) into a list of commands. The mapMaybe function is defined in Data.Maybe and maps over a list. It would be the equivalent to doing the following in Python.

out_list = []

for my_char in in_list: # Iterate through an input list
  x = f(my_char)        # Apply the function to each element
  if x != None:         # None is like Nothing in Python
    out_list.append(x)  # Add to the output list if not Nothing

Finally, the invertCommand function is a convenience functio which will be useful later on. Each command in Brainf*ck has an opposite. For example, . outputs a byte, while , reads in a byte. The invertCommand formalizes all of these pairs of outputs.

Defining Our Program State

Now that we have a way to represent individual commands, we will work on representing the state of the program as a whole. As we defined earlier, a Brainf*ck state consists of both a program tape and a memory tape, and a pointer pointing to each of them. We can then create a type ProgramState as follows to encapsulate this.

data InstructionTape = InstructionTape [Command] Int
data MemoryTape = MemoryTape [Char] Int
type ProgramState = (InstructionTape, MemoryTape)

The InstructionTape class represents our list of instructions, while the MemoryTape class represents our memory contents. Additionally, both contain the pointers within themselves. The ProgramState is just a type synonym, stating that the program state consists of both an instruction and memory tape.

Now that we have our program state defined, we need to define some functions to help us manipulate that state. First, we need a function to initialize the state.

initializeProgramState :: [Command] -> ProgramState
initializeProgramState instrs = (ins_tape, mem_tape)
  where ins_tape = InstructionTape instrs 0
        mem_tape = MemoryTape (repeat '\0') 0

As this function shows, we initialize both pointers to 0 and the instruction tape to an inputted list of commands. One interesting thing to note here is the definition of mem_tape. The repeat function creates an infinite list by repeating its input. Thanks to Haskell's lazy evaluation, we can create (theoretically) infinite data structures which are generated as-needed by the program. Now that we've initialized our program state, we need to implement each of the operations of Brainf*ck.

Incrementing the Program Counter

The functions below are used to increment the program counter.

updateInstructionPointer :: Int -> InstructionTape -> InstructionTape
updateInstructionPointer amt (InstructionTape xs pc) = InstructionTape xs (pc+amt)

incrementProgramCounter :: ProgramState -> ProgramState
incrementProgramCounter (instrs, mems) = (updateInstructionPointer 1 instrs, mems)

They work by simply unpacking the arguments, adding one to the programming counter, and stitching the arguments back together. We can run this function after every other function to increment the program counter between instructions.

Moving the Data Pointer

The > and < Brainf*ck functions both change the data pointer by one position. Both of their behavior can be encapsulated in a function which takes in an Int and a ProgramState and changes the data pointer by the amount given.

moveDataPointer :: Int -> ProgramState -> ProgramState
moveDataPointer amt (instrs, MemoryTape mems dp) = (instrs, MemoryTape mems (max 0 (dp+amt)))

This function works similarly to updateInstructionPointer, except it has logic to ensure that the data pointer never moves off the left side of the tape. Returning max 0 (dp+amt) ensures that the result can never decrease below 0.

Editing Data

There are three commands which can change the contents of a memory cell in a Brainf*ck program. The + and - commands increment and decrement the current value, while the , command reads in data from the input stream and writes it to memory. We can emulate this behavior with a couple functions which operate on program state.

changeData :: Int -> ProgramState -> ProgramState
changeData amt (instrs, MemoryTape mems dp) = (instrs, MemoryTape mems' dp)
  where mems' = updateListElem dp (updateChar amt) mems

writeCharacter :: Char -> ProgramState -> ProgramState
writeCharacter c (instrs, MemoryTape mem ptr) = (instrs, MemoryTape (updateList ptr c mem) ptr)

safeIndex :: Int -> [a] -> Maybe a
safeIndex n xs
  | n < 0 = Nothing
  | otherwise = Just (xs !! n)

updateListElem :: Int -> (a -> a) -> [a] -> [a]
updateListElem n f xs = front ++ val ++ back
  where front = take n xs
        back = drop (n+1) xs
        val = maybeToList . (fmap f) . (safeIndex n) $ xs

updateList :: Int -> a -> [a] -> [a]
updateList idx val xs = updateListElem idx (\_ -> val) xs

updateChar :: Int -> Char -> Char
updateChar n c = chr $ (n + (ord c)) `mod` 256

The changeData function is used to increment or decrement the current value in memory, while the writeCharacter allows for the insertion of arbitrary characters. These functions both rely on several helper functions. The updateListElem function takes an index n, a function f, and a list xs. It essentially applies the function f to the nth element in xs. The one complication is that since Haskell lists are immutable, we need to cheat the system a little by duplicating the list and changing the indexed element. This is done with the take and drop functions, as well as the safeIndex function. This function ensures that the index given is greater than or equal to zero, and will return Nothing if this is not the case.

There are a couple other functions, updateList and updateChar, which help to extend this further. The updateList function simply wraps updateListElem with a function which sets the result to a chosen value, allowing for a single list element to be modified. The updateChar function ensures that any updates to a character are done modulo 256, in order to ensure that the value stays within the 1-byte limitation.

Outputting Characters

In order to output characters from our ProgramState, we again need to rely on the IO monad. The function below takes in a ProgramState and outputs the value currently pointed to by the data pointer.

outputCharacter :: ProgramState -> IO ()
outputCharacter (_, mem) = case (readMemory mem) of Nothing -> return ()
                                                    Just c -> putChar c

readMemory :: MemoryTape -> Maybe Char
readMemory (MemoryTape xs ptr) = safeIndex ptr xs

This function relies on the readMemory function, which simply unwraps a MemoryTape data type and applies the safeIndex function to it. The result is that a character is outputted to the screen if the ProgramState is valid, but nothing happens otherwise.

Implementing Looping Instructions

The trickiest instructions to implement are the [ and ] instructions. These instructions are used to control program flow and allow for arbitrary looping. We write a function which takes a ProgramState and returns another ProgramState, such that the branching conditions are appropriately carried out.

jumpProgram :: ProgramState -> ProgramState
jumpProgram (instrs, mem) = (instrs', mem)
  where f :: Int -> Command -> InstructionTape -> InstructionTape
        f levels target cmds
          | new_symbol == Nothing || (new_symbol == Just target && levels == 0) = cmds'
          | new_symbol == Just target_inverse = f (levels+1) target cmds'
          | new_symbol == Just target = f (levels-1) target cmds'
          | otherwise = f levels target cmds'
          where dir = case target of JumpBackward -> 1
                                     JumpForward -> (-1)
                                     _ -> 1
                target_inverse = invertCommand target
                cmds' = updateInstructionPointer dir cmds
                new_symbol = readInstruction cmds'
        curr_symbol = readInstruction instrs
        instrs' = case curr_symbol of Just x -> f 0 (invertCommand x) instrs
                                      Nothing -> instrs

The jumpProgram function looks complicated, but it can be broken down into a few simpler steps. First, we define an internal function f. This function will carry out the actual computation, and will recursively search in a direction until it finds the target symbol. At each function call, the function will be given a "nested level" and a target symbol.

The function f can be thought of climbing as climbing a mountain. To make things more explicit, consider the following text.

[[[]]]

I would start at the leftmost [, seeking the rightmost ]. As I traverse right, I would continue to find more [ characters and climb further up the mountain. As I find more ] characters, I climb back down the mountain, until I eventually reach an elevation of zero at the target bracket. The f function implements this logic, taking different steps depending on what symbol is encountered.

At a high level, the function f just reads in the current symbol and calls the function f with the appropriate arguments, and finally returns the result. Putting all of these pieces together, we get a function which recursively traverses a list of instructions until it finds the correct matching bracket.

Implementing a Functional Emulation Loop

We've now written functions which can perform every possible operation on our ProgramState. All that's left is to chain them all together into something that can take our ProgramState and give it life. To do this, we will again turn to the magic that is Haskell monads. We've already used the IO monad a few times to handle inputs and outputs. Something else we need is a way to encapsulate the ProgramState as a mutable state which can be evolved over time. The solution I chose to use for this was the StateT monad. This monad allows us to combine a concept of state with our I/O manipulation.

If you look at Haskell tutorials on higher-level concepts like monads and functors, you will often hear the analogy of a box used. I think that analogy is especially relevant for the StateT monad. We can use the put command to put something, in this case our ProgramState, into the box. We can also use the get command to take that thing back out of the box. We can also define arbitrary functions to operate on whatever is currently inside the box, giving us a programming style which is fairly similar to imperative languages.

Our first step will be wrapping some of our previous functions in the StateT context. Most of these functions revolved around taking some ProgramState as input and emitting it as output. To lift these into the StateT context, we can use the following function:

liftStateFunction :: (a -> a) -> StateT a IO ()
liftStateFunction f = do
  my_state <- get
  put $ f my_state
  return ()

This function works by using get to retrieve the current program state, then using put to set the new state to the result of applying f to that state. Some applications of this function can be seen below.

incrementProgramCounterM :: StateT ProgramState IO ()
incrementProgramCounterM = liftStateFunction incrementProgramCounter

moveDataPointerM :: Int -> StateT ProgramState IO ()
moveDataPointerM x = liftStateFunction $ moveDataPointer x

changeDataM :: Int -> StateT ProgramState IO ()
changeDataM x = liftStateFunction $ changeData x

We need some slightly more specialized functions to handle input and output, although they still share the same general formula. Additionally, they make use of the liftIO function. Since the StateT monad we are using combines a state with the capability to perform I/O, we need a way of interfacing between the two. The liftIO function achieves this objective, allowing us to wrap functions like putStrLn or getChar and elevate them to the correct level.

writeCharacterM :: StateT ProgramState IO ()
writeCharacterM = do
  my_state <- get
  c <- liftIO getChar
  put $ writeCharacter c my_state
  return ()

outputCharacterM :: StateT ProgramState IO ()
outputCharacterM = do
  my_state <- get
  liftIO $ outputCharacter my_state
  return ()

Next, to implement jumping, we combine the jumpProgram function we wrote earlier with a check that we either have a JumpForward instruction and a 0 value on the memory tape, or a JumpBackward instruction and a non-zero value on the memory tape. This is implemented below.

isNull :: ProgramState -> Bool
isNull (_, mem)
  | c == Just '\0' = True
  | otherwise = False
  where c = readMemory mem

jumpProgramM :: StateT ProgramState IO ()
jumpProgramM = do
  my_state <- get
  curr_instr <- getStateInstructionM
  if ((isNull my_state) && curr_instr == JumpForward) || ((not (isNull my_state)) && curr_instr == JumpBackward)
    then put $ jumpProgram my_state
  else put my_state
  return ()

Finally, we have a couple functions which return information about the current state of the program. The getStateInstructionM function returns the current instruction, while the running function checks whether the program counter has exceeded the length of the instruction tape.

getStateInstructionM :: StateT ProgramState IO Command
getStateInstructionM = do
  my_state <- get
  return . fromJust . getStateInstruction $ my_state

running :: StateT ProgramState IO Bool
running = do
  my_state <- get
  return . not . isFinished $ my_state

isFinished :: ProgramState -> Bool
isFinished (InstructionTape commands ptr, _) = ptr >= (length commands)

Now that all the pieces are in place, we can construct the actual emulation loop. This looks similar to an imperative program. It makes use of the whileM_ function, which continues to execute as long as its first argument is True, similar to how a while loop works in most programming languages.

emulate :: StateT ProgramState IO ()
emulate = do
  whileM_ running (do
                      curr_cmd <- getStateInstructionM
                      case curr_cmd of PointerDecr -> moveDataPointerM (-1)
                                       PointerIncr -> moveDataPointerM 1
                                       ByteDecr -> changeDataM (-1)
                                       ByteIncr -> changeDataM 1
                                       ByteOutput -> outputCharacterM
                                       ByteInput -> writeCharacterM
                                       JumpForward -> jumpProgramM
                                       JumpBackward -> jumpProgramM
                      incrementProgramCounterM
               )

We start by binding the current command to curr_cmd. We then use a case statement to perform different actions based on the possible commands. Lastly, we increment the program counter. This loop repeats until the program executes all of its instructions, at which point it terminates.

Updated Main Function

Now that we have a more comprehensive picture of the complete state of the emulation loop, we can finish our main function. The full function body can be seen below.

main :: IO ()
main = do
  -- Read information from the environment
  args <- getArgs
  prog_name <- getProgName

  -- Validate arguments
  when ((length args) == 0) $ do
    putStrLn $ usage prog_name
    exitFailure

  -- Read in the input program
  let fname = args !! 0
  program_contents <- readFile fname

  -- Execute the program
  let instructions = readProgram program_contents
      program = initializeProgramState instructions

  runStateT emulate program >> return ()

The runStateT function is used to run a stateT monadic function inside a different monad. Here, we give it an internal state which matches the output of initializeProgramState.

Playing Around With Brainf*ck Code

Now that we've created our Brainf*ck interpreter, we can write a couple of (relatively) simple programs to play around with the interpreter and ensure that everything is working correctly.

Hello World

Our first program, which is usually one of the simplest to write, is actually surprisingly complicated in Brainf*ck. Wikipedia has a great example program which prints out the text "Hello World!" They make use of looping to write the entire program in just 106 characters. In order to keep things simple, we will just write a program that prints out "Hi!".

The first step to writing this program is looking up the ASCII values of the desired characters. These are:

Character ASCII
'H' 72
'i' 106
'!' 33
'\n' 10

Next, we can structure our program. The simplest way to write the program would be to exclusively use the +, -, >, and . commands. However, that program would require 155 commands. We can shorten this a little by using a loop. The full program looks like this:

++++++++  Set Cell #0 to 8 (loop counter)
[
  >++++   Set Cell #1 to 4 (will be set to this each loop)
  [
    >++   Add 2 to Cell #2
    >+++  Add 3 to Cell #3
    >+    Add 1 to Cell #4
    <<<-  Decrement the inner loop counter
  ]

  >+      Add 1 to Cell #2
  >+      Add 1 to Cell #3
  >>+     Add 1 to Cell #5

  [<]     Move back to Cell #1

  <-      Decrement the outer loop counter
]

After running this we have the following memory contents:
| Cell  | 0 | 1 | 2  | 3   | 4  | 5 |
| Value | 0 | 0 | 72 | 104 | 32 | 8 |

>>.      Print out 'H'
>+.      Print out 'i'
>+.      Print out '!'
>++.     Print out '\n'

Parrot

Now, we will make a simpler program which just parrots whatever someone says back at them. This will let us play with both inputs and outputs. The program looks like the following:

+         Set Cell #0 to 1
[         Enter an infinite loop
  >,      Shift to Cell #1 and read in a character
  .<      Output that character and shift back to Cell #0
]

Our program acts like a simplified version of the cat program. You can input as many characters as you want, and when you hit Return/Enter, it parrots all of those characters back to you.

Conclusion

Thank you for joining me for a wild ride through two of the strangest programming languages I've ever had the pleasure of working with. The full code for this project can be found in this Git repository.

Credit to this repository for the title image.