Assembler

The Hack Machine Language Specification

We distinguish two programs:

Binary Hack program: A binary Hack program is a sequence of text lines, each consisting of sixteen 0 and 1 characters. Contains the instruction data we load onto the CPU to execute.

Assembly Hack program: An assembly Hack program is a sequence of text lines, each being one of thre:

  • Assembly instruction: A symbolic A-instruction or a symbolic C-instruction.
  • Label declaration: A line of the form (xxx), where xxx is a symbol.
  • Comment: A line beginning with two slashes (//) is considered a comment and is ignored.

See Figure 4.5 for the specification of the Hack instruction set:

Hack Instruction Set

Handling Instructions

For each assembly instruction, the assembler

  1. Parses the instruction into its underlying fields.
  2. For each field, generates the corresponding bit-code, as specified in figure 4.5.
  3. If the instruction contains a symbolic reference, resolves the symbol into its numeric value.
  4. Assembles the resulting binary codes into a string of sixteen $0$ and $1$ characters.
  5. Writes the assembled string to the output file.

Handling Symbols

A common solution is to develop a two-pass assembler.

  1. The assembler creates a symbol table and initializes it with all the predefined symbols and their pre-allocated values.
  2. In the first pass, the assembler builds a symbol table, adds all the label symbols to the table, and generates no code
  3. In the second pass, the assembler handles the variable symbols and generates binary code, using the symbol table.