A Tiny Bit of ARM64 Assembly

Experienced programmers with time on their side always insist you should know how to program in C, and you should be able to read assembly. Maybe they are right. Or maybe they have the good fortune to be able to pontificate from on high. I mean, I agree with them. I strongly believe everyone who works in technology (in, not just with) should know what is happening under the hood. But maybe I am wrong.

If you do want to start learning to read assembly, where and how do you start? The time-honoured tradition here is: write a simple program in C, compile it, disassemble it, and compare the two.

If you are on a Mac, you can use clang to compile your C code, and otool to do the disassembly. If you have a full C/C++ IDE available, it likely has a disassembler built in.

Start with some simple math, and use small numbers. This makes it easy to see the numbers in the assembly code. Take this bit of C for example:

int main()
{
    int a,b,c;
    a=5;
    b=7;
    c=a+b;
}

It should be clear what is happening here. Compile it, and then disassemble it:

$ make foo
cc     foo.c   -o foo
$ otool -tv foo
foo:
(__TEXT,__text) section
_main:
0000000100003f78        sub     sp, sp, #0x10
0000000100003f7c        mov     w8, #0x5
0000000100003f80        str     w8, [sp, #0xc]
0000000100003f84        mov     w8, #0x7
0000000100003f88        str     w8, [sp, #0x8]
0000000100003f8c        ldr     w8, [sp, #0xc]
0000000100003f90        ldr     w9, [sp, #0x8]
0000000100003f94        add     w8, w8, w9
0000000100003f98        str     w8, [sp, #0x4]
0000000100003f9c        mov     w0, #0x0
0000000100003fa0        add     sp, sp, #0x10
0000000100003fa4        ret

Now start with the basics. You know you assigned the values 5 and 7 to some variables, and you can see those assignments at addresses 3f7c and 3f84 (referring to the last four digits of the hex numbers in the first column). Both of these lines use the mov instruction to copy a value into a register, in this case register w8.
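One way to check your reading of a listing is to transcribe it back into C, one statement per instruction. The sketch below is only a model, not what the CPU literally does: the names run_listing, frame, sp, w8 and w9 are made up for illustration, the 16-byte array stands in for the stack space reserved by sub sp, sp, #0x10, and memcpy plays the role of str (store a register to memory) and ldr (load a register from memory).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the first listing: one C statement per instruction.
   Returns the value left at [sp, #0x4], which is where c ends up. */
int32_t run_listing(void)
{
    uint8_t frame[0x10];          /* sub sp, sp, #0x10  : reserve 16 bytes   */
    uint8_t *sp = frame;
    int32_t w8, w9, c;

    w8 = 5;                       /* mov w8, #0x5                             */
    memcpy(sp + 0xc, &w8, 4);     /* str w8, [sp, #0xc] : a = 5               */
    w8 = 7;                       /* mov w8, #0x7                             */
    memcpy(sp + 0x8, &w8, 4);     /* str w8, [sp, #0x8] : b = 7               */
    memcpy(&w8, sp + 0xc, 4);     /* ldr w8, [sp, #0xc] : load a              */
    memcpy(&w9, sp + 0x8, 4);     /* ldr w9, [sp, #0x8] : load b              */
    w8 = w8 + w9;                 /* add w8, w8, w9                           */
    memcpy(sp + 0x4, &w8, 4);     /* str w8, [sp, #0x4] : c = a + b           */

    /* The last three instructions (mov w0, add sp, ret) are not modeled.
       Instead, read c back out of the frame so the caller can inspect it. */
    memcpy(&c, sp + 0x4, 4);
    return c;
}
```

Calling run_listing() from a small main and printing the result is a quick way to confirm that the slot at offset 0x4 holds the sum of the other two slots.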

ARM64 General-Purpose Registers

ARM64 has 31 general-purpose, 64-bit registers, named x0 through x30. Each register can be referred to in two ways, and the name you use dictates the number of bits you are addressing.

But they are also named w0 through w30. What is the difference? If you use the x names, you are addressing the full 64 bits of each register. If you use the w names, you are addressing the lower 32 bits of each register. In other words, using the x registers will result in 64-bit operations while the w registers will result in 32-bit operations.
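If it helps, the relationship between the two names can be mimicked in C with fixed-width integer types. This is a rough sketch under my own naming: w_view, add_w and add_x are hypothetical helpers, with uint64_t standing in for an x register and uint32_t for its w name.

```c
#include <stdint.h>

/* Hypothetical helper: the value seen through the w name of a register
   whose full 64-bit contents are x. */
uint32_t w_view(uint64_t x)
{
    return (uint32_t)x;           /* keep only the lower 32 bits */
}

/* 32-bit add, in the spirit of add w8, w8, w9: the result wraps at 32 bits. */
uint32_t add_w(uint32_t a, uint32_t b)
{
    return a + b;
}

/* 64-bit add, in the spirit of add x8, x8, x9: room for a full 64-bit result. */
uint64_t add_x(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Feeding 0xFFFFFFFF and 1 to both adders shows the difference: the 32-bit version wraps around to 0, while the 64-bit version carries into bit 32.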

Notice in the C code above the variables are ints, and the register references in the assembly are w registers. If I change the ints in the source code to longs:

int main()
{
    long a,b,c;
    a=5;
    b=7;
    c=a+b;
}

Compile, and disassemble:

foo:
(__TEXT,__text) section
_main:
0000000100003f78        sub     sp, sp, #0x20
0000000100003f7c        mov     x8, #0x5
0000000100003f80        str     x8, [sp, #0x18]
0000000100003f84        mov     x8, #0x7
0000000100003f88        str     x8, [sp, #0x10]
0000000100003f8c        ldr     x8, [sp, #0x18]
0000000100003f90        ldr     x9, [sp, #0x10]
0000000100003f94        add     x8, x8, x9
0000000100003f98        str     x8, [sp, #0x8]
0000000100003f9c        mov     w0, #0x0
0000000100003fa0        add     sp, sp, #0x20
0000000100003fa4        ret

Now the registers are being referenced with their x names, other than at 3f9c. That line sets the return value of main: main still returns an int, so the 0 goes into w0, the 32-bit name of the register.
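The stack numbers changed too: the first listing reserves 0x10 bytes and the second reserves 0x20, and the offsets between variables go from 4 bytes apart to 8. A back-of-the-envelope way to reproduce those two frame sizes is to add up the locals and round up to a multiple of 16, since ARM64 keeps sp 16-byte aligned. The helpers below (round_up_16, frame_bytes) are hypothetical, and a real compiler weighs far more than this when laying out a frame, so treat it as arithmetic that happens to match these two listings.

```c
#include <stddef.h>

/* Round n up to the next multiple of 16. */
size_t round_up_16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Hypothetical estimate: stack bytes reserved for `count` local variables
   of `size` bytes each, assuming nothing else lives in the frame. */
size_t frame_bytes(size_t count, size_t size)
{
    return round_up_16(count * size);
}
```

With three 4-byte ints this gives 12 rounded up to 16 (0x10), and with three 8-byte longs it gives 24 rounded up to 32 (0x20), matching the sub sp, sp lines in the two listings.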