A Tiny Bit of ARM64 Assembly
Sat 15 February 2025
Experienced programmers with time on their side always insist you should know how to program in C, and you should be able to read assembly. Maybe they are right. Or maybe they have the good fortune to be able to pontificate from on high. I mean, I agree with them. I strongly believe everyone who works in technology (in, not just with), should know what is happening under the hood. But maybe I am wrong.
If you do want to start learning to read assembly, where and how do you start? The time-honoured tradition here is: write a simple program in C, compile it, disassemble it, and compare the two.
If you are on a Mac, you can use clang to compile your C code, and otool to do the disassembly. If you have a full C/C++ IDE available, it likely has a disassembler built in.
Start with some simple math, and use small numbers. This makes it easy to see the numbers in the assembly code. Take this bit of C for example:
int main()
{
    int a,b,c;
    a=5;
    b=7;
    c=a+b;
}
It should be clear what is happening here. Compile it, and then disassemble it:
$ make foo
cc foo.c -o foo
$ otool -tv foo
foo:
(__TEXT,__text) section
_main:
0000000100003f78 sub sp, sp, #0x10
0000000100003f7c mov w8, #0x5
0000000100003f80 str w8, [sp, #0xc]
0000000100003f84 mov w8, #0x7
0000000100003f88 str w8, [sp, #0x8]
0000000100003f8c ldr w8, [sp, #0xc]
0000000100003f90 ldr w9, [sp, #0x8]
0000000100003f94 add w8, w8, w9
0000000100003f98 str w8, [sp, #0x4]
0000000100003f9c mov w0, #0x0
0000000100003fa0 add sp, sp, #0x10
0000000100003fa4 ret
Now start with the basics. You know you assigned the values 5 and 7 to some variables, and you can see those assignments on lines 3f7c and 3f84 (referring to the last four digits of the hex addresses in the first column). Both of these lines use the mov instruction to copy a value into a register, in this case register w8.
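One way to check your reading of the first listing is to replay it by hand in C. The sketch below is a hypothetical model, not something clang produces: the names replay_first_listing, frame, w8, and w9 are made up for illustration, the stack frame is modeled as a 16-byte array, and each register as a 32-bit variable. Each statement mirrors one instruction from the listing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-written replay of the first listing.
   frame[] stands in for the 16 bytes reserved on the stack,
   and w8/w9 stand in for the registers of the same name. */
int32_t replay_first_listing(void)
{
    unsigned char frame[0x10];       /* sub sp, sp, #0x10              */
    uint32_t w8, w9;
    int32_t c;

    w8 = 0x5;                        /* mov w8, #0x5                   */
    memcpy(frame + 0xc, &w8, 4);     /* str w8, [sp, #0xc]   (a = 5)   */
    w8 = 0x7;                        /* mov w8, #0x7                   */
    memcpy(frame + 0x8, &w8, 4);     /* str w8, [sp, #0x8]   (b = 7)   */
    memcpy(&w8, frame + 0xc, 4);     /* ldr w8, [sp, #0xc]             */
    memcpy(&w9, frame + 0x8, 4);     /* ldr w9, [sp, #0x8]             */
    w8 = w8 + w9;                    /* add w8, w8, w9                 */
    memcpy(frame + 0x4, &w8, 4);     /* str w8, [sp, #0x4]   (c = a+b) */

    /* Not part of the listing: read c back so the caller can inspect it.
       The real main returns 0 (mov w0, #0x0) instead. */
    memcpy(&c, frame + 0x4, 4);
    return c;
}
```

Calling replay_first_listing() returns 12, the value the real program stores in the slot at sp + 0x4.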
ARM64 General-Purpose Registers
ARM64 has 31 general-purpose, 64-bit registers. These registers can be referred to in different ways, each way dictating the number of bits that you are addressing. Remember, these registers are 64 bits wide, and they are named x0 through x30.
But they are also named w0 through w30. What is the difference? If you use the x names, you are addressing the full 64 bits of each register. If you use the w names, you are addressing the lower 32 bits of each register. In other words, using the x registers will result in 64-bit operations while the w registers will result in 32-bit operations.
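The difference between the two views can be sketched in plain C, treating a register as a uint64_t. The helper names below (x_view, w_view, add_w, add_x) are hypothetical and exist only for this illustration. They model the idea of a 64-bit versus a 32-bit operation, not the exact hardware behaviour.

```c
#include <stdint.h>

/* Hypothetical helpers: a 64-bit "register" is just a uint64_t here. */

/* x view: all 64 bits of the register. */
uint64_t x_view(uint64_t reg) { return reg; }

/* w view: only the lower 32 bits of the same register. */
uint32_t w_view(uint64_t reg) { return (uint32_t)reg; }

/* 32-bit add, as with w registers: the result wraps at 2^32. */
uint32_t add_w(uint32_t a, uint32_t b) { return a + b; }

/* 64-bit add, as with x registers: there is room for the carry. */
uint64_t add_x(uint64_t a, uint64_t b) { return a + b; }
```

For example, w_view(0x1122334455667788) gives 0x55667788, and add_w(0xFFFFFFFF, 1) wraps around to 0 while add_x(0xFFFFFFFF, 1) gives 0x100000000.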
Notice in the C code above the variables are ints, and the register references in the assembly are w registers. If I change the ints in the source code to longs:
int main()
{
    long a,b,c;
    a=5;
    b=7;
    c=a+b;
}
Compile, and disassemble:
foo:
(__TEXT,__text) section
_main:
0000000100003f78 sub sp, sp, #0x20
0000000100003f7c mov x8, #0x5
0000000100003f80 str x8, [sp, #0x18]
0000000100003f84 mov x8, #0x7
0000000100003f88 str x8, [sp, #0x10]
0000000100003f8c ldr x8, [sp, #0x18]
0000000100003f90 ldr x9, [sp, #0x10]
0000000100003f94 add x8, x8, x9
0000000100003f98 str x8, [sp, #0x8]
0000000100003f9c mov w0, #0x0
0000000100003fa0 add sp, sp, #0x20
0000000100003fa4 ret
Now the registers are being referenced with their x names, other than on line 3f9c, which still uses w0. That instruction sets the return value of main to 0, and main still returns an int.
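The stack offsets changed too, and the arithmetic behind them can be sketched in C. On 64-bit macOS an int is 4 bytes and a long is 8 bytes, and ARM64 expects the stack pointer to stay 16-byte aligned, which would explain why 12 bytes of ints reserve 0x10 and 24 bytes of longs reserve 0x20. The functions frame_size and slot_offset are hypothetical names, and the layout rule they encode is only a pattern read off these two unoptimized listings, not something the compiler is guaranteed to follow.

```c
#include <stddef.h>

/* Assumption: the space for the locals is rounded up to a multiple
   of 16 bytes, matching the sub sp, sp, #... in both listings. */
size_t frame_size(size_t var_count, size_t var_size)
{
    size_t needed = var_count * var_size;
    return (needed + 15) / 16 * 16;
}

/* Assumption: locals are packed downward from the top of the frame,
   in declaration order (index 0 = a, 1 = b, 2 = c). */
size_t slot_offset(size_t frame, size_t var_size, size_t index)
{
    return frame - (index + 1) * var_size;
}
```

With three 4-byte ints, frame_size gives 0x10 and the slots land at 0xc, 0x8, and 0x4. With three 8-byte longs, it gives 0x20 and the slots land at 0x18, 0x10, and 0x8, matching both listings.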