Okay, so you want to see how small you can make a 64 bit binary. In the age of giant bloated applications full of impossibly convoluted machine instructions, eating up your memory and disk space, it’s nice sometimes to get down to the lowest of low levels and create something so tiny that you know what every single bit is doing and its purpose. To do so, we need to employ some standard tricks and a little creativity to get us down there.
Building Your Binary
Let’s start with a really simple program that prints a string in the terminal! I chose these smaller opcodes to save a bit more space, but we can get into assembly optimization in another post.
This program uses the most primitive form of writing to STDOUT. It invokes a raw Unix system call to the kernel, with the registers containing the arguments.
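For comparison, here is a minimal Python sketch of the same write call. It is not the assembly program itself: Python's os.write goes through the C library rather than issuing the syscall instruction directly, and the names SYS_WRITE, SYS_EXIT, STDOUT_FD and write_bytes are made up for illustration. The register roles in the comments follow the standard x86_64 Linux system call convention.

```python
import os

# x86_64 Linux system call numbers (illustrative constants, not used by os.write).
SYS_WRITE = 1    # goes in %rax for write
SYS_EXIT = 60    # goes in %rax for exit (0x3c)

STDOUT_FD = 1    # %rdi: file descriptor to write to
MSG = b"[^0^] u!!\n"  # %rsi: address of the bytes, %rdx: how many to write


def write_bytes(fd, data):
    """Write data to fd via write(2) and return the number of bytes written."""
    return os.write(fd, data)
```

Calling write_bytes(STDOUT_FD, MSG) prints the smiley to the terminal, just like the first syscall in the assembly version.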
Save the assembly program into a file called asm_smile.s, then assemble it with as.
$ vim asm_smile.s
$ as asm_smile.s -o asm_smile.o
Now we’ve created an object file that can be used to create an executable. We can link it with ld, then run.
$ ld asm_smile.o -o asm_smile
$ ./asm_smile
[^0^] u!!
Okay, what have we done here? Let’s take a look at the raw data we generated. A good place to start is objdump.
$ objdump -d asm_smile
asm_smile: file format elf64-x86-64
Disassembly of section .text:
0000000000400078 <_start>:
400078: b0 01 mov $0x1,%al
40007a: 48 89 c7 mov %rax,%rdi
40007d: 48 c7 c6 8f 00 40 00 mov $0x40008f,%rsi
400084: b2 0b mov $0xb,%dl
400086: 0f 05 syscall
400088: b0 3c mov $0x3c,%al
40008a: 48 31 ff xor %rdi,%rdi
40008d: 0f 05 syscall
000000000040008f <msg>:
40008f: 5b pop %rbx
400090: 5e pop %rsi
400091: 30 5e 5d xor %bl,0x5d(%rsi)
400094: 20 75 21 and %dh,0x21(%rbp)
400097: 21 0a and %ecx,(%rdx)
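Note that objdump disassembles the bytes under msg as if they were instructions, but they are really our string. A quick Python sketch (variable names are my own) decodes them and checks the total size of code plus string from the addresses in the listing:

```python
# Raw bytes objdump listed under <msg>, copied from the disassembly above.
msg_bytes = bytes.fromhex("5b5e305e5d207521210a")

# Decoded as ASCII, they are the string the program prints, not instructions.
msg_text = msg_bytes.decode("ascii")

# Code plus string span 0x400078 through 0x400098 inclusive.
first_byte = 0x400078
last_byte = 0x400098
total_size = last_byte - first_byte + 1

print(repr(msg_text))              # '[^0^] u!!\n'
print(len(msg_bytes), total_size)  # 10 33
```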
Our program + string is only 33 bytes, so why is our binary 752 bytes? Let’s take a look at a quick hex dump.
Hrm… There’s quite a bit of extra data in there! We can see our program begins at 0x78 and ends at 0x98. How can you make a binary smaller right off the bat? We can use strip!
$ strip asm_smile
Strip reads a binary file and removes symbol tables, debug info, and other extras that aren’t needed to run it. So what does our binary look like now?
After using strip, we are down to 368 bytes! That’s a pretty small binary. But remember, our machine instructions were just 33 bytes, so what’s up with all this overhead?
To understand this, we need to break down the sections of an ELF binary real quick. If you’re not used to looking at hex dumps and hand modifying data, this is a great place to start. It’s not that scary!
Under the Hood
All ELF binaries need to have a few things in place in order for them to be interpreted by the Linux kernel properly. As with Windows EXEs, there’s a structure to the header that defines the overall layout of the binary.
This example is using x86_64 assembly, so the ELF binaries I am describing here are the 64 bit version. The 32 bit version is slightly different.
Let’s take a look at what other information is in this binary. We can use a program called readelf to help us follow along!
What does this all mean? We will start by first understanding the ELF header.
ELF Header
The ELF header section defines the file as an ELF binary. In the hex dump it looks like this:
Each one of these bytes has a specific purpose.
Offset | # | Description
---|---|---
00-03 | A | Magic number - 0x7F, then ‘ELF’ in ASCII
04 | B | 1 = 32 bit, 2 = 64 bit
05 | C | 1 = little endian, 2 = big endian
06 | D | ELF header version
07 | E | OS ABI - usually 0 for System V
08-0F | F | Unused/padding
10-11 | G | 1 = relocatable, 2 = executable, 3 = shared, 4 = core
12-13 | H | Instruction set - see table below
14-17 | I | ELF Version
18-1F | J | Program entry position
20-27 | K | Program header table position
28-2F | L | Section header table position
30-33 | M | Flags - architecture dependent; see note below
34-35 | N | Header size
36-37 | O | Size of an entry in the program header table
38-39 | P | Number of entries in the program header table
3A-3B | Q | Size of an entry in the section header table
3C-3D | R | Number of entries in the section header table
3E-3F | S | Index in section header table with the section names
This is the ELF Header with index values to show exactly where these values line up in our binary.
These values are mainly metadata that tells the operating system what to do with this file. I won’t get too deep into what these things mean, but they are necessary to be aware of as we move along. You can find more info here! https://wiki.osdev.org/ELF
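If you would rather not count bytes by eye, the table above translates directly into a struct format string. The following Python sketch assumes a little endian 64 bit file. The function and field names are my own, and the sample header is built from illustrative values (entry point 0x400078, program header at 0x40, section header offset left as 0) rather than read from the real binary.

```python
import struct

# One format character per row of the table: 16 ident bytes, then the rest.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

FIELDS = (
    "ident", "type", "machine", "version", "entry", "phoff", "shoff",
    "flags", "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx",
)


def parse_elf64_header(data):
    """Return the ELF header fields of a little endian 64 bit file as a dict."""
    header = dict(zip(FIELDS, ELF64_HEADER.unpack_from(data, 0)))
    ident = header["ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    header["class"] = ident[4]          # offset 04: 2 = 64 bit
    header["endianness"] = ident[5]     # offset 05: 1 = little endian
    header["ident_version"] = ident[6]  # offset 06
    header["osabi"] = ident[7]          # offset 07
    return header


# Illustrative header only; shoff and the section counts are placeholders.
sample_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample_header = ELF64_HEADER.pack(
    sample_ident, 2, 0x3E, 1, 0x400078, 0x40, 0, 0, 64, 56, 1, 64, 0, 0
)

parsed = parse_elf64_header(sample_header)
print(hex(parsed["entry"]), hex(parsed["phoff"]), parsed["ehsize"])
```

To try it on the real file, read the first 64 bytes of asm_smile in binary mode and pass them to parse_elf64_header.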
Program Headers
Next up is the program header. It describes a segment of the file, along with the other info the operating system needs in order to load and run the program. This is how it appears in our hex dump:
Here is a quick listing of the components that make up the program header. Note: The offsets are relative to the start of the program header (at 0x40).
Here’s a layout of the program header in our binary, with indexes for reference.
From the output of readelf, we can see that it matches up with the hex dump.
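The same check can be sketched in Python. The field order below is the standard 64 bit program header layout (type, flags, offset, virtual address, physical address, size in file, size in memory, alignment). The helper names are hypothetical, and the sample values are illustrative guesses for a binary like ours, not a copy of the real readelf output.

```python
import struct

# 64 bit program header entry: two 4 byte fields, then six 8 byte fields.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

PHDR_FIELDS = (
    "type", "flags", "offset", "vaddr", "paddr", "filesz", "memsz", "align",
)


def parse_elf64_phdr(data, phoff=0x40):
    """Return the program header entry found at file offset phoff as a dict."""
    return dict(zip(PHDR_FIELDS, ELF64_PHDR.unpack_from(data, phoff)))


def flags_to_str(flags):
    """Render the permission bits (4 = read, 2 = write, 1 = execute)."""
    return (
        ("R" if flags & 4 else "-")
        + ("W" if flags & 2 else "-")
        + ("X" if flags & 1 else "-")
    )


# Illustrative file: 0x40 bytes standing in for the ELF header, then one
# loadable (type 1) read+execute segment mapped at 0x400000.
sample_file = bytes(0x40) + ELF64_PHDR.pack(
    1, 5, 0, 0x400000, 0x400000, 0x99, 0x99, 0x200000
)

phdr = parse_elf64_phdr(sample_file)
print(phdr["type"], flags_to_str(phdr["flags"]), hex(phdr["vaddr"]))
```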
[ .text Section ]───────────────────────────────────────────────────────────────
Next up is the machine instructions themselves. We saw these earlier when we used objdump, but in their raw form they look like this.
You can see that they contain the 33 bytes of our program.
[ Section Headers ]─────────────────────────────────────────────────────────────
These next chunks of information are known as the section headers. They are used to describe the layout of the sections in the binary.
You can see in the section header output of readelf that we have descriptions of the .text and .shstrtab sections. The .text section is what we just saw above, at offset 0x00000078, containing the machine instructions.
The section after that is .shstrtab, the section header string table. It holds the names of the sections as null-terminated strings, and each section header refers to its name by an offset into this table.
In a binary this small, with no labels or anything else, .shstrtab holds little more than the names .text and .shstrtab itself.
In any case, the section headers and .shstrtab are not needed to run the program; they mostly matter to tools like objdump, readelf, and debuggers. All the kernel needs are the ELF header, the program header, and the machine instructions, so we can get rid of this big bulk of bytes taken up by .shstrtab and the section headers by hand with a hex editor of choice.
Delete everything from 0x99 on!
Our binary now looks like this
We keep the 0a byte at the end just so the terminal knows that the string is over and we need a new line.
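If you don't have a hex editor handy, the same cut can be sketched in a few lines of Python. The function name is my own, and the 368 byte buffer below is only a stand-in for the stripped binary.

```python
def truncate_binary(data, cut_offset=0x99):
    """Keep bytes 0 through cut_offset - 1 and drop everything after them."""
    if cut_offset > len(data):
        raise ValueError("cut offset is past the end of the file")
    return data[:cut_offset]


# Stand-in for the 368 byte stripped binary (real contents not reproduced).
stripped = bytes(368)
trimmed = truncate_binary(stripped)
print(len(trimmed))  # 153
```

On the real file you would read asm_smile in binary mode, pass the bytes through truncate_binary, and write the result to a new file, then mark it executable.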
[ Mangling ]────────────────────────────────────────────────────────────────────
Well, we are now down to 0x99 (153) bytes. This is pretty small, but we can do more to get this thing even smaller.
We can see in the objdump from before that we MOV 0x40008f into %rsi, which is the virtual address pointing to our string 0x5b5e305e5d207521210a, or “[^0^] u!!\n”.
If the program is pointing to the address of the string at 0x40008f, then that means that it maps out to 0x00008f in our binary. What if we save even more space (10 whole bytes!) by moving our string somewhere else?
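The mapping between virtual addresses and file offsets is just a subtraction. This sketch assumes that the single loadable segment maps file offset 0 at virtual address 0x400000, which is consistent with _start sitting at 0x400078 in memory and 0x78 in the file. The constant and function names are hypothetical.

```python
# Assumption: the one loadable segment maps file offset 0 at this address.
LOAD_VADDR = 0x400000
LOAD_OFFSET = 0


def vaddr_to_offset(vaddr):
    """Convert a virtual address inside the segment to a file offset."""
    return vaddr - LOAD_VADDR + LOAD_OFFSET


def offset_to_vaddr(offset):
    """Convert a file offset inside the segment to a virtual address."""
    return offset - LOAD_OFFSET + LOAD_VADDR


print(hex(vaddr_to_offset(0x40008F)))  # 0x8f
print(hex(offset_to_vaddr(0x78)))      # 0x400078
```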
But where else, and how? Well, we can try and find some unused space elsewhere to store our string. At first glance it looks like all the bytes in our binary are accounted for. Admittedly, the 64 bit ELF structure is a bit more rigid than the 32 bit one, because of the amount of space needed to hold addresses in such a large memory space. But there are still some spots where we can hide some data.
The ELF Header from above contains a bit of padding at 0x08-0x0F. It also contains some bytes that are pretty much always going to be a specific value at this point in ELF’s history.
Two of these values are the ELF version (which is 1 for version 1) at 0x06, and the OS Application Binary Interface at 0x07. These can be overwritten and still run on most Unix based systems, and are a perfect location to begin our code insertion.
We now have 10 bytes free that we can use to move our string up into the header like this:
Now, before we run this, we have to make sure our machine code is pointing to where our new string is. Previously we were at 0x40008f, which is referenced in the binary at 0x00000080:
48 c7 c6 8f 00 40 00 mov $0x40008f,%rsi
Since our string is now at 0x00000006 in our binary, its new virtual address is 0x400006, so we change the address stored at 0x00000080 to match. Simply swap out 8f for 06. Note: Addresses are stored little endian, so 0x0040008f appears in the file as the bytes 8f 00 40 00.
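Here is a rough Python sketch of the whole move, under the assumptions used so far: the string sits at file offset 0x8f, the 4 byte address inside the mov lives at 0x80, and the file is mapped at 0x400000. The function name and the synthetic 153 byte buffer are made up for illustration; the buffer only has the right bytes in the places the function touches.

```python
import struct

MSG = b"[^0^] u!!\n"
LOAD_VADDR = 0x400000   # assumption: file offset 0 is mapped here
OLD_MSG_OFFSET = 0x8F   # where the string sits at the end of the file
NEW_MSG_OFFSET = 0x06   # header bytes 0x06 through 0x0F
IMM_OFFSET = 0x80       # the 4 byte address inside the mov instruction


def move_string_into_header(data):
    """Copy the string into the header, repoint the mov, drop the old copy."""
    buf = bytearray(data)
    if buf[OLD_MSG_OFFSET:OLD_MSG_OFFSET + len(MSG)] != MSG:
        raise ValueError("string not found at the expected offset")
    (old_addr,) = struct.unpack_from("<I", buf, IMM_OFFSET)
    if old_addr != LOAD_VADDR + OLD_MSG_OFFSET:
        raise ValueError("mov does not point at the string")
    # Same-length slice assignment overwrites the header bytes in place.
    buf[NEW_MSG_OFFSET:NEW_MSG_OFFSET + len(MSG)] = MSG
    # "<I" packs little endian: 0x400006 becomes the bytes 06 00 40 00.
    struct.pack_into("<I", buf, IMM_OFFSET, LOAD_VADDR + NEW_MSG_OFFSET)
    del buf[OLD_MSG_OFFSET:OLD_MSG_OFFSET + len(MSG)]
    return bytes(buf)


# Synthetic stand-in for the 153 byte binary.
sample = (
    bytes(IMM_OFFSET)
    + struct.pack("<I", LOAD_VADDR + OLD_MSG_OFFSET)
    + bytes(OLD_MSG_OFFSET - IMM_OFFSET - 4)
    + MSG
)
mangled = move_string_into_header(sample)
print(len(sample), len(mangled))            # 153 143
print(mangled[IMM_OFFSET:IMM_OFFSET + 4].hex())  # 06004000
```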
And there you have it. We have successfully rearranged this binary by hand to hide our string in the header, and have removed the section information that debugging tools rely on. Our binary should do the same thing as it did when we first compiled it, but now at a lean 143 bytes.
Final Output:
ARM32 Binary Mangling