ELF Binary Mangling Pt. 1: Concepts


[:  :]

Okay, so you want to see how small you can make a 64-bit binary. In the age of giant bloated applications full of impossibly convoluted machine instructions, eating up your memory and disk space, it’s nice sometimes to get down to the lowest of low levels and create something so tiny that you know what every single bit is doing and its purpose. To do so, we need to employ some standard tricks and a little creativity to get us down there.

Building Your Binary

Let’s start with a really simple program that prints a string in the terminal! I chose these smaller opcodes to save a bit more space, but we can get into assembly optimization in another post.

#                                                                   asm_smile.s
#────────────────────────────────────────────────────────────────────────────────
.global _start
.text
_start:
    mov   $1, %al     # RAX holds syscall 1 (write). I chose to use
                      # %al, which is the lower 8 bits of the %rax
                      # register. From a binary standpoint, there
                      # is less space used to represent this than
                      # mov $1, %rax. This only works because the
                      # upper bits of RAX are already zero when the
                      # program starts.
    mov   %rax, %rdi  # RDI holds File Handle 1, STDOUT. This means
                      # that we are writing to the screen. Again,
                      # moving RAX to RDI is shorter than
                      # using mov $1, %rdi
    mov   $msg, %rsi  # RSI holds the address of our string buffer.
    mov   $11, %dl    # RDX holds the size of our string buffer.
                      # Moving into %dl to save space.
    syscall           # Invoke a syscall with these arguments.
    mov   $60, %al    # Now we are invoking syscall 60 (exit).
    xor   %rdi, %rdi  # Zero out RDI, which holds the exit status.
    syscall           # Call the system again to exit.
msg:
    .ascii "[^0^] u!!\n"

This program uses the most primitive form of writing to STDOUT. It invokes a raw Unix system call to the kernel, with the registers containing the arguments.
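
To put numbers on the space savings mentioned in the comments, here is a small Python sketch that compares the encodings. The short forms are the bytes that show up in the disassembly later in this post. The long forms are what I would expect GNU as to emit for the 64-bit moves, so treat those as an assumption and check them against your own assembler. The names ENCODINGS, encoded_size and bytes_saved are made up for this example.

```python
# Encodings of the short instructions next to the longer alternatives.
# Short forms: taken from the disassembly of the assembled binary.
# Long forms: assumed GNU as output for the 64-bit immediate moves.
ENCODINGS = {
    "mov $1, %al":    bytes.fromhex("b001"),
    "mov $1, %rax":   bytes.fromhex("48c7c001000000"),
    "mov %rax, %rdi": bytes.fromhex("4889c7"),
    "mov $1, %rdi":   bytes.fromhex("48c7c701000000"),
}


def encoded_size(instruction: str) -> int:
    """Return how many bytes the given instruction occupies."""
    return len(ENCODINGS[instruction])


def bytes_saved() -> int:
    """Bytes saved by the two short forms over the two long forms."""
    short = encoded_size("mov $1, %al") + encoded_size("mov %rax, %rdi")
    long_ = encoded_size("mov $1, %rax") + encoded_size("mov $1, %rdi")
    return long_ - short
```

Under those assumptions, setting up RAX and RDI takes 5 bytes instead of 14.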

Save this into a file called asm_smile.s and assemble it with as:

$ vim asm_smile.s
$ as asm_smile.s -o asm_smile.o

Now we’ve created an object file that can be used to create an executable. We can link it with ld, then run.

$ ld asm_smile.o -o asm_smile
$ ./asm_smile
[^0^] u!!

Okay, what have we done here? Let’s take a look at the raw data we generated. A good place to start is objdump.

$ objdump -d asm_smile
 
   asm_smile:     file format elf64-x86-64
   
   Disassembly of section .text:
   
   0000000000400078 <_start>:
     400078:       b0 01                   mov    $0x1,%al
     40007a:       48 89 c7                mov    %rax,%rdi
     40007d:       48 c7 c6 8f 00 40 00    mov    $0x40008f,%rsi
     400084:       b2 0b                   mov    $0xb,%dl
     400086:       0f 05                   syscall
     400088:       b0 3c                   mov    $0x3c,%al
     40008a:       48 31 ff                xor    %rdi,%rdi
     40008d:       0f 05                   syscall
   
   000000000040008f <msg>:
     40008f:       5b                      pop    %rbx
     400090:       5e                      pop    %rsi
     400091:       30 5e 5d                xor    %bl,0x5d(%rsi)
     400094:       20 75 21                and    %dh,0x21(%rbp)
     400097:       21 0a                   and    %ecx,(%rdx)

Our program + string is only 33 bytes, so why is our binary 752 bytes? Let’s take a look at a quick hex dump.

Hrm… There’s quite a bit of extra data in there! We can see our program begins at 0x78 and ends at 0x98. How can you make a binary smaller right off the bat? We can use strip!

$ strip asm_smile

Strip reads a binary file and removes the symbol table and other debug and compiler info that isn’t needed to run it. So what does our binary look like now?

After using strip, we are down to 368 bytes! That’s a pretty small binary. But remember, our machine instructions and string were just 33 bytes, so what’s up with all this overhead?

To understand this, we need to break down the sections of an ELF binary real quick. If you’re not used to looking at hex dumps and hand modifying data, this is a great place to start. It’s not that scary!

Under the Hood

All ELF binaries need to have a few things in place in order for them to be interpreted by the Linux kernel properly. As with Windows EXEs, there’s a structure to the header that defines the overall layout of the binary.

This example is using x86_64 assembly, so the ELF binaries I am describing here are the 64 bit version. The 32 bit version is slightly different.

Let’s take a look at what other information is in this binary. We can use a program called readelf to help us follow along!

What does this all mean? We will start by first understanding the ELF header.

ELF Header

The ELF header section defines the file as an ELF binary. In the hex dump it looks like this:

Each one of these bytes has a specific purpose.

Offset │ # │ Description
00-03  │ A │ Magic number - 0x7F, then ‘ELF’ in ASCII
04     │ B │ 1 = 32 bit, 2 = 64 bit
05     │ C │ 1 = little endian, 2 = big endian
06     │ D │ ELF header version
07     │ E │ OS ABI - usually 0 for System V
08-0F  │ F │ Unused/padding
10-11  │ G │ 1 = relocatable, 2 = executable, 3 = shared, 4 = core
12-13  │ H │ Instruction set (0x3E for x86-64)
14-17  │ I │ ELF Version
18-1F  │ J │ Program entry position
20-27  │ K │ Program header table position
28-2F  │ L │ Section header table position
30-33  │ M │ Flags - architecture dependent
34-35  │ N │ Header size
36-37  │ O │ Size of an entry in the program header table
38-39  │ P │ Number of entries in the program header table
3A-3B  │ Q │ Size of an entry in the section header table
3C-3D  │ R │ Number of entries in the section header table
3E-3F  │ S │ Index in section header table with the section names

This is the ELF Header with index values to show exactly where these values line up in our binary.

These values are mainly metadata that tells the operating system what to do with this file. I won’t get too deep into what these things mean, but they are necessary to be aware of as we move along. You can find more info here! https://wiki.osdev.org/ELF
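
If you would rather check these fields with a script than count bytes by eye, here is a minimal Python sketch that decodes the 64-byte header using the offsets from the table. The field names and the sample_header helper are my own inventions for illustration, and the sample values are placeholders rather than a dump of the real file.

```python
import struct

# 64-bit little-endian ELF header: 16 identification bytes, then
# type, machine, version, entry, phoff, shoff, flags, ehsize,
# phentsize, phnum, shentsize, shnum, shstrndx.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

FIELDS = ("ident", "type", "machine", "version", "entry", "phoff",
          "shoff", "flags", "ehsize", "phentsize", "phnum",
          "shentsize", "shnum", "shstrndx")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    return dict(zip(FIELDS, ELF64_HEADER.unpack_from(data, 0)))


def sample_header() -> bytes:
    """Build a placeholder header (hypothetical values, not a real dump)."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ELF64_HEADER.pack(ident, 2, 0x3E, 1, 0x400078, 0x40,
                             0, 0, 64, 56, 1, 64, 0, 0)
```

To try it on the real thing, read the first 64 bytes of asm_smile in binary mode and pass them to parse_elf64_header. The entry field should come back as 0x400078, the same address objdump showed for _start.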

Program Headers

Next up is the program header. This area describes a segment, along with the other info the operating system needs in order to load and run the program. This is how it appears in our hex dump:

Here is a quick listing of the components that make up the program header. Note: The offsets are relative to the start of the program header (at 0x40).

Here’s a layout of the program header in our binary, with indexes for reference.

From the output of readelf, we can see that it matches up with the hex dump.
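
For a scripted version of the same check, here is a rough Python sketch that decodes one 56-byte entry of a 64-bit program header table. The field order follows the standard 64-bit program header structure. The helper names and the values in sample_phdr are illustrative assumptions, not a copy of what readelf printed for this binary.

```python
import struct

# One 64-bit program header entry, little endian:
# p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

PHDR_FIELDS = ("type", "flags", "offset", "vaddr", "paddr",
               "filesz", "memsz", "align")


def parse_program_header(data: bytes, phoff: int = 0x40) -> dict:
    """Decode the program header entry that starts at file offset phoff."""
    return dict(zip(PHDR_FIELDS, ELF64_PHDR.unpack_from(data, phoff)))


def describe_flags(flags: int) -> str:
    """Render the permission bits the way readelf abbreviates them."""
    return "".join(ch if flags & bit else "-"
                   for ch, bit in (("R", 4), ("W", 2), ("X", 1)))


def sample_phdr() -> bytes:
    """64 filler bytes plus one placeholder PT_LOAD entry (made-up values)."""
    entry = ELF64_PHDR.pack(1, 5, 0, 0x400000, 0x400000,
                            0x99, 0x99, 0x200000)
    return bytes(0x40) + entry
```

The default phoff of 0x40 matches where the program header starts in our file. For a different binary, use the program header table position from the ELF header instead.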

[ .text Section ]───────────────────────────────────────────────────────────────

Next up are the machine instructions themselves. We saw these earlier when we used objdump, but in their raw form they look like this.

You can see that they contain the 33 bytes of our program.
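
If you want to reproduce the raw view yourself, here is a small Python sketch that rebuilds those 33 bytes from the objdump listing earlier in the post and prints them starting at offset 0x78. The hexdump helper is a simplified stand-in that I wrote for this example. Unlike a real hex dump tool, it does not align rows to 16-byte boundaries.

```python
# The 33 bytes at 0x78-0x98, copied from the objdump listing.
TEXT = bytes.fromhex(
    "b001"                    # mov    $0x1,%al
    "4889c7"                  # mov    %rax,%rdi
    "48c7c68f004000"          # mov    $0x40008f,%rsi
    "b20b"                    # mov    $0xb,%dl
    "0f05"                    # syscall
    "b03c"                    # mov    $0x3c,%al
    "4831ff"                  # xor    %rdi,%rdi
    "0f05"                    # syscall
    "5b5e305e5d207521210a"    # the string "[^0^] u!!\n"
)


def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format data as offset, hex bytes and printable ASCII, one row per line."""
    pad = width * 3 - 1
    rows = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{base + i:08x}  {hex_part}  |{ascii_part}|")
    return "\n".join(rows)


if __name__ == "__main__":
    print(hexdump(TEXT, base=0x78))
```

The first 23 bytes are instructions and the last 10 are the string, which adds up to the 33 bytes between 0x78 and 0x98.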

[ Section Headers ]─────────────────────────────────────────────────────────────

These next chunks of information are known as the section headers. They are used to describe the layout of the sections in the binary.

You can see in the section header output of readelf that we have descriptions of the .text and .shstrtab sections. The .text section is what we just saw above, at offset 0x00000078, containing the machine instructions.

The section after that is .shstrtab, the section header string table, which stores the names of the sections as strings.

In a binary this small, with no labels or anything else, .shstrtab only exists to say that it exists: all it holds is the section names, including its own name, .shstrtab.

In any case, these sections are not needed to run the program; they mostly matter to tools like debuggers and readelf. All we need are the machine instructions, so we can get rid of this big bulk of bytes taken up by the .shstrtab and the section headers by hand with a hex editor of choice.

Delete everything from 0x99 on!

Our binary now looks like this

We keep the 0a byte at the end just so the terminal knows that the string is over and we need a new line.
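
If you prefer a script to a hex editor, the same cut can be sketched in Python. The function names here are made up for the example, and the 0x98 default is simply the last byte of our string as seen in the hex dump.

```python
def drop_section_data(image: bytes, last_kept: int = 0x98) -> bytes:
    """Keep bytes 0..last_kept inclusive and throw away everything after."""
    return image[:last_kept + 1]


def trim_file(src: str, dst: str, last_kept: int = 0x98) -> int:
    """Write a trimmed copy of src to dst and return the new size in bytes."""
    with open(src, "rb") as f:
        image = f.read()
    trimmed = drop_section_data(image, last_kept)
    with open(dst, "wb") as f:
        f.write(trimmed)
    return len(trimmed)
```

Calling trim_file("asm_smile", "asm_smile_trimmed") should report 153 bytes for our stripped binary. The copy will need chmod +x before you can run it. Note that the ELF header still points at the section header table we just cut off, so tools like readelf will likely complain even though the program still runs.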

[ Mangling ]────────────────────────────────────────────────────────────────────

Well, we are now down to 0x99 (153) bytes. This is pretty small, but we can do more to get this thing even smaller.

We can see in the objdump from before that we MOV 0x40008f into %rsi. This is the virtual address of our string, the bytes 5b 5e 30 5e 5d 20 75 21 21 0a, or “[^0^] u!!\n”.

Our file is loaded at the base address 0x400000 (the entry point 0x400078 sits at offset 0x78 in the file), so the virtual address 0x40008f maps to offset 0x00008f in our binary. What if we save even more space (10 whole bytes!) by moving our string somewhere else?
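
The address arithmetic is simple enough to write down as a tiny Python sketch. It assumes the whole file is mapped at 0x400000, which fits the addresses objdump showed us (_start at 0x400078 sits at offset 0x78 in the file). LOAD_BASE and the two helper names are my own.

```python
LOAD_BASE = 0x400000  # assumption: file offset 0 is mapped at this address


def file_offset(vaddr: int, base: int = LOAD_BASE) -> int:
    """Translate a virtual address into an offset within the file."""
    return vaddr - base


def virtual_address(offset: int, base: int = LOAD_BASE) -> int:
    """Translate a file offset into the address it has once loaded."""
    return base + offset
```

So file_offset(0x40008f) gives 0x8f, and any new home we pick for the string gets its address from virtual_address in the same way.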

But where else, and how? Well, we can try and find some unused space elsewhere to store our string. At first glance it looks like all the bytes in our binary are accounted for. Admittedly, x86_64’s structure is a bit more rigid than x86, because of the amount of space needed to hold addresses in such a large memory space. But, there are still some spots that we can hide some data.

The ELF Header from above contains a bit of padding at 0x08–0x0F. It also contains some bytes that are pretty much always going to be a specific value at this point in ELF’s history.

Two of these values are the ELF version (which is 1 for version 1) at 0x06, and the OS Application Binary Interface at 0x07. These can be overwritten and the binary will still run on most Unix-based systems, which makes them a perfect location to begin our insertion.

We now have 10 bytes free that we can use to move our string up into the header like this:

Now, before we run this, we have to make sure our machine code is pointing to where our new string is. Previously it was at 0x40008f, which is referenced in the binary at 0x00000080:

48 c7 c6 8f 00 40 00    mov    $0x40008f,%rsi

Since our string is now at 0x00000006 in our binary, its virtual address becomes 0x400006, so we change the address stored at 0x00000080 to match. Simply swap out 8f for 06. Note: Addresses are stored little endian, so 0x0040008f appears in the file as the bytes 8f 00 40 00.
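
Here is the whole move sketched in Python instead of a hex editor: copy the string into the header, repoint the MOV, and drop the old copy from the end. It assumes the 153-byte layout described in this post (MOV into %rsi at 0x7d, string at 0x8f) and refuses to touch anything that looks different. The mangle function and its constants are names I made up for this example.

```python
import struct

LOAD_BASE = 0x400000   # assumed load address of file offset 0
OLD_STRING = 0x8F      # where the string lives before the move
NEW_STRING = 0x06      # header bytes 0x06-0x0F that we overwrite
STRING_LEN = 10
MOV_RSI = 0x7D         # offset of the bytes 48 c7 c6 <imm32>


def mangle(image: bytes) -> bytes:
    """Move the string into the ELF header and shrink the file to 143 bytes."""
    if len(image) != OLD_STRING + STRING_LEN:
        raise ValueError("expected the 153-byte trimmed binary")
    if image[MOV_RSI:MOV_RSI + 3] != b"\x48\xc7\xc6":
        raise ValueError("mov $imm32, %rsi not found at the expected offset")
    out = bytearray(image)
    message = out[OLD_STRING:OLD_STRING + STRING_LEN]
    out[NEW_STRING:NEW_STRING + STRING_LEN] = message
    # The 32-bit immediate sits right after the three opcode bytes, at 0x80.
    struct.pack_into("<I", out, MOV_RSI + 3, LOAD_BASE + NEW_STRING)
    del out[OLD_STRING:]  # the old copy of the string is no longer needed
    return bytes(out)
```

Packing with "<I" writes the new address as the bytes 06 00 40 00, which is the same one-byte swap we just did by hand.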

And there you have it. We have successfully rearranged this binary by hand to hide data in the header, and have removed debugging capabilities. With the old copy of the string removed from the end of the file, our binary should do the same thing as it did when we first assembled it, but now at a lean 143 bytes.

Final Output: