This is my entry for the first annual Binary Golf Grand Prix.
Like all good things, the Binary Golf Grand Prix challenge started life as a tweet from 0xdade, and further conversation with some fine folks on Matrix. The goal of this challenge was to create a binary that executed the same forwards as it did backwards.
We established some ground rules, and I made the [announcement(https://twitter.com/netspooky/status/1275256864314470402) on Twitter calling other people to participate. The results of the first BGGP will be posted here when the challenge officially ends. Watch my Twitter for updates.
Developing An Approach
This challenge lends itself towards binary formats without headers, such as COM, bootloaders, firmware images, and most binary programs for 8 bit computers. It would make sense due to the short widths of instructions and data, leading to a smaller number of possible combinations of machine instructions.
Since I’ve done quite a bit of work with golfing ELF files, I figured it was worth a shot to see the feasibility of doing palindromic binaries that will run on modern systems.
I began my work with a baseline of a barebones binary golf’d 64 bit ELF file ( see ELF Binary Mangling 2 ), for the foundation of this binary. The first task was to flip the entirety of the data structures and code backwards, so that I could still use nasm to build the binary.
The original idea I had involved alphanumeric shellcode, to use and spell out a palindromic sentence, such as this from Scottish poet Alastair Reid:
T. Eliot, top bard, notes putrid tang emanating, is sad; I’d assign it a name: gnat dirt upset on drab pot toilet.
Palindromes rely on the ambiguity of punctuation to work in a majority of cases, so something shorter would be needed, and preferably in all lowercase or upper case. I had decided to go with the phrase “PULLUP IF I PULLUP”, because it made more sense when presented as a string without punctuation.
The next problem quickly arose: the alphanumeric shellcode article was written in 2001. These techniques work for 32 bit instructions, but a lot of encoding was changed in 64 bit x86. This means we need to determine what still works, if anything.
Testing led me to understand that only letters P-Z are the only viable single byte instructions in the A-Z space. This means that I actually can’t do the original phrase “PULLUPIFIPULLUP” due to letters appearing outside this range.
50 push rax ; 'P'
51 push rcx ; 'Q'
52 push rdx ; 'R'
53 push rbx ; 'S'
54 push rsp ; 'T'
55 push rbp ; 'U'
56 push rsi ; 'V'
57 push rdi ; 'W'
58 pop rax ; 'X'
59 pop rcx ; 'Y'
5a pop rdx ; 'Z'
Luckily, there are vowel sounds that can be used to find some words. An online Scrabble word finder came in handy to generate some possible words for this. The palindromic phrase I decided to go with used only letters in our established range. I ended up with “PUPPY SPY, PSY P. PUP”, mainly because it’s absurd, but also it could be expressed with just the letters in the P-Z range.
The nice thing about these particular instructions is that they are really only just push & pop instructions, so you don’t have to worry too much about messing up data that might be in these registers, and just have to track where values might end up if I used them at all.
Now that we have all of this established, let’s start thinking in mirror mode.
Mirroring
The template binary only really executes 7 bytes:
0: b0 3c mov al,0x3c
2: 48 31 ff xor rdi,rdi
5: 0f 05 syscall
What happens when if you flip these bytes backwards? Quite shockingly, this:
0: 05 0f ff 31 48 add eax,0x4831ff0f
5: 3c b0 cmp al,0xb0
There are several disassemblers that might interpret data differently, so when using nasm, it’s helpful to confirm that these instructions actually assemble the way you expect them to. Lucky for us, this works just fine.
Padding the header (and various other places) with nops is helpful when you are measuring the space available when constructing a binary like this. The code began at offset 0x04, so padding with 5 nops filled the rest of the header.
The next thing to plan out was the jumps.
I figured that jmp instructions could be accounted for in one of two ways:
- A pairing of jmp’s that jmp over each other, or
- jmps that executed as code when percieved backwards.
I wrote a small script to generate all of the possible opcode combinations for short jumps and what they disassemble to when interpreted backwards. Even though it’s only two bytes, there are a lot of incompatible instructions, such as references to EBP and other registers that aren’t easily referenceable in x64.
import sys
import subprocess
# python3 opiter.py opcode
# Will iter through one byte in front of the opcode you put in there.
# It's hella bespoke, feel free to change heh
exp = sys.argv[1]
for i in range(0,255):
opp = format(i, '02x')
inf = '"'+opp+' '+exp+'"'
print(opp+" "+exp+" |",end=" ")
process = subprocess.run(['/usr/bin/rasm2 -a x86 -b 64 -d '+inf],
shell=True, check=True, stdout=subprocess.PIPE,
universal_newlines=True)
output = process.stdout
if 'invalid' in output:
print("--")
else:
print(output,end="")
This is the output from the jmp brute table with invalid opcodes ignored:
OPCODE|INSTRUCTIONS
------+---------------
00 eb | add bl, ch
01 eb | add ebx, ebp
02 eb | add ch, bl
03 eb | add ebp, ebx
04 eb | add al, 0xeb
08 eb | or bl, ch
09 eb | or ebx, ebp
0a eb | or ch, bl
0b eb | or ebp, ebx
0c eb | or al, 0xeb
10 eb | adc bl, ch
11 eb | adc ebx, ebp
12 eb | adc ch, bl
13 eb | adc ebp, ebx
14 eb | adc al, 0xeb
18 eb | sbb bl, ch
19 eb | sbb ebx, ebp
1a eb | sbb ch, bl
1b eb | sbb ebp, ebx
1c eb | sbb al, 0xeb
20 eb | and bl, ch
21 eb | and ebx, ebp
22 eb | and ch, bl
23 eb | and ebp, ebx
24 eb | and al, 0xeb
28 eb | sub bl, ch
29 eb | sub ebx, ebp
2a eb | sub ch, bl
2b eb | sub ebp, ebx
2c eb | sub al, 0xeb
30 eb | xor bl, ch
31 eb | xor ebx, ebp
32 eb | xor ch, bl
33 eb | xor ebp, ebx
34 eb | xor al, 0xeb
38 eb | cmp bl, ch
39 eb | cmp ebx, ebp
3a eb | cmp ch, bl
3b eb | cmp ebp, ebx
3c eb | cmp al, 0xeb
63 eb | movsxd rbp, ebx
6a eb | push 0xffffffffffffffeb
70 eb | jo 0xffffffffffffffed
71 eb | jno 0xffffffffffffffed
72 eb | jb 0xffffffffffffffed
73 eb | jae 0xffffffffffffffed
74 eb | je 0xffffffffffffffed
75 eb | jne 0xffffffffffffffed
76 eb | jbe 0xffffffffffffffed
77 eb | ja 0xffffffffffffffed
78 eb | js 0xffffffffffffffed
79 eb | jns 0xffffffffffffffed
7a eb | jp 0xffffffffffffffed
7b eb | jnp 0xffffffffffffffed
7c eb | jl 0xffffffffffffffed
7d eb | jge 0xffffffffffffffed
7e eb | jle 0xffffffffffffffed
7f eb | jg 0xffffffffffffffed
I needed some code to execute, something very small, yet produce some sort of output to prove that it worked. The code I decided to adapt for this was based on the tiny ELF binary I had explained in a stream on assembly optimization called i2ao, and it was simple enough that I could port for this application. All it really does is print out a string.
Since we have a palindrome string that could be executed as well, it seemed fitting to print that string out, as well as execute it. This would save space on data and make the overall mirrored layout look more even.
Fitting The Pieces Together
Now that all the pieces to build this are in place, we can start to put them together. The full assembly listing is at the bottom of this section if you want to play along at home!
The primary concern is to make sure that the registers we need are cleared prior to making a syscall. In this case, we are making two syscalls: write and exit.
The registers required for a write syscall are: RAX, RDX, RDI, and RSI. Since the first instructions executed add values to RAX, an explicit mov rax, 1 is needed, rather than any clever tricks to populate RAX. If we wanted to use something like xor rax, rax; inc rax, it would add an extra byte. Some other space saving measures are also used in the write syscall code, which you can refer to in the i2ao writeups / video.
The next step is to reference the string within the code that is right after the write syscall. There are a few ways of making references to the current offset, but none of them made much sense other than simply using the offsets in the binary and moving that value into the sil register.
In times like these, it’s useful to build, but not execute, and open your binary in a debugger. Mind you, it does start to get frustrating when your debugger gets confused about what the heck you’re actually doing, but it’s trying it’s best! Sometimes seeking directly to the offset you’re interested in and disassembling there is the only real way to understand what’s happening.
After the write syscall code was sorted, it was time to start mirroring the entire executable section. Since the bounds of the headers have already been established, you can safely do this without messing up your binary too much.
The write syscall code doesn’t really have too many instructions that you can safely execute backwards without entirely rewriting it. So instead of that, another approach is to simply jump over whatever you can’t execute. The alphanumeric machine code bit is executable and won’t interfere with the flow of the program, so a jmp can be placed between that and the junk code.
The size of the write code, along with the short jmp, produces ‘jmp 0x17’, which turns into ’eb 15’ in machine code. This unfortunately doesn’t translate to anything usable backwards. Referring to the jmp table, there is a possibly usable instruction ‘sbb bl, ch’, that can be achieved by padding with 3 nops to bring the opcode to ‘18eb’ when backwards, ’eb18’ forwards.
So what’s the point of this? Mainly to prevent generating even more junk bytes to account for. Another solution would be to just encode a backwards jmp ‘02eb’ after the jmp to the write syscall, which would do a small hop over the jmp that we can’t execute backwards. This way felt more clean in the end.
Now all we have to do is just execute our string, clear RAX, and jump back into the headers. This just adds a small (5 byte) block that we have to account for when we jump out of the headers the first time and into the main code section.
A space saving trick used here was to completely overwrite the p_align section, which saves 16 bytes in total (8 on each side) within the code section.
It was frustrating to have junk code within the space of 0x4-0x10, where the binary begins and ends execution, so a final idea was tried to utilize all the space here.
There is writable space at both 0x3C and 0x44 in the program header. They must be exactly the same or the binary will not execute. Each part has 4 bytes of space to work with, which is perfect to do something simple like a short jmp.
Doing a short jmp from the top of the ELF header to 0x44, produces a some bytes that are usable backwards! This is ’eb34’, which backwards, ‘34eb’ is decoded to ‘xor al, 0xeb’. Since it’s only messing with AL, the lowest byte of RAX, this doesn’t matter because the value is explicitly assigned afterwards. chefs kiss
This is the final code:
BITS 64
org 0x100000000 ; Where to load this into memory
;----------------------+------+-------------+----------+------------------------
; ELF Header struct | OFFS | ELFHDR | PHDR | ASSEMBLY OUTPUT
;----------------------+------+-------------+----------+------------------------
db 0x7F, "ELF" ; 0x00 | e_ident | | 7f 45 4c 46
_start:
add eax,0x4831ff0f ; The reverse of
cmp al,0xb0 ; the exit syscall
nop ; 90
nop ; 90
nop ; 90
jmp hjmp ; eb 34
;----------------------+------+-------------+----------+------------------------
; ELF Header struct ct.| OFFS | ELFHDR | PHDR | ASSEMBLY OUTPUT
;----------------------+------+-------------+----------+------------------------
dw 2 ; 0x10 | e_type | | 02 00
dw 0x3e ; 0x12 | e_machine | | 3e 00
dd 1 ; 0x14 | e_version | | 01 00 00 00
dd _start - $$ ; 0x18 | e_entry | | 04 00 00 00
;----------------------+------+-------------+----------+------------------------
; Program Header Begin | OFFS | ELFHDR | PHDR | ASSEMBLY OUTPUT
;----------------------+------+-------------+----------+------------------------
phdr:
dd 1 ; 0x1C | ... | p_type | 01 00 00 00
dd phdr - $$ ; 0x20 | e_phoff | p_flags | 1c 00 00 00
dd 0 ; 0x24 | ... | p_offset | 00 00 00 00
dd 0 ; 0x28 | e_shoff | ... | 00 00 00 00
dq $$ ; 0x2C | ... | p_vaddr | 00 00 00 00
; 0x30 | e_flags | ... | 01 00 00 00
dw 0x40 ; 0x34 | e_shsize | p_addr | 40 00
dw 0x38 ; 0x36 | e_phentsize | ... | 38 00
dw 1 ; 0x38 | e_phnum | ... | 01 00
dw 2 ; 0x3A | e_shentsize | ... | 02 00
;dq 2 ; 0x3C | e_shnum | p_filesz | 02 00 00 00 00 00 00 00
dw 0x0beb ; eb 0b ; Overwrites e_shnum and p_filesz
dw 0
dd 0
hjmp:
;dq 2 ; 0x44 | | p_memsz | 02 00 00 00 00 00 00 00
jmp sec0 ; eb 0b ; Overwrites p_memsz
dw 0
dd 0
;dq 2 ; 0x4C | | p_align | 02 00 00 00 00 00 00 00
;--- Outer bounds of executable portion
cmp al, 0xeb ; 3c eb ; Overwrites p_align
db 0xc0 ; c0
db 0x31 ; 31
db 0x48 ; 48
sec0:
push rax ; 50
push rbp ; 55
push rax ; 50
push rax ; 50
pop rcx ; 59
push rbx ; 53
push rax ; 50
pop rcx ; 59
push rax ; 50
push rbx ; 53
pop rcx ; 59
push rax ; 50
push rax ; 50
push rbp ; 55
push rax ; 50
jmp sec1 ; eb 18
nop ; 90
nop ; 90
nop ; 90
nop ; 90
nop ; 90
add eax,0x40b6950f ; 05 0f 95 b6 40 ; Third byte is str offset
and dh,ah ; 20 e6
ror DWORD [rax-0x3a],0x89 ; c1 48 c6 89
dd 0x89c7b20f ; 0f b2 c7 89
add BYTE [rax],al ; 00 00
add BYTE [rcx],al ; 00 01
;--- split - the first byte is shared with the mov rax,1
sec1:
mov rax, 1 ; b8 01 00 00 00
mov edi, eax ; 89 c7
mov dl, 15 ; b2 0f
mov esi, eax ; 89 c6
shl rsi, 0x20 ; 48 c1 e6 20
mov sil, 0x95 ; 40 b6 95
syscall ; 0f 05
nop ; 90
nop ; 90
nop ; 90
nop ; 90
nop ; 90
sbb bl, ch ; 18 eb
sec2:
push rax ; 50
push rbp ; 55
push rax ; 50
push rax ; 50
pop rcx ; 59
push rbx ; 53
push rax ; 50
pop rcx ; 59
push rax ; 50
push rbx ; 53
pop rcx ; 59
push rax ; 50
push rax ; 50
push rbp ; 55
push rax ; 50
xor rax, rax ; 48 31 c0
jmp rstart ; eb 3c
;--- Header Mirror
dd 0
dw 0
dw 0xeb0b ; 0x44 | | p_memsz | 02 00 00 00 00 00 00 00
dd 0 ;
dw 0 ;
dw 0xeb0b ; 0x3C | e_shnum | p_filesz | 02 00 00 00 00 00 00 00
db 0 ;
db 2 ; 0x3A | e_shentsize | ... | 02 00
db 0 ;
db 1 ; 0x38 | e_phnum | ... | 01 00
db 0 ;
db 0x38 ; 0x36 | e_phentsize | ... | 38 00
db 0 ;
db 0x40 ; 0x34 | e_shsize | p_addr | 40 00
dw 0 ;
db 0 ;
db 1 ; 0x30 | e_flags | ... | 01 00 00 00
dd 0 ; 0x2C | ... | p_vaddr | 00 00 00 00
dd 0 ; 0x28 | e_shoff | ... | 00 00 00 00
dd 0 ; 0x24 | ... | p_offset | 00 00 00 00
dw 0 ;
db 0 ;
db 0x1c ; 0x20 | e_phoff | p_flags | 1c 00 00 00
dw 0 ;
db 0 ;
db 1 ; 0x1C | ... | p_type | 01 00 00 00
dw 0 ;
db 0 ;
db 4 ; 0x18 | e_entry | | 04 00 00 00
dw 0 ;
db 0 ;
db 1 ; 0x14 | e_version | | 01 00 00 00
db 0 ;
db 0x3e ; 0x12 | e_machine | | 3e 00
db 0 ;
db 2 ; 0x10 | e_type | | 02 00
rstart:
xor al, 0xeb ; 34 EB ; Jmp in reverse
nop ; 90
nop ; 90
nop ; 90
mov al, 0x3c ; b0 3c
xor rdi, rdi ; 48 31 ff
syscall ; 0f 05
db "F"
db "L"
db "E"
db 0x7F ; 0x00 | e_ident | | 7f 45 4c 46
Confirming It Works
This was tested and built on Ubuntu 20.04 with kernel 5.4.0-42-generic.
Here is a small script you can use to build and test the ASM file:
#!/bin/bash
nasm -f bin ns.bggp.asm -o ns.bggp
chmod +x ns.bggp
echo "Executing initial binary..."
./ns.bggp
echo ""
xxd ns.bggp
echo ""
echo "Reversing..."
perl -0777pe '$_=reverse $_' ns.bggp > ns.bggp.R
chmod +x ns.bggp.R
echo "Executing binary in reverse..."
./ns.bggp.R
echo ""
xxd ns.bggp.R
echo ""
echo "Comparing hashes..."
sha1sum ns.bggp
sha1sum ns.bggp.R
Output:
$ ./build.sh
Executing initial binary...
PUPPYSPYPSYPPUP
00000000: 7f45 4c46 050f ff31 483c b090 9090 eb34 .ELF...1H<.....4
00000010: 0200 3e00 0100 0000 0400 0000 0100 0000 ..>.............
00000020: 1c00 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0100 0000 4000 3800 0100 0200 eb0b 0000 ....@.8.........
00000040: 0000 0000 eb0b 0000 0000 0000 3ceb c031 ............<..1
00000050: 4850 5550 5059 5350 5950 5359 5050 5550 HPUPPYSPYPSYPPUP
00000060: eb18 9090 9090 9005 0f95 b640 20e6 c148 ...........@ ..H
00000070: c689 0fb2 c789 0000 0001 b801 0000 0089 ................
00000080: c7b2 0f89 c648 c1e6 2040 b695 0f05 9090 .....H.. @......
00000090: 9090 9018 eb50 5550 5059 5350 5950 5359 .....PUPPYSPYPSY
000000a0: 5050 5550 4831 c0eb 3c00 0000 0000 000b PPUPH1..<.......
000000b0: eb00 0000 0000 000b eb00 0200 0100 3800 ..............8.
000000c0: 4000 0000 0100 0000 0000 0000 0000 0000 @...............
000000d0: 0000 0000 1c00 0000 0100 0000 0400 0000 ................
000000e0: 0100 3e00 0234 eb90 9090 b03c 4831 ff0f ..>..4.....<H1..
000000f0: 0546 4c45 7f .FLE.
Reversing...
Executing binary in reverse...
PUPPYSPYPSYPPUP
00000000: 7f45 4c46 050f ff31 483c b090 9090 eb34 .ELF...1H<.....4
00000010: 0200 3e00 0100 0000 0400 0000 0100 0000 ..>.............
00000020: 1c00 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0100 0000 4000 3800 0100 0200 eb0b 0000 ....@.8.........
00000040: 0000 0000 eb0b 0000 0000 0000 3ceb c031 ............<..1
00000050: 4850 5550 5059 5350 5950 5359 5050 5550 HPUPPYSPYPSYPPUP
00000060: eb18 9090 9090 9005 0f95 b640 20e6 c148 ...........@ ..H
00000070: c689 0fb2 c789 0000 0001 b801 0000 0089 ................
00000080: c7b2 0f89 c648 c1e6 2040 b695 0f05 9090 .....H.. @......
00000090: 9090 9018 eb50 5550 5059 5350 5950 5359 .....PUPPYSPYPSY
000000a0: 5050 5550 4831 c0eb 3c00 0000 0000 000b PPUPH1..<.......
000000b0: eb00 0000 0000 000b eb00 0200 0100 3800 ..............8.
000000c0: 4000 0000 0100 0000 0000 0000 0000 0000 @...............
000000d0: 0000 0000 1c00 0000 0100 0000 0400 0000 ................
000000e0: 0100 3e00 0234 eb90 9090 b03c 4831 ff0f ..>..4.....<H1..
000000f0: 0546 4c45 7f .FLE.
Comparing hashes...
c082d226c96b7251649c48526dd9766071fa5e59 ns.bggp
c082d226c96b7251649c48526dd9766071fa5e59 ns.bggp.R
Here is the base64 encoded binary if you don’t want to do all that:
base64 -d <<< \
f0VMRgUP/zFIPLCQkJDrNAIAPgABAAAABAAAAAEAAAAcAAAAAAAAAAAAAAAAAAAAAQAAAEAAOAAB\
AAIA6wsAAAAAAADrCwAAAAAAADzrwDFIUFVQUFlTUFlQU1lQUFVQ6xiQkJCQkAUPlbZAIObBSMaJ\
D7LHiQAAAAG4AQAAAInHsg+JxkjB5iBAtpUPBZCQkJCQGOtQVVBQWVNQWVBTWVBQVVBIMcDrPAAA\
AAAAAAvrAAAAAAAAC+sAAgABADgAQAAAAAEAAAAAAAAAAAAAAAAAAAAcAAAAAQAAAAQAAAABAD4A\
AjTrkJCQsDxIMf8PBUZMRX8= > ns.bggp
Going Further
I could’ve possibly shrunk the write syscall code down even more to try and save 1 byte to produce a short jmp of 0xeb12->0x12eb (adc ch, bl). Since I was coding not just for size, but for % of bytes executed as well, it made more sense just to leave it.
There are a few different approaches to this, and I just wanted to cover a few of them. This write up was just a brain dump of my notes as I worked on this. I will have a more in depth write up on this that I will submit to PoC||GTFO, so keep an eye out for that!
If you have any questions / comments, my dms are open on Twitter @netspooky.
Check out my main site if you want to see more binary golf examples!
Check out binary.golf for all Binary Golf Grand Prix related content!!