x86 Bootloaders



Bootloaders at their core are very simple pieces of code. Their responsibility is to give the BIOS a way to load the operating system you want to load into memory and do all of the proper initialization it needs to actually boot.

This can become a very complicated process, and because BIOS based systems are fairly old, newer technology like UEFI has been created to replace them.

Here are some very basic examples of bootloader code.

Initialization

When your computer turns on, the BIOS starts up by self testing the hardware and checking what is attached to the machine. It then looks for a bootable disk, loads the first 512 bytes of that disk into memory at 0x7C00, and executes them as a bootloader.

From here, the bootloader is operating in 16 bit real mode, the most primitive mode an x86 processor can operate in. During this phase, it is up to the bootloader to start its initialization and checks to create an environment that it can load the OS into. At this point though, we can do whatever we want.

In this example, we write a super basic hello world type program that runs as a bootloader.

bits 16                  ; NASM directive for 16 bit mode
org 0x7c00               ; Tell NASM the code will be loaded at 0x7c00
boot:
    mov si,ayy           ; point SI register to the ayy label's loc
    mov ah,0x0e          ; 0x0e means "Write Character in TTY mode"
.loop:
    lodsb                ; Load the byte at ds:si into al, then increment si
    or al,al             ; Sets the zero flag if al == 0
    jz halt              ; if (al == 0) jump to halt label
    int 0x10             ; runs BIOS interrupt 0x10 - video services
    jmp .loop
halt:
    cli                  ; Clear interrupt flag
    hlt                  ; Halt execution
ayy: db "[^_^] [.~.] ",0 ; Our null terminated string

times 510 - ($-$$) db 0  ; Pad the rest of the bytes with 00s
dw 0xaa55                ; Boot signature, stored on disk as bytes 55 AA

As you can see, this is a very basic assembly program (think DOS) that just walks a string in memory and uses a BIOS interrupt to write it to the screen, one character at a time.

The last line, dw 0xaa55, is the boot signature. It isn’t really a checksum: the BIOS traditionally checks that the sector ends with these two bytes before it jumps to 0x7C00 and lets the bootloader take care of the rest. I have found that this isn’t always enforced. In VirtualBox, you can totally make these whatever bytes you want and it will boot just fine. This varies from vendor to vendor.
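
To make the padding and the signature a bit more concrete, here is a minimal Python sketch of what the last two NASM lines do. The helper names build_boot_sector and has_boot_signature are made up for this example, and the sketch only mimics the sector layout; it does not assemble anything.

```python
import struct

SECTOR_SIZE = 512
BOOT_SIGNATURE = 0xAA55  # written little-endian, so the bytes on disk are 55 AA


def build_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros up to byte 510, then append the 2 byte signature."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in a 512 byte boot sector")
    padding = bytes(SECTOR_SIZE - 2 - len(code))
    return code + padding + struct.pack("<H", BOOT_SIGNATURE)


def has_boot_signature(sector: bytes) -> bool:
    """Return True if bytes 510 and 511 of the sector are 0x55, 0xAA."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


if __name__ == "__main__":
    # FA F4 is cli; hlt, the two bytes behind the halt label in the example
    sector = build_boot_sector(bytes([0xFA, 0xF4]))
    print(len(sector), has_boot_signature(sector))
```

A check like has_boot_signature is a quick way to see what a given BIOS or VM would find at the end of an image before you try to boot it.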

Speaking of booting as a VM, you can do this by assembling the code and naming the output file something.img.

Here is how you assemble it:

nasm -f bin bootloadertest.asm -o boot1.bin

If we do a hex dump, we can take a look at what this actually looks like:

00000000: be10 7cb4 0eac 08c0 7404 cd10 ebf7 faf4  ..|.....t.......
00000010: 5b5e 5f5e 5d20 5b2e 7e2e 5d20 0000 0000  [^_^] [.~.] ....
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.

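You can disassemble it with ndisasm (which ships with NASM) to get a proper look. Everything after the hlt at 0x7C0F is our string and padding, which ndisasm decodes as if it were instructions.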

x@n0:~$ ndisasm -o 0x7c00 bl
00007C00  BE107C            mov si,0x7c10
00007C03  B40E              mov ah,0xe
00007C05  AC                lodsb
00007C06  08C0              or al,al
00007C08  7404              jz 0x7c0e
00007C0A  CD10              int 0x10
00007C0C  EBF7              jmp short 0x7c05
00007C0E  FA                cli
00007C0F  F4                hlt
00007C10  5B                pop bx
00007C11  5E                pop si
00007C12  5F                pop di
00007C13  5E                pop si
00007C14  5D                pop bp
00007C15  205B2E            and [bp+di+0x2e],bl
00007C18  7E2E              jng 0x7c48
00007C1A  5D                pop bp
00007C1B  2000              and [bx+si],al
00007C1D  0000              add [bx+si],al
... truncated bc lots of the same ...
00007DFD  0055AA            add [di-0x56],dl

Now that this is done, it’s time we do something more useful. Let’s move to 32 bit mode, and actually set up where we’d plop more advanced code.

Global Descriptor Table

The Global Descriptor Table is a table of entries that keeps track of memory segments. When we are on this level, we need to establish the boundaries where our code / data / task queue / user / malware will operate.

You load the GDT with the lgdt instruction.

Segment Types

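+--------------+--------------------------------------------------------------+
| Type         | Description                                                  |
+--------------+--------------------------------------------------------------+
| Null Segment | We keep a null segment because certain emulators will        |
|              | complain about limit exceptions if you don’t have one. You   |
|              | can store a pointer to the GDT itself in here. The null      |
|              | descriptor is 8 bytes and the pointer is 6 bytes, so it      |
|              | works.                                                       |
| Code Segment | This describes the code we are loading. Could be a kernel,   |
|              | second stage bootloader, bootkit etc.                        |
| Data Segment | This will contain our writable memory space.                 |
+--------------+--------------------------------------------------------------+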

You can also add a TSS descriptor to store the task state, but it’s not necessary for a super basic example.

GDT Descriptors

The GDTR register is where the GDT descriptor is held. A descriptor is 6 bytes long and looks like this:

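+-------+------+--------------------------------------------------------------+
| BYTES | SIZE | DESC                                                         |
+-------+------+--------------------------------------------------------------+
| 0-1   | 2    | LIMIT - Size of the table in bytes, minus 1. This means the  |
|       |      | GDT has a max size of 65536 bytes, or 8192 entries.          |
| 2-5   | 4    | BASE - The linear address of the GDT.                        |
+-------+------+--------------------------------------------------------------+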
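
As a rough illustration, the 6 byte structure can be packed like this in Python. The function name pack_gdt_pointer and the base address 0x7C20 are placeholders for this sketch, not values taken from the example code.

```python
import struct


def pack_gdt_pointer(table_size: int, base: int) -> bytes:
    """Pack the 6 byte value that lgdt reads: 16 bit limit, then 32 bit base."""
    if table_size % 8 != 0 or not 8 <= table_size <= 65536:
        raise ValueError("table size must be a multiple of 8, up to 65536 bytes")
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    # The limit field stores size - 1, which is how 65536 fits in 16 bits
    return struct.pack("<HI", table_size - 1, base)


if __name__ == "__main__":
    # Three 8 byte entries (null, code, data) at a hypothetical address
    print(pack_gdt_pointer(3 * 8, 0x7C20).hex())
```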

GDT Entries

Once we’ve established where we’re going, we need to put some data there!

What kind of data do we actually need?

base is a 32 bit value that describes where the segment begins. It’s split up into a bunch of parts because of the original limitations of the processor.

limit is a 20 bit value describing where the segment ends, as the last valid offset from base. If the granularity bit in flags is set to 1, the limit is counted in 4096 byte pages instead of bytes, so a limit of 0xFFFFF covers the full 4 GiB. The default unit is 1 byte.
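
Here is a tiny Python sketch of how the limit and the granularity bit combine into a segment size. The function name segment_size is invented for this example.

```python
def segment_size(limit: int, granularity: bool) -> int:
    """Number of addressable bytes described by a 20 bit limit field."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    unit = 4096 if granularity else 1  # 4 KiB pages or single bytes
    # The limit is the last valid unit, so the size is limit + 1 units
    return (limit + 1) * unit


if __name__ == "__main__":
    print(hex(segment_size(0xFFFFF, granularity=False)))
    print(hex(segment_size(0xFFFFF, granularity=True)))
```

With the maximum limit, byte granularity tops out at 1 MiB, while page granularity reaches 4 GiB, which is why the flat segments later in this post set the granularity bit.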

The rest of the data is stored in access and flag sections here, and are given definitions. The layout is whacky as hell, so maybe just look up some GDT diagrams if you don’t quite get it because this might not be a good description.

Table entries are 8 bytes total, and they look like this.

                        +-------+-------------------+
                        | Bits  | Entry Part        |
                        +-------+-------------------+
                        | 00-15 | limit_low 0:15    |
                        | 16-31 | base_low 0:15     |
                        | 32-39 | base_middle 16:23 |
                        | 40-47 | access byte       |>---+
                        | 48-51 | limit_high 16:19  |    |
+----------------------<| 52-55 | flags             |    |
|                       | 56-63 | base_high 24:31   |    |
|                       +-------+-------------------+    |
|    +---------------------------------------------------+
|    |
|    | The flags within the access byte (bits 40-47) are:
|    v 
| +-----+----------------------------------------------------------------------+
| | BIT | DESC                                                                 |
| +-----+----------------------------------------------------------------------+
| | 0   | Accessed bit - Set 0, the CPU sets to 1 if the segment is accessed   |
| | 1   | Read / Write flag - If a code segment, 1 means that read is allowed. |
| |     | If a data segment, 1 means that writing is allowed.                  |
| | 2   | Direction bit (data). 0 = segment grows up, 1 = segment grows down.  |
| |     | For a code segment this is the conforming bit instead.               |
| | 3   | Executable - If 1, code in this segment can be executed. If 0, this  |
| |     | is a data segment                                                    |
| | 4   | Descriptor Type - Should be set if it's a code or data segment. Else |
| |     | it's a system segment (eg TSS)                                       |
| | 5-6 | Privilege level, contains the ring level. 0-3                        |
| | 7   | Present bit. Must be 1 for all valid segments.                       |
| +-----+----------------------------------------------------------------------+
|
| The flags in the flag nibble (bits 52-55) are:
|
| +-----+----------------------------------------------------------------------+
| | BIT | DESC                                                                 |
| +-----+----------------------------------------------------------------------+
+>| 0   | Leave at zero                                                        |
  | 1   | L bit - Only used in x64. Used to mark a 64 bit code segment         |
  | 2   | Size bit. 0=16 bit,1=32 bit. If L (and 64 bit), this must be 0       |
  | 3   | Granularity. 0=limit is in 1B blocks(byte) or 1=4KiB blocks(page)    |
  +-----+----------------------------------------------------------------------+
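
If the diagram is hard to follow, it may help to see the same layout as code. This is a minimal Python sketch that packs base, limit, access and flags into the 8 byte entry described above. The function name encode_gdt_entry and the constant FLAGS_4K_32BIT are made up for this example; the access bytes are the ones used for flat code and data segments later in this post.

```python
import struct


def encode_gdt_entry(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack one 8 byte GDT entry using the bit layout from the table above."""
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    if not 0 <= access <= 0xFF or not 0 <= flags <= 0xF:
        raise ValueError("access is one byte, flags is one nibble")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # bits 00-15: limit_low
        base & 0xFFFF,                         # bits 16-31: base_low
        (base >> 16) & 0xFF,                   # bits 32-39: base_middle
        access,                                # bits 40-47: access byte
        (flags << 4) | ((limit >> 16) & 0xF),  # bits 48-55: limit_high + flags
        (base >> 24) & 0xFF,                   # bits 56-63: base_high
    )


# Flat segments: base 0, limit 0xFFFFF, granularity bit and size bit set
FLAGS_4K_32BIT = 0b1100
null_entry = bytes(8)
code_entry = encode_gdt_entry(0, 0xFFFFF, 0b10011010, FLAGS_4K_32BIT)
data_entry = encode_gdt_entry(0, 0xFFFFF, 0b10010010, FLAGS_4K_32BIT)
gdt = null_entry + code_entry + data_entry

# For ring 0 entries in the GDT, the selector is the entry's byte offset
CODE_SEG = 1 * 8
DATA_SEG = 2 * 8

if __name__ == "__main__":
    print(code_entry.hex(), data_entry.hex(), hex(CODE_SEG), hex(DATA_SEG))
```

Splitting the base and limit with shifts and masks is all the whacky layout really amounts to; the access byte and flags are passed through untouched.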

Enabling the A20 Line

Here is some example code of doing a bit of what we’ve just described.

The A20 line is the actual pin on the address bus that carries the 21st bit of a memory address (bit 20, counting from zero). The original 8086 only had 20 address lines, so addresses past 1 MiB wrapped around to zero, and some old software relied on that. To stay compatible, PCs traditionally boot with A20 disabled. So because of this, we have to enable it on every boot, unless we want addresses to wrap at the 1 MiB mark.
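
A small Python sketch of what a disabled A20 line does to addresses: bit 20 is forced to zero, so anything just past 1 MiB lands back at the bottom of memory. The function names here are invented for illustration.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real mode address: segment * 16 + offset (can reach just past 1 MiB)."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def physical_address(linear: int, a20_enabled: bool) -> int:
    """With A20 disabled, address bit 20 is always zero."""
    return linear if a20_enabled else linear & ~(1 << 20)


if __name__ == "__main__":
    linear = real_mode_linear(0xFFFF, 0x0010)  # first byte past 1 MiB
    print(hex(linear))
    print(hex(physical_address(linear, a20_enabled=False)))
    print(hex(physical_address(linear, a20_enabled=True)))
```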

Fun fact: The A20 gate used to literally be wired to a spare pin on the 8042 PS/2 keyboard controller, so you enabled more memory access by sending commands to the keyboard controller. I’m not sure if I want to laugh or find some way to directly toggle pins on my i7 with my keyboard.

This method uses the BIOS INT 15h method to enable the A20 line

; Enabling the A20 line, aka address more than 1MB of Memory

bits 16
org 0x7c00

boot:
    mov ax, 0x2401      ; AH=24h, AL=01h is the enable A20 gate INT 15h function
    int 0x15            ; You can also use AH=24h, AL=03h to query support and
                        ; AH=24h, AL=02h to get the status. AL=00h disables ;)
    mov ax, 0x3         ; Use INT 10h to set video mode to VGA Text. There's
    int 0x10            ; quite a number of other screen modes to play with
    cli                 ; Clear the interrupt flag
    lgdt [gdt_pointer]  ; Load the GDT
    mov eax, cr0        ; get cr0 into eax
    or eax, 0x1         ; Set the protected mode bit on cr0 register
    mov cr0, eax        ; Put this value back in cr0
    jmp CODE_SEG:boot2  ; far jump to load CS with our code segment
;-- This is the beginning of the GDT struct
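gdt_start:
    dq 0x0              ; This is our null segment, just 8 bytes of 00
gdt_code:
    dw 0xFFFF           ; limit_low (limit bits 0:15)
    dw 0x0              ; base_low (base bits 0:15)
    db 0x0              ; base_middle (base bits 16:23)
    db 10011010b        ; access: present, ring 0, code/data, executable, readable
    db 11001111b        ; flags: 4KiB granularity, 32 bit + limit_high (16:19)
    db 0x0              ; base_high (base bits 24:31)
gdt_data:
    dw 0xFFFF           ; limit_low (limit bits 0:15)
    dw 0x0              ; base_low (base bits 0:15)
    db 0x0              ; base_middle (base bits 16:23)
    db 10010010b        ; access: present, ring 0, code/data, not executable, writable
    db 11001111b        ; flags: 4KiB granularity, 32 bit + limit_high (16:19)
    db 0x0              ; base_high (base bits 24:31)
gdt_end: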

; need a gdt pointer structure to load this.
; A 16 bit field containing the GDT size minus 1, followed by a 32 bit
; pointer to the GDT

gdt_pointer:
    dw gdt_end - gdt_start - 1
    dd gdt_start

CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start

bits 32
boot2:
    mov ax, DATA_SEG
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esi,hello
    mov ebx,0xb8000  ; memory location of text buffer
.loop:
    lodsb
    or al,al
    jz halt
    or eax,0x0100       ; attribute byte 0x01 = blue text. 0x0200 = green
    mov word [ebx], ax  ; each cell is 2 bytes: character, then attribute
    add ebx,2           ; changing this to add ebx,3 makes it get all weird.
    jmp .loop
halt:
    cli
    hlt
hello: db "Finally made it to 32 bit mode!!",0

times 510 - ($-$$) db 0
dw 0xaa55
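
The text buffer writes in boot2 make more sense once you see how a cell is built. Here is a rough Python sketch of the 16 bit character cell and its address in the buffer, assuming the 80 column text mode set by mode 3. The function names vga_cell and cell_address are made up for this example.

```python
VGA_TEXT_BASE = 0xB8000  # start of the text buffer, as used in the example
COLUMNS = 80             # mode 3 is 80 columns by 25 rows


def vga_cell(char: str, foreground: int, background: int = 0) -> int:
    """Build a 16 bit cell: character in the low byte, attribute in the high byte."""
    attribute = ((background & 0xF) << 4) | (foreground & 0xF)
    return (attribute << 8) | (ord(char) & 0xFF)


def cell_address(row: int, column: int) -> int:
    """Each cell is 2 bytes wide, which is why the loop adds 2 to ebx."""
    return VGA_TEXT_BASE + 2 * (row * COLUMNS + column)


if __name__ == "__main__":
    print(hex(vga_cell("F", foreground=1)))  # same value as 'F' | 0x0100
    print(hex(cell_address(0, 1)))
```

Stepping by 3 instead of 2 puts every other write half a cell out of step, so characters end up landing in attribute bytes and vice versa, which is where the weirdness comes from.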

If you are wondering where a lot of these seemingly arbitrary values are coming from, I would highly recommend checking out the interrupt list, as well as numerous aging websites that desperately need you to archive them. Seriously, there is a ton of incredible info that is increasingly hard to find on very low level computing.

Grabbing your own MBR

If you are on Linux, you can do this to get the MBR of your machine.

First, look at your disks and find which one you boot from (for example /dev/sda), then run:

sudo dd if=/dev/sda of=mbr.bin bs=512 count=1
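
Once you have mbr.bin, you can poke at it with a few lines of Python. This sketch assumes the classic MBR layout (446 bytes of bootstrap code, a 64 byte partition table, then the 2 byte signature); the function names are invented for this example, and a disk with a different scheme may not follow this split exactly.

```python
def split_mbr(sector: bytes) -> dict:
    """Split a 512 byte sector into the three classic MBR regions."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    return {
        "bootstrap": sector[:446],           # the bootloader code itself
        "partition_table": sector[446:510],  # four 16 byte entries
        "signature": sector[510:512],        # expected to be 55 AA
    }


def read_mbr(path: str = "mbr.bin") -> dict:
    """Load the file written by dd and split it."""
    with open(path, "rb") as handle:
        return split_mbr(handle.read(512))
```

Calling read_mbr() in the directory where you ran dd gives you the bootstrap bytes on their own, which you can then feed to ndisasm the same way as before.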
