Defenses Against Low-Level Attacks - Introduction
In the previous module, we looked at several low-level vulnerabilities found in C and C++ programs, and at how these vulnerabilities can be exploited using techniques like stack smashing and format string attacks. These attacks had two elements in common. First, the attacker was able to determine or influence some input data used by the program; for example, the attacker might have supplied the data as part of a web request to a remote server. Second, the attacker was able to carefully craft that data to induce the program to access memory illegally, contrary to the intentions of the programmer. For example, the input might have been too big and thus overran the buffer it was copied into, or the input might have been ill-formed, indirectly causing the program to read or write arbitrary locations on the stack or heap.

In this module, we will look at how to defend against low-level, memory-based attacks. First, we will see how low-level attacks are violations of a property called memory safety. We will define the property carefully and show how various attacks violate it. The C and C++ programming languages are not memory safe by default, but we will briefly consider means of making them so. Next, we will look at some automatic means to defend against attacks that are memory safety violations. These defenses do not require changes to the source program; instead, they affect how a program is compiled or how it is run by the operating system and the hardware. We will look at two defenses in particular, which are now quite common: stack canaries and address space layout randomization, or ASLR. Unfortunately, these two defenses can still be overcome. In particular, they can be bypassed by a technique called return-oriented programming, or ROP. We will look at how ROP attacks work, and then at a defense called control flow integrity, or CFI, that can defeat ROP attacks. CFI is still an experimental technique, but it shows promise and so is worth considering.

None of these defenses, not even CFI, is perfect at defeating attacks on memory safety. As such, we conclude by considering a variety of rules for writing more secure C programs. Such rules are one example of more general secure development principles. We will consider these principles and elements of a secure development process in a future module of this course.
Memory Safety
Memory safety is a property of a program execution. A program execution is memory safe if, first, it only ever constructs valid pointers. Valid pointers are those returned by malloc, the standard heap allocator, or those created by legal language operations like & (the address-of operator) and pointer arithmetic. The second condition for memory safety is that the program never uses a pointer p to access memory that does not belong to p. This notion of belonging is something we'll consider shortly. The first condition roughly corresponds to a property we can call temporal safety; the second is one we call spatial safety. We'll explain these terms now, considering spatial safety first.

But before we do, one aside. Sometimes we will speak about memory safe programs and memory safe languages. A program is memory safe if all of its possible executions, that is, executions over all possible inputs, are memory safe. A language is memory safe if all possible programs that can be written in that language are memory safe. Which sense we mean when using the term should be clear from context.

Spatial safety means that accesses via a pointer should only be to memory that the pointer owns, which is to say, memory that belongs to it. We conceptualize this idea by viewing a pointer as a triple (p, b, e). Here, p is the address of the memory that the pointer points to, b is the base of the memory region that the pointer owns, and e is the bounds, or extent, of that memory region, that is, the address of the end of the region that begins at b. Dereferencing a pointer is allowed, which is to say spatially safe, when p falls between b and e. The bounds of the memory region are determined by the argument to malloc, or by the size of the object whose address is taken. Pointer arithmetic only affects p, incrementing or decrementing that element, but not b and e; the bounds stay the same. This prevents one from dereferencing a pointer that, by pointer arithmetic for example, goes outside its bounds.

Now let's look at some examples. Here we've declared a variable x. We assume that integers have size 4 on this platform, that is, 32 bits or 4 bytes. Next, we take the address of x and store it into the variable y. According to our definition, we can think of y, the pointer, as a triple, where p is the actual pointer, which is the address of x; b is the base of the memory region into which that points, which is also the address of x; and the extent e is the address of x plus 4, that is, the address of x plus the 4 bytes that the variable consumes. Next, we store y plus 1 into z. Pointer arithmetic increments the address by the number given, 1, times the size of the pointed-to type, hence 4 bytes, so p is incremented by 4. As we mentioned already, the base and extent are unchanged. Finally, we dereference y to assign the value 3. It is perfectly legal to dereference y, because the current pointer p lies between the base (here the same address as p) and the bounds minus the size of an int; the condition that needs to be satisfied is shown in the comment. On the other hand, dereferencing z is a violation of memory safety, because its p is outside the bounds: while it is greater than the lower bound b, it is not less than or equal to the upper limit, the address of x plus 4, minus 4. And this makes sense: it has gone off the end of the memory region.
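Rendered as code, the example looks like this; the (p, b, e) triples in the comments are bookkeeping in the model, not something the compiler actually stores.

```c
#include <stdio.h>

int main(void) {
    int x;             /* assume sizeof(int) == 4 on this platform          */
    int *y = &x;       /* y: p = addr(x), b = addr(x), e = addr(x) + 4      */
    int *z = y + 1;    /* z: p = addr(x) + 4; b and e are unchanged         */

    *y = 3;            /* fine:      b <= p && p <= e - sizeof(int)         */
    *z = 3;            /* violation: p > e - sizeof(int); z points past x   */

    printf("%d\n", x);
    return 0;
}
```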
Let's consider another example. We have defined a struct foo that has two fields: a buffer buf, which is a character array of size 4, immediately followed by an integer x. We first declare a value f of struct foo and initialize it to "cat" and 5. Sticking the string "cat" there serves to initialize the four characters of the buffer to 'c', 'a', 't', and then the null terminator. Next, we take the address of that first field and store it into the character pointer y. In this case, p and b are again initialized to the start of the buffer, and because each character is 1 byte and the entire buffer is 4 bytes, the extent is the starting address plus 4. Next, we can store the character 's' at index 3 of y. This is fine, because the pointer we've constructed basically adds 3 to y, so p is the address of f.buf plus 3, which is within the bounds. On the other hand, if we wrote y[4], that would be no good, because we have now gone past the memory region associated with buf. In effect, this is going to modify the field x, assuming there is no padding or anything else the compiler has put in the struct. But for the purposes of our model, this is a memory safety violation whatever the compiler might end up doing.

Here's a more complicated example, fully visualized to show the relationships among all of these pointers. Here we have a struct foo that has two integer fields, x and y, and a character pointer pc (for pointer to character). We allocate one of these struct foos by calling malloc, and we initialize its values: x to 5, y to 256, and the character pointer pc to the constant string "before". Looking at the picture on the right, we can see that pf is a triple. Its p component points to the buffer shown in the middle, which was returned by malloc; inside it we see the int x, which is 5, the int y, which is 256, and then another triple representing the character pointer pc, whose base and extent bracket the string "before" with its null terminator. In the second-to-last line of the program, we've incremented pc by 3, which moves its p component to point at the 'o' character inside the string. We've also taken the address of the pf->x field and stored it in px, shown at the upper right, pointing at the first field of the struct. Notice, though, that its base and extent are different than they were for the struct pointer itself. That's because we took the address of x, which is an integer of size 4, and so the extent should only cover the memory for that field, not the memory of the entire struct, for purposes of memory safety.

A buffer overflow is a violation of spatial safety according to our definition. Let's have a look at this copy function. It takes two pointers, src and dst, and the purported length, len, of the buffers they point to. Then, byte by byte, it copies from src to dst. A memory safety violation, that is, a buffer overflow, will occur if src is read past its legal extent, which could happen if len is greater than the extent of the region pointed to by src. At the same time, a buffer overrun could happen on the other side if dst is written past its extent, again because len is too large for the dst argument. So, overrunning the bounds of the source or the destination implies either that src or dst is illegal or, alternatively, that len is illegal, that is, too large.
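Here is a sketch of that copy function (a reconstruction, with an assumed parameter order), with comments marking where spatial safety can be violated.

```c
#include <stddef.h>

/* Copies len bytes from src to dst, one byte at a time.  Spatial safety is
 * violated if len exceeds the extent of the region src points into (a read
 * overflow) or the extent of the region dst points into (a write overflow). */
void copy(char *dst, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];   /* reads past src's extent, or writes past dst's
                              extent, whenever len is too large             */
    }
}
```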
Format string attacks also exploit a violation of memory safety. We can think of the buf format string as defining the expected extent of the stack. In this example, there are three format specifiers for integers, and this implies that we expect printf to be called with three int arguments, say x, y, and z. Because in this particular case we did not call it with those arguments, when printf goes to access beyond the bounds of its stack frame it violates memory safety: we can think of that stack frame as defining the legal extent of memory that printf can use, and the format string has induced printf to read beyond the end of the allocated stack. So this is essentially a kind of buffer overflow.

Now let's consider temporal safety. A violation of temporal safety occurs when a program accesses memory that is undefined. Undefined memory is memory that has never been allocated, or that was previously allocated but is now freed. We also consider uninitialized memory to be undefined. While spatial safety ensures that we only access memory regions that were legally allocated, temporal safety ensures that they are still allocated and initialized when the program accesses them. Thus, freeing a pointer with the free library call and then dereferencing it is a violation of temporal safety. Returning the address of a callee's local variable to a caller, which then dereferences it, is also a violation of temporal safety. Our definition assumes that memory is never reused and is conceptually infinite; as such, according to the model, once memory is freed it becomes permanently inaccessible, and dereferencing pointers to freed memory is a temporal safety violation. Back in reality, memory is reused, and our definition thus rules out allocating memory to a pointer p, freeing that memory, reallocating it to another pointer q, and then dereferencing via the pointer p.

Now let's look at some examples. Temporal safety implies that we will not have any dangling pointers, because accessing a freed pointer would violate temporal safety. To see what I mean, consider this little program. First, we call malloc to allocate an integer and store the result in the pointer p. We assign 5 to the contents of p, and then we free p. At this point, p is no longer a legal pointer, so we should not be allowed to dereference it, but here we attempt to anyway, by passing its contents to printf. The memory that p pointed to is no longer legal, and therefore this is a temporal safety violation. Accessing uninitialized pointers is similarly problematic. Here we have an uninitialized pointer p, which we try to assign through, again a violation of temporal safety.
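In code, the two temporal safety violations just described look roughly like this (comments mark the violating accesses):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof(int));
    if (p == NULL) return 1;
    *p = 5;
    free(p);
    printf("%d\n", *p);   /* dangling pointer: p's memory has been freed   */

    int *q;               /* uninitialized pointer                         */
    *q = 5;               /* temporal safety violation: q is undefined     */
    return 0;
}
```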
Integer overflows are allowed, as long as they are not used to manufacture an illegal pointer. Let's look at a function that illustrates how such an illegal pointer might be constructed from an integer overflow. We start off with an unsigned short variable x; these are assumed to be 2 bytes, so the maximum unsigned value is 65,535. As a result, incrementing x by 1 overflows, and x becomes 0. When we print x, it will print 0, which is perfectly memory safe even though an overflow has taken place. On the other hand, if we then try to malloc using x as the length of the buffer, we will be allocating a size-0 buffer. Some malloc implementations may check whether the argument is sensible, but few do, and they will happily return a size-0 buffer. As a result, when we assign some value like 'a' to the first index of p, we have violated memory safety. Integer overflows enable buffer overflows, a violation of memory safety, often enough that we tend to think of them as their own independent class of bug. But what I'm attempting to show here is that integer overflows in and of themselves are not necessarily vulnerabilities; instead, they are commonly used to exploit other vulnerabilities in the program.

All of this discussion is ultimately about C and C++. Why? Because essentially every other high-level language you've heard of is memory safe. Here, I'm talking about object-oriented languages like Java, Python, C#, and Ruby; functional languages like Scheme, Haskell, and OCaml; and even relatively new languages like Scala, Go, and Rust. In short, you get a big boost in security simply by using a modern language instead of C or C++. That said, there is no doubt that C is here to stay. It is a big part of legacy systems, and most programmers know how to use it. Therefore, it's no surprise that C and C++ are among the top five most popular programming languages even today, according to a recent study done by the IEEE.

Can we have our cake and eat it too? That is, can we compile C so that it is memory safe? Research over the past 20 years has aimed to do just that. One way to do it is to directly implement the memory safety model we just introduced. In particular, the compiler can introduce a fat pointer representation similar to our (p, b, e) triples and then enforce memory safety at runtime, just as Java, C#, and other languages do. Unfortunately, most memory safe C implementations are too slow for production use; in some cases, they impose orders-of-magnitude overheads on normal execution. More performant approaches reject certain popular C programming idioms. Fortunately, the situation is improving. CCured, for example, a system developed in 2004, could compile most C programs with few changes, but it would introduce high overhead in that case; this overhead could be reduced by making many refactorings to the program until the compiler was satisfied. SoftBound, a system developed starting in 2010, was able to do much better, requiring essentially no changes and inducing far less overhead than CCured. On the horizon, Intel's MPX hardware extensions will permit implementing SoftBound-like functionality, but in hardware. The result should be much better performance, perhaps to the point that memory safe C will finally become a reality. And this would be good for security. Imagine: no more buffer overflows, no more dangling pointer dereferences, no more format string attacks, an end to the eternal war in memory.
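To make the fat-pointer idea concrete, here is a minimal sketch of what such a representation and its runtime check might look like. This is only an illustration of the model; it is not the actual encoding used by CCured, SoftBound, or MPX.

```c
#include <stddef.h>
#include <stdlib.h>

/* A "fat" pointer carries its base and extent alongside the address itself. */
struct fatptr {
    char *p;   /* current address                 */
    char *b;   /* base of the owned region        */
    char *e;   /* one past the end of that region */
};

struct fatptr fat_malloc(size_t n) {
    char *m = malloc(n);
    struct fatptr fp = { m, m, m ? m + n : m };
    return fp;
}

/* Checked access of `size` bytes: abort unless b <= p and p + size <= e. */
char *fat_access(struct fatptr fp, size_t size) {
    if (fp.p == NULL || fp.p < fp.b || fp.e - fp.p < (ptrdiff_t)size)
        abort();                       /* spatial safety violation detected */
    return fp.p;
}
```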
Type Safety
As we mentioned earlier, modern languages are not just memory safe, they are type safe. What do we mean by that? In a type safe language, objects are ascribed a type, like int, pointer to int, pointer to function, and so on, and type safety ensures that operations on an object are always compatible with its type, that is, they are not undefined. An example of undefined behavior is what happens when you overrun a buffer in C: the C standard allows literally anything to happen. Here's a program that shows an example of a type safety violation that is not also a memory safety violation. It takes an integer pointer and casts it to a function pointer, which it then calls. Doing so is a legal memory access, but it uses that memory at the wrong type, so it is not type safe.

Type safety applies to so-called dynamically typed languages too, languages like Ruby or Python. For these languages, there is essentially one type, called dynamic, that all objects have. An operation on an object, like a call or an index, necessitates a check to see whether the object allows that operation; if not, it throws an exception, which, while perhaps undesirable, is not undefined.

Types are often used to enforce invariants about a program. Compatibility with operations like function calling and array indexing is one kind of invariant, but there are others. One common use of types is to enforce data abstraction. A module enjoys this property if it is able to keep its representation hidden from the rest of the program. As such, we should be able to change the internal representation of a module without the module's users knowing, as long as the semantics visible to those users stays the same. Some languages have special type systems specifically developed for enforcing security properties; a type-correct program in such a language is sure not to violate the property. One example of such a language is JIF, which stands for Java with Information Flow. In JIF, types are augmented with security labels that describe policies protecting data. In this example, the label on the type of variable x says that Alice owns the data and Alice allows Bob to read it. The label on the type of y says that Alice and Chuck own the data, and Alice, again, allows Bob to read it. As such, the first assignment, storing x into y, is legal, since y's policy is at least as restrictive as x's policy. Essentially, it is safe to reduce the level of access, which is what happens when assigning to a variable with a stronger policy. On the other hand, the assignment in the other direction, storing y into x, is illegal, because doing so would expand the allowed accesses of the data. The research community has developed many other interesting security type systems too.

Now, if type safety is so great, why do people keep on using C and C++? One reason is that C and C++ are often chosen for performance, and type safe languages tend to have worse performance. This is because type safety is often assured by mechanisms like garbage collection, bounds checks, and abstract representations. These mechanisms make programming easier and programs easier to reason about, but they add overhead both in memory and in time. That said, a new generation of languages is emerging that provides type safety while aiming for good performance. For example, Mozilla's Rust programming language permits a programmer to avoid the use of garbage collection in certain circumstances.
And in any case, frankly, few programs require the extra performance that C and C++ provide. Programmer productivity, due to programming convenience, is often more important. And it certainly helps that type safe languages are more secure than C and C++. We can hope that one day the memory errors of C will be behind us, and perhaps type safe languages will be the reason.
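As a concrete rendering of the example mentioned at the start of this section, here is a sketch of a type safety violation that is not a memory safety violation: the access touches memory the program owns, but at the wrong type.

```c
int x = 5;

int main(void) {
    void (*f)(void) = (void (*)(void))&x;  /* reinterpret the int's address
                                              as a function pointer        */
    f();  /* the memory belongs to the program, but it is used at the wrong
             type: a type safety violation (and undefined behavior in C)   */
    return 0;
}
```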
Avoiding Exploitation
For many reasons, programmers will continue to write and run memory unsafe C programs. In such a world, we need to defend against attacks that exploit vulnerabilities in these programs. There are two basic defenses. First, we can try to make vulnerabilities harder to exploit. The vulnerabilities are still in the programs, but the adversary will have a harder time using them to, say, perform remote code injection; this defense focuses on making one or more of the steps involved in an exploit harder, or even impossible. The second approach is to avoid vulnerabilities in the first place. This involves employing coding practices that are less likely to introduce bugs, and it might also involve the use of advanced code review and testing techniques to reveal bugs prior to deployment. These two approaches, mitigating exploitation and preventing bugs, are complementary, and in fact both are used in modern software development processes.

So let's consider how to avoid exploitation. Recall the steps of a stack smashing attack; we need to make one of those steps hard or impossible to carry out. The first step was finding a way to put attacker code into memory, and recall that we worried about zeroes and other control characters in the code, which would thwart strcpy or scanf during the insertion phase. Then we needed a way to get EIP, the program counter, to point to and run the attacker code. The way we did that was to overwrite the return address; we needed to know its relative location on the stack so we could get it to point, instead, to the buffer that we inserted when smashing the stack. So the question is: how can we make each step more difficult to carry out? In the best case, we can complicate exploitation by changing the libraries, the compiler, or the operating system. That way, we can leave the application code alone, which is hard to change correctly, and the fix instead is in the architectural design, which we can do once and apply to all programs.

The first technique we'll look at is detecting overflows using stack canaries. The analogy here comes from 19th-century coal mining. Back then, coal miners were concerned that mines might contain noxious gas, so they would bring a canary along with them. If at some point the canary keeled over and died, the miners would conclude that there was gas around them, and they would abort and leave the mine. We can do the same sort of thing to check whether the stack has been smashed. Instead of putting the local variables right next to the saved frame pointer and the saved program counter (that is, the return address), we make space for what's called a stack canary, some number that we store there. Then, if the attacker overruns a buffer, the attacker will also overrun the canary. The code will have been changed automatically, by say the compiler or the linker, to check upon returning whether the stack canary is as expected. In this case, because we overwrote it, it's different, and so we abort. Now the question is: what value should the canary have? Obviously, we want to choose one that is hard for the attacker to guess or to use. There are a couple of choices. One is called a terminator canary; this attempts to stop attacks that are based on library routines like scanf or strcpy.
It uses values like carriage return, line feed, and null, which those routines can't deal with and therefore wouldn't be able to deposit there. An alternative, and the most common approach, is to use a random canary. The idea here is that when a process starts up, it chooses a random value, stores it somewhere in memory, and write-protects it so that the program won't subsequently change it. It then writes that value as the canary, and uses the stored copy to check that the canary is unchanged. A variation of this approach is to XOR the canary against some control information particular to, say, a stack frame; for example, you could use the return address as the mask against which to XOR the canary. This defense makes it so that we can't smash the stack and overrun it with new attacker code without being detected.

Another thing we could try to stop is getting EIP to point to and run the attacker's code. One way we can do that directly, whether we use stack canaries or not, is to make the stack and heap non-executable. That is, if the program runs normally, then the text segment, the code, is immutable, and every other portion of the program's memory is non-executable. So if you loaded code into a malloc'd buffer or onto the stack, the hardware would prevent you from running it, because that memory is not designated as a code region and therefore is not allowed to be executed. Unfortunately, this approach can be bypassed too, using an attack called return-to-libc. Here's the setup of the stack for our normal stack smashing attack, with the nop sled, the guess of the buffer's location, and the malicious code. With return-to-libc, we don't have to insert malicious code, or the nop sled, or the guess. Instead, we overwrite the return address to point to some location already in memory that has the code we want, and we overwrite the next thing on the stack to be the argument to that code. For example, here we pointed the return address at exec, to create a new process, and we pointed the next thing on the stack at the constant string /bin/sh. This way, when we return from the overrun function, we turn our current process into a shell. One nice thing about this approach, from the attacker's point of view, is that we don't need to know the absolute location of any injected code; we simply need to overrun the return address with pointers to exec and /bin/sh.

The way we can defeat this attack is to use something called address space layout randomization. The idea is to place the standard libraries, and other elements of the system's memory like the stack, at random locations, making them harder to guess. The attacker can't build a one-size-fits-all exploit that knows exactly where exec is for every program that's run, and a similar sort of thing makes it hard to find the return address: the stack has been randomized, so its location is not the same every time. Returning to the example, here we are again, but now we don't know where exec and /bin/sh are, because the location of the standard libraries has been randomized. ASLR is available today on most modern operating systems. It was first introduced for Linux in 2004, and adoption came slowly thereafter for other systems: OpenBSD supported it in 2006, Windows in 2007, and Mac OS introduced it the same year for system libraries and in 2011 for normal programs. Handheld device operating systems also use ASLR.
Android had full support as of 2012, and iOS for iPhones had it in 2011. Now, while ASLR makes exploitation more difficult, it is not perfect. First, keep in mind that it only randomizes the offset of memory areas; it does not randomize their contents. Second, ASLR implementations typically apply only to standard libraries, not to program code. This is because program code must be compiled to be position independent if it is to be relocated in the address space; some programs are not compiled this way, and so their application code is not subject to ASLR relocation. Shared libraries are normally compiled this way; application code, not always. Third, ASLR needs sufficient randomness to avoid brute force attacks. A research paper by Hovav Shacham demonstrated that Linux ASLR support on 32-bit operating systems was susceptible to brute force attacks. In particular, he crafted an attack that attempted to guess the ASLR offset while attacking a running Apache HTTPD instance, and he was able to perform this attack successfully, taking only 216 seconds on average. Fortunately, modern 64-bit operating systems provide a lot more entropy, which would make Shacham's brute force attack infeasible.
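Before moving on, here is a source-level sketch of the stack canary idea from earlier in this segment. Real compilers (for example, GCC or Clang with -fstack-protector) insert the equivalent check automatically and arrange for the canary to sit between the locals and the saved return address; the code below merely emulates the mechanism by hand, and its fixed constant stands in for a value chosen randomly at process startup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for the random, write-protected value a real implementation
 * would choose at startup. */
static const unsigned long canary_value = 0x9e3779b97f4a7c15UL;

void echo(const char *input) {
    unsigned long canary = canary_value;  /* conceptually sits between buf and
                                             the saved return address        */
    char buf[16];

    strcpy(buf, input);                   /* an overflow of buf would run over
                                             the canary before reaching the
                                             return address                  */
    printf("%s\n", buf);

    if (canary != canary_value) {         /* check before returning          */
        fprintf(stderr, "stack smashing detected\n");
        abort();
    }
}
```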
Return Oriented Programming - ROP
The sequence of defenses we have considered so far traces out a game of cat and mouse. First, to prevent stack smashing attacks, which involve injecting code into a stack-allocated buffer, defenders made the stack non-executable. But then attackers figured out that they could effect attacks without injecting new code, just by jumping to libc, in particular by using the existing system command. To prevent this, defenders attempted to hide the addresses of libc functions, as well as the return address, by using address space layout randomization, or ASLR. However, attackers can effectively search for the hidden addresses on 32-bit systems using brute force techniques. Moreover, they can effect information leaks to send back a hidden address. One way to perform such a leak is to use a format string attack to reveal the contents of the EBP register pushed onto the stack; this register points to the stack and can thus reveal its base location. Similar leaks can be used to recover heap addresses, and mechanisms aside from format string attacks can be used too, for example read-based buffer overflows, as in the Heartbleed bug. Now, one idea to mitigate such problems is to simply not include libc code that the application does not need; for example, if the system library call is not needed, don't include it. Unfortunately, even this approach, or the absence of information leaks, is unlikely to be enough with the advent of what is now called return-oriented programming, or ROP.

Return-oriented programming was introduced by Hovav Shacham in an academic paper in 2007. The idea is that rather than use a single libc function, say, to run your shellcode, you string together pieces of existing code, which he called gadgets, to do it instead. The challenge is to find the gadgets you need and a way to string them together. Here's how it works. Gadgets are instruction groups that end with a return instruction, and the stack serves as the code: the esp register, the stack pointer, acts as a kind of program counter, and gadgets are invoked one after another by return instructions. That's why each gadget ends with a return instruction; it is basically saying, please send me to the next gadget. Gadgets get their arguments via pops and the like, and those arguments are also stored on the stack, between the addresses of the gadgets themselves.

Let's look at a simple example. In the upper right, we see a normal code sequence, the instruction that moves 5 into the edx register. Now we'll construct a gadget to perform the equivalent of this instruction's behavior. Suppose that in the text segment of my program I happen to have the code sequence pop edx followed by return, at address 17f. Maybe this is the final bit of an existing piece of code; it could be the middle of a function, it could be at the end of a function, it could even start from the middle of an instruction whose bytes are interpreted differently so that they become a pop and then a return. It doesn't matter. So I can set up the stack to be my sequence of instructions, with the stack pointer pointing at the top, acting as a kind of program counter, and I'll assume that the instruction pointer is about to execute a return. This is what normally would happen in stack smashing: we would hijack the return address by overrunning a buffer, so that when the function returned, the modified return address would redirect the program to do something else.
So let's imagine we are in that situation now, and the stack pointer is pointing to what should have been the return address, but is now instead a pointer to our gadget at 17f. On the stack, we also have the number 5, and a pointer to the next gadget to run after we execute this one (we won't worry about what that gadget is). The return is executed; that pops off 17f and jumps to it, making the instruction pointer point to that location. When we execute that instruction, we pop 5 off the stack into edx. At this point, the instruction pointer points to a return. The return pops off the top of the stack, which we have arranged to point to our next gadget. This gives you an idea of how you can string different gadgets together.

So let's illustrate that. Here's a code sequence of three instructions. We'll first look at how these instructions execute normally, and then we'll see the equivalent done using return-oriented programming. Here we have the instruction pointer; it's going to load from the current stack pointer, so it will load 5 into eax. The next thing we do is load the value at esp plus 8, which is 404, into ebx. Then, finally, we store the contents of eax, 5, into the memory pointed to by the contents of ebx, that is, location 404, which we've also depicted in memory at the lower left. Now, suppose these operations are part of our nefarious plan to take over the program, but we can't inject the code I've shown here directly, due to, say, data execution prevention. Instead, we need to string it together using gadgets already present in the program. So here are some gadgets we could use instead. Suppose that at 17f we have a gadget like the one we saw before; here, we're popping into eax. Then we have another gadget, at 20d, that pops into ebx. Finally, we have a last gadget, at 21a, that moves eax into the address stored in ebx. We've also arranged the stack so that the top of the stack is the argument we're going to pop, followed by the pointer to the next gadget, 20d, and its argument, 404, and then finally a pointer to the next gadget, which is 21a. So you can see, each of these gadgets corresponds to one of the instructions we looked at a moment ago.

Let's start things moving. We execute the pop instruction, which pulls out the value 5 and sticks it into eax. Then we return, which pops 20d, so that the instruction pointer goes to 20d. Now we pop the value 404 into ebx, and we return, going to the next gadget, which is at 21a. We execute that, which stores the contents of eax, which are 5, into the memory pointed to by ebx, which is location 404. You can think of return-oriented programming as being like a ransom note: instead of cutting out letters from magazines, you're cutting out instructions from the text segment.

Now, we've shown how to string gadgets together, but we haven't said how we know where the gadgets are in the first place; we need to find them to construct an exploit. What we can do is automate a search of the target binary for gadgets in advance of constructing the exploit. In particular, we can disassemble the binary and hunt for return instructions. Once we find a return instruction, we can move backward one instruction at a time to find what precedes it, and eventually create a set of all of the possible gadgets in the program.
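A naive version of that gadget search is easy to sketch: scan the code bytes for the single-byte x86 ret opcode, 0xC3, and treat each occurrence as the end of a candidate gadget. Real tools (ROPgadget, for instance) also disassemble backward from each hit to recover the instructions that precede it; the sketch below only finds the candidate end points.

```c
#include <stdio.h>
#include <stddef.h>

/* Scan a buffer of x86 code bytes and report every occurrence of the
 * one-byte ret opcode (0xC3).  Each position found is the end of a
 * potential gadget; a real tool would decode backward from it to see
 * which useful instruction sequences end there. */
void find_gadget_ends(const unsigned char *code, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0xC3)
            printf("candidate gadget ending at offset %zu\n", i);
    }
}
```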
Now, you might rightfully be wondering whether there are enough gadgets to be found using this method to do anything interesting. The interesting thing is that there are: in most interesting programs there are, in fact, enough gadgets to construct Turing-complete computations, which means you can essentially do whatever you like. This is especially true with the x86's dense instruction set, because many gadgets can be found by starting inside existing instructions, off their intended boundaries, and interpreting their bytes differently. In fact, Schwartz et al., in their USENIX Security paper, automated gadget-based shellcode creation, and, as it turns out, they did not need Turing completeness to do it.

It gets even worse than what I've described. You might imagine that randomizing the location of the code, using something like ASLR, would make it hard to do return-oriented programming, and indeed many of the most recently published attacks are for 32-bit versions of executables. So how can we defeat ASLR, to know where the gadgets will be for any particular program executable? One answer is to use a technique called Blind ROP, introduced at the 2014 IEEE Security and Privacy conference. The idea is that if a server restarts on a crash but does not re-randomize, then we can read the stack to leak stack canaries and a return address, find gadgets at run time that effect a call to write, and then dump the binary, using that write call, to find gadgets for the shellcode. In other words, we take a few extra steps to leak the binary first, so that we can then perform the step I described before: knowing where the gadgets are, string together the gadgets we need, inject the chain into the program, and perform the exploit. Blind ROP is the current challenge that defenders must overcome to defeat memory-based attacks: completely automatically, it managed to defeat stack canaries and even 64-bit ASLR by constructing ROP-style programs. Where will it end? The safest way to end the cat and mouse battle is to ensure memory safety, and the simplest way to do that is to use a memory safe programming language. However, there is yet one more automated defense that could work, which we will now consider. It is called control flow integrity. It is still experimental, but it holds significant promise and may be deployed in the near future.
Control Flow Integrity
As we saw with the success of Blind ROP, state-of-the-art defenses complicate attacks, but in some cases those attacks are still feasible. Stepping back, what we really want to do is prevent malicious behavior. In particular, if there were an efficient method to check that a program is doing what we expect it to be doing, then we could use that method to identify when a program has been compromised and has deviated from our expectations. Such deviations could be malicious, and so the program could be halted when a deviation is observed. For this idea to work, we must define what we mean by "should be doing"; that is, we have to define the expected program behavior. Next, we need a way of efficiently checking that a program is acting according to our expectations. Finally, the detector that does the checking needs to be protected from compromise, so that its determinations are trustworthy.

So how does control flow integrity implement behavior-based detection? First, we have to define what we mean by expected behavior. For CFI, the expected behavior is defined by the control flow graph, or CFG. How does CFI detect deviations from this expectation? It does so using an in-line reference monitor, or IRM. An in-line reference monitor is a rewriting of the program in which the inserted instructions check whether the CFI property is maintained. To avoid compromise of the detector, we rely on sufficient randomness for the secrets that CFI uses, and on the immutability of the code.

Is CFI efficient? Classic CFI imposes a 16% overhead on average, and 45% in the worst case. It works on arbitrary executables, but it's not modular, so it can't handle dynamically linked libraries very well, which is a problem for modern programs, which use such libraries quite often. Moreover, 45% in the worst case is getting to be rather high, which may partially explain why CFI has not really been deployed much to date. On the other hand, research has continued since the original paper in 2005, and a recent approach called modular CFI (MCFI) solves the problem of modularity: it supports dynamically linked libraries and is far more efficient, imposing only a 5% overhead on average and 12% in the worst case. The main drawback is that it works on C programs only, as part of the LLVM compiler.

More importantly, we should be concerned about whether CFI is actually secure. One way we can measure this is to look at how many ROP gadgets are eliminated by the use of CFI; a gadget is eliminated when its use would be non-compliant with the program's CFG. What we see is that for MCFI in particular, about 96% of ROP gadgets are prevented. Another measure, called average indirect-target reduction, or AIR, shows that MCFI works very well at ruling out possible targets of indirect jumps; in particular, it rules out 99% of them, nearly all. So we're looking at pretty good security here.

Now let's look at our definition of expected behavior, which we said was the control flow graph, or CFG. To introduce it, I'll first introduce the call graph, from which the CFG can be derived. Here we have a program, sort2, which takes two arrays, a and b, and their length, and sorts one in ascending order and the other in descending order. At the bottom, we see the call graph of the program, which basically shows which functions call which other functions.
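A reconstruction of the sort2 program being described follows; the exact types and the sorting algorithm are assumptions, but the call structure matches the one in the lecture.

```c
typedef int (*cmp_t)(int, int);

int lt(int a, int b) { return a < b; }
int gt(int a, int b) { return a > b; }

/* Orders len elements of arr according to the comparison function passed as
 * the third argument (a simple bubble sort, for illustration). */
void sort(int *arr, unsigned len, cmp_t cmp) {
    for (unsigned i = 0; i < len; i++)
        for (unsigned j = 0; j + 1 < len - i; j++)
            if (!cmp(arr[j], arr[j + 1])) {      /* indirect call to lt or gt */
                int t = arr[j]; arr[j] = arr[j + 1]; arr[j + 1] = t;
            }
}

/* Sorts a in ascending order and b in descending order. */
void sort2(int *a, int *b, unsigned len) {
    sort(a, len, lt);    /* direct calls to sort; lt and gt are invoked
                            indirectly, through the function pointer     */
    sort(b, len, gt);
}
```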
So sort2 calls sort, and sort will call lt and gt, because sort takes one of those function pointers, lt or gt, as its third argument and invokes it when comparing elements of the array it is passed. Now, the control flow graph is obtained by distinguishing calls and returns in the program. Here, we can see that it looks quite similar to what we saw before, but now we've broken each function up into basic blocks, where a basic block always ends with a jump, a return, or a call, some kind of control transfer, rather than allowing the program to continue from one instruction to the next. We can see that the sort2 function calls sort, so there is a break at that first call; then, when it's returned to, it calls sort a second time, which is returned to again, until it's completed. So we can follow the control flow through the program as follows: sort2 calls sort, which then calls lt and returns back to sort, which returns back to sort2; then sort2 calls sort again, which this time calls gt, which is then returned from, and finally sort returns back to sort2.

Now, we want to check compliance with the CFG; that is, we want to make sure that all control flow transfers follow the CFG. To do this, we need to compute the CFG in advance. We can do this during compilation, or we can do it by analyzing the binary: MCFI does it during compilation, whereas classic CFI just analyzes the binary. To monitor the control flow, we use the inline reference monitor to make sure that, for each of these control flow transfers in the CFG, only legal paths are followed. An important observation for reducing overhead is that direct calls do not need to be monitored. Why is this? We're assuming the code is immutable, so if I directly call a function f, then the constant target address is embedded inside the text of the program and it cannot be changed; by definition, it adheres to the control flow graph. As such, we're only concerned with monitoring indirect transfers: jumps, calls, and returns made via registers. Looking back at our example, the direct calls to sort do not need to be monitored, because their targets are constants in the program text, whereas the indirect calls to lt and gt, and all of the returns, do need to be monitored, because they go through dynamic data: the returns pop values off of the stack, which is dynamically alterable, and the calls of lt and gt go through function pointers, which are likewise passed on the stack.

We implement the in-line monitor as a program transformation. The way it works is that we insert a label just before the target address of each indirect transfer, and we insert code at each indirect transfer to check the label of the target; if the label is not the one we expect, we abort. How do we know what to expect? From the analysis of the control flow graph. The simplest possible labeling is to pick a single random value, label L, and stick it at all possible targets of indirect transfers in the entire program: that is, at the beginning of every function that could be indirectly jumped to, say through a function pointer, and at the location just after every call, to which something could return. That's what's shown here.
Basically, the target at the end of every arrow gets the same label L, and every time we go to return, or to call via a function pointer, we check: does the target carry the label L? What this will do is prevent overwriting a return address so as to, say, go to the system command in the library somewhere. Why does that work? Because the system command will not be preceded by the label L, and so the jump to it will be denied. On the other hand, what this labeling will not prevent is returning, not to the intended location, but to some other location that also has the label L. What we've shown here is that the sort function should have returned back to its caller, sort2, but instead, say the adversary modified the return address to point to the beginning of the greater-than function. This is legal according to CFI, because the target has the expected label. It is not the control flow we expected in the program, and it is not an edge in the control flow graph, but it is allowed by the coarse labeling we used to enforce that graph.

We can overcome this precision limitation with a more detailed labeling of the targets of indirect transfers. At the bottom of the slide, we can see the constraints these targets must satisfy. In particular, the return sites from calls to sort must all share the same label: when sort goes to return, it doesn't know who its caller is, and so all of the callers have to have the same label at their return sites. So we see that in sort2, the two return sites have the same label. At the same time, the call targets lt and gt must share the same label, because the sort function doesn't know which function pointer it is going to be given, so all of those function pointers have to lead to the same target label, in this case M. Finally, the last label is not constrained, so we can give it a new label, N. This is the most precise labeling of this graph that we can have. Now, that weird transfer we saw before will be denied, because label L does not equal label M. On the other hand, nothing prevents the weird case of sort being called from the first call site in sort2 but returning to the second call site, since both are labeled L. It's not clear what the adversary would gain from this, at least in this example, but it would be allowed by this labeling of the program's control flow graph.

So how are these labels implemented? What instructions might we use? What follows is the classic CFI instrumentation; MCFI and other approaches are not quite the same. Here, we see a call through a function pointer. We transform the call via ebx plus 8 into the sequence just below it. First, we move the target, the value at ebx plus 8, into eax. Then we compare the label at the target against the label we expect, 12345678. If the comparison fails, we jump to the error label; otherwise, we call eax, the actual target. When we call the actual target, we also need to make sure that when the target returns back to us, it checks its label too. When we make the call, the return address will point at this location, a prefetch instruction that harmlessly encodes our label. In the target, when it goes to return, we rewrite the ret 10h at the bottom so that it first pops the actual return address into ecx.
Then it looks just past that address to see whether the label stored there matches what we expect, that is, AABBCCDD. In this case it does, so we do not jump to the error label; instead, we jump to ecx.

Given the strengths and weaknesses of CFI, let's think about what an adversary might try in order to defeat it. One thing he might do is inject code that has a legal label: maybe he can smash the stack and insert a code buffer containing a label that turns out to be a legal one to jump to. This shouldn't work, because we assume data is non-executable, so no new code can be introduced. The only thing you could attempt then would be to modify existing code labels, but even this won't work, because we assume the code is immutable. You could try to modify the stack or the registers during the check, to make a failed check look like a successful one, but this won't work either, because the adversary has no opportunity to change the registers into which we load the relevant data during the check.

So, basically, CFI will defeat control-flow-modifying attacks like remote code injection, ROP, return-to-libc, and so on. But it will not prevent manipulation of control flow that is allowed by the labels in the graph; these are called mimicry attacks, and recent results have shown that the simple, single-label scheme is susceptible to dangerous attacks. CFI also does not prevent data leaks. For example, the Heartbleed bug, which overruns a buffer to read the contents of adjacent memory, would not be prevented, because it does not affect the control flow of the program, just what is read. Nor does CFI protect control flow decisions that are made based on data rather than on jump or return targets. In the little function here on the right, we branch on an authenticated variable; an overrun at the strcpy above it could overwrite the authenticated value and make it seem like we'd been authenticated even though we're not. This is perfectly legal according to the control flow graph of the program, even though it violates our security goals. So control flow integrity is not everything, but it will defeat many, probably all, as far as we know, remote code injection attacks. Because the performance of CFI, particularly modular CFI, is improving, we can hope that CFI will one day, hopefully soon, make its way into actual practice and protect our programs far better than deployed defense mechanisms do today.
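To make that last authenticated-flag example concrete, here is a sketch of a data-only attack that CFI does not prevent; the function and helper names are just for illustration.

```c
#include <string.h>

int check_password(const char *pw);   /* hypothetical password check        */
void grant_access(void);              /* hypothetical privileged operation   */

int authenticate(const char *input) {
    int authenticated = 0;
    char buf[16];

    strcpy(buf, input);        /* an overly long input runs past buf and,
                                  depending on stack layout, can overwrite
                                  authenticated                              */
    if (check_password(buf))
        authenticated = 1;

    if (authenticated)
        grant_access();        /* reached along a legal CFG edge even when
                                  authenticated was corrupted by the overflow */
    return authenticated;
}
```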
Secure Coding
While automated defenses are ideal, since they do not require changes to the original program, they cannot be completely relied upon to prevent exploitation of vulnerabilities. Therefore, as a complement to automated measures, developers should strive to avoid introducing vulnerabilities into their programs in the first place. Doing so can be achieved in at least two ways. First, developers can employ discipline, limiting themselves to coding patterns that avoid vulnerabilities; such patterns are documented in coding standards like the CERT C coding standard and other collected wisdom. Second, developers can use advanced code review and testing mechanisms to find vulnerabilities prior to deployment; such mechanisms employ techniques like static program analysis, fuzz testing, and symbolic execution. We will discuss these techniques in depth in a future module.

In general, security-conscious practices apply at both the design and implementation stages of a development process. Recommendations come in two flavors: principles and rules. A principle is a design goal with several possible manifestations. A rule is a particular practice that follows from sound design principles. The difference between these is not hard and fast. For example, a principle might be "validate your input." How you do that differs depending on the kind of input, so there are many possible rules; but validate your input could itself be considered a manifestation of the principle of least privilege, in that we would like to trust the input as little as possible. Nevertheless, the distinction is usually helpful. At this point, we are going to look at several rules for good C coding. Following these rules will help us avoid vulnerabilities that could permit violating memory safety. In a later unit, we will look at secure development practices more broadly.

The first rule we'll consider is: enforce input compliance. To illustrate it, we'll use our earlier example of an echo server. Recall the way this works: it reads from standard input, first an integer, by reading a buffer up to a carriage return and, assuming it's not null, calling atoi to convert it to an integer. Next, it reads another buffer, the message, up to the carriage return. Then, one character at a time, as long as the character is not a control character, it echoes that character to the screen, and finally it prints the carriage return. Now, the problem here is that the length, the len value, could exceed the length of the actual message, and in that case we will have a buffer overflow, because we will read and echo whatever happens to be in buf beyond what was typed in. The way we can solve this problem is to enforce input compliance, that is, not trust that len is the appropriate length. Instead, we compute the length as the minimum of the length of the buffer we actually read in and whatever value the user typed for len. This way, we're sure we don't exceed the length of the actual buffer, and therefore we won't release any sensitive information.
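Here is a sketch of the repaired read-and-echo logic; it is a reconstruction rather than the lecture's exact code, and the buffer sizes are arbitrary. The key lines clamp the claimed length to the length of the message actually read, so the echo loop can never run past the buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void echo_once(void) {
    char lenbuf[32], msg[128];

    if (fgets(lenbuf, sizeof lenbuf, stdin) == NULL) return;
    int len = atoi(lenbuf);                /* length claimed by the client  */

    if (fgets(msg, sizeof msg, stdin) == NULL) return;
    msg[strcspn(msg, "\r\n")] = '\0';      /* strip the line terminator     */

    /* Enforce input compliance: never trust the claimed length; clamp it
     * to the number of characters actually read.                          */
    int actual = (int)strlen(msg);
    if (len < 0 || len > actual)
        len = actual;

    for (int i = 0; i < len; i++)
        if (!iscntrl((unsigned char)msg[i]))
            putchar(msg[i]);
    putchar('\n');
}
```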
Here's another example of enforcing input compliance. This function converts a digit to a character. It takes the digit i and uses it as an index into an array holding each digit character: when i is 0, it gets the first element of the array, the character '0'; if i were 5, it would index that far into the array and return the character '5'; and so on. Now, the problem is that i could be less than zero or greater than nine, and the function would therefore read off the end of the array, overrunning it again. To solve this problem, we can again enforce input compliance: if (i < 0 || i > 9), we just return '?' instead.

In general, unfounded trust in received input is a recurring source of vulnerabilities, and we'll continue to see examples in this course, over and over again, where code assumes that input is well formed and an adversary takes advantage of that trust. We want instead to apply a principle called robust coding, of which validating or enforcing input compliance is one element. You can think of it as being like defensive driving: you want to avoid depending on anyone else around you to do the right thing, so you check, at all times or at least often enough, that things are the way you expect, in order to minimize the trust you place in other parties doing their job correctly. One way to realize this idea is to pessimistically check the assumed preconditions on outside callers. We just saw this in the conversion example: instead of assuming that clients will always call convert with values between 0 and 9, we check that this is the case, and throw an exception or return a different value if not.

The next rule to consider is to use safe string functions rather than the unsafe versions. Routines like gets, strcpy, and strcat assume that target buffers have sufficient length, and here are two examples where that's not the case. We try to copy the string hello, which takes six characters, the word hello plus the null terminator, into a buffer str that only has room for four characters. Or we strcat the phrase "day to you" onto buf, whose first five characters are already taken up by the word "fine" and its terminator, which is many more characters than the ten allowed in that buffer. The safe versions check the destination length that you pass in and therefore avoid this problem: strlcpy and strlcat let you include the size of the target buffer, so that the routines can check that the strings you pass in will not exceed that size. There are replacements like these for each of the unsafe string functions, and Microsoft has its own versions that are slightly different: strcpy_s, strcat_s, and so on.

Another important thing is not to forget the NUL terminator. Strings require one additional character to store the NUL terminator, and if you don't allocate space for it, you can get overflows. Here, we might think: I'm going to store the string "bye" in the buffer str; "bye" has three characters, so I should make str three characters long. But when we do that, strcpy adds a fourth character, the NUL terminator, at the end, which overflows the space allocated to str; that's a write overflow. We can also get a read overflow when we call strlen on str, if we only wrote the three characters and not the terminating fourth, because strlen will read past the end of the buffer looking for the NUL terminator. Using a safe string library catches this mistake: strlcpy will write the NUL terminator at the third position, so when we call strlen we get 2, that is, the word "by" instead of the word "bye". It's the wrong answer, but it's the safe answer.
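In code, the terminator mistake and two safer alternatives look roughly like this. Note that strlcpy is a BSD extension (available on Linux via libbsd) rather than standard C, which is why a portable snprintf version is shown alongside it.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char bad[3];
    char good[4];

    /* Wrong: "bye" occupies 4 bytes including its NUL terminator, so this
     * strcpy would write one byte past the end of bad (a write overflow),
     * and a later strlen(bad) could read past it too (a read overflow).   */
    /* strcpy(bad, "bye"); */

    /* Portable fix: size the buffer for the terminator and pass the buffer
     * size, so the copy can never overflow.                               */
    snprintf(good, sizeof good, "%s", "bye");

    /* BSD alternative: strlcpy(bad, "bye", sizeof bad) would truncate the
     * string to "by" -- the wrong answer, but a memory-safe one.          */

    printf("%s has length %zu\n", good, strlen(good));
    (void)bad;
    return 0;
}
```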
Another rule is to understand pointer arithmetic properly. Recall that the sizeof operator returns a number of bytes, but pointer arithmetic implicitly multiplies by the size of the type that the pointer points to. Here we have a little example with a buffer of integers of size SIZE, and we're iterating through it. We've made a mistake by bounding the loop at buf + sizeof(buf): sizeof(buf) is a count of bytes, but adding to buf advances in units of whole integers (words). So this bound lies four times as far away as really intended and could let the loop overflow the buffer. We have to use the right units. Here we can do that by writing (buf + SIZE): SIZE is the number of integers in the buffer, so buf + SIZE properly points just past the end of buf and bounds the loop correctly.

Another mistake people might make is to leave dangling pointers around, which can be exploited. One way to defend against dangling pointer errors, even if you still make them, is to null pointers out after freeing them. First, let's see what can go wrong with dangling pointers. In the program we have a variable x = 5. We malloc something into the pointer p, so some previously unallocated heap memory is now allocated to p. We then free p, so this memory is no longer active. While our memory safety definition says you can't reuse that memory, a real implementation may of course reuse it, and an adversary can exploit that fact. Next we allocate memory for the pointer q, which we assume is an int**, that is, a pointer to a pointer to an int, and it may turn out that malloc reuses the memory that p just released and hands it to q. We then store the address of x into the location q points to. Now we dereference p, the dangling pointer, and store the value 5, which blows away the pointer we had stored through q. So when we dereference the contents of what q points to, we will crash. In fact, we may do far worse than crash: dangling pointers can be exploited for remote code injection. If a function pointer is what you end up overwriting through a dangling pointer, you can get into real trouble, and in fact it was a dangling (invalid) pointer dereference that was exploited in a flaw in Google code back in 2010. The way we fix this is to NULL the pointer after we free it. Going through the steps again: when we free p, we also null it out, and we continue on just as before. But now, where we previously wrote through p as a dangling pointer, we crash instead. It's better to crash than to allow remote code execution.

Another useful rule is to manage memory properly, so that you have neither memory leaks nor dangling pointers. One way to do that is to use goto chains to avoid duplicated or missing cleanup code. This is somewhat like the try/finally pattern that languages like Java have and C doesn't. In this example, a couple of arguments cause us to incrementally allocate memory. We allocate pf1, then check whether the first argument is okay. If it's not, we go straight to DONE, the label at the bottom of the code, which frees that memory and returns. If argument one was fine, we allocate into pf2. If argument two is not okay, we go to the FAIL_ARG2 label instead, which frees pf2, then falls through to free(pf1), and finally returns. A sketch combining this pattern with nulling freed pointers follows below.
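Here is a minimal sketch of these two habits together: a goto chain so that every exit path releases exactly what has been allocated so far, plus nulling pointers after free. The function and argument checks are hypothetical stand-ins, not the code from the original slide.

    #include <stdlib.h>

    /* Hypothetical argument checks standing in for the slide's tests. */
    static int arg1_ok(int a) { return a > 0; }
    static int arg2_ok(int b) { return b > 0; }

    int process(int arg1, int arg2) {
        int ret = -1;
        char *pf1 = NULL, *pf2 = NULL;

        pf1 = malloc(64);
        if (pf1 == NULL || !arg1_ok(arg1))
            goto DONE;                 /* only pf1 needs cleaning up */

        pf2 = malloc(64);
        if (pf2 == NULL || !arg2_ok(arg2))
            goto FAIL_ARG2;            /* pf1 and pf2 need cleaning up */

        /* ... main work using pf1 and pf2 ... */
        ret = 0;

    FAIL_ARG2:
        free(pf2);
        pf2 = NULL;                    /* null after free: a stray later use fails loudly */
    DONE:
        free(pf1);                     /* free(NULL) is a no-op, so this is always safe */
        pf1 = NULL;
        return ret;
    }

Each failure jumps to a point that frees exactly what has been allocated so far, the success path falls through the same cleanup, and freed pointers are nulled so a later mistaken use crashes rather than silently corrupting reused memory.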
On the other hand, you want to confirm that your logic is reasonable. If you mess up these goto chains you could end up skipping checks that you shouldn't skip, or doing redundant ones. In fact, Apple's infamous goto fail bug was the result of an improper goto chain.

Now, we've looked at the safer string functions, strlcpy, strlcat, and so on. But recall that these still require that you call them properly, that is, that you pass in the correct lengths of the target buffers for them to work. Instead, you might be better served by a completely safe string library that can never have buffer overruns, no matter what mistakes you make. This is the approach taken by Very Secure FTP (vsftpd). It implements its own string library, part of whose API is shown on the slide. It defines a new type called mystr; we can allocate a string from a bunch of text, append to it, check whether two strings are equal, and see whether a string contains spaces. The internal representation of the string is hidden, but the way it is implemented, the length is stored explicitly rather than relying on a NUL terminator, and there is no way to write past the end of the buffer because the buffer always travels with its length. C++'s standard string library does something similar. In general, using safe libraries is a great way to make your program more secure, because it lets you reuse a well-thought-out design; a small sketch of the length-carrying string idea follows below. Smart pointers are another example: pointers that permit only safe operations and whose lifetimes are managed appropriately. Just as the safe strings keep extra metadata, such as the length, with the string, smart pointers keep metadata, such as whether the pointer is still valid, with the pointer; all of that information lives in the pointer's implementation, as in the Boost library for C++. On the networking side, you can use something like Google Protocol Buffers or Apache Thrift to send network messages efficiently without worrying about input validation or parsing; the library takes care of all of that for you. One more measure is not exactly a library, since we think of it as a system feature, but it might as well be one: using a safe allocator. Heap-based buffer overflows are still a problem, and one way to make them difficult to exploit is to make the addresses returned by malloc unpredictable, in the same way that ASLR makes stack smashing hard to exploit because the attacker does not know where the stack, and thus the return address, is located. A randomizing malloc implementation makes it hard to guess what addresses malloc will return, and therefore harder to exploit heap overflows. A couple of options, the Windows Fault-Tolerant Heap for Windows systems and the DieHard allocator for Linux systems, implement this safety-by-randomness approach.
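As an illustration of that length-carrying design, here is a minimal sketch of such a string type in C. The type and function names are hypothetical, invented for this example; they are not vsftpd's actual mystr API.

    #include <stdlib.h>
    #include <string.h>

    /* A string that always carries its own length; no reliance on a NUL terminator. */
    struct safestr {
        char  *data;
        size_t len;
    };

    /* Copy n bytes into a fresh safestr. Returns 0 on success, -1 on failure. */
    int safestr_init(struct safestr *s, const char *src, size_t n) {
        s->data = malloc(n ? n : 1);      /* always allocate at least one byte */
        if (s->data == NULL) return -1;
        memcpy(s->data, src, n);
        s->len = n;
        return 0;
    }

    /* Append t to s, growing the buffer; the lengths are tracked for us. */
    int safestr_append(struct safestr *s, const struct safestr *t) {
        size_t newlen = s->len + t->len;
        char *grown = realloc(s->data, newlen ? newlen : 1);
        if (grown == NULL) return -1;
        memcpy(grown + s->len, t->data, t->len);
        s->data = grown;
        s->len = newlen;
        return 0;
    }

    /* Equality is a length check plus a bounded comparison; no overrun is possible. */
    int safestr_equal(const struct safestr *a, const struct safestr *b) {
        return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
    }

    void safestr_free(struct safestr *s) {
        free(s->data);
        s->data = NULL;    /* null after free, as recommended earlier */
        s->len = 0;
    }

Because callers never touch the raw buffer directly, every operation can consult and maintain the stored length, which is the same basic design used by C++'s std::string.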
Okay, so that's all the rules we're going to talk about for safe C coding. We'll get back to more on safe C design, in fact safe design overall, a bit later in the course. Let's finish up with a fun poem written by Andrew Myers, called The Gashlycode Tinies. It encapsulates a bunch of failures to follow important rules, and it's inspired by The Gashlycrumb Tinies by Edward Gorey.

A is for Amy whose malloc was one byte short.
B is for Basil who used a quadratic sort.
C is for Chuck who checked floats for equality.
D is for Desmond who double-freed memory.
E is for Ed whose exceptions weren't handled.
F is for Franny whose stack pointers dangled.
G is for Glenda whose reads and writes raced.
H is for Hans who forgot the base case.
I is for Ivan who did not initialize.
J is for Jenny who did not know Least Surprise.
K is for Kate whose inheritance depth might shock.
L is for Larry who never released a lock.
M is for Meg who used negatives as unsigned.
N is for Ned with behavior left undefined.
O is for Olive whose index was off by one.
P is for Pat who ignored buffer overrun.
Q is for Quentin whose numbers had overflows.
R is for Rhoda whose code left the rep exposed.
S is for Sam who skipped retesting after wait().
T is for Tom who lacked TCP_NODELAY.
U is for Una whose functions were most verbose.
V is for Vic who subtracted when floats were close.
W is for Winnie who aliased arguments.
X is for Xerxes who thought type casts made good sense.
Y is for Yorick whose interface was too wide.
And Z is for Zack whose code nulls were often spied.