MyClassNotes: Software Security

Low-Level Security: Introduction

In this unit, we will consider low-level software security, which is a concern for systems written in the C and C++ programming languages. We will begin by considering the infamous buffer overflow attack, to which low-level software is particularly vulnerable. What is a buffer overflow? A buffer overflow is a bug that affects low-level code, typically written in C and C++, with significant security implications. Normally, a program with this bug will simply crash, but an attacker can alter the situation and cause the program to do much worse, allowing the attacker to steal private information, to corrupt important data, and even to run code of his choice. It is worth studying buffer overflows for several reasons. First, they are still relevant today. C and C++ are used to write a lot of software, and that software often has buffer overflow vulnerabilities. Second, during their long history, attackers and defenders have played a game of cat and mouse. As defenders address one weakness, attackers find a way to work around it. We will find it instructive to understand the technical details of that long history, to see how the attack works and how to defend against it. Lessons we learn here will be relevant to other software weaknesses. So, let's dig a little further into the relevance and history of buffer overflows. First, this chart shows that C and C++ are still very relevant today. The chart comes from a recent study done by IEEE Spectrum magazine. To come up with their ranking, they looked at new and active open source projects hosted on sites like GitHub, and also looked at Google keyword searches, among other data. Considered all together, the evidence supports the claim that C and C++ are two of the top three languages used today. Therefore, any vulnerabilities particular to these languages, as buffer overflows are, are quite relevant to a good understanding of cyber security. What software is written in C and C++? Some examples include operating system kernels.
Other examples are high-performance servers, such as web servers and database servers, and embedded systems, which appear in cars, airplanes, industrial control systems, and even the Mars rover. These systems are all of critical importance. They are the platform for computing, and they drive our economy, and ourselves, from here to there. A successful attack on these systems has tremendous consequences. The first buffer overflow attack occurred in 1988 and was carried out by a student named Robert Morris. This attack was part of a self-propagating computer worm that replicated itself across the Internet. Once it compromised one system, it would gain a foothold there and try to launch an attack against other systems. The attack worked in part by sending a special string to a server called fingerd that was vulnerable to a buffer overflow. This string contained code that would help carry out the attack. In the end, the attack infected a significant portion, around 10%, of the nascent Internet, causing a tremendous amount of damage due to denial of service. Morris was eventually caught and had to pay a fine, serve three years' probation, and carry out 400 hours of community service. Since the Morris worm, many other worms have been developed, some of which have exploited buffer overflows. One example is Code Red, which came out in 2001. It exploited a vulnerability in Microsoft's IIS web server and spread rapidly across the much larger Internet. In fact, the worm was one of the elements that prompted Bill Gates, then Chairman of Microsoft, to write his now famous memo exhorting the company to take security far more seriously and to develop a platform for trustworthy computing systems. Unfortunately for Microsoft, another big attack occurred in 2003, with the SQL Slammer worm infecting a huge number of machines running Microsoft's SQL Server. The infections took place in a matter of minutes. Changing the culture and practices at a huge organization like Microsoft takes time.
Microsoft has made significant strides since that time. And many of the practices and ideas that we will consider in this course are ones that they developed or that they have taken to heart. Now, buffer overflows are pernicious. As evidence, consider that in early 2014 a buffer overflow vulnerability was discovered in the code of the X11 server, which was an early leader in standardized support for graphical desktop displays and forms the foundation of remote desktop technology today. The bug that was discovered in 2014 had been latent in the source code for more than 20 years. And indeed, despite the increased knowledge of how dangerous buffer overflows are, and the attention paid to defending against them, the number of reported vulnerabilities continues to rise. So, here's what we'll do. For the rest of this unit, we'll learn how buffer overflows work and then learn various ways to defend against them. For a complete understanding, we'll need to look at how a compiler produces executable code from C source programs, and how the operating system and architecture work together to run these programs. We'll see that, knowing these details, an attacker can exploit bugs in how the program uses memory in order to attack that program. In general, security often requires a whole-systems view, and our study here will be an example of that. Before proceeding, I'd like to make a note about terminology. I use the term buffer overflow to mean any access of a buffer outside of its allotted bounds. This access could be an over-read or it could be an over-write. It could occur during iteration across each element of the buffer, for example, running off the end, or by a direct access, through a direct index of the buffer. The out-of-bounds access could be to addresses in memory that either precede or follow the buffer. All of these things I'll consider broadly as a buffer overflow.
Now, others sometimes use different terms to refer to the specific instances of buffer overflow that I have listed above. They might reserve the term buffer overflow to refer only to actions that write beyond the bounds of a buffer. And they might use terms like buffer underflow, buffer over-read, or out-of-bounds access to be more specific. Throughout this course, when I use the phrase buffer overflow, I typically use it in the most general sense, and more specific uses can be determined from context.

Memory Layout

Before we can talk about how buffer overflows work, we need to review some details about how you run a program on a modern computer. For understanding buffer overflows, we're particularly interested in how programs are laid out in memory. We will consider where the program code and its data are located in memory. We will look at the call stack and how it stores arguments and local variables of functions when they are called. We will look at some of the metadata that is stored amongst this program data to make it easier for the compiler to generate code that can be used in different circumstances, for example, no matter which function calls which other function. In our discussion, we focus on the Linux operating system process model running on an Intel x86 32- or 64-bit processor. While the details differ for different operating systems and architectures, the concepts that we will consider are very similar. All programs are stored in memory. A program, when it begins running, is called a process, and that process is given memory by the operating system in order to run. Here we depict the process's address space. At the bottom is address zero, the lowest address. And at the top is the address at four gigabytes, which is the highest address on a 32-bit system. The process's view of memory is that it owns all of it. As far as it can tell, it's the only program running on the system. In reality, these are virtual addresses that the operating system and processor map to actual physical addresses for the memory on the machine. At the bottom of the address space is the text segment, or code. Here we see some x86 instructions that might make up the code of our program. Just above the text segment is the data segment, and it has two parts. The first is the initialized data area. So here we see variable y that's initialized to ten. Above that is the uninitialized data area. Here, the variable x is not initialized at all.
However, note that global variables not initialized by the program are assured by the process model to be zero. This is not true of uninitialized local variables, as we'll see later. All of this data is known at compile time. So the compiler can determine where it goes and can specify as much in the executable. At the top of the address space come the command line arguments and the environment variables, and these are set when the process starts. Just below them is the stack. The stack is what holds local variables, along with metadata that the program uses to call and return from functions. Above the data segment is the heap. This is the area that malloc manages. All of this data is organized and managed at runtime. That is, how it behaves depends on what the program does: what it interacts with, what input files it reads or writes, and so on. Now we've turned the picture on its side, so the lowest address is to the left and the highest address is to the right. And we'll use this orientation for most of the rest of the slides. Here again, we see the stack and the heap depicted, and we also show the direction in which they grow. As more memory is needed in the heap, it grows toward the higher addresses, whereas as more memory is needed for the stack, it grows downward toward the lower addresses. While the program is running, it maintains a stack pointer, which indicates the top of the stack. When the program issues a push instruction, the value is stored on the stack and the stack pointer moves down to the new top of the stack.
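The regions described above can be observed from a running program. The following is a minimal sketch, not taken from the slides: the function names are made up for illustration, the globals y and x mirror the two variables mentioned in the lecture, and the addresses printed will differ from system to system (and from run to run where address randomization is enabled).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int y = 10;   /* initialized data area */
int x;        /* uninitialized data area: guaranteed to start as zero */

/* Returns the global x, which the program never assigns (hypothetical helper). */
int read_uninitialized_global(void) {
    return x;
}

/* Prints one representative address from each region of the process. */
void print_layout(void) {
    int local = 0;                              /* lives on the stack */
    int *heap_obj = malloc(sizeof *heap_obj);   /* lives on the heap */

    printf("text  (code of print_layout): %p\n",
           (void *)(uintptr_t)&print_layout);
    printf("data  (initialized y):        %p\n", (void *)&y);
    printf("data  (uninitialized x):      %p\n", (void *)&x);
    printf("heap  (malloc'd object):      %p\n", (void *)heap_obj);
    printf("stack (local variable):       %p\n", (void *)&local);

    free(heap_obj);
}
```

Calling print_layout() from main on a typical Linux system should show the code and global data at low addresses, the heap somewhat above them, and the stack near the top of the address space, while read_uninitialized_global() returns zero even though x is never assigned.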

Now, suppose that after running for a while, the function that had pushed these values returns. In that case, we expect that the function will pop a large portion of the stack off, removing all of its local variables and arguments. We'll see how this works exactly in a minute. The compiler emits the instructions that adjust the stack at run-time. Likewise, code in the implementation of malloc keeps track of the heap. The memory that the heap uses is apportioned by the OS, but the individual data that's stored inside of the heap is managed by malloc. For now, we're going to focus on the stack, because that's the target of the first attack that we'll consider. The next question is, how does a program use the stack while it is running? As mentioned earlier, the stack is used to support calling and returning from functions. We'll now look at the details. In particular, we'll look at what data we need to store and where we'll put it when calling a function. We'll also look at what has to happen when a function returns, that is, what data needs to be restored and where to get it from. Now let's consider the basic stack layout. Here we see a simple function, func, that takes three arguments, arg1, arg2, and arg3, and has two local variables, loc1 and loc2. Below, we see the depiction of the memory of the process. The highest addresses are to the right, as usual. And we see a depiction of the caller's data, that is, the data of the caller of this function. When the caller goes to call this function, it's going to push the arguments in the reverse of the order in which they appear in the code. So remember, the stack grows from the right to the left, that is, from the top addresses to the bottom addresses. So we see then that arg3 comes first, then arg2, then arg1, that is, the opposite order of the program. Now, the local variables of the function are stored on the stack as well. And they are stored in the order that they appear in the program text, that is, first loc1, and then loc2.
There are a couple of bits of information that are stored in between, and we'll see what these are in a moment. Now suppose the compiler is generating code to access these variables. So here we show that, within the function, it wants to increase the value of loc2 by one. How will it do this? Well, in order to do it, it needs to know where loc2 is stored on the stack. Suppose, for argument's sake, that it's stored at this particular address. How will the program know that? Well, if we think about it, this function could be called from many different places in the program, so the actual address of loc2 could differ depending on who called the function. Therefore, the compiler cannot know this address at compile time, and it's going to need to do something else. Fortunately, the compiler always knows the relative address of this variable. That is, it's always eight bytes before the question marks here on the stack. Stepping back, we can think of all of this stuff that's highlighted in blue as the stack frame for the function: the arguments and the local variables, plus these extra question marks that we'll get to in a minute. Now, because we want to know how to locate local variables, and for that matter how to locate arguments, we need a reference point within the stack frame. We'll call that the frame pointer. Typically, compilers store the frame pointer in the EBP register. Therefore, the compiler knows that no matter where this function is called from, loc2 will always be eight bytes below the current value of the frame pointer. Now let's see how we implement returning from functions. Here we see main, which calls the function func we were just looking at. And we see the stack frame for func, here at the bottom of the slide. Here's the caller's data for main that we've saved. Now, when we called func, main was using the frame pointer, just as func is, to access its own local variables.
When we return from func, main is going to want to use the same frame pointer that it had before, so that when it goes to access its variables, it's going to the right addresses. So the question is, how do we save and restore the frame pointer so that this works properly? Well, let's think about how main is going to call func in the first place. What it will do is push its three arguments, arg3, arg2, and arg1, which here are the values "Hey", 10, and -3, pushed in reverse order. It'll push some other data that we'll see in a minute. At this point, the stack pointer is right here. Now what we can do is save main's frame pointer right on the stack. At this point, we can update the frame pointer to be the current stack pointer. And now, when func starts to run, it will push its local variables after the current stack pointer. And here we are, back where we started. The next question is, how do we resume at the same place that we were in main when we called func? Here's what's going on. As main is running, the instruction pointer, eip, is moving through the different instructions that implement main. Now it goes to call func. When it does, the instruction pointer is going to move up and start executing these different instructions. So what we want is to resume back where we were when we called the function. Well, we can play the same trick that we did with the frame pointer. We can store the instruction pointer on the stack just before calling the function. Now, when we go to return, we just have to set the instruction pointer to the value stored four bytes above the current frame pointer in the callee. In summary, when calling a function, we push arguments onto the stack in reverse order. Then we push the return address, and then we jump to the function's address. Within the called function, we push the old frame pointer onto the stack. We set the new frame pointer value to be where the stack pointer is right now. And then we push the local variables in order.
Finally, to return, we reset the previous stack frame by setting the stack pointer back to the current frame pointer and restoring the old frame pointer that was saved there. And then we simply jump back to the instruction pointer that we saved on the stack, which is stored four bytes above the location where the old frame pointer was saved.
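To tie the summary above to source code, here is a small sketch of a function shaped like func. It is an illustration rather than the slide's exact code: the argument types are assumed to be int, and the offsets in the comment assume a simple, unoptimized 32-bit x86 build that uses a frame pointer. An optimizing compiler, or a 64-bit target, may lay things out differently or keep the variables in registers.

```c
/*
 * Assumed frame of func in a simple, unoptimized 32-bit x86 build
 * (offsets are relative to the frame pointer %ebp):
 *
 *   lower addresses                                 higher addresses
 *   | loc2 | loc1 | saved %ebp | return addr | arg1 | arg2 | arg3 |
 *     -8     -4      0            +4           +8     +12    +16
 */
int func(int arg1, int arg2, int arg3) {
    int loc1 = arg1 + arg2;
    int loc2 = arg3;

    loc2++;   /* compiled as an access at a fixed offset, e.g. -8(%ebp) */

    return loc1 + loc2;
}
```

The point of the diagram is that every access is expressed relative to the frame pointer, so the same compiled code works no matter where on the stack the frame happens to land for a given call.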

Buffer Overflow

Now that we're refreshed on the basics of how C programs are laid out in memory, and in particular how they use the stack to support calling and returning from functions, we can start looking at buffer overflow attacks. Let's look at the components of the name. A buffer is simply a contiguous region of memory associated with a program variable or field. When they use the term buffer, people are often thinking of strings, where a string is simply an array of characters ending with a null, or zero. For now, we will focus on strings too. Later, we will consider format string attacks and in the process see how the idea of a buffer is actually quite general. An overflow occurs when the program tries to write more data to a buffer than it can actually hold. This term is evocative of data running off the end of the buffer. But once again, the idea is really more general: basically, an overflow occurs whenever the program tries to use a variable to access memory that doesn't belong to that variable. For example, by indexing an array out of its bounds, the program is performing a kind of overflow. An important question is, what happens when the program reads or writes to a buffer outside its bounds? According to the C programming language standard, the behavior of such a program is undefined. Effectively, it is allowed to do anything. In a move positive for security, the compiler could choose to insert code to detect out-of-bounds accesses and terminate the program when they occur. Instead, most compilers simply assume the program does not have any overflows, and so the program will access whatever memory happens to be at the accessed location. By knowing how memory is laid out, an attacker can use out-of-bounds accesses to his advantage. Let's look at what could happen if a buffer overflow takes place. Here we have a function, func, and the function main, which calls this function with the string "AuthMe!". Inside of the function, it tries to copy that string into a buffer.
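The slide being described is not reproduced in these notes. A minimal reconstruction might look like the following, assuming a four-byte local buffer and the string literal "AuthMe!" (which matches the seven characters counted in the discussion). The helper bytes_past_end is hypothetical and only does the arithmetic; the vulnerable func is shown for reading and should not be called with an oversized string.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4   /* assumed size of the local buffer */

/* The vulnerable function: strcpy copies until the terminating zero,
 * without any regard for the size of buffer. */
void func(char *arg1) {
    char buffer[BUF_SIZE];
    strcpy(buffer, arg1);   /* undefined behavior if arg1 needs more than 4 bytes */
}

/* The caller on the slide would be something like:
 *     int main(void) { func("AuthMe!"); return 0; }
 */

/* Hypothetical helper: how many bytes would strcpy write beyond the buffer? */
size_t bytes_past_end(const char *s) {
    size_t needed = strlen(s) + 1;   /* characters plus the null terminator */
    return needed > BUF_SIZE ? needed - BUF_SIZE : 0;
}
```

For "AuthMe!" the copy needs eight bytes, so four bytes land past the end of buffer, in whatever the compiler placed next to it on the stack.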
But probably you can see the problem here. The string "AuthMe!" has seven characters plus a null terminator, whereas the buffer in the function only allots four characters. And so, we're going to overflow that buffer when we call strcpy. Let's see this depicted on the stack. First, when calling func, we see arg1, and we see the instruction pointer that we saved from the caller, and we see the frame pointer. Then we see the buffer, four bytes, that we allocated inside of func. Now we see strcpy at work, and it's going to copy the first four characters. Then, it's going to copy more characters and overwrite the frame pointer with the rest. When we get to the end of the function, we're going to try to follow the same process we always do to return to the calling function, main. But of course, the frame pointer is now corrupted. So it's going to be set to whatever this strange value is. And we're going to hit a segmentation fault when we subsequently use that frame pointer, for example, when accessing a local variable in the caller. Now, normally, we think, oh, that's a crash. There are bugs in the program, and this is one of them. Who cares? Eventually we'll discover it and we'll fix it. Well, buffer overflows are security relevant. If we modify the function func as follows, we can see that overflowing the buffer can have security implications for the program. We've allocated a new local variable, authenticated, and throughout the function func we assume that authenticated should be set only if in fact authentication has really taken place. Perhaps this will happen after the strcpy. Now let's see what happens with our buffer overflow this time. So when calling func, we push arg1, then the instruction pointer, then the frame pointer, and we've allocated the local variable authenticated and the local variable buffer. And now the strcpy takes place. This time, instead of overwriting the frame pointer, we overwrite the contents of the authenticated variable.
Now this is a problem, because every time we go to check authenticated, the value is non-zero and the check is going to succeed. So this mistake had a security-relevant outcome, allowing the program to do things that we probably didn't intend. Could it be worse than this? Well, in fact, if we think about it, strcpy gives us the ability to copy any amount of data into a buffer that's not the right size. So basically, we could overwrite lots of memory on the stack. And the question is, what could you do with that ability if you were an attacker? Well, as we'll see, one thing the attacker can do is overwrite the buffer with code and arrange for the program to execute that code when it returns from the function. Now, before we see how that works, as an aside, let me point out that these examples are providing their own strings simply as constants. But in reality, the issue is that strings come from users, and some of those users are malicious. For example, the strings could come as textual input. They could come as packets, or environment variables, or input from files. It's very important that we validate our assumptions about user input. That is, we want to make sure that the input, for example, is not too long, or that it conforms to a certain structure that the program assumes. We'll discuss validating input assumptions later and throughout the course, because failing to do so turns out to be a mistake that programs make all the time, not just with buffer overflows.
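As a sketch of what validating the length assumption could look like for the running example, the code below checks that the input fits before copying. This is one possible approach, not the course's prescribed fix: copy_checked and func_checked are hypothetical names, and the comparison against the string "abc" is only a placeholder for whatever real authentication step the program would perform.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4   /* same assumed buffer size as in the running example */

/* Copies src into dst only if it fits, terminator included.
 * Returns 0 on success, -1 if src is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    size_t needed = strlen(src) + 1;
    if (needed > dst_size) {
        return -1;
    }
    memcpy(dst, src, needed);
    return 0;
}

/* Hypothetical rework of func: oversized input is rejected, so it can never
 * spill into the authenticated flag. Returns the value of that flag. */
int func_checked(const char *arg1) {
    int authenticated = 0;
    char buffer[BUF_SIZE];

    if (copy_checked(buffer, sizeof buffer, arg1) != 0) {
        return 0;   /* input too long: treat as not authenticated */
    }
    if (strcmp(buffer, "abc") == 0) {   /* placeholder check, for illustration only */
        authenticated = 1;
    }
    return authenticated;
}
```

With this check in place, passing "AuthMe!" is rejected outright instead of silently overwriting neighboring stack data.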

Code Injection

Now let's look at the main idea of code injection using a buffer overflow. Recall our function func, in this case using sprintf to copy into buffer. There are two main challenges for code injection. The first is somehow using the program to load your own code into memory. And the second is somehow getting the instruction pointer to point to it, so that the code can be executed. So let's look at the first challenge, loading code into memory. The first thing to keep in mind is that this code must be machine code instructions that the machine is prepared to run. In other words, it's not going to be C source code, but instead the actual assembled instructions for the target architecture. Moreover, it can't be just any machine code. We have to be careful about how we construct it. So for example, it cannot contain any all-zero bytes. Why is this? Well, strcpy, sprintf, gets, scanf, and various other unsafe calls that we might like to exploit will only copy data that doesn't have zeros in it. That is, they will copy from the start of a source buffer up until they reach a zero and then stop. Therefore, if we want to inject a lot of code, we have to make sure that all of that code contains no zeros. Next, we need to be careful that the code is complete. It can't assume that it can use the loader to, say, resolve memory addresses inside of the program. Instead, it has to be completely self-contained. What code should we try to run? Well, in the best case, we want to try to run a general-purpose shell. A general-purpose shell is a command-line prompt that provides the attacker general access to the system. You may want to do other things with the code you inject, but this is sort of the best case. Code that launches a shell as part of an attack is called shellcode. Here's what the shellcode you might like to write might look like.
It's a simple function that calls execve, which effectively transforms the current program into the one given as an argument. In this case, the argument is /bin/sh, a shell. Here's some assembly for this shellcode. If we look at the first instruction, this is what it might look like as a string. This would be the string that you provide as part of your input. The second challenge is getting the injected code to run. Just because we loaded the code in doesn't mean we can get the program to jump to our code whenever we like. Moreover, we don't know precisely where our code is with respect to the instruction pointer. Somehow, we have to get the instruction pointer to the start of the code so that it starts running. Now, recall the memory layout summary for calling and returning from functions. How could we use this setup to our advantage to inject code? Here's the key. The very last step jumps back to the return address, which was saved on the stack. Therefore, we can store the address of our code at that location and thereby get the program to jump to that code. That's the main trick. Here's what it looks like visually. We load the address of our code over top of the eip saved on the stack. Therefore, when the function returns, it's going to return exactly to that location and then start running the code. Now the next question is, what address should we put there? How will we know what the address is? Well, maybe thinking of the question another way, we can ask, what if we get the address wrong? If we pick the wrong address and jump to some other location, most likely the CPU will reach an invalid instruction and raise a fault, thereby crashing the program. Another challenge that adversaries sometimes face is finding the return address.
Now, if the adversary knows the code that he is trying to attack and knows exactly where the buffer overrun is, he might know exactly where the buffer is with respect to the frame pointer, and therefore where the return address is located. Therefore, he knows which location to overwrite to get his code to run. On the other hand, the adversary may not have access to the code and may not know how far an overrun buffer is from the saved frame pointer. One approach is trial and error. Just try a lot of injected values on a running server until something works. But of course, the address space is quite large, and maybe this won't really work. On the other hand, without address randomization, which is something we'll discuss later, the stack always starts from the same fixed address. The stack will grow, but usually it doesn't grow very deeply unless the code is heavily recursive. This reduces the search space dramatically. Another thing the adversary can do is use what is called a nop sled. A nop is a single-byte instruction that just moves to the next instruction. If the adversary sticks a bunch of nops as padding prior to his own code, then jumping anywhere in that nop sled will work. Now we improve our chances by a factor of the number of nops. So, putting it all together, here's what all of the injected adversarial input might look like. This part labeled padding has to be something, because we have to start writing wherever the input to gets, or sprintf, or strcpy begins. But when the program returns to the guessed location, it'll hit the nop sled and start running our malicious code.
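The layout just described, together with the no-zero-bytes constraint from earlier, can be sketched with ordinary buffer operations. This is an illustration of the byte layout only, under stated assumptions: build_layout and has_zero_byte are hypothetical helpers, the guessed address is a 4-byte value as on a 32-bit target, 0x90 is the single-byte x86 nop, and the "code" is whatever placeholder bytes the caller passes in, not working shellcode.

```c
#include <stddef.h>
#include <string.h>

#define NOP 0x90   /* single-byte x86 nop instruction */

/* Returns 1 if any of the first n bytes is zero. A zero byte would make
 * strcpy-style functions stop copying before the whole input is written. */
int has_zero_byte(const unsigned char *p, size_t n) {
    return memchr(p, 0, n) != NULL;
}

/* Lays out: [pad_len filler bytes][4-byte guessed address][sled_len nops][code].
 * The filler covers the buffer and saved frame pointer, the guessed address
 * lands on the saved return address, and it is meant to point somewhere into
 * the nop sled. Returns the total length, or 0 if it does not fit in out. */
size_t build_layout(unsigned char *out, size_t cap, size_t pad_len,
                    const unsigned char guess[4], size_t sled_len,
                    const unsigned char *code, size_t code_len) {
    size_t total = pad_len + 4 + sled_len + code_len;
    if (total > cap) {
        return 0;
    }
    memset(out, 'A', pad_len);                             /* padding */
    memcpy(out + pad_len, guess, 4);                       /* overwrites saved eip */
    memset(out + pad_len + 4, NOP, sled_len);              /* nop sled */
    memcpy(out + pad_len + 4 + sled_len, code, code_len);  /* injected code */
    return total;
}
```

Any guess that lands within the sled_len bytes of the sled reaches the code, which is why the sled multiplies the odds of a successful guess; and running has_zero_byte over the finished layout shows whether a string copy would deliver all of it.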

Other Memory Exploits

The attack we have just deconstructed is called a stack smashing attack. The term was coined by the hacker with the handle Aleph One in his famous 1996 article in Phrack magazine titled Smashing the Stack for Fun and Profit, which you can still find online. The reason for the name is obvious. The attack overwrites, or smashes, important data on the stack to enable illicit actions. Revisiting the three security properties we briefly discussed in the introductory lecture, confidentiality, integrity, and availability, we can see that stack smashing is a violation of integrity. The attack has corrupted important data in the program and enabled further corruption of data on the system by allowing arbitrary code to run on behalf of the attacker. Stack smashing can also reduce availability by simply crashing the program or injecting code to make it unresponsive. For the remainder of this unit, we will take a brief look at other attacks that are variations of stack smashing. They too will take advantage of bugs involving the use of memory. But they will consider memory allocated in different places, and they may read memory illicitly, rather than write to it. Another sort of attack is a heap overflow attack. While stack smashing overflows a stack-allocated buffer, you can also overflow a buffer allocated by malloc, which resides on the heap. This code gives an example. At the top, we define a struct, the vulnerable struct, that has two fields. The first is buff, a character buffer with a fixed maximum length. The second is the compare function pointer. Below, we see a function, foo, that takes a vulnerable struct as an argument along with two character pointer arguments, one and two. To begin, the first line of the function copies one into buff. The second line appends two after one in buff. Finally, the third line calls the compare function pointer, passing buff as an argument and comparing it against a constant foobar string.
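Since the slide is not included in these notes, here is a hedged reconstruction of the code being described, followed by a length-checked variant. The names vulnerable, foo, and cmp follow the narration loosely; MAX_LEN, the exact literal "foobar", and foo_checked are assumptions made for illustration.

```c
#include <string.h>

#define MAX_LEN 16   /* assumed buffer size */

typedef struct vulnerable_struct {
    char buff[MAX_LEN];                       /* fixed-size character buffer */
    int (*cmp)(const char *, const char *);   /* sits right after buff in memory */
} vulnerable;

/* As described in the lecture: no length checks, so a long one plus two
 * writes past buff and into the cmp function pointer. */
int foo(vulnerable *s, const char *one, const char *two) {
    strcpy(s->buff, one);           /* copy one into buff */
    strcat(s->buff, two);           /* append two after it */
    return s->cmp(s->buff, "foobar");
}

/* Hypothetical checked variant: refuses inputs whose combined length,
 * plus the terminator, would not fit. Returns 0 and stores the comparison
 * in *result on success, or -1 if the inputs are rejected. */
int foo_checked(vulnerable *s, const char *one, const char *two, int *result) {
    if (strlen(one) + strlen(two) >= MAX_LEN) {
        return -1;
    }
    *result = foo(s, one, two);
    return 0;
}
```

In the heap overflow setting, the struct would come from malloc, and cmp would be set to something like strcmp before foo is called; the checked variant leaves that pointer intact no matter what strings arrive.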
Now, you may have noticed that this code is only going to work properly if the combined string length of one and two is less than the maximum length of the buffer into which they were copied. Otherwise, we will overwrite the compare function pointer. Just as when we overwrite the return address in a stack smashing attack, the adversary may be able to control how this overwrite happens and get the program to run code of his choice. There are many variants of this basic heap overflow attack. One variation applies to programs written in C++, which extends C with support for object-oriented programming. C++ objects consist of data and methods, as defined by a class. Classes support inheritance, so a method in a parent class can be overridden by a method defined in an inheriting child class. C++ supports subtype polymorphism, so that a child class's object can be used where a parent class object is expected. As a result, the compiler cannot be sure whether an object declared to have a type T really does have type T or has a type that inherits from T. To handle this situation, objects with overridable methods are compiled to have what is called a vtable. This is an array containing pointers to the code of each of the object's methods, and the object reaches it through a pointer stored inside the object. The code used to call a method simply indexes the vtable using a fixed offset that corresponds to the desired method. Now, for this to work, the vtable pointer has to be at a standard location within an object. Wherever it happens to be, the fields containing the object's data are nearby. If one of those fields is subject to a buffer overflow, then the vtable pointer could be corrupted, and with it the function pointers used for method calls. This is analogous to the situation we just saw with the vulnerable struct in C. Both this and the earlier attack we saw overflowed a buffer into another field of the same object. An alternative is to overflow into an adjacent object. For example, one containing a function pointer.
This is more challenging, because the attacker may need to work to get the right kind of object near the one he can overflow. But it can be done. A related attack aims to overflow not a program object, but instead the metadata that malloc uses to keep track of heap-allocated memory. Oftentimes, the memory just before the pointer returned from malloc contains a header. This header may contain pointers, for example, linking the returned object into a list of allocated data. Data not currently in use by the program will be linked into a free list instead. By corrupting this data, an attacker can cause the code implementing malloc and free to carry out actions to his advantage. Another sort of attack that's often considered in its own right is an integer overflow attack. These attacks rely on the fact that in C, an integer variable has a maximum value, and when that value is exceeded, the variable's value will wrap around. In this case, we're reading in from the network using the packet_get_int function. Suppose that the adversary has control of the other side of the network and is sending a very large number. In fact, suppose the number is 1,073,741,824, and that the size of a character pointer on our architecture is four. In other words, it's a 32-bit architecture. Obviously, nresp is greater than zero, and so we will malloc a buffer into which we store a response. Now, the adversary has arranged it so that this very large number times four wraps around to zero: 1,073,741,824 is 2 to the 30th power, so multiplying it by four gives 2 to the 32nd power, which does not fit in 32 bits and is truncated to zero. Many malloc implementations will happily allocate a size-zero buffer, and then the subsequent writes to that buffer are overflowing it. Of course, just as in all of the other attacks that we have seen, this overflow may be controllable, so that the adversary can inject code or otherwise have his way. Many of the attacks we have shown so far affect code, return addresses, and function pointers. But we can also affect data as well.
As an example of an attack on data, the attacker might overrun a buffer to modify a secret key to be one known to him, so that he can decrypt future intercepted messages encrypted with that key. He might also modify state variables to bypass authorization checks. For example, we showed this with the authenticated flag when first introducing the idea of buffer overflows. He might also modify interpreted strings used as part of subsequent commands sent to other programs. For example, server programs that communicate with databases will often do so using SQL. SQL commands may be overwritten by buffer overflows to give the attacker access to arbitrary portions of the database.
So far, we've just been interested in what happens when you write past the end of a buffer. But a bug could also permit reading past the end of the buffer. This might leak secret information. As an example, consider this program. The program is going to read into buf from standard input, then it will echo back the number of characters specified. Here, we're reading in an integer. We first read into the buffer, and then call the atoi function to convert the contents of that buffer, a string, into an integer length. Next, we read in a message. Finally, we echo back that message by iterating up to the length specified, printing out the characters one at a time. Where's the problem? The problem is that the length that was specified in the first read may exceed the length of the message provided in the second read. If it does, the program is going to print out characters beyond what was written. Here's an example run of this program. We start the server, enter a number, and then enter a message. In this case, the number does correspond to the length of the message, and the program echoes back the message as expected. Here the number is slightly less than the length of the message and, as expected, fewer characters are returned.
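One way to avoid the over-read described above is to clamp the requested length to the length of the message actually received before echoing. The following is a minimal sketch under that assumption; the function name bounded_echo and its parameters are hypothetical and are not taken from the program in the lecture.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most `requested` characters of msg into out, but never more
   than msg actually contains and never more than out can hold. The result
   is always NUL-terminated when out_size > 0. Returns the number of
   characters copied. */
size_t bounded_echo(const char *msg, size_t requested,
                    char *out, size_t out_size)
{
    size_t msg_len = strlen(msg);
    size_t n = requested < msg_len ? requested : msg_len;

    if (out_size == 0) {
        return 0;
    }
    if (n > out_size - 1) {
        n = out_size - 1;  /* leave room for the terminating NUL */
    }
    memcpy(out, msg, n);
    out[n] = '\0';
    return n;
}
```

A request for 100 characters of a 5-character message then returns only those 5 characters, rather than whatever happens to follow the message in memory.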
In a third run, the number is greater than the length of the message, and we can see that extra data is printed out beyond what was entered. This data is leaked. It was whatever was read in previously.
The Heartbleed bug is an example of a high-profile buffer overflow, discovered in early 2014, that involves reading data rather than writing it. By some estimates, Heartbleed affected nearly 600,000 servers on the Internet. The bug was in the implementation of the so-called heartbeat functionality of the SSL protocol. This functionality allows a client to send a heartbeat message to the server, asking it to respond to confirm the connection is still active. The heartbeat message contains a length field that indicates the length of the portion of the message to echo back. The bug in the SSL server was that it did not check that the length was accurate. In fact, it could be much longer than the heartbeat message itself. By specifying a long length, the attacker could get the buggy server to read beyond the buffer containing the heartbeat message and therefore return whatever was in nearby memory. Depending on the activities of the server prior to the overflow, nearby memory could contain things of interest to the attacker, such as passwords, cryptographic keys, or other items specific to the server using SSL.
Another interesting memory bug occurs when dealing with stale memory. A dangling pointer bug occurs when memory is freed but the program continues to use a pointer to it. An attacker may be able to arrange for the freed memory to be reallocated and placed under his control prior to the program using the pointer that was previously freed. So here's an example. We have a struct again with a compare function pointer in it. Here we allocate it and then free it. Now, suppose some time goes by and malloc is called, and it reuses the memory that we just freed, allocating it now to this buffer pointed to by q. The program stores some random value to it through q.
Worse, maybe the attacker can control the value that was stored through q. Now, later on, the program reuses p, despite the fact that it freed it, by calling the compare function pointer. In this case, it has to dereference the dangling pointer, and it is going to go straight to the memory that the attacker put there. In fact, it was just this sort of bug that played a huge role in the attack on Google, attributed to China, back in 2010. An invalid pointer was accessed after an object was deleted in Internet Explorer.

Format String Vulnerabilities

The final category of buffer overflow style attack we will consider is called a format string attack. It is named for the format strings used by the printf family of functions in the standard C library. A format string is typically the first, or one of the first, arguments to a printf-style function, and the remaining variable number of arguments comprise the data to be printed. Format strings use what are called format specifiers to indicate how data should be formatted. For example, the code snippet shown here prints out a record consisting of an individual's name and age. The first format specifier applies to the first argument following the format string, and the second specifier applies to the second argument. In this case, the first argument is a string and the second is an integer. There are many other kinds of specifiers too.
Now, let's see how a simple misunderstanding of how format strings should be used can lead to a serious vulnerability. We might wonder, what's the difference between this function and this function? We can see that on the first two lines the functions are identical. They allocate a character buffer on the stack, and they call fgets to read into it. The difference is on the third line. The first function passes %s as the format string and buf as the argument to print, whereas the second function forgoes using a separate format string altogether and just passes buf in its place. Now we might think to ourselves, buf is just a string in both cases, so why do I need the format string? Well, the important part is that buf might itself contain format specifiers. In the first case, if it does, those specifiers will just be printed out to the screen. In the second, they will be interpreted. If the attacker controls the format string, then the interpretation of those format specifiers can work to his advantage. So let's look at how printf is implemented and see how that might happen. Here we see a call to printf.
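A well-formed call of the kind discussed here might be written as follows. This is only a sketch: the wrapper function print_int_and_address is hypothetical, while the value 10 for i and the %d and %p specifiers follow the discussion.

```c
#include <stdio.h>

/* Prints an integer and its address. Each specifier in the format string
   has a matching argument: %d consumes i and %p consumes &i. Returns the
   number of characters printed, as printf does. */
int print_int_and_address(void)
{
    int i = 10;

    /* %p expects a void *, so the address is cast explicitly. */
    return printf("%d %p\n", i, (void *)&i);
}
```

The output begins with 10 followed by the address of i; the exact text produced for %p depends on the platform.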
The call is going to print out i and its address using the %d and %p format specifiers. The first is for printing an integer. The second is for printing a pointer. Here's what the stack might look like, with the topmost address range to the right and the stack growing from right to left as usual. We can see the arguments on the stack: first the &i argument, then the i argument, and then the format string, pushed in reverse order. printf takes a variable number of arguments and pays no mind to where the stack frame actually ends. It presumes that when you called it, you passed in at least the number of arguments specified in the format string. Here we have %d, which corresponds to the argument ten, and here we have %p, which corresponds to the argument &i. So all is well.
Now let's go back to our vulnerable function. Suppose we passed in the format string %d %x, and notice that there are no additional arguments provided. In this case, the stack will look as follows. We'll have just pushed the format string argument and that's all. Now when printf goes to interpret that string, it will read from the caller's stack frame for the %d portion, and then it will read again for the %x portion.
Let's think about some other format strings and what might happen. This format string will print out the four bytes above the saved instruction pointer. Why is that? Well, it turns out that printf accepts a space between the percent sign and the format character that's used in the specifier, so in this case it sees %d. In the next case, with a string specifier, it's going to print out the bytes pointed to by this stack entry. So it's going to look one past the saved eip, interpret those four bytes as a pointer, go to that memory address, and then print out the entire contents until it reaches a null terminator. This format string will print out a series of stack entries as integers, and this one will print them out as hex. Now here is the really terrible one.
This one will actually write the number three to the address pointed to by the stack entry. Why is that? Well, %n is a format specifier that is used to write out the progress that printf has made in printing to the output stream. In this case, it will have printed three characters, 100, and so it will write the number 3 to wherever the corresponding argument on the stack points. It's expecting to receive a pointer to an integer corresponding to the %n, but it's not actually going to get one. Instead, it's going to treat whatever is in that stack entry as the pointer and write through it. And as you might suspect, this is going to allow the attacker to do a remote code injection in certain circumstances.
So you might ask yourself, why is a format string attack like a buffer overflow? Well, we should think of it as a buffer overflow in the sense that the stack itself can be viewed as a kind of buffer. That is, all of the arguments passed to a function define a kind of buffer and its bounds. The size of that buffer is determined by the number and size of the arguments passed to the function. Providing a bogus format string thus induces the program to overflow the buffer as defined by the arguments. This vulnerability has been around for quite a while and continues to happen despite people knowing about it.
Now that we have seen the wide variety of buffer overflow style attacks that exist, it's time to trade our black hat for a white one and see how to defend against them. In our next unit, we will step back and look more carefully at what these attacks have in common. Then we will look at a variety of different defenses and evaluate their effectiveness. In essence, we will chronicle the cat and mouse game played by attacker and defender over the last couple of decades. By the end, we will see that unfortunately, when programming in C, the attacker still largely has the upper hand.
Fortunately, it is far more difficult today to generate an exploit than it used to be, and new methods for avoiding buffer overflow vulnerabilities are being developed.
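To make the difference between the safe and vulnerable uses of printf discussed in this section concrete, here is a minimal sketch. The function name print_safe is hypothetical, and the vulnerable form appears only inside a comment so that the sketch itself is safe to run.

```c
#include <stdio.h>

/* Safe: the user-controlled text is passed as data for the "%s" specifier,
   so any specifiers inside it are printed literally rather than
   interpreted. Returns the number of characters printed, as printf does. */
int print_safe(const char *buf)
{
    return printf("%s", buf);
}

/* Vulnerable form, shown only as a comment:
 *
 *     printf(buf);
 *
 * Here buf itself is the format string, so specifiers such as %x or %n in
 * attacker-supplied input are interpreted by printf and consume stack
 * entries that were never passed as arguments. */
```

Passing the text %d %x %n to print_safe simply prints those eight characters. Some compilers can also flag the vulnerable form at build time; GCC and Clang, for example, offer the -Wformat-security warning.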

Monday, August 31, 2015

Software Security - Week 1
