Monday, August 31, 2015

Software Security - Week 3 - Notes

Security for the Web - Introduction

So far, we have focused on programs written in C and C++. We've examined how vulnerabilities in programs written in these languages can lead to attacks that violate memory safety, resulting in remote code injection or theft of sensitive data. We've also looked at various defenses against these attacks. One effective defense is to use a memory-safe programming language.

In this unit, we turn our attention to internet security, focusing on applications that are part of the World Wide Web. While many web applications are implemented in type-safe languages and thus avoid memory safety issues, they have their own sets of problems. These problems go by names like SQL Injection, Cross-site Scripting, Cross-site Request Forgery, and Session Hijacking. Interestingly, the issues underlying web vulnerabilities are sometimes very similar to memory vulnerabilities. For example, SQL Injection and Cross-site Scripting arise because, as with buffer overflows, the application's failure to properly validate its input results in it treating data as if it were code.

The modern web, sometimes called Web 2.0, brings in the additional complication of mobile code. In particular, when you visit a website, code from that site will be silently downloaded to your machine and then run. Well, how do you ensure that this code does not violate the security of other programs running in your browser or on your machine?

The outline for this unit is as follows. First, we will look into the technical details of the basic World Wide Web. We will see how typical web applications are structured at the client and at the server. We'll examine HTTP, the Hypertext Transfer Protocol, which clients and servers use to communicate.
We'll see how improper interaction with the database on the server can enable an attack called SQL Injection.

Next, we'll look at how web applications implement ephemeral, that is, short-lived, non-persistent state. Such state is useful during a long-lived session. Typically, ephemeral state is implemented using hidden form fields and cookies. Unfortunately, sloppy use of these features can lead to attacks such as Session Hijacking and Cross-site Request Forgery, or CSRF. With these attacks, an adversary can take over a user's account or manipulate an application to act for that user, but in the adversary's interests.

Finally, we'll look at so-called Web 2.0, which characterizes the modern web. The modern web makes heavy use of mobile code, often written in the language JavaScript. JavaScript programs originate at a server but run at the client. Running code at the client creates new possibilities for attack, one of which is called Cross-site Scripting, or XSS. With XSS, a user is tricked into running code he thinks is from a trusted web site, but in fact it's from a malicious source.

Throughout the unit we will look at various defenses against these attacks. One common theme of all of the defenses should be familiar: validate your input. So, let's begin by looking at the basics of Web 1.0.

Web Basics

So let's begin by going over the basics of the World Wide Web. The World Wide Web consists, roughly speaking, of two sorts of participants, clients and servers, which interact with one another. Clients are things like laptops, desktops, and mobile phones, all of which are interested in content that's provided by servers. And servers are things like shopping web sites, Amazon for example, or information web sites like Wikipedia, blogs, things like that. The client runs a browser, such as Internet Explorer, Chrome, or Firefox, and at the server, one runs a web server to serve the content. Web servers are things like Apache HTTPD, Nginx, IIS by Microsoft, and so on.

The server often maintains a database which keeps track of the information that it's serving. And the client might also maintain a bunch of private data that's relevant to the interaction it's having with the server over the web. The database is often a separate entity, logically and sometimes even physically. So, for example, a database might be MySQL, which is a particular database management system, or Postgres, or SQL Server. Much of the user data is stored as part of the browser, or might be stored as files that the browser later has access to.

When the browser interacts with the web server, it does so via what's called a uniform resource locator, or URL. And we see one here: this is the URL for my home page at the University of Maryland. The first part of the URL is the protocol, here http, which we'll talk about in a minute. The second part is the address of the host that is serving the content, that is, where on the internet the server is. This is translated into an Internet Protocol, or IP, address by the Domain Name System, or DNS. So when the user types in the URL, the host name is provided to DNS, which will look it up and return an IP address, which (for IPv4) is just a 32-bit integer. With this information, the browser knows that it should connect to this Internet address using the HTTP protocol.

The remaining part of the URL is the path to the particular resource that the client is interested in the server providing. Here it's the rest of the URL, through index.html. If we look at just the last part, index.html, we can see that it's a file. It's static content, as the html suffix suggests, so basically the server is just going to fetch this file and return it to the client. Because it's hypertext, HTML, the browser will render that file on the screen. And the user can interact with it, clicking on links, filling in forms, and so on.

Another sort of URL might have a different path, in this case ending in delete.php, and the content at that path is different. delete.php is a program written in the PHP programming language, and the content is actually determined by running the program. PHP runs on the server side, and it will query the database to acquire information. So if the database changes because other users interact with it, then the next time you go to delete.php, you might get different content shown in your browser. So this is a dynamically generated file; it's content that's created on the fly.

Now, sometimes URLs also have what are called arguments, which direct how the server is going to do the rendering for dynamic content, that is, how it's going to produce the page that's ultimately returned to the client. So here we have a couple of arguments: f, which is set to joe123, and w, which is set to 16.

Okay, so the browser communicates with the web server using the protocol that's specified in the URL.
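
The parts of a URL described above (protocol, host, path, and arguments) can be pulled apart with a few lines of Python. This is only a sketch: the URL below is a hypothetical one made up for illustration, not the one on the slide, and only the argument values joe123 and 16 come from the lecture.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL for illustration only; the host and path are assumptions.
url = "http://www.example.com/account/delete.php?f=joe123&w=16"

parts = urlsplit(url)
args = parse_qs(parts.query)  # maps each argument name to a list of values

print("protocol:", parts.scheme)     # the part before "://"
print("host:    ", parts.hostname)   # the name that DNS resolves to an IP address
print("path:    ", parts.path)       # the resource requested from the server
print("args:    ", args)             # the arguments after the "?"
```

Note that parse_qs returns a list for each argument because the same name may appear more than once in a URL.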
The most common one, the one we'll focus on, is HTTP, which stands for Hypertext Transfer Protocol. It's a so-called application-layer protocol in the OSI network model, and it runs on top of TCP, the Transmission Control Protocol, which can exchange data reliably across a network, even an unreliable one.

So what will happen is that a user might be viewing a web page and click on a link or button. As a result, an HTTP request will be sent to the server specified in the URL associated with that click. The URL is accompanied by headers in the actual HTTP request, and we'll look at those in a moment.

There are two types of requests we'll consider, GET and POST. With a GET request, all of the data is in the URL itself, this data being what determines which file or content is returned. Importantly, a GET is not supposed to change the data that's stored on the server; it only causes content to be produced and sent back to the client. A POST, on the other hand, which can happen when you fill in a form and then submit it, may allow the server's state to be changed. In other words, it may have side effects.

So here's an example of an HTTP GET request. We can see the URL at the very top and, most importantly, the first two lines, which say that it's a GET request, that it's going to the R Security path using HTTP version 1.1, and which host it's going to get it from. Another interesting header that's shown here is the User-Agent field. This typically identifies the browser, so the server can use it, for example, to provide content that depends on the browser. When you go to download an open-source program and the site knows that you're running on a Mac, it knows this by looking at the User-Agent field in the header.

Okay, so let's suppose we are at this site, we follow that link, and this content is then shown to us in our browser. And we want to click on this "worst DDoS attack of all time" link that's shown here. When we do that, it will generate an HTTP request for the URL that we clicked on. Notice something different here from the last set of headers we looked at. There's a so-called Referrer field (spelled Referer in the HTTP header itself), and it indicates the web page that was clicked on in order to generate the current HTTP request. This can be useful for the server to know whether or not the request was generated from content that it produced, say, rather than from something that somebody just typed into the browser. And this does have security ramifications; we'll look at those a bit later.

Now let's look at an HTTP POST request. Here we're posting to a course management site that allows you to have online discussions and so on. We can see that the request at the very top has POST as the request type, along with the URL again, followed by the host. Much of the rest of the request looks similar. Here's one part that's different: the very end includes data that is explicitly part of the request content. This part of the request might be populated by form fields that are filled in by a user, for example when responding to a blog post. So you can see here, at the very right, a bit of HTML, "interesting. Perhaps it has to do with" and so on, that someone might have typed into a text box, and that then got included in this POST request when they clicked submit.
On the other hand, you can still also have content included in the URL, just as you might for a GET request.

Okay, so once you've submitted a request, you're going to get back a response that the browser is going to render. What does the response look like? Well, it will contain a status code, some headers, and then the data. It may also contain some cookies. We'll talk a whole lot more about cookies later, but roughly speaking, these represent state that the server would like the browser to store on its behalf, which helps maintain the notion of a session, or other sorts of memory that the server would like to have for later interactions with the client.

So here's an example response. At the very top we can see the HTTP version, the status code, and the reason phrase. Here the status was 200, which means that it could find the page that we asked for, hence OK for the reason phrase. Then there's a whole bunch of headers, and finally the data at the very end. You can see Set-Cookie here as part of the headers; these are the cookies I was speaking about before. And you can see things like, at the very end, Content-Type: text/html, which indicates the type of the data that's being provided. The browser can then use this content type specifier to know what to do with the data, that is, how to render it in the user's browser window.
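
To make the shape of these messages concrete, here is a small Python sketch that builds the text of a GET request and picks apart the text of a response. It does no networking. The host, path, header values, and cookie below are made up for illustration and are not the ones on the slides; build_get_request and parse_response are hypothetical helper names.

```python
def build_get_request(host, path, user_agent, referer=None):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",        # request type, path, protocol version
        f"Host: {host}",               # which host the request is for
        f"User-Agent: {user_agent}",   # typically identifies the browser
    ]
    if referer is not None:
        lines.append(f"Referer: {referer}")  # page whose link was clicked
    return "\r\n".join(lines) + "\r\n\r\n"   # a blank line ends the headers


def parse_response(raw):
    """Split a raw response into version, status code, reason, headers, data."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = []  # a list of pairs, since a header such as Set-Cookie can repeat
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return version, int(code), reason, headers, body


request = build_get_request(
    "www.example.com", "/index.html", "ExampleBrowser/1.0",
    referer="http://www.example.com/start.html",
)

# A made-up response in the same shape as the one described above.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Set-Cookie: session=abc123; path=/\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>hello</html>"
)
version, code, reason, headers, body = parse_response(raw_response)
print(request)
print(version, code, reason, headers, body)
```

Keeping the headers as a list of pairs rather than a dictionary is a deliberate choice here, so that several Set-Cookie headers in one response would all be preserved.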

SQL Injection

In a web application, the persistent, or long-lived, data resides in the database. This is things like personnel records, credit card numbers, or online inventories. And of course, this sort of data needs to be protected from illicit access and tampering. Even web applications that recognize this and do try to protect it sometimes fail because of a bug that allows a very clever sort of attack, called SQL injection. So, in this unit we're going to look at that attack, but to do so, we're first going to have to talk about how data tends to be managed, and how it's accessed using a special language called SQL.

So, typically for an online server application, we want to support ACID transactions on our persistent, long-lived data. What's a transaction? Well, it might be a transfer from one bank account to another, or a purchase of an item from an online store.

ACID stands, first, for atomicity. This is to say that transactions should complete entirely or not at all. That is, you should either completely purchase the book or not purchase the book, but you shouldn't pay the money and then not receive the book. Or, if you transfer $100 from bank account A to bank account B, you should not withdraw the $100 from bank account A but then fail to deposit it in bank account B.

Consistency means that the database is always in a valid state. That is, as far as other concurrent queriers of the database are concerned, all of the data in the database is in the state you expect. Intermediate states of transactions, like "I've withdrawn the money from one account but not yet put it in the next account," should not be visible.

Isolation basically says that the results of a transaction aren't visible to others until the transaction completes.

And finally, durability.
Once a transaction is committed, it should be persistent, durable. Even if there's a power failure or some other sort of failure, the effects of that transaction should persist.

Now, achieving ACID transactions is no easy feat. Fortunately, the research community and industry have done a great job over the last three or four decades putting together systems called database management systems, or DBMSes, for managing data and supporting transactions on that data with the ACID properties.

So, the standard way, or maybe not entirely the standard but a common way anyway, of storing data is in tables, and of accessing those tables is the Structured Query Language, or SQL. Here on this slide we have an example of a table. The name of the table is users, and the table consists of a bunch of records, where each field of a record is referred to as a column. Here each record is a 5-tuple, with five elements: name, gender, age, email, and password. And we have four records stored in the users table.

Now, we can use SQL to read from this database and to write to it, updating it in various ways, using different SQL commands. The most common SQL command is probably select, which gives you a way of querying the contents of a database. This particular select command says: look at the age fields in the users table and select all of them where the same record has the name D. So, looking at the table, we can look at the name column to find the users whose names are D, and then select the age column from those records. In this case, the answer would be 28.

Another SQL command is update, and this gives us a way to modify records that are already in the database.
So this update command says we want to set the email address of all users whose age is 32 to the address given in the command. If we look at the users table for all of the users whose age is 32, there is only one. It is the third record, Charlie, and so we should update the email address of that record to the new value.

Another command is insert. This gives you a way to add new records to the table, and here we're inserting into the table a record for Frank, who's male, 57, and so on. Finally, there's the drop command, which gives you a way of removing data from the database. Here we show how drop can be used to delete an entire table from the database.

Now, server-side code will interact with the database using SQL, and one common way that server-side code is written is using a language called PHP. PHP looks like a hybrid of HTML, which is the standard markup language for rendering web pages, and some extra code that allows the PHP program to query the database and substitute the results of that query into variables that are eventually rendered. So this is a very convenient and popular way of generating dynamic content: ask a database questions and then put the results in the generated web page.

So here on the slide we see a web page with a couple of fields, username and password, and a button. And we can suppose that when the user clicks on the login button, he will have filled in the username and password fields, and these will have been included as arguments to the URL that was sent along with the HTTP POST request. So let's suppose, then, that a user types in their username and password and sends that information along. That information is going to be included in the user and pass variables shown in the PHP code. So, what is that code doing?
Well, it's logging in the user, and the query is going to select star from users. That is, it's going to get every record in the users table where the user's name is the user that was typed in and the password is the given password. So, let's suppose that you can fill in that user field however you like. The question is, could you exploit this situation by being clever in your choice of what the user field contains? In this particular case, the code is vulnerable to SQL injection. How?

Well, suppose we fill in the user field with this data: frank, close quote, OR 1=1, close paren, semicolon, dash dash. This text is going to be substituted in for $user inside the query that's being constructed as a string in the MySQL query call to the database. And after substitution, this is what the string will look like. We take the frank, close quote, and stick it in for $user, along with everything else: OR 1=1, semicolon, dash dash. What you can see here is that, because of the way the string is constructed, we've effectively modified the query that's going to be sent to the database. The frank, close quote, closes off the user name, as if the user were frank. The OR 1=1 is another element of the where clause for the query. And dash dash starts a comment, so it effectively causes the SQL interpreter on the database to ignore the "and password equals" portion of the query.

What's this query going to do in the end? Well, it's going to select every user whose name is frank or for whom 1 equals 1. Well, 1 always equals 1. And so this condition will always hold, and effectively the query will return the entire contents of the users table.

You can even chain together statements with a semicolon, here by adding semicolon, drop table users, dash dash.
Once we substitute that in, we'll do the select to grab all the records from the database and print them out, and then we will delete the table by running drop table users. So this is very bad. A very obvious way of coding things, just putting in the user and the password, has opened us up to the possibility that malicious input can effectively change the form of the query and cause private information to be returned, or sensitive and important information to be deleted.

SQL injection attacks are quite common. They're less common than they were a while back, but they're still a significant source of vulnerabilities. So at this point you should know enough to understand this xkcd comic. The great thing about the comic is that at the very end, it indicates the solution to the problem, a solution we've seen many times before, which is to sanitize your database inputs to prevent these sorts of injection attacks from taking place. And, in fact, that is exactly what we'll look at in the next unit.
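
The attack described above can be reproduced in a few lines. This is a sketch under stated assumptions: it uses Python with an in-memory SQLite database rather than the PHP and MySQL code on the slide, the rows are made up, the query omits the parentheses the spoken injection string implies, and login_vulnerable is a hypothetical name. Python's sqlite3 execute call also accepts only a single statement, so the chained "drop table users" variant is not reproduced here.

```python
import sqlite3


def make_users_db():
    """Build an in-memory users table with made-up rows (not the slide's data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (name TEXT, gender TEXT, age INTEGER, "
        "email TEXT, password TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
        [
            ("alice", "F", 41, "alice@example.com", "a-secret"),
            ("charlie", "M", 32, "charlie@example.com", "c-secret"),
            ("frank", "M", 57, "frank@example.com", "f-secret"),
        ],
    )
    return conn


def login_vulnerable(conn, user, password):
    # Vulnerable on purpose: user input is pasted straight into the SQL text,
    # so one string ends up mixing code and data.
    query = (
        "SELECT * FROM users WHERE name = '" + user + "' "
        "AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()


conn = make_users_db()
honest = login_vulnerable(conn, "frank", "f-secret")        # correct password
wrong = login_vulnerable(conn, "frank", "guess")            # wrong password
injected = login_vulnerable(conn, "frank' OR 1=1 --", "guess")

print(len(honest), len(wrong), len(injected))
```

With the injected input, the quote closes the name string, OR 1=1 makes the where clause true for every row, and the dash dash comments out the password check, so the whole table comes back even though the password was wrong.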

SQL Injection Countermeasures

All right. So if we're going to understand how to defend against SQL injection attacks, we have to understand the underlying issue that makes those attacks possible, and that issue is the following: one string combines code and data. This is similar to what happens with buffer overflows. When a user types in more characters than a buffer can hold, then in an attack like a stack smashing attack, those extra characters overrun the buffer and corrupt control-flow data, like the return address or function pointers, which can then be used to manipulate what the program does. In the same way, we can think of the boundary between code and data as being blurred in an SQL injection; in a sense, we're overflowing the contents of that user field and inserting new code to change the structure of the query. So in general, when the boundary between code and data blurs, we open ourselves up to vulnerabilities.

We can see the underlying issue by viewing the SQL query that we intend as a parse tree. If you look at that query, you can see that a select statement basically has three parts. First, what to select; in this case it's star, that is, everything in each matching record. Second, which table; in this case, the users table. And third, the where clause, which refines which records should be considered. So we represent that here as a tree, where the root node indicates the select command and its three children are the three elements of that command: star, for what to select; users, for which table; and finally the where clause.

Now, the where clause itself can be deconstructed into a tree. It's basically the conjunction of two equalities.
It's saying that the name field of the records selected from users should be equal to the contents of $user. And likewise, the password field, that is, the password column, should be equal to $pass. So these should be data and not code. That's our intention. Whatever we stick here is going to be a string that's checked for equality with name. Unfortunately, because of the way we're constructing this query, we actually create a different parse tree when we put in a clever input like frank, close quote, OR 1=1. And that's what we want to stop.

Okay, so we hinted at the end of the last unit that we should stop it by using input validation. We should check user input to make sure that it adheres to the form that we expect, and therefore only use data that's trustworthy. We can make data trustworthy, that is, validate it, in two ways. We can check that it has the expected form, or we can sanitize it, by modifying it or using it in such a way that the result is correctly formed by construction.

One kind of sanitization is blacklisting, and one kind of blacklisting is to delete the characters you don't want. A blacklist is a list of bad things, so what we could do is look in the input for these bad things and delete them. What are some bad things we could delete? Well, we could delete the quote, the semicolon, or the dash dash, because those were important elements of that SQL injection attack. Unfortunately, this is not going to work in some cases, namely when those characters have meaning in a reasonable context, like the name Peter O'Connor.

Another kind of sanitization is called escaping. Instead of deleting characters, escaping has you change problematic characters into safe ones.
So here we could replace the quote, semicolon, dash, and so on with escaped versions that won't be interpreted in a way that changes the structure of the constructed query. You can do this using libraries that are provided in the various frameworks you might program in, like PHP. Now, the downside here is that, even still, you may want to have some of these characters in your SQL, and so this approach is not always going to work.

Now, on the checking side, you could use a whitelist to check that the input is reasonable and simply reject it if it's not. For example, we saw in C programming that you can make sure that an integer is within the correct range, that is, that the length specified by the user is no greater than the actual length of the buffer it refers to, and reject the input otherwise. The idea here is that it's safer to reject an input than it is to fix it. The reason is that a fix may actually produce the wrong output; that is, an attacker may be able to manipulate the attempt to sanitize something so that the result suits his needs. So this follows the principle of fail-safe defaults, that is, do the simplest thing and reject the rest. We'll look at that principle in more depth later on.

It turns out that whitelisting can sometimes be hard because it's hard to specify what the whitelist should be. For example, if you only want to allow reasonable first names, well, how do you know what the reasonable first names are? Will you provide a specific dictionary that needs to be iterated through to check the names against? That could be both expensive and difficult to get right.

So, for SQL injection in particular, the preferred solution is to use what are called prepared statements.
The idea here is to treat user data according to its type, thereby decoupling the use of strings as code from the use of strings as data. So here's our query again, and here's the same query expressed as a prepared statement. The first line creates a handle to the database, and the second line creates a prepared statement template. You can see that this very much resembles the string that we were constructing before. But the prepared statement includes everything other than the parts that are to be filled in, and those it designates with question marks.

The filling in comes as a separate step, with the bind_param call. Here, the bind_param call takes a format string, which specifies how many arguments are to follow and what their types are. The statement has two question marks, so it expects two arguments, and therefore the format string has two format specifiers, here "ss", where s stands for string. So when user and pass are substituted into the statement created from the template, they will be treated as strings and not misinterpreted as code. And then the query can be sent to the database.

So, we bind the variables and we type them. As such, we have decoupled compiling the template from binding the data. Using a prepared statement, we would replace the original query shown above with this query. The binding is only applied at the leaves, where the purple parts are specified, leaving the structure of the tree fixed. So if we had filled in the user name with frank, close quote, OR 1=1, as we did in the beginning, it would be fine, because it would not change the structure of the query. It would simply be an odd user name to be looking up in the database.
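
The same idea can be sketched in Python. This is not the PHP bind_param code from the slide: it assumes an in-memory SQLite database with one made-up row, and it uses the sqlite3 module's question-mark placeholders, where the values are passed separately from the query text. login_safe is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("frank", "f-secret"))


def login_safe(conn, user, password):
    # The query text is a fixed template; the question marks mark the leaves
    # of the parse tree that will be filled in with data.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    # The values travel separately from the template, so they are only ever
    # compared as strings and cannot change the structure of the query.
    return conn.execute(query, (user, password)).fetchall()


ok = login_safe(conn, "frank", "f-secret")
attack = login_safe(conn, "frank' OR 1=1 --", "guess")
print(ok, attack)
```

Here the injection string is simply looked up as an odd user name, finds no match, and returns no rows.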
Now, prepared statements make it possible to eliminate many SQL injection attacks, but sometimes queries are complicated and mistakes can be made. So you have reason to use defense in depth to mitigate the impact of an SQL injection, should it happen.

One way to do that is to limit privileges, that is, to reduce the power of an exploit by not allowing the server application to do anything it wants when accessing the database. For example, when you connect to the database, you can indicate that only certain commands should be allowed: say, only SELECT queries on the Orders_Table, but not on the Creditcards_Table.

You can also choose to encrypt sensitive data stored in the database, so that if the database is somehow stolen, that data is going to be less useful. You may not need to encrypt some of the tables, like the Orders_Table. When encrypted data is selected from the database, it is then decrypted inside the server application using a key, a smartcard, or some other sort of mechanism. But while it's stored in the database, at rest, it's encrypted.

Web-based State Using Hidden Fields and Cookies

What we have seen so far has been pretty simple in terms of interactions: the client sends a message and the server responds. But how about multi-message interactions that might be part of a longer session? In particular, and perhaps from your own experience, you are used to a session lifetime in which the client connects, the client sends a request, the server responds, the client issues another request whose content is based on the server's prior response, the server responds again, and so on, until finally the client disconnects.

Given this experience, you may be interested to know that HTTP has no notion of state or memory. Each request-response pair is completely independent at the protocol level. At the least, this should make you wonder: how is it that I don't have to provide login credentials with each request so that the server knows it's me each time? The answer is that web applications themselves keep track of the relationship between requests and responses in a session. They do this by maintaining bits of state that relate the requests one to the next. Next, we'll look at how this is done.

So, in addition to the long-lived state that is stored in the database at the server, which will have things like credit card numbers, user account information, inventory, and things like that, the web application will maintain ephemeral state that aids server processing while a client is interacting with it. This is not long-lived state that must be durable; instead, it keeps track of what a client is doing during an interaction, to relate one request to the next. In order to maintain this state on the client rather than the server, the server will actually send an encoding of it back with each response.
And then the client can return that state to the server when it makes its subsequent requests. And there are two ways that we can implement things this way. One is to use hidden fields in the HTML documents that the server generates and that are communicated back and forth. And the other is to use something called cookies.

So, let's look at hidden form fields using an example for an online store. So here on the left we see a web page from the store. It's the order.php page. And let's suppose the user has decided to buy this pair of socks for $5.50 and clicks the Order button. Now the server will receive that request and send a page in response. And notice that this page says that the order is $5.50. So the user is going to confirm that order.

Now the question is, how will the server relate the $5.50 order that the user chose on the first page, so that when he or she clicks the Yes button on the second page the server makes the right order? Because HTTP is stateless, it's not remembering, technically speaking, one interaction to the next. And the answer is that the server can embed this information, the cost of the socks for example, in the web page that it responds with. That is, the pay.php page.

So here it is, presented to the user. And notice here we have what's called a hidden form field. So a form is just an HTML element that has a button associated with it that will produce a web request. And there are two non-hidden form values, yes and no. They're shown at the bottom of this HTML code: value yes, value no. And then there's this hidden one, the value $5.50. So this hidden value will get sent with the request when the user presses the button.
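The slide itself is not reproduced in these notes, so here is a rough sketch of what such a confirmation page, and the request body a browser would build from it, might look like. The markup, the field names price and pay (taken from the lecture's description), and the helper functions are assumptions for illustration, not the lecture's actual code.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical confirmation page modeled on the lecture's pay.php example.
PAY_PAGE = """
<form action="pay.php" method="post">
  <p>The total cost is $5.50. Confirm order?</p>
  <input type="hidden" name="price" value="5.50">
  <input type="submit" name="pay" value="yes">
  <input type="submit" name="pay" value="no">
</form>
"""


def form_body(price, pay):
    """Encode the form fields the way a browser would for a POST body."""
    return urlencode({"price": price, "pay": pay})


def read_form(body):
    """Server-side view of the submitted fields (first value of each)."""
    return {name: values[0] for name, values in parse_qs(body).items()}


# What the server sees when the user clicks the yes button.
submitted = read_form(form_body("5.50", "yes"))
```

The point to notice is that the hidden price travels to the client and back as ordinary form data, just like the visible yes and no values.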
Now here's the PHP code on the backend at the web server that will receive this web request. It will pull out the form field values: pay, indicating that this was a pay request, and price. And as long as the price was not null, it will debit the credit card that amount and deliver the socks.

Here's the problem though: price comes from the user. It's filled in by this form field value. Because the HTML is sent to the user, a clever and malicious user can change the value to be something far less than the vendor intended, and therefore corrupt the computation.

So we can get around that problem by using hidden form fields of a special variety that we call capabilities. So the server will maintain trusted state. Instead of sending the state to the client, the server will keep track of the state, but it will index that state by a capability that gives the client access to it.

And what is a capability? Well, a capability is a right. It's a piece of data that gives a client who possesses it a right to perform some action, and that capability should be unforgeable. By definition, that would prevent the attack that we saw before, which was that the client was able to change the price. Because the capability is intended to be, and should be, unforgeable, the client will not be able to change it to effectively change the price. So, given that capability, the client will reference it in subsequent requests, and therefore be able to access the state.

To make capabilities unforgeable, a typical approach is to make them large random numbers. So they are difficult to guess, and therefore if a client does attempt to make a guess, the client is very unlikely to find a number that corresponds to a real capability, and the guess therefore has no power.
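As a rough illustration of the capability idea just described, the following Python sketch keeps the price in server-side state and hands the client only a large random identifier. The names PENDING_ORDERS, start_order, and confirm_order are hypothetical, and the lecture's own backend is written in PHP; this is a sketch under those assumptions rather than the lecture's code.

```python
import secrets

# Trusted, server-side state: capability (SID) -> price in cents.
PENDING_ORDERS = {}


def start_order(price_cents):
    """Record the price on the server and return an unguessable SID."""
    sid = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
    PENDING_ORDERS[sid] = price_cents
    return sid


def confirm_order(form):
    """Charge the price stored under the SID; ignore any client-sent price."""
    # pop() makes the capability single-use.
    price = PENDING_ORDERS.pop(form.get("sid", ""), None)
    if form.get("pay") != "yes" or price is None:
        return "transaction cancelled"
    return f"charged {price} cents"
```

A client that edits the form can now only change the SID, and a changed SID almost certainly matches no pending order, so the transaction is cancelled rather than charged at a forged price.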
So here's what the page looked like before, and here's how we might change it to use capabilities instead. So now the name is SID, for the capability, and the value is a random number. On the server side, we will modify the code to look up the SID to find the price, and only if the SID is legal will the price be present, in which case we'll bill the credit card. If it's not there, then we'll go to the else case and cancel the transaction.

So, capabilities of this sort can take us quite a long way, but they aren't perfect. We don't like to have to pass around hidden fields all the time. It complicates the interaction amongst all the pages. It's sort of difficult to put together a web application that way. And it has the big drawback that, if you ever close the browser window, you throw away the HTML that contained the hidden form fields. And therefore if you reopen your browser and reconnect to the site, all memory of your prior interaction is gone.

So we can solve these problems by using what are called cookies. Just like with capabilities, the server will maintain some trusted state. And that state will be indexed by a cookie rather than a capability. Just as with hidden form fields, the server will send cookies along with any responses. And the client will then store them locally. When the client reconnects to the server, it will send the cookies along with its requests. Now instead of these cookies being embedded in an HTML page, they're sent around as part of the HTTP protocol, and they're stored on disk in an area that's associated with the browser. And so it doesn't matter if you close the page, and it doesn't matter exactly what's on the page during the interaction.

So here's an example HTTP response that contains cookies.
Many of the headers, as we can see here, have the form Set-Cookie. What follows is key=value, which indicates that the cookie key is the key, and it's associated with that given value. And then there are a bunch of options that specify things like timeouts and paths and hosts and so on.

So let's dig into that and look at this example in further detail. So here the client receives that HTTP response and it sees this Set-Cookie header. And now it's going to process it. The first thing it does is it says, well, the key is edition and the value is US. And the options for that cookie are that the value expires as of the given date, perhaps when the session expires. And the cookie is associated with the given domain, and with URLs at that domain whose path begins with slash. So in short, whenever a client interacts with the server via his browser, if the interaction is with the given domain and with a path that has the given prefix, then this cookie should be sent along with that HTTP request.

So, in particular, here is that HTTP response that was received by the client. We see a bunch of cookies being set here. In a subsequent visit, notice we're visiting the same domain and the root directory, and at the very bottom there we see the Cookie header that says, well, we're going to provide this session zdnet production cookie, and we include its value. And you can see that this matches the value that's given on the last line of the response at the top. It's followed by another cookie, which is zd region, followed by further data that's, again, shown at the top, and so on. So all the cookies that are relevant are going to be provided along with the request.

There are several reasons that web applications use cookies.
The most common use, as we've hinted at already, is for a cookie to act as a session identifier. Basically, after the user logs in as part of a POST request, the application sends a cookie in response that identifies the user. The cookie is sent in subsequent requests to the same web application, so that it can silently authenticate the user each time. The human user, of course, is unaware that this is happening and interacts naturally with the system.

Another use of cookies is personalization. Shopping websites, for example, are interested in showing you things that you are interested in. They can figure out your interests based on past interactions. Based on observations, a site can create a cookie that identifies various interests. This cookie can be used to prioritize the display of various elements of the site, effectively personalizing the site to you. Personalization can even go to the level of font choices and other superficial elements of the display. The nice thing about cookies is that these can be anonymous. Such preferences are not security sensitive, at least from the site's point of view, and so no authentication of the particular user is needed.

Of course, the flip side to personalization is tracking. Instead of personalization cookies being used only by a particular site, they can be made available to other interested parties, like advertising networks. How can this work, given that cookies should only be visible to the site that created them? One way is the following. Site A uses advertising network B to show an ad. When B receives the request from your browser to show the ad, it can figure out that the ad was displayed when visiting site A. How? Well, by looking at the Referer header of the HTTP request. For example, the request to show an image.
Site B would like to associate you with a list of sites, and the pages on them, that you tend to visit. In this case, it would like to include site A. One way it could do this is to maintain a list in a database mapping IP addresses to site lists. The idea would be that you are associated with a particular IP address, and the list is then associated with that address and therefore with you. But doing this isn't so reliable, because you are not associated solely with one IP address, and in fact different people might use the same address.

Instead, what will happen is the ad network will store the site list as a cookie on your machine. This cookie is called a third-party cookie because it is associated with site B, but was created when visiting site A. This way, when you visit other sites that happen to use the same ad network, the ad network's cookie will be accessible to the ad. It can then add the current site to the cookie. It can even customize the ad shown based on previously visited sites, which reveal your interests.

Now, one way to prevent this sort of thing is to disable third-party cookies. But this method is not perfect, thanks to the ability to otherwise fingerprint your browser. But that's a topic for another time. Let's get back to considering session cookies and how we can protect them.
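To tie the cookie mechanics from this section together, here is a small sketch of how a Set-Cookie header of the kind discussed above could be parsed, and how the domain and path options decide whether the cookie accompanies a later request. It uses Python's standard http.cookies module. The header value, the domain example.com, and the should_send helper are made up for illustration, and the matching rule is deliberately simplified compared with what real browsers do.

```python
from http.cookies import SimpleCookie

# Hypothetical header value; the domain and date are placeholders.
header = (
    "edition=us; path=/; domain=.example.com; "
    "expires=Wed, 18-Feb-2026 16:44:27 GMT"
)

cookie = SimpleCookie()
cookie.load(header)
morsel = cookie["edition"]  # one key=value pair plus its options


def should_send(morsel, host, path):
    """Simplified matching: domain suffix and path prefix only."""
    domain = morsel["domain"].lstrip(".")
    host_ok = host == domain or host.endswith("." + domain)
    return host_ok and path.startswith(morsel["path"])
```

Under these assumptions, the edition cookie would be sent to any host under example.com for any path beginning with a slash, and to no other site.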

Session Hijacking

As we've already mentioned, an extremely common use of cookies is as session identifiers, which associate a user with a multi-interaction session with the website. The basic idea is simple. The user first logs into the website, for example, using a username and password. On successful login, the server sends back a session cookie with the response. Subsequent requests to the same site will also send along the session cookie. This cookie will either go in an HTTP header or will be explicitly included in a hidden field. At this point, the server knows who it's talking to.

This approach makes the session cookie a prized commodity for attackers. In effect, a session cookie is a capability to access a site with the credentials of a particular user. Thus, this capability needs to be protected from theft. Otherwise, an attacker can impersonate the user and perform actions on her behalf. Such actions could result in lost or corrupted data, like your bank account balance.

To see how theft could happen, consider where cookies show up in an interaction. First, they are generated and possibly stored at a server. Then, they are transmitted between the server and client and back again. Finally, they are stored at the client for later use. As such, cookies could be stolen by compromising either the server or the client and copying them. Or, if the server's algorithm for generating cookies is known, an adversary could predict what a particular user's cookie is. Cookies could also be copied by sniffing the network to observe them in transit. Or the network could be manipulated into sending the cookies to an adversary directly, using techniques like DNS cache poisoning.

Cookie release due to compromise can be prevented by building more resilient clients, servers, and DNS caches.
In other words, avoid buffer overruns and the other sorts of errors we've already seen.

Now, cookie release due to sniffing can be avoided by using encrypted connections. In particular, sensitive interactions after a login should use secure HTTP, called HTTPS, which encrypts the communications, including the cookies. In fact, by setting the Secure attribute of a cookie, you can ensure it will only be sent over HTTPS, and never over plain HTTP. This fail-safe prevents a coding mistake from inadvertently revealing a session cookie.

To avoid allowing the adversary to guess what a cookie will be, applications should generate cookies that are long, so that there are many possibilities for the adversary to try, and random, so that the particular possibility is hard to predict. Note that the same guidelines should hold for generating SIDs in hidden form fields.

Now if, despite these protections, a cookie is stolen, session hijacking can be further defended against by not using only cookies to identify a user's session. Instead, you can build your application to require correlating information on the site that identifies a user's current interaction. For example, if the user is currently looking at a web page that shows his bank accounts, but then a request comes in asking for a transfer, the web site should be able to tell that the request is not valid. This can be done by storing hidden fields on pages, or by using the Referer header, so that impossible requests, even with a proper cookie, are rejected. This same sort of protection is used for CSRF attacks, which we will discuss shortly.

There are yet more mitigations against attacks due to stolen cookies. To motivate them, consider a recent Twitter vulnerability. Twitter uses a single cookie, called auth_token, to identify a user.
This cookie is computed from the user name and password. Now, this approach suffers from two weaknesses. First, the auth_token does not change from session to session. Second, it does not become invalid when the user logs out. This means that stealing the cookie lets an attacker hijack a user's account indefinitely.

Now, the defenses we've already discussed can reduce the chance that a cookie is stolen or that it is used inappropriately. But there are two more defenses we can add. First, do not allow session cookies to go on indefinitely. They should have an expiration date. Now, this is like the expiration date on your credit card. It cannot be used after that date. The other defense is to direct a cookie to be deleted from a user's machine, and from the server, once a session ends. This reduces the time that a cookie is exposed to possible theft due to compromise.

Finally, let us consider a non-defense. You might think that you can neuter an attack by tying a session to a particular IP address. That is, a user logs in from one machine, and if a request comes from some other machine, it must be bogus and should be rejected. Now, this defense will work most of the time, but there are sufficient problems that it is not usually deployed. In particular, a user's IP address might legitimately change during a session. A mobile device could legitimately roam, for example, between networks. It could also be forced to renegotiate an address using the DHCP protocol for other reasons. In these cases, a user would suddenly be denied access.

In addition to these false positives, relying only on network addresses will miss some attacks, too. One example is when a user's machine is behind a Network Address Translator, or NAT device. In such cases, hosts beyond the NAT box will view all clients behind the NAT box as having the same address. The NAT box internally translates this address into different local addresses. As such, in a setting like an Internet cafe, it's not unlikely that all machines in the cafe will have the same external address. And as such, one machine in the cafe could hijack a session of another machine in the same cafe, for example by sniffing the network. Once again, the right defense here is to protect the session identifier.

Next, we'll look at a related form of attack called Cross-Site Request Forgery, or CSRF, that requires similar and complementary defenses.
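The session-cookie guidelines from this section (long and random identifiers, the Secure attribute, an expiration time, and invalidation on logout) can be sketched in a few lines of Python. The cookie name session, the SESSIONS table, the 30-minute lifetime, and the function names are all illustrative assumptions, not a description of how any particular site does it.

```python
import secrets
import time
from http.cookies import SimpleCookie

SESSIONS = {}            # session id -> (username, expiry timestamp)
SESSION_LIFETIME = 1800  # seconds; an arbitrary illustrative value


def log_in(username):
    """Create a fresh random session id and a Set-Cookie header value."""
    sid = secrets.token_urlsafe(32)  # new for every session, hard to guess
    SESSIONS[sid] = (username, time.time() + SESSION_LIFETIME)
    cookie = SimpleCookie()
    cookie["session"] = sid
    cookie["session"]["secure"] = True  # only ever sent over HTTPS
    cookie["session"]["max-age"] = SESSION_LIFETIME
    cookie["session"]["path"] = "/"
    return sid, cookie["session"].OutputString()


def current_user(sid, now=None):
    """Return the user for a live session, or None if unknown or expired."""
    now = time.time() if now is None else now
    username, expires = SESSIONS.get(sid, (None, 0))
    if username is None or now >= expires:
        SESSIONS.pop(sid, None)
        return None
    return username


def log_out(sid):
    """Invalidate the session on the server so a stolen copy is useless."""
    SESSIONS.pop(sid, None)
```

Unlike the auth_token design described above, the identifier here changes with every login and stops working once the user logs out or the lifetime passes.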

Cross-site Request Forgery - CSRF

Recall the two main kinds of web request: GET and POST. GET requests are meant to be reads of the server state. As such, they are not intended to effect modifications to that state. Nevertheless, they often do just that.

With this in mind, consider the URL to the banking website shown here. Suppose a user is logged into this site with an active session. What if an attacker is able to trick the user into visiting this link? The outcome for the user is not good: an unintended bank transfer out of his account. The question is, what would convince a user to visit this link?

So, here's how this could happen. Suppose the client is logged in to the banking website, and at the same time is surfing the Internet, and ends up at a malicious site, which returns a page back to the user. And that page contains an image tag, where the tag includes a reference to the URL that we saw before. Now the browser, upon seeing the image tag, will automatically visit the URL to obtain what it believes will be an image. So, it will go to the banking website and send the request. Now, normally, if the user was not logged into the banking web site, the bank would reject the request, because the user was not authenticated. But if the user happened to be logged in at the same time as visiting the malicious site, then this request will be accompanied by the session cookie that says that the user was authenticated. And as a result the bank will dutifully perform the request.

This kind of misdirection attack is called a cross-site request forgery, or CSRF. The target of the attack is a user with an account on a vulnerable server. The goal of the attack is to issue requests on the user's behalf that look to the server to be legitimate. To ensure legitimacy, the requests are issued from the user's browser, which will send along the needed session cookies.
For the request to come from the browser, the user must be tricked into clicking a link while logged into a sensitive site. In the previous depiction, this happened when visiting a malicious site, which caused a request to be sent to a URL embedded in an image tag. The user could also be tricked into clicking a link in a spam email, which will then get sent by the browser. The link could be disguised, for example by email formatting, to look benign. CSRF works because certain sorts of request to the vulnerable site have the same structure, minus the session information. Let's look into this in more detail.

So, how can we protect against CSRF attacks? One way to do it is to pay attention to the Referer field that's set by the browser when it sends an HTTP request to the server. Recall the request that we saw in the example earlier in this unit, where we clicked a link on the site. Here we can see that the Referer is filled in with the original page on which that link resided. What a server could do is check, when it receives a request, especially one that's sensitive, that the Referer field only includes URLs for that site, or for any trusted location from which the link could've been generated. In our example, the Referer should not include the malicious site, and so the server would reject the request. So, the pages that a user could legitimately reach should be allowed as Referer fields.

The problem here is that the Referer field is optional. Not all browsers send it along. One way to deal with this problem is to use what's called lenient Referer checking. That is, we should block requests with a bad Referer, as in our prior example, but allow requests with no Referer, for example because the browser just doesn't include it.
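A minimal sketch of lenient Referer checking, as just described, might look like the following. The trusted host name bank.example and the function name are hypothetical; the point is only to show the three cases of a good, a bad, and a missing Referer.

```python
from urllib.parse import urlparse

TRUSTED_HOSTS = {"bank.example"}  # hypothetical host name for illustration


def referer_allowed(referer, lenient=True):
    """Decide whether a sensitive request should be accepted."""
    if not referer:
        # Missing header: accepted only under the lenient policy.
        return lenient
    # Present header: the page it names must be on a trusted host.
    return urlparse(referer).hostname in TRUSTED_HOSTS
```

Comparing the parsed host name, rather than searching the header for a substring, matters here: a Referer such as https://bank.example.attacker.example/ contains the trusted name but comes from a different host.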
So, then the question is, is a missing Referer always harmless, assuming that the browser is legitimate? Unfortunately, the answer is no. Attackers can be clever in sending redirect requests and other protocol messages to cause the Referer to be removed. For example, an attacker can bounce a user off an FTP page that the attacker controls, and the FTP request will not include a Referer header. The attacker could also exploit a browser vulnerability and hack the browser to not include the Referer field. Or the attacker could mangle the web request in transit by snooping on the link.

Another approach is to use secretized links. The idea here is similar to hidden form fields, where we used capabilities. Here the hidden form field will include a secret that the attacker has a difficult time guessing. So recall that the attacker relies on the cookie already being present in the user's cache, so that when the request to the remote site is initiated, the cookie goes along with it. But the attacker does not know what the contents of the cookie are. The attacker doesn't know what is expected on the page that would normally allow such requests. So if in that page we can embed a secret that the attacker wouldn't know, and the server only allows the request if that secret is included, say as a hidden form field, then we protect against the attack that we saw before. That is, attacks initiated indirectly from a remote site.

We can even make the secret equal to the value in the session cookie because, as I said before, the attacker has no way to see the cookie. The attacker can only cause the request to go out without knowing exactly what that request contains. And web frameworks help here.
For example, Ruby on Rails is a web framework that makes it easy to write multi-tier web applications, and Rails automatically embeds such secrets in the links that it generates for the web pages of the sites it produces.
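The secret-in-the-page defense can be sketched as follows. This is not how Rails implements it; it is a small Python illustration under assumed names (SESSION_TOKENS, csrf_token, and the three functions) of a server that embeds a per-session secret in its own pages and rejects sensitive requests that fail to echo it back.

```python
import hmac
import secrets

SESSION_TOKENS = {}  # session id -> secret embedded in that user's pages


def issue_token(session_id):
    """Generate and remember a random secret for this session."""
    token = secrets.token_urlsafe(32)
    SESSION_TOKENS[session_id] = token
    return token


def hidden_field(token):
    """Markup the server would place inside its own forms."""
    return f'<input type="hidden" name="csrf_token" value="{token}">'


def request_allowed(session_id, form):
    """Accept a request only if it carries the secret for this session."""
    expected = SESSION_TOKENS.get(session_id)
    if expected is None:
        return False
    supplied = form.get("csrf_token", "")
    # Constant-time comparison of the two values.
    return hmac.compare_digest(expected.encode(), supplied.encode())
```

A forged request triggered from another site still carries the session cookie, but it cannot include the token, because the attacker never sees the pages the server generated for this user.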

Web 2.0

So far we've seen that the server will produce static or dynamic HTML that it sends to the browser to render. Recall that static HTML is unchanged from request to request, whereas dynamic HTML is generated at the server, for example by running a PHP program which may query a database. Either way, HTML pages may contain programs written in a language called JavaScript. And these programs will execute at the client, and perform further rendering and content production on the page.

So here's a little example of an HTML file that may be sent to the client. You can see it has the standard tags. It also has script tags that indicate the beginning and end of a JavaScript program. Here the program sets two local variables, a and b, and then invokes document.write to modify the contents of the page. The contents of the page will include the text world, the value three, which is the sum a plus b, and then a closing boldface tag, which closes off the open boldface tag that was at the top. When this program is rendered by the browser, it looks like this on the page.

JavaScript is a programming language whose programs, called scripts, implement client-side portions of a web application. These programs run in the browser and therefore make the web experience more interactive. In fact, the web experience is so much better when using JavaScript that people marked the point at which JavaScript came into heavy use as web 2.0.

JavaScript programs are executed in the browser and, as just shown, can be used to alter a web page's contents to determine what is displayed. To access page elements, JavaScript programs use the Document Object Model, or DOM. JavaScript programs can also perform interactive processing. For example, they can track mouse movements to implement drag and drop, or run code to process button clicks.
In addition to event handlers being used to update a page, they can be used to issue web requests and to read replies. Requests can be issued asynchronously, leading to a programming pattern called AJAX. This acronym stands for asynchronous JavaScript plus XML. It first saw use in the early 2000s and then saw widespread deployment in Google Mail and Google Maps in 2004 and 2005. These web applications for email and maps were far more responsive than their web 1.0 counterparts. Finally, the most relevant part of JavaScript to our discussion now is that programs can read and modify cookies.

JavaScript is a powerful programming language with access to sensitive resources maintained by the browser. And therefore the browser needs to enforce a security policy with respect to what JavaScript programs can do. In particular, if I use my browser to visit my bank's site to do my banking, but I also use it to visit some other, possibly malicious site, I don't want JavaScript programs from that other site to be able to access my bank data.

How might they do this? Since JavaScript programs can read cookies, then without security protection a script from the malicious site could run and alter the layout of the bank's website, which say could be running in another browser window. Or it could intercept keystroke events to sniff the user's password to the bank. It could read cookies stored by the bank's site. If cookies include session identifiers, then the script could issue web requests as if they were from the authenticated user.

To avoid these problems, browsers implement the so-called same-origin policy, or SOP. The browser associates the elements of a web page, including its layout, cookies, events, and so on, with an origin.
The origin is primarily defined by the hostname that the web page originated from. The same-origin policy states that a web page's elements can only be accessed by scripts with the same origin as the page.

So to be clear, according to the same-origin policy, JavaScript programs are limited in the cookies that they can access. Here we see a cookie that was set and stored into the cookie cache. And according to the rules, the value should only be reachable by scripts from a domain ending in the cookie's domain. That is the origin of the cookie. We can see this in the domain part of the Set-Cookie header shown at the top. Any other JavaScript program whose origin is different from that will not have access to that cookie.
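As a rough sketch of the comparison at the heart of the same-origin policy, the following Python function treats two URLs as having the same origin when their scheme, host, and port all match. The lecture describes the origin as primarily the hostname; including the scheme and port reflects how browsers generally define an origin. The host names are placeholders, and real browsers apply further rules (for cookies in particular) that this sketch leaves out.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """True when both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)
```

Under this check, a script loaded from https://attacker.example/ would not be treated as belonging to the same origin as a page at https://bank.example/, which is exactly the separation the policy is meant to enforce.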

Cross-site Scripting

To work around the same-origin policy, attackers can try to inject code using an attack called cross-site scripting. Unfortunately, this attack is quite common. For example, here is a CERT advisory about Huawei modems. Notice that it says that the broadband modems include a web interface, and this web interface is vulnerable to a cross-site scripting attack.

A cross-site scripting, or XSS, attack aims to subvert the same-origin policy. In particular, an attacker can construct a malicious script and try to trick a user's browser into running it as if the script came from a trusted origin. Doing so gives the script access to the sensitive content and pages from that origin. A cross-site scripting attack works by fooling the victim's site into sending the script to the user's browser, which will run it with the full privileges of the victim. Let's see how this can be achieved.

There are two types of cross-site scripting attack. The first we'll talk about is called stored, or persistent, cross-site scripting. In this attack, the attacker leaves their script on the vulnerable web server, for example a bank's site. The server will later unwittingly send that script to your browser. Your browser, none the wiser, will execute it within the same origin as the server.

Visualized, it looks like this. First, we have the bad website, and we have the vulnerable website. In step one, the bad site injects a malicious script into the vulnerable website. Second, a client connects to the vulnerable site, which unwittingly sends the malicious script along with its content back to the client's browser. The client's browser will then execute that malicious script as though the bank website intended to provide it.
As a result, that script can do nefarious things like perform attacker actions, such as initiate a bank transfer, or steal secret data like document cookies to send back to the attacker's site.

So in summary, the target is a user with a JavaScript-enabled browser who visits a user-influenced content page. That is, a page whose content can be influenced by prior interactions with users. The attack goal is to run a script in the user's browser with the same access provided to the server's regular scripts, and in this way subvert the same-origin policy. To do this, the attacker needs the ability to leave content on the web server, for example using an ordinary browser. The attacker might also have a website to collect stolen information. And the key trick here is that the server fails to ensure that content uploaded to its pages does not contain embedded scripts.

Let's look at an example of a cross-site scripting attack, the Samy MySpace worm. So MySpace was a social networking site that was very popular prior to the rise of Facebook, and it allowed people to create custom web pages. Samy embedded JavaScript in his MySpace page. MySpace attempted to filter these scripts, but in this case it failed. As a result, users who visited Samy's page ran the program, which made them friends with Samy, displayed "but most of all, Samy is my hero" on their profile, and installed the program in their profile so that a new user who viewed the profile got infected. As a result, Samy went from 73 friends to one million friends in 20 hours, and took MySpace down for a weekend, to boot.

The second type of cross-site scripting attack is called reflected cross-site scripting. Here, the attacker gets you to send the server a URL that includes in it some JavaScript code.
The site will echo some or all of the script in that URL back to you in its response. And your browser, none the wiser, will execute the script in the response with the same origin as the site. Here this is visualized. The browser visits the attacker's nefarious website, which sends back a malicious page. The client then clicks on a link on that malicious page, which takes it to the vulnerable site. The link contains some JavaScript code. The vulnerable site will then echo back the contents of the link in its response to the user, and the browser will then execute the script as though the server meant to provide it. Once again, the attacker can perform nefarious actions as a result. So the key here is echoed input. Reflected cross-site scripting attacks need to find instances of good web servers that will echo user input back in the HTML response. So, for example, here's a request to the site in which the search term is socks. When this goes to the site and it sends the results back, notice that socks is included in the body. Now, the problem arises when that input includes scripts. Here's an example script that's included in the URL. If that script is not filtered out, it will be included in the body returned from the site, and the JavaScript interpreter running in the user's browser will execute that script rather than simply print it out. And of course, the script will execute within the site's origin. To summarize, a reflected XSS attack targets someone who is using a JavaScript-enabled browser to access a vulnerable web service. The service's vulnerability is that it echoes back part of the URLs it receives in its output responses. The attacker's goal is to run the attacker's script as if it had the origin of the victim site. The attacker does this by getting a user to click on a URL that contains JavaScript code.
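
The echoed-input problem can be illustrated with a small Python sketch. It assumes a search page that copies a query parameter into its response; the host `victim.example`, the parameter name `term`, and the function `search_page_vulnerable` are all hypothetical names used only for this illustration.

```python
from urllib.parse import urlsplit, parse_qs, quote

def search_page_vulnerable(url):
    """Echo the 'term' query parameter into the response body unchanged."""
    query = parse_qs(urlsplit(url).query)
    term = query.get("term", [""])[0]
    return f"<html><body>Results for {term}: ...</body></html>"

# A benign request: the search term is simply echoed in the body.
benign_url = "http://victim.example/search?term=socks"
benign_page = search_page_vulnerable(benign_url)

# A link crafted by the attacker: the "search term" is really a script.
payload = "<script>/* attacker code */</script>"
malicious_url = "http://victim.example/search?term=" + quote(payload)
malicious_page = search_page_vulnerable(malicious_url)

print(benign_page)
print(malicious_page)
```

In the second response the script arrives as part of the site's own page, which is why a browser would run it within that site's origin.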
The JavaScript code in the URL is then reflected in the server's response to the browser, and so the browser runs the script as if it were from the origin server. The key here is that the origin server reflects the attacker's script unchanged. This indifferent reflection of the input points to the proper defense: validate the input. In particular, the vulnerable server should either check or sanitize input from untrusted sources. One form of validation is sanitization. In particular, a server can remove all executable portions of untrusted, that is, user-provided, content that could appear in HTML pages. For example, it might look for script tags and filter them out. Then, instead of running the script, the browser will end up printing it in the document. This might look a little strange, but it will be harmless. Such filtering is often done in the comment sections of blogs. Commenters are permitted to provide rich content like boldface formatting, italics, or underlining, and they can express this using various HTML tags. However, they are not permitted to include tags that would demarcate JavaScript code. Blacklisting particular tags, like the script tag, and removing them from the input is a natural idea. The problem with it is that there are many ways to introduce JavaScript code. You may think you have them all when you actually don't. For example, it turns out that you can embed JavaScript in XML-encoded files or in a cascading style sheet (CSS) tag. Moreover, even if you found all of the tags that are specifically indicated as allowing scripts, there may be other, browser-specific ways of specifying code. In particular, browsers have often been known to try to be helpful and render mangled input.
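
To make the weakness of blacklisting concrete, here is a minimal Python sketch of a filter that removes script tags. The function `strip_script_tags` is hypothetical, and the two bypasses shown (an event-handler attribute and a tag that reassembles after one filtering pass) are illustrations added here, not examples taken from the lecture.

```python
import re

# Naive blacklist: delete anything that looks like <script> ... </script>.
SCRIPT_TAG = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)

def strip_script_tags(text):
    """Remove script elements in a single pass and keep everything else."""
    return SCRIPT_TAG.sub("", text)

# The obvious case is handled.
blocked = strip_script_tags("hi <script>alert(1)</script> there")

# Bypass 1: no script tag at all, the code sits in an event-handler attribute.
missed = strip_script_tags('<img src="x" onerror="alert(1)">')

# Bypass 2: removing the inner tag pair leaves a working script tag behind.
reassembled = strip_script_tags("<scr<script></script>ipt>alert(1)</script>")

print(repr(blocked))
print(repr(missed))
print(repr(reassembled))
```

Each bypass could be patched individually, but the filter's author has to anticipate every variant, which is exactly the difficulty with blacklists.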
This willingness of browsers to render mangled input is good for making busted websites look okay to a user, but it can be exploited by a clever attacker. For example, such permissiveness was the flaw that allowed Samy to evade the MySpace filter. Internet Explorer permitted splitting the JavaScript tag into two words, Java and Script, across two lines, even though other browsers would not interpret this as being a JavaScript tag. This split tag evaded the MySpace filters. A better validation approach is to use a whitelist. In particular, a site can allow a particular small set of tags. It can then check that the input has only those tags in it. If any other tags appear, the input is rejected. The same sort of whitelist applies to all elements of a page that could be affected by untrusted sources. Returning to our blog example, a whitelist filter could check that the URLs or other user comments contain only boldface, italics, and underline tags, and no other tags. Or the blog could permit only a more limited language for providing comments. So rather than full HTML, it could allow, say, Markdown. One note. We have just considered two different attacks with strikingly similar names: cross-site scripting and cross-site request forgery. What the attacks have in common is that one site tries to act with the privileges of another site, hence the phrase cross-site. XSS works by exploiting the trust a browser has in data sent to it from a legitimate website. So the attacker tries to manipulate what the site sends to the browser. CSRF exploits the trust a website has in data sent from a semi-trusted browser. So the attacker tries to manipulate what the browser sends to the site. In short, it's all about exploiting trust. The right defense is to reduce that trust as much as possible.
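
Returning to the whitelist approach for blog comments, the check-and-reject idea can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not a production sanitizer: the allowed set `ALLOWED_TAGS`, the class `TagWhitelistChecker`, and the function `comment_is_acceptable` are hypothetical, and the sketch assumes that allowed tags may carry no attributes at all.

```python
from html.parser import HTMLParser

# Assumed whitelist for blog comments: bold, italics, underline only.
ALLOWED_TAGS = {"b", "i", "u"}

class TagWhitelistChecker(HTMLParser):
    """Records whether the input contains anything outside the whitelist."""

    def __init__(self):
        super().__init__()
        self.ok = True

    def handle_starttag(self, tag, attrs):
        # Reject unknown tags, and any attribute even on an allowed tag.
        if tag not in ALLOWED_TAGS or attrs:
            self.ok = False

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.ok = False

    # Comments, declarations, and processing instructions are not whitelisted.
    def handle_comment(self, data):
        self.ok = False

    def handle_decl(self, decl):
        self.ok = False

    def handle_pi(self, data):
        self.ok = False

    def unknown_decl(self, data):
        self.ok = False

def comment_is_acceptable(text):
    """Check the input and reject it outright rather than trying to repair it."""
    checker = TagWhitelistChecker()
    checker.feed(text)
    checker.close()
    return checker.ok

print(comment_is_acceptable("<b>nice</b> and <i>clear</i>"))
print(comment_is_acceptable("<script>alert(1)</script>"))
print(comment_is_acceptable('<b onclick="x()">hi</b>'))
```

One caveat: the server's parser and the user's browser may still disagree about mangled input, which is the kind of gap Samy exploited, so a whitelist narrows the attack surface but does not remove the need for care.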
Trust is reduced, in particular, by using input validation. This theme comes up again and again in web security, and indeed in distributed system security generally. Let's finish off this unit with one more example. One popular framework for writing web applications is called Ruby on Rails. Server-side web applications are written in the Ruby programming language, and the Rails framework makes it easy for these applications to work via the web. Parameters in web requests sent to Rails applications can be Ruby objects encoded in XML, or in a format called YAML within XML. YAML is particularly desirable because it's easy to read and Ruby has good support for it. In particular, Ruby makes it easy to encode Ruby objects into YAML, and likewise to decode YAML strings into Ruby objects. When those Ruby objects represent integers or strings or enumerated types, web app behavior proceeds as we might hope. However, YAML can encode any object, which it does by embedding Ruby code. Since the default YAML decoder can be used to decode arbitrary objects, it can thus decode arbitrary Ruby code. With a little work, the decoder can be made to invoke the code of the objects it has just decoded. This means that an attacker can send a carefully crafted message to any Ruby on Rails application and get it to run code on the attacker's behalf. Whoa. Once again, the problem here is accepting input without validating it. The problem is hidden from application developers because the bug is in the Ruby on Rails framework and not in the application. A fix to validate the input might be to reject YAML altogether, or to reject YAML-encoded objects that embed code. Of course, as with XSS, holes in the filters mean that vulnerabilities will persist.
To conclude, web security introduces a plethora of vulnerabilities that application writers must guard against. All of them can be boiled down to mismatches of trust. If we cannot completely trust the source of some input, then we must validate that input to make sure that it cannot cause harm. When considering means of validation, checking is preferred to sanitization, and whitelisting is preferred to blacklisting. In our next unit, we will see how input validation is just one instance of a general set of principles we should follow when designing secure applications.


