fernando | Hello, |
fernando | we are very pleased to have here today Mr. David Larochelle. He is a |
fernando | graudate student in Computer Science at the University of Virginia working |
fernando | with Dave Evans on annotation-assisted static checking. |
fernando | They work on a tool that all of you, for sure, know: LCLint (Soon to be |
viXard | David Larochelle |
fernando | Let me thank all of you for comming here, and specially, David Larochelle |
fernando | for preparing this tallk. |
fernando | He has prepared some slides for the talk, you can access them via our web |
fernando | page |
fernando | http://umeet.uninet.edu/umeet2001/talk/10-12-2001/umeet/umeet.htm |
fernando | http://www.cs.virginia.edu/~drl7x/umeet/umeet.htm |
fernando | The title is: ' Statically detectting likely buffer overflow |
fernando | vulnerabilities.' It is based on a paper that was presented at USENIX, this |
fernando | summer. |
fernando | Mr. Larochelle. ... |
drl7x | Thank you |
fernando | you can make questions in #qc |
drl7x | Hello. This talk is based largely on a paper in this years usenix security |
drl7x | if you want more detail you can find the paper at lclint.cs.virginia.edu |
drl7x | 13 years ago, many computers on the Internet became saturated to the point of being unusable. |
drl7x | The culprit was the now infamous Morris Internet worm. |
drl7x | Exploiting a buffer overflow in fingerd was one of the primary methods the worm used to compromise systems. |
drl7x | At this time the dangers of buffer overflows were not well understood, so it's understandable that a common security critical Internet demon was vulnerable. |
drl7x | But buffer overflows are a simple program flaw and the consequences of buffer overflows were soon well understood within the security community. |
drl7x | Today buffer overflows remain a problem. |
drl7x | Buffer overflows in IIS are still being explioted by the code red worm. |
drl7x | Buffer overflows have remained the single largest cause of vulnerabilities in CERT advisories. Earlier this year the WSJ described a buffer overflow in bind as a threat to the Internet. |
drl7x | |
drl7x | If you have access to the slides you can see that the Wall Street Journal web page recently provided a more compleeling testment to |
drl7x | the dangers of buffer overflows. |
drl7x | (The site was defaced earlier this year. |
drl7x | slide 2 |
drl7x | Why aren't we better off than we were 13 years ago? |
drl7x | Part of the problem is certainly lack of education. There are definitely a lot of programmers that are not aware of the dangers posed by buffer overflows and how to prevent them. |
drl7x | The security community understands buffer overflows but this knowledge has not been codified in a way that typical programmers can benefit from it. |
drl7x | This is important because The C language is hard to program in securely. |
drl7x | There's no bounds checking which is what permits buffer overflow attacks in the first place. |
drl7x | But the standard library exacerbates the situation. There are functions such as strcpy that are easy to use incorrectly and even function such as gets which are always unsafe. |
drl7x | Even the so-called safe functions such as strncpy and strncat have confusing and inconsistent APIs. |
drl7x | For these reasons, even security aware programmers make mistakes. |
drl7x | To illustrate, this latter we'll show an example of a subtle error in code that was published by a major security web site. |
drl7x | Because programmers will always make mistakes, we need to use automated tools. |
drl7x | slide 3 |
drl7x | Two possible approaches automated tools can take are run-time and compiler time. |
drl7x | Run time solutions work by creating executables with additional checking to detect buffer overflows. |
drl7x | For example, StackGuard is a compiler which produces special binaries to detect stack smashing attacks. |
drl7x | There are 2 problems with the run-time approach. First because of the additional checking there will be some type of performance penalty. |
drl7x | More significantly, there's usually no sensible way to recover when a buffer overflow is detected so the buffer overflows become a denial of service attack. |
drl7x | An alternative is a compile time approach. Which is the approach we take. |
drl7x | Here, problems are detected by examining the source code without running it, usually at compile time or as part of the compilation process. |
drl7x | The problems are then fixed in the source code so there is no run-time performance penalty. |
drl7x | Static analysis is able to check properties of all possible program executions. |
drl7x | This is extremely useful for detecting security problems because security problems often occur in obscure cases which are not likely to come up in testing or normal use. |
drl7x | Also this approach can be used to compliment the run-time approach. |
drl7x | next slide |
drl7x | We wanted to create a tool that typical programmers would be willing and able to use as part of the development process. |
drl7x | In order to do this we needed it to be fast, -- about as fast as a compiler. And we also wanted it to be easy to use. |
drl7x | Additionally we wanted the tool to be useful for analyzing legacy code. |
drl7x | Either for auditing existing code or for someone making changes who did not want to introduce new vulnerabilities. To do this our tool needed to handle code in typical C programs. |
drl7x | Finally we want to encourage a proactive approach to security. |
drl7x | Rather than just directing programmers to fix confirmed bugs, we want programmers to avoid potentially dangerous constructs and only use them if they have clear well documented assumptions. |
drl7x | <next slide > |
drl7x | Our approach is to document assumptions about buffers using semantic comments or annotations. |
drl7x | These are special comments which are treated as comments by the compiler but are recognized to contain semantic information by our tool. We provide an annotated version of the standard library internally and allow users to annotate their code. |
drl7x | Our checking works by finding inconsistencies between the code and annotations. |
drl7x | In order to have a tool that's useful in real world situations, we make a number of compromises. We make simplifying assumptions to improve efficiency. And we use heuristics to analyze common loop idioms. |
drl7x | As a result we accept some false positives and false negatives. |
drl7x | It's worth mentioning that like many interesting program properties detecting buffer overflows is an undecidable problem. |
drl7x | So all tools must accept either false positives or false negatives or restrict the kinds of programs they work on. |
drl7x | <slide 7> |
drl7x | We implemented our tool by extending LCLint, an open source static checking tool. |
drl7x | By using annotations LCLint is able to detect a variety of problems such as memory leaks and null pointer errors which can not be detected by a standard lint. |
drl7x | We integrated our checking into LCLint. Which allowed us to leverage the existing checking and annotations within LCLint. |
drl7x | We extended LCLint by adding new annotations and checking regarding buffer sizes. |
drl7x | I should mention that we've since decided to rename the tool to splint. This renaming isn't official yet but will be soon |
drl7x | <slide 8> |
drl7x | Two annotations we added are maxSet which denotes the highest index of an array that can be safely written to and maxRead which denotes the highest index of an array that can be safely read from. |
drl7x | The requires and ensures denote whether it is a function precondition or postcondition. |
drl7x | For example, the declaration of buffer as a static array would cause LCLint to generate a constraint equivalent to ensures maxSet(buffer) == 99. |
drl7x | <next slide> |
drl7x | Here';s the example I mentioned earlier. |
drl7x | This works best if you go to slide 9. |
drl7x | It comes from a guide to secure programming from the SecurityFocus web site. |
drl7x | In the guide they suggest that programmers use strncat instead strcat and provide this as an example of how to use strncat. |
drl7x | At first glance the problems with this example are not obvious. Here's an annotated declaration of strncat. |
drl7x | THe code was basicly |
drl7x | char buffer[256] |
drl7x | strncat (buffer, str, sizeof(str) -1 ) |
drl7x | where str is a function parameter. |
drl7x | There are two problems here |
drl7x | first the buffer is uninitialized |
drl7x | additionally the user has confused the requirement of strncat. |
drl7x | next slide |
drl7x | If you can see this slide it gives an example of the warning reported by lclint |
drl7x | ,slide 11. |
drl7x | Our checking takes place entirely within functions. But we are able to check interprocedural properties by using annotations to check functions calls and the state of function at entry and exit points. |
drl7x | We generate constraints at the expression level using C semantics and annotations in the case of function calls. |
drl7x | We propagate and resolve constraints using axiomatic semantics. We use a set of simplifying rules to help resolve constraints. |
drl7x | In the previous slides LCLint used simple algebra to combine to terms. |
drl7x | We also have LCLint specific rules. For example, |
drl7x | the rule maxRead(str + i) = maxRead(str) - i |
drl7x | this helps us to handle code in which a pointer has been incremented. |
drl7x | Finally we produce error messages for any constraints that can not be resolved. |
drl7x | <slide> 12 loop heuristics |
drl7x | Traditionally loops have been analyzed using invariants. |
drl7x | But loop invariants are extremely difficult to generate automatically and programmers have been reluctant to add them by hand. |
drl7x | We wanted to avoid loop invariants. |
drl7x | But we still needed to analyze loops reasonably well. |
drl7x | We found that many loops have common forms. By recognizing common loop idioms we were able to develop a series of heuristics that work for many loops in real programs. |
drl7x | For example, in the loop: |
drl7x | for (init; *buf; buff++) |
drl7x | we assume that the loop runs for maxRead(buf) iterations and we model the first and last iterations of the loop. |
drl7x | So the last iteration will be checked assuming buf_last = buf_orig + maxRead (buf) ). |
drl7x | |
drl7x | Obviously this approach will not always be correct but we've found that this approach works for many loops in real programs. |
drl7x | <next slide 13> |
drl7x | To test our tool we used it to analysis two security critical open source programs. |
drl7x | We analyzed wu-ftpd and a subsection of BIND. |
drl7x | In both cases we successfully detected known bugs. |
drl7x | Additionally we were able to detect unknown buffer overflows exploitable by users with write access to |
configuration files. |
drl7x | Our tool ran fast enough, that we believe it's reasonable for users to add it to the makefile as part of the built process. |
drl7x | (It took about 7 seconds to analyze wu-ftpd 2.50) |
drl7x | and 33 secnonds to analyze Bind |
drl7x | wu-ftpd is 20000 lines |
drl7x | bind is 40,000 lines |
drl7x | The table in slide 14 has a nice summary of the results |
drl7x | Recently we reanalzed wu-ftpd |
drl7x | With no annotations we the tool porduced 166 warning for out-of bounds writes. |
drl7x | After iteratively adding annotations and rerunning the tool |
drl7x | lclint produced 101 warnings for buffer overflows |
drl7x | 25 or these were real problems while 76 were spurios messages. |
drl7x | Slide 15 |
drl7x | THis is an example of a bug in wu-ftpd that we detected. |
drl7x | int access_ok( int msgcode) { |
drl7x | char class[1024], msgfile[200]; |
drl7x | int limit; |
drl7x | limit = acl_getlimit(class, msgfile); |
drl7x | Here the function getaclentry sets entry->arg[3] to point to a string of potentially arbitrary length read from a configuration file. |
drl7x | The str copy this into msgpathbuf |
drl7x | strcpy(msgpathbuf, entry->arg[3]; |
drl7x | When we run lclint to reports: |
drl7x | a potential buffer overflow here |
drl7x | By adding annotations and changes to strncoy to determine that the function |
drl7x | LCLint is able to determine that the function is safe. |
drl7x | How ever we had to make a number of changes to the function and rerun lclint before we could be sure of saftey |
drl7x | slide 16 |
drl7x | Related work |
drl7x | Traditionally, lexical analysis has been used to find vulnerabilities. |
drl7x | Using a tool such as grep an auditor would scan a program for potentially unsafe library functions. Recently tools such as ITS4 combine this approach with a data base of potentially unsafe functions. |
drl7x | Lexical analysis is limited to detecting vulnerabilities involving library functions. And tends to produce a large number of false positives. |
drl7x | Wagner has developed a system to detect vulnerabilities by modeling c strings as an abstract data type which is accessed through library functions. |
drl7x | Flow insensitive analysis is used to resolve constraints. |
drl7x | This approach does not require annotations and scales very well. |
drl7x | But flow insensitive analysis is imprecise causing large numbers of false positives. |
drl7x | Also this approach can not detect buffer overflows that occur as a result of a direct access to a buffer. |
drl7x | Recently Dor has attempted to detect buffer overflows by performing a source to source transformation. |
drl7x | This is creates a safe program which contains additional variables and assert statements. |
drl7x | Buffer overflows in the original program are detected by finding cases where assert statements in the safe program which may fail. |
drl7x | slide 17 |
drl7x | We recognize that there are certain impediments to the wide spread use of our tool. |
drl7x | People are lazy and |
drl7x | programmer are lazier than most people. |
drl7x | Adding annotations may be too much work except for security weenies. |
drl7x | We're currently working on techniques to automate the process. |
drl7x | <slide 18 conclusions> |
drl7x | What will things be like 13 years from now? |
drl7x | Will buffer overflows still be commonly exploited. |
drl7x | Within the security community, buffer overflows have long been a well understood problem. |
drl7x | However, buffer overflows have remained far too common. |
drl7x | If we want 2013 to be better than 2001, we, the security committee, must work to codify our knowledge into tools that real programmers can use as part of the development process so that our expertise reaches real developers. |
drl7x | The creation of static checking tools is a promising way of doing this. |
drl7x | Update: |
drl7x | There are a couple of new additions to LCLint which are not included in the slides. |
drl7x | We now offer ITS4 like functionality in which |
drl7x | every instance of a potentially valnerable function such as gets |
drl7x | is reported. |
drl7x | However our approach uses a full parse tree so if gets is used as a variable or in a string literal no warning is reproted. |
drl7x | THanks you for listening to this talk. |
mjcoma | clap clap clap clap clap clap clap clap clap clap |
mjcoma | clap clap clap clap clap clap clap clap clap clap |
mjcoma | clap clap clap clap clap clap clap clap clap clap |
mjcoma | clap clap clap clap clap clap clap clap clap clap |
Chess | splash splash splash splash splash splash splash |
Chess | splash splash splash splash splash splash splash |
fernand | plas plas plas plas plas plas plas plñas plas |
Chess | splash splash splash splash splash splash splash |
fernand | plas plas plas plas plas plas plas plñas plas |
fernand | plas plas plas plas plas plas plas plñas plas |
Chess | splash splash splash splash splash splash splash |
peter11 | plasplas plas plas plas plas plas plas plas plas |
peter11 | plasplas plas plas plas plas plas plas plas plas |
peter11 | plasplas plas plas plas plas plas plas plas plas |
sarnold_ | thank you drl7x; so, your static tool (lclint, correct?) requires marking up the source code in order to disable warnings on 'known good' pieces of code? |
drl7x | Okay sarnold |
drl7x | You had the first question |
drl7x | It does a combination of things. |
drl7x | You use annotations to mark up source code so that lclint has more information |
drl7x | for example so it knows a function is only called with a buffer of size 100 |
drl7x | Hopefully this additional information will be verified by lclint |
drl7x | In the worst case you can explictedly mark code safe |
sarnold_ | is there an easy way to tie together the LCLint "100 element array" with the declaration/malloc of the buffer to 100 elements? |
drl7x | Yes. We have an internal annotated version of the standard library. |
drl7x | malloc has the annotation: |
drl7x | this tells us that str = malloc(100); implies that str[99] or lowwer is okay |
sarnold_ | hmm, I think I missed the annotation.. |
sarnold_ | (did it start with '/' ? :) |
drl7x | Sorry the annotation is * void malloc (size_t size) /*@ensures maxSet(result) == (size - 1) @*/ |
drl7x | Yes the anotations are ine /*@ .. @*/ |
drl7x | Other questions |
sarnold_ | yes, one sec :) |
fernand | what do you think of cyclone and other similar approaches? |
Amanda | 15,15·14,14·06,00 Buenas tardes a todos 14,14·15,15· |
drl7x | I think cyclone is an interesting project. |
sarnold_ | drl7x, how does LCLint handle functions that take variable-sized arrays as their arguments? |
drl7x | cyclone for these who don't know is an attempt to make a type safe version of C |
drl7x | This is a thing and in would be good if people used safer languages. |
drl7x | However, many safe languages already exist: java, ada, etc. |
drl7x | Still people insist on probleming C. |
drl7x | Whatever benefits a new language has geting people in switch languages is difficult. |
drl7x | Perhaps cyclone is close enough in C that people will be willing to switch. But history indicate that this isn't likely to happen. |
drl7x | next question |
sarnold_ | drl7x, how does LCLint handle functions that take variable-sized arrays as their arguments? |
Bryam | Hola a todos.... |
drl7x | We use the required annotation for example: |
drl7x | foo (char *s, int n ) /@requires maxSet(s) >= n @*/ |
drl7x | this says that this function is only safe if the buffer pointed to by s is larger than n |
sarnold_ | ok, and that ties together with the size declarations at static compile time, or the dynamic /*@ensures maxSet(result) == (size - 1) @*/ from malloc() ? |
drl7x | Well when we check the body of foo we assume that the annotations is true.... |
sarnold_ | and, as a final pestering question (that I can think of right now :), how would that last example be modified, say, for free(), which is going to back up four bytes or something similar, using *(variable - 4) to get to housekeeping information? |
drl7x | THIS IS for the previos question: Every time the foo is called we report an error this we can't verify the precondition for the actuall paramenter. |
drl7x | Let me think about your next question... |
drl7x | Are you asking using free then accessing the memory? |
sarnold_ | no, no |
sarnold_ | asking how LCLint can verify that free(3) isn't going 'out of bounds' when it takes ptr-4 to get to its own data .. |
sarnold_ | maxSet(ptr) >= n-4 ? |
drl7x | You mean free( *(ptr -3) ) or free (ptr - 3) |
sarnold_ | heh, sometimes IRC isn't perfect :) |
sarnold_ | free(3) as in the libc implementation of malloc... |
sarnold_ | I'm curious if LCLint can protect from buffer 'underruns'; the malloc implementation often uses the four bytes previous to the ptr to store house-keeping information.. |
sarnold_ | since there is no count on the size of the data blob to be free()ed, except in implementation-specific data structures, does LCLint punt and say, "Hmm, call to free(), looks fine" :), or does it try to do more analysis to see if the free() call is legitimate? |
sarnold_ | (and yes, it is legit to say, "go read the USENIX paper" :) |
drl7x | I'm not exactly sure free is implemented internally. We dodn't check the internal workings of standard libraries have some memor |
drl7x | instead me just an annotated declaration... |
drl7x | We do have the option to detect buffer underflows. We have a minSet and minRead options. I found that underflows we're common enough to |
drl7x | justify checking for them in many cases though. |
drl7x | Yes. Or download the beta version. Some of the code examples didn't translate well to IRC. |
sarnold_ | heh, yeah, that is true |
sarnold_ | the number of vulnerabilities i've seen that exploited free() are much more rare than overflow auto variables.. :) |
sarnold_ | drl7x, well, I think I am finally out of questions :) thanks again |
drl7x | LCLint does try to detect memory leaks with free. |
drl7x | sarnold thanks for your questions. |