David Richard Larochelle talk

ENGLISH

fernando Hello,

fernando we are very pleased to have here today Mr. David Larochelle. He is a

fernando graudate student in Computer Science at the University of Virginia working

fernando with Dave Evans on annotation-assisted static checking.

fernando They work on a tool that all of you, for sure, know: LCLint (Soon to be

viXard David Larochelle

fernando Let me thank all of you for comming here, and specially, David Larochelle

fernando for preparing this tallk.

fernando He has prepared some slides for the talk, you can access them via our web

fernando page

fernando http://umeet.uninet.edu/umeet2001/talk/10-12-2001/umeet/umeet.htm

fernando http://www.cs.virginia.edu/~drl7x/umeet/umeet.htm

fernando The title is: ' Statically detectting likely buffer overflow

fernando vulnerabilities.' It is based on a paper that was presented at USENIX, this

fernando summer.

fernando Mr. Larochelle. ...

drl7x Thank you

fernando you can make questions in #qc

drl7x Hello.  This talk is based largely on a paper in this years usenix security

drl7x if you want more detail you can find the paper at lclint.cs.virginia.edu

drl7x 13 years ago, many computers on the Internet became saturated to the point of being unusable.

drl7x The culprit was the now infamous Morris Internet worm.

drl7x Exploiting a buffer overflow in fingerd was one of the primary methods the worm used to compromise systems.

drl7x At this time the dangers of buffer overflows were not well understood, so it's understandable that a common security critical Internet demon was vulnerable.

drl7x But buffer overflows are a simple program flaw and the consequences of buffer overflows were soon well understood within the security community.

drl7x Today buffer overflows remain a problem.

drl7x Buffer overflows in IIS are still being explioted by the code red worm.

drl7x Buffer overflows have remained the single largest cause of vulnerabilities in CERT advisories.   Earlier this year the WSJ described a buffer overflow in bind as a threat to the Internet.

drl7x

drl7x If you have access to the slides you can see that the Wall Street Journal web page recently provided a more compleeling testment to

drl7x the dangers of buffer overflows.

drl7x (The site was defaced earlier this year.

drl7x slide 2

drl7x Why aren't we better off than we were 13 years ago?

drl7x Part of the problem is certainly lack of education.  There are definitely a lot of programmers that are not aware of the dangers posed by buffer overflows and how to prevent them.

drl7x The security community understands buffer overflows but this knowledge has not been codified in a way that typical programmers can benefit from it.

drl7x This is important because The C language is hard to program in securely.

drl7x There's no bounds checking which is what permits buffer overflow attacks in the first place.

drl7x But the standard library exacerbates the situation. There are functions such as strcpy that are easy to use incorrectly and even function such as gets which are always unsafe.

drl7x Even the so-called safe functions such as strncpy and strncat have confusing and inconsistent APIs.

drl7x For these reasons, even security aware programmers make mistakes.

drl7x To illustrate, this latter we'll show an example of a subtle error in code that was published by a major security web site.

drl7x Because programmers will always make mistakes, we need to use automated tools.

drl7x slide 3

drl7x Two possible approaches automated tools can take are run-time and compiler time.

drl7x Run time solutions work by creating executables with additional checking to detect buffer overflows.

drl7x For example, StackGuard is a compiler which produces special binaries to detect stack smashing attacks.

drl7x There are 2 problems with the run-time approach.  First because of the additional checking there will be some type of performance penalty.

drl7x More significantly, there's usually no sensible way to recover when a buffer overflow is detected so the buffer overflows become a denial of service attack.

drl7x An alternative is a compile time approach.  Which is the approach we take.

drl7x Here, problems are detected by examining the source code without running it, usually at compile time or as part of the compilation process.

drl7x The problems are then fixed in the source code so there is no run-time performance penalty.

drl7x Static analysis is able to check properties of all possible program executions.

drl7x This is extremely useful for detecting security problems because security problems often occur in obscure cases which are not likely to come up in testing or normal use.

drl7x Also this approach can be used to compliment the run-time approach.

drl7x next slide

drl7x We wanted to create a tool that typical programmers would be willing and able to use as part of the development process.

drl7x In order to do this we needed it to be fast, -- about as fast as a compiler. And we also wanted it to be easy to use.

drl7x Additionally we wanted the tool to be useful for analyzing legacy code.

drl7x Either for auditing existing code or for someone making changes who did not want to introduce new vulnerabilities.  To do this our tool needed to handle code in typical C programs.

drl7x Finally we want to encourage a proactive approach to security.

drl7x Rather than just directing programmers to fix confirmed bugs, we want programmers to avoid potentially dangerous constructs and only use them if they have clear well documented assumptions.

drl7x <next slide >

drl7x Our approach is to document assumptions about buffers using semantic comments or annotations.

drl7x These are special comments which are treated as comments by the compiler but are recognized to contain semantic information by our tool.  We provide an annotated version of the standard library internally and allow users to annotate their code.

drl7x Our checking works by finding inconsistencies between the code and annotations.

drl7x In order to have a tool that's useful in real world situations, we make a number of compromises.  We make simplifying assumptions to improve efficiency. And we use heuristics to analyze common loop idioms.

drl7x As a result we accept some false positives and false negatives.

drl7x It's worth mentioning that like many interesting program properties detecting buffer overflows is an undecidable problem.

drl7x So all tools must accept either false positives or false negatives or restrict the kinds of programs they work on.

drl7x <slide 7>

drl7x We implemented our tool by extending LCLint, an open source static checking tool.

drl7x By using annotations LCLint is able to detect a variety of problems such as memory leaks and null pointer errors which can not be detected by a standard lint.

drl7x We integrated our checking into LCLint.  Which allowed us to leverage the existing checking and annotations within LCLint.

drl7x We extended LCLint by adding new annotations and checking regarding buffer sizes.

drl7x I should mention that we've since decided to rename the tool to splint.  This renaming isn't official yet but will be soon

drl7x <slide 8>

drl7x Two annotations we added are maxSet which denotes the highest index of an array that can be safely written to and maxRead which denotes the highest index of an array that can be safely read from.

drl7x The requires and ensures denote whether it is a  function precondition or postcondition.

drl7x For example, the declaration of buffer as a static array would cause LCLint to generate a constraint equivalent to ensures maxSet(buffer) == 99.

drl7x <next slide>

drl7x Here';s the example I mentioned earlier.

drl7x This works best if you go to slide 9.

drl7x It comes from a guide to secure programming from the SecurityFocus web site.

drl7x In the guide they suggest that programmers use strncat instead strcat and provide this as an example of how to use strncat.

drl7x At first glance the problems with this example are not obvious.  Here's an annotated declaration of strncat.

drl7x THe code was basicly

drl7x char buffer[256]

drl7x strncat (buffer, str, sizeof(str) -1 )

drl7x where str is a function parameter.

drl7x There are two problems here

drl7x first the buffer is uninitialized

drl7x additionally the user has confused the requirement of strncat.

drl7x next slide

drl7x If you can see this slide it gives an example of the warning reported by lclint

drl7x ,slide 11.

drl7x Our checking takes place entirely within functions.  But we are able to check interprocedural properties by using annotations to check functions calls and the state of function at entry and exit points.

drl7x We generate constraints at the expression level using C semantics and annotations in the case of function calls.

drl7x We propagate and resolve constraints using axiomatic semantics.  We use a set of simplifying rules to help resolve constraints.

drl7x In the previous slides LCLint used simple algebra to combine to terms.

drl7x We also have LCLint specific rules.  For example,

drl7x the rule  maxRead(str + i) = maxRead(str) - i

drl7x this helps us to handle code in which a pointer has been incremented.

drl7x Finally we produce error messages for any constraints that can not be resolved.

drl7x <slide> 12 loop heuristics

drl7x Traditionally loops have been analyzed using invariants.

drl7x But loop invariants are extremely difficult to generate automatically and programmers have been reluctant to add them by hand.

drl7x We wanted to avoid loop invariants.

drl7x But we still needed to analyze loops reasonably well.

drl7x We found that many loops have common forms.  By recognizing common loop idioms we were able to develop a series of heuristics that work for many loops in real programs.

drl7x For example, in the loop:

drl7x for (init; *buf; buff++)

drl7x we assume that the loop runs for maxRead(buf) iterations and we model the first and last iterations of the loop.

drl7x So the last iteration will be checked assuming buf_last = buf_orig + maxRead (buf) ).

drl7x

drl7x Obviously this approach will not always be correct but we've found that this approach works for many loops in real programs.

drl7x <next slide 13>

drl7x To test our tool we used it to analysis two security critical open source programs.

drl7x We analyzed wu-ftpd and a subsection of BIND.

drl7x In both cases we successfully  detected known bugs.

drl7x Additionally we were able to detect unknown buffer overflows exploitable by users with write access to

configuration files.

drl7x Our tool ran fast enough, that we believe it's reasonable for users to add it to the makefile as part of the built process.

drl7x (It took about 7 seconds to analyze wu-ftpd 2.50)

drl7x and 33 secnonds to analyze Bind

drl7x wu-ftpd is 20000 lines

drl7x bind is 40,000 lines

drl7x The table in slide 14 has a nice summary of the results

drl7x Recently we reanalzed wu-ftpd

drl7x With no annotations we the tool porduced 166 warning for out-of bounds writes.

drl7x After iteratively adding annotations and rerunning the tool

drl7x lclint produced 101 warnings for buffer overflows

drl7x 25 or these were real problems while 76 were spurios messages.

drl7x Slide 15

drl7x THis is an example of a bug in wu-ftpd that we detected.

drl7x int access_ok( int msgcode) {

drl7x     char class[1024], msgfile[200];

drl7x     int limit;

drl7x   limit = acl_getlimit(class, msgfile);

drl7x Here the function getaclentry sets entry->arg[3] to point to a string of potentially arbitrary length read from a configuration file.

drl7x The str copy this into  msgpathbuf

drl7x strcpy(msgpathbuf, entry->arg[3];

drl7x When we run lclint to reports:

drl7x a potential buffer overflow here

drl7x By adding annotations and changes to strncoy to determine that the function

drl7x LCLint is able to determine that the function is safe.

drl7x How ever we had to make a number of changes to the function and rerun lclint before we could be sure of saftey

drl7x slide 16

drl7x Related work

drl7x Traditionally, lexical analysis has been used to find vulnerabilities.

drl7x Using a tool such as grep an auditor would scan a program for potentially unsafe library functions. Recently tools such as ITS4 combine this approach with a data base of potentially unsafe functions.

drl7x Lexical analysis is limited to detecting vulnerabilities  involving library functions.  And tends to produce a large number of false positives.

drl7x Wagner has developed a system to detect vulnerabilities by modeling c strings as an abstract data type which is accessed through library functions.

drl7x   Flow insensitive analysis is used to resolve constraints.

drl7x This approach does not require annotations and scales very well.

drl7x But flow insensitive analysis is imprecise causing large numbers of false positives.

drl7x   Also this approach can not detect buffer overflows that occur as a result of a direct access to a  buffer.

drl7x Recently Dor has attempted to detect buffer overflows by performing a source to source transformation.

drl7x This is creates a safe program which contains additional variables and assert statements.

drl7x Buffer overflows in the original program are detected by finding cases where assert statements in the safe program which may fail.

drl7x slide 17

drl7x We recognize that there are certain impediments to the wide spread use of our tool.

drl7x People are lazy and

drl7x programmer are lazier than most people.

drl7x Adding annotations may be too much work except for security weenies.

drl7x We're currently working on techniques to automate the process.

drl7x <slide 18 conclusions>

drl7x What will things be like 13 years from now?

drl7x Will buffer overflows still be commonly exploited.

drl7x Within the security community, buffer overflows have long been a well understood problem.

drl7x However, buffer overflows have remained far too common.

drl7x If we want 2013 to be better than 2001, we, the security committee, must work to codify our knowledge into tools  that real programmers can use as part of the development process so that our expertise reaches real developers.

drl7x The creation of static checking tools is a promising way of doing this.

drl7x Update:

drl7x There are a couple of new additions to LCLint which are not included in the slides.

drl7x We now offer ITS4 like functionality in which

drl7x every instance of a potentially valnerable function such as gets

drl7x is reported.

drl7x However our approach uses a full parse tree so if gets is used as a variable or in a string literal no warning is reproted.

drl7x THanks you for listening to this talk.

mjcoma clap clap clap clap clap clap clap clap clap clap

mjcoma clap clap clap clap clap clap clap clap clap clap

mjcoma clap clap clap clap clap clap clap clap clap clap

mjcoma clap clap clap clap clap clap clap clap clap clap

Chess splash splash splash splash splash splash splash

Chess splash splash splash splash splash splash splash

fernand plas plas plas plas plas plas plas plñas plas

Chess splash splash splash splash splash splash splash

fernand plas plas plas plas plas plas plas plñas plas

fernand plas plas plas plas plas plas plas plñas plas

Chess splash splash splash splash splash splash splash

peter11 plasplas plas plas plas plas plas plas plas plas

peter11 plasplas plas plas plas plas plas plas plas plas

peter11 plasplas plas plas plas plas plas plas plas plas

sarnold_ thank you drl7x; so, your static tool (lclint, correct?) requires marking up the source code in order to disable warnings on 'known good' pieces of code?

drl7x Okay sarnold

drl7x You had the first question

drl7x It does a combination of things.

drl7x You use annotations to mark up source code so that lclint has more information

drl7x for example so it knows a function is only called with a buffer of size 100

drl7x Hopefully this additional information will be verified by lclint

drl7x In the worst case you can explictedly mark code safe

sarnold_ is there an easy way to tie together the LCLint "100 element array" with the declaration/malloc of the buffer to 100 elements?

drl7x Yes.  We have an internal annotated version of the standard library.

drl7x malloc has the annotation:

drl7x this tells us that str = malloc(100); implies that str[99] or lowwer is okay

sarnold_ hmm, I think I missed the annotation..

sarnold_ (did it start with '/' ? :)

drl7x Sorry the annotation is * void malloc (size_t size) /*@ensures maxSet(result) == (size - 1) @*/

drl7x Yes the anotations are ine /*@ .. @*/

drl7x Other questions

sarnold_ yes, one sec :)

fernand what do you think of cyclone and other similar approaches?

Amanda 15,15·14,14·06,00 Buenas tardes a todos 14,14·15,15·

drl7x I think cyclone is an interesting project.

sarnold_ drl7x, how does LCLint handle functions that take variable-sized arrays as their arguments?

drl7x cyclone for these who don't know is an attempt to make a type safe version of C

drl7x This is a thing and in would be good if people used safer languages.

drl7x However, many safe languages already exist: java, ada, etc.

drl7x Still people insist on probleming C.

drl7x Whatever benefits a new language has geting people in switch languages is difficult.

drl7x Perhaps cyclone is close enough in C that people will be willing to switch.  But history indicate that this isn't likely to happen.

drl7x next question

sarnold_ drl7x, how does LCLint handle functions that take variable-sized arrays as their arguments?

Bryam Hola a todos....

drl7x We use the required annotation for example:

drl7x foo (char *s, int n ) /@requires maxSet(s) >= n @*/

drl7x this says that this function is only safe if the buffer pointed to by s is larger than n

sarnold_ ok, and that ties together with the size declarations at static compile time, or the dynamic /*@ensures maxSet(result) == (size - 1) @*/ from malloc() ?

drl7x Well when we check the body of foo we assume that the annotations is true....

sarnold_ and, as a final pestering question (that I can think of right now :), how would that last example be modified, say, for free(), which is going to back up four bytes or something similar, using *(variable - 4) to get to housekeeping information?

drl7x   THIS IS for the previos question: Every time the foo is called we report an error this we can't verify the precondition for the actuall paramenter.

drl7x Let me think about your next question...

drl7x Are you asking using free then accessing the memory?

sarnold_ no, no

sarnold_ asking how LCLint can verify that free(3) isn't going 'out of bounds' when it takes ptr-4 to get to its own data ..

sarnold_ maxSet(ptr) >= n-4 ?

drl7x You mean free( *(ptr -3) ) or free (ptr - 3)

sarnold_ heh, sometimes IRC isn't perfect :)

sarnold_ free(3) as in the libc implementation of malloc...

sarnold_ I'm curious if LCLint can protect from buffer 'underruns'; the malloc implementation often uses the four bytes previous to the ptr to store house-keeping information..

sarnold_ since there is no count on the size of the data blob to be free()ed, except in implementation-specific data structures, does LCLint punt and say, "Hmm, call to free(), looks fine" :), or does it try to do more analysis to see if the free() call is legitimate?

sarnold_ (and yes, it is legit to say, "go read the USENIX paper" :)

drl7x I'm not exactly sure free is implemented internally.  We dodn't check the internal workings of standard libraries have some memor

drl7x instead me just an annotated declaration...

drl7x We do have the option to detect buffer underflows.  We have a minSet and minRead options.  I found that underflows we're common enough to

drl7x justify checking for them in many cases though.

drl7x Yes.  Or download the beta version.  Some of the code examples didn't translate well to IRC.

sarnold_ heh, yeah, that is true

sarnold_ the number of vulnerabilities i've seen that exploited free() are much more rare than overflow auto variables.. :)

sarnold_ drl7x, well, I think I am finally out of questions :) thanks again

drl7x LCLint does try to detect memory leaks with free.

drl7x sarnold thanks for your questions.

Generated by irclog2html.pl by Jeff Waugh - find it at freshmeat.net!

Mas información: