The goal of this project is to develop scalable analysis techniques that make use of dynamic information from program runs as well as some limited static information. The analysis developed is being exploited to improve the Security and Reliability of software. We have developed techniques for collecting and storing large amounts of profile data and mining this data for information useful in automated fault location and automated testing of software. More specifically our contributions include the following:
Scalable Collection of Dynamic Information is being achieved through integrated use of checkpointing and logging as well as fine-grained tracing of software. We have developed a two step approach in which during initial program execution checkpointing and logging is used. If the program exhibits incorrect behavior, the log is analyzed to identify portion of execution that needs to traced using the replay phase to collect information useful in debugging. We have developed the Whole Execution Trace representation that integrates many different kinds of profiles that have been used for characterizing program behavior into one single representation. This includes address, value, control flow, as well as data and control dependence profiles. Collection, storage, and analysis of complete profiles is a challenging task. We has developed a compact and rapidly traversible representation of whole execution profiles. Whole execution traces produced by execution of a couple of billion instructions require a gigabyte of storage. We have further scaled this approach so it can be applied to long running programs by proposing Execution Reduction, a technique that combines checkpointing/logging with fine-grained tracing.
Dynamic Slicing & Matching are powerful analysis that are useful for many applications. The large amounts of information collected to enable computation of dynamic program slices can easily cause the machine to run out of memory. In our recent work we have experimentally evaluated various precise and imprecise dynamic slicing algorithms to show the drawbacks of imprecise slicing. We have shown that whole execution trace representation enables rapid dynamic program slicing of realistic program runs without running out of memory. Dynamic matching of whole execution traces of two program versions is being carried out to enable a wide range of applications including those of piracy detection, some debugging scenarios, and software maintenance. Unlike some static approaches for matching program versions, our approach does not require access to source code of the two program versions. Rich program execution histories enable us to rapidly produce highly accurate matches. Our work differs from static differencing techniques as it enables matching of code sequences that dynamically behave the same even though they statically appear to be different.
Fault Location & Correction. After developing practical implementations of dynamic slicing we have employed it for fault location. Three different types of dynamic slices are being used for fault location: backward slices of erroneous values; forward slices of minimal fault inducing inputs; and bidirectional slices of critical predicates. We have also developed techniques for pruning dynamic slices based upon positive evidence indicating that some of the statements in the dynamic slices may be expected to be correct because they are responsible for producing correct results. Location of faults is also important for security because faults can often be used to launch attacks. We have also developed a new approach to locate and correct an erroneous statement in a faulty function. Our approach combines ideas from software testing and weakest preconditions used in correctness proof methods to locate a likely erroneous statement. Our recent work is aimed at developing techniques for automatically identifying corrupted program state and determining corrections to the corrupted state. Our results show that this approach performs fault location with a very high degree of precision.
Software Testing is an important activity undertaken to improve the reliability of software. We have developed a greedy heuristic algorithm that iteratively exploits the implications among the test cases and implications among the requirements to generate a minimized suite that covers the same set of requirements as that of the unminimized suite. To address the problem of fault loss due to removing test cases from test suite, we have also developed a technique to selectively retain coverage redundant test cases that are likely to expose new faults. We have developed a new structural coverage criterion called invariant-coverage criterion for generating inputs to detect likely program invariants. We have developed numerical iterative techniques for automatically generating test data for a given path in a program.