| |
| \section{Introduction} |
| |
| This document describes a \emph{Dataflow Framework} for the Java |
| Programming Language. The framework is used in the |
| \href{https://checkerframework.org/}{Checker Framework}, |
| \href{http://errorprone.info/}{Error Prone}, |
| \href{https://github.com/uber/NullAway}{NullAway}, |
| at Facebook, and in other contexts. |
| |
| The primary purpose of the Dataflow Framework is to estimate values: |
| for each line of source code, it determines what |
| values each variable might contain. |
| |
| The Dataflow Framework's result (\autoref{sec:analysis_result_class}) |
| is an abstract value for each expression (an estimate of the |
| expression's run-time value) and a store at each program point. A |
| store maps variables and other expressions to abstract |
| values. As a pre-pass, the Dataflow Framework transforms an input |
| AST into a control flow graph (\autoref{sec:cfg}) consisting of basic |
| blocks made up of nodes representing single operations. An analysis |
| performs iterative data flow |
| analysis over the control flow graph. The effect of a single node on |
| the dataflow store is represented by a transfer function, which takes |
| an input store and a node and produces an output store. Once the |
| analysis reaches a fix point, the result can be accessed by client |
| code. |
| |
| In the Checker Framework, the abstract values to be computed are |
| annotated types. An individual checker can customize its analysis by |
| extending the abstract value class and by overriding the behavior of |
| the transfer function for particular node types. |
| |
| The Dataflow Framework was |
| designed with several goals in mind. First, to encourage use |
| beyond the Checker Framework, it is written as a separate package that can be |
| built and used with no dependence on the Checker Framework. Second, |
| the framework supports analysis but not |
| transformation, so it provides information that can be used by a type |
| checker or an IDE, but it does not support optimization. Third, the |
| framework aims to minimize the burden on developers who build on top |
| of it. In particular, the hierarchy of analysis classes is designed |
| to reduce the effort required to implement a new flow-sensitive type |
| checker in the Checker Framework. The |
| \href{https://docs.google.com/document/d/1oYzbOrrS4ZEEx4wQgIHbijNzcI5CiQAq_-1NrOS8JME/edit?usp=sharing}{Dataflow User's Guide} |
| gives an introduction to customizing dataflow to add checker-specific |
| enhancements. |
| |
| \begin{workinprogress} |
| Paragraphs colored in gray with a gray bar on the left side (just |
| like this one) contain questions, additional comments or indicate |
| missing parts. Eventually, these paragraphs will be removed. |
| \end{workinprogress} |
| |
| |
| \paragraph{Potential version conflicts if you export the Dataflow Framework} |
| Suppose that two tools both utilize the Dataflow Framework. If neither one |
| exports the Dataflow Framework's classes or the Checker Framework |
| annotations, both tools can be run at the same time. If both export the |
| classes, then it may be impossible for users to run both tools, because |
| they may require a different version of the Dataflow Framework. The |
| Checker Framework necessarily exports the Dataflow Framework. If your tool |
| cannot shadow the Checker Framework and Dataflow Framework classes, and you |
| cannot conform to the Checker Framework release schedule, then instead of |
| using the \code{dataflow} artifact at |
| \url{https://search.maven.org/artifact/org.checkerframework/dataflow/}, |
| you can use the \code{dataflow-shaded} artifact at |
| \url{https://search.maven.org/artifact/org.checkerframework/dataflow-shaded/}. |
| It contains the Dataflow Framework in package |
| \code{org.checkerframework.shaded.dataflow} and also its dependencies in |
| their usual packages, namely the Checker Framework |
| qualifiers and the plume-util project |
| (\url{https://github.com/plume-lib/plume-util}). |
| (There may still be a problem if two different non-Checker-Framework tools |
| both export different, incompatible versions of the \code{dataflow-shaded} |
| artifact.) |
| |
| |
| \section{Organization} |
| |
| \subsection{Projects} |
| % TODO: Update |
| |
| The source code of the combined Checker Framework and Dataflow |
| Framework is divided into multiple projects: \code{javacutil}, |
| \code{dataflow}, \code{framework}, and \code{checker}, |
| which can be built into distinct jar files. \code{checker.jar} is a fat |
| jar that contains all of these, plus the Stub Parser. |
| |
| \code{javacutil} provides convenient interfaces to routines in |
| Oracle's javac library. There are utility classes for interacting |
| with annotations, elements, trees and types, as well as |
| \code{InternalUtils}, which gives direct access to internal features |
| of javac that are not part of a supported interface. There are |
| interfaces or abstract classes for reporting errors, for processing |
| types in an AST, and for providing the annotations present on an |
| Element. |
| The \code{org.checkerframework.javacutil.trees} package provides a |
| class to parse expressions into javac Trees (\code{TreeParser}), a |
| class to build new Trees from scratch (\code{TreeBuilder}), and a |
| class to represent newly introduced variables that are not part of an |
| input program (\code{DetachedVarSymbol}). |
| |
| \code{dataflow} contains the classes to represent and construct |
| control flow graphs and the base classes required for flow analysis. |
| These classes are described in detail in \autoref{sec:node_classes}. |
| |
| \code{framework} contains the framework aspects of the Checker |
| Framework, including the derived classes for flow analysis of |
| annotated types which are described later in this document. |
| |
| \code{checker} contains the type system-specific checkers. |
| |
| The \code{dataflow} project depends only on \code{javacutil}. |
| |
| |
| \subsection{Classes} |
| |
| This section gives an overview of the major Java classes and |
| interfaces in the implementation of the Dataflow Framework and the |
| flow-sensitive type checking feature of the Checker Framework. It |
| includes both the base classes in the \code{dataflow} project and the |
| derived classes in the \code{framework} project. The class and |
| interface declarations are given with full package names to indicate |
| which project they belong to. |
| |
| \subsubsection{Nodes} |
| \label{sec:node_classes} |
| |
| Dataflow doesn't actually work on trees; it works on Nodes. |
| A Node class represents an individual operation of a program, |
| including arithmetic operations, logical operations, method calls, |
| variable references, array accesses, etc. |
| Nodes |
| simplify writing a dataflow analysis by separating the dataflow |
| analysis from the original source code. |
| \autoref{tab:nodes} on page~\pageref{tab:nodes} lists |
| the Node types. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.node; |
| |
| abstract class Node |
| class *Node extends Node |
| \end{verbatim} |
| |
| |
| \subsubsection{Blocks} |
| \label{sec:block_classes} |
| |
| The Block |
| classes represent basic blocks. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.block; |
| |
| interface Block |
| abstract class BlockImpl implements Block |
| interface SingleSuccessorBlock extends Block |
| abstract class SingleSuccessorBlockImpl extends BlockImpl implements SingleSuccessorBlock |
| \end{verbatim} |
| |
| A RegularBlock contains no exception-raising operations and has a |
| single control-flow successor. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.block; |
| interface RegularBlock extends SingleSuccessorBlock |
| class RegularBlockImpl extends SingleSuccessorBlockImpl implements RegularBlock |
| \end{verbatim} |
| |
| An ExceptionBlock contains a single operation that may raise an |
| exception, with one or more exceptional successors and a single normal |
| control-flow successor. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.block; |
| interface ExceptionBlock extends SingleSuccessorBlock |
| class ExceptionBlockImpl extends SingleSuccessorBlockImpl implements ExceptionBlock |
| \end{verbatim} |
| |
| A SpecialBlock represents method entry or exit, including exceptional |
| exit which is represented separately from normal exit. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.block; |
| interface SpecialBlock extends SingleSuccessorBlock |
| class SpecialBlockImpl extends SingleSuccessorBlockImpl implements SpecialBlock |
| \end{verbatim} |
| |
| A ConditionalBlock contains no operations at all. It represents a |
| control-flow split to either a `then' or an `else' successor based on |
| the immediately preceding boolean-valued Node. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg.block; |
| interface ConditionalBlock extends Block |
| class ConditionalBlockImpl extends BlockImpl implements ConditionalBlock |
| \end{verbatim} |
| |
| |
| |
| \subsubsection{ControlFlowGraph} |
| \label{sec:control_flow_graph_class} |
| |
| A ControlFlowGraph represents the body of a method or an initializer |
| expression as a graph of Blocks with distinguished entry, exit, and |
| exceptional exit SpecialBlocks. ControlFlowGraphs are produced by the |
| CFGBuilder classes and are treated as immutable once they are built. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg; |
| class ControlFlowGraph |
| \end{verbatim} |
| |
| \subsubsubsection{CFGBuilder} |
| \label{sec:cfg_builder_classes} |
| |
| The CFGBuilder classes visit an AST and produce a corresponding |
| ControlFlowGraph as described in \autoref{sec:ast_to_cfg_translation}. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg; |
| class CFGBuilder |
| \end{verbatim} |
| |
| The Checker Framework derives from CFGBuilder in order to desugar |
| enhanced for loops that make explicit use of type annotations provided |
| by the checker in use. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.flow; |
| class CFCFGBuilder extends CFGBuilder |
| \end{verbatim} |
| |
| \subsubsubsection{CFGVisualizeLauncher} |
| \label{sec:cfg_visualize_launcher_class} |
| |
| The CFGVisualizeLauncher generates a DOT or String representation of |
| the control flow graph for a given method in a given class. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.cfg; |
| class CFGVisualizeLauncher |
| \end{verbatim} |
| |
| \subsubsection{JavaExpressions} |
| \label{sec:flow_expressions_class} |
| |
| The Dataflow Framework records the abstract values of certain |
| expressions, called JavaExpressions: local variables, field accesses, |
| array accesses, references to \code{this}, and pure method calls. |
| JavaExpressions are keys in the store of abstract values. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| class JavaExpressions |
| \end{verbatim} |
| |
| Java expressions that appear in method pre- and postconditions are |
| parsed into JavaExpressions using helper routines in |
| \code{org.checkerframework.framework.util.JavaExpressionParseUtil}. |
| |
| |
| \subsubsection{AbstractValue} |
| \label{sec:abstract_value_classes} |
| |
| AbstractValue is the internal representation of dataflow information |
| produced by an analysis. An AbstractValue is an estimate about the |
| run-time values that an expression may evaluate to. The client of the |
| Dataflow Framework defines the abstract value, so the information may |
| vary widely among different users of the Dataflow Framework, but they |
| share a common feature that one can compute the least upper bound of |
| two AbstractValues. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| interface AbstractValue<V extends AbstractValue<V>> |
| \end{verbatim} |
| |
| For the Checker Framework, abstract values are essentially |
| AnnotatedTypeMirrors. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.flow; |
| abstract class CFAbstractValue<V extends CFAbstractValue<V>> implements AbstractValue<V> |
| class CFValue extends CFAbstractValue<CFValue> |
| \end{verbatim} |
| |
| For the Nullness Checker, abstract values additionally track the |
| meaning of PolyNull, which may be either Nullable or NonNull. The |
| meaning of PolyNull can change when a PolyNull value is compared to |
| the null literal, which is specific to the NullnessChecker. Other |
| checkers often also support a Poly* qualifier, but only the |
| NullnessChecker tracks the meaning of its poly qualifier using the |
| dataflow analysis. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.nullness; |
| class NullnessValue extends CFAbstractValue<NullnessValue> |
| \end{verbatim} |
| |
| |
| \subsubsection{Store} |
| \label{sec:store_classes} |
| |
| A Store is a set of dataflow facts computed by an analysis, so it is a |
| mapping from JavaExpressions to AbstractValues. As with |
| AbstractValues, one can take the least upper bound of two Stores. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| interface Store<S extends Store<S>> |
| \end{verbatim} |
| |
| The Checker Framework store restricts the type of abstract values it |
| may contain. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.flow; |
| abstract class CFAbstractStore<V extends CFAbstractValue<V>, |
| S extends CFAbstractStore<V, S>> |
| implements Store<S> |
| class CFStore extends CFAbstractStore<CFValue, CFStore> |
| \end{verbatim} |
| |
| An InitializationStore tracks which fields of the `self' reference |
| have been initialized. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.initialization; |
| class InitializationStore<V extends CFAbstractValue<V>, |
| S extends InitializationStore<V, S>> |
| extends CFAbstractStore<V, S> |
| \end{verbatim} |
| |
| A NullnessStore additionally tracks the meaning of PolyNull. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.nullness; |
| class NullnessStore extends InitializationStore<NullnessValue, NullnessStore> |
| \end{verbatim} |
| |
| |
| \subsubsection{Transfer functions} |
| \label{sec:transfer_functions} |
| |
| A transfer function (\autoref{sec:transfer_function_classes}) is |
| explicitly represented as a node visitor that takes a TransferInput |
| (\autoref{sec:transfer_input_classes}) and produces a TransferResult |
| (\autoref{sec:transfer_result_classes}). |
| |
| \subsubsubsection{TransferInput} |
| \label{sec:transfer_input_classes} |
| |
| The TransferInput represents the set of dataflow facts known to be |
| true immediately before the node to be analyzed. A TransferInput may |
| contain a single store, or a pair of `then' and `else' stores when |
| following a boolean-valued expression. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| class TransferInput<V extends AbstractValue<V>, |
| S extends Store<S>> |
| \end{verbatim} |
| |
| \subsubsubsection{TransferResult} |
| \label{sec:transfer_result_classes} |
| |
| A TransferResult is the output of a transfer function. In other |
| words, it is the set of dataflow facts known to be true immediately |
| after a node. A Boolean-valued expression produces a |
| ConditionalTransferResult that contains both a `then' and an `else' |
| store, while most other Nodes produce a RegularTransferResult with a |
| single store. |
| |
| \begin{verbatim} |
| package class org.checkerframework.dataflow.analysis; |
| abstract TransferResult<V extends AbstractValue<V>, |
| S extends Store<S>> |
| class ConditionalTransferResult<V extends AbstractValue<V>, |
| S extends Store<S>> |
| extends TransferResult<A, S> |
| class RegularTransferResult<V extends AbstractValue<V>, |
| S extends Store<S>> |
| extends TransferResult<A, S> |
| \end{verbatim} |
| |
| \subsubsubsection{TransferFunction} |
| \label{sec:transfer_function_classes} |
| |
| A TransferFunction is a NodeVisitor that takes an input and produces an output. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| interface TransferFunction<V extends AbstractValue<V>, |
| S extends Store<S>> |
| extends NodeVisitor<TransferResult<V, S>, TransferInput<V, S>> |
| interface ForwardTransferFunction<V extends AbstractValue<V>, |
| S extends Store<S>> |
| extends TransferFunction<V, S> |
| interface BackwardTransferFunction<V extends AbstractValue<V>, |
| S extends Store<S>> |
| extends TransferFunction<V, S> |
| \end{verbatim} |
| |
| The Checker Framework defines a derived class of TransferFunction to |
| serve as the default for most checkers. The class constrains the type |
| of abstract values and it overrides many node visitor methods to |
| refine the abstract values in their TransferResults. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.flow; |
| abstract class CFAbstractTransfer<V extends CFAbstractValue<V>, |
| S extends CFAbstractStore<V, S>, |
| T extends CFAbstractTransfer<V, S, T>> |
| extends AbstractNodeVisitor<TransferResult<V, S>, TransferInput<V, S>> |
| implements ForwardTransferFunction<V, S> |
| |
| class CFTransfer extends CFAbstractTransfer<CFValue, CFStore, CFTransfer> |
| \end{verbatim} |
| |
| The Initialization Checker's transfer function tracks which fields of |
| the `self' reference have been initialized. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.initialization; |
| class InitializationTransfer<V extends CFAbstractValue<V>, |
| T extends InitializationTransfer<V, T, S>, |
| S extends InitializationStore<V, S>> |
| extends CFAbstractTransfer<V, S, T> |
| \end{verbatim} |
| |
| The Regex Checker's transfer function overrides visitMethodInvocation |
| to special-case the \code{isRegex} and \code{asRegex} methods. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.regex; |
| class RegexTransfer extends CFAbstractTransfer<CFValue, CFStore, RegexTransfer> |
| \end{verbatim} |
| |
| |
| \subsubsection{Analysis} |
| \label{sec:analysis_classes} |
| |
| An Analysis performs iterative dataflow analysis over a control flow |
| graph using a given transfer function. Both forward and backward |
| analyses are supported. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| interface Analysis<V extends AbstractValue<V>, |
| S extends Store<S>, |
| T extends TransferFunction<V, S>> |
| abstract class AbstractAnalysis<V extends AbstractValue<V>, |
| S extends Store<S>, |
| T extends TransferFunction<V, S>> |
| implements Analysis<V, S, T> |
| interface ForwardAnalysis<V extends AbstractValue<V>, |
| S extends Store<S>, |
| T extends ForwardTransferFunction<V, S>> |
| extends Analysis<V, S, T> |
| interface BackwardAnalysis<V extends AbstractValue<V>, |
| S extends Store<S>, |
| T extends BackwardTransferFunction<V, S>> |
| extends Analysis<V, S, T> |
| \end{verbatim} |
| |
| The Checker Framework defines a derived class of Analysis for use as |
| the default analysis of most checkers. This class adds information |
| about the type hierarchy being analyzed and acts as a factory for |
| abstract values, stores, and the transfer function. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.flow; |
| abstract class CFAbstractAnalysis<V extends CFAbstractValue<V>, |
| S extends CFAbstractStore<V, S>, |
| T extends CFAbstractTransfer<V, S, T>> |
| extends ForwardAnalysisImpl<V, S, T> |
| |
| class CFAnalysis extends CFAbstractAnalysis<CFValue, CFStore, CFTransfer> |
| \end{verbatim} |
| |
| The Nullness Checkers' analysis overrides the factory methods for |
| abstract values, stores, and the transfer function. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.nullness; |
| class NullnessAnalysis extends CFAbstractAnalysis<NullnessValue, |
| NullnessStore, NullnessTransfer> |
| \end{verbatim} |
| |
| The RegexChecker's analysis overrides the factory methods for abstract |
| values, stores, and the transfer function. |
| |
| \begin{verbatim} |
| package org.checkerframework.checker.regex; |
| class RegexAnalysis extends CFAbstractAnalysis<CFValue, CFStore, RegexTransfer> |
| \end{verbatim} |
| |
| |
| \subsubsection{AnalysisResult} |
| \label{sec:analysis_result_class} |
| |
| An AnalysisResult preserves the dataflow information computed by an |
| Analysis for later use by clients. The information consists of an |
| AbstractValue for each node in the CFG and a Store that is valid at |
| the start of each Block. The AnalysisResult class can return |
| AbstractValues for either Nodes or Trees and it can re-run the |
| transfer function to compute Stores that are valid immediately before |
| or after any Tree. |
| |
| \begin{verbatim} |
| package org.checkerframework.dataflow.analysis; |
| class AnalysisResult<V extends AbstractValue<V>, |
| S extends Store<S>> |
| \end{verbatim} |
| |
| |
| \subsubsection{AnnotatedTypeFactory} |
| \label{sec:annotated_type_factory_classes} |
| |
| AnnotatedTypeFactorys are not part of the Dataflow Framework, per se, |
| but they are parameterized by the Dataflow Framework classes that they |
| use. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.type; |
| class AnnotatedTypeFactory implements AnnotationProvider |
| \end{verbatim} |
| |
| In the Checker Framework, dataflow analysis is performed on demand, |
| one class at a time, the first time that a ClassTree is passed to |
| getAnnotatedType. This is implemented in the abstract class |
| GenericAnnotatedTypeFactory with concrete implementation in |
| BaseAnnotatedTypeFactory. |
| |
| \begin{verbatim} |
| package org.checkerframework.framework.type; |
| abstract class GenericAnnotatedTypeFactory<Checker extends BaseTypeChecker<?>, |
| Value extends CFAbstractValue<Value>, |
| Store extends CFAbstractStore<Value, Store>, |
| TransferFunction extends CFAbstractTransfer<Value, Store, TransferFunction>, |
| FlowAnalysis extends CFAbstractAnalysis<Value, Store, TransferFunction>> |
| extends AnnotatedTypeFactory |
| |
| package org.checkerframework.common.basetype; |
| class BaseAnnotatedTypeFactory |
| extends GenericAnnotatedTypeFactory<CFValue, CFStore, CFTransfer, CFAnalysis> |
| \end{verbatim} |
| |
| |
| \section{The Control-Flow Graph} |
| \label{sec:cfg} |
| |
| A control-flow graph (CFG) represents a single method or field |
| initialization. (The Dataflow Framework performs an intra-procedural |
| analysis. This analysis is modular and every method is considered in |
| isolation.) |
| This section also describes the translation from the abstract syntax tree |
| (AST) to the CFG\@. |
| We start with a simple example, then give a more formal |
| definition of the CFG and its properties, and finally describe the |
| translation from the AST to the CFG. |
| |
| As is standard, a control-flow graph is a set of |
| basic blocks that are linked by control-flow edges. Possibly less |
| standard, every basic block consists of a sequence of so-called nodes, |
| each of which represents a minimal Java operation or expression. |
| |
| |
| \flow{CFGSimple}{.33}{1.1}{A simple Java code snippet to introduce the CFG. |
| In CFG visualizations, special basic blocks are shown as ovals; |
| conditional basic blocks are polygons with eight sides; and regular and exception |
| basic blocks are rectangles.} |
| |
| Consider the method \code{test} of \autoref{fig:CFGSimple}. The if |
| conditional got translated to a \emph{conditional basic block} |
| (octagon) with two successors. There are also two special basic blocks |
| (ovals) to denote the entry and exit point of the method. |
| |
| |
| \subsection{Formal Definition of the Control-Flow Graph} |
| \label{sec:cfg-formal} |
| |
| The control-flow graph models all paths that can possibly be taken by an |
| execution of the method. |
| |
| \begin{definition}[Control-Flow Graph] |
| A \emph{control-flow graph} consists of a set of \emph{basic |
| blocks} and a set of directed edges between these basic blocks, |
| some of which are labeled. |
| \end{definition} |
| |
| \begin{definition}[Basic Block] |
| A \emph{basic block} is a sequence of \emph{nodes}, where the only |
| control flow between the nodes inside |
| the basic block is sequential. Furthermore, there is no |
| control flow occurring between those nodes and nodes of other basic |
| blocks, except between the last node of one block $b_1$ and the first node |
| of another block $b_2$, if $b_2$ is a successor of $b_1$. A basic |
| block may have multiple successors. |
| \end{definition} |
| |
| |
| |
| \begin{definition}[Types of Basic Blocks] |
| There are four \emph{types} of basic blocks in a control-flow graph: |
| \begin{enumerate} |
| \item \textbf{Regular basic block.} A \emph{regular basic |
| block} contains any non-empty sequence of nodes and has |
| exactly one successor. None of the nodes in the block can |
| throw an exception at runtime. |
| |
| \item \textbf{Special basic blocks.} A \emph{special basic |
| block} contains the empty sequence of nodes (i.e., is empty) |
| and denotes either the entry or one of the exit blocks of a |
| method. There are three types of special basic blocks: |
| \begin{itemize} |
| \item Entry block. This basic block is the (only) entry |
| point of the method and thus is the only basic block |
| without predecessors. |
| \item Exit block. This basic block denotes the (normal) |
| exit of a method, and it has no successors. |
| \item Exceptional exit block, which indicates exceptional |
| termination of the method. As an exit block, this block |
| has no successors. |
| \end{itemize} |
| Every method has exactly one entry block, zero or one exit blocks, |
| and zero or one exceptional exit blocks. There is always |
| either an exit block, an exceptional exit block, or both. |
| |
| \item \textbf{Exception basic block.} An \emph{exception basic |
| block} contains exactly one node that \emph{might} throw an |
| exception at runtime (e.g., a method call). There are zero |
| or one non-exceptional successors (only a basic block containing a |
| \code{throw} statement does not have a non-exceptional |
| successor). There are one or more |
| exceptional successors (see \autoref{def:edges}). In all |
| cases there is at least one successor (regular or |
| exceptional). |
| |
| \item \textbf{Conditional basic block.} A \emph{conditional |
| basic block} does not contain any nodes and is used as a |
| \emph{split point} after the execution of a node of boolean |
| type. It has exactly two successors (both non-exceptional): |
| the \emph{then} successor that is reached when the previous node |
| evaluates to true and the \emph{else} successor that is reached |
| when the previous node evaluates to false. There is always |
| exactly a single predecessor block for every conditional |
| basic block, which is either a regular basic block or an |
| exception basic block. In both cases, the last node in the |
| predecessor will be of boolean type and the boolean value |
| controls which successor of the conditional block is |
| executed. |
| \end{enumerate} |
| \end{definition} |
| |
| The Java implementation of the four block types above is described in |
| \autoref{sec:block_classes}. |
| |
| \begin{definition}[Control-Flow Graph Edges] |
| \label{def:edges} |
| The basic blocks of a control-flow graph are connected by directed |
| \emph{edges}. If $b_1$ and $b_2$ are connected by a directed edge |
| $(b_1,b_2)$, we call $b_1$ a predecessor of $b_2$, and we call $b_2$ |
| a successor of $b_1$. In a control-flow graph, there are three |
| types of edges: |
| \begin{enumerate} |
| \item \textbf{Exceptional edges}. An \emph{exceptional edge} |
| connects an exception basic block with its exceptional |
| successors, and it is labeled by the most general exception that |
| might cause execution to take this edge during run time. Note |
| that the outgoing exceptional edges of a basic block do not need |
| to have mutually exclusive labels; the semantics is that the |
| control flow follows the most specific edge. For instance, if |
| one edge is labeled with type \code{A} and another is labeled |
| with type \code{B} where \code{B} is a subtype of \code{A}, then |
| the execution only takes the first edge if the exception is of a |
| subtype of \code{A}, but not a subtype of \code{B}. |
| |
| \begin{workinprogress} |
| There is not necessarily a most specific exception type in the program |
| text; in that case, does the translation add a most specific case that will |
| never be executed at run time? |
| |
| In general, what is the relation of the ordering in source code to the |
| one here? |
| \end{workinprogress} |
| |
| There is at most one successor for every exception type. |
| |
| \item \textbf{Conditional edges}. A \emph{conditional edge} is a |
| non-exceptional edge that connects a conditional basic block |
| with one of its successors, and is labeled with either ``true'' |
| or ``false''. |
| |
| \item \textbf{Regular, non-conditional edge.} Any other edge is a |
| \emph{regular edge}, and does not carry a label. Only regular |
| basic blocks, the entry basic block, and exception basic blocks |
| have outgoing regular edges. |
| \end{enumerate} |
| \end{definition} |
| |
| |
| \begin{definition}[Nodes] |
| \label{def:node} |
| A \emph{node} is a minimal Java operation or expression. It is |
| minimal in the sense that it cannot be decomposed further into |
| subparts between which control flow occurs. Examples for such |
| nodes include integer literals, an addition node (which performs |
| the mathematical addition of two nodes) or a method call. Control |
| flow such as \code{if} and \code{break} are not represented as |
| nodes. The full list of nodes is given in \autoref{tab:nodes} and |
| several of them are described in more detail in |
| \autoref{sec:noteworthy-translations}. |
| |
| It is important to note that, even though nodes can contain |
| references to other nodes, it is only the ``top-level'' node which |
| is considered at that point in the basic block. In the example of |
| the addition node, this means that only the addition operation is |
| to be executed, and its operands would occur earlier in the |
| control-flow graph (as they are evaluated first, before performing |
| the addition). |
| \end{definition} |
| |
| In the visualization, a string representation of the node is used, |
| followed by the node type in square brackets. Note that the string |
| representation often also includes more than just the ``top-level'' |
| node. For instance, the addition expression \code{a + b[0];} will |
| appear as ``a + b[0] [ NumericalAddition ]'' rather than ``a'' plus |
| some temporary variable. This is done for clarity, so that it is easy |
| to see what expressions are summed up and because we don't create |
| internal names for expression results. |
| |
| \autoref{tab:nodes} lists all node types in the framework. We use the |
| Java class name of the implementation, but leave out the suffix |
| \code{Node}, which is present for every type. |
| All classes are in package \code{org.checkerframework.dataflow.cfg.node}. |
| |
| \begin{longtable}{lp{0.4\linewidth}l} |
| \midrule |
| \multicolumn{3}{c}{\autoref{tab:nodes}: All node types in the Dataflow Framework.} \\ \\ |
| \textbf{Node type} & \textbf{Notes} & \textbf{Example} \\ \midrule \endfirsthead |
| |
| \textbf{Node type} & \textbf{Notes} & \textbf{Example} \\ \midrule \endhead |
| \hline \multicolumn{3}{|c|}{{Continued on next page}} \\ \hline \endfoot |
| \endlastfoot |
| |
| \code{Node} & The base class of all nodes. & \\ |
| \midrule |
| |
| \code{ValueLiteral} & The base class of literal value nodes. & \\ |
| \code{BooleanLiteral} & & \code{true} \\ |
| \code{CharacterLiteral} & & \code{'c'} \\ |
| \code{DoubleLiteral} & & \code{3.14159} \\ |
| \code{FloatLiteral} & & \code{1.414f} \\ |
| \code{IntegerLiteral} & & \code{42} \\ |
| \code{LongLiteral} & & \code{1024L} \\ |
| \code{NullLiteral} & & \code{null} \\ |
| \code{ShortLiteral} & & \code{512} \\ |
| \code{StringLiteral} & & \code{"memo"} \\ |
| \midrule |
| |
| & Accessor expressions & \\ |
| \code{ArrayAccess} & & \code{args[i]} \\ |
| \code{FieldAccess} & & \code{f}, \code{obj.f} \\ |
| \code{MethodAccess} & & \code{obj.hashCode} \\ |
| \code{This} & Base class of references to \code{this} & \\ |
| \code{ExplicitThis} & Explicit use of \code{this} in an expression & \\ |
| \code{ImplicitThis} & Implicit use of \code{this} in an expression & \\ |
| \code{Super} & Explicit use of \code{super} in expression. & \code{super(x, y)} \\ |
| \code{LocalVariable} & Use of a local variable, either as l-value or r-value & \\ |
| \midrule |
| |
| \code{MethodInvocation} & Note that access and invocation are distinct. & \code{hashCode()} \\ |
| \midrule |
| |
| & Arithmetic and logical operations. & \\ |
| \code{BitwiseAnd} & & a \& \code{b} \\ |
| \code{BitwiseComplement} & & \verb|~b| \\ |
| \code{BitwiseOr} & & \code{a | b} \\ |
| \code{BitwiseXor} & & \code{a ^ b} \\ |
| \code{ConditionalAnd} & Short-circuiting. & a \&\& \code{b} \\ |
| \code{ConditionalNot} & & \code{!a} \\ |
| \code{ConditionalOr} & Short-circuiting. & \code{a || b} \\ |
| \code{FloatingDivision} & & \code{1.0 / 2.0} \\ |
| \code{FloatingRemainder} & & \code{13.0 \% 4.0} \\ |
| \code{LeftShift} & & \code{x << 3} \\ |
| \code{IntegerDivision} & & \code{3 / 2} \\ |
| \code{IntegerRemainder} & & \code{13 \% 4} \\ |
| \code{NumericalAddition} & & \code{x + y} \\ |
| \code{NumericalMinus} & & \code{-x} \\ |
| \code{NumericalMultiplication} & & \code{x * y} \\ |
| \code{NumericalPlus} & & \code{+x} \\ |
| \code{NumericalSubtraction} & & \code{x - y} \\ |
| \code{SignedRightShift} & & \code{x >> 3} \\ |
| \code{StringConcatenate} & & \code{s + ".txt"} \\ |
| \code{TernaryExpression} & & \code{c ? t : f} \\ |
| \code{UnsignedRightShift} & & \code{x >>> 5} \\ |
| \midrule |
| |
| & Relational operations & \\ |
| \code{EqualTo} & & \code{x == y} \\ |
| \code{NotEqual} & & \code{x != y} \\ |
| \code{GreaterThan} & & \code{x > y} \\ |
| \code{GreaterThanOrEqual} & & \code{x >= y} \\ |
| \code{LessThan} & & \code{x < y} \\ |
| \code{LessThanOrEqual} & & \code{x <= y} \\ |
| |
| \code{Case} & Case of a switch. Acts as an equality test. & \\ |
| \midrule |
| |
| \code{Assignment} & & \code{x = 1} \\ |
| % \midrule |
| |
| \code{StringConcatenateAssignment} & A compound assignment. & \code{s += ".txt"} \\ |
| \midrule |
| |
| \code{ArrayCreation} & & \code{new double[]} \\ |
| \code{ObjectCreation} & & \code{new Object()} \\ |
| \midrule |
| |
| \code{TypeCast} & & \code{(float) 42} \\ |
| \code{InstanceOf} & & \code{x instanceof Float} \\ |
| \midrule |
| |
| & Conversion nodes. & \\ |
| \code{NarrowingConversion} & Implicit conversion. & \\ |
| \code{StringConversion} & Might be implicit. & \code{obj.toString()} \\ |
| \code{WideningConversion} & Implicit conversion. & \\ |
| \midrule |
| |
| \midrule |
| & \textbf{Non-value nodes} & \\ |
| |
| & Types appearing in expressions, such as \code{MyType.class} & \\ |
| \code{ArrayType} & & \\ |
| \code{ParameterizedType} & & \\ |
| \code{PrimitiveType} & & \\ |
| \midrule |
| |
| \code{ClassName} & Identifier referring to Java class or interface. & \code{java.util.HashMap} \\ |
| \code{PackageName} & Identifier referring to Java package. & \code{java.util} \\ |
| \midrule |
| |
| \code{Throw} & Throw an exception. & \\ |
| \code{Return} & Return from a method. & \\ |
| \midrule |
| |
| \code{AssertionError} & & \code{assert x != null : "Hey"} \\ |
| \midrule |
| |
| \code{Marker} & No-op nodes used to annotate a CFG with |
| information of the underlying Java source code. Mostly useful |
| for debugging and visualization. An example is indicating the |
| start/end of switch statements. & \\ \midrule |
| |
| \code{NullChk} & Null checks inserted by javac & \\ |
| \midrule |
| |
| \code{VariableDeclaration} & Declaration of a local variable & \\ |
| \midrule |
| |
| \caption{All node types in the Dataflow Framework.} |
| \label{tab:nodes} |
| \end{longtable} |
| |
| |
| In theory, nearly any statement can throw an \code{Error} such as |
| \code{OutOfMemoryError} or \code{NoSuchFieldError}. The Dataflow Framework |
| does not represent all that possible flow. It only creates the exceptional |
| edges shown in \autoref{tab:nodesWithException}. |
| |
| \begin{table} |
| \begin{tabular}{ll} |
| \hline |
| \textbf{Node type} & \textbf{Exception type} \\ \hline |
| |
| \code{ArrayAccess} & \code{NullPointerException}, \code{ArrayIndexOutOfBoundsException} \\ |
| \code{FieldAccess} & \code{NullPointerException} \\ |
| \code{MethodAccess} & \code{NullPointerException} \\ |
| \code{MethodInvocation} & \code{Throwable}, types in throws clause of the signature \\ |
| \code{IntegerDivision} & \code{ArithmeticException} \\ |
| \code{IntegerRemainder} & \code{ArithmeticException} \\ |
| \code{ObjectCreation} & \code{Throwable}, types in throws clause of the signature \\ |
| \code{ArrayCreation} & \code{NegativeArraySizeException}, \code{OutOfMemoryError} \\ |
| \code{TypeCast} & \code{ClassCastException} \\ |
| \code{Throw} & Type of \code{e} when \code{throw e} \\ |
| \code{AssertionError} & \code{AssertionError} \\ |
| \code{ClassName} & \code{ClassCircularityError}, \code{ClassFormatError}, \\ |
| & \code{NoClassDefFoundError}, \code{OutOfMemoryError} \\ |
| \hline |
| \end{tabular} |
| |
| \caption{All node types that could throw Exception, and the types |
| to be thrown. |
| Java class name of nodes are simplified as with \autoref{tab:nodes}. |
| All exception types are in package \code{java.lang}.} |
| \label{tab:nodesWithException} |
| \end{table} |
| |
| \begin{workinprogress} |
| Can \code{StringConversion} be implicit? I think so, but in any event discuss. |
| \end{workinprogress} |
| |
| \begin{workinprogress} |
| For each non-expression, explain its purpose, just like the explanation for |
| Marker that still needs to be fleshed out. |
| \end{workinprogress} |
| |
| \begin{workinprogress} |
| ``Could be desugared'' on ``StringConcatenateAssignment'' and ``Conversion |
| nodes'' was confusing. What is the design rationale for |
| desugaring in the Dataflow Framework? Discuss that. Here, at least |
| forward-reference to \autoref{sec:desugaring}, if that's relevant. \\ |
| More generally, for any cases that will be discussed in the text, add a |
| forward reference to the section with the discussion. |
| \end{workinprogress} |
| |
| \begin{workinprogress} |
| There is a \code{StringConcatenateAssignmentNode}. |
| What about other compound assignments? Are they desugared? |
| \end{workinprogress} |
| |
| \begin{workinprogress} |
| When we added Java~8 support, did we add additional nodes that are not |
| listed here? Cross-check with implementation. |
| \end{workinprogress} |
| |
| \begin{workinprogress} |
| Why is it \code{AssertionErrorNode} instead of \code{AssertNode}? |
| \end{workinprogress} |
| |
| |
| \subsection{Noteworthy Translations and Node Types} |
| \label{sec:noteworthy-translations} |
| |
| In this section we mention any non-straightforward translations from the AST to |
| the CFG, or special properties about individual nodes. |
| |
| |
| \subsubsection{Program Structure} |
| \label{sec:prog-structure} |
| |
| Java programs are structured using high-level programming constructs |
| such as loops, if-then-else constructs, |
| try-catch-finally blocks or switch statements. During the translation |
| from the AST to the CFG some of this program structure is lost and all |
| non-sequential control flow is represented by two low-level |
| constructs: conditional basic blocks and control-flow edges between |
| basic blocks. For instance, a \code{while} loop is translated into its |
| condition followed by a conditional basic block that models the two |
| possible outcomes of the condition: either the control flow follows |
| the `true' branch and continues with the loop's body, or control goes to the |
| `false' successor and executes the first statement after the loop. |
| |
| |
| \subsubsection{Assignment} |
| |
| As described in \jlsref{15.26.1}, the execution of an assignment is in |
| general not strictly left-to-right. Rather, the right-hand side might |
| be evaluated even if the left-hand side of the assignment causes an |
| exception. This semantics is faithfully represented in the CFG |
| produced by the translation. An example of a field assignment |
| exhibiting this behavior is shown in \autoref{fig:CFGFieldAssignment}. |
| |
| \flow{CFGFieldAssignment}{.33}{1}{Control flow for a field assignment is not strictly |
| left-to-right (cf.\ \jlsref{15.26.1}), |
| which is properly handled by the translation.} |
| |
| \subsubsection{Postfix/Prefix Increment/Decrement} |
| \label{sec:postpre-incdec} |
| Postfix and prefix increment and decrement have a side effect to |
| update the variable or field. To represent this side effect, the Dataflow |
| Framework creates an artificial assignment node like \code{n = n + 1} |
| for \code{++n} or \code{n++}. This artificial assignment node is stored |
| in \code{unaryAssignNodeLookup} of \code{ControlFlowGraph}. The assignment |
| node is also stored in \code{treeLookup} for prefix increment or decrement |
| so that the result of it is after the assignment. However, the node before |
| the assignment is stored in \code{treeLookup} for postfix increment or decrement |
| because the result of it should be before the assignment. For further information |
| about node-tree mapping, see \autoref{sec:conversions}. |
| |
| \subsubsection{Conditional stores} |
| \label{sec:cond-stores} |
| |
| The Dataflow Framework extracts information from control-flow splits |
| that occur in \code{if}, \code{for}, \code{while}, and \code{switch} statements. In order to have |
| the information available at the split, we eagerly produce two stores |
| contained in a \code{ConditionalTransferResult} after certain |
| boolean-valued expressions. The stores are called the \emph{then} and |
| \emph{else} stores. So, for example, after the expression \code{x == |
| null}, two different stores will be created. The Nullness Checker |
| would produce a then store that maps \code{x} to @Nullable and an else |
| store that maps \code{x} to @NonNull. |
| |
| The Dataflow Framework allows a maximum of two stores and when there |
| are two distinct stores, they always refer to the most recent |
| boolean-valued expression. Stores are propagated through most nodes |
| and they are reversed for conditional not expressions. The transfer |
| functions for many nodes merge conditional stores back together |
| because they cannot maintain the distinction between them. Merging |
| just means taking the least upper bound of the two stores and it |
| happens automatically by calling \code{TransferInput.getRegularStore}. |
| |
| |
| \subsubsection{Branches} |
| |
| The control flow graph represents all non-exceptional control-flow |
| splits, or branches, as \code{ConditionalBlock}s that contain no |
| nodes. If there is one store flowing into a conditional block, then |
| it is duplicated to both successors. If there are two stores flowing |
| into a conditional block, the then store is propagated to the block's |
| then successor and the else store is propagated to the block's else |
| successor. |
| |
| Consider the control flow graph generated for the simple if statement |
| in \autoref{fig:CFGIfStatement}. The conditional expression \code{b1} |
| immediately precedes the \code{ConditionalBlock}, represented by the |
| octagonal node. The \code{ConditionalBlock} is followed by both a |
| then and an else successor block, after which control flow merges back |
| together at the exit block. The edge labels \code{EACH_TO_EACH}, |
| \code{THEN_TO_BOTH}, and \code{ELSE_TO_BOTH} are flow rules described |
| in \autoref{sec:flow-rules}. As described above, the then store |
| propagates to (both stores of) the block's then successor according to |
| rule \code{THEN_TO_BOTH} and the else store propagates to (both stores |
| of) the block's else successor according to rule \code{ELSE_TO_BOTH}. |
| More precise rules are used to preserve dataflow information for |
| short-circuiting expressions, as described in \autoref{sec:cond-exp}. |
| |
| \flow{CFGIfStatement}{.33}{1.25}{Example of an if statement translated into a |
| \code{ConditionalBlock}.} |
| |
| \subsubsection{Conditional Expressions} |
| \label{sec:cond-exp} |
| |
| The conditional and (\code{&&}, cf. \jlsref{15.23}) and the |
| conditional or (\code{||}, cf. \jlsref{15.24}) expressions are subject |
| to short-circuiting: if evaluating the left-hand side already |
| determines the result, then the right-hand side is not evaluated. This |
| semantics is faithfully represented in the constructed CFG and more |
| precise flow rules (\autoref{sec:flow-rules}) are used to preserve |
| additional dataflow information. |
| |
| An example program using conditional or is shown in |
| \autoref{fig:CFGConditionalOr}. Note that the CFG correctly |
| represents short-circuiting. The expression \code{b2 || b3} is only |
| executed if \code{b1} is false and \code{b3} is only evaluated if |
| \code{b1} and \code{b2} are false. |
| |
| Observe in \autoref{fig:CFGConditionalOr} that the flow rule between |
| the first conditional block and its then successor is |
| \code{THEN_TO_THEN}, rather than the default flow rule for such edges |
| \code{THEN_TO_BOTH}, which is present on the edge from the last |
| conditional block to its then successor. \code{THEN_TO_THEN} is a |
| more precise rule which propagates the then store from the predecessor |
| of the conditional block to the then store of the then successor and |
| leaves the else store of the successor untouched. This is a valid |
| rule for propagating information along the short-circuit edge of a |
| conditional or expression because \code{b1 || (b2 || b3)} being false |
| implies that \code{b1} is false, so dataflow information that obtains |
| when \code{b1} is true has no effect on the dataflow information |
| obtains when \code{b1 || (b2 || b3)} is false. To put it another way, |
| if control reaches the block containing \code{b1 || (b2 || b3)} and |
| that expression is false, then control must have flowed along the else |
| branches of both conditional blocks and only the facts that obtain |
| along those edges need to be kept in the else store of the block |
| containing \code{b1 || (b2 || b3)}. |
| |
| \flow{CFGConditionalOr}{.33}{1.33}{Example of a conditional or expression |
| (\code{||}) with short-circuiting and more precise flow rules.} |
| |
| |
| \subsubsection{Implicit \code{this} access} |
| |
| The Java compiler AST uses the same type (\code{IdentifierTree}) for |
| local variables and implicit field accesses (where \code{this.} is |
| left out). To relieve the user of the Dataflow Framework from |
| manually determining the two cases, we consistently use |
| \code{FieldAccessNode} for field accesses, where the receiver might be |
| an \code{ImplicitThisNode}. |
| |
| |
| \subsubsection{Assert statements} |
| \label{sec:assert-stmts} |
| |
| Assert statements are treated specially by the CFG builder because it |
| is unknown at CFG construction time whether or not assertions will be |
| enabled when the program is run. When assertions are enabled, the |
| dataflow information gained by analyzing the assert statement can |
| improve precision and allow the programmer to avoid redundant |
| annotations. However, when assertions are disabled, it would be |
| unsound to assume that they had any effect on dataflow information. |
| |
| The user of the Dataflow Framework may specify that |
| assertions are enabled or disabled. When assertions are assumed to be |
| disabled, no CFG Nodes are built for the assert statement. |
| When assertions are assumed to be enabled, CFG Nodes are built to |
| represent the condition of the assert statement and, in the else |
| successor of a ConditionalBlock, CFG Nodes are built to represent the |
| detail expression of the assert, if any. |
| |
| If assertions are not assumed to be enabled or disabled, then |
| the CFG is conservative and represents the fact that the |
| assert statement may execute or may not. This takes the form of a |
| ConditionalBlock that branches on a fake variable. For example, see |
| \autoref{fig:CFGAssert}. The fake variable named |
| \code{assertionsEnabled#num0} controls the first ConditionalBlock. |
| The then successor of the ConditionalBlock is the same subgraph of CFG |
| Nodes that would be created if assertions were assumed to be enabled, |
| while the else successor of the ConditionalBlock is the same, empty, |
| subgraph of CFG Nodes that would be created if assertions were assumed |
| to be disabled. |
| |
| \flow{CFGAssert}{.15}{2.9}{Example of an assert statement translated with |
| assertions neither assumed to be enabled nor assumed to be |
| disabled.} |
| |
| |
| \subsubsection{Varargs method invocation} |
| \label{sec:varargs} |
| In Java, varargs in method or constructor invocation is compiled |
| as new array creation (cf.\ \jlsref{15.12.4.2}). For example, |
| \code{m(1, 2, 3)} will be compiled as \code{m(new int[]\{1, 2, 3\})} |
| when the signature of \code{m} is \code{m(int... args)}. Dataflow |
| Framework creates an \code{ArrayCreationNode} with initializer for varargs |
| in the same way as the Java compiler does. |
| Note that it doesn't create an \code{ArrayCreationNode} |
| when the varargs is an array with the same depth as the type of |
| the formal parameter, or if \code{null} is given as the actual varargs argument. |
| |
| \subsubsection{Default case and fall through for switch statement} |
| \label{sec:default-switch} |
| A switch statement is handled as a chain of \code{CaseNode} and nodes in |
| the case. \code{CaseNode} makes a branch by comparing the equality of |
| the expression of the switch statement and the expression of the case. |
| Note that the expression of a switch statement must be executed just only |
| once at the beginning of the switch statement. To refer to its value, a fake variable |
| is created and it is assigned to a fake variable. \code{THEN_TO_BOTH} |
| edge goes to nodes in the case and \code{ELSE_TO_BOTH} edge goes to next |
| \code{CaseNode}. When a next is default case, it goes to nodes in the default |
| case. If a break statement is in nodes, it creates an edge to next node of |
| the switch statement. If there is any possibility of fall-through, an edge |
| to the first of nodes in the next case is created after nodes in the case. |
| For example, see \autoref{fig:CFGSwitch}. The fake variable named \code{switch#num0} |
| is created and each of case nodes creates the branches. |
| |
| \flow{CFGSwitch}{.21}{1.45}{Example of a switch statement with case, default and fall through.} |
| |
| \subsubsection{Handling \code{finally} blocks} |
| \label{sec:try-finally} |
| |
| Control flow statements, like \code{return}, \code{break}, and \code{continue}, |
| within \code{try} blocks will cause execution of the \code{finally} block before |
| continuing at the target of the jump. The Dataflow Framework models this |
| behavior by adding a jump to a duplicate of the finally block before the |
| jump to the original target of the control flow statement. |
| |
| |
| \subsection{AST to CFG Translation} |
| \label{sec:ast_to_cfg_translation} |
| |
| This section gives a high-level overview of the translation process |
| from the abstract syntax tree to the control-flow graph as described |
| in \autoref{sec:cfg-formal}. |
| |
| First, we define several entities, which will be used in the translation. |
| |
| \begin{definition}[Extended Node] |
| In the translation process the data type \emph{extended node} is used. |
| An extended node can be one of four possibilities: |
| \begin{itemize} |
| \item \textbf{Simple extended node.} An extended node can just be a |
| wrapper for a node that does not throw an exception, |
| as defined in Definition~\ref{def:node}. |
| \item \textbf{Exception extended node.} Similar to a simple |
| node, an exception extended node contains a node, but this |
| node might throw an exception at runtime. |
| \item \textbf{Unconditional jump.} An unconditional jump |
| indicates that control flow proceeds non-sequentially to a |
| location indicated by a target label. |
| \item \textbf{Conditional jump.} A conditional jump can follow |
| an extended node that contains a node of boolean type. It |
| contains two target labels, one if the node evaluates to |
| true and one for false. |
| \end{itemize} |
| \end{definition} |
| |
| \textbf{Comparison of nodes and extended nodes.} |
| Nodes themselves never contain control flow information; they only |
| represent computation. |
| |
| An extended node is a wrapper around a node that represents control flow |
| information. It contains: a node, a label, a predecessor, and a |
| successor. |
| |
| An extended node is temporarily used to keep track of some control flow |
| information. Later, the basic block data structures are created, and they |
| represent the control flow. (And at that point the extended nodes are |
| discarded.) |
| \begin{definition}[Label] |
| A \emph{label} is a marker that is used to refer to extended |
| nodes. It is used only temporarily during CFG construction. |
| \end{definition} |
| |
| % I agree, we should keep this in mind in the future. |
| %\begin{workinprogress} |
| %I find it useful to never name phases ``one'', ``two'', ``three'', but |
| %always to give them English names. This makes the meaning clearer to |
| %readers --- often because it forces the writer to think harder about the |
| %meaning of each. It can still be useful to number the phases even after |
| %they have more mnemonic names. |
| %\end{workinprogress} |
| |
| |
| The process of translating an AST to a CFG proceeds in three distinct phases. |
| \begin{enumerate} |
| \item \textbf{Phase one.} In the first phase, a single linear |
| sequence of extended nodes is created. The control flow is |
| implicitly assumed to be sequential through the sequence of |
| extended nodes, until a (conditional or unconditional) jump is |
| encountered in an if, for, while, or switch statement, in which |
| case the jump target decides where execution proceeds. |
| |
| The labels used as targets of jumps are associated with positions |
| in this sequence and are managed by maintaining a binding function |
| from labels to sequence positions. The advantage of having this |
| indirection is that one can create a label and associate with the |
| next free position in the sequence, without knowing which exact |
| extended node will be placed there. Furthermore, labels can be |
| created and used before they are actually bound to their correct |
| position in the sequence (e.g., when that position is not yet |
| known). At the end, the binding function can be used to resolve |
| labels to extended nodes. |
| |
| Furthermore, phase one also computes a mapping from AST tree |
| elements to nodes, as well as a set of leaders. A \emph{leader} is |
| an extended node for which one of the following conditions |
| applies: |
| \begin{itemize} |
| \item It is the first extended node in the sequence. |
| \item It is the target of a jump (i.e. there is a label bound to |
| the location of the node in the sequence). |
| \item It is the first node following a jump. |
| \end{itemize} |
| |
| \item \textbf{Phase two.} Phase two translates the linear |
| representation to a control-flow graph by performing the |
| following transformations: |
| \begin{itemize} |
| \item Simple extended nodes are translated to regular basic |
| blocks, where multiple nodes can be grouped in one regular |
| basic block. |
| \item Exception extended nodes are translated to exception |
| basic blocks with the correct edges. |
| \item Unconditional jumps are replaced with edges between the |
| correct basic blocks. |
| \item Conditional jumps are replaced by a conditional basic |
| block. |
| \end{itemize} |
| To greatly simplify the implementation, phase two is allowed to |
| produce a degenerated control-flow graph. In particular, the |
| following deficiencies are possible: |
| \begin{itemize} |
| \item Regular basic blocks might be empty. |
| \item Some conditional basic blocks might be unnecessary, in that |
| they have the same target for both the `then' as well as the |
| `else' branch. |
| \item Two consecutive, non-empty, regular basic blocks can exist, |
| even if the second block has only exactly one predecessor and |
| the two blocks could thus be merged. |
| \end{itemize} |
| \item \textbf{Phase three.} In the third and last phase, the |
| control-flow graph is transformed such that the deficiencies |
| remaining from phase two are removed. It is ensured that |
| removing one kind of deficiency does not create another |
| degenerate case. |
| \end{enumerate} |
| |
| |
| |
| \subsubsection{Desugaring} |
| \label{sec:desugaring} |
| |
| Desugaring means replacing complicated source language constructs by |
| simpler ones, or removing syntactic sugar from an input program. |
| Originally, we intended for the control flow graph representation to |
| be as close as possible to the Java abstract syntax tree to simplify |
| the mapping from tree to CFG node and back and to reuse existing |
| checker code written in terms of trees. However, we ran into several |
| cases that were better handled by desugaring. |
| |
| \begin{itemize} |
| \item We decided to represent implicit conversion operations like |
| boxing, unboxing, widening, and narrowing as explicit CFG nodes |
| because they change the type of a value. For example, implicit |
| unboxing of an \code{Integer} will be translated into a call to |
| \code{Integer.intValue}. The pre-conversion type can be |
| associated with the original node and the post-conversion type |
| can be associated with the explicit conversion node. It also |
| makes it possible for the transfer function to operate on the |
| conversion nodes. |
| |
| \item Enhanced for loops are defined in terms of a complicated |
| translation into simpler operations, including field accesses, |
| branches, and method calls that could affect dataflow |
| information. It would be prohibitively difficult for a checker |
| writer to write a transfer function that correctly accounted for |
| all of those operations, so we desugar enhanced for loops. |
| |
| \item Once we decided to make conversion nodes explicit it made |
| sense to desugar compound assignments. A compound assignment |
| like \begin{verbatim}Integer i; i += 3;\end{verbatim} performs |
| both an unboxing and a boxing operation on \code{i}. Desugaring |
| all compound assignments greatly reduced the total number of |
| node classes. |
| |
| \end{itemize} |
| |
| In order to desugar code and still maintain the invariant that every |
| CFG node maps to a tree, we needed to create new AST tree nodes that |
| were not present in the input program. Javac allows us to do this |
| through non-supported APIs and we wrote some utility classes in |
| \code{javacutil} to make the process easier. The new trees are |
| created during CFG building and they persist as long as some CFG node |
| refers to them. However, the trees are not inserted into the AST, so |
| they are not type-checked or seen by other tree visitors. Their main |
| purpose is to carry Java types and to satisfy AnnotatedTypeFactory |
| methods. |
| |
| A further complication is that these newly-introduced AST trees are |
| not part of the TreePath when visiting the AST. We work around this |
| problem by giving the AnnotatedTypeFactory a mapping, called the |
| \code{pathHack}, from newly-introduced trees to their containing |
| MethodTree and ClassTree. |
| |
| Possibly even worse, we needed to create fake symbols for variables |
| created when desugaring enhanced for loops. Javac does not expose the |
| ability to create a symbol, so we created a new subclass of |
| Symbol.VarSymbol called \code{javacutil.tree.DetachedVarSymbol} for |
| this purpose. AnnotatedTypeFactory explicitly checks for |
| DetachedVarSymbols in its DeclarationFromElement method. |
| |
| |
| |
| \subsubsection{Conversions and node-tree mapping} |
| \label{sec:conversions} |
| |
| As mentioned in \autoref{sec:desugaring}, we represent implicit Java |
| type conversions such as boxing, unboxing, widening, and narrowing by |
| explicit CFG nodes. This means that some AST tree nodes correspond to |
| multiple CFG nodes: a pre-conversion node and a post-conversion node. |
| We will describe how the conversions work and how the node-tree |
| mappings are implemented. |
| |
| Boxing and unboxing are represented in terms of calls to Java standard |
| library methods. Boxing corresponds to a call to |
| \code{BoxedClass.valueOf} while unboxing corresponds to a call to |
| \code{BoxedClass.\*Value}. This allows annotations on the library |
| methods, as well as transfer functions for method invocations, to |
| apply to the conversions with no special work on the part of a checker |
| developer. |
| |
| Widening and narrowing conversions are still represented as special |
| node types, although it would be more consistent to change them into |
| type casts. |
| |
| \begin{workinprogress} |
| Is the last point a to-do item? |
| \end{workinprogress} |
| |
| We maintain the invariant that a CFG node maps to zero or one AST tree |
| and almost all of them map to a single tree. But we can't maintain a |
| unique inverse mapping because some trees have both pre- and |
| post-conversion nodes. Instead, we remember two mappings, one from |
| tree to pre-conversion node and, for those trees that were converted, |
| one from tree to post-conversion node. Both the CFGBuilder and the |
| ControlFlowGraph store two separate mappings. The Analysis class |
| explicitly stores the tree to pre-conversion node mapping as |
| \code{treeLookup} and it indirectly uses the tree to post-conversion |
| mapping in \code{Analysis.getValue(Tree)}. This has effectively |
| hidden the distinction between pre and post-conversion nodes from the |
| Checker Framework, but in the long run it may be necessary to expose |
| it. |
| |
| |
| |
| \section{Dataflow Analysis} |
| |
| This section describes how the dataflow analysis over the control-flow |
| graph is performed and how to implement a particular analysis. |
| |
| Roughly, a dataflow analysis in the framework works as follows. Given |
| the abstract syntax tree of a method, the framework computes the |
| corresponding control-flow graph as described in |
| \autoref{sec:cfg}. Then, a simple forward or backward iterative |
| algorithm is used to compute a fix-point, by iteratively applying a |
| set of transfer functions to the nodes in the CFG\@. These transfer |
| functions are specific to the particular analysis and are used to |
| approximate the runtime behavior of different statements and expressions. |
| |
| |
| \subsection{Managing Intermediate Results of the Analysis} |
| \label{sec:node-mapping} |
| \label{sec:store-management} |
| |
| |
| Conceptually, the dataflow analysis computes an abstract value for |
| every node and flow expression\footnote{Certain dataflow analyses |
| might choose not to produce an abstract value for every node. For |
| instance, a constant propagation analysis would only be concerned |
| with nodes of a numerical type, and could ignore other nodes.}. The |
| transfer function (\autoref{sec:transfer-fnc}) produces these abstract |
| values, taking as input the abstract values computed earlier for |
| sub-expressions. For instance, in a constant propagation analysis, |
| the transfer function for addition (\code{+}) would look at the |
| abstract values for the left and right operand, and determine that the |
| \code{AdditionNode} is a constant if and only if both operands are |
| constant. |
| |
| An analysis result contains two parts: |
| |
| \begin{enumerate} |
| \item |
| The \emph{node-value mapping} (\code{Analysis.nodeValues}) maps \code{Node}s to their abstract |
| values. Only nodes that can take on an abstract value are |
| used as keys. For example, in the Checker Framework, the mapping is |
| from expression nodes to annotated types. |
| |
| The framework consciously does not store the abstract value |
| directly in the node, to remove any coupling between the control-flow |
| graph and a particular analysis. This allows the control-flow graph |
| to be constructed only once, and then reused for different dataflow |
| analyses. |
| |
| \item |
| A set of \emph{stores}. Each store maps a flow expression to an |
| abstract value. Each store is associated with a specific program point. |
| |
| The stores tracked by an analysis implement the \code{Store} |
| interface, which defines the following operations: |
| \begin{itemize} |
| \item Least upper bound: Compute the least upper bound of two stores |
| (e.g., at a merge-point in the control-flow graph). |
| \item Equivalence: Compare two stores if they are (semantically) |
| different, which is used to determine if a fix-point is reached in |
| the dataflow analysis. Note that reference-equality is most likely |
| not sufficient. |
| \item Copy mechanism: Clone a store to get an exact copy. |
| \end{itemize} |
| The store is analysis-dependent, but the framework provides a default |
| store implementation which can be reused. The default implementation |
| is |
| \begin{verbatim}org.checkerframework.framework.flow.CFStore\end{verbatim} |
| |
| What information is tracked in the store depends on the analysis to be |
| performed. Some examples of stores include |
| \begin{verbatim} |
| org.checkerframework.checker.initialization.InitializationStore |
| org.checkerframework.checker.nullness.NullnessStore |
| \end{verbatim} |
| |
| Every store is associated with a particular point in the control-flow |
| graph, and all stores are managed by the framework. It saves an explicit store |
| for the start of each basic block. |
| When dataflow information |
| is requested for a later point in a block, the analysis applies the |
| transfer function to recompute it from the initial store. |
| |
| \end{enumerate} |
| |
| After an analysis has iterated to a fix-point, the computed dataflow |
| information is maintained in an AnalysisResult, which can map either |
| nodes or trees to abstract values. |
| |
| |
| \subsection{Answering Questions} |
| \label{sec:answering-questions} |
| After the flow analysis for a particular method has been computed, |
| there are two kinds of information that have been computed. Firstly, |
| the node-value mapping stores an abstract value for every node, and |
| secondly, the information maintained in various stores is available. |
| |
| Two kinds of queries are possible to the dataflow analysis after the |
| analysis is complete: |
| \begin{enumerate} |
| \item For a given AST tree node, what is its abstract value? Both |
| pre- and postconversion values can be retrieved. |
| A discussion of conversions can be found in \autoref{sec:conversions}. |
| \item For a given AST tree node, what is the state right after |
| this AST tree node? Examples of questions include: |
| \begin{itemize} |
| \item Which locks are currently held? |
| \item Are all fields of a given object initialized? |
| \end{itemize} |
| \end{enumerate} |
| |
| The store may first need to be (re-)computed, as the framework does not |
| store all intermediate stores but rather only those for key positions |
| as described in \autoref{sec:store-management}. |
| |
| To support both kinds of queries, the framework builds a map from AST |
| tree nodes (of type \code{com.sun.source.tree.Tree}) to CFG nodes. To |
| answer questions of the first type it is then possible to go from the |
| AST tree node to the CFG node and look up its abstract value in the |
| node-value mapping (this is provided by the framework). By default, |
| the abstract value returned for a tree by |
| \code{Analysis.getValue(Tree)} includes any implicit conversions |
| because it uses the mapping from tree node to post-conversion CFG |
| node. To request the pre-conversion value, one currently uses the |
| \code{ControlFlowGraph.treelookup} map directly. |
| |
| To support questions of the second kind, every node has a reference to |
| the basic block it is part of. Thus, for a given AST tree node, the |
| framework can determine the CFG node and thereby the CFG basic block, |
| and compute the necessary store to answer the question. |
| |
| |
| \subsection{Transfer Function} |
| \label{sec:transfer-fnc} |
| |
| A transfer function is an object that has a transfer method for |
| every \code{Node} type, and also a transfer method for procedure entry. |
| |
| \begin{itemize} |
| \item A transfer method for a \code{Node} type takes a store |
| and the node, and produces an updated store. This is achieved by |
| implementing the \code{NodeVisitor<S, S>} interface for the store |
| type \code{S}. |
| |
| These transfer methods also get access to the abstract value of any |
| sub-node of the node \code n under consideration. This is not limited |
| to immediate children, but the abstract value for any node contained |
| in \code n can be queried. |
| |
| \item A transfer method for procedure entry returns the initial store, given the |
| list of parameters (as \code{LocalVariableNode}s that represent the formal |
| parameters) and the |
| \code{MethodTree} (useful if the initial store depends on the procedure |
| signature, for instance). |
| |
| \end{itemize} |
| |
| |
| \subsection{Flow Rules} |
| \label{sec:flow-rules} |
| |
| As mentioned in \autoref{sec:cond-stores}, dataflow analysis |
| conceptually maintains two stores for each program point, a then store |
| containing information valid when the previous boolean-valued |
| expression was true and an else store containing information valid |
| when the expression was false. In many cases, there is only a single |
| store because there is no boolean-valued expression to split on or |
| there was an expression, but it yielded no useful dataflow |
| information. However, any CFG edge may potentially have a predecessor |
| with two stores and a successor with two stores. |
| |
| We could simply propagate information from both predecessor stores to |
| both successor stores, but that would throw away useful information, |
| so we define five flow rules that allow more precise propagation. |
| |
| \begin{itemize} |
| \item \code{EACH_TO_EACH} is the default rule for an edge with a |
| predecessor that is not a \code{ConditionalBlock}. It |
| propagates information from the then store of the predecessor to |
| the then store of the successor and from the else store of the |
| predecessor to the else store of the successor. |
| \item \code{THEN_TO_BOTH} is the default rule for an edge from a |
| \code{ConditionalBlock} to its then successor. It propagates |
| information from the then store of its predecessor to both the |
| then and else stores of the then successor, thereby splitting |
| the conditional store to take advantage of the fact that the |
| condition is known to be true. |
| \item \code{ELSE_TO_BOTH} is the default rule for an edge from a |
| \code{ConditionalBlock} to its else successor. It propagates |
| information from the else store of its predecessor to both the |
| then and else stores of the else successor, thereby splitting |
| the conditional store to take advantage of the fact that the |
| condition is known to be false. |
| \item \code{THEN_TO_THEN} is a special rule for a short-circuit edge |
| from a \code{ConditionalBlock} to its then successor. It only |
| propagates information from the then store of its predecessor to |
| the then store of its successor. This flow rule is used for |
| conditional or expressions because the else store of \code{a || |
| b} is not influenced by the then store of \code{a}. |
| \item \code{ELSE_TO_ELSE} is a special rule for a short-circuit edge |
| from a \code{ConditionalBlock} to its else successor. It only |
| propagates information from the else store of its predecessor to |
| the else store of its successor. This flow rule is used for |
| conditional and expressions because the then store of \code{a && |
| b} is not influenced by the else store of \code{a}. |
| \end{itemize} |
| |
| Note that the more precise flow rules \code{THEN_TO_THEN} and |
| \code{ELSE_TO_ELSE} improve the precision of the store they do not |
| write to. In other words, \code{THEN_TO_THEN} yields a more precise |
| else store of its successor by not propagating information to the else |
| store which might conflict with information already there, and |
| conversely for \code{ELSE_TO_ELSE}. |
| See \autoref{sec:cond-exp} for more details and an example. |
| |
| Currently, we only use flow rules for short-circuiting edges of |
| conditional ands and ors. The CFG builder sets the flow rule of each |
| short-circuiting edge as it builds the CFG for the conditional and/or |
| expression. |
| |
| The dataflow analysis logic requires that both the then and the else |
| store of each block contain some information before the block is |
| analyzed, so it is a requirement that at least one predecessor block |
| writes the then store and at least one writes the else store. |
| |
| |
| \subsection{Concurrency} |
| |
| By default, the Dataflow Framework analyzes the code as if it is |
| executed sequentially. This is unsound if the code is run |
| concurrently. Use the \code{-AconcurrentSemantics} command-line |
| option to enable concurrent semantics. |
| |
| In the concurrent mode, the dataflow analysis cannot infer any |
| local information for fields. This is because after a local update, |
| another thread might change the field's value before the next use. |
| |
| An exception to this are monotonic type properties, such as the |
| \code{@MonotonicNonNull} annotation of the nullness type system. The |
| meta-annotation \code{@MonotonicQualifier} declares that a qualifier |
| behaves monotonically, however it is not yet used to preserve dataflow |
| information about fields under concurrent semantics. |
| |
| |
| \section{Example: Constant Propagation} |
| |
| As a proof-of-concept, I (Stefan) implemented a constant propagation |
| analysis for local variables and integer values. The main class is |
| \code{org.checkerframework.dataflow.cfg.playground.ConstantPropagationPlayground}. I |
| describe the most important aspects here. |
| |
| \textbf{Abstract values.} A class \code{Constant} is used as an |
| abstract value, which can either be \emph{top} (more than one integer |
| value seen), \emph{bottom} (no value seen yet), or \emph{constant} |
| (exactly one value seen; in which case the value is also stored). |
| |
| \textbf{The store.} The store maps \code{Node}s to \code{Constant}, |
| where only \code{LocalVariableNode}s and \code{IntegerLiteralNode}s |
| are used as keys. Only those two nodes actually are of interest |
| (there is no addition/multiplication/etc. yet, and other constructs |
| like fields are not yet supported by the analysis). |
| |
| Two different instances of \code{LocalVariableNode} can be uses of the |
| same local variable, and thus the \code{equals} method has been |
| implemented accordingly. Therefore, every local variable occurs at |
| most once in the store, even if multiple (equal) |
| \code{LocalVariableNode}s for it exist. |
| |
| \textbf{The transfer function.} The transfer function is very |
| simple. The initial store contains \emph{top} for all parameters, as |
| any value could have been passed in. When an integer literal is |
| encountered, the store is extended to indicate what abstract value |
| this literal stands for. Furthermore, for an assignment, if the |
| left-hand side is a local variable, the transfer function updates its |
| abstract value in the store with the abstract value of the right-hand |
| side (which can be looked up in the store). |
| |
| To illustrate how we can have different information in the then and |
| else block of a conditional, I also implemented another transfer |
| function that considers the \code{EqualToNode}, and if it is of the |
| form \code{a == e} for a local variable \code{a} and constant |
| \code{e}, passes the correct information to one of the branches. This |
| is also shown in \autoref{fig:ConstSimple}. |
| |
| \text{Example.} A small example is shown in \autoref{fig:ConstSimple}. |
| |
| \flow{ConstSimple}{.45}{1}{Simple sequential program to illustrate constant |
| propagation. Intermediate analysis results are shown.} |
| |
| |
| \section{Example: Live Variable} |
| |
| A live variable analysis for local variables and fields was implemented |
| to show the backward analysis works properly. The main class is |
| \code{org.checkerframework.dataflow.cfg.playground.LiveVariablePlayground}. |
| |
| \textbf{Abstract values.} A class \code{LiveVarValue} is a live |
| variable (which is represented by a node) wrapper turning node into |
| abstract value. A node can be \code{LocalVariableNode} or \code{FieldAccessNode}. |
| |
| \textbf{The store.} The live variable store \code{LiveVarStore} has a field |
| \code{liveVarValueSet} which contains a set of \code{LiveVarValue}s. Only |
| \code{LocalVariableNode} or \code{FieldAccessNode} will be considered as a |
| live variable and added to \code{liveVarValueSet}. The store defines methods |
| \code{putLiveVar(LiveVarValue)} and \code{killLiveVar(LiveVarValue)} to add |
| and kill live variables. |
| |
| \textbf{The transfer function.} The transfer function \code{LiveVarTransfer} |
| initializes empty stores at normal and exceptional exit blocks (because this |
| is a backward transfer function). The transfer function visits assignments to |
| update the information of live variables for each node in the stores. |
| |
| \textbf{Example.} An example is shown in \autoref{fig:LiveSimple}. |
| |
| \flow{LiveSimple}{.33}{1}{Simple sequential program to illustrate live variable. Intermediate analysis results are shown.} |
| |
| |
| \section{Default Analysis} |
| |
| |
| \subsection{Overview} |
| |
| The default flow-sensitive analysis \code{org.checkerframework.framework.flow.CFAnalysis} |
| works for any checker defined in the |
| Checker Framework. This generality is both a strength and a weakness |
| because the default analysis can always run but the facts it can |
| deduce are limited. The default analysis is extensible so checkers |
| can add logic specific to their own qualifiers. |
| |
| The default flow-sensitive analysis takes advantage of several forms |
| of control-flow to improve the precision of type qualifiers. It |
| tracks assignments to flow expressions, propagating type qualifiers |
| from the right-hand side of the assignment. It considers equality and |
| inequality tests to propagate the most precise qualifiers from the |
| left or right-hand side to the true (resp. false) successor. It also |
| applies type qualifiers from method postconditions after calls. |
| |
| \begin{workinprogress} |
| Preconditions are not mentioned at all in this manual. How are they handled? |
| \end{workinprogress} |
| |
| |
| \subsection{Interaction of the Checker Framework and the Dataflow Analysis} |
| \label{sec:flow-cf-interaction} |
| |
| This section describes how the dataflow analysis is integrated into the |
| Checker Framework to enable flow-sensitive type checking. |
| |
| A main purpose of a type factory is to create an \code{AnnotatedTypeMirror} |
| based on an input tree node. Using the results of the dataflow analysis, |
| the type factory can return a more refined type than otherwise possible. |
| |
| Type factories that extend from \code{GeneralAnnotatedTypeFactory} |
| and set the constructor parameter \code{useFlow} to true will automatically |
| run dataflow analysis and use the result of the analysis when creating an |
| \code{AnnotatedTypeMirror}. The first time that a \code{GenericAnnotatedTypeFactory} |
| instance visits a \code{ClassTree}, the type factory runs the dataflow analysis on all the field initializers of the class first, then the bodies of methods in |
| the class, and then finally the dataflow analysis is |
| ran recursively on the members of nested classes. The result of |
| dataflow analysis are stored in the \code{GenericAnnotatedTypeFactory} instance. |
| |
| When creating an \code{AnnotatedTypeMirror} for a tree node, the type factory |
| queries the result of the dataflow analysis to determine if a more refined |
| type for the tree node was inferred by the analysis. This is the first |
| type of query described in \autoref{sec:answering-questions}. |
| |
| \begin{workinprogress} |
| I found the below section confusing. I had a hard time putting my finger |
| on it, but perhaps you could re-read the section. As one minor issue, |
| ``very similar'': similar to what? ``intermediary nodes'': what are |
| those? ``we'': is that the analysis writer or the framework implementor |
| (or the runtime system)? |
| \end{workinprogress} |
| |
| Dataflow itself uses the type factory to get the initial |
| \code{AnnotatedTypeMirror} for a tree node in the following way. |
| |
| For a given node \code{n} |
| \begin{itemize} |
| \item |
| If it has no corresponding AST tree node, use ``top'' as its |
| abstract value. |
| \item |
| If it has a corresponding AST tree node, ask the |
| \code{AnnotatedTypeFactory} about the type of the tree node. The |
| factory will then use its checker-dependent logic to compute this |
| type. A typical implementation will look at the type of sub-trees |
| and compute the overall type based on the information about these |
| sub-trees. |
| |
| Note that the factory recursively uses information provided by the |
| flow analysis to determine the types of sub-trees. There is a |
| check in \code{Analysis.getValue} that the node whose type is |
| being requested is a sub-node of the node to which the transfer |
| function is currently being applied. For other nodes, the |
| analysis will return \code{null} (i.e., no information) and the |
| factory will return the flow-insensitive annotated type. |
| |
| \end{itemize} |
| |
| The \code{AnnotatedTypeFactory} and helper classes need to be prepared to work |
| even when dataflow analysis is not finished yet. Code should either check |
| whether dataflow analysis is still in progress (using |
| \code{analysis.isRunning()}) or handle possible null values. The factory should |
| return conservative, flow-insensitive types if the analysis is still in |
| progress. |
| |
| |
| \subsection{The Checker Framework Store and Dealing with Aliasing} |
| |
| \begin{workinprogress} |
| Word of caution: The exact rules of what information is |
| retained may or may not be implemented exactly as described |
| here. This is a good starting point in any case, but if very |
| precise information is needed, then the source code is very |
| readable and well documented. |
| \end{workinprogress} |
| |
| The Dataflow Framework provides a default implementation of a store |
| with the class \code{CFAbstractStore}, which is used (as |
| \code{CFStore}) as the default store if a checker does not provide its |
| own implementation. This implementation of a store tracks the |
| following information: |
| |
| \begin{itemize} |
| \item Abstract values of local variables. |
| \item Abstract values of fields where the receiver is an |
| access sequence compose of the following: |
| \begin{itemize} |
| \item Field access. |
| \item Local variable. |
| \item Self reference (i.e., \code{this}). |
| \item Pure or non-pure method call. |
| \end{itemize} |
| \end{itemize} |
| |
| The most challenging part is ensuring that the information about field |
| accesses is kept up to date in the face of incomplete aliasing |
| information. In particular, at method calls and assignments care |
| needs to be taken about which information is still valid afterwards. |
| |
| The store maintains a mapping from field accesses (as defined in |
| Section~\ref{sec:field-access}) to abstract values, and the subsequent |
| sections describe the operations that keep this mapping up-to-date. |
| |
| \begin{workinprogress} |
| There hasn't been a good introduction of pure vs. non-pure methods. |
| Maybe this is a good location for a discussion? |
| Introduce the purity annotations somewhere. |
| \end{workinprogress} |
| |
| |
| \subsubsection{Internal Representation of field access} |
| \label{sec:field-access} |
| |
| To keep track of the abstract values of fields, the Dataflow Framework |
| uses its own representation of field accesses (that is different from |
| the \code{Node} type introduced earlier). This data type is defined |
| inductively as follows: |
| |
| \begin{bnfgrammar} |
| \production{\nonterminal{FieldAccess}} |
| {\nonterminal{Receiver} \nonterminal{Field} \qquad Java field (identified by its \code{Element})} |
| \production{\nonterminal{Receiver}} |
| {\nonterminal{SelfReference} \qquad \code{this}} |
| \altline{\nonterminal{LocalVariable} \qquad local variable (identified by its \code{Element})} |
| \altline{\nonterminal{FieldAccess}} |
| \altline{\nonterminal{MethodCall} \qquad Java method call of a method} |
| \altline{\nonterminal{Unknown} \qquad any other Java expression} |
| \production{\nonterminal{MethodCall}} |
| {\nonterminal{Receiver} \literal{.} \nonterminal{Method} |
| \literal{(} \nonterminal{Receiver}$^{,*}$ \literal{)}} |
| \end{bnfgrammar} |
| |
| \nonterminal{Unknown} is only used to determine which |
| information needs to be removed (e.g., after an assignment), but no field |
| access that contains \nonterminal{Unknown} is stored in the mapping to |
| abstract values. For instance, \nonterminal{Unknown} could stand for |
| a non-pure method call, an array access, or a ternary expression. |
| |
| |
| \subsubsection{Updating Information in the Store} |
| |
| \newcommand{\alias}{\operatorname{might\_alias}} |
| |
| In the following, let $o$ be any \nonterminal{Receiver}, $x$ a local |
| variable, $f$ a field, $m$ a pure method, and $e$ an expression. |
| Furthermore, we assume to have access to a predicate $\alias(o_1,o_2)$ |
| that returns true if and only if~$o_1$ might alias~$o_2$; see |
| Section~\ref{sec:alias}. |
| |
| \subsubsubsection{At Field Assignments} |
| |
| For a field update of the form $o_1.f_1 = e$, the |
| dataflow analysis first determines the abstract value $e_\text{val}$ for $e$. |
| Then, it updates the store $S$ as follows. |
| \begin{enumerate} |
| \item For every field access $o_2.f_2$ that is a key in $S$, remove its |
| information if $f_1 = f_2$ and |
| $o_2$ lexically contains a \nonterminal{Receiver} that \emph{might} |
| alias $o_1.f$ as determined by the $\alias$ |
| predicate. |
| Note that the ``lexically contains'' notion for pure method calls |
| includes both the receiver as well as the arguments. |
| |
| This includes the case where $o_1 = o_2$ (that is, they are |
| syntactically the same) and the case where $o_2$ and $o_1.f$ might be |
| aliases. |
| |
| \begin{workinprogress} |
| Should the two occurrences of field $f$ in this paragraph be $f_1$? |
| \end{workinprogress} |
| |
| \item $S[o_1.f_1] = e_\text{val}$ |
| \end{enumerate} |
| |
| |
| \subsubsubsection{At Local Variable Assignments} |
| |
| For a local variable assignment of the form $x = e$, |
| the dataflow analysis first determines the abstract value $e_\text{val}$ for |
| $e$. |
| Then, it updates the store $S$ as follows. |
| \begin{enumerate} |
| \item For every field access $o_2.f_2$ that is a key in $S$, |
| remove its information if $o_2$ lexically contains a |
| \nonterminal{Receiver} that \emph{might} alias $x$ as determined |
| by the $\alias$ predicate. |
| \item $S[x] = e_\text{val}$ |
| \end{enumerate} |
| |
| \subsubsubsection{At Other Assignments} |
| |
| For any other assignment $z = e$ where the |
| assignment target $z$ is not a field or local variable, |
| the dataflow analysis first determines the abstract value $e_\text{val}$ for |
| $e$. |
| Then, it updates the store $S$ as follows. |
| \begin{enumerate} |
| \item For every field access $o_2.f_2$, remove its information if |
| $o_2$ lexically contains a \nonterminal{Receiver} that \emph{might} |
| alias $z$ as determined by the $\alias$ predicate. |
| \end{enumerate} |
| |
| |
| \subsubsubsection{At Non-Pure Method Calls} |
| |
| A non-pure method call might modify the value of any field arbitrarily. |
| Therefore, at a method call, any information about fields is lost. |
| |
| |
| \subsubsubsection{Alias Information} |
| \label{sec:alias} |
| |
| The Checker Framework does not include an aliasing analysis, which could |
| provide precise aliasing information. For this reason, we implement the |
| predicate $\alias$ as follows: |
| \[ \alias(o_1,o_2) := |
| \left(\operatorname{type}(o_1) <: \operatorname{type}(o_2) |
| \;\;\text{or}\;\; |
| \operatorname{type}(o_2) <: \operatorname{type}(o_1) \right) \] |
| where $\operatorname{type}(o)$ determines the Java type of a reference $o$ |
| and $<:$ denotes standard Java subtyping. |
| |
| |
| %%% Local Variables: |
| %%% mode: latex |
| %%% TeX-master: "dataflow" |
| %%% TeX-command-default: "PDF" |
| %%% End: |
| |
| % LocalWords: pre cfg javacutils InternalUtils DivideByZero BlockImpl se |
| % LocalWords: SingleSuccessorBlock RegularBlock SingleSuccessorBlockImpl |
| % LocalWords: ExceptionBlock SpecialBlock ConditionalBlock SpecialBlocks |
| % LocalWords: ControlFlowGraph ControlFlowGraphs CFGBuilder ast CFValue |
| % LocalWords: JavaExpressions AbstractValue AbstractValues PolyNull Poly |
| % LocalWords: AnnotatedTypeMirrors CFAbstractValue NullnessValue CFStore |
| % LocalWords: NullnessCheckers CFAbstractStore InitializationStore Regex |
| % LocalWords: NullnessStore TransferInput TransferResult TransferInputs |
| % LocalWords: ConditionalTransferResult RegularTransferResult CFTransfer |
| % LocalWords: TransferFunction NodeVisitor TransferResults isRegex lst |
| % LocalWords: CFAbstractTransfer AbstractNodeVisitor asRegex ClassTree |
| % LocalWords: InitializationTransfer visitMethodInvocation RegexTransfer |
| % LocalWords: CFAbstractAnalysis NullnessTransfer RegexChecker's lp Uber |
| % LocalWords: AnalysisResult AnnotatedTypeFactory AnnotatedTypeFactorys |
| % LocalWords: AnnotationProvider getAnnotatedType BaseTypeChecker |
| % LocalWords: GenericAnnotatedTypeFactory FlowAnalysis CFAnalysis |
| % LocalWords: BaseAnnotatedTypeFactory CFGSimple ValueLiteral txt |
| % LocalWords: BooleanLiteral CharacterLiteral DoubleLiteral FloatLiteral |
| % LocalWords: IntegerLiteral LongLiteral NullLiteral ShortLiteral args |
| % LocalWords: StringLiteral Accessor ArrayAccess FieldAccess This |
| % LocalWords: MethodAccess ExplicitThis ImplicitThis java NullAway |
| % LocalWords: LocalVariable MethodInvocation BitwiseAnd BitwiseOr MyType |
| % LocalWords: BitwiseComplement BitwiseXor ConditionalAnd ConditionalNot |
| % LocalWords: ConditionalOr FloatingDivision FloatingRemainder LeftShift |
| % LocalWords: IntegerDivision IntegerRemainder NumericalAddition EqualTo |
| % LocalWords: NumericalMinus NumericalMultiplication NumericalPlus util |
| % LocalWords: NumericalSubtraction SignedRightShift StringConcatenate |
| % LocalWords: TernaryExpression UnsignedRightShift NotEqual GreaterThan |
| % LocalWords: GreaterThanOrEqual LessThan LessThanOrEqual ArrayCreation |
| % LocalWords: StringConcatenateAssignment ObjectCreation TypeCast cond |
| % LocalWords: InstanceOf instanceof NarrowingConversion StringConversion |
| % LocalWords: WideningConversion ArrayType ParameterizedType ClassName |
| % LocalWords: PrimitiveType PackageName AssertionError NullChk accessor |
| % LocalWords: VariableDeclaration desugared desugaring FieldAssignment |
| % LocalWords: CFGFieldAssignment getRegularStore CFGConditionalOr fnc |
| % LocalWords: CFGConditionalOr2 ConditionalOrNode ConditionalOr2 valueOf |
| % LocalWords: IdentifierTree FieldAccessNode ImplicitThisNode unboxing |
| % LocalWords: TreePath pathHack MethodTree VarSymbol DetachedVarSymbols |
| % LocalWords: DeclarationFromElement BoxedClass treeLookup getValue |
| % LocalWords: nodeValues AdditionNode postconversion treelookup desugar |
| % LocalWords: LocalVariableNode AconcurrentSemantics MonotonicNonNull |
| % LocalWords: MonotonicQualifier IntegerLiteralNode EqualToNode |
| % LocalWords: ConstSimple SelfReference PureMethodCall javacutil |
| % LocalWords: stubparser TreeParser TreeBuilder DetachedVarSymbol LiveSimple |