blob: 6c4e6a8e2a7a749e3c26c7834468bc27a47547f4 [file] [log] [blame]
\htmlhr
% Reinstate when lists are supported:
% \chapterAndLabel{Index Checker for sequence bounds (arrays, strings, lists)}{index-checker}
\chapterAndLabel{Index Checker for sequence bounds (arrays and strings)}{index-checker}
The Index Checker warns about potentially out-of-bounds accesses to sequence
data structures, such as arrays
% , lists,
and strings.
The Index Checker prevents \<IndexOutOfBoundsException>s that result from
an index expression that might be negative or might be equal to or larger
than the sequence's length.
It also prevents \<NegativeArraySizeException>s that result from a negative
array dimension in an array creation expression.
(A caveat: the Index Checker does not check for arithmetic overflow. If
an expression overflows, the Index Checker might fail to warn about a
possible exception. This is unlikely to be a problem in practice unless
you have an array whose length is \<Integer.MAX\_VALUE>.)
% Here's a pathological example of overflow leading to unsoundness:
%
% public class IndexOverflow {
% public static void main(String[] args) {
% @Positive int x = 1073741825; // 2 ^ 30 + 1
% @Positive int x2 = x + x; // 2 ^ 31 + 2 == - 2 ^ 31 + 2
% int[] a = new int[0];
% if (x2 < a.length) {
% a[x2] = 42;
% }
% }
% }
The programmer can write annotations that indicate which expressions are
indices for which sequences. The Index Checker prohibits any operation that
may violate these properties, and the Index Checker takes advantage of
these properties when verifying indexing operations.
%
Typically, a programmer writes few annotations, because the Index Checker
infers properties of indexes from
the code around them. For example, it will infer that \<x> is positive
within the \<then> block of an \code{if (x > 0)} statement.
The programmer does need to write field types and method pre-conditions or post-conditions. For instance,
if a method's formal parameter is used as an index for
\<myArray>, the programmer might need to
write an \refqualclasswithparams{checker/index/qual}{IndexFor}{"myArray"}
annotation on the formal parameter's types.
The Index Checker checks fixed-size data structures, whose size is never
changed after creation. A fixed-size data structure has no \<add> or
\<remove> operation. Examples are strings and arrays, and you can add
support for other fixed-size data structures (see
Section~\ref{index-annotating-fixed-size}).
To run the Index Checker, run the command
\begin{alltt}
javac -processor index \emph{MyJavaFile}.java
\end{alltt}
Recall that in Java, type annotations are written before the type;
in particular,
array annotations appear immediately before ``\<[]>''.
Here is how to declare a length-9 array of positive integers:
\begin{Verbatim}
@Positive int @ArrayLen(9) []
\end{Verbatim}
Multi-dimensional arrays are similar.
Here is how to declare a length-2 array of length-4 arrays:
\begin{Verbatim}
String @ArrayLen(2) [] @ArrayLen(4) []
\end{Verbatim}
\sectionAndLabel{Index Checker structure and annotations}{index-annotations}
Internally, the Index Checker computes information about integers that
might be indices:
\begin{itemize}
\item
the lower bound on an integer, such as whether it is known to be positive
(Section~\ref{index-lowerbound})
\item
the upper bound on an integer, such as whether it is less than the length
of a given sequence (Section~\ref{index-upperbound})
\item
whether an integer came from calling the JDK's binary search routine on
an array (Section~\ref{index-searchindex})
\item
whether an integer came from calling a string search routine
(Section~\ref{index-substringindex})
\end{itemize}
\noindent
and about sequence lengths:
\begin{itemize}
\item
the minimum length of a sequence, such ``\<myArray> contains at least 3
elements'' (Section~\ref{index-minlen})
\item
whether two sequences have the same length (Section~\ref{index-samelen})
\end{itemize}
The Index Checker checks of all these properties at once, but
this manual discusses each type system in a different section.
There are some annotations that are shorthand for writing multiple
annotations, each from a different type system:
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{IndexFor}{String[] names}]
The value is a valid index for the named sequences. For example, the
\sunjavadoc{java.base/java/lang/String.html\#charAt(int)}{String.charAt(int)}
method is declared as
\begin{Verbatim}
class String {
char charAt(@IndexFor("this") index) { ... }
}
\end{Verbatim}
More generally, a variable
declared as \<@IndexFor("someArray") int i> has type
\<@IndexFor("someArray") int> and its run-time value is guaranteed to be
non-negative and less than the length of \<someArray>. You could also
express this as
\<\refqualclass{checker/index/qual}{NonNegative}
\refqualclasswithparams{checker/index/qual}{LTLengthOf}{"someArray"}
int i>,
but \<@IndexFor("someArray") int i> is more concise.
\item[\refqualclasswithparams{checker/index/qual}{IndexOrHigh}{String[] names}]
The value is non-negative and is less than or equal to the length of
each named sequence. This type combines
\refqualclass{checker/index/qual}{NonNegative} and
\refqualclass{checker/index/qual}{LTEqLengthOf}.
For example, the
\sunjavadoc{java.base/java/util/Arrays.html\#fill(java.lang.Object\%5B\%5D,int,int,java.lang.Object)}{Arrays.fill}
method is declared as
\begin{mysmall}
\begin{Verbatim}
class Arrays {
void fill(Object[] a, @IndexFor("#1") int fromIndex, @IndexOrHigh("#1") int toIndex, Object val)
}
\end{Verbatim}
\end{mysmall}
\item[\refqualclasswithparams{checker/index/qual}{LengthOf}{String[] names}]
The value is exactly equal to the length of the named
sequences. In the implementation, this type aliases
\refqualclass{checker/index/qual}{IndexOrHigh}, so writing it
only adds documentation (although future versions of the Index Checker
may use it to improve precision).
\item[\refqualclasswithparams{checker/index/qual}{IndexOrLow}{String[] names}]
The value is -1 or is a valid index for
each named sequence. This type combines
\refqualclass{checker/index/qual}{GTENegativeOne} and
\refqualclass{checker/index/qual}{LTLengthOf}.
% Example commented out; IndexOrLow is not sound for indexOf, because "".indexOf("") returns 0
%
% For example, the
% \sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf(String)}
% method is declared as
%
% \begin{Verbatim}
% class String {
% @IndexOrLow("this") int indexOf(String str) { ... }
% }
% \end{Verbatim}
\item[\refqualclass{checker/index/qual}{PolyIndex}]
indicates qualifier polymorphism. This type combines
\refqualclass{checker/index/qual}{PolyLowerBound} and
\refqualclass{checker/index/qual}{PolyUpperBound}.
For a description of qualifier polymorphism, see
Section~\ref{method-qualifier-polymorphism}.
\item[\refqualclass{checker/index/qual}{PolyLength}]
is a special polymorphic qualifier that combines
\refqualclass{checker/index/qual}{PolySameLen} and
\refqualclass{common/value/qual}{PolyValue} from the
Constant Value Checker (see \chapterpageref{constant-value-checker}).
\refqualclass{checker/index/qual}{PolyLength} exists
as a shorthand for these two annotations, since
they often appear together.
\end{description}
\sectionAndLabel{Lower bounds}{index-lowerbound}
The Index Checker issues an error when
a sequence is indexed by an integer that might be negative.
The Lower Bound Checker uses a type system (Figure~\ref{fig-index-int-types}) with the following
qualifiers:
\begin{description}
\item[\refqualclass{checker/index/qual}{Positive}]
The value is 1 or greater, so it is not too low to be used as an index.
Note that this annotation is trusted by the Constant Value Checker,
so if the Constant Value Checker is run on code containing this annotation,
the Lower Bound Checker must be run on the same code in order to
guarantee soundness.
\item[\refqualclass{checker/index/qual}{NonNegative}]
The value is 0 or greater, so it is not too low to be used as an index.
\item[\refqualclass{checker/index/qual}{GTENegativeOne}]
The value is -1 or greater.
It may not be used as an index for a sequence, because it might be too low.
(``\<GTE>'' stands for ``Greater Than or Equal to''.)
\item[\refqualclass{checker/index/qual}{PolyLowerBound}]
indicates qualifier polymorphism.
For a description of qualifier polymorphism, see
Section~\ref{method-qualifier-polymorphism}.
\item[\refqualclass{checker/index/qual}{LowerBoundUnknown}]
There is no information about the value.
It may not be used as an index for a sequence, because it might be too low.
\item[\refqualclass{checker/index/qual}{LowerBoundBottom}]
The value cannot take on any integral types. The bottom type, which
should not need to be written by the programmer.
\end{description}
\begin{figure}
\begin{center}
\hfill
\includeimagenocentering{lowerbound}{5cm}
~~~~\hfill~~~~
\includeimagenocentering{upperbound}{7cm}
\hfill
\end{center}
\caption{The two type hierarchies for integer types used by the Index
Checker. On the left is a type system for lower bounds. On the right
is a type system for upper bounds. Qualifiers written in gray should
never be written in source code; they are used internally by the type
system.
% Using "\\" works for some but not all installations of LaTeX.
\newline
In the Upper Bound type system, subtyping rules depend on both the
array name (\<"myArray">, in the figure) and on the offset (which is 0,
the default, in the figure). Another qualifier is
\refqualclass{checker/index/qual}{UpperBoundLiteral}, whose subtyping
relationships depend on its argument and on offsets for other qualifiers.
}
\label{fig-index-int-types}
\end{figure}
\sectionAndLabel{Upper bounds}{index-upperbound}
The Index Checker issues an error when a sequence index might be
too high. To do this, it maintains information about which expressions are
safe indices for which sequences.
The length of a sequence is \code{arr.length} for arrays and
\code{str.length()} for strings.
It uses a type system (Figure~\ref{fig-index-int-types}) with the following
qualifiers:
It issues an error when a sequence \code{arr}
is indexed by an integer that is not of type \code{@LTLengthOf("arr")}
or \code{@LTOMLengthOf("arr")}.
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{LTLengthOf}{String[] names, String[] offset}]
An expression with this type
has value less than the length of each sequence listed in \<names>.
The expression may be used as an index into any of those sequences,
if it is non-negative.
For example, an expression of type \code{@LTLengthOf("a") int} might be
used as an index to \<a>.
The type \code{@LTLengthOf(\{"a", "b"\})} is a subtype of both
\code{@LTLengthOf("a")} and \code{@LTLengthOf("b")}.
(``\<LT>'' stands for ``Less Than''.)
\<@LTLengthOf> takes an optional \<offset> element, meaning that the
annotated expression plus the offset is less than the length of the given
sequence. For example, suppose expression \<e> has type \<@LTLengthOf(value
= \{"a", "b"\}, offset = \{"-1", "x"\})>. Then \<e - 1> is less than
\<a.length>, and \<e + x> is less than \<b.length>. This helps to make
the checker more precise. Programmers rarely need to write the \<offset>
element.
\item[\refqualclasswithparams{checker/index/qual}{LTEqLengthOf}{String[] names}]
An expression with this type
has value less than or equal to the length of each sequence listed in \<names>.
It may not be used as an index for these sequences, because it might be too high.
\code{@LTEqLengthOf(\{"a", "b"\})} is a subtype of both
\code{@LTEqLengthOf("a")} and \code{@LTEqLengthOf("b")}.
(``\<LTEq>'' stands for ``Less Than or Equal to''.)
\<@LTEqLengthOf(\{"a"\})> = \<@LTLengthOf(value=\{"a"\}, offset=-1)>, and \\
\<@LTEqLengthOf(value=\{"a"\}, offset=x)> = \<@LTLengthOf(value=\{"a"\},
offset=x-1)> for any x.
\item[\refqualclasswithparams{checker/index/qual}{LTOMLengthOf}{String[] names}]
An expression with this type
has value at least 2 less than the length of each sequence listed in \<names>.
It may always used as an index for a sequence listed in \<names>, if it is
non-negative.
This type exists to allow the checker to infer the safety of loops of
the form:
\begin{Verbatim}
for (int i = 0; i < array.length - 1; ++i) {
arr[i] = arr[i+1];
}
\end{Verbatim}
This annotation should rarely (if ever) be written by the programmer; usually
\refqualclasswithparams{checker/index/qual}{LTLengthOf}{String[] names}
should be written instead.
\code{@LTOMLengthOf(\{"a", "b"\})} is a subtype of both
\code{@LTOMLengthOf("a")} and \code{@LTOMLengthOf("b")}.
(``\<LTOM>'' stands for ``Less Than One Minus'', because another way of
saying ``at least 2 less than \<a.length>'' is ``less than \<a.length-1>''.)
\<@LTOMLengthOf(\{"a"\})> = \<@LTLengthOf(value=\{"a"\}, offset=1)>, and \\
\<@LTOMLengthOf(value=\{"a"\}, offset=x)> = \<@LTLengthOf(value=\{"a"\},
offset=x+1)> for any x.
\item[\refqualclass{checker/index/qual}{UpperBoundLiteral}]
repesents a constant value, typically a literal written in source code.
Its subtyping relationship is:
\<@UpperBoundLiteral(lit)> <: \<LTLengthOf(value="myArray", offset=off)>
if \<lit>+\<offset> $\le$ -1.
\item[\refqualclass{checker/index/qual}{PolyUpperBound}]
indicates qualifier polymorphism.
For a description of qualifier polymorphism, see
Section~\ref{method-qualifier-polymorphism}.
\item[\refqualclass{checker/index/qual}{UpperBoundUnknown}]
There is no information about the upper bound on the value of an expression with this type.
It may not be used as an index for a sequence, because it might be too high.
This type is the top type, and should never need to be written by the
programmer.
\item[\refqualclass{checker/index/qual}{UpperBoundBottom}]
This is the bottom type for the upper bound type system. It should
never need to be written by the programmer.
\end{description}
The following method annotations can be used to establish a method postcondition
that ensures that a certain expression is a valid index for a sequence:
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{EnsuresLTLengthOf}{String[] value, String[] targetValue, String[] offset}]
When the method with this annotation returns, the expression (or all the expressions) given in the \code{value} element
is less than the length of the given sequences with the given offsets. More precisely, the expression
has the \code{@LTLengthOf} qualifier with the \code{value} and \code{offset} arguments
taken from the \code{targetValue} and \code{offset} elements of this annotation.
\item[\refqualclasswithparams{checker/index/qual}{EnsuresLTLengthOfIf}{String[] expression, boolean result, String[] targetValue, String[] offset}]
If the method with this annotation returns the given boolean value,
then the given expression (or all the given expressions)
is less than the length of the given sequences with the given offsets.
\end{description}
There is one declaration annotation that indicates the relationship between
two sequences:
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{HasSubsequence}{String[]
value, String[] from, String[] to}]
indicates that a subsequence (from \code{from} to \code{to}) of the
annotated sequence is equal to some other sequence, named by
\code{value}).
For example, to indicate that \<shorter> is a subsequence of \<longer>:
\begin{Verbatim}
int startIndex;
int endIndex;
int[] shorter;
@HasSubsequence(value="shorter", from="this.start", to="this.end")
int[] longer;
\end{Verbatim}
Thus, a valid index into \<shorter> is also a valid index (between
\code{start} and \code{end-1} inclusive) into \<longer>. More generally,
if \code{x} is \code{@IndexFor("shorter")} in the example above, then
\code{start + x} is \code{@IndexFor("longer")}. If \code{y} is
\code{@IndexFor("longer")} and \code{@LessThan("end")}, then \code{y -
start} is \code{@IndexFor("shorter")}. Finally, \code{end - start} is
\code{@IndexOrHigh("shorter")}.
This annotation is in part checked and in part trusted. When an array is
assigned to \code{longer}, three facts are checked: that \code{start} is
non-negative, that \code{start} is less than or equal to \code{end}, and
that \code{end} is less than or equal to the length of \code{longer}. This
ensures that the indices are valid. The programmer must manually verify
that the value of \code{shorter} equals the subsequence that they describe.
\end{description}
\sectionAndLabel{Sequence minimum lengths}{index-minlen}
The Index Checker estimates, for each sequence expression, how long its value
might be at run time by computing a minimum length that
the sequence is guaranteed to have. This enables the Index Checker to
verify indices that are compile-time constants. For example, this code:
\begin{Verbatim}
String getThirdElement(String[] arr) {
return arr[2];
}
\end{Verbatim}
\noindent
is legal if \<arr> has at least three elements, which can be indicated
in this way:
\begin{Verbatim}
String getThirdElement(String @MinLen(3) [] arr) {
return arr[2];
}
\end{Verbatim}
When the index is not a compile-time constant, as in \<arr[i]>, then the
Index Checker depends not on a \<@MinLen> annotation but on \<i> being
annotated as
\refqualclasswithparams{checker/index/qual}{LTLengthOf}{"arr"}.
The MinLen type qualifier is implemented in practice by the Constant Value Checker,
using \<@ArrayLenRange> annotations (see \chapterpageref{constant-value-checker}).
This means that errors related to the minimum lengths of arrays must be suppressed using
the "value" argument to \<@SuppressWarnings>.
\refqualclass{common/value/qual}{ArrayLenRange} and \refqualclass{common/value/qual}{ArrayLen}
annotations can also be used to establish the minimum length of a sequence, if a
more precise estimate of length is known. For example,
if \<arr> is known to have exactly three elements:
\begin{Verbatim}
String getThirdElement(String @ArrayLen(3) [] arr) {
return arr[2];
}
\end{Verbatim}
The following type qualifiers (from \chapterpageref{constant-value-checker})
can establish the minimum length of a sequence:
\begin{description}
\item[\refqualclasswithparams{common/value/qual}{MinLen}{int value}]
The value of an expression of this type is a sequence with at least
\code{value} elements. The default annotation is
\code{@MinLen(0)}, and it may be applied to non-sequences.
\code{@MinLen($x$)} is a subtype of \code{@MinLen($x-1$)}.
An \code{@MinLen} annotation is treated internally as an
\refqualclass{common/value/qual}{ArrayLenRange} with only its
\code{from} field filled.
\item[\refqualclasswithparams{common/value/qual}{ArrayLen}{int[] value}]
The value of an expression of this type is a sequence whose
length is exactly one of the integers listed in its argument.
The argument can contain at most ten integers; larger collections of
integers are converted to \refqualclass{common/value/qual}{ArrayLenRange}
annotations. The minimum length of a sequence with this annotation
is the smallest element of the argument.
\item[\refqualclasswithparams{common/value/qual}{ArrayLenRange}{int from, int to}]
The value of an expression of this type is a sequence whose
length is bounded by its arguments, inclusive.
The minimum length of a sequence with this annotation is its \<from> argument.
\end{description}
\begin{figure}
\begin{center}
\hfill
\includeimage{samelen}{5cm}
\hfill
\end{center}
\caption{The type hierarchy for arrays of equal length ("a" and "b" are
assumed to be in-scope sequences). Qualifiers
written in gray should never be written in source code; they are used
internally by the type system.}
\label{fig-index-array-types}
\end{figure}
The following method annotation can be used to establish a method postcondition
that ensures that a certain sequence has a minimum length:
\begin{description}
\item[\refqualclasswithparams{common/value/qual}{EnsuresMinLenIf}{String[] expression, boolean result, int targetValue}]
If the method with this annotation returns the given boolean value,
then the given expression (or all the given expressions) is a sequence
with at least \code{targetValue} elements.
\end{description}
\sectionAndLabel{Sequences of the same length}{index-samelen}
The Index Checker determines whether two or more sequences have the same length.
This enables it to verify that all the indexing operations are safe in code
like the following:
\begin{Verbatim}
boolean lessThan(double[] arr1, double @SameLen("#1") [] arr2) {
for (int i = 0; i < arr1.length; i++) {
if (arr1[i] < arr2[i]) {
return true;
} else if (arr1[i] > arr2[i]) {
return false;
}
}
return false;
}
\end{Verbatim}
When needed, you can specify which sequences have the same length using the following type qualifiers (Figure~\ref{fig-index-array-types}):
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{SameLen}{String[] names}]
An expression with this type represents a sequence that has the
same length as the other sequences named in \<names>. In general,
\code{@SameLen} types that have non-intersecting sets of names
are \textit{not} subtypes of each other. However, if at least one
sequence is named by both types, the types are actually the same,
because all the named sequences must have the same length.
\item[\refqualclass{checker/index/qual}{PolySameLen}]
indicates qualifier polymorphism.
For a description of qualifier polymorphism, see
Section~\ref{method-qualifier-polymorphism}.
\item[\refqualclass{checker/index/qual}{SameLenUnknown}]
No information is known about which other sequences have the same length
as this one.
This is the top type, and programmers should never need to write it.
\item[\refqualclass{checker/index/qual}{SameLenBottom}]
This is the bottom type, and programmers should rarely need to write it.
\code{null} has this type.
\end{description}
\sectionAndLabel{Binary search indices}{index-searchindex}
The JDK's
\sunjavadoc{java.base/java/util/Arrays.html\#binarySearch(java.lang.Object\%5B\%5D,java.lang.Object)}{Arrays.binarySearch}
method returns either where the value was found, or a negative value
indicating where the value could be inserted. The Search Index Checker
represents this concept.
\begin{figure}
\begin{center}
\hfill
\includeimage{searchindex}{7cm}
\hfill
\end{center}
\caption{The type hierarchy for the Index Checker's internal type system
that captures information about the results of calls to
\sunjavadoc{java.base/java/util/Arrays.html\#binarySearch(java.lang.Object\%5B\%5D,java.lang.Object)}{Arrays.binarySearch}.}
\label{fig-index-searchindex}
\end{figure}
The Search Index Checker's type hierarchy (Figure~\ref{fig-index-searchindex}) has four type qualifiers:
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{SearchIndexFor}{String[] names}]
An expression with this type represents an integer that could have been
produced by calling
\sunjavadoc{java.base/java/util/Arrays.html\#binarySearch(java.lang.Object\%5B\%5D,java.lang.Object)}{Arrays.binarySearch}:
for each array \<a> specified in the annotation, the annotated integer is
between \<-a.length-1> and \<a.length-1>, inclusive
\item[\refqualclasswithparams{checker/index/qual}{NegativeIndexFor}{String[] names}]
An expression with this type represents a ``negative index'' that is
between \<a.length-1> and \<-1>, inclusive; that is, a value that is both
a \<@SearchIndex> and is negative. Applying the bitwise complement
operator (\verb|~|) to an expression of this type produces an expression
of type \refqualclass{checker/index/qual}{IndexOrHigh}.
\item[\refqualclass{checker/index/qual}{SearchIndexBottom}]
This is the bottom type, and programmers should rarely need to write it.
\item[\refqualclass{checker/index/qual}{SearchIndexUnknown}]
No information is known about whether this integer is a search index.
This is the top type, and programmers should rarely need to write it.
\end{description}
\sectionAndLabel{Substring indices}{index-substringindex}
The methods
\sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf}
and
\sunjavadoc{java.base/java/lang/String.html\#lastIndexOf(java.lang.String)}{String.lastIndexOf}
return an index of a given substring within a given string, or -1 if no
such substring exists. The index \<i> returned from
\<receiver.indexOf(substring)> satisfies the following property, which is
stated here in three equivalent ways:
\begin{Verbatim}
i == -1 || ( i >= 0 && i <= receiver.length() - substring.length() )
i == -1 || ( @NonNegative && @LTLengthOf(value="receiver", offset="substring.length()-1") )
@SubstringIndexFor(value="receiver", offset="substring.length()-1")
\end{Verbatim}
% This new annotation is similar to \<@LTLengthOf\allowbreak(value =
% "receiver", offset = "substring.length()-1")>, but explicitly allows the
% index to be -1 even if the upper bound would not allow it because of the
% offset. The Upper Bound Checker can infer the corresponding
% \<@LTLengthOf> annotation for expressions that have a
% \<@SubstringIndexFor> annotation and at the same time are known to be
% non-negative (according to the Lower Bound Checker).
The return type of methods \sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf}
and \sunjavadoc{java.base/java/lang/String.html\#lastIndexOf(java.lang.String)}{String.lastIndexOf} has the annotation
\refqualclasswithparams{checker/index/qual}{SubstringIndexFor}{value="this", offset="\#1.length()-1")}.
This allows writing code such as the following with no warnings from the
Index Checker:
\begin{Verbatim}
public static String removeSubstring(String original, String removed) {
int i = original.indexOf(removed);
if (i != -1) {
return original.substring(0, i) + original.substring(i + removed.length());
}
return original;
}
\end{Verbatim}
% The code removes the first occurrence of \<removed> from
% \<original>. After checking that \code{i != -1}, the value of \<i> must
% be a valid index for \<original>. Because this index is the start of an
% occurrence of \<removed>, \code{i + removed.length()} is the index of the
% end of the occurrence. Without the \<@SubstringIndexFor> annotation, the
% Upper Bound Checker would not be able to verify that \code{i +
% removed.length()} is a valid argument to \<substring>, which requires
% both arguments to be \<@IndexOrHigh("original")>.
\begin{figure}
\begin{center}
\hfill
\includeimage{substringindex}{3.5cm}
\hfill
\end{center}
\caption{The type hierarchy for the Substring Index Checker, which
captures information about the results of calls to
\sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf}
and
\sunjavadoc{java.base/java/lang/String.html\#lastIndexOf(java.lang.String)}{String.lastIndexOf}.}
\label{fig-index-substringindex}
\end{figure}
The \<@SubstringIndexFor> annotation is implemented in a Substring Index
Checker that runs together with the Index Checker and has its own type
hierarchy (Figure~\ref{fig-index-substringindex}) with three type
qualifiers:
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{SubstringIndexFor}{String[] value, String[] offset}]
An expression with this type represents an integer that could have been
produced by calling
\sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf}:
the annotated integer is either -1, or it is non-negative and is less
than or equal to \<receiver.length - offset> (where the sequence
\<receiver> and the offset \<offset> are corresponding elements of the
annotation's arguments).
\item[\refqualclass{checker/index/qual}{SubstringIndexBottom}]
This is the bottom type, and programmers should rarely need to write it.
\item[\refqualclass{checker/index/qual}{SubstringIndexUnknown}]
No information is known about whether this integer is a substring index.
This is the top type, and programmers should rarely need to write it.
\end{description}
\subsectionAndLabel{The need for the \<@SubstringIndexFor> annotation}{index-substringindex-justification}
No other annotation supported by the Index Checker precisely represents the
possible return values of methods
\sunjavadoc{java.base/java/lang/String.html\#indexOf(java.lang.String)}{String.indexOf}
and
\sunjavadoc{java.base/java/lang/String.html\#lastIndexOf(java.lang.String)}{String.lastIndexOf}.
The reason is the methods' special cases for empty strings and for failed matches.
Consider the result \<i> of \<receiver.indexOf(substring)>:
\begin{itemize}
\item
\<i> is \<@GTENegativeOne>, because \code{i >= -1}.
\item
\<i> is \<@LTEqLengthOf("receiver")>, because \code{i <= receiver.length()}.
\item
\<i> is not \<@IndexOrLow("receiver")>, because for
\code{receiver = "", substring = "", i = 0}, the property
\code{i >= -1 \&\& i < receiver.length()} does not hold.
\item
\<i> is not \<@IndexOrHigh("receiver")>, because for
\code{receiver = "", substring = "b", i = -1}, the property
\code{i >= 0 \&\& i <= receiver.length()} does not hold.
\item
\<i> is not
\<@LTLengthOf(value = "receiver", offset = "substring.length()-1")>,
because for \code{receiver = "", substring = "abc", i = -1}, the property
\code{i + substring.length() - 1 < receiver.length()} does not hold.
\end{itemize}
\noindent
The last annotation in the list above,
\<@LTLengthOf(value = "receiver", offset = "substring.length()-1")>,
is the correct and precise upper bound for all values of \<i> except -1.
The offset expresses the fact that we can add \<substring.length()> to this
index and still get a valid index for \<receiver>. That is useful for
type-checking code that adds the length of the substring to the found
index, in order to obtain the rest of the string. However, the upper bound
applies only after the index is explicitly checked not to be -1:
\begin{Verbatim}
int i = receiver.indexOf(substring);
// i is @GTENegativeOne and @LTEqLengthOf("receiver")
// i is not @LTLengthOf(value = "receiver", offset = "substring.length()-1")
if (i != -1) {
// i is @NonNegative and @LTLengthOf(value = "receiver", offset = "substring.length()-1")
int j = i + substring.length();
// j is @IndexOrHigh("receiver")
return receiver.substring(j); // this call is safe
}
\end{Verbatim}
The property of the result of \<indexOf> cannot be expressed by any
combination of lower-bound (Section~\ref{index-lowerbound}) and upper-bound
(Section~\ref{index-upperbound}) annotations, because the upper-bound
annotations apply independently of the lower-bound annotations, but in this
case, the upper bound \code{i <= receiver.length() - substring.length()}
holds only if \code{i >= 0}. Therefore, to express this property and make
the example type-check without false positives, a new annotation such as
\<@SubstringIndexFor\allowbreak(value = "receiver", offset = "substring.length()-1")>
is necessary.
\sectionAndLabel{Inequalities}{index-inequalities}
The Index Checker estimates which expression's values are less than other expressions' values.
\begin{description}
\item[\refqualclasswithparams{checker/index/qual}{LessThan}{String[] values}]
An expression with this type has a value that is less than the value of each
expression listed in \<values>. The expressions in values must be composed of
final or effectively final variables and constants.
\item[\refqualclass{checker/index/qual}{LessThanUnknown}]
There is no information about the value of an expression this type relative to other expressions.
This is the top type, and should not be written by the programmer.
\item[\refqualclass{checker/index/qual}{LessThanBottom}]
This is the bottom type for the less than type system. It should
never need to be written by the programmer.
\end{description}
\sectionAndLabel{Annotating fixed-size data structures}{index-annotating-fixed-size}
The Index Checker has built-in support for Strings and arrays.
You can add support for additional fixed-size data structures by writing
annotations.
This allows the Index Checker to typecheck the data structure's
implementation and to typecheck uses of the class.
This section gives an example: a fixed-length collection.
%% The code that follows is copied from checker/tests/index/ArrayWrapper.java.
%% If this code is updated, please update that file, too.
\begin{Verbatim}
/** ArrayWrapper is a fixed-size generic collection. */
public class ArrayWrapper<T> {
private final Object @SameLen("this") [] delegate;
@SuppressWarnings("index") // constructor creates object of size @SameLen(this) by definition
ArrayWrapper(@NonNegative int size) {
delegate = new Object[size];
}
public @LengthOf("this") int size() {
return delegate.length;
}
public void set(@IndexFor("this") int index, T obj) {
delegate[index] = obj;
}
@SuppressWarnings("unchecked") // required for normal Java compilation due to unchecked cast
public T get(@IndexFor("this") int index) {
return (T) delegate[index];
}
}
\end{Verbatim}
The Index Checker treats methods annotated with \code{@LengthOf("this")} as
the length of a sequence like \code{arr.length} for arrays and
\code{str.length()} for strings.
With these annotations, client code like the following typechecks with no
warnings:
\begin{Verbatim}
public static void clearIndex1(ArrayWrapper<? extends Object> a, @IndexFor("#1") int i) {
a.set(i, null);
}
public static void clearIndex2(ArrayWrapper<? extends Object> a, int i) {
if (0 <= i && i < a.size()) {
a.set(i, null);
}
}
\end{Verbatim}
%% LocalWords: NegativeArraySizeException pre myArray IndexFor someArray
%% LocalWords: MyJavaFile LTLengthOf LTEqLengthOf GTENegativeOne GTE str
%% LocalWords: LowerBoundUnknown LTOMLengthOf LTEq LTOM UpperBoundBottom
%% LocalWords: UpperBoundUnknown MinLen MinLenBottom SameLen indexOf abc
%% LocalWords: SameLenUnknown SameLenBottom lastIndexOf html lang charAt
%% LocalWords: LengthOf IndexOrLow PolyIndex PolyLowerBound PolyLength
%% LocalWords: PolyUpperBound PolySameLen PolyValue LowerBoundBottom
%% LocalWords: lowerbound upperbound EnsuresLTLengthOf targetValue
%% LocalWords: EnsuresLTLengthOfIf boolean HasSubsequence LessThan
%% LocalWords: minlen ArrayLenRange ArrayLen EnsuresMinLenIf samelen
%% LocalWords: searchindex binarySearch SearchIndexFor NegativeIndexFor
%% LocalWords: SearchIndex bitwise SearchIndexBottom SearchIndexUnknown
%% LocalWords: Substring substringindex substring SubstringIndexFor
%% LocalWords: SubstringIndexBottom SubstringIndexUnknown LessThanBottom
%% LocalWords: LessThanUnknown typecheck typechecks