blob: 36ce18c4c7508aa77d788b48c79c2cb73c110ec8 [file] [log] [blame] [edit]
\htmlhr
\chapterAndLabel{Format String Checker}{formatter-checker}
% VERIFICATION
\begin{sloppypar}
The Format String Checker
prevents use of incorrect format strings
in format methods such as
\sunjavadoc{java.base/java/io/PrintStream.html\#printf(java.lang.String,java.lang.Object...)}{System.out.printf}
and \sunjavadoc{java.base/java/lang/String.html\#format(java.lang.String,java.lang.Object...)}{String.format}.
\end{sloppypar}
The Format String Checker warns you if you write an invalid format string,
and it warns you if the other arguments are not consistent with the format
string (in number of arguments or in their types).
Here are examples of errors that the
Format String Checker
detects at compile time.
Section~\ref{formatter-guarantees} provides more details.
% BUG FINDER
% The Format String Checker helps to prevent bugs that are the result of an
% incorrect use of format methods such as
% \sunjavadoc{java.base/java/io/PrintStream.html\#printf(java.lang.String(java.lang.Object...)}{System.out.printf}.
% VERIFICATION
% BUG FINDER
% The Format String Checker helps to prevent bugs in two ways:
%
% \begin{itemize}
% \item An error is issued if a format method would fail to execute at
% runtime. For example, if an invalid format string is passed.
% \item A warning is issued for possibly legal but likely unintended uses of a
% format method. For example, if unused format arguments are passed.
% \end{itemize}
%
% \noindent
% Following are examples of common errors that the Format String Checker detects at
% compile time, more details are provided in Section~\ref{formatter-guarantees}.
\begin{Verbatim}
String.format("%y", 7); // error: invalid format string
String.format("%d", "a string"); // error: invalid argument type for %d
String.format("%d %s", 7); // error: missing argument for %s
String.format("%d", 7, 3); // warning: unused argument 3
String.format("{0}", 7); // warning: unused argument 7, because {0} is wrong syntax
\end{Verbatim}
\begin{sloppypar}
To run the Format String Checker, supply the
\code{-processor org.checkerframework.checker.formatter.FormatterChecker} command-line option to javac.
\end{sloppypar}
\sectionAndLabel{Formatting terminology}{formatter-terminology}
Printf-style formatting takes as an argument a \emph{format string} and a
list of arguments. It produces a new string in which each \emph{format
specifier} has been replaced by the corresponding argument.
The format specifier determines how the format argument is converted to a
string.
%% Redundant
% The Java standard library provides printf-style formatting in \emph{format methods} such as
% \sunjavadoc{java.base/java/lang/String.html\#format(java.lang.String,java.lang.Object...)}{String.format}
% and
% \sunjavadoc{java.base/java/io/PrintStream.html\#printf(java.lang.String,java.lang.Object...)}{System.out.printf}.
A format specifier is introduced by a \code{\%} character. For example,
\code{String.format("The \%s is \%d.","answer",42)} yields
\code{"The answer is 42."}. \code{"The \%s is \%d."} is
the format string, \code{"\%s"} and \code{"\%d"} are the format specifiers;
\code{"answer"} and \code{42} are format arguments.
\sectionAndLabel{Format String Checker annotations}{formatter-annotations}
The \refqualclass{checker/formatter/qual}{Format} qualifier on a string type
indicates a
\sunjavadoc{java.base/java/util/Formatter.html\#syntax}{\textrm{valid format string}}.
A programmer rarely writes the \<@Format> annotation, as it is inferred for
string literals. A programmer may need to write it on fields and on method
signatures.
%% This is premature; it is not discussed here, and it was already
%% mentioned briefly to introduce readers to the idea, so that isn't
%% necessary either.
% Passing a valid format string to a format method does not guarantee that the
% invocation will succeed. The format method invocation \code{String.format("\%d","hello")}
% for example, will fail despite the fact that \code{"\%d"} is valid.
The \refqualclass{checker/formatter/qual}{Format} qualifier is parameterized with
a list of conversion categories that impose restrictions on the format arguments.
Conversion categories are explained in more detail in
Section~\ref{formatter-categories}. The type qualifier for \code{"\%d \%f"} is
for example \code{@Format(\{INT, FLOAT\})}.
Consider the below \<printFloatAndInt> method. Its parameter must be a
format string that can be used in a format method, where the first format
argument is ``float-like'' and the second format argument is
``integer-like''. The type of its parameter, \<@Format(\{FLOAT, INT\})
String>, expresses that contract.
\begin{Verbatim}
void printFloatAndInt(@Format({FLOAT, INT}) String fs) {
System.out.printf(fs, 3.1415, 42);
}
printFloatAndInt("Float %f, Number %d"); // OK
printFloatAndInt("Float %f"); // error
\end{Verbatim}
\begin{figure}
\includeimage{formatter-hierarchy}{3.5cm}
\caption{The
Format String Checker type qualifier hierarchy.
The type qualifiers are applicable to \<CharSequence> and its subtypes.
The figure does not show the subtyping rules among different
\code{@Format(...)}
qualifiers; see
Section~\ref{formatter-format-subtyping}.
}
\label{fig-formatter-hierarchy}
\end{figure}
Figure~\ref{fig-formatter-hierarchy} shows all the type qualifiers.
The annotations other than \<@Format> are only used
internally and cannot be written in your code.
\refqualclass{checker/formatter/qual}{InvalidFormat} indicates an invalid format
string --- that is, a string that cannot be used as a format string. For
example, the type of \code{"\%y"} is \<@InvalidFormat String>.
\refqualclass{checker/formatter/qual}{FormatBottom} is the type of the
\code{null} literal.
\refqualclass{checker/formatter/qual}{UnknownFormat} is the default that is
applied to strings that are not literals and on which the user has not
written a \<@Format> annotation.
There is also a \refqualclass{checker/formatter/qual}{FormatMethod}
annotation; see Section~\ref{formatter-FormatMethod}.
\subsectionAndLabel{Conversion Categories}{formatter-categories}
Given a format specifier, only certain format arguments are compatible with
it, depending on its ``conversion'' --- its last, or last two,
characters. For example, in the format specifier \code{"\%d"}, the
conversion \code{d} restricts the corresponding format argument
to be ``integer-like'':
\begin{Verbatim}
String.format("%d", 5); // OK
String.format("%d", "hello"); // error
\end{Verbatim}
\noindent Many conversions enforce the same restrictions. A set of
restrictions is represented as a \emph{conversion
category}. The ``integer like'' restriction is for example the conversion
category \refenum{checker/formatter/qual}{ConversionCategory}{INT}{}\@. The following conversion categories are defined in the
\code{\refclass{checker/formatter/qual}{ConversionCategory}} enumeration:
\begin{description}
\item{\refenum{checker/formatter/qual}{ConversionCategory}{GENERAL}{}} imposes no restrictions on a format argument's type. Applicable for
conversions b, B, h, H, s, S.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{CHAR}{}} requires that a format argument represents a Unicode character.
Specifically, \code{char}, \code{Character}, \code{byte},
\code{Byte}, \code{short}, and \code{Short} are allowed.
\code{int} or \code{Integer} are allowed if
\code{Character.isValidCodePoint(argument)} would return \code{true}
for the format argument. (The Format String Checker permits any \<int>
or \<Integer> without issuing a warning or error --- see
Section~\ref{formatter-missed-alarms}.)
Applicable for conversions c, C.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{INT}{}} requires that a format argument represents an integral type. Specifically,
\code{byte}, \code{Byte}, \code{short}, \code{Short},
\code{int} and \code{Integer}, \code{long},
\code{Long}, and \code{BigInteger} are allowed. Applicable for
conversions d, o, x, X.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{FLOAT}{}} requires that a format argument represents a floating-point type. Specifically,
\code{float}, \code{Float}, \code{double},
\code{Double}, and \code{BigDecimal} are allowed. Surprisingly, integer
values are not allowed. Applicable for
conversions e, E, f, g, G, a, A.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{TIME}{}} requires that a format argument represents a date or time.
Specifically, \code{long}, \code{Long}, \code{Calendar}, and
\code{Date} are allowed. Applicable for conversions t, T.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{UNUSED}{}} imposes no restrictions on a format argument. This is the case if a
format argument is not used as replacement for any format specifier.
\code{"\%2\$s"} for example ignores the first format argument.
% This conversion category is similar to GENERAL, but does allow objects that
% throw exceptions in \code{toString} and \code{formatTo}.
\end{description}
\noindent All conversion categories accept \code{null}.
Furthermore, \<null> is always a legal argument array, because it is treated
as supplying \<null> to each format specifer. For example,
\<String.format("\%d \%f \%s", (Object[]) null)> evaluates to \<"null null null">.
The same format argument may serve as a replacement for multiple format specifiers.
Until now, we have assumed that the format specifiers simply consume format arguments left to right.
But there are two other ways for a format specifier to select a format argument:
\begin{itemize}
\item $n$\code{\$} specifies a one-based index $n$. In the
format string \code{"\%2\$s"}, the format specifier selects the
second format argument.
\item The \code{<} \emph{flag} references the format argument
that was used by the previous format specifier. In the format string
\code{"\%d \%<d"} for example, both format specifiers select the first
format argument.
\end{itemize}
\noindent
In the following example,
the format argument must be compatible with both conversion
categories, and can therefore be neither a \code{Character} nor a \code{long}.
\begin{Verbatim}
format("Char %1$c, Int %1$d", (int)42); // OK
format("Char %1$c, Int %1$d", new Character(42)); // error
format("Char %1$c, Int %1$d", (long)42); // error
\end{Verbatim}
Only three additional conversion categories are needed represent all possible
intersections of previously-mentioned conversion categories:
\begin{description}
\item{\refenum{checker/formatter/qual}{ConversionCategory}{NULL}{}} is used if no object of any type can be
passed as parameter. In this case, the only legal value is \code{null}.
For example, the format string \code{"\%1\$f \%1\$c"} requires that the first
format argument be \code{null}. Passing either \code{4} or
\code{4.2} would lead to an exception.
\item{\refenum{checker/formatter/qual}{ConversionCategory}{CHAR\_AND\_INT}{}} is used if a format argument is restricted by a \refenum{checker/formatter/qual}{ConversionCategory}{CHAR}{} and a \refenum{checker/formatter/qual}{ConversionCategory}{INT}{} conversion category (\code{CHAR} $\cap$ \code{INT}).
\item{\refenum{checker/formatter/qual}{ConversionCategory}{INT\_AND\_TIME}{}} is used if a format argument is restricted by an \refenum{checker/formatter/qual}{ConversionCategory}{INT}{} and a \refenum{checker/formatter/qual}{ConversionCategory}{TIME}{} conversion category (\code{INT} $\cap$ \code{TIME}).
\end{description}
\noindent All other intersections lead to already existing conversion categories.
For example, \code{GENERAL} $\cap$ \code{CHAR} $=$ \code{CHAR} and
\code{UNUSED} $\cap$ \code{GENERAL} $=$ \code{GENERAL}.
Figure~\ref{fig-formatter-cat} summarizes the subset
relationship among all conversion categories.
\begin{figure}[thbp]
\includeimage{formatter-categories}{7.8cm}
\caption{The subset relationship
among conversion categories.}
\label{fig-formatter-cat}
\end{figure}
\subsectionAndLabel{Subtyping rules for \<@Format>}{formatter-format-subtyping}
Here are the subtyping rules among different
\refqualclass{checker/formatter/qual}{Format}
qualifiers.
It is legal to:
\begin{itemize}
\item use a format string with a weaker (less restrictive) conversion category than required.
\item use a format string with fewer format specifiers than required.
Although this is legal a warning is issued because most occurrences of
this are due to programmer error.
\end{itemize}
The following example shows the subtyping rules in action:
\begin{Verbatim}
@Format({FLOAT, INT}) String f;
f = "%f %d"; // OK
f = "%s %d"; // OK, %s is weaker than %f
f = "%f"; // warning: last argument is ignored
f = "%f %d %s"; // error: too many arguments
f = "%d %d"; // error: %d is not weaker than %f
String.format(f, 0.8, 42);
\end{Verbatim}
\sectionAndLabel{What the Format String Checker checks}{formatter-guarantees}
% VERIFICATION
If the Format String Checker issues no errors, it provides the following guarantees:
\begin{enumerate}
\item
The following guarantees hold for every format method invocation:
\begin{enumerate}
\item The format method's first parameter (or second if a \sunjavadoc{java.base/java/util/Locale.html}{Locale} is provided) is a valid
format string (or \code{null}).
\item A warning is issued if one of the format string's conversion categories is \code{UNUSED}.
\label{formatter-unused-category-warning}
\item None of the format string's conversion categories is \code{NULL}.
\label{formatter-null-category-error}
\end{enumerate}
\item If the format arguments are passed to the format method as varargs, the
Format String Checker guarantees the following additional properties:
\begin{enumerate}
\item No fewer format arguments are passed than required by the format string.
\item A warning is issued if more format arguments are passed than required by the format string.
\item Every format argument's type satisfies its conversion category's restrictions.
\end{enumerate}
\item If the format arguments are passed to the format method as an array,
a warning is issued by the Format String Checker.
\label{formatter-array-warning}
\end{enumerate}
\noindent Following are examples for every guarantee:
\begin{Verbatim}
String.format("%d", 42); // OK
String.format(Locale.GERMAN, "%d", 42); // OK
String.format(new Object()); // error (1a)
String.format("%y"); // error (1a)
String.format("%2$s", "unused", "used"); // warning (1b)
String.format("%1$d %1$f", 5.5); // error (1c)
String.format("%1$d %1$f %d", null, 6); // error (1c)
String.format("%s"); // error (2a)
String.format("%s", "used", "ignored"); // warning (2b)
String.format("%c",4.2); // error (2c)
String.format("%c", (String)null); // error (2c)
String.format("%1$d %1$f", new Object[]{1}); // warning (3)
String.format("%s", new Object[]{"hello"}); // warning (3)
\end{Verbatim}
\subsectionAndLabel{Possible false alarms}{formatter-false-alarms}
There are three cases in which the Format String Checker may issue a
warning or error, even though the code cannot fail at run time.
(These are in addition to the general conservatism of a type system: code
may be correct because of application invariants that are not captured by
the type system.)
In each of these cases, you can rewrite the code, or you can manually check
it and write a \code{@SuppressWarnings} annotation if you can reason that
the code is correct.
Case \ref{formatter-unused-category-warning}:
Unused format arguments. It is legal to provide more arguments than are
required by the format string; Java ignores the extras. However, this is
an uncommon case. In practice, a mismatch between the number of format
specifiers and the number of format arguments is usually an error.
Case \ref{formatter-null-category-error}:
Format arguments that can only be \code{null}.
It is legal to write a format string that permits only null arguments and
throws an exception for any other argument. An example is
\code{String.format("\%1\$d \%1\$f", null)}.
The Format String Checker forbids such a format string.
If you should ever need such a format string, simply replace the problematic
format specifier with \code{"null"}. For example, you would replace the
call above by \code{String.format("null null")}.
Case \ref{formatter-array-warning}:
Array format arguments.
The Format String Checker performs no analysis of
arrays, only of varargs invocations. It is better style to use varargs
when possible.
% BUG FINDER
% Whenever a format method invocation is found in your code, the Format String
% Checker performs certain checks. If it issues an \emph{error}, the invocation
% will definitely fail at runtime. If a \emph{warning} is issued, the Format
% String Checker detected a common source for errors, and it is very likely that
% the invocation contains a bug.
%
% \noindent I.) The following checks are run for every format method invocation. The checks:
%
% \begin{enumerate}
% \item Issue an \emph{error}, if the format method's first format argument (or
% second if a \sunjavadoc{java.base/java/util\Locale.html}{Locale} is provided) is not a
% \code{String} annotated with the \code{@Format} qualifier.
% \item Issue a \emph{warning}, if any of the \code{@Format} string's
% conversion categories is \code{UNUSED}.
% \end{enumerate}
%
% \noindent II.) If the format arguments are passed to the format method as varargs, the
% Format String Checker performs the following additional checks, that:
%
% \begin{enumerate}
% \item Issue an \emph{error}, if fewer format arguments are passed than required
% by the \code{@Format} qualifier.
% \item Issue a \emph{warning}, if more format arguments are passed than required
% by the \code{@Format} qualifier.
% \item The following checks are performed for every format argument (unless the
% format argument is the \code{null} literal) and the associated conversion category
% from the \code{@Format} qualifier. The checks:
% \begin{enumerate}
% \item Issue a \emph{warning}, if the conversion category is \code{NULL}.
% \item Issue a \emph{warning}, if the format argument's type does not satisfy
% the conversion category's restrictions.
% \end{enumerate}
% \end{enumerate}
%
% \noindent III.) If the format arguments are passed to the format method as array, the
% Format String Checker performs the following checks instead, that:
%
% \begin{enumerate}
% \item Issue a \emph{warning}, if any of the \code{@Format} string's
% conversion categories is \code{NULL} (unless the array is the
% null-array literal \code{(Object[])null}).
% \end{enumerate}
%
% \noindent Following are examples for every check:
%
% \begin{Verbatim}
% String.format("%d", 42); // ok
% String.format(Locale.GERMAN, "%d", 42); // ok (I 1)
% String.format(new Object()); // error (I 1)
% String.format("%y"); // error (I 1)
% String.format("%2$s","unused","used"); // warning (I 2)
% String.format("%s"); // error (II 1)
% String.format("%s","used","ignored"); // warning (II 2)
% String.format("%1$d %1$f", null); // ok (II 3)
% String.format("%d", null); // ok (II 3)
% String.format("%1$d %1$f", 4); // warning (II 3 a)
% String.format("%c",4.2); // warning (II 3 b)
% String.format("%1$d %1$f", new Object[]{1}); // warning (III 1)
% \end{Verbatim}
\subsectionAndLabel{Possible missed alarms}{formatter-missed-alarms}
The Format String Checker helps prevent bugs by detecting, at compile time,
which invocations of format methods will fail. While the Format String Checker
finds most of these invocations, there are cases in which a format method call
will fail even though the Format String Checker issued neither errors nor
warnings. These cases are:
\begin{enumerate}
\item The format string is \code{null}. Use the \ahrefloc{nullness-checker}{Nullness Checker} to prevent this.
\item A format argument's \code{toString} method throws an exception.
\item A format argument implements the \code{Formattable} interface and throws an
exception in the \code{formatTo} method.
\item A format argument's conversion category is \code{CHAR} or \code{CHAR\_AND\_INT},
and the passed value is an \code{int} or \code{Integer}, and
\code{Character.isValidCodePoint(argument)} returns \code{false}.
% VERIFICATION
% BUG FINDER
% \item Illegal format arguments are passed as an array (instead of varargs).
\end{enumerate}
\noindent The following examples illustrate these limitations:
% VERIFICATION
\begin{Verbatim}
class A {
public String toString() {
throw new Error();
}
}
class B implements Formattable {
public void formatTo(Formatter fmt, int f,
int width, int precision) {
throw new Error();
}
}
// The checker issues no errors or warnings for the
// following illegal invocations of format methods.
String.format(null); // NullPointerException (1)
String.format("%s", new A()); // Error (2)
String.format("%s", new B()); // Error (3)
String.format("%c", (int)-1); // IllegalFormatCodePointException (4)
\end{Verbatim}
% BUG FINDER
% \begin{Verbatim}
% class A {
% public String toString() {
% throw new Error();
% }
% }
%
% class B implements Formattable {
% public void formatTo(Formatter fmt, int f,
% int width, int precision) {
% throw new Error();
% }
% }
%
% // the checker issues no errors or warnings for the
% // following illegal invocations of format methods
% String.format(null); // NullPointerException (1)
% String.format("%s", new A()); // Error (2)
% String.format("%s", new B()); // Error (3)
% String.format("%d", new Object[]{4.2}); // IllegalFormatConversionException (4)
% String.format("%c", (int)-1); // IllegalFormatCodePointException (5)
% \end{Verbatim}
\sectionAndLabel{Implicit qualifiers}{formatter-implicit}
The Format String Checker adds implicit
qualifiers, reducing the number of annotations that must appear in your code
(see Section~\ref{effective-qualifier}).
The checker implicitly adds the \code{@Format} qualifier with the appropriate
conversion categories to any String literal that is a valid format string.
\sectionAndLabel{@FormatMethod}{formatter-FormatMethod}
Your project may contain methods that forward their arguments to a format method.
Consider for example the following \code{log} method:
\begin{Verbatim}
@FormatMethod
void log(String format, Object... args) {
if (enabled) {
logfile.print(indent_str);
logfile.printf(format , args);
}
}
\end{Verbatim}
You should annotate such a method with the
\refqualclass{checker/formatter/qual}{FormatMethod} annotation,
which indicates that the \<String> argument is a format string for the
remaining arguments.
\sectionAndLabel{Testing whether a format string is valid}{formatter-run-time-tests}
% Copied from the Regex Checker.
% When one is improved, improve the other as well.
The Format String Checker automatically determines whether each \<String>
literal is a valid format string or not. When a string is computed or is
obtained from an external resource, then the string must be trusted or tested.
One way to test a string is to call the
\refmethod{checker/formatter/util}{FormatUtil}{asFormat}{-java.lang.String-org.checkerframework.checker.formatter.qual.ConversionCategory...-}
method to check whether the format string is valid and its format
specifiers match certain conversion categories.
If this is not the case, \<asFormat> raises an exception. Your code should
catch this exception and handle it gracefully.
The following code examples may fail at run time, and therefore they do not
type check. The type-checking errors are indicated by comments.
\begin{Verbatim}
Scanner s = new Scanner(System.in);
String fs = s.next();
System.out.printf(fs, "hello", 1337); // error: fs is not known to be a format string
\end{Verbatim}
\begin{Verbatim}
Scanner s = new Scanner(System.in);
@Format({GENERAL, INT}) String fs = s.next(); // error: fs is not known to have the given type
System.out.printf(fs, "hello", 1337); // OK
\end{Verbatim}
\noindent The following variant does not throw a run-time error, and
therefore passes the type-checker:
\begin{Verbatim}
Scanner s = new Scanner(System.in);
String format = s.next()
try {
format = FormatUtil.asFormat(format, GENERAL, INT);
} catch (IllegalFormatException e) {
// Replace this by your own error handling.
System.err.println("The user entered the following invalid format string: " + format);
System.exit(2);
}
// fs is now known to be of type: @Format({GENERAL, INT}) String
System.out.printf(format, "hello", 1337);
\end{Verbatim}
\noindent
To use the \refclass{checker/formatter/util}{FormatUtil} class, the \<checker-util.jar> file
must be on the classpath at run time.
% LocalWords: printf InvalidFormat Formatter FormatBottom specifier's
% LocalWords: ConversionCategory isValidCodePoint BigInteger BigDecimal
% LocalWords: varargs TODO Formattable formatTo FormatUtil java qual
% LocalWords: asFormat printFloatAndInt formatter CharSequence
%% LocalWords: UnknownFormat FormatMethod