 \input texinfo
 @c %**start of header
 @setfilename liblouis.info
 @documentencoding UTF-8
 @include version.texi
 @settitle Liblouis User's and Programmer's Manual

 @dircategory Misc
 @direntry
 * Liblouis: (liblouis). A braille translator and back-translator
 @end direntry

 @finalout

 @c Macro definitions

 @defindex opcode

 @c Opcode.
 @macro opcode{name, args}
 @opcodeindex \name\
 @anchor{\name\ opcode}
 @item \name\ \args\
 @end macro

 @macro opcoderef{name}
 @code{\name\} opcode (@pxref{\name\ opcode,\name\,@code{\name\}})
 @end macro

 @c Deprecated opcode.
 @macro deprecatedopcode{name, args, replacement}
 @opcodeindex \name\
 @anchor{\name\ opcode}
 @item \name\ \args\
 This opcode is deprecated. Use the @opcoderef{\replacement\} instead.
 @end macro

 @copying
 This manual is for liblouis (version @value{VERSION}, @value{UPDATED}),
 a Braille Translation and Back-Translation Library derived from the
 Linux screen reader @acronym{BRLTTY}.

 @vskip 10pt

 @noindent
 Copyright @copyright{} 1999-2006 by the BRLTTY Team.

 @noindent
 Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
 @uref{www.viewplus.com}.

 @noindent
 Copyright @copyright{} 2007, 2009 Abilitiessoft, Inc.
 @uref{www.abilitiessoft.org}.

 @noindent
 Copyright @copyright{} 2014, 2016 Swiss Library for the Blind, Visually
 Impaired and Print Disabled. @uref{www.sbs.ch}.

 @vskip 10pt

 @quotation
 This file is free software; you can redistribute it and/or modify it
 under the terms of the GNU Lesser (or library) General Public License
 (LGPL) as published by the Free Software Foundation; either version 3,
 or (at your option) any later version.

 This file is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 Lesser (or Library) General Public License LGPL for more details.

 You should have received a copy of the GNU Lesser (or Library) General
 Public License (LGPL) along with this program; see the file COPYING.
 If not, write to the Free Software Foundation, 51 Franklin Street,
 Fifth Floor, Boston, MA 02110-1301, USA.
 @end quotation
 @end copying

 @titlepage
 @title Liblouis User's and Programmer's Manual

 @subtitle for version @value{VERSION}, @value{UPDATED}
 @author by John J. Boyer

 @c The following two commands start the copyright page.
 @page
 @vskip 0pt plus 1filll
 @insertcopying
 @end titlepage

 @c Output the table of contents at the beginning.
 @contents

 @ifnottex
 @node Top
 @top Liblouis User's and Programmer's Manual

 @insertcopying
 @end ifnottex

 @menu
 * Introduction::
 * How to Write Translation Tables::
 * Notes on Back-Translation::
 * Table Metadata::
 * Testing Translation Tables interactively::
 * Automated Testing of Translation Tables::
 * Programming with liblouis::
 * Concept Index::
 * Opcode Index::
 * Function Index::
 * Program Index::

 @detailmenu
  --- The Detailed Node Listing ---

 How to Write Translation Tables

 * Overview::
 * Hyphenation Tables::
 * Character-Definition Opcodes::
 * Braille Indicator Opcodes::
 * Emphasis Opcodes::
 * Special Symbol Opcodes::
 * Special Processing Opcodes::
 * Translation Opcodes::
 * Character-Class Opcodes::
 * Swap Opcodes::
 * The Context and Multipass Opcodes::
 * The correct Opcode::
 * The match Opcode::
 * Miscellaneous Opcodes::

 Emphasis Opcodes

 * Emphasis class::
 * Contexts::
 * Fallback behavior::
 * Computer braille::

 Contexts

 * None::
 * Letter::
 * Word::
 * Phrase::
 * Symbol::

 Testing Translation Tables interactively

 * lou_debug::
 * lou_trace::
 * lou_checktable::
 * lou_allround::
 * lou_translate (program)::
 * lou_checkhyphens::
 * lou_checkyaml::

 Programming with liblouis

 * Overview (library)::
 * Data structure of liblouis tables::
 * How tables are found::
 * Deprecation of the logging system::
 * lou_version::
 * lou_translateString::
 * lou_translate::
 * lou_backTranslateString::
 * lou_backTranslate::
 * lou_hyphenate::
 * lou_compileString::
 * lou_getTypeformForEmphClass::
 * lou_dotsToChar::
 * lou_charToDots::
 * lou_registerLogCallback::
 * lou_setLogLevel::
 * lou_logFile::
 * lou_logPrint::
 * lou_logEnd::
 * lou_setDataPath::
 * lou_getDataPath::
 * lou_getTable::
 * lou_findTable::
 * lou_indexTables::
 * lou_checkTable::
 * lou_readCharFromFile::
 * lou_free::
 * lou_charSize::
 * Python bindings::

 @end detailmenu
 @end menu

 @node Introduction
 @chapter Introduction

 Liblouis is an open-source braille translator and back-translator
 derived from the translation routines in the BRLTTY screen reader for
 Linux. It has, however, gone far beyond these routines. It is named in
 honor of Louis Braille. In Linux and Mac OS X it is a shared library,
 and in Windows it is a DLL. For installation instructions see the
 README file. Please report bugs and oddities to the mailing list,
 @email{liblouis-liblouisxml@@freelists.org}.

 This documentation is derived from the BRLTTY manual, but
 it has been extensively rewritten to cover new features.

 @section Who is this manual for

 This manual has two main audiences: People who want to write or
 improve a braille translation table and people who want to use the
 braille translator library in their own programs. This manual is
 probably not for people who are looking for some turn-key braille
 translation software.

 @section How to read this manual

 If you are mostly interested in writing braille translation tables
 then you want to focus on @ref{How to Write Translation Tables}. You
 might want to look at @ref{Notes on Back-Translation} if you are
 interested in back-translation. Read @ref{Table Metadata} if you want
 to find out how you can augment your tables with metadata in order to
 make them discoverable by programs. Finally @ref{Testing Translation
 Tables interactively} and @ref{Automated Testing of Translation
 Tables} will show how your braille translation tables can be tested
 interactively and also in an automated fashion.

 If you want to use the braille translation library in your own program
 or you are interested in enhancing the braille translation library
 itself then you will want to look at @ref{Programming with liblouis}.

 @node How to Write Translation Tables
 @chapter How to Write Translation Tables

 For many languages there is already a translation table, so before
 creating a new table, look at the existing tables and consider
 modifying one of them as needed.

 Typically, a braille translation table consists of several parts.
 First come the header and includes, in which you state what the table
 is for, give license information and include the tables your table
 depends on.

 Following this, you'll write various translation rules and lastly you
 write special rules to handle certain situations.

 @cindex opcode
 A translation rule is composed of at least three parts: the opcode
 (translation command), character(s) and braille dots. An opcode is a
 command you give to a machine or a program to perform something on
 your behalf. In liblouis, an opcode tells it which rule to use when
 translating characters into braille. The operands can be thought of as
 the parameters of the translation rule. There are typically two: the
 character or word to be translated and the braille dots.

 For example, suppose you always want the word @samp{world} to be
 rendered as braille dots @samp{456} followed by the letter @samp{w}.
 Then you'd write:

 @example
 always world 456-2456
 @end example

 The word @code{always} is an opcode which tells liblouis to always
 honor this translation, that is to say when the word @samp{world} (an
 operand) is encountered, always show braille dots @samp{456} followed
 by the letter @samp{w} (@samp{2456}).

 When you write any braille table for any language, we'd recommend
 working from some sort of official standard, and have a device or a
 program in which you can test your work.

 @menu
 * Overview::
 * Hyphenation Tables::
 * Character-Definition Opcodes::
 * Braille Indicator Opcodes::
 * Emphasis Opcodes::
 * Special Symbol Opcodes::
 * Special Processing Opcodes::
 * Translation Opcodes::
 * Character-Class Opcodes::
 * Swap Opcodes::
 * The Context and Multipass Opcodes::
 * The correct Opcode::
 * The match Opcode::
 * Miscellaneous Opcodes::
 @end menu

 @node Overview
 @section Overview

 Many translation (contraction) tables have already been made up. They
 are included in the distribution in the tables directory and can be
 studied as part of the documentation. Some of the more helpful (and
 normative) are listed in the following table:

 @table @file
 @item chardefs.cti
 Character definitions for U.S. tables
 @item compress.ctb
 Remove excessive whitespace
 @item en-us-g1.ctb
 Uncontracted American English
 @item en-us-g2.ctb
 Contracted or Grade 2 American English
 @item en-us-brf.dis
 Make liblouis output conform to BRF standard
 @item en-us-comp8.ctb
 8-dot computer braille for use in coding examples
 @item en-us-comp6.ctb
 6-dot computer braille
 @item nemeth.ctb
 Nemeth Code translation for use with liblouisutdml
 @item nemeth_edit.ctb
 Fixes errors at the boundaries of math and text

 @end table

 The names used for files containing translation tables are completely
 arbitrary. They are not interpreted in any way by the translator.
 Contraction tables may be 8-bit ASCII files, UTF-8, 16-bit big-endian
 Unicode files or 16-bit little-endian Unicode files. Blank lines are
 ignored. Any leading and trailing whitespace (any number of blanks
 and/or tabs) is ignored. Lines which begin with a number sign or hatch
 mark (@samp{#}) are ignored, i.e. they are comments. If the number
 sign is not the first non-blank character in the line, it is treated
 as an ordinary character. If the first non-blank character is
 less-than (@samp{<}) the line is also treated as a comment. This makes
 it possible to mark up tables as XHTML documents. Lines which are not
 blank or comments define table entries. The general format of a table
 entry is:

 @example
 opcode operands comments
 @end example

 Table entries may not be split between lines. The opcode is a mnemonic
 that specifies what the entry does. The operands may be character
 sequences, braille dot patterns or occasionally something else. They
 are described for each opcode; see @ref{Opcode Index}. With some
 exceptions, opcodes expect a certain number of operands. Any text on
 the line after the last operand is ignored, and may be a comment. A
 few opcodes accept a variable number of operands. In this case a
 number sign (@samp{#}) begins a comment unless it is preceded by a
 backslash (@samp{\}).

 Here are some examples of table entries.

 @example
 # This is a comment.
 always world 456-2456 A word and the dot pattern of its contraction
 @end example
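
 The rules above on blank lines, whitespace and comments can be
 illustrated with a short fragment. This is only a sketch: the entry is
 the one used earlier in this chapter, and the comment text is
 arbitrary.

 @example
 # A line starting with a number sign is a comment.
 <p>A line starting with less-than is also a comment.</p>

     always world 456-2456 leading blanks and this trailing text are ignored
 @end example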

 Most opcodes have both a "characters" operand and a "dots" operand,
 though some have only one and a few have other types.

 @cindex characters operand
 The characters operand consists of any combination of characters and
 escape sequences preceded and followed by whitespace. Escape
 sequences are used to represent difficult characters. They begin with
 a backslash (@samp{\}). They are:

 @table @kbd
 @item \\
 backslash
 @item \f
 form feed
 @item \n
 new line
 @item \r
 carriage return
 @item \s
 blank (space)
 @item \t
 horizontal tab
 @item \v
 vertical tab
 @item \e
 "escape" character (hex 1b, dec 27)
 @item \xhhhh
 4-digit hexadecimal value of a character

 @end table

 If liblouis has been compiled for 32-bit Unicode the following are
 also recognized.

 @table @kbd
 @item \yhhhhh
 5-digit (20 bit) character
 @item \zhhhhhhhh
 Full 32-bit value.

 @end table
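
 As a sketch of how escape sequences appear in a characters operand,
 the following entries use the @samp{\s}, @samp{\t} and @samp{\x}
 escapes. The choice of characters is only illustrative, and a real
 table should take its dot patterns from the braille code it
 implements.

 @example
 # blank
 space \s 0
 # horizontal tab
 space \t 0
 # no-break space (U+00A0), assumed here to behave like a blank
 space \x00a0 0
 # E acute, upper and lower case
 uplow \x00c9\x00e9 4-15
 @end example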

 @cindex dots operand
 The dots operand is a braille dot pattern. The real braille dots, 1
 through 8, must be specified with their standard numbers.

 @cindex virtual dots
 @anchor{virtual dots}
 liblouis recognizes @emph{virtual dots}, which are used for special
 purposes, such as distinguishing accent marks. There are seven virtual
 dots. They are specified by the number 9 and the letters @samp{a}
 through @samp{f}.

 @cindex multi-cell dot pattern
 For a multi-cell dot pattern, the cell specifications must be
 separated from one another by a dash (@samp{-}). For example, the
 contraction for the English word @samp{lord} (the letter @samp{l}
 preceded by dot 5) would be specified as @samp{5-123}. A space may be
 specified with the special dot number 0.
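
 The following sketch shows these notations side by side. The
 @samp{lord} pattern is the one described above. Writing it with the
 @code{always} opcode is an assumption made only to show the syntax.
 The @code{grouping} entry, taken from a later example in this manual,
 shows cells that combine a real dot with the virtual dot @samp{e}.

 @example
 # two cells: dot 5, then dots 123
 always lord 5-123
 # dot number 0 stands for a blank cell
 space \s 0
 # real dots 1 and 2, each combined with virtual dot e
 grouping mrow \x0001\x0002 1e,2e
 @end example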

 An opcode which is helpful in writing translation tables is
 @code{include}. Its format is:

 @example
 include filename
 @end example

 It reads the file indicated by @code{filename} and incorporates or
 includes its entries into the table. Included files can include other
 files, which can include other files, etc. For an example, see what
 files are included by the entry @code{include en-us-g1.ctb} in the table
 @file{en-us-g2.ctb}. If the included file is not in the same directory
 as the main table, use a full path name for filename. Tables can also be
 specified in a table list, in which the table names are separated by
 commas and given as a single table name in calls to the translation
 functions.
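
 A minimal sketch of both mechanisms follows. The @code{include} line
 is the one mentioned above. The table list is an assumed combination
 of files named in this manual, written the way it would be passed as
 the table name to a translation function.

 @example
 # inside a table file
 include en-us-g1.ctb
 @end example

 @example
 en-us-g2.ctb,hyph_en_US.dic
 @end example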

 The order of the various types of opcodes or table entries is
 important. Character-definition opcodes should come first. However, if
 the optional @opcoderef{display} is used it should precede
 character-definition opcodes. Braille-indicator opcodes should come
 next. Translation opcodes should follow. The @opcoderef{context} is a
 translation opcode, even though it is considered along with the
 multipass opcodes. These latter should follow the translation opcodes.
 The @opcoderef{correct} can be used anywhere after the
 character-definition opcodes, but it is probably a good idea to group
 all @code{correct} opcodes together. The @opcoderef{include} can be
 used anywhere, but the order of entries in the combined table must
 conform to the order given above. Within each type of opcode, the
 order of entries is generally unimportant. Thus the translation
 entries can be grouped alphabetically or in any other order that is
 convenient. Hyphenation tables may be specified either with an
 @code{include} opcode or as part of a table list. They should come after
 everything else. Character-definition opcodes are necessary for
 hyphenation tables to work.
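
 The ordering can be summarized in a skeleton such as the following.
 It is a sketch, not a complete table: the entries are taken from
 examples elsewhere in this chapter, the character definitions for the
 letters used by the translation entry are omitted, and the comments
 mark where the remaining groups would go.

 @example
 # 1. display opcodes, if used
 # 2. character definitions
 space \s 0
 uplow Aa 17,1
 punctuation . 46
 # 3. braille indicators
 capsletter 6
 letsign 56
 # 4. translation opcodes, including context
 always world 456-2456
 # 5. multipass opcodes
 # 6. hyphenation table, after everything else
 include hyph_en_US.dic
 @end example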

 @node Hyphenation Tables
 @section Hyphenation Tables

 Hyphenation tables are necessary to make opcodes such as the
 @opcoderef{nocross} function properly. There are no opcodes for
 hyphenation table entries because these tables have a special format.
 Therefore, they cannot be specified as part of an ordinary table.
 Rather, they must be included using the @opcoderef{include} or as part
 of a table list. The liblouis hyphenation algorithm was adopted from the
 one used by OpenOffice. Note that hyphenation tables must follow
 character definitions and should preferably come last. For an example
 of a hyphenation table, see @file{hyph_en_US.dic}.

 @node Character-Definition Opcodes
 @section Character-Definition Opcodes

 These opcodes are needed to define attributes such as digit,
 punctuation, letter, etc. for all characters and their dot patterns.
 liblouis has no built-in character definitions, but such definitions
 are essential to the operation of the @opcoderef{context}, the
 @opcoderef{correct}, the multipass opcodes and the back-translator. If
 the dot pattern is a single cell, it is used to define the mapping
 between dot patterns and characters, unless a @opcoderef{display} for
 that character-dot-pattern pair has been used previously. If only a
 single-cell dot pattern has been given for a character, that dot
 pattern is defined with the character's own attributes. If more than
 one cell is given and some of them have not previously been defined as
 single cells, the undefined cells are entered into the dots table with
 the space attribute. This is done for backward compatibility with
 old tables, but it may cause problems with the above opcodes or
 back-translation. For this reason, every single-cell dot pattern
 should be defined before it is used in a multi-cell character
 representation. The best way to do this is to use the 8-dot computer
 braille representation for the particular braille code. If a character
 or dot pattern used in any rule, except those with the @code{display}
 opcode, the @opcoderef{repeated} or the @opcoderef{replace}, is not
 defined by one of the character-definition opcodes, liblouis will give
 an error message and refuse to continue until the problem is fixed. If
 the translator or back-translator encounters an undefined character in
 its input it produces a succinct error indication in its output, and
 the character is treated as a space.

 You may have multiple definitions of a character using the same or
 different dot patterns. If you use different dot patterns for the same
 character, only the first dot pattern will be used during forward
 translation. However, during back-translation, all the relevant dot
 patterns will back-translate to the character you defined.
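
 For instance, in the following sketch (the dot patterns are chosen for
 illustration and are not taken from a particular braille code) the
 star is forward translated as dots 16, while both dots 16 and dots 35
 back-translate to it.

 @example
 sign * 16
 sign * 35
 @end example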

 You can also define a character multiple times using the same dot
 pattern for the character, but using different character classes. The
 following example would define the character @samp{*} (star) as both
 @opcoderef{math} and @opcoderef{sign}.

 @example
 math * 16
 sign * 16
 @end example

 Likewise, you can define multiple characters as the same dot pattern.
 The characters you define this way will be forward translated to the
 same dot pattern. However, when back-translating, the dot pattern will
 always back-translate to the first character that was defined with
 this pattern.

 This technique may be useful when defining characters that have one
 representation in the Windows character set (CP1252) and another
 representation in the Unicode character set, e.g. the Euro sign,
 @samp{€}. It may also be of use when you have to define several
 variants of the same letter with different accents, which may be
 represented in your Braille code by the same dot pattern. This is a
 very common practice for accented letters that are foreign to the
 Braille code. In the following example using the @opcoderef{uplow},
 both e acute (@samp{é}) and e grave (@samp{è}) are defined as
 dot 4 followed by dots 1 and 5.

 @example
 uplow \x00c9\x00e9 4-15 # E acute
 uplow \x00c8\x00e8 4-15 # E grave
 @end example

 In this example, the dot pattern would always back-translate to e
 acute, since this is the first definition. You could use the
 @opcoderef{correct} to correct at least the most common errors
 on that account. However, there is no fail-safe way to know what
 accented letter to use when you back-translate from a dot pattern
 representing more than one variant.

 @table @code
 @opcode{space, character dots}
 Defines a character as a space and also defines the dot pattern as
 such. For example:

 @example
 space \s 0 \s is the escape sequence for blank; 0 means no dots.
 @end example

 @opcode{punctuation, character dots}
 Associates a punctuation mark in the particular language with a
 braille representation and defines the character and dot pattern as
 punctuation. For example:

 @example
 punctuation . 46 dot pattern for period in NAB computer braille
 @end example

 @opcode{digit, character dots}
 Associates a digit with a dot pattern and defines the character as a
 digit. For example:

 @example
 digit 0 356 NAB computer braille
 @end example

 @opcode{uplow, characters dots [@comma{}dots]}
 The characters operand must be a pair of letters, of which the first
 is uppercase and the second lowercase. The first dots suboperand
 indicates the dot pattern for the upper-case letter. It may have more
 than one cell. The second dots suboperand must be separated from the
 first by a comma and is optional, as indicated by the square brackets.
 If present, it indicates the dot pattern for the lower-case letter. It
 may also have more than one cell. If the second dots suboperand is not
 present the first is used for the lower-case letter as well as the
 upper-case letter. This opcode is needed because not all languages
 follow a consistent pattern in assigning Unicode codes to upper and
 lower case letters. It should be used even for languages that do. The
 distinction is important in the forward translator. For example:

 @example
 uplow Aa 17,1
 @end example

 @opcode{grouping, name characters dots @comma{}dots}
 This opcode is used to indicate pairs of grouping symbols used in
 processing mathematical expressions. These symbols are usually
 generated by the MathML interpreter in liblouisutdml. They are used in
 multipass opcodes. The name operand must contain only letters, but
 they may be upper- or lower-case. The characters operand must contain
 exactly two Unicode characters. The dots operand must contain exactly
 two braille cells, separated by a comma. Note that grouping dot
 patterns also need to be declared with the @opcoderef{exactdots}. The
 characters may need to be declared with the @opcoderef{math}.

 @example
 grouping mrow \x0001\x0002 1e,2e
 grouping mfrac \x0003\x0004 3e,4e
 @end example

 @opcode{letter, character dots}
 Associates a letter in the language with a braille representation and
 defines the character as a letter. This is intended for letters which
 are neither uppercase nor lowercase.

 @opcode{lowercase, character dots}
 Associates a character with a dot pattern and defines the character as
 a lowercase letter. Both the character and the dot pattern have the
 attributes lowercase and letter.

 @opcode{uppercase, character dots}
 Associates a character with a dot pattern and defines the character as
 an uppercase letter. Both the character and the dot pattern have the
 attributes uppercase and letter. @code{lowercase} and @code{uppercase}
 should be used when a letter has only one case. Otherwise use the
 @opcoderef{uplow}.

 @opcode{litdigit, digit dots}
 Associates a digit with the dot pattern which should be used to
 represent it in literary texts. For example:

 @example
 litdigit 0 245
 litdigit 1 1
 @end example

 @opcode{sign, character dots}
 Associates a character with a dot pattern and defines both as a sign.
 This opcode should be used for things like at sign (@samp{@@}),
 percent (@samp{%}), dollar sign (@samp{$}), etc. Do not use it to
 define ordinary punctuation such as period and comma. For example:

 @example
 sign % 4-25-1234 literary percent sign
 @end example

 @opcode{math, character dots}
 Associates a character and a dot pattern and defines them as a
 mathematical symbol. It should be used for less than (@samp{<}),
 greater than (@samp{>}), equals (@samp{=}), plus (@samp{+}), etc. For
 example:

 @example
 math + 346 plus
 @end example

 @end table
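
 The @code{letter}, @code{lowercase} and @code{uppercase} opcodes take
 the same operands as the other single-character definition opcodes. As
 a sketch, with characters and dot patterns that are assumptions chosen
 for illustration and should be checked against the relevant braille
 code:

 @example
 # Hebrew alef, a letter that has no case
 letter \x05d0 1
 # German sharp s, defined here as lowercase only
 lowercase \x00df 2346
 @end example

 An @code{uppercase} entry has the same shape.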

 @node Braille Indicator Opcodes
 @section Braille Indicator Opcodes

 Braille indicators are dot patterns which are inserted into the
 braille text to indicate such things as capitalization, italic type,
 computer braille, etc. The opcodes which define them are followed only
 by a dot pattern, which may be one or more cells.

 @table @code
 @opcode{capsletter, dots}
 The dot pattern which indicates capitalization of a single letter. In
 English, this is dot 6. For example:

 @example
 capsletter 6
 @end example

 @opcode{begcapsword, dots}
 The dot pattern which begins a block of capital letters at the
 beginning or within a word. For example:

 @example
 begcapsword 6-6
 @end example

 @opcode{endcapsword, dots}
 The dot pattern which ends a block of capital letters within a word.
 For example:

 @example
 endcapsword 6-3
 @end example

 @opcode{begcaps, dots}
 The dot pattern which begins a block of capital letters defined by the
 provided @code{typeform} without regard for any other rules. For
 example:

 @example
 begcaps 6-6
 @end example

 @opcode{endcaps, dots}
 The dot pattern which ends a block of capital letters defined by the
 provided @code{typeform} without regard for any other rules. For
 example:

 @example
 endcaps 6-3
 @end example

 @opcode{letsign, dots}
 This indicator is needed in Grade 2 to show that a single letter is
 not a contraction. It is also used when an abbreviation happens to be
 a sequence of letters that is the same as a contraction. For example:

 @example
 letsign 56
 @end example

 @opcode{noletsign, letters}

 The letters in the operand will not be preceded by a letter sign.
 More than one @code{noletsign} opcode can be used. This is equivalent
 to a single entry containing all the letters. In addition, if a single
 letter, such as @samp{a} in English, is defined as a @code{word}
 (@pxref{word opcode,word,@code{word}}) or @code{largesign}
 (@pxref{largesign opcode,largesign,@code{largesign}}), it will be
 treated as though it had also been specified in a @code{noletsign}
 entry.

 @opcode{noletsignbefore, characters}
 If any of the characters precedes a single letter without a space, a
 letter sign is not used. By default the characters apostrophe
 (@samp{'}) and period (@samp{.}) have this property. Use of a
 @code{noletsignbefore} entry cancels the defaults. If more than one
 @code{noletsignbefore} entry is used, the characters in all entries
 are combined.

 @opcode{noletsignafter, characters}
 If any of the characters follows a single letter without a space, a
 letter sign is not used. By default the characters apostrophe
 (@samp{'}) and period (@samp{.}) have this property. Use of a
 @code{noletsignafter} entry cancels the defaults. If more than one
 @code{noletsignafter} entry is used the characters in all entries are
 combined.

 @opcode{nocontractsign, dots}

 The dots in this opcode are used to indicate a letter or a sequence of
 letters that are not a contraction, e.g. @samp{CD}. The opcode is
 similar to the @opcoderef{letsign}.

 @c FIXME: In what way is the nocontractsign opcode different from the
 @c letsign opcode, apart from apparently being a more focused version of
 @c letsign?

 @opcode{numsign, dots}
 The translator inserts this indicator before numbers made up of digits
 defined with the @opcoderef{litdigit} to show that they are a number
 and not letters or some other symbols. For example:

 @example
 numsign 3456
 @end example

 @end table
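
 Several of the letter-sign related opcodes above are described without
 an example. A sketch of how they might be combined follows. The
 particular letters, characters and dot patterns are assumptions for
 illustration only.

 @example
 letsign 56
 # two entries, equivalent to a single entry "noletsign ai"
 noletsign a
 noletsign i
 # an explicit entry cancels the defaults, so they are listed again
 noletsignbefore '.
 # the defaults plus the hyphen
 noletsignafter '.-
 nocontractsign 56
 @end example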

 @node Emphasis Opcodes
 @section Emphasis Opcodes

 In many braille systems emphasis such as bold, italics or underline is
 indicated using special dot patterns that mark the start and often
 also the end. For some languages these braille indicators differ
 depending on the context, i.e. there is a separate indicator for an
 emphasized word and another one for an emphasized phrase. To
 accommodate all these usage scenarios liblouis provides a number of
 opcodes for various contexts.

 At the same time some braille systems use different indicators for
 different kinds of emphasis while others know only one kind of
 emphasis. For that reason liblouis doesn't hard-code any emphasis.
 Instead, the table author defines which kinds of emphasis exist for a
 specific language using the @opcoderef{emphclass}.

 @menu
 * Emphasis class::
 * Contexts::
 * Fallback behavior::
 * Computer braille::
 @end menu

 @node Emphasis class
 @subsection Emphasis class

 The @code{emphclass} opcode defines the classes of emphasis that are
 relevant for a particular language. An emphasis class has to be
 declared for every kind of emphasis that needs special indicators.

 @table @code
 @opcode{emphclass, <emphasis class>}
 Define an emphasis class to be used later in other emphasis related
 opcodes in the table.

 @example
 emphclass italic
 emphclass underline
 emphclass bold
 emphclass transnote
 @end example

 @end table

 @node Contexts
 @subsection Contexts

 In order to understand the capabilities of Liblouis for emphasis
 handling we have to look at the different contexts that are supported.

 @menu
 * None::
 * Letter::
 * Word::
 * Phrase::
 * Symbol::
 @end menu

 @node None
 @subsubsection None

 For some languages there is no such concept as contexts. Emphasis is
 always handled the same regardless of context. There is simply an
 indicator for the beginning of emphasis and another one for the end of
 the emphasis.

 @table @code
 @opcode{begemph, <emphasis class> <dot pattern>}
 Braille dot pattern to indicate the beginning of emphasis.

 @example
 begemph italic 46-3
 @end example

 @opcode{endemph, <emphasis class> <dot pattern>}
 Braille dot pattern to indicate the end of emphasis.

 @example
 endemph italic 46-36
 @end example

 @end table
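
 Putting the pieces together, a table that uses no contexts declares
 the class first and then its two indicators. The sketch below reuses
 the dot patterns from the examples above.

 @example
 emphclass italic
 begemph italic 46-3
 endemph italic 46-36
 @end example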

 @node Letter
 @subsubsection Letter

 Some languages have special indicators for single letter emphasis.

 @table @code
 @opcode{emphletter, <emphasis class> <dot pattern>}
 Braille dot pattern to indicate that the next character is emphasized.

 @example
 emphletter italic 46-25
 @end example

 @end table

 @node Word
 @subsubsection Word

 Many languages have special indicators for emphasized words. Usually
 they start at the beginning of the word and end implicitly, i.e.
 without a closing indicator at the end of the word. There are also use
 cases where the emphasis starts in the middle of the word and an
 explicit closing indicator is required.

 @table @code
 @opcode{begemphword, <emphasis class> <dot pattern>}
 Braille dot pattern to indicate the beginning of an emphasized word
 or the beginning of emphasized characters within a word.

 @example
 begemphword underline 456-36
 @end example

 @opcode{endemphword, <emphasis class> <dot pattern>}
 Generally emphasis with word context ends when the word ends. However
 when an indication is required to close a word emphasis then this
 opcode defines the Braille dot pattern that indicates the end of a word
 emphasis.

 @example
 endemphword transnote 6-3
 @end example

 If emphasis ends in the middle of a word the Braille dot pattern
 defined in this opcode is also used.

 @end table

 @node Phrase
 @subsubsection Phrase

 Many languages have a concept of a phrase where the emphasis is valid
 for a number of words. The beginning of the phrase is indicated with a
 braille dot pattern and a closing indicator is put before or after the
 last word of the phrase. To define how many words are considered a
 phrase in your language use the @opcoderef{lenemphphrase}.

 @table @code
 @opcode{begemphphrase, <emphasis class> <dot pattern>}
 Braille dot pattern to indicate the beginning of a phrase.

 @example
 begemphphrase bold 456-46-46
 @end example

 @c define a special opcode macro that can handle the two-word nature
 @c of the endemphphrase opcode
 @macro endemphphraseopcode{where}
 @opcodeindex endemphphrase \where\
 @anchor{endemphphrase \where\ opcode}
 @item endemphphrase <emphasis class> \where\ <dot pattern>
 @end macro

 @endemphphraseopcode{before}
 Braille dot pattern to indicate the end of a phrase. The closing indicator
 will be placed before the last word of the phrase.

 @example
 endemphphrase bold before 456-46
 @end example

 @endemphphraseopcode{after}
 Braille dot pattern to indicate the end of a phrase. The closing
 indicator will be placed after the last word of the phrase. If both
 @code{endemphphrase <emphasis class> before} and @code{endemphphrase
 <emphasis class> after} are defined an error will be signaled.

 @example
 endemphphrase underline after 6-3
 @end example

 @opcode{lenemphphrase, <emphasis class> <number>}
 Define how many words are required before a sequence of words is
 considered a phrase.

 @example
 lenemphphrase underline 3
 @end example

 @end table
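
 As a sketch, a complete phrase definition for one emphasis class could
 look like the following. The dot patterns are reused from the examples
 above and the phrase length of 3 is an assumption. Only one of the
 @code{before} and @code{after} variants is given, since defining both
 for the same class is an error.

 @example
 emphclass bold
 begemphphrase bold 456-46-46
 endemphphrase bold before 456-46
 lenemphphrase bold 3
 @end example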

 @node Symbol
 @subsubsection Symbol
 UEB (Unified English Braille) has a concept of symbols that need
 special indication. When the translator detects an emphasis sequence
 that needs to be indicated
 with the rules for a symbol then it will use the dots defined with the
 @opcoderef{emphletter}. To indicate the end of the symbol it will use
 the dots defined in the @opcoderef{endemphword}.

 @node Fallback behavior
 @subsection Fallback behavior

 Many braille systems either handle emphasis using no contexts or
 otherwise by employing a combination of the letter, word and phrase
 contexts. So if a table defines any opcodes for the letter, word or
 phrase contexts then liblouis will signal an error for opcodes that
 define emphasis with no context. In other words, contrary to previous
 versions of liblouis, there is no fallback behavior.

 As a consequence, there will only be emphasis for a context when the
 table defines it. So for example when defining a braille dot pattern
 for phrases and not for words liblouis will not indicate emphasis on
 words that aren't part of a phrase.

 @node Computer braille
 @subsection Computer braille

 For computer braille there are only two braille indicators, for the
 beginning and end of a sequence of characters to be rendered in
 computer braille. Such a sequence may also have other emphasis. The
 computer braille indicators are applied not only when computer braille
 is indicated in the @code{typeform} parameter, but also when a
 sequence of characters is determined to be computer braille because it
 contains a subsequence defined by the @opcoderef{compbrl}.

 @node Special Symbol Opcodes
 @section Special Symbol Opcodes

 These opcodes define certain symbols, such as the decimal point, which
 require special treatment.

 @table @code
 @opcode{decpoint, character dots}

 This opcode defines the decimal point. It is useful if your Braille
 code requires the decimal separator to show as a dot pattern different
 from the normal representation of this character, i.e. period or
 comma. In addition, it allows the notation @samp{.001} (no leading
 0), which some languages use instead of @samp{0.001}, to be translated
 correctly. When you use the @code{decpoint} opcode, the decimal point
 is taken to be part of the number and is correctly preceded by the
 number sign.

 The character operand must have only one character. For example, in
 @file{en-us-g1.ctb} we have:

 @example
 decpoint . 46
 @end example

 @opcode{hyphen, character dots}
 This opcode defines the hyphen, that is, the character used in
 compound words such as @samp{have-nots}. The back-translator uses it
 to determine the end of individual words.

 @end table
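
 A minimal sketch of both opcodes in one table follows. The
 @code{decpoint} line is the one quoted above from
 @file{en-us-g1.ctb}; the dots given for the hyphen (36) are an
 assumption based on common English braille usage, not a value taken
 from this manual.

 @example
 decpoint . 46
 hyphen - 36
 @end example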

 @node Special Processing Opcodes
 @section Special Processing Opcodes

 These opcodes cause special processing to be carried out.

 @table @code
 @opcode{capsnocont,}
 This opcode has no operands. If it is specified, words or parts of
 words in all caps are not contracted. This is needed for languages
 such as Norwegian.

 Note: If you use the @code{capsnocont} opcode and do not define the
 @opcoderef{begcapsword} indicator, every cap will be marked with the
 @opcoderef{capsletter} indicator. This is useful if you need to process caps
 separately in a later pass.

 @end table
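
 The situation described in the note can be sketched as follows. The
 dot pattern for @code{capsletter} is an assumption (dot 6, as in
 English braille); what matters is that @code{begcapsword} is
 deliberately left undefined.

 @example
 capsnocont
 capsletter 6
 @end example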

 @node Translation Opcodes
 @section Translation Opcodes

 These opcodes define the braille representations for character
 sequences. Each of them defines an entry within the contraction table.
 These entries may be defined in any order except, as noted below, when
 they define alternate representations for the same character sequence.

 Each of these opcodes specifies a condition under which the
 translation is legal, and each also has a characters operand and a
 dots operand. The text being translated is processed strictly from
 left to right, character by character, with the most eligible entry
 for each position being used. If there is more than one eligible entry
 for a given position in the text, then the one with the longest
 character string is used. If there is more than one eligible entry for
 the same character string, then the one defined first is tested for
 legality first. (This is the only case in which the order of the
 entries makes a difference.)
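
 As an illustration of the ordering rule, consider the following
 hypothetical pair of entries for the same character string. The dot
 patterns are assumptions loosely modelled on English contracted
 braille.

 @example
 begword con 25
 always con 14-135-1345
 @end example

 Both entries are eligible wherever @samp{con} occurs, and both have
 the same character string, so @code{begword} is tested first because
 it is defined first. In @samp{control} it is legal and dots 25 should
 be used. In @samp{second} it is not legal, because @samp{con} is not
 at the beginning of the word, so the @code{always} entry is used
 instead. Reversing the two lines would let the @code{always} entry
 win everywhere.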

 The characters operand is a sequence or string of characters preceded
 and followed by whitespace. Each character can be entered in the
 normal way, or it can be defined as a four-digit hexadecimal number
 preceded by @samp{\x}.

 The dots operand defines the braille representation for the characters
 operand. It may also be specified as an equals sign (@samp{=}). This
 means that the default representation for each character
 (@pxref{Character-Definition Opcodes}) within the sequence is to be
 used. Note however that the @samp{=} shortcut for dot patterns is
 deprecated. Dot patterns should be written out. Otherwise
 back-translation may not be correct.
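
 The two operands can be sketched as follows. The entries are
 illustrative assumptions rather than lines from an existing table, and
 the dot pattern chosen for @samp{\x00e9} (LATIN SMALL LETTER E WITH
 ACUTE) is likewise an assumption.

 @example
 # characters entered in the normal way, dots written out
 always ab 1-12
 # the same character string using the deprecated shortcut
 always ab =
 # a character given as a four-digit hexadecimal number
 always \x00e9 123456
 @end example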

 In what follows the word @samp{characters} means a sequence of one or
 more consecutive letters between spaces and/or punctuation marks.

 @table @code

 @opcode{noback, opcode ...}
 This is an opcode prefix, that is to say, it modifies the operation of
 the opcode that follows it on the same line. @code{noback} specifies that
 back-translation is not to use information on this line.

 @example
 noback always ;\s; 0
 @end example

 @opcode{nofor, opcode ...}
 This is an opcode prefix which modifies the operation of the opcode
 following it on the same line. @code{nofor} specifies that forward
 translation is not to use the information on this line.

 @opcode{compbrl, characters}
 If the characters are found within a block of text surrounded by
 whitespace, the entire block is translated as computer braille. If
 8-dot computer braille is enabled, the default braille representations
 defined by the character-definition opcodes are used, see
 @ref{Character-Definition Opcodes}. If 6-dot computer braille is
 enabled, the dot patterns given in the @opcoderef{comp6} are used. For
 example:

 @example
 compbrl www translate URLs in computer braille
 @end example

 @opcode{comp6, character dots}
 This opcode specifies the translation of characters in 6-dot computer
 braille. It is necessary because the translation of a single character
 may require more than one cell. The first operand must be a character
 with a decimal representation from 0 to 255 inclusive. The second
 operand may specify as many cells as necessary. The opcode is somewhat
 of a misnomer, since any dots, not just dots 1 through 6, can be
 specified. This even includes virtual dots (@pxref{virtual dots}).

 @opcode{nocont, characters}
 Like @code{compbrl}, except that the string is uncontracted.
 @opcoderef{prepunc} and @opcoderef{postpunc} rules are applied,
 however. This is useful for specifying that foreign words should not
 be contracted in an entire document.

 @opcode{replace, characters @{characters@}}
 Replace the first set of characters, no matter where they appear, with
 the second. Note that the second operand is @emph{NOT} a dot pattern.
 It is also optional. If it is omitted the character(s) in the first
 operand will be discarded. This is useful for ignoring characters. It
 is possible that the "ignored" characters may still affect the
 translation indirectly. Therefore, it is preferable to use
 @opcoderef{correct}.

 @opcode{always, characters dots}
 Replace the characters with the dot pattern no matter where they
 appear. Do @emph{NOT} use an entry such as @code{always a 1}. Use the
 @code{uplow}, @code{letter}, etc. character definition opcodes
 instead. For example:

 @example
 always world 456-2456 unconditional translation
 @end example

 @opcode{repeated, characters dots}
 Replace the characters with the dot pattern no matter where they
 appear. Ignore any consecutive repetitions of the same character
 sequence. This is useful for shortening long strings of spaces or
 hyphens or periods. For example:

 @example
 repeated --- 36-36-36 shorten separator lines made with hyphens
 @end example

 @opcode{repword, characters dots}
 When characters are encountered check to see if the word before this
 string matches the word after it. If so, replace characters with dots
 and eliminate the second word and any word following another
 occurrence of characters that is the same. This opcode is used in
 Malaysian braille. In this case the rule is:

 @example
 repword - 123456
 @end example

 @opcode{largesign, characters dots}
 Replace the characters with the dot pattern no matter where they
 appear. In addition, if two words defined as large signs follow each
 other, remove the space between them. For example, in
 @file{en-us-g2.ctb} the words @samp{and} and @samp{the} are both
 defined as large signs. Thus, in the phrase @samp{the cat and the dog}
 the space would be deleted between @samp{and} and @samp{the}, with the
 result @samp{the cat andthe dog}. Of course, @samp{and} and @samp{the}
 would be properly contracted. The term @code{largesign} is a bit of
 braille jargon that pleases braille experts.

 @opcode{word, characters dots}
 Replace the characters with the dot pattern if they are a word, that
 is, are surrounded by whitespace and/or punctuation.

 @opcode{syllable, characters dots}
 As its name indicates, this opcode defines a "syllable" which must be
 represented by exactly the dot patterns given. Contractions may not
 cross the boundaries of this "syllable" either from left or right. The
 character string defined by this opcode need not be a lexical
 syllable, though it usually will be. The equal sign in the following
 example means that the default representation for each character
 within the sequence is to be used (@pxref{Translation Opcodes}):

 @example
 syllable horse = sawhorse, horseradish
 @end example

 @opcode{nocross, characters dots}
 Replace the characters with the dot pattern if the characters are all
 in one syllable (do not cross a syllable boundary). For this opcode to
 work, a hyphenation table must be included. If this is not done,
 @code{nocross} behaves like the @opcoderef{always}. For example, if
 the English Grade 2 table is being used and the appropriate
 hyphenation table has been included @code{nocross sh 146} will cause
 the @samp{sh} in @samp{monkshood} not to be contracted.

 @opcode{joinword, characters dots}
 Replace the characters with the dot pattern if they are a word which
 is followed by whitespace and a letter. In addition remove the
 whitespace. For example, @file{en-us-g2.ctb} has @code{joinword to
 235}. This means that if the word @samp{to} is followed by another
 word the contraction is to be used and the space is to be omitted. If
 these conditions are not met, the word is translated according to any
 other opcodes that may apply to it.

 @opcode{lowword, characters dots}
 Replace the characters with the dot pattern if they are a word
 preceded and followed by whitespace. No punctuation either before or
 after the word is allowed. The term @code{lowword} derives from the
 fact that in English these contractions are written in the lower part
 of the cell. For example:

 @example
 lowword were 2356
 @end example

 @opcode{contraction, characters}
 If you look at @file{en-us-g2.ctb} you will see that some words are
 actually contracted into some of their own letters. A famous example
 among braille transcribers is @samp{also}, which is contracted as
 @samp{al}. But this is also the name of a person. To take another
 example, @samp{altogether} is contracted as @samp{alt}, but this is
 the abbreviation for the alternate key on a computer keyboard.
 Similarly @samp{could} is contracted into @samp{cd}, but this is the
 abbreviation for compact disk. To prevent confusion in such cases, the
 letter sign (see @opcoderef{letsign}) is placed before such letter
 combinations when they actually are abbreviations, not contractions.
 The @code{contraction} opcode tells the translator to do this.

 @opcode{sufword, characters dots}
 Replace the characters with the dot pattern if they are either a word
 or at the beginning of a word.

 @opcode{prfword, characters dots}
 Replace the characters with the dot pattern if they are either a word
 or at the end of a word.

 @opcode{begword, characters dots}
 Replace the characters with the dot pattern if they are at the
 beginning of a word.

 @opcode{begmidword, characters dots}
 Replace the characters with the dot pattern if they are either at the
 beginning or in the middle of a word.

 @opcode{midword, characters dots}
 Replace the characters with the dot pattern if they are in the middle
 of a word.

 @opcode{midendword, characters dots}
 Replace the characters with the dot pattern if they are either in the
 middle or at the end of a word.

 @opcode{endword, characters dots}
 Replace the characters with the dot pattern if they are at the end of
 a word.

 @opcode{partword, characters dots}
 Replace the characters with the dot pattern if the characters are
 anywhere in a word, that is, if they are preceded or followed by a
 letter.

 @opcode{exactdots, @@dots}
 Note that the operand must begin with an at sign (@samp{@@}). The dot
 pattern following it is evaluated for validity. If it is valid,
 whenever an at sign followed by this dot pattern appears in the source
 document it is replaced by the characters corresponding to the dot
 pattern in the output. This opcode is intended for use in liblouisutdml
 semantic-action files to specify exact dot patterns, as in
 mathematical codes. For example:

 @example
 exactdots @@4-46-12356
 @end example
 will produce the characters with these dot patterns in the output.

 @opcode{prepunc, characters dots}
 Replace the characters with the dot pattern if they are part of
 punctuation at the beginning of a word.

 @opcode{postpunc, characters dots}
 Replace the characters with the dot pattern if they are part of
 punctuation at the end of a word.

 @opcode{begnum, characters dots}
 Replace the characters with the dot pattern if they are at the
 beginning of a number, that is, before all its digits. For example, in
 @file{en-us-g1.ctb} we have @code{begnum # 4}.

 @opcode{midnum, characters dots}
 Replace the characters with the dot pattern if they are in the middle
 of a number. For example, @file{en-us-g1.ctb} has @code{midnum . 46}.
 This is because the decimal point has a different dot pattern than the
 period.

 @opcode{endnum, characters dots}
 Replace the characters with the dot pattern if they are at the end of
 a number. For example @file{en-us-g1.ctb} has @code{endnum th 1456}.
 This handles things like @samp{4th}. A letter sign is @emph{NOT}
 inserted.

 @opcode{joinnum, characters dots}
 Replace the characters with the dot pattern. In addition, if
 whitespace and a number follow, omit the whitespace. This opcode can
 be used to join currency symbols to numbers, for example:

 @example
 joinnum \x20AC 15 (EURO SIGN)
 joinnum \x0024 145 (DOLLAR SIGN)
 joinnum \x00A3 1234 (POUND SIGN)
 joinnum \x00A5 13456 (YEN SIGN)
 @end example

 @end table
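
 To show how the word-position opcodes divide up the work, here is a
 short hypothetical fragment. The dot patterns follow common English
 contracted braille but should be treated as assumptions, and the
 words after each entry are comments that only name a place where the
 entry would apply.

 @example
 word the 2346 the
 begword dis 256 dislike
 midendword ing 346 singer, sing
 endword ness 56-234 kindness
 @end example

 Each entry is legal only in the position its opcode names, so the same
 character string can be given different treatment elsewhere by adding
 further entries with other opcodes.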

 @node Character-Class Opcodes
 @section Character-Class Opcodes

 These opcodes define and use character classes. A character class
 associates a set of characters with a name. The name then refers to
 any character within the class. A character may belong to more than
 one class.

 The basic character classes correspond to the character definition
 opcodes, with the exception of the @opcoderef{uplow}, which defines
 characters belonging to the two classes @code{uppercase} and
 @code{lowercase}. These classes are:

 @table @code
 @item space
 Whitespace characters such as blank and tab
 @item digit
 Numeric characters
 @item letter
 Both uppercase and lowercase alphabetic characters
 @item lowercase
 Lowercase alphabetic characters
 @item uppercase
 Uppercase alphabetic characters
 @item punctuation
 Punctuation marks
 @item sign
 Signs such as percent (@samp{%})
 @item math
 Mathematical symbols
 @item litdigit
 Literary digit
 @item undefined
 Not properly defined

 @end table

 The opcodes which define and use character classes are shown below.
 For examples see @file{el.ctb}.

 @table @code

 @opcode{class, name characters}
 Define a new character class. The characters operand must be specified
 as a string. A character class may not be used until it has been
 defined.

 @opcode{after, class opcode ...}
 The specified opcode is further constrained in that the matched
 character sequence must be immediately preceded by a character
 belonging to the specified class. If this opcode is used more than
 once on the same line then the union of the characters in all the
 classes is used.

 @opcode{before, class opcode ...}
 The specified opcode is further constrained in that the matched
 character sequence must be immediately followed by a character
 belonging to the specified class. If this opcode is used more than
 once on the same line then the union of the characters in all the
 classes is used.

 @end table
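
 A sketch of these three opcodes working together follows. The class
 name @code{vowel}, its members and the choice of rules are all
 hypothetical; the fragment illustrates the syntax and is not a rule
 of any particular braille code.

 @example
 class vowel aeiou
 after vowel always st 34
 before vowel always th 1456
 @end example

 Under these assumptions the second line contracts @samp{st} only when
 the character immediately before it is a vowel, and the third line
 contracts @samp{th} only when the character immediately after it is a
 vowel. The class is defined on the first line because a class may not
 be used until it has been defined.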

 @node Swap Opcodes
 @section Swap Opcodes

 The swap opcodes are needed to tell the @opcoderef{context}, the
 @opcoderef{correct} and multipass opcodes which dot patterns to swap
 for which characters. There are three, @code{swapcd}, @code{swapdd}
 and @code{swapcc}. The first swaps dot patterns for characters. The
 second swaps dot patterns for dot patterns and the third swaps
 characters for characters. The first is used in the @code{context}
 opcode and the second is used in the multipass opcodes. Dot patterns
 are separated by commas and may contain more than one cell.

 @table @code

 @opcode{swapcd, name characters dots@comma{} dots@comma{} dots@comma{} ...}
 See above paragraph for explanation. For example:

 @example
 swapcd dropped 0123456789 356,2,23,...
 @end example

 @opcode{swapdd, name dots@comma{} dots@comma{} dots ... dotpattern1@comma{} dotpattern2@comma{} dotpattern3@comma{} ...}
 The @code{swapdd} opcode defines substitutions for the multipass
 opcodes. In the second operand the dot patterns must be single cells,
 but in the third operand multi-cell dot patterns are allowed. This is
 because multi-cell patterns in the second operand would lead to
 ambiguities.

 @opcode{swapcc, name characters characters}
 The @code{swapcc} opcode swaps characters in its second operand for
 characters in the corresponding places in its third operand. It is
 intended for use with @code{correct} opcodes and can solve problems
 such as formatting phone numbers.

 @end table
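
 As a sketch, the two opcodes that have no example above might be
 written like this. The names @code{upperdrop} and @code{lowerupper}
 and the contents of both sets are hypothetical.

 @example
 swapdd upperdrop 1,12,14 2,23,25
 swapcc lowerupper abc ABC
 @end example

 In the first entry the second operand lists single cells (dots 1, 12
 and 14) and the third operand lists the dot patterns that replace
 them, matched up by position. In the second entry @samp{a} is swapped
 for @samp{A}, @samp{b} for @samp{B} and @samp{c} for @samp{C}.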

 @node The Context and Multipass Opcodes
 @section The Context and Multipass Opcodes

 The @code{context} and multipass opcodes (@code{pass2}, @code{pass3}
 and @code{pass4}) provide translation capabilities beyond those of the
 basic translation opcodes (@pxref{Translation Opcodes}) discussed
 previously. The multipass opcodes cause additional passes to be made
 over the string to be translated. The number after the word
 @code{pass} indicates in which pass the entry is to be applied. If no
 multipass opcodes are given, only the first translation pass is made.
 The @code{context} opcode is basically a multipass opcode for the
 first pass. It differs slightly from the multipass opcodes per se.
 When back-translating, the passes are performed in the reverse order, i.e.
 @code{pass4}, @code{pass3}, @code{pass2}, @code{context}.
 Each of these opcodes must be prefixed by either
 the @opcoderef{noback} or the @opcoderef{nofor}.
 The format of all these opcodes is @code{opcode test action}.
 The specific opcodes are invoked as follows:

 @table @code
 @anchor{context opcode}
 @opcodeindex context
 @opcodeindex pass2
 @opcodeindex pass3
 @opcodeindex pass4
 @item context test action
 @itemx pass2 test action
 @itemx pass3 test action
 @itemx pass4 test action
 @end table

 The @code{test} and @code{action} operands have suboperands. Each
 suboperand begins with a non-alphanumeric character and ends when
 another non-alphanumeric character is encountered. The suboperands and
 their initial characters are as follows.

 @table @kbd
 @item " (double quote)
 a string of characters. This string must be terminated by another
 double quote. It may contain any characters. If a double quote is
 needed within the string, it must be preceded by a backslash
 (@samp{\}). If a space is needed, it must be represented by the escape
 sequence \s. This suboperand is valid
 in the test and action parts of the @code{correct} opcode,
 in the test part of the @code{context} opcode when forward translating,
 and in the action part of the @code{context} opcode when back translating.

 @item @@ (at sign)
 a sequence of dot patterns. Cells are separated by hyphens as usual.
 This suboperand is valid in the test and action parts of
 the @code{pass2}, @code{pass3}, and @code{pass4} opcodes,
 in the action part of the @code{context} opcode when forward translating,
 and in the test part of the @code{context} opcode when back translating.

 @item ` (accent mark)
 If this is the beginning of the string being translated this
 suboperand is true. It is valid only in the test part and must be the
 first thing in this operand.

 @item ~ (tilde)
 If this is the end of the string being translated this suboperand is
 true. It is valid only in the test part and must be the last thing in
 this operand.

 @item $ (dollar sign)
 a string of attributes, such as @samp{d} for digit, @samp{l} for
 letter, etc. More than one attribute can be given. If you wish to
 check characters with any attribute, use the letter @samp{a}. Input
 characters are checked to see if they have at least one of the
 attributes. The attribute string can be followed by numbers specifying
 how many characters are to be checked. If no numbers are given, 1 is
 assumed. If two numbers separated by a hyphen are given, the input is
 checked to make sure that at least the first number of characters with
 the attributes are present, but no more than the second number. If
 only one number is present, then exactly that many characters must
 have the attributes. A period instead of the numbers indicates an
 indefinite number of characters (for technical reasons the number of
 characters that are actually matched is limited to 65535).

 This suboperand is valid in all test parts but not in action parts.
 For the characters which can be used in attribute strings, see the
 following table.

 @item ! (exclamation point)
 reverses the logical meaning of the suboperand which follows. For
 example, !$d is true only if the character is @emph{NOT} a digit. This
 suboperand is valid in test parts only.

 @item % (percent sign)
 the name of a class defined by the @opcoderef{class} or the name of a
 swap set defined by the swap opcodes (@pxref{Swap Opcodes}). Names may
 contain only letters. The letters may be upper or
 lower-case. The case matters. Class names may be used in test parts
 only. Swap names are valid everywhere.

 @item  @{ (left brace)
 Name: the name of a grouping pair. The left brace indicates that the
 first (or left) member of the pair is to be used in matching. If this
 is between replacement brackets it must be the only item. This is also
 valid in the action part.

 @item  @} (right brace)
 Name: the name of a grouping pair. The right brace indicates that the
 second (or right) member is to be used in matching. See the remarks on
 the left brace immediately above.

 @item / (slash)
 Search the input for the expression following the slash and return
 true if found. This can be used to set a variable.

 @item _ (underscore)
 Move backward. If a number follows, move backward that number of
 characters. The default is to move backward one character. This
 suboperand is valid only in test parts. The test fails if moving
 backward beyond the beginning of the input string.

 @item [ (left bracket)
 start replacement here. This suboperand must always be paired with a
 right bracket and is valid only in test parts. Multiple pairs of
 square brackets in a single expression are not allowed.

 @item ] (right bracket)
 end replacement here. This suboperand must always be paired with a
 left bracket and is valid only in test parts.

 @item # (number sign or crosshatch)
 test or set a variable. Variables are referred to by numbers
 (0 through 49), e.g. @code{#1}, @code{#2}, @code{#25}.
 Variables may be set by one @code{context} or multipass opcode and tested
 by another. Thus, an operation that occurs at one place in a translation
 can tell an operation that occurs later within the same pass about itself.
 This feature is used in math translation, and may also help to alleviate
 the need for new opcodes. This suboperand is valid everywhere.

 Variables are set in the action part. To set a variable, use an
 expression like @code{#1=1}. All of the variables are initialized to 0
 at the start of each pass.

 Variables can also be incremented and decremented by one in the action
 part with expressions like @code{#1+} and @code{#3-} respectively.
 An attempt to decrement a variable below 0 is silently ignored.

 Variables are tested in the test part with conditional expressions like:
 @code{#1=2}, @code{#3<4}, @code{#5>6}, @code{#7<=8}, @code{#9>=10}.

 @item * (asterisk)
 Copy the input characters or dot patterns within the replacement brackets
 into the output, and discard anything else that was matched. If there are
 no replacement brackets then copy all of the matched input. This
 suboperand is only valid within the action part. It may be specified any
 number of times. This feature is used, for example, for handling numeric
 subscripts in Nemeth.

 @item ? (question mark)
 Valid only in the action part. The characters to be replaced are
 simply ignored. That is, they are replaced with nothing. If either
 member of a grouping pair is in the replace brackets the other member
 at the same level is also removed.

 @end table
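
 A few hypothetical entries may help to show how the suboperands
 combine. They are sketches written from the descriptions above, not
 lines from an existing table, and the dot patterns are arbitrary.

 @example
 noback context $d["st"] @@34
 noback context `["-"] @@36-36
 noback pass2 @@3[@@3] ?
 @end example

 The first entry is meant to replace @samp{st} with dots 34 only when
 the character before it is a digit. The second replaces a hyphen with
 dots 36-36 only at the very beginning of the string being translated.
 The third works on dot patterns in the second pass and removes a dot 3
 cell that directly follows another dot 3 cell. In each case only the
 part between the replacement brackets is replaced.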

 The characters which can be used in attribute strings are as follows:

 @table @kbd
 @item a
 any attribute
 @item d
 digit
 @item D
 literary digit
 @item l
 letter
 @item m
 math
 @item p
 punctuation
 @item S
 sign
 @item s
 space
 @item U
 uppercase
 @item u
 lowercase
 @item w
 first user-defined class
 @item x
 second user-defined class
 @item y
 third user-defined class
 @item z
 fourth user-defined class
 @end table
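
 The counts that may follow an attribute string can be sketched with
 two hypothetical entries. The dot patterns are assumptions chosen for
 illustration only.

 @example
 noback context $d3["-"]$d4 @@36
 noback context $d1-3[","]$d3 @@2
 @end example

 The first test requires exactly three digits, a hyphen and exactly
 four digits, and only the hyphen is replaced. The second requires
 between one and three digits, a comma and exactly three digits, and
 only the comma is replaced.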

 The following illustrates the algorithm by which text is evaluated
 with multipass expressions:

 @noindent
 Loop over context, pass2, pass3 and pass4 and do the following for each pass:

 @enumerate a
 @item
 Match the text following the cursor against all expressions in the
 current pass
 @item
 If there is no match: shift the cursor one position to the right and
 continue the loop
 @item
 If there is a match: choose the longest match
 @item
 Do the replacement (everything between square brackets)
 @item
 Place the cursor after the replaced text
 @item
 Continue the loop
 @end enumerate
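
 As a worked illustration of these steps, assume a second pass that
 contains only the two hypothetical entries below, and an input of
 three cells: dot 1, dots 12 and dots 14.

 @example
 noback pass2 @@1-12 @@3
 noback pass2 @@1 @@2
 @end example

 Following the steps above, both entries match at the first cell. The
 first entry is chosen because its match is longer (two cells), the
 matched cells are replaced by dot 3, and the cursor is placed after
 the replacement. Neither entry matches at the remaining cell (dots
 14), so the cursor simply moves on. Under this reading the result of
 the pass is dot 3 followed by dots 14.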

 @node The correct Opcode
 @section The correct Opcode

 @table @code
 @opcode{correct, test action}
 Because some input (such as that from an OCR program) may contain
 systematic errors, it is sometimes advantageous to use a
 pre-translation pass to remove them. The errors and their corrections
 are specified by the @code{correct} opcode. If there are no
 @code{correct} opcodes in a table, the pre-translation pass is not used.
 If any back-translation corrections have been specified then they are
 applied in a post-translation (i.e. the very last) pass.

 Note that like the @opcoderef{context} and multi-pass opcodes, the
 @code{correct} opcode must be preceded by @opcoderef{noback} or
 @opcoderef{nofor}.

 The format of the @code{correct} opcode is very similar to that
 of the @opcoderef{context}. The only difference is that in the action
 part strings may be used and dot patterns may not be used. Some
 examples of @code{correct} opcode entries are:

 @example
 noback correct "\\" ? Eliminate backslashes
 noback correct "cornf" "comf" fix a common "scano"
 noback correct "cornm" "comm"
 noback correct "cornp" "comp"
 noback correct "*" ? Get rid of stray asterisks
 noback correct "|" ? ditto for vertical bars
 noback correct "\s?" "?" drop space before question mark
 @end example

 @end table

 @node The match Opcode
 @section The match Opcode

 The match opcode is similar to the multipass opcodes and can be seen as
 the more low-level and powerful cousin to the @opcoderef{context}.

 @strong{Note:} For historical reasons despite being fairly similar in
 syntax and functionality both the @opcoderef{context} and the
 @opcoderef{match} exist and are in use in modern braille tables. But
 in the future they might be merged under some common opcode. For that
 reason consider the match opcode @emph{somewhat experimental}.

 @table @code
 @opcode{match, pre-pattern characters post-pattern dots}

 This opcode allows for matching a string of characters via @emph{pre}
 and @emph{post patterns}. The patterns are specified using an
 expression syntax somewhat like regular expressions (@pxref{pattern
 expression syntax}). A single hyphen (@samp{-}) by itself means no
 pattern is specified.

 The following will replace @samp{xyz} with the dots
 @samp{1346-13456-1356} when it appears in the string @samp{abxyzcd}.

 @example
 match ab xyz cd 1346-13456-1356
 @end example

 The following will replace @samp{ONE} with @samp{3456-1} when it
 starts the input and is followed by @samp{:}

 @example
 match ^ ONE : 3456-1
 @end example
 @end table

 @anchor{pattern expression syntax}
 The @code{pre-pattern} and the @code{post-pattern} can contain
 any of the following expressions:

 @table @samp
 @item [ ]
 Expression can be any of the characters between the brackets. If only
 one character is present then the brackets are not needed unless it is a
 special character, in which case it should be escaped with a backslash.

 @item .
 Expression can be any character.

 @item %[ ]
 Expression is a character with the attributes listed between the
 brackets. If only one character is present then the brackets are not
 needed. The set of attributes are specified as follows:

 @table @samp
 @item _
 space
 @item #
 digit
 @item a
 letter
 @item u
 uppercase
 @item l
 lowercase
 @item .
 punctuation
 @item $
 sign
 @end table

 @item ^
 Match at the end of the input (or at the beginning, depending on the
 direction, pre or post).

 @item $
 Same as @samp{^}.
 @end table

 For example the following will replace @samp{bb} with the dots @samp{23} when it
 is between letters.

 @example
 match %a bb %a 23
 @end example

 The following will replace @samp{con} with the dots @samp{25} when it
 is preceded by a space or beginning of input, and followed by an
 @samp{s} and then any letter.

 @example
 match %[^_] con s%a 25
 @end example

 Similar to regular expressions the pattern expressions can contain
 grouping, quantifiers and even negation:

 @table @samp
 @item ( )
 Expressions between parentheses are grouped together as one
 expression.

 @item !
 The following expression is negated.

 @item ?
 The previous expression must match zero or one times.

 @item *
 The previous expression must match zero or more times.

 @item +
 The previous expression must match one or more times.

 @item |
 Either the previous or the following expressions must match.
 @end table

 For example the following will replace @samp{ing} with the dots
 @samp{346} when it is @emph{not} preceded by a space or beginning of
 input. What follows after the @samp{ing} does not matter, hence the
 @samp{-}.

 @example
 match !%[^_] ing - 346
 @end example

 The following will replace @samp{con} with the dots @samp{25} when it
 is preceded by a space, or beginning of input; then followed by a
 @samp{c} that is followed by any character but @samp{h}.

 @example
 match %[^_] con c!h 25
 @end example
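
 The quantifiers can be sketched in the same way. The following
 hypothetical entries illustrate the syntax and are not rules from an
 existing table; the dot patterns are assumptions.

 @example
 match %#+ th - 1456
 match %[^_] dis %a+ 256
 @end example

 The first is meant to replace @samp{th} with the dots @samp{1456} when
 it is preceded by one or more digits, whatever follows it. The second
 replaces @samp{dis} with the dots @samp{256} when it is preceded by a
 space or beginning of input and followed by one or more letters.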

 @node Miscellaneous Opcodes
 @section Miscellaneous Opcodes

 @table @code
 @opcode{include, filename}
 Read the file indicated by @code{filename} and incorporate or include
 its entries into the table. Included files can include other files,
 which can include other files, etc. For an example, see what files are
 included by the entry @code{include en-us-g1.ctb} in the table
 @file{en-us-g2.ctb}. If the included file is not in the same directory
 as the main table, use a full path name for filename.

 @opcode{locale, characters}
 Not implemented, but recognized and ignored for backward
 compatibility.

 @opcode{undefined, dots}
 If this opcode is used in a table, any characters which have not been
 defined in the table but are encountered in the text will be replaced
 by the given dot pattern. If this opcode is not used, any undefined
 characters are replaced by @code{'\xhhhh'}, where the h's are
 hexadecimal digits.

 @opcode{display, character dots}
 Associates dot patterns with the characters which will be sent to a
 braille embosser, display or screen font. The character must be in the
 range 0-255 and the dots must specify a single cell. Here are some
 examples:

 @example
 # When the character a is sent to the embosser or display,
 # it will produce a dot 1.
 display a 1
 @end example

 @example
 # When the character L is sent to the display or embosser
 # it will produce dots 1-2-3.
 display L 123
 @end example

 The @code{display} opcode is optional. It is used when the embosser or
 display has a different mapping of characters to dot patterns than
 that given in @ref{Character-Definition Opcodes}. If used, display
 entries must precede character-definition entries.

 A possible use case would be to define display opcodes so that the
 result is Unicode braille for use on a display and a second set of
 display opcodes (in a different file) to produce plain ASCII braille
 for use with an embosser.

 @opcode{multind, dots opcode opcode ...}
 The @code{multind} opcode tells the back-translator that a sequence of
 braille cells represents more than one braille indicator. For example,
 in @file{en-us-g2.ctb} we have @code{multind 56-6 letsign capsletter}.
 The back-translator can generally handle single braille indicators,
 but it cannot apply them when they immediately follow each other. It
 recognizes the letter sign if it is followed by a letter and takes
 appropriate action. It also recognizes the capital sign if it is
 followed by a letter. But when there is a letter sign followed by a
 capital sign it fails to recognize the letter sign unless the sequence
 has been defined with @code{multind}. A @code{multind} entry may not
 contain a comment because liblouis would attempt to interpret it as an
 opcode.

 @end table

 @node Notes on Back-Translation
 @chapter Notes on Back-Translation

 @anchor{General Notes}
 @section General Notes

 Back-translation refers to the process of translating backwards, i.e.
 from Braille to text. For many years, Liblouis was mainly concerned
 with forward translation, and so were most of the authors of the
 translation tables. Today however, Liblouis is being used extensively
 in conjunction with screen reading programs like NVDA and JAWS for
 Windows as well as Braille note-takers like BrailleSense from HIMS and
 BrailleNote from HumanWare. So when writing a translation table for
 Liblouis, it is indeed relevant to consider how the table will work
 when used for back-translation, if anything special must be done, or if
 you want to write separate tables for forward translation and
 back-translation.

 Back-translation is generally harder to do in a computer program than
 forward translation. Ideally, any text could be translated to Braille
 and then translated back to text giving exactly the same result as the
 original. However, many Braille codes omit a lot of information and
 leave it to the reader to fill in the missing bits. An example of this
 is letters with accents. In languages where accents are uncommon, e.g.
 English, accented letters are usually just marked with a Braille
 indicator stating that there is an accent, but not which accent, even
 though this may be crucial to the meaning of the word or the sentence.
 Another example of this is when not all capital letters are marked in
 the Braille code, but only the "important" capital letters. A third
 example is when a Braille character serves as both a punctuation sign,
 a math sign, and perhaps even as a contraction, and the Braille code
 then leaves it up to the reader to use his/her knowledge of the context
 to decide the meaning of the Braille character.

 In some cases, you may need to bend the rules of the Braille code if it
 is important to create Braille that can be properly back-translated.
 This may include marking all capital letters instead of just the
 "important" ones, or perhaps marking a Braille character with an
 indicator stating that this character should in fact be interpreted as
 a math sign and not a punctuation or Braille contraction. In some
 cases, the best solution may be to create two separate sets of tables
 for forward translation: One set for Braille that must be
 back-translatable (for use with screen readers and note-takers), and
 another for good and nice literary Braille (for embossing).
 But no matter how you bend the Braille code, the back-translation
 process may not be perfect.

 @anchor{Back-translation with Liblouis}
 @section Back-translation with Liblouis

 Back-translation is carried out by the function
 @code{lou_backTranslateString}. Its calling sequence is described in
 @ref{Programming with liblouis}. @code{lou_backTranslateString} first
 performs @code{pass4}, if
 present, then @code{pass3}, then @code{pass2}, then the
 backtranslation, then corrections. Note that this is exactly the
 inverse of forward translation.
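
 The ordering described above can be pictured with a small Python
 sketch. It is only an illustration and not part of liblouis: the stage
 names are labels chosen for this example, and the sketch assumes a
 table that defines every pass.

```python
# Illustrative only: stage names are labels for this sketch,
# not liblouis identifiers.
FORWARD_STAGES = ["corrections", "translation", "pass2", "pass3", "pass4"]


def back_translation_stages(forward_stages):
    """Return the stages in the order used for back-translation.

    Per the text above, back-translation runs the forward stages
    in reverse order.
    """
    return list(reversed(forward_stages))


print(" -> ".join(back_translation_stages(FORWARD_STAGES)))
```

 A pass that a table does not define would simply be absent from the
 list.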

 Most opcodes can be preceded by @opcoderef{noback} or @opcoderef{nofor},
 and the @code{correct}, @code{context} and multi-pass opcodes must be
 preceded with either @code{noback} or @code{nofor}. So in most cases,
 it will be perfectly possible to make one table for translation in both
 directions, although a separate table for forward and backward
 translation might be more readable in some cases.

 Most of the opcodes associated with pass 1 have two operands, a
 character operand to the left and a dots operand to the right. During
 forward translation, these operands are used to replace the characters
 with the dot pattern according to the conditions of the opcode. The
 opcode works from left to right. When back-translating, these opcodes
 work the opposite way. The dot patterns are replaced by the text. The
 opcodes work from right to left.

 On the other hand, the @code{correct}, @code{context} and multi-pass
 opcodes have a test part to the left and an action part to the right.
 These opcodes work from left to right in both translation directions.
 The test is performed, and if true, the action is executed, i.e.
 replacing, inserting or deleting characters or dots. This is why a
 translation direction always has to be specified with these opcodes
 using @code{noback} or @code{nofor}.
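
 The way the two operands of a pass 1 rule swap roles can be sketched
 in Python. This is a toy model under stated assumptions, not how
 liblouis stores rules: the tuple layout and the helper
 @code{build_lookup} are hypothetical, only plain replacement is
 modelled, and the conditions each opcode imposes are ignored.

```python
# Each toy rule: (direction prefix or None, opcode, characters, dots).
# The data model is hypothetical and exists only for this sketch.
RULES = [
    (None, "always", "er", "12456"),
    (None, "lowercase", "s", "234"),
    ("nofor", "always", "x", "1346"),   # hypothetical: back-translation only
    ("noback", "always", "st", "34"),   # hypothetical: forward translation only
]


def build_lookup(rules, direction):
    """Build a replacement map for 'forward' or 'backward' translation."""
    skip = "nofor" if direction == "forward" else "noback"
    lookup = dict()
    for prefix, _opcode, chars, dots in rules:
        if prefix == skip:
            continue  # rule is disabled for this direction
        if direction == "forward":
            lookup[chars] = dots   # characters are replaced by dots
        else:
            lookup[dots] = chars   # dots are replaced by characters
    return lookup


print(build_lookup(RULES, "forward"))
print(build_lookup(RULES, "backward"))
```

 The same rule list serves both directions; the prefix only decides in
 which of the two maps a rule appears.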

 @node Table Metadata
 @chapter Table Metadata

 Translation tables may contain metadata. This makes them
 discoverable. Programs may for example use the Liblouis function
 @ref{lou_findTable,@code{lou_findTable}} to find a table based on a
 special query of which the @ref{Query Syntax,syntax} is described
 below.

 @section Syntax

 Metadata must be defined in special comments within the table
 header. The table header is the area at the top of the file, before
 the first translation rule, consisting of only comments or empty
 lines. Any metadata within included tables is ignored.

 A metadata field must be defined on its own line, starting with
 @code{#+}. It has the following syntax:

 @example
 #+<key>: <value>
 @end example

 where @samp{<key>} and @samp{<value>} are sequences of
 one or more characters @code{0} to @code{9}, @code{a} to @code{z},
 @code{A} to @code{Z}, @code{-} and @code{_}. The colon that separates
 the key and value may have zero or more spaces or tabs on either side.

 A value is optional. In case of no value the colon must be omitted as
 well:

 @example
 #+<key>
 @end example
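
 The syntax above can be checked with a few lines of Python. The
 following is a minimal sketch, assuming the header ends at the first
 line that is neither empty nor a comment; the function name
 @code{parse_metadata} and the keys in the sample are made up for
 illustration and are not part of liblouis.

```python
import re

# One metadata line: "#+key" or "#+key: value".
# Spaces or tabs may surround the colon.
FIELD = re.compile(
    r"^#\+([0-9A-Za-z_-]+)(?:[ \t]*:[ \t]*([0-9A-Za-z_-]+))?[ \t]*$"
)


def parse_metadata(table_text):
    """Collect (key, value) pairs from the table header.

    The value is None when a field has no value.
    """
    fields = []
    for line in table_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            break  # the first translation rule ends the header
        match = FIELD.match(line)
        if match:
            fields.append((match.group(1), match.group(2)))
    return fields


SAMPLE = "\n".join([
    "# A hypothetical table header",
    "#+language: en",
    "#+contraction:full",
    "#+unicode-range",
    "",
    "include en-us-g1.ctb",
    "#+ignored: yes",
])
print(parse_metadata(SAMPLE))
```

 The last field in the sample is not reported because it appears after
 the first rule and is therefore outside the header.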

 There is no restriction on which keys and values are allowed, as long
 as the syntax is correct. However in order to be really useful there
 must be some standard keys and values. A possible grammar is proposed
 on the wiki page
 @url{https://github.com/liblouis/liblouis/wiki/Table-discovery-based-on-table-metadata#standard-metadata-tags, Standard metadata tags}.

 @anchor{Query Syntax}
 @section Query Syntax

 A query that is passed to the @ref{lou_findTable,@code{lou_findTable}}
 function must have the following syntax:

 @example
 <feature1> <feature2> <feature3> ...
 @end example

 where @samp{<feature>} is either:

 @example
 <key>:<value>
 @end example

 or:

 @example
 <key>
 @end example

 Features are separated by one or more spaces or tabs. No spaces are
 allowed around colons.
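
 As an illustration, a query could be split into its features as in
 the following Python sketch. The function @code{parse_query} and the
 keys in the sample query are hypothetical; the matching itself is done
 by @code{lou_findTable} and is not modelled here.

```python
import re


def parse_query(query):
    """Split a query into (key, value) features.

    The value is None for a feature that consists of a key only.
    """
    features = []
    # Features are separated by one or more spaces or tabs.
    tokens = [token for token in re.split(r"[ \t]+", query) if token]
    for token in tokens:
        key, colon, value = token.partition(":")
        # A colon with nothing on one side means a space crept in.
        if not key or (colon and not value):
            raise ValueError("malformed feature: " + token)
        features.append((key, value if colon else None))
    return features


print(parse_query("language:en contraction:full \t unicode-range"))
```

 Because no spaces are allowed around colons, a query such as
 @samp{language: en} is rejected by this sketch with a
 @code{ValueError}.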

 @node Testing Translation Tables interactively
 @chapter Testing Translation Tables interactively

 A number of test programs are provided as part of the liblouis
 package. They are intended for testing liblouis and for debugging
 tables. None of them is suitable for braille transcription. An
 application that can be used for transcription is @command{file2brl},
 which is part of the liblouisutdml package (@pxref{Top, , Introduction,
 liblouisutdml, Liblouisutdml User's and Programmer's Manual}). The source
 code of the test programs can be studied to learn how to use the
 liblouis library and they can be used to perform the following
 functions.

 @anchor{common options}
 All of these programs recognize the @option{--help} and
 @option{--version} options.

 @table @option

 @item --help
 @itemx -h
 Print a usage message listing all available options, then exit
 successfully.

 @item --version
 @itemx -v
 Print the version number, then exit successfully.

 @end table

 Most test programs let you specify one or multiple tables to use.
 These tables are usually found in standard locations in the file
 system or local to where the command is executed. @xref{How tables are
 found}, for a description of how the tables are located.

 @menu
 * lou_debug::
 * lou_trace::
 * lou_checktable::
 * lou_allround::
 * lou_translate (program)::
 * lou_checkhyphens::
 * lou_checkyaml::
 @end menu

 @node lou_debug
 @section lou_debug
 @pindex lou_debug

 The @command{lou_debug} tool is intended for debugging liblouis
 translation tables. The command line for @command{lou_debug} is:

 @example
 lou_debug [OPTIONS] TABLE[,TABLE,...]
 @end example

 The command line options that are accepted by @command{lou_debug} are
 described in @ref{common options}.

 The table (or comma-separated list of tables) is compiled. If no
 errors are found a brief command summary is printed, then the prompt
 @samp{Command:}. You can then input one of the command letters and get
 output, as described below.

 Most of the commands print information in the various arrays of
 @code{TranslationTableHeader}. Since these arrays are pointers to
 chains of hashed items, the commands first print the hash number, then
 the first item, then the next item chained to it, and so on. After
 each item there is a prompt indicated by @samp{=>}. You can then press
 enter (@kbd{@key{RET}}) to see the next item in the chain or the first
 item in the next chain. Or you can press @kbd{h} (for next-(h)ash) to
 skip to the next hash chain. You can also press @kbd{e} to exit the
 command and go back to the @samp{command:} prompt.

 @table @kbd
 @item h
 Brings up a screen of somewhat more extensive help.

 @item f
 Display the first forward-translation rule in the first non-empty hash
 bucket. The number of the bucket is displayed at the beginning of the
 chain. Each rule is identified by the word @samp{Rule:}. The fields
 are displayed by phrases consisting of the name of the field, an equal
 sign, and its value. The before and after fields are displayed only if
 they are nonzero. Special opcodes such as the @opcoderef{correct} and
 the multipass opcodes are shown with the code that instructs the
 virtual machine that interprets them. If you want to see only the
 rules for a particular character string you can type @kbd{p} at the
 @samp{command:} prompt. This will take you to the @samp{particular:}
 prompt, where you can press @kbd{f} and then type in the string. The
 whole hash chain containing the string will be displayed.

 @item b
 Display back-translation rules. This display is very similar to that
 of forward translation rules except that the dot pattern is displayed
 before the character string.

 @item c
 Display character definitions, again within their hash chains.

 @item d
 Displays single-cell dot definitions. If a character-definition opcode
 gives a multi-cell dot pattern, it is displayed among the
 back-translation rules.

 @item C
 Display the character-to-dots map. This is set up by the
 character-definition opcodes and can also be influenced by the
 @opcoderef{display}.

 @item D
 Display the dot to character map, which shows which single-cell dot
 patterns map to which characters.

 @item z
 Show the multi-cell dot patterns which have been assigned to the
 characters from 0 to 255 to comply with computer braille codes such as
 a 6-dot code. Note that the character-definition opcodes should use
 8-dot computer braille.

 @item p
 Bring up a secondary (@samp{particular:}) prompt from which you can
 examine particular character strings, dot patterns, etc. The commands
 (given in its own command summary) are very similar to those of the
 main @samp{command:} prompt, but you can type a character string or
 dot pattern. They include @kbd{h}, @kbd{f}, @kbd{b}, @kbd{c}, @kbd{d},
 @kbd{C}, @kbd{D}, @kbd{z} and @kbd{x} (to exit this prompt), but not
 @kbd{p}, @kbd{i} and @kbd{m}.

 @item i
 Show braille indicators. This shows the dot patterns for various
 opcodes such as the @opcoderef{capsletter} and the @opcoderef{numsign}.
 It also shows emphasis dot patterns, such as those for the
 @opcoderef{begemphword}, the @opcoderef{begemphphrase}, etc. If a
 given opcode has not been used nothing is printed for it.

 @item m
 Display various miscellaneous information about the table, such as the
 number of passes, whether certain opcodes have been used, and whether
 there is a hyphenation table.

 @item q
 Exit the program.
 @end table

 @node lou_trace
 @section lou_trace
 @pindex lou_trace

 When working on translation tables it is sometimes useful to determine
 what rules were applied when translating a string. @command{lou_trace}
 helps with exactly that. It lists all the rules that were applied for
 a given translation table and input string.

 @example
 lou_trace [OPTIONS] TABLE[,TABLE,...]
 @end example

 Aside from the standard options (@pxref{common options})
 @command{lou_trace} also accepts the following options:

 @table @option

 @item --forward
 @itemx -f
 Trace a forward translation.

 @item --backward
 @itemx -b
 Trace a backward translation.

 @end table

 If no options are given forward translation is assumed.

 Once started you can type an input string followed by @kbd{@key{RET}}.
 @command{lou_trace} will print the braille translation followed by the
 list of rules that were applied to produce the translation. A possible
 invocation is listed in the following example:

 @example
 $ lou_trace tables/en-us-g2.ctb
 the u.s. postal service
 ! u4s4 po/al s@}vice
 1.      largesign       the     2346
 2.      repeated                0
 3.      lowercase       u       136
 4.      punctuation     .       46
 5.      context _$l["."]$l      @@256
 6.      lowercase       s       234
 7.      postpunc        .       256
 8.      repeated                0
 9.      begword post    1234-135-34
 10.     largesign       a       1
 11.     lowercase       l       123
 12.     repeated                0
 13.     lowercase       s       234
 14.     always  er      12456
 15.     lowercase       v       1236
 16.     lowercase       i       24
 17.     lowercase       c       14
 18.     lowercase       e       15
 19.     pass2   $s1-10  @@0
 20.     pass2   $s1-10  @@0
 21.     pass2   $s1-10  @@0
 @end example

 @node lou_checktable
 @section lou_checktable
 @pindex lou_checktable

 To use this program type the following:

 @example
 lou_checktable [OPTIONS] TABLE
 @end example

 Aside from the standard options (@pxref{common options})
 @command{lou_checktable} also accepts the following options:

 @table @option

 @item --quiet
 @itemx -q
 Do not write to standard error if there are no errors.

 @end table

 If the table contains errors, appropriate messages will be displayed.
 If there are no errors the message @samp{no errors found.} will be
 shown.

 @node lou_allround
 @section lou_allround
 @pindex lou_allround

 This program tests every capability of the liblouis library. It is
 completely interactive. Invoke it as follows:

 @example
 lou_allround [OPTIONS]
 @end example

 The command line options that are accepted by @command{lou_allround}
 are described in @ref{common options}.

 You will see a few lines telling you how to use the program. Pressing
 one of the letters in parentheses and then enter will take you to a
 message asking for more information or for the answer to a yes/no
 question. Typing the letter @samp{r} and then @key{RET} will take you
 to a screen where you can enter a line to be processed by the library
 and then view the results.

 @node lou_translate (program)
 @section lou_translate
 @pindex lou_translate

 This program translates whatever is on the standard input unit and
 prints it on the standard output unit. It is intended for large-scale
 testing of the accuracy of translation and back-translation. The
 command line for @command{lou_translate} is:

 @example
 lou_translate [OPTIONS] TABLE[,TABLE,...]
 @end example

 Aside from the standard options (@pxref{common options}) this program
 also accepts the following options:

 @table @option

 @item --forward
 @itemx -f
 Do a forward translation.

 @item --backward
 @itemx -b
 Do a backward translation.

 @end table

 If no options are given forward translation is assumed.

 Use the following command to do a forward translation with translation
 table @file{en-us-g2.ctb}. The resulting braille is ASCII encoded (as
 defined in @file{en-us-g2.ctb}).

 @example
 lou_translate --forward en-us-g2.ctb < input.txt
 @end example

 The next example illustrates a forward translation with translation
 table @file{en-us-g2.ctb} and display table @file{unicode.dis}. The
 resulting braille is encoded as Unicode dot patterns (as defined in
 @file{unicode.dis}).

 @example
 lou_translate --forward unicode.dis,en-us-g2.ctb < input.txt
 @end example

 Use a pipe if you would rather just pass some given text to the
 translator.

 @example
 echo "The quick brown fox jumps over the lazy dog" | lou_translate -f unicode.dis,en-us-g2.ctb
 @end example

 The result will be written to standard output:

 @example
 ⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭ ⠚⠥⠍⠏⠎ ⠕⠧⠻ ⠮ ⠇⠁⠵⠽ ⠙⠕⠛
 @end example

 Backward translation can be done as follows:

 @example
 echo ",! qk br@{n fox jumps ov@} ! lazy dog" | lou_translate --backward en-us-g2.ctb
 @end example

 which results in

 @example
 The quick brown fox jumps over the lazy dog
 @end example

 You can also do a backward translation using Unicode dot patterns:

 @example
 echo "⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭" | lou_translate --backward unicode.dis,en-us-g2.ctb
 @end example

 resulting in

 @example
 The quick brown fox
 @end example
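
 When these invocations are driven from a script, the command line can
 be assembled programmatically. The following Python sketch only builds
 the argument list from the options shown above; the helper name
 @code{build_lou_translate_command} is hypothetical, and actually
 running the command assumes that @command{lou_translate} is on the
 search path and that the tables can be found.

```python
def build_lou_translate_command(tables, backward=False):
    """Return an argument list for lou_translate.

    tables is a list of table file names. They are joined with commas,
    with any display table listed first as in the examples above.
    """
    if not tables:
        raise ValueError("at least one table is required")
    direction = "--backward" if backward else "--forward"
    return ["lou_translate", direction, ",".join(tables)]


command = build_lou_translate_command(["unicode.dis", "en-us-g2.ctb"])
print(" ".join(command))
# To run it (requires lou_translate to be installed), one could use:
#   subprocess.run(command, input="some text\n",
#                  capture_output=True, text=True)
```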

 @node lou_checkhyphens
 @section lou_checkhyphens
 @pindex lou_checkhyphens

 This program checks the accuracy of hyphenation in Braille translation
 for both translated and untranslated words. It is completely
 interactive. Invoke it as follows:

 @example
 lou_checkhyphens [OPTIONS]
 @end example

 The command line options that are accepted by
 @command{lou_checkhyphens} are described in @ref{common options}.

 You will see a few lines telling you how to use the program.

 @node lou_checkyaml
 @section lou_checkyaml
 @pindex lou_checkyaml

 This program tests a liblouis table against a corpus of known good
 Braille translations defined in YAML format. For a description of the
 format refer to @ref{YAML Tests}. The program returns 0 if all tests
 pass or 1 if any of the tests fail. Invoke it as follows:

 @example
 lou_checkyaml YAML_TEST_FILE
 @end example

 The command line options that are accepted by
 @command{lou_checkyaml} are described in @ref{common options}.

 @cindex Running YAML tests manually
 @cindex Running individual YAML tests
 Due to some technical limitations the YAML tests work best if the
 @env{LOUIS_TABLEPATH} is set up correctly. By running @command{make}
 this is all taken care of for you. You can also run individual YAML tests
 as shown in the following example:

 @example
 cd tests
 make check TESTS=yaml/en-ueb-g2_backward.yaml
 @end example

 @node Automated Testing of Translation Tables
 @chapter Automated Testing of Translation Tables

 There are a number of automated tests for liblouis and they are
 proving to be of tremendous value. When changing the code the
 developers can run the tests to see if anything broke.

 The easiest way to test the translation tables is to write a YAML file
 where you define the table that is to be tested and any number of
 words or phrases to translate together with their respective expected
 translation.

 The YAML tests are data driven, i.e. you give the test data, a string
 to translate and the expected output. The data is in a standard format
 namely YAML. If you have @file{libyaml} installed, the YAML tests are
 run automatically as part of the standard @command{make check}
 command.

 @anchor{YAML Tests}
 @section YAML Tests
 @url{http://yaml.org/,YAML} is a human readable data serialization
 format that allows for an easy and compact way to define tests.

 A YAML file first defines which tables are to be used for the tests.
 Then it optionally defines flags such as the @samp{testmode}. Finally
 all the tests are defined.

 Let's just look at a simple example of how tests could be defined:

 @iftex
 @emph{(For technical reasons the Unicode braille in the expected
 translation in the following YAML examples is not displayed correctly.
 Please refer to the example YAML file @file{example_test.yaml} in the
 @file{tests} directory of the source distribution or read these
 examples in another version of the documentation such as HTML)}
 @end iftex

 @example
 # comments start with '#' anywhere on a line
 # first define which tables will be used for your tests
 table: [unicode.dis, en-ueb-g1.ctb]

 # then optionally define flags such as testmode. If no flags are
 # defined forward translation is assumed

 # now define the tests
 tests:
   - # each test is a list.
     # The first item is the string to translate. Quoting of strings is
     # optional
     - hello
     # The second item is the expected translation
     - ⠓⠑⠇⠇⠕
   - # optionally you can define additional parameters in a third
     # item such as typeform or expected failure, etc
     - Hello
     - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
     - @{typeform: @{italic: '++++ '@}, xfail: true@}
   - # a simple, no-frills test
     - Good bye
     - ⠠⠛⠕⠕⠙ ⠃⠽⠑
   # same as above using "flow style" notation
   - [Good bye,  ⠠⠛⠕⠕⠙ ⠃⠽⠑]
 @end example
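
 Test files in this shape are easy to generate from a word list. The
 following Python sketch writes the ``flow style'' form shown in the
 example; the function @code{make_yaml_test} is hypothetical, and the
 sketch assumes that none of the strings needs YAML quoting.

```python
def make_yaml_test(tables, cases):
    """Return the text of a YAML test file in flow-style notation.

    tables: list of table names.
    cases: list of (text, expected braille) pairs.
    No quoting is attempted, so the strings must not contain YAML
    special characters such as commas, colons or brackets.
    """
    lines = ["table: [" + ", ".join(tables) + "]", "tests:"]
    for text, expected in cases:
        lines.append("  - [" + text + ", " + expected + "]")
    return "\n".join(lines) + "\n"


print(make_yaml_test(
    ["unicode.dis", "en-ueb-g1.ctb"],
    [("hello", "⠓⠑⠇⠇⠕"), ("Good bye", "⠠⠛⠕⠕⠙ ⠃⠽⠑")],
))
```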

 The three basic components of a test file are as follows:

 @table @samp
 @item tables
 A list containing table names, which the tests should be run against.
 This is usually just one table, but for some situations more than one
 table can be required.

 To test the @file{en-ueb-g1.ctb} table using unicode braille you could
 use the following definition:

 @example
 table: [unicode.dis, en-ueb-g1.ctb]
 @end example

 If you wanted to test the @file{eo-g1.ctb} table using brf notation
 then you would use the following definition:

 @example
 table: [en-us-brf.dis, eo-g1.ctb]
 @end example

 @item flags
 The flags that apply for all tests in this file. At the moment only
 the @samp{testmode} flag is supported. It can have three possible
 values:

 @table @samp
 @item forward
 This indicates that the tests are for forward translation
 @item backward
 This indicates that the tests are for backward translation
 @item hyphenate
 This indicates that the tests are for hyphenation
 @end table

 If no flags are defined forward translation is assumed.

 @item tests
 A list of tests. Each test consists of a list of two, three or in some
 cases even four items. The first item is the unicode text to be
 tested. The second item is the expected braille output. This can be
 either unicode braille or an ASCII-braille like encoding. Quoting
 strings is optional. Comments can be inserted almost anywhere using
 the @samp{#} sign. A simple test would look as follows:

 @example
   - # a simple, no-frills test
     - Good bye
     - ⠠⠛⠕⠕⠙ ⠃⠽⠑
 @end example

 Using the more compact ``flow style'' notation it would look like the
 following:

 @example
   - [Good bye, ⠠⠛⠕⠕⠙ ⠃⠽⠑]
 @end example

 An optional third item can contain additional options for a test such
 as the typeform, or whether a test is expected to fail. The following
 shows a typical example:

 @example
   -
     - Hello
     - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
     - @{typeform: @{italic: '++++ '@}, xfail: true@}
   # same test more compact
   - [Hello, ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄, @{typeform: @{italic: '++++ '@}, xfail: true@}]
 @end example

 The valid additional options for a test are as follows:

 @table @samp
 @item xfail
 Whether a test is expected to fail. If you expect a test to fail, set
 this to @samp{true}. If you prefer you can also specify a reason for
 the failure:

 @example
   - [Hello, ⠨, @{xfail: Test case is not complete@}]
 @end example

 If you expect a test case to pass then just don't mark it with
 @samp{xfail} or if you really have to, set @samp{xfail} to
 @samp{false} or @samp{off}.

 @item typeform
 The typeform used for a translation. It consists of one or more
 emphasis specifications. For each character in the specifications that
 is not a space the corresponding emphasis will be set. Valid options
 for emphasis are @samp{italic}, @samp{underline}, @samp{bold},
 @samp{computer_braille}, @samp{passage_break}, @samp{word_reset},
 @samp{script}, @samp{trans_note}, @samp{trans_note_1},
 @samp{trans_note_2}, @samp{trans_note_3}, @samp{trans_note_4} or
 @samp{trans_note_5}. The following shows an example where both
 @samp{italic} and @samp{underline} are specified:

 @example
   -
     - Hello
     - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
     - typeform:
         italic:    '++++ '
         underline: '    +'
 @end example

 @item inputPos
 A list of 0-based input positions, one for each output position. Useful when
 simulating screen reader interaction, to debug contraction and cursor
 behavior as in the following example.
 Note that all positions in this and the following examples start at 0.
 Also note that in these examples the additional options are not
 passed using the ``flow style'' notation.

 @example
   -
     - went
     - ⠺⠢⠞
     - inputPos: [0,1,3]
 @end example

 @item outputPos
 A list of 0-based output positions, one for each input position. Useful when
 simulating screen reader interaction, to debug contraction and cursor
 behavior as in the following example.

 @example
   -
     - went
     - ⠺⠢⠞
     - outputPos: [0,1,1,2]
 @end example

 @item cursorPos
 A list of cursor positions, one for each input position. Useful when
 simulating screen reader interaction, to debug contraction and cursor
 behavior as in the following example:

 Note that compbrlAtCursor is implicitly specified for all cursor
 positions. This makes this test suitable only for testing a single
 word, since the translation would otherwise vary according to the
 cursor position.

 @example
   -
     - went
     - ⠺⠑⠝⠞
     - cursorPos: [0,1,2,3]
 @end example

 @item mode
 A list of translation modes that should be used for this test. If not
 defined, the mode defaults to 0. Valid mode values are @samp{noContractions},
 @samp{compbrlAtCursor}, @samp{dotsIO}, @samp{comp8Dots},
 @samp{pass1Only}, @samp{compbrlLeftCursor},
 @samp{ucBrl}, @samp{noUndefinedDots} or @samp{partialTrans}.

 For a description of the various translation mode flags, please see
 the function @ref{lou_translateString}.

 @end table

 @end table
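
 The @samp{inputPos} and @samp{outputPos} lists in the @samp{went}
 examples above describe the same alignment from two sides. The
 following Python sketch derives one from the other. The relation it
 uses is an assumption made for illustration: it reproduces the two
 examples above but is not taken from the liblouis sources, and the
 function name is hypothetical.

```python
def output_pos_from_input_pos(input_pos, input_length):
    """Derive outputPos from inputPos.

    input_pos[j] is the input position at which output cell j starts.
    Each input position is mapped to the last output cell that starts
    at or before it. Assumes input_pos is sorted and starts at 0.
    """
    output_pos = []
    cell = 0
    for i in range(input_length):
        # Advance to the last cell whose start is not beyond position i.
        while cell + 1 < len(input_pos) and input_pos[cell + 1] <= i:
            cell += 1
        output_pos.append(cell)
    return output_pos


# "went" translates to three cells: w, the "en" contraction, and t.
print(output_pos_from_input_pos([0, 1, 3], len("went")))  # [0, 1, 1, 2]
```

 Both letters of the contraction map to the same output cell, which is
 why position 1 appears twice in the result.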

 @subsection Optional test description
 When a test contains three or four items the first item is assumed to
 be a test description, the second item is the unicode text to be
 tested and the third item is the expected braille output. Again an
 optional fourth item can contain additional options for the test. The
 following shows an example:

 @example
   -
     - Number-text-transitions with italic
     - 123abc
     - ⠼⠁⠃⠉⠨⠶⠰⠁⠃⠉⠨⠄
     - @{typeform: '000111'@}
 @end example

 In case the test fails the description will be printed together with
 the expected and the actual braille output.

 For more examples and inspiration please see the YAML tests
 (@file{*.yaml}) in the @file{tests} directory of the source
 distribution.

 @subsection Testing multiple tables within the same YAML test file
 Sometimes you are more focused on testing a particular feature across
 several tables rather than just testing one table. For that reason the
 following is also allowed:

 @example
 table: ...
 tests:
   - [..., ...]
   - [..., ...]
 table: ...
 tests:
   - [..., ...]
   - [..., ...]
 @end example

 @subsection Inline definition of tables
 When testing very specific opcode combinations it is sometimes tedious
 to create specific test tables just for that. Hence the YAML tests
 allow for specification of table definitions inline. Instead of
 referring to a table by name you just define the table inline by using
 what the YAML spec calls a
 @url{http://www.yaml.org/spec/1.2/spec.html#id2795688,Literal Style
 Block}. Start the definition with a @samp{|}, then list the opcodes
 with an indentation. The inline table ends when the indentation ends.

 @example
 table: |
   sign a 1
   ...
 tests:
   - ...
   - ...
 @end example

 @subsection Running the same test data on multiple tables
 Sometimes you maintain multiple tables which are very similar and
 basically contain the same test data. Instead of copying the YAML test
 and changing the table name you can also define multiple tables. This
 will cause the YAML tests to be checked against both tables.

 @example
 table: nl-NL
 table: nl-BE
 tests:
   - [..., ...]
   - [..., ...]
 @end example

 @node Programming with liblouis
 @chapter Programming with liblouis

 @menu
 * Overview (library)::
 * Data structure of liblouis tables::
 * How tables are found::
 * Deprecation of the logging system::
 * lou_version::
 * lou_translateString::
 * lou_translate::
 * lou_backTranslateString::
 * lou_backTranslate::
 * lou_hyphenate::
 * lou_compileString::
 * lou_getTypeformForEmphClass::
 * lou_dotsToChar::
 * lou_charToDots::
 * lou_registerLogCallback::
 * lou_setLogLevel::
 * lou_logFile::
 * lou_logPrint::
 * lou_logEnd::
 * lou_setDataPath::
 * lou_getDataPath::
 * lou_getTable::
 * lou_findTable::
 * lou_indexTables::
 * lou_checkTable::
 * lou_readCharFromFile::
 * lou_free::
 * lou_charSize::
 * Python bindings::
 @end menu

 @node Overview (library)
 @section Overview

 You use the liblouis library by calling the following functions,
 @code{lou_translateString}, @code{lou_backTranslateString},
 @code{lou_translate}, @code{lou_backTranslate},
 @code{lou_registerLogCallback}, @code{lou_setLogLevel},
 @code{lou_logFile}, @code{lou_logPrint}, @code{lou_logEnd},
 @code{lou_getTable}, @code{lou_findTable}, @code{lou_indexTables},
 @code{lou_checkTable}, @code{lou_hyphenate}, @code{lou_charToDots},
 @code{lou_dotsToChar}, @code{lou_compileString},
 @code{lou_getTypeformForEmphClass}, @code{lou_readCharFromFile},
 @code{lou_version}, @code{lou_free} and @code{lou_charSize}. These are
 described below. The header file, @file{liblouis.h}, also contains
 brief descriptions. Liblouis is written in straight C. It has four
 code modules, @file{compileTranslationTable.c}, @file{logging.c},
 @file{lou_translateString.c} and @file{lou_backTranslateString.c}. In
 addition, there are two header files, @file{liblouis.h}, which defines
 the API, and @file{louis.h}, used only internally and by
 liblouisutdml. The latter includes @file{liblouis.h}.

 Persons who wish to use liblouis from Python may want to skip ahead to
 @ref{Python bindings}.

 @file{compileTranslationTable.c} keeps track of all translation tables
 which an application has used. It is called by the translation,
 hyphenation and checking functions when they start. If a table has not
 yet been compiled @file{compileTranslationTable.c} checks it for
 correctness and compiles it into an efficient internal representation.
 The main entry point is @code{lou_getTable}. Since it is the module
 that keeps track of memory usage, it also contains the @code{lou_free}
 function. In addition, it contains the @code{lou_checkTable} function,
 plus some utility functions which are used by the other modules.

 By default, liblouis handles all characters internally as 16-bit
 unsigned integers. It can be compiled for 32-bit characters as
 explained below. The meanings of these integers are not hard-coded.
 Rather they are defined by the character-definition opcodes. However,
 the standard printable characters, from decimal 32 to 126 are
 recognized for the purpose of processing the opcodes. Hence, the
 following definition is included in @file{liblouis.h}. It is correct
 for computers with at least 32-bit processors.

 @example
 #define widechar unsigned short int
 @end example

 To make liblouis handle 32-bit Unicode simply remove the word
 @code{short} in the above @code{define}. This will cause the translate and
 back-translate functions to expect input in 32-bit form and to deliver
 their output in this form. The input to the compiler (tables) is
 unaffected except that two new escape sequences for 20-bit and 32-bit
 characters are recognized.

 At runtime, the width of a character specified during compilation may
 be obtained using @code{lou_charSize}.
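
 The effect of the @code{widechar} definition on calling code can be
 sketched as follows. This is only an illustration for a 16-bit build:
 the helper @code{ascii_to_widechar} is a hypothetical name, not part of
 the liblouis API, and it handles plain ASCII input only.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as quoted in the text; drop "short" for a 32-bit build. */
typedef unsigned short int widechar;

/* Hypothetical helper: widen a NUL-terminated ASCII string into a
   widechar buffer. Returns the number of characters written; no
   terminator is added, because the API passes explicit lengths. */
int ascii_to_widechar(const char *src, widechar *dst, int maxLen)
{
  int n = 0;
  assert(src != NULL && dst != NULL);
  while (src[n] != '\0' && n < maxLen) {
    dst[n] = (widechar)(unsigned char)src[n];
    n++;
  }
  return n;
}
```

 The returned count is the kind of value a caller would store in the
 integer that @code{inlen} points to before calling a translation
 function.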

 Here are the definitions of the liblouis functions and their
 parameters. They are given in terms of 16-bit Unicode. If liblouis has
 been compiled for 32-bit Unicode simply read 32 instead of 16.

 @node Data structure of liblouis tables
 @section Data structure of liblouis tables

 The data structure @code{TranslationTableHeader} is defined by a
 @code{typedef} statement in @file{louis.h}. To find the beginning,
 search for the word @samp{header}. As its name implies, this is
 actually the table header. Data are placed in the @code{ruleArea}
 array, which is the last item defined in this structure. This array is
 declared with a length of 1 and is expanded as needed. The table
 header consists mostly of arrays of pointers of size @code{HASHNUM}.
 These pointers are actually offsets into @code{ruleArea} and point to
 chains of items which have been placed in the same hash bucket by a
 simple hashing algorithm. @code{HASHNUM} should be a prime and is
 currently 1123. The structure of the table was chosen to optimize
 speed rather than memory usage.
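
 The layout described above (hash buckets holding offsets into one
 growable area) can be modelled in a few lines. This is a simplified
 sketch, not the definition from @file{louis.h}: the names
 @code{MiniTable}, @code{Item}, @code{table_new}, @code{table_add} and
 @code{table_find} are invented, the payload is reduced to a key and a
 value, and the hash is a plain modulo. It uses a C99 flexible array
 member where the text describes an array declared with a length of 1.

```c
#include <assert.h>
#include <stdlib.h>

#define HASHNUM 1123            /* prime, as stated in the text */

typedef unsigned int Offset;    /* index into ruleArea; 0 means "end of chain" */

typedef struct {
  Offset next;                  /* next item in the same hash bucket */
  unsigned short key;           /* hypothetical payload: a character */
  unsigned short value;         /* hypothetical payload: a dot pattern */
} Item;

typedef struct {
  size_t allocated;             /* items allocated */
  size_t used;                  /* items in use */
  Offset buckets[HASHNUM];      /* offsets, not C pointers */
  Item ruleArea[];              /* last member, grows with the table */
} MiniTable;

MiniTable *table_new(void)
{
  size_t cap = 8;
  MiniTable *t = calloc(1, sizeof *t + cap * sizeof(Item));
  if (t == NULL) return NULL;
  t->allocated = cap;
  t->used = 1;                  /* slot 0 is reserved so offset 0 can mean "none" */
  return t;
}

/* Returns the (possibly moved) table, or NULL if growing failed, in
   which case the old table is still valid and owned by the caller. */
MiniTable *table_add(MiniTable *t, unsigned short key, unsigned short value)
{
  assert(t != NULL);
  if (t->used == t->allocated) {
    size_t cap = t->allocated * 2;
    MiniTable *grown = realloc(t, sizeof *t + cap * sizeof(Item));
    if (grown == NULL) return NULL;
    t = grown;
    t->allocated = cap;
  }
  Offset slot = (Offset)t->used++;
  Offset *bucket = &t->buckets[key % HASHNUM];
  t->ruleArea[slot].key = key;
  t->ruleArea[slot].value = value;
  t->ruleArea[slot].next = *bucket;   /* push onto the front of the chain */
  *bucket = slot;
  return t;
}

int table_find(const MiniTable *t, unsigned short key, unsigned short *value)
{
  for (Offset o = t->buckets[key % HASHNUM]; o != 0; o = t->ruleArea[o].next)
    if (t->ruleArea[o].key == key) {
      *value = t->ruleArea[o].value;
      return 1;
    }
  return 0;
}
```

 Because the whole block may move when it is reallocated, items refer
 to each other by offset rather than by C pointer. That is also why
 @code{table_add} in this sketch returns the table address.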

 The first part of the table contains miscellaneous information, such
 as the number of passes and whether various opcodes have been used. It
 also contains the amount of memory allocated to the table and the
 amount actually used.

 The next section contains pointers to various braille indicators and
 begins with @code{capitalSign}. The rules pointed to contain the
 dot pattern for the indicator and an opcode which is used by the
 back-translator but does not appear in the list of opcodes. The
 braille indicators also include various kinds of emphasis, such as
 italic and bold and information about the length of emphasized
 phrases. The latter is contained directly in the table item instead of
 in a rule.

 After the braille indicators comes information about when a letter
 sign should be used.

 Next is an array of size @code{HASHNUM} which points to character
 definitions. These are created by the character-definition opcodes.

 Following this is a similar array pointing to definitions of
 single-cell dot patterns. This is also created from the
 character-definition opcodes. If a character definition contains a
 multi-cell dot pattern this is compiled into ordinary forward and
 backward rules. If such a multi-cell dot pattern contains a single
 cell which has not previously been defined that cell is placed in this
 array, but is given the attribute @code{space}.

 Next come arrays that map characters to single-cell dot patterns and
 dots to characters. These are created from both character-definition
 opcodes and display opcodes.

 Next is an array of size 256 which maps characters in this range to
 dot patterns which may consist of multiple cells. It is used, for
 example, to map @samp{@{} to dots 456-246. These mappings are created
 @c FIXME: the compdots opcode should be documented
 @c by the @opcoderef{compdots}
 by the @code{compdots}
 or the @opcoderef{comp6}.

 Next are two small arrays that hold pointers to chains of rules
 produced by the @opcoderef{swapcd} and the @opcoderef{swapdd} and by
 some multipass, @code{context} and @code{correct} opcodes.

 Now we get to an array of size @code{HASHNUM} which points to chains
 of rules for forward translation.

 Following this is a similar array for back-translation.

 Finally comes the @code{ruleArea}, an array of variable size to which
 various structures are mapped and to which almost everything else
 points.

 @node How tables are found
 @section How tables are found
 @cindex Table search path
 @cindex LOUIS_TABLEPATH
 liblouis knows where to find all the tables that have been distributed
 with it, so you can just give a table name such as @code{en-us-g2.ctb}
 and liblouis will load it. You can also give a table name which
 includes a path. If this is the first table in a list, all the tables
 in the list must be on the same path. You can specify a path on which
 liblouis will look for table names by setting the environment variable
 @env{LOUIS_TABLEPATH}. This environment variable can contain one or
 more paths separated by commas. On receiving a table name liblouis
 first checks whether it can be found on any of these paths. If not,
 it checks whether the table can be found in the current directory or,
 if the first (or only) name in the table list contains a path name,
 on that path. If that also fails, it checks the path where the
 distributed tables have been installed. If a table has already been
 loaded and compiled this path-checking is skipped.
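
 The search order can be pictured as an ordered list of candidate file
 names. The sketch below only builds that list; it does not touch the
 file system and is not the resolver used by liblouis. The function
 name @code{list_candidates}, the @samp{/} separator and the install
 directory used in the test are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Fill out[] with the file names that would be tried, in order.
   tablePath stands for the value of LOUIS_TABLEPATH (may be NULL),
   installDir for the directory of the distributed tables (may be NULL).
   Returns the number of candidates written. */
int list_candidates(const char *tablePath, const char *tableName,
                    const char *installDir,
                    char out[][MAX_PATH_LEN], int maxOut)
{
  int n = 0;
  const char *p = tablePath;
  assert(tableName != NULL);

  /* 1. Every directory in the comma-separated search path. */
  while (p != NULL && *p != '\0' && n < maxOut) {
    size_t len = strcspn(p, ",");
    if (len > 0)
      snprintf(out[n++], MAX_PATH_LEN, "%.*s/%s", (int)len, p, tableName);
    p += len;
    if (*p == ',') p++;
  }

  /* 2. The name as given: relative to the current directory, or with
        whatever path it already carries. */
  if (n < maxOut)
    snprintf(out[n++], MAX_PATH_LEN, "%s", tableName);

  /* 3. The directory where the distributed tables were installed. */
  if (installDir != NULL && n < maxOut)
    snprintf(out[n++], MAX_PATH_LEN, "%s/%s", installDir, tableName);

  return n;
}
```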

 @node Deprecation of the logging system
 @section Deprecation of the logging system

 As of version 2.6.0 @code{lou_logFile}, @code{lou_logPrint} and
 @code{lou_logEnd} are deprecated. They are replaced by a more powerful,
 abstract API consisting of @code{lou_registerLogCallback} and
 @code{lou_setLogLevel}.

 Usage of @code{lou_logFile}, @code{lou_logPrint} and @code{lou_logEnd} is
 discouraged as they may not be part of future releases. Applications using
 Liblouis should implement their own logging system.

 During the transitional phase, @code{lou_logPrint} is registered as default
 callback in @code{lou_registerLogCallback}. @code{lou_logPrint} is overwritten
 by the first call to @code{lou_registerLogCallback} and reattached when
 @code{NULL} is set as callback. Note that calling @code{lou_logPrint} directly
 will not cause an invocation of the registered callback.

 @node lou_version
 @section lou_version
 @findex lou_version

 @example
 char *lou_version ()
 @end example

 This function returns a pointer to a character string containing the
 version of liblouis, plus other information, such as the release date
 and perhaps notable changes.

 @node lou_translateString
 @section lou_translateString
 @findex lou_translateString

 @example
 int lou_translateString(
   const char *tableList,
   const widechar *inbuf,
   int *inlen,
   widechar *outbuf,
   int *outlen,
   formtype *typeform,
   char *spacing,
   int mode);
 @end example

 This function takes a string of 16-bit Unicode characters in
 @code{inbuf} and translates it into a string of 16-bit characters in
 @code{outbuf}. Each 16-bit character produces a particular dot pattern
 in one braille cell when sent to an embosser or braille display or to
 a screen type font. Which 16-bit character represents which dot pattern
 is indicated by the character-definition and display opcodes in the
 translation table.

 @anchor{translation-tables}
 The @code{tableList} parameter points to a list of translation tables
 separated by commas. @xref{How tables are found}, for a description of
 how the tables are located in the file system. If only one table is
 given, no comma should be used after it. It is these tables which
 control just how the translation is made, whether in Grade 2, Grade 1,
 or something else.

 The tables in a list are all compiled into the same internal table.
 The list is then regarded as the name of this table. As explained in
 @ref{How to Write Translation Tables}, each table is a file which may
 be plain text, big-endian Unicode or little-endian Unicode. A table
 (or list of tables) is compiled into an internal representation the
 first time it is used. Liblouis keeps track of which tables have been
 compiled. For this reason, it is essential to call the @code{lou_free}
 function at the end of your application to avoid memory leaks. Do
 @emph{NOT} call @code{lou_free} after each translation. This will
 force liblouis to compile the translation tables each time they are
 used, leading to great inefficiency.

 Note that both the @code{inlen} and @code{outlen} parameters are
 pointers to integers. When the function is called, these integers
 contain the maximum input and output lengths, respectively. When it
 returns, they are set to the actual lengths used.

 The @code{typeform} parameter is used to indicate italic type,
 boldface type, computer braille, etc. It is an array of @code{formtype}
 with the same length as the input buffer pointed to by @code{inbuf}.
 However, it is used to pass back character-by-character results, so
 enough space must be provided to match the value of @code{*outlen}.
 Each element indicates the typeform of the corresponding character
 in the input buffer. The values and their meaning can be consulted in the
 @code{typeforms} enum in @file{liblouis.h}. These values can be
 added for multiple emphasis. If this parameter is @code{NULL}, no
 checking for type forms is done. In addition, if this parameter is not
 @code{NULL}, it is set on return to have an 8 at every position
 corresponding to a character in @code{outbuf} which was defined to
 have a dot representation containing dot 7, dot 8 or both, and to 0
 otherwise.
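
 Preparing a @code{typeform} array before a call might look like the
 sketch below. The element type and the bit values shown are stand-ins
 chosen for illustration; the authoritative definitions are
 @code{formtype} and the @code{typeforms} enum in @file{liblouis.h}.
 @code{mark_emphasis} is a hypothetical helper, not a liblouis function.

```c
#include <assert.h>

typedef unsigned short formtype;   /* assumption; see liblouis.h */

/* Illustrative bit values only. */
enum { demo_plain = 0, demo_italic = 1, demo_bold = 4 };

/* OR the given emphasis bits into typeform[start] .. typeform[end - 1].
   Calling it twice with different bits yields multiple emphasis. */
void mark_emphasis(formtype *typeform, int len, int start, int end,
                   formtype bits)
{
  assert(typeform != 0);
  assert(0 <= start && start <= end && end <= len);
  for (int i = start; i < end; i++)
    typeform[i] |= bits;
}
```

 An array filled this way has one element per input character. Since
 the same array is overwritten on return, it should be sized for the
 larger of the input and output lengths.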

 The @code{spacing} parameter is used to indicate differences in
 spacing between the input string and the translated output string. It
 is also of the same length as the string pointed to by @code{inbuf}.
 If this parameter is @code{NULL}, no spacing information is computed.

 The @code{mode} parameter specifies how the translation should be
 done. The valid values of mode are defined in @file{liblouis.h}. They
 are all powers of 2, so that a combined mode can be specified by
 adding up different values.

 Note that the @code{mode} parameter is an integer, not a pointer to
 an integer.

 A combination of the following mode flags can be used with the
 @code{lou_translateString} function:

 @table @code
 @item compbrlAtCursor
 If this bit is set in the @code{mode} parameter the space-bounded
 characters containing the cursor will be translated in computer
 braille.

 @item compbrlLeftCursor
 If this bit is set, only the characters to the left of the cursor will
 be in computer braille. This bit overrides @code{compbrlAtCursor}.

 @item dotsIO
 When this bit is set, during forward translation, Liblouis will produce
 output as dot patterns. During back-translation Liblouis accepts input
 as dot patterns. Note that the produced dot patterns are affected if
 you have any @opcoderef{display} defined in any of your tables.

 @item ucBrl
 The @code{ucBrl} (Unicode Braille) bit is used by the functions
 @code{lou_charToDots} and @code{lou_translate}. It causes the dot
 patterns to be Unicode Braille rather than the liblouis representation.
 Note that you will not notice any change when setting @code{ucBrl}
 unless @code{dotsIO} is also set. @code{lou_dotsToChar} and
 @code{lou_backTranslate} recognize Unicode braille automatically.

 @item pass1Only
 When this bit is set, only pass 1 of the translation will be run. This
 excludes all rules that use the @code{correct} and multipass
 opcodes as well as some rules using the @code{context} opcode. The flag
 was originally introduced for the benefit of screen reading programs,
 but has now been deprecated, and will be removed in the near future.

 @item partialTrans
 This flag specifies that back-translation input should be treated as an
 incomplete word. Rules that apply only for complete words or at the end
 of a word will not take effect. This is intended to be used when
 translating input typed on a braille keyboard to provide a rough idea
 to the user of the characters they are typing before the word is
 complete.

 @item noUndefinedDots
 Setting this bit disables the output of dot numbers when
 back-translating undefined Braille patterns. When back translating
 input from a braille keyboard cell by cell, it is desirable to output
 characters as soon as they are produced. Similarly, when back
 translating contracted braille, it is desirable to provide a "guess" to
 the user of the characters they typed. To achieve this, liblouis needs
 to have the ability to produce no text when indicators (which don't
 produce a character by themselves) are not followed by another cell.
 This works automatically for indicators liblouis knows about such as
 capital sign, number sign, etc., but it does not work for indicators
 which are not (and cannot be) specifically defined as indicators. For
 example, in UEB, dots 4 5 6 alone produces the text "\456/". Setting
 the noUndefinedDots mode suppresses this dot number output.

 @end table

 The function returns 1 if no errors were encountered and 0 if a
 complete translation could not be done.
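
 Because every mode value is a distinct power of 2, a combined mode can
 be built and inspected with bit operations. The sketch below uses
 stand-in constants with a @code{demo} prefix; real code should use the
 names and values from @file{liblouis.h}. @code{combine_modes} and
 @code{mode_has} are hypothetical helpers.

```c
#include <assert.h>

/* Stand-in flag values for illustration only. */
enum {
  demoCompbrlAtCursor = 1 << 1,
  demoDotsIO          = 1 << 2,
  demoUcBrl           = 1 << 7
};

/* Combine several single-bit flags into one mode value. */
int combine_modes(const int *flags, int count)
{
  int mode = 0;
  for (int i = 0; i < count; i++) {
    /* each flag must have exactly one bit set */
    assert(flags[i] > 0 && (flags[i] & (flags[i] - 1)) == 0);
    mode |= flags[i];
  }
  return mode;
}

/* Non-zero if the given flag is set in mode. */
int mode_has(int mode, int flag)
{
  return (mode & flag) != 0;
}
```

 Using bitwise OR rather than addition gives the same result when each
 flag appears once, and stays correct if a flag is listed twice.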

 @node lou_translate
 @section lou_translate
 @findex lou_translate

 @example
 int lou_translate(
   const char *tableList,
   const widechar *inbuf,
   int *inlen,
   widechar *outbuf,
   int *outlen,
   formtype *typeform,
   char *spacing,
   int *outputPos,
   int *inputPos,
   int *cursorPos,
   int mode);
 @end example

 This function adds the parameters @code{outputPos}, @code{inputPos}
 and @code{cursorPos}, to facilitate use in screen reader programs. The
 @code{outputPos} parameter must point to an array of integers with at
 least @code{*inlen} elements. On return, this array will contain the
 position in @code{outbuf} corresponding to each input position.
 Similarly, @code{inputPos} must point to an array of integers of at
 least @code{*outlen} elements. On return, this array will contain the
 position in @code{inbuf} corresponding to each position in
 @code{outbuf}. @code{cursorPos} must point to an integer containing
 the position of the cursor in the input. On return, it will contain
 the cursor position in the output. Any pointer parameter after
 @code{outlen} may be @code{NULL}. In this case, the actions
 corresponding to it will
 not be carried out.
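
 A screen reader typically uses the two position arrays as lookup
 tables, for instance to move the text cursor when a routing key above
 a braille cell is pressed. The helpers below are a hypothetical sketch
 of that lookup and are not part of liblouis.

```c
#include <assert.h>

/* Map a braille cell index to the input position it came from.
   Returns -1 if the cell index is out of range. */
int cell_to_text(const int *inputPos, int outlen, int cell)
{
  assert(inputPos != 0);
  if (cell < 0 || cell >= outlen) return -1;
  return inputPos[cell];
}

/* Map an input position to the braille cell it produced.
   Returns -1 if the offset is out of range. */
int text_to_cell(const int *outputPos, int inlen, int offset)
{
  assert(outputPos != 0);
  if (offset < 0 || offset >= inlen) return -1;
  return outputPos[offset];
}
```

 As a hand-written example (not actual liblouis output), suppose two
 input characters produced three cells because an indicator was
 inserted in front. With @code{outputPos} holding 0, 2 and
 @code{inputPos} holding 0, 0, 1, a press on cell 1 maps back to input
 position 0.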

 For a description of all other parameters, please see
 @ref{lou_translateString}.

 @node lou_backTranslateString
 @section lou_backTranslateString
 @findex lou_backTranslateString

 @example
 int lou_backTranslateString(
   const char *tableList,
   const widechar *inbuf,
   int *inlen,
   widechar *outbuf,
   int *outlen,
   formtype *typeform,
   char *spacing,
   int mode);
 @end example

 This is exactly the opposite of @code{lou_translateString}.
 @code{inbuf} is a string of 16-bit Unicode characters representing
 braille. @code{outbuf} will contain a string of 16-bit Unicode
 characters. @code{typeform} will indicate any emphasis found in the
 input string, while @code{spacing} will indicate any differences in
 spacing between the input and output strings. The @code{typeform} and
 @code{spacing} parameters may be @code{NULL} if this information is
 not needed. @code{mode} again specifies how the back-translation
 should be done.

 There are two additional modes that only apply to back-translation. By
 default, if a dot pattern in the input is undefined, the dot numbers
 will be included in the output. If the @code{noUndefinedDots} mode is
 set, this does not occur; an undefined dot pattern simply produces no
 output. The @code{partialTrans} mode specifies that the input should be
 treated as an incomplete word. That is, rules that apply only for
 complete words or at the end of a word will not take effect. This is
 intended to be used when translating input typed on a braille keyboard
 to provide a rough idea to the user of the characters they are typing
 before the word is complete.

 @node lou_backTranslate
 @section lou_backTranslate
 @findex lou_backTranslate

 @example
 int lou_backTranslate(
   const char *tableList,
   const widechar *inbuf,
   int *inlen,
   widechar *outbuf,
   int *outlen,
   formtype *typeform,
   char *spacing,
   int *outputPos,
   int *inputPos,
   int *cursorPos,
   int mode);
 @end example

 This function is exactly the inverse of @code{lou_translate}.

 @node lou_hyphenate
 @section lou_hyphenate
 @findex lou_hyphenate

 @example
 int lou_hyphenate (
   const char *tableList,
   const widechar *inbuf,
   int inlen,
   char *hyphens,
   int mode);
 @end example

 This function looks at the characters in @code{inbuf} and if it finds
 a sequence of letters attempts to hyphenate it as a word. Note that
 @code{lou_hyphenate} operates on single words only, and spaces or
 punctuation marks between letters are not allowed. Leading and trailing
 punctuation marks are ignored. The table named by the @code{tableList}
 parameter must contain a hyphenation table. If it does not, the
 function does nothing. @code{inlen} is the length of the character
 string in @code{inbuf}. @code{hyphens} is an array of characters and
 must be of size @code{inlen} + 1 (to account for the NUL terminator).
 If hyphenation is successful it will have a 1 at the beginning of each
 syllable and a 0 elsewhere. If the @code{mode} parameter is 0
 @code{inbuf} is assumed to contain untranslated characters. Any
 nonzero value means that @code{inbuf} contains a translation. In this
 case, it is back-translated, hyphenation is performed, and it is
 re-translated so that the hyphens can be placed correctly. The
 @code{lou_translate} and @code{lou_backTranslate} functions are used
 in this process. @code{lou_hyphenate} returns 1 if hyphenation was
 successful and 0 otherwise. In the latter case, the contents of the
 @code{hyphens} parameter are undefined. This function was provided for
 use in liblouisutdml.
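
 Once a @code{hyphens} mask is available, applying it is a simple
 merge. The sketch below assumes that the mask entries are the
 characters @samp{0} and @samp{1}, which is consistent with the mask
 being a NUL-terminated string, and it works on a plain @code{char}
 word for brevity where the real input is @code{widechar}.
 @code{apply_hyphens} is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Copy word into out, inserting '-' before every position i > 0 whose
   mask entry is '1'. out must hold at least 2 * strlen(word) + 1 bytes.
   Returns the length of the result. */
int apply_hyphens(const char *word, const char *hyphens, char *out)
{
  size_t len = strlen(word);
  int n = 0;
  assert(strlen(hyphens) >= len);   /* one mask entry per character */
  for (size_t i = 0; i < len; i++) {
    if (i > 0 && hyphens[i] == '1')
      out[n++] = '-';
    out[n++] = word[i];
  }
  out[n] = '\0';
  return n;
}
```

 With the hand-written mask @samp{001000100} (not produced by
 liblouis) the word @samp{hyphenate} becomes @samp{hy-phen-ate}.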

 @node lou_compileString
 @section lou_compileString
 @findex lou_compileString

 @example
 int lou_compileString (const char *tableList, const char *inString)
 @end example

 This function enables you to compile a table entry on the fly at
 run-time. The new entry is added to @code{tableList} and remains in force
 until @code{lou_free} is called. If @code{tableList} has not previously
 been loaded it is loaded and compiled. @code{inString} contains the
 table entry to be added. It may be any valid entry. Error messages
 will be produced if it is invalid. The function returns 1 on success and
 0 on failure.

 @node lou_getTypeformForEmphClass
 @section lou_getTypeformForEmphClass
 @findex lou_getTypeformForEmphClass

 @example
 int lou_getTypeformForEmphClass (const char *tableList, const char *emphClass);
 @end example

 This function returns the typeform bit associated with the given
 emphasis class. If the emphasis class is undefined this function
 returns @code{0}. If errors are found error messages are logged to the
 log callback (see @code{lou_registerLogCallback}) and the return value
 is @code{0}. @code{tableList} is a list of names of table files
 separated by commas, as explained previously
 (@pxref{translation-tables,,@code{tableList} parameter in
 @code{lou_translateString}}). @code{emphClass} is the name of an
 emphasis class.

 @node lou_dotsToChar
 @section lou_dotsToChar
 @findex lou_dotsToChar

 @example
 int lou_dotsToChar (
   const char *tableList,
   const widechar *inbuf,
   widechar *outbuf,
   int length,
   int mode)
 @end example

 This function takes a widechar string in @code{inbuf} consisting of dot
 patterns and converts it to a widechar string in @code{outbuf}
 consisting of characters according to the specifications in
 @code{tableList}. @code{length} is the length of both @code{inbuf} and
 @code{outbuf}. The dot patterns in @code{inbuf} can be in either
 liblouis format or Unicode braille. The function returns 1 on success
 and 0 on failure.

 @node lou_charToDots
 @section lou_charToDots
 @findex lou_charToDots

 @example
 int lou_charToDots (
   const char *tableList,
   const widechar *inbuf,
   widechar *outbuf,
   int length,
   int mode)
 @end example

 This function is the inverse of @code{lou_dotsToChar}. It takes a
 widechar string in @code{inbuf} consisting of characters and converts it
 to a widechar string in @code{outbuf} consisting of dot patterns
 according to the specifications in @code{tableList}. @code{length} is the
 length of both @code{inbuf} and @code{outbuf}. The dot patterns in
 @code{outbuf} are in liblouis format if the mode bit @code{ucBrl} is
 not set and in Unicode format if it is set. The function returns 1 on
 success and 0 on failure.
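
 For readers unfamiliar with Unicode braille, the sketch below shows
 how a list of dot numbers corresponds to a code point in the Unicode
 Braille Patterns block, which starts at U+2800 and assigns one bit per
 dot. It illustrates the Unicode side only and says nothing about the
 liblouis internal format. @code{dots_to_unicode_braille} is a
 hypothetical helper, not a liblouis function.

```c
#include <assert.h>

typedef unsigned short int widechar;

/* Convert a string of dot numbers such as "456" to a Unicode braille
   pattern: U+2800 plus one bit per dot (dot 1 = 0x01 ... dot 8 = 0x80).
   Returns 0 if the string contains anything other than digits 1-8. */
widechar dots_to_unicode_braille(const char *dots)
{
  widechar cell = 0x2800;
  assert(dots != 0);
  for (; *dots != '\0'; dots++) {
    if (*dots < '1' || *dots > '8')
      return 0;
    cell |= (widechar)(1u << (*dots - '1'));
  }
  return cell;
}
```

 For example, dots 456 give U+2838 and dots 246 give U+282A.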

 @node lou_registerLogCallback
 @section lou_registerLogCallback
 @findex lou_registerLogCallback

 @example
 typedef void (*logcallback) (
   int level,
   const char *message);

 void lou_registerLogCallback (
   logcallback callback);
 @end example

 This function can be used to register a custom logging callback. The
 callback takes two arguments, the log level and the message string.
 By default log messages are printed to stderr, or if a filename was
 specified with @code{lou_logFile} then messages are logged to that
 file. @code{lou_registerLogCallback} overrides the default
 callback. Passing @code{NULL} resets to the default callback.

 @node lou_setLogLevel
 @section lou_setLogLevel
 @findex lou_setLogLevel

 @example
 typedef enum
 @{
   LOG_ALL = 0,
   LOG_DEBUG = 10000,
   LOG_INFO = 20000,
   LOG_WARN = 30000,
   LOG_ERROR = 40000,
   LOG_FATAL = 50000,
   LOG_OFF = 60000
 @} logLevels;
 void lou_setLogLevel (
   logLevels level);
 @end example

 This function can be used to influence the amount of logging, from
 fatal error messages only to detailed debugging messages. Supported
 values are @code{LOG_ALL}, @code{LOG_DEBUG}, @code{LOG_INFO},
 @code{LOG_WARN}, @code{LOG_ERROR}, @code{LOG_FATAL} and
 @code{LOG_OFF}. Enabling logging at a given level also enables
 logging at all higher levels. Setting the level to @code{LOG_OFF}
 disables logging. The
 default level is @code{LOG_INFO}.
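
 The threshold rule and the shape of a log callback can be sketched
 without linking against the library. The enum is copied from the
 declaration above; @code{should_log}, @code{collecting_callback},
 @code{lastMessage} and @code{messageCount} are hypothetical names, and
 the comparison only models the rule stated in the text.

```c
#include <assert.h>
#include <stdio.h>

typedef enum {
  LOG_ALL = 0,
  LOG_DEBUG = 10000,
  LOG_INFO = 20000,
  LOG_WARN = 30000,
  LOG_ERROR = 40000,
  LOG_FATAL = 50000,
  LOG_OFF = 60000
} logLevels;

/* A message passes when its level is at or above the threshold.
   With LOG_OFF as threshold nothing passes, since no message level
   reaches 60000. */
int should_log(logLevels threshold, int level)
{
  return level >= (int)threshold;
}

char lastMessage[256];
int messageCount;

/* Has the (level, message) signature of the logcallback typedef. */
void collecting_callback(int level, const char *message)
{
  assert(message != 0);
  snprintf(lastMessage, sizeof lastMessage, "[%d] %s", level, message);
  messageCount++;
}
```

 In a real program a function with this signature would be passed to
 @code{lou_registerLogCallback}, and the threshold would be set with
 @code{lou_setLogLevel}.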

 @node lou_logFile
 @section lou_logFile (deprecated)
 @findex lou_logFile

 @example
 void lou_logFile (
   char *fileName);
 @end example

 This function is used when it is not convenient either to let messages
 be printed on stderr or to use redirection, as when liblouis is used
 in a GUI application or in liblouisutdml. Any error messages generated
 will be printed to the file given in this call. The entire path name of
 the file must be given.

 This function is deprecated. See @ref{Deprecation of the logging system}.

 @node lou_logPrint
 @section lou_logPrint (deprecated)
 @findex lou_logPrint

 @example
 void lou_logPrint (
   char *format,
   ...);
 @end example

 This function is called like @code{printf}. It can be used by other
 libraries to print messages to the file specified by the call to
 @code{lou_logFile}. In particular, it is used by the companion
 library liblouisutdml.

 This function is deprecated. See @ref{Deprecation of the logging system}.

 @node lou_logEnd
 @section lou_logEnd (deprecated)
 @findex lou_logEnd

 @example
 void lou_logEnd ();
 @end example

 This function is used at the end of processing a document to close the
 log file, so that it can be read by the rest of the program.

 This function is deprecated. See @ref{Deprecation of the logging system}.

 @node lou_setDataPath
 @section lou_setDataPath
 @findex lou_setDataPath

 @example
 char *lou_setDataPath (
   char *path);
 @end example

 This function is used to tell liblouis and liblouisutdml where tables
 and files are located. It thus makes them completely relocatable, even
 on Linux. The @code{path} is the directory where the subdirectories
 @code{liblouis/tables} and @code{liblouisutdml/lbu_files} are rooted
 or located. The function returns a pointer to the @code{path}.

 @node lou_getDataPath
 @section lou_getDataPath
 @findex lou_getDataPath

 @example
 char *lou_getDataPath ();
 @end example

 This function returns a pointer to the path set by
 @code{lou_setDataPath}. If no path has been set it returns
 @code{NULL}.

 @node lou_getTable
 @section lou_getTable
 @findex lou_getTable

 @example
 void *lou_getTable (
   char *tableList);
 @end example

 @code{tableList} is a list of names of table files separated by
 commas, as explained previously
 (@pxref{translation-tables,,@code{tableList} parameter in
 @code{lou_translateString}}). If no errors are found this function
 returns a pointer to the compiled table. If errors are found error
 messages are logged to the log callback (see
 @code{lou_registerLogCallback}). Errors result in a @code{NULL}
 pointer being returned.

 @node lou_findTable
 @section lou_findTable
 @findex lou_findTable

 @example
 char *lou_findTable (const char *query);
 @end example

 This function can be used to find a table based on
 metadata. @code{query} is a string in the special @ref{Query
 Syntax,query syntax}. It is matched against @ref{Table Metadata,table
 metadata} inside the tables that were previously indexed with
 @ref{lou_indexTables,@code{lou_indexTables}}. Returns the file name of
 the best match. Returns @code{NULL} if the query is invalid or if no
 match can be found.

 The match algorithm works as follows:

 @itemize @bullet
 @item
 For every table a match quotient with the query is computed. The table
 with the highest (positive) match quotient wins. If no table has a
 positive quotient, there is no match.
 @item
 A query is a list of features. Features defined first have a higher
 importance (have a higher impact on the final quotient) than features
 defined later.
 @item
 A feature that matches a metadata field in the table (keys equal and
 values equal, or both values absent) adds to the quotient.
 @item
 A feature that is undefined in the table (no field with that key)
 creates a medium penalty.
 @item
 A feature that is defined in the table but does not match (keys equal
 but values not equal) creates the highest penalty.
 @item
 Every field in the table that has no corresponding feature in the
 query creates a very small penalty.
 @end itemize
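
 The scoring rules listed above can be turned into a small model. The
 text gives only the relative size of each contribution, so the
 numeric weights below are invented and merely respect that ordering;
 the algorithm inside liblouis may differ. @code{Field} and
 @code{match_quotient} are hypothetical names, as are the metadata keys
 used in any example data.

```c
#include <assert.h>
#include <string.h>

typedef struct {
  const char *key;
  const char *value;            /* may be NULL when the value is absent */
} Field;

static const Field *find_field(const Field *fields, int n, const char *key)
{
  for (int i = 0; i < n; i++)
    if (strcmp(fields[i].key, key) == 0)
      return &fields[i];
  return NULL;
}

static int same_value(const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;              /* both absent counts as equal */
  return strcmp(a, b) == 0;
}

/* Hypothetical weights, chosen only to respect the ordering in the text. */
double match_quotient(const Field *query, int nq, const Field *table, int nt)
{
  double quotient = 0.0;
  double weight = 1.0;          /* earlier features weigh more */
  assert(query != NULL && table != NULL);
  for (int i = 0; i < nq; i++) {
    const Field *f = find_field(table, nt, query[i].key);
    if (f == NULL)
      quotient -= 0.5 * weight;             /* undefined: medium penalty */
    else if (same_value(f->value, query[i].value))
      quotient += 1.0 * weight;             /* match */
    else
      quotient -= 2.0 * weight;             /* mismatch: highest penalty */
    weight *= 0.5;
  }
  for (int i = 0; i < nt; i++)
    if (find_field(query, nq, table[i].key) == NULL)
      quotient -= 0.01;                     /* extra field: very small penalty */
  return quotient;
}
```

 A caller would compute the quotient for every indexed table and keep
 the table with the highest positive value.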

 @node lou_indexTables
 @section lou_indexTables
 @findex lou_indexTables

 @example
 void lou_indexTables (const char **tables);
 @end example

 This function must be called prior to
 @ref{lou_findTable,@code{lou_findTable}}. It parses, analyzes and
 indexes all specified tables. @code{tables} must be an array of file
 names. Tables that contain invalid metadata are ignored.

 @node lou_checkTable
 @section lou_checkTable
 @findex lou_checkTable

 @example
 int lou_checkTable (const char *tableList);
 @end example

 This function does the same as @code{lou_getTable} but does not return
 a pointer to the resulting table. It is to be preferred if only the
 validity of a table needs to be checked. @code{tableList} is a list of
 names of table files separated by commas, as explained previously
 (@pxref{translation-tables,,@code{tableList} parameter in
 @code{lou_translateString}}). If no errors are found this function
 returns a non-zero value. If errors are found error messages are logged to
 the log callback (see @code{lou_registerLogCallback}) and the return
 value is @code{0}.

 @node lou_readCharFromFile
 @section lou_readCharFromFile
 @findex lou_readCharFromFile

 @example
 int lou_readCharFromFile (
   const char *fileName,
   int *mode);
 @end example

 This function is provided for situations where it is necessary to read
 a file which may contain little-endian or big-endian 16-bit Unicode
 characters or 8-bit ASCII characters. The return value is a little-endian
 character, encoded as an integer. The @code{fileName} parameter is the
 name of the file to be read. The @code{mode} parameter is a pointer to
 an integer which must be set to 1 on the first call. After that, the
 function takes care of it. On end-of-file the function returns
 @code{EOF}.

 @node lou_free
 @section lou_free
 @findex lou_free

 @example
 void lou_free ();
 @end example

 This function should be called at the end of the application to free
 all memory allocated by liblouis. Failure to do so will result in
 memory leaks. Do @emph{NOT} call @code{lou_free} after each
 translation. This will force liblouis to compile the translation
 tables every time they are used, resulting in great inefficiency.

 @node lou_charSize
 @section lou_charSize
 @findex lou_charSize

 @example
 int lou_charSize ();
 @end example

 This function returns the size of @code{widechar} in bytes and can
 therefore be used to differentiate between 16-bit and 32-bit Unicode
 builds of liblouis.

 @node Python bindings
 @section Python bindings

 There are Python bindings for @code{lou_translateString},
 @code{lou_translate}, @code{lou_backTranslateString},
 @code{lou_backTranslate}, @code{lou_hyphenate}, @code{lou_checkTable},
 @code{lou_compileString} and @code{lou_version}. For installation
 instructions see the @file{README} file in the @file{python}
 directory. Usage information is included in the Python module itself.


 @node Concept Index
 @unnumbered Concept Index
 @printindex cp

 @node Opcode Index
 @unnumbered Opcode Index
 @printindex opcode

 @node Function Index
 @unnumbered Function Index
 @printindex fn

 @node Program Index
 @unnumbered Program Index
 @printindex pg

 @bye

 @c The following list is a list of exceptions for the ispell spell
 @c checker

 @c  LocalWords:  liblouis opcode args BRLTTY ViewPlus Abilitiessoft LGPL lou
 @c  LocalWords:  checktable allround checkhyphens Opcodes Multipass dotsToChar
 @c  LocalWords:  translateString backTranslateString backTranslate charToDots
 @c  LocalWords:  compileString logFile logPrint checkyaml findTable
 @c  LocalWords:  getTable checkTable readCharFromFile itemx charSize
 @c  LocalWords:  README liblouisxml pindex samp kbd opcodes opcoderef numsign
 @c  LocalWords:  FIXME ctb nemeth filename multipass suboperand uplow litdigit
 @c  LocalWords:  begcaps endcaps letsign noletsign largesign typeform
 @c  LocalWords:  noletsignbefore noletsignafter compbrl firstwordital
 @c  LocalWords:  lenitalphrase doubleOpcode lastworditalbefore firstletterital
 @c  LocalWords:  lastworditalafter lastletterital firstwordbold UTF
 @c  LocalWords:  singleletterital lastwordboldbefore lastwordboldafter
 @c  LocalWords:  firstletterbold lastletterbold lenboldphrase filll
 @c  LocalWords:  singleletterbold firstwordunder lastwordunderbefore
 @c  LocalWords:  lastwordunderafter firstletterunder lastletterunder
 @c  LocalWords:  singleletterunder lenunderphrase begcomp endcomp decpoint texi
 @c  LocalWords:  capsnocont noback nofor texinfo setfilename settitle direntry
 @c  LocalWords:  dircategory finalout defindex opcodeindex noindent uref vskip
 @c  LocalWords:  titlepage insertcopying ifnottex dir detailmenu italword RET
 @c  LocalWords:  TranslationTableHeader txt cti nocross exactdots nocont emph
 @c  LocalWords:  prepunc postpunc repword joinword lowword sufword prfword API
 @c  LocalWords:  begword begmidword midword midendword endword partword begnum
 @c  LocalWords:  midnum endnum joinnum swapcd swapdd swapcc multind endLog
 @c  LocalWords:  backtranslation compileTranslationTable typedef louis ruleArea
 @c  LocalWords:  HASHNUM capitalSign compdots findex const inbuf outbuf outlen
 @c  LocalWords:  tableList TABLEPATH widechar inputPos cursorPos outputPos
 @c  LocalWords:  inlen compbrlAtCursor compbrlLeftCursor trantab stderr endian
 @c  LocalWords:  tablelist fileName printindex deprecatedopcode setDataPath
 @c  LocalWords:  getDataPath MathML suboperands logEnd liblouisutdml whitespace
 @c  LocalWords:  xhhhh yhhhhh zhhhhhhhh OpenOffice documentencoding
 @c  LocalWords:  YAML JSON logLevels nocontractsign OSX DLL env NVDA
 @c  LocalWords:  MERCHANTABILITY registerLogCallback setLogLevel brf
 @c  LocalWords:  cindex chardefs xhtml pxref dec multi hyph dic Aa al
 @c  LocalWords:  mrow mfrac emphclass transnote subsubsection begemph
 @c  LocalWords:  endemph emphletter begemphword endemphword www cd th
 @c  LocalWords:  lenemphphrase begemphphrase endemphphrase andthe se
 @c  LocalWords:  abrege decrement pre cornf comf scano cornm cornp po
 @c  LocalWords:  h's brl testtrans UCS asis libyaml url yaml formtype
 @c  LocalWords:  testmode iftex unicode ueb xfail eo noContractions
 @c  LocalWords:  dotsIO ucBrl noUndefinedDots partialTrans capsletter
 @c  LocalWords:  abc doctest inString enum cp outbufbuf logcallback fprint
 @c  LocalWords:  lbu EOF heckTable fn ispell getTypeformForEmphClass
 @c  LocalWords:  indexTables begcapsword endcapsword typeforms
 @c  LocalWords:  endemphphraseopcode emphClass BrailleSense HumanWare
 @c  LocalWords:  BrailleNote