| .TH hunspell 3 "2011-02-01" |
| .LO 1 |
| .hy 0 |
| .SH NAME |
| \fBhunspell\fR - spell checking, stemming, morphological generation and analysis |
| .SH SYNOPSIS |
| \fB#include <hunspell/hunspell.hxx> /* or */\fR |
| .br |
| \fB#include <hunspell/hunspell.h>\fR |
| .br |
| .sp |
| .BI "Hunspell(const char *" affpath ", const char *" dpath ); |
| .sp |
| .BI "Hunspell(const char *" affpath ", const char *" dpath ", const char * " key ); |
| .sp |
| .BI "~Hunspell(" ); |
| .sp |
| .BI "int add_dic(const char *" dpath ); |
| .sp |
| .BI "int add_dic(const char *" dpath ", const char *" key ); |
| .sp |
| .BI "int spell(const char *" word ); |
| .sp |
| .BI "int spell(const char *" word ", int *" info ", char **" root ); |
| .sp |
| .BI "int suggest(char***" slst ", const char *" word); |
| .sp |
| .BI "int analyze(char***" slst ", const char *" word); |
| .sp |
| .BI "int stem(char***" slst ", const char *" word); |
| .sp |
| .BI "int stem(char***" slst ", char **" morph ", int " n); |
| .sp |
| .BI "int generate(char***" slst ", const char *" word ", const char *" word2); |
| .sp |
| .BI "int generate(char***" slst ", const char *" word ", char **" desc ", int " n); |
| .sp |
| .BI "void free_list(char ***" slst ", int " n); |
| .sp |
| .BI "int add(const char *" word); |
| .sp |
| .BI "int add_with_affix(const char *" word ", const char *" example); |
| .sp |
| .BI "int remove(const char *" word); |
| .sp |
| .BI "char * get_dic_encoding(" ); |
| .sp |
| .BI "const char * get_wordchars(" ); |
| .sp |
| .BI "unsigned short * get_wordchars_utf16(int *" len); |
| .sp |
| .BI "struct cs_info * get_csconv(" ); |
| .sp |
| .BI "const char * get_version(" ); |
| .SH DESCRIPTION |
| The \fBHunspell\fR library routines give the user word-level |
| linguistic functions: spell checking and correction, stemming, |
| morphological generation and analysis in item-and-arrangement style. |
| .PP |
| The optional C header contains the C interface of the C++ library with |
| Hunspell_create and Hunspell_destroy constructor and destructor, and |
| an extra HunHandle parameter (the allocated object) in the |
| wrapper functions (see in the C header file \fBhunspell.h\fR). |
| .PP |
| The basic spelling functions, \fBspell()\fR and \fBsuggest()\fR can |
| be used for stemming, morphological generation and analysis by |
| XML input texts (see XML API). |
| . |
| .SS Constructor and destructor |
| Hunspell's constructor needs paths of the affix and dictionary files. |
| See the \fBhunspell\fR(4) manual page for the dictionary format. |
| Optional \fBkey\fR parameter is for dictionaries encrypted by |
| the \fBhzip\fR tool of the Hunspell distribution. |
| . |
| .SS Extra dictionaries |
| The add_dic() function load an extra dictionary file. |
| The extra dictionaries use the affix file of the allocated Hunspell |
| object. Maximal number of the extra dictionaries is limited in the source code (20). |
| . |
| .SS Spelling and correction |
| The spell() function returns non-zero, if the input word is recognised |
| by the spell checker, and a zero value if not. Optional reference |
| variables return a bit array (info) and the root word of the input word. |
| Info bits checked with the SPELL_COMPOUND, SPELL_FORBIDDEN or SPELL_WARN |
| macros sign compound words, explicit forbidden and probably bad words. |
| From version 1.3, the non-zero return value is 2 for the dictionary |
| words with the flag "WARN" (probably bad words). |
| .PP |
| The suggest() function has two input parameters, a reference variable |
| of the output suggestion list, and an input word. The function returns |
| the number of the suggestions. The reference variable |
| will contain the address of the newly allocated suggestion list or NULL, |
| if the return value of suggest() is zero. Maximal number of the suggestions |
| is limited in the source code. |
| .PP |
| The spell() and suggest() can recognize XML input, see the XML API section. |
| . |
| .SS Morphological functions |
| The plain stem() and analyze() functions are similar to the suggest(), but |
| instead of suggestions, return stems and results of the morphological |
| analysis. The plain generate() waits a second word, too. This extra word |
| and its affixation will be the model of the morphological generation of |
| the requested forms of the first word. |
| .PP |
| The extended stem() and generate() use the results of a |
| morphological analysis: |
| .PP |
| .RS |
| .nf |
| char ** result, result2; |
| int n1 = analyze(&result, "words"); |
| int n2 = stem(&result2, result, n1); |
| .fi |
| .RE |
| .PP |
| The morphological annotation of the Hunspell library has fixed |
| (two letter and a colon) field identifiers, see the |
| \fBhunspell\fR(4) manual page. |
| .PP |
| .RS |
| .nf |
| char ** result; |
| char * affix = "is:plural"; // description depends from dictionaries, too |
| int n = generate(&result, "word", &affix, 1); |
| for (int i = 0; i < n; i++) printf("%s\n", result[i]); |
| .fi |
| .RE |
| .PP |
| .SS Memory deallocation |
| The free_list() function frees the memory allocated by suggest(), |
| analyze, generate and stem() functions. |
| .SS Other functions |
| The add(), add_with_affix() and remove() are helper functions of a |
| personal dictionary implementation to add and remove words from the |
| base dictionary in run-time. The add_with_affix() uses a second word |
| as a model of the enabled affixation of the new word. |
| .PP |
| The get_dic_encoding() function returns "ISO8859-1" or the character |
| encoding defined in the affix file with the "SET" keyword. |
| .PP |
| The get_csconv() function returns the 8-bit character case table of the |
| encoding of the dictionary. |
| .PP |
| The get_wordchars() and get_wordchars_utf16() return the |
| extra word characters definied in affix file for tokenization by |
| the "WORDCHARS" keyword. |
| .PP |
| The get_version() returns the version string of the library. |
| .SS XML API |
| The spell() function returns non-zero for the "<?xml?>" input |
| indicating the XML API support. |
| .PP |
| The suggest() function stems, analyzes and generates the forms of the |
| input word, if it was added by one of the following "SPELLML" syntaxes: |
| .PP |
| .RS |
| .nf |
| <?xml?> |
| <query type="analyze"> |
| <word>dogs</word> |
| </query> |
| .fi |
| .RE |
| .PP |
| |
| .PP |
| .RS |
| .nf |
| <?xml?> |
| <query type="stem"> |
| <word>dogs</word> |
| </query> |
| .fi |
| .RE |
| .PP |
| |
| .PP |
| .RS |
| .nf |
| <?xml?> |
| <query type="generate"> |
| <word>dog</word> |
| <word>cats</word> |
| </query> |
| .fi |
| .RE |
| .PP |
| |
| .PP |
| .RS |
| .nf |
| <?xml?> |
| <query type="generate"> |
| <word>dog</word> |
| <code><a>is:pl</a><a>is:poss</a></code> |
| </query> |
| .fi |
| .RE |
| .PP |
| |
| The outputs of the type="stem" query and the stem() library function |
| are the same. The output of the type="analyze" query is a string contained |
| a <code><a>result1</a><a>result2</a>...</code> element. This |
| element can be used in the second syntax of the type="generate" query. |
| .SH EXAMPLE |
| See analyze.cxx in the Hunspell distribution. |
| .SH AUTHORS |
| Hunspell based on Ispell's spell checking algorithms and OpenOffice.org's Myspell source code. |
| .PP |
| Author of International Ispell is Geoff Kuenning. |
| .PP |
| Author of MySpell is Kevin Hendricks. |
| .PP |
| Author of Hunspell is László Németh. |
| .PP |
| Author of the original C API is Caolan McNamara. |
| .PP |
| Author of the Aspell table-driven phonetic transcription algorithm and code is Björn Jacke. |
| .PP |
| See also THANKS and Changelog files of Hunspell distribution. |