Tisane language models are stored in directories. They can be divided into:
- Language-specific data that describes a particular language.
- Crosslingual data used by all languages (for example, semantic connections between concepts).
Language-specific data stores are named according to the following convention: (language_code)-(data_store_name)
- Language code: based on the ISO 639-1 language code standard, optionally including a dialect (for example, zh_CN).
- Data store name: the type of structures stored.
Examples:
- en-phrase: English phrasal patterns
- fr-nondic: French nondictionary entity heuristics
- zh_CN-phrase: Chinese (Simplified) phrasal patterns
The following data stores are used by all languages:
- family
- role
- pragma
Important: All data stores for a language must reside in the same directory.
To save space, or for other reasons, languages or components can be excluded from deployment.
To include only specific languages, identify the appropriate language codes (e.g., en, de, zh_CN) and include the corresponding language-specific data stores along with the three shared data stores (family, role, pragma).
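The selection rule above can be sketched as a filter over store names. This is a minimal sketch, assuming the list of available store names has already been obtained elsewhere (for example, from a directory listing) and that the first hyphen in a name separates the language code from the data store name. The function `select_stores` and the sample names are hypothetical.

```python
from typing import Iterable, Set

# The three shared data stores required by every language.
SHARED_STORES = {"family", "role", "pragma"}


def select_stores(available: Iterable[str], languages: Set[str]) -> Set[str]:
    """Return the store names to deploy for the given language codes.

    Keeps every language-specific store whose language code is requested,
    plus the shared stores, and fails if any shared store is missing.
    """
    selected = set()
    for name in available:
        if name in SHARED_STORES:
            selected.add(name)
            continue
        language_code, sep, _ = name.partition("-")
        if sep and language_code in languages:
            selected.add(name)
    missing = SHARED_STORES - selected
    if missing:
        raise ValueError(f"shared stores missing: {sorted(missing)}")
    return selected


if __name__ == "__main__":
    available = ["en-phrase", "en-spell", "de-phrase", "fr-nondic",
                 "zh_CN-phrase", "family", "role", "pragma"]
    print(sorted(select_stores(available, {"en", "zh_CN"})))
```

Matching on the whole language code (rather than a string prefix) keeps, for instance, a request for zh from accidentally selecting zh_CN stores.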
The xx-famlex and xx-famphrase stores are used for translation only and can be excluded from distribution if Tisane is not used for translation.
Spellchecking data is stored in the xx-spell stores. If these stores are omitted, spellchecking will not work.
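Optional components can be dropped by data store name. This is a minimal sketch that works on store names only and does not touch the file system; it assumes the suffixes famlex, famphrase, and spell identify the translation-only and spellchecking stores described above. The function `prune_stores` and its parameters are hypothetical.

```python
from typing import Iterable, List

TRANSLATION_ONLY = {"famlex", "famphrase"}  # needed only for translation
SPELLCHECK = {"spell"}                       # needed only for spellchecking


def prune_stores(stores: Iterable[str], translation: bool,
                 spellcheck: bool) -> List[str]:
    """Drop optional stores for features that will not be used.

    Shared stores (no hyphen) are classified by their full name,
    so they are never removed.
    """
    kept = []
    for name in stores:
        _, sep, store = name.partition("-")
        kind = store if sep else name
        if not translation and kind in TRANSLATION_ONLY:
            continue
        if not spellcheck and kind in SPELLCHECK:
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    stores = ["en-phrase", "en-famlex", "en-famphrase", "en-spell",
              "family", "role", "pragma"]
    print(prune_stores(stores, translation=False, spellcheck=True))
```

Removing the spell stores this way saves space at the cost of spellchecking, which will no longer work for those languages.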