Adding custom dictionaries
Creating custom dictionary files
One custom dictionary can be created for each language already supported by the spell checker (see supported languages), or for any additional language provided by Hunspell dictionary files in the Hunspell Dictionary Path (see Add Hunspell dictionaries to Spell Checker). It’s also possible to define an additional "global" dictionary containing words that are valid across all languages, such as trademarks.
A custom dictionary file for a particular language must be named with the language code of that language (see supported languages for language code examples), plus the suffix .txt, for example en.txt, en_gb.txt, fr.txt, de.txt, etc.
The "global" dictionary file for language-independent words must be called "global.txt".
The server scans the configured dictionary directory (see Configuring the custom dictionary feature) and picks up the .txt file for each language, plus the global file if one is present.
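The naming rule can be illustrated with a short Python sketch. The function name dictionary_key is hypothetical, and the case-sensitive match on the .txt suffix is an assumption; the sketch only shows how a file name maps to "global" or a language code and does not check that the code is one the spell checker supports.

```python
from pathlib import Path
from typing import Optional


def dictionary_key(filename: str) -> Optional[str]:
    """Map a file name to "global", a language code, or None.

    Assumption: only a lowercase ".txt" suffix counts as a dictionary file.
    """
    path = Path(filename)
    if path.suffix != ".txt":
        return None  # not a dictionary file by the naming rule
    # "global.txt" yields "global"; "en_gb.txt" yields "en_gb".
    return path.stem


for name in ["global.txt", "en.txt", "en_gb.txt", "README.md"]:
    print(name, "->", dictionary_key(name))
```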
Custom dictionary file format
A dictionary file must be a simple text file with:
- one word on each line,
- either Windows-style or Linux-style line endings (CR+LF or LF),
- no comments or blank lines, and
- UTF-8 encoding, with or without a BOM (byte-order mark).
The last point is important for files created or edited on non-Linux (Windows or Mac) systems, which often use a different default encoding for text files. Editors on those systems, such as Windows Notepad, can usually save files as UTF-8 when asked to do so; check your editor of choice for this option. Choosing the wrong encoding will cause problems with non-English letters such as umlauts and accented characters.
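The format rules above can be checked before deploying a file. The following Python sketch is one possible pre-flight check, not part of the spelling service; the function name find_format_problems is hypothetical. It assumes that whitespace inside a line means more than one word, and it cannot detect comments because the source does not define a comment syntax.

```python
from pathlib import Path


def find_format_problems(path):
    """Return a list of human-readable problems found in a dictionary file."""
    raw = Path(path).read_bytes()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a byte-order mark.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []
    # splitlines() handles both LF and CR+LF line endings.
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif len(line.split()) > 1:
            # Assumption: whitespace within a line means several words.
            problems.append(f"line {number}: more than one word")
    return problems
```

An empty list means the file passed these checks; a file saved in a legacy encoding such as Latin-1 will typically be reported as not valid UTF-8 once it contains umlauts or accented letters.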
German and Finnish languages: Spell checking in German and Finnish uses compound-word checking. Compound words such as "Fußballtennis" are assumed correct as long as the root words "Fußball" and "Tennis" are each present in the dictionary, so it is not necessary to add "Fußballtennis" separately.
Configuring the custom dictionary feature
Additional configuration in your application.conf file is required. (Don’t forget to restart the Java application server after updating the configuration.)
The ephox.spelling.custom-dictionaries-path element defines the location of the custom dictionaries. When the setting is not provided, no custom dictionaries are loaded.
Requirements:
- The directory containing the custom dictionaries must be on the same server machine as the Java service.
- The directory should not contain subdirectories or non-dictionary files.
Tiny recommends storing the custom dictionaries in a similar location to the application.conf file. For example, if application.conf is in a directory called /opt/ephox, the dictionary files could be stored in the subdirectory /opt/ephox/dictionaries.
Example:
ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
  }
}
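The directory requirements can be checked with a small script before restarting the service. This is a minimal Python sketch under the assumption that every file with a .txt suffix is a dictionary file; the function name check_dictionary_directory is hypothetical, and the path in the usage line is the example path from this page.

```python
from pathlib import Path


def check_dictionary_directory(directory):
    """Return a list of entries that break the directory requirements."""
    problems = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_dir():
            problems.append(f"{entry.name}: subdirectory")
        elif entry.suffix != ".txt":
            # Assumption: anything without a ".txt" suffix is not a dictionary.
            problems.append(f"{entry.name}: not a .txt dictionary file")
    return problems


if __name__ == "__main__":
    target = Path("/opt/ephox/dictionaries")  # example path from this page
    if target.is_dir():
        for problem in check_dictionary_directory(target):
            print(problem)
```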
Dynamic Custom Dictionaries
Adding the ephox.spelling.dynamic-custom-dictionaries element and setting it to true instructs the spelling service to periodically check the custom-dictionaries-path for changes and update the custom dictionaries accordingly. This allows updates to the custom dictionaries without restarting the spelling service. The default value is false.
Example:
ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
    dynamic-custom-dictionaries = true
  }
}
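With dynamic custom dictionaries enabled, a dictionary file can be updated by a script while the service is running. The Python sketch below shows one way to add a word while keeping the file in the required format (UTF-8, one word per line, no blank lines). The function name add_word is hypothetical, and the rewrite is not atomic, so treat it as an illustration rather than a hardened tool.

```python
from pathlib import Path


def add_word(dictionary_file, word):
    """Add a word to a dictionary file; return False if already present."""
    path = Path(dictionary_file)
    words = []
    if path.exists():
        # "utf-8-sig" reads UTF-8 with or without a byte-order mark.
        words = path.read_text(encoding="utf-8-sig").splitlines()
    if word in words:
        return False
    words.append(word)
    # Rewrite the whole file as UTF-8 with LF endings, one word per line.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(w + "\n" for w in words)
    return True
```

How quickly the service notices the change depends on its own check interval, which this page does not specify.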
Verifying custom dictionary functionality
If successfully configured, the custom dictionary feature will report dictionaries found in the application server’s log at service startup.
Example:
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - Starting task (booting Ironbark)
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: [global] = 1 words
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: "en" = 3 words
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: "fr" = 2 words
2017-06-12 17:46:01 [main] INFO com.ephox.ironbark.IronbarkBoot - Finished task (booting Ironbark)
The above log shows that three custom dictionaries were found: one "global", language-independent dictionary, and one each for English and French. They were found to contain 1, 3 and 2 words, respectively. Check that this report matches your expectations.
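Comparing the reported counts against expectations can be automated. The Python sketch below extracts dictionary names and word counts from log text; the regular expression is based only on the example lines above, so it is an assumption that other versions of the service log in the same format. The function name dictionary_counts is hypothetical.

```python
import re

# Matches both '[global] = 1 words' and '"en" = 3 words' as in the example log.
PATTERN = re.compile(r'using custom dictionary: \[?"?([^\]"\s]+)"?\]? = (\d+) words')


def dictionary_counts(log_text):
    """Return a mapping of dictionary name to reported word count."""
    return {name: int(count) for name, count in PATTERN.findall(log_text)}


sample = '''2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: [global] = 1 words
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: "en" = 3 words
2017-06-12 17:46:00 [main] INFO com.ephox.ironbark.IronbarkBoot - using custom dictionary: "fr" = 2 words'''

print(dictionary_counts(sample))
```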