The Spelling and Grammar Checker DITA-OT Plug-in is an extension of the base DITA Validator which adds simple rule-based spelling and grammar validation for the text elements within DITA documents.
The plug-in supports three transtypes:
- auto-correct - remove reported spelling and grammar errors from a DITA document
- text-rules - create an error report in Schematron Validation Report Language (SVRL) format
- text-rules-echo - display the results of an SVRL error report in a human-readable format
More information about SVRL can be found at www.schematron.com.
Most of the spell-checking rules are based on a list of known typographical errors and faults, and the ruleset can be easily altered to include new constraints. Checking against a list of known errors means that no false positives should occur, but the existing list will never be fully comprehensive.
The validator has been tested against DITA-OT 3.0.x, and upgrading to the latest version is recommended. The validator can also be run safely against DITA-OT 2.x, DITA-OT 3.x and DITA-OT 4.x.
Running the validator plug-in against DITA-OT 1.8.5 or earlier versions will not work, because it uses the newer getVariable template. To work with DITA-OT 1.8.5, the plug-in would need to be refactored to use getMessage instead.
The spell-checker is a plug-in for the DITA Open Toolkit. It is not a standalone plug-in, because it extends the base validator plug-in (com.here.validate.svrl).
- Full installation instructions for downloading DITA-OT can be found here.
- Download the dita-ot-4.0.zip package from the project website at dita-ot.org/download.
- Extract the contents of the package to the directory where you want to install DITA-OT.
- Optional: Add the absolute path for the bin directory to the PATH system variable. This defines the necessary environment variable to run the dita command from the command line.
- Alternatively, download and extract the package from the command line:
curl -LO https://github.com/dita-ot/dita-ot/releases/download/4.0/dita-ot-4.0.zip
unzip -q dita-ot-4.0.zip
rm dita-ot-4.0.zip
- Run the plug-in installation commands:
dita install https://github.com/doctales/org.doctales.xmltask/archive/master.zip
dita install https://github.com/jason-fox/com.here.validate.svrl/archive/master.zip
dita install https://github.com/jason-fox/com.here.validate.svrl.text-rules/archive/master.zip
The dita command line tool requires no additional configuration.
A test document can be found within the plug-in at: PATH_TO_DITA_OT/plugins/com.here.validate.svrl.text-rules/sample
To create an SVRL file with the spell-checker plug-in, use the text-rules transform with the --args.validate.mode=report parameter.
- From a terminal prompt, move to the directory holding the document to validate.
- SVRL file creation can be run like any other DITA-OT transform:
PATH_TO_DITA_OT/bin/dita -f text-rules -o out -i document.ditamap --args.validate.mode=report
Once the command has run, an SVRL file is created:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<svrl:schematron-output>
<active-pattern role="dita" name="/incorrect-spelling.dita"/>
<fired-rule context="common" role="grammar"/>
<fired-rule context="default-lang" role="spelling"/>
<failed-assert role="error" location="/topic/body[1]/section[2]/p[1]">
<diagnostic-reference diagnostic="incorrect-spelling">
Line 17: p - [incorrect-spelling]
The word 'separate' is spelt incorrectly ('seperate') in the following text:
Seperate accommodation can be found within the main building...
</diagnostic-reference>
</failed-assert>
<fired-rule context="default-lang" role="grammar"/>
<fired-rule context="english" role="grammar"/>
</svrl:schematron-output>
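An SVRL report is ordinary XML, so it can be post-processed with standard tools. As a minimal sketch, the Python code below lists each failed assertion with its role, location and diagnostic rule ID. It assumes the svrl prefix is bound to the standard SVRL namespace (the excerpt above omits the declaration), and it matches elements by local name so prefixed and unprefixed elements are both handled. The helper names local_name and failed_asserts are hypothetical and not part of the plug-in.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Hypothetical helper: strip any "{namespace}" prefix from an element tag.
    return tag.rsplit("}", 1)[-1]


def failed_asserts(svrl_text: str) -> list[dict]:
    """Return role, location and diagnostic IDs for every failed-assert element."""
    root = ET.fromstring(svrl_text)
    results = []
    for elem in root.iter():
        if local_name(elem.tag) != "failed-assert":
            continue
        diagnostics = [
            ref.get("diagnostic")
            for ref in elem.iter()
            if local_name(ref.tag) == "diagnostic-reference"
        ]
        results.append(
            {"role": elem.get("role"), "location": elem.get("location"), "rules": diagnostics}
        )
    return results


# A trimmed-down report in the shape shown above (namespace binding assumed).
SAMPLE = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern role="dita" name="/incorrect-spelling.dita"/>
  <svrl:failed-assert role="error" location="/topic/body[1]/section[2]/p[1]">
    <svrl:diagnostic-reference diagnostic="incorrect-spelling">
      Line 17: p - [incorrect-spelling]
    </svrl:diagnostic-reference>
  </svrl:failed-assert>
</svrl:schematron-output>"""

for item in failed_asserts(SAMPLE):
    print(item["role"], item["location"], ",".join(item["rules"]))
```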
To echo results to the command line with the spell-checker plug-in, use the text-rules transform without specifying the --args.validate.mode=report parameter.
- Spell-checking (text-rules) can be run like any other DITA-OT transform:
PATH_TO_DITA_OT/bin/dita -f text-rules -i document.ditamap
Once the command has run, all errors and warnings are echoed to the command line:
[ERROR] [/topics/incorrect-spelling.dita]
Line 17: p - [incorrect-spelling]
The word 'separate' is spelt incorrectly ('seperate') in the following text:
Seperate accommodation can be found within the main building.
Additionally, if an error occurs, the command will fail:
[ERROR] [/topics/incorrect-spelling.dita]
Line 17: p - [incorrect-spelling]
The word 'separate' is spelt incorrectly ('seperate') in the following text:
Seperate accommodation can be found within the main building.
Found 1 Errors 0 Warnings
Error: [SVRL001F][FATAL] Error: Errors detected during validation
To auto-correct spelling mistakes with the spell-checker plug-in, use the auto-correct transform.
- Auto-correction (auto-correct) can be run like any other DITA-OT transform:
PATH_TO_DITA_OT/bin/dita -f auto-correct -i document.ditamap
Once the command has run, spelling mistakes will have been removed from the document.
Note: The auto-correct transtype only removes spelling, duplicate and grammar errors that are specified in the dictionary files.
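Because only listed mistakes are corrected, auto-correction can be pictured as a dictionary-driven search and replace. The Python sketch below is an illustration under stated assumptions, not the plug-in's implementation: it assumes a small in-memory map of mistakes to corrections (modelled on spelling.xml entries), replaces whole words only, and preserves an initial capital letter. The auto_correct name is hypothetical.

```python
import re

# Assumed sample of mistake -> correction pairs, modelled on spelling.xml entries.
CORRECTIONS = {
    "accessable": "accessible",
    "acident": "accident",
    "seperate": "separate",
}


def auto_correct(text: str, corrections: dict[str, str]) -> str:
    """Replace each whole-word mistake with its correction, keeping a leading capital."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, corrections)) + r")\b", re.IGNORECASE
    )

    def fix(match: re.Match) -> str:
        word = match.group(0)
        corrected = corrections[word.lower()]
        # Preserve sentence-initial capitalization, e.g. "Seperate" -> "Separate".
        return corrected[0].upper() + corrected[1:] if word[0].isupper() else corrected

    return pattern.sub(fix, text)


print(auto_correct("Seperate accommodation is accessable.", CORRECTIONS))
# prints: Separate accommodation is accessible.
```

Words that merely contain a listed mistake (for example accessables) are left alone because of the word boundaries, which mirrors the known-errors-only approach.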
The following parameters are supported:
- args.validate.ignore.rules - Comma separated list of rule IDs to be ignored.
- args.validate.blacklist - Comma separated list of words that should not be present in the running text.
- args.validate.cachefile - Specifies the location of the cache file to be used. Validation will only run across altered files if this parameter is present.
- args.validate.check.case - Comma separated list of words which have a specified capitalization.
- args.validate.color - When set, errors and warnings are output highlighted using ANSI color codes.
- args.validate.mode - Validation reporting mode. The following values are supported:
  - strict - Outputs both warnings and errors. Fails on errors and warnings.
  - default - Outputs both warnings and errors. Fails on errors only.
  - lax - Ignores all warnings and outputs errors only. Fails on errors only.
  - report - Creates an SVRL file.
- svrl.customization.dir - Specifies the customization directory.
- svrl.filter.file - Specifies the location of the XSL file used to filter the echo output. If this parameter is not present, the default echo output format will be used.
- text-rules.ruleset.file - Specifies the location of a ruleset file defining the severity of the rules to apply. If this parameter is not present, default severity levels will be used.
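The strict, default and lax modes differ in whether warnings are shown and when the build fails. As a hedged illustration, the Python sketch below restates the documented behaviour as a function. The Outcome class and apply_mode name are hypothetical, and report mode is deliberately left out because its failure behaviour is not described above.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    show_warnings: bool  # whether warnings are printed
    failed: bool         # whether the build should fail


def apply_mode(mode: str, errors: int, warnings: int) -> Outcome:
    """Model of the documented args.validate.mode semantics (report mode not modelled)."""
    if mode == "strict":
        return Outcome(show_warnings=True, failed=errors > 0 or warnings > 0)
    if mode == "default":
        return Outcome(show_warnings=True, failed=errors > 0)
    if mode == "lax":
        return Outcome(show_warnings=False, failed=errors > 0)
    raise ValueError(f"mode not modelled in this sketch: {mode}")


# One error and no warnings, as in the echo output shown earlier: the build fails.
print(apply_mode("default", errors=1, warnings=0))
```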
An ANT build file is supplied in the same directory as the sample document. The main target can be seen below:
<property name="dita.dir" location="PATH_TO_DITA_OT"/>
<property name="dita.exec" value="${dita.dir}/bin/dita"/>
<property name="args.input" value="PATH_TO_DITA_DOCUMENT/document.ditamap"/>
<target name="spell-check" description="spell-check a document">
<!-- For Unix run the DITA executable-->
<exec executable="${dita.exec}" osfamily="unix" failonerror="true">
<arg value="-input"/>
<arg value="${args.input}"/>
<arg value="-output"/>
<arg value="${dita.dir}/out/svrl"/>
<arg value="-format"/>
<arg value="text-rules-echo"/>
<!-- validation transform specific parameters -->
<arg value="--args.validate.blacklist=(kilo)?metre|colour|teh|seperate"/>
<arg value="--args.validate.check.case=Bluetooth|HTTP[S]? |IoT|JSON|Java|Javadoc|JavaScript|XML"/>
<arg value="--args.validate.color=true"/>
</exec>
<!-- For Windows run from a DOS command -->
<exec dir="${dita.dir}/bin" executable="cmd" osfamily="windows" failonerror="true">
<arg value="/C"/>
<arg value="dita -input ${args.input} -output ${dita.dir}/out/svrl -format text-rules-echo --args.validate.blacklist=&quot;(kilo)?metre|colour|teh|seperate&quot; --args.validate.check.case=&quot;Bluetooth|HTTP[S]? |IoT|JSON|Java|Javadoc|JavaScript|XML&quot;"/>
</exec>
</target>
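In the ANT example above, the blacklist and check.case values are written as regular-expression alternations. The Python sketch below shows how such patterns could flag words in running text. It assumes each pattern is matched against whole words, that blacklist matching is case-sensitive, and that a check.case hit means a word matches only when case is ignored; these matching rules are assumptions for illustration, not the plug-in's documented behaviour, and the function names are hypothetical.

```python
import re

# Values taken from the ANT example (the space after "HTTP[S]?" dropped here).
BLACKLIST = r"(kilo)?metre|colour|teh|seperate"
CHECK_CASE = r"Bluetooth|HTTP[S]?|IoT|JSON|Java|Javadoc|JavaScript|XML"


def blacklisted(text: str, pattern: str = BLACKLIST) -> list[str]:
    """Whole words that match the blacklist pattern (assumed case-sensitive)."""
    rx = re.compile(rf"\b(?:{pattern})\b")
    return [m.group(0) for m in rx.finditer(text)]


def wrong_case(text: str, pattern: str = CHECK_CASE) -> list[str]:
    """Words that match a check.case entry only when case is ignored."""
    exact = re.compile(rf"^(?:{pattern})$")
    loose = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return [m.group(0) for m in loose.finditer(text) if not exact.match(m.group(0))]


print(blacklisted("The kilometre marker uses teh wrong colour."))
print(wrong_case("Send json over https to the Bluetooth device."))
```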
The spelling and grammar checker currently supports three languages:
- en - English
- de - German
- fr - French
The language checked is based on the default.language setting of the DITA Open Toolkit. This can be modified in the lib/configuration.properties file.
Please note that error messages have not been internationalized into French and currently all error messages will be displayed in English.
Sample lists of duplicated and misspelt words are available in all three languages. Grammar and punctuation lists have only been supplied for en.
Additional languages can be added by creating a new language folder and files under cfg/dictionary.
The list of misspelt words to check when spell-checking can be altered by amending entries in the XML files under cfg/dictionary. The plug-in recognizes four types of errors:
- duplicates.xml - Duplicated words.
- grammar.xml - Grammar errors (includes a ban on the use of contractions in formal text).
- punctuation.xml - Punctuation marks (includes a ban on smart quotes).
- spelling.xml - Spelling mistakes.
Each entry takes the form of a pair, a mistake and its correction:
<entry>
<mistake>accessable</mistake>
<corrected>accessible</corrected>
</entry>
<entry>
<mistake>acident</mistake>
<corrected>accident</corrected>
</entry>
<entry>
<mistake>accidentaly</mistake>
<corrected>accidentally</corrected>
</entry>
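Because every entry is a mistake/correction pair, checking text reduces to looking up each word in a table of known mistakes, which is why the approach should not raise false positives but cannot catch unlisted typos. The Python sketch below loads entries in the format shown above and reports the listed mistakes it finds. It assumes the entries sit under a single root element, named entries here purely for illustration since the full file layout is not shown; load_entries and find_mistakes are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed file layout: <entry> elements wrapped in a hypothetical <entries> root.
SPELLING_XML = """<entries>
  <entry><mistake>accessable</mistake><corrected>accessible</corrected></entry>
  <entry><mistake>acident</mistake><corrected>accident</corrected></entry>
  <entry><mistake>accidentaly</mistake><corrected>accidentally</corrected></entry>
</entries>"""


def load_entries(xml_text: str) -> dict[str, str]:
    """Map each <mistake> to its <corrected> form."""
    root = ET.fromstring(xml_text)
    return {
        entry.findtext("mistake").strip(): entry.findtext("corrected").strip()
        for entry in root.iter("entry")
    }


def find_mistakes(text: str, entries: dict[str, str]) -> list[tuple[str, str]]:
    """Return (mistake, correction) pairs for listed words found in the text."""
    words = re.findall(r"[A-Za-z']+", text)
    return [(w, entries[w.lower()]) for w in words if w.lower() in entries]


entries = load_entries(SPELLING_XML)
print(find_mistakes("The acident was not accidentaly reported.", entries))
```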
The severity of a validator rule can be altered by amending entries in the cfg/ruleset/default.xml file. The plug-in supports four severity levels:
- FATAL - Fatal rules will fail validation and cannot be overridden.
- ERROR - Error rules will fail validation. Errors can be overridden as described below.
- WARNING - Warning rules will display a warning on validation, but do not fail the validation. Warnings can also be individually overridden.
- INACTIVE - Inactive rules are not applied.
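The four levels answer three questions: is the rule applied, does a finding fail validation, and can it be overridden? The Python sketch below encodes the list above. The Severity enum and helper names are hypothetical and not part of the plug-in, and fails_validation reflects the default reporting mode only.

```python
from enum import Enum


class Severity(Enum):
    """Severity levels as listed in the documentation (hypothetical model)."""
    FATAL = "FATAL"
    ERROR = "ERROR"
    WARNING = "WARNING"
    INACTIVE = "INACTIVE"


def is_applied(severity: Severity) -> bool:
    # INACTIVE rules are not applied at all.
    return severity is not Severity.INACTIVE


def fails_validation(severity: Severity) -> bool:
    # FATAL and ERROR fail validation; WARNING only displays a warning
    # (default mode; strict mode also fails on warnings).
    return severity in (Severity.FATAL, Severity.ERROR)


def can_be_overridden(severity: Severity) -> bool:
    # FATAL cannot be overridden; INACTIVE is excluded because it never fires.
    return severity in (Severity.ERROR, Severity.WARNING)


for level in Severity:
    print(level.value, is_applied(level), fails_validation(level), can_be_overridden(level))
```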
A custom ruleset file can be passed into the plug-in using the text-rules.ruleset.file parameter:
PATH_TO_DITA_OT/bin/dita -f text-rules-echo -i document.ditamap --text-rules.ruleset.file=PATH_TO_CUSTOM/ruleset.xml
Rules can be made inactive by altering the severity (see above). Alternatively, a rule can be commented out in the XSL configuration file.
Individual rules can be ignored by passing the args.validate.ignore.rules parameter on the command line. The value of the parameter should be a comma-delimited list of the IDs of the rules to ignore.
For example, to ignore the latin-abbreviation validation rule within a document, you would run:
PATH_TO_DITA_OT/bin/dita -f text-rules-echo -i document.ditamap --args.validate.ignore.rules=latin-abbreviation
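The ignore list is a plain comma-delimited value. The Python sketch below splits it and drops findings whose rule ID appears in the list, while keeping FATAL findings on the assumption that fatal rules cannot be overridden by this parameter either. The (rule_id, severity) tuples and function names are a hypothetical representation, not the plug-in's internal data.

```python
def parse_ignore_rules(value: str) -> set[str]:
    """Split a comma-delimited rule-ID list, ignoring blanks and surrounding spaces."""
    return {rule.strip() for rule in value.split(",") if rule.strip()}


def filter_findings(findings: list[tuple[str, str]], ignore_value: str) -> list[tuple[str, str]]:
    """Drop (rule_id, severity) findings that are ignored, except FATAL ones (assumed)."""
    ignored = parse_ignore_rules(ignore_value)
    return [
        (rule_id, severity)
        for rule_id, severity in findings
        if severity == "FATAL" or rule_id not in ignored
    ]


findings = [("latin-abbreviation", "WARNING"), ("incorrect-spelling", "ERROR")]
print(filter_findings(findings, "latin-abbreviation"))
```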
Specific instances of a rule can be ignored by adding a comment within the *.dita file. The comment should start with ignore-rule, followed by a colon and the ID of the rule, and needs to be added at the location where the error is flagged:
<!--
This is an example of a spelling mistake
-->
<p>
<!-- ignore-rule:incorrect-spelling -->
I have deliberately misspelt the word accidentaly (sic) - it should be written with a double l.
</p>
- A block of DITA can be excluded from firing all rules at WARNING level by adding the comment ignore-all-warnings to the block.
- A block of DITA can be excluded from firing all rules at ERROR level by adding the comment ignore-all-errors to the block.
- Rules set at FATAL level cannot be ignored.
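Taken together, the comment-based overrides decide whether a single finding is reported. The Python sketch below models that decision, assuming the comments in scope for the flagged element have already been collected and that an ignore-rule comment takes the exact form ignore-rule:rule-id shown in the example. The is_suppressed name is hypothetical.

```python
def is_suppressed(rule_id: str, severity: str, comments: list[str]) -> bool:
    """Decide whether the ignore comments in scope suppress one finding."""
    if severity == "FATAL":
        return False  # rules set at FATAL level cannot be ignored
    texts = [c.strip() for c in comments]
    if f"ignore-rule:{rule_id}" in texts:
        return True  # this specific rule is ignored at this location
    if severity == "WARNING" and "ignore-all-warnings" in texts:
        return True
    if severity == "ERROR" and "ignore-all-errors" in texts:
        return True
    return False


print(is_suppressed("incorrect-spelling", "ERROR", [" ignore-rule:incorrect-spelling "]))
```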
A sample document, which can be used to test the plug-in rules, can be found within the plug-in. The document covers both positive and negative test cases. The sample document contains valid DITA which can be built as an HTML or as a PDF document; please use the html or pdf transform to read the contents, or examine the *.dita files directly.
A complete list of rules covered by the plug-in can be found below. The final chapters of the sample document contain a set of test DITA topics, each demonstrating a broken validation rule. The <topic> files are sorted as follows:
- The base validation DITA-OT plug-in (com.here.validate.svrl) - this <chapter> contains two common textual validation rules.
- The text-rules DITA-OT plug-in (com.here.validate.svrl.text-rules) - this <chapter> contains a set of English language spelling and grammar rules.
Spell-checker Error Messages
The following table lists the spell-checker error messages by message ID.
Message ID | Message | Corrective Action/Comment |
---|---|---|
a-followed-by-vowel | In the following text, change 'a' to 'an' where appropriate: | In English, the general guideline is that the indefinite article in front of count nouns that begin with a vowel sound should be 'an'. |
an-followed-by-consonant | In the following text, change 'an' to 'a' where appropriate: | In English, the general guideline is that the indefinite article in front of count nouns that begin with a consonant sound should be 'a'. |
blacklisted-word | The word '{word}' is not allowed in the following text: | The indicated word has been banned from use. Rewrite the phrase using alternatives. |
duplicated-punctuation | The punctuation mark '{char}' is duplicated in the following text: | The indicated punctuation mark is duplicated. Delete the duplicated punctuation mark. |
duplicated-words | The word '{word}' is duplicated in the following text: | The indicated word is duplicated. Delete the duplicated word. |
incorrect-capitalization | The word '{word}' is incorrectly capitalized in the following text: | The indicated word is not capitalized correctly. Fix the capitalization. |
incorrect-grammar | The phrase '{phrase}' is grammatically incorrect in the following text: | The indicated phrase does not make sense. Rewrite the phrase using the correct grammar. |
incorrect-punctuation | The punctuation mark '{char}' has been used incorrectly in the following text: | The indicated punctuation mark is non-standard. Replace with a corrected punctuation mark. |
incorrect-spelling | The word '{word}' is a spelling mistake in the following text: | The indicated word is not spelled correctly. Fix the spelling. It is assumed that documentation follows US English spelling conventions. |
latin-abbreviation | The accronym i.e or e.g has been used in the following text: | Latin abbreviations are difficult to understand. Consider rewriting the phrase using alternatives, such as "for example". |
sentence-capitalization | The run on sentence in the following text does not start with a capital letter | The indicated sentence is not punctuated correctly. Fix the punctuation. |
split-inifinitive | The phrase '{phrase}' is written using a split-infinitive in the following text: | The indicated sentence includes a split-infinitive, which is considered poor grammatical style - consider rephrasing the sentence. |
where-not-were | The word 'were' has been used to start a subordinate clause in the following text: | The indicated sentence does not make sense. Rewrite the phrase using the correct grammar. |
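Two of the rules in the table lend themselves to a short illustration. The Python sketch below approximates duplicated-words and a-followed-by-vowel with regular expressions; it is not the plug-in's implementation, and the function names are hypothetical. Because the choice between 'a' and 'an' depends on sound rather than spelling (a user, an hour), a letter-based check like this can only suggest where to look, which is why the messages say "where appropriate".

```python
import re

# Hypothetical approximations of two rules from the table above.
DUPLICATE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
A_BEFORE_VOWEL = re.compile(r"\ba\s+([aeiou]\w*)", re.IGNORECASE)


def duplicated_words(text: str) -> list[str]:
    """Words immediately repeated, e.g. 'the the' (duplicated-words)."""
    return [m.group(1) for m in DUPLICATE.finditer(text)]


def a_before_vowel(text: str) -> list[str]:
    """Words after 'a' that start with a vowel letter (a-followed-by-vowel)."""
    return [m.group(1) for m in A_BEFORE_VOWEL.finditer(text)]


print(duplicated_words("Open the the file and and save it."))
print(a_before_vowel("This is a example of an apple and a idea."))
```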
PRs accepted.
Apache 2.0 © 2018 - 2022 HERE Europe B.V.
See the LICENSE file in the root of this project for license details.