Selecting a Translation Tool for DITA
By Rodolfo M. Raya (rmraya@maxprograms.com)
Chief Technical Officer, Maxprograms
July 2012
Introduction
DITA is an XML vocabulary, but not just any XML. It has particularities that an ordinary XML editor or translation tool cannot easily handle.
Like an XML editor that is good for authoring in DITA, a translation tool capable of properly handling DITA files should:
- Be able to resolve DITA content references, supporting the conref attribute or the keyref mechanism;
- Be able to support DITA specializations, allowing the customization of translatable elements and attributes;
- Understand the translate attribute.
The content referencing problem
The DITA file shown in Listing 1 below has conref attributes that reference elements from the file shown in Listing 2.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="task_hdj_drv_bh">
  <title>Applying XSL Transformation</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Open the document to transform.</cmd>
      </step>
      <step>
        <cmd>In <uicontrol conref="ui_reference.dita#uiref/xsl_menu"/> menu, select <uicontrol conref="ui_reference.dita#uiref/xsl_trans"/>.</cmd>
      </step>
      <step>
        <cmd>Select the appropriate XSL Stylesheet</cmd>
      </step>
      <step>
        <cmd>Click the <uicontrol conref="ui_reference.dita#uiref/xsl_apply"/> button.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
Listing 1 - DITA topic that uses the conref mechanism
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="uiref">
  <title>UI Elements</title>
  <conbody>
    <p><uicontrol id="xsl_menu">Transformation</uicontrol>: program menu that contains all transformation options.</p>
    <p><uicontrol id="xsl_trans">XSL Transformation</uicontrol>: applies an XSL Stylesheet to an XML document.</p>
    <p><uicontrol id="xsl_apply">Apply Transformation</uicontrol>: applies the selected XSL Stylesheet to the current open document.</p>
  </conbody>
</concept>
Listing 2 - DITA topic that contains referenced text
An XML editor able to resolve the conref attributes in Listing 1 would display that file in WYSIWYG mode as shown in Figure 1.
For a technical writer working with DITA, it is important that the chosen XML editor resolves conref attributes and displays the referenced content.
For a translator it is also essential to see the text being translated in a complete representation. If conref content is not resolved when translatable text is extracted from the DITA file, the translator will lack the necessary context for performing the translation task.
In Figure 2 below you can see translatable text from Listing 1 extracted by a Computer Aided Translation (CAT) tool that supports DITA content referencing. In Figure 3 and Figure 4 you see the same text extracted by two tools that treat DITA documents as regular XML.
The pictures shown above include markers that represent the original DITA markup. In one case (Figure 2) you can see the actual text referenced by conref attributes; in the other two pictures (Figure 3 and Figure 4) you see just markers.
By using tools that extract complete sentences from your DITA sources, you give translators the context they need. Although this adds to the price you pay if your Localization Service Provider (LSP) charges by the word, the cost increase should be offset by an improvement in translation quality that reduces review work.
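To make the idea of resolving content references concrete, here is a minimal sketch in Python. It is not how any particular CAT tool works; it only illustrates the principle using trimmed copies of Listing 1 and Listing 2. The function names (find_target, resolve_conrefs) are hypothetical, only the "file#topicid/elementid" form of conref is handled, and keyref, conrefend and DTD validation are out of scope.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed copies of Listing 1 and Listing 2. The DOCTYPE declarations are
# omitted so that the sketch does not need the DITA DTDs.
TASK = """<task id="task_hdj_drv_bh">
  <title>Applying XSL Transformation</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Click the <uicontrol conref="ui_reference.dita#uiref/xsl_apply"/> button.</cmd>
      </step>
    </steps>
  </taskbody>
</task>"""

UI_REFERENCE = """<concept id="uiref">
  <title>UI Elements</title>
  <conbody>
    <p><uicontrol id="xsl_apply">Apply Transformation</uicontrol>: applies the selected XSL Stylesheet to the current open document.</p>
  </conbody>
</concept>"""


def find_target(root, topic_id, element_id):
    """Return the element with element_id inside the topic with topic_id, or None."""
    for topic in root.iter():
        if topic.get("id") == topic_id:
            for candidate in topic.iter():
                if candidate.get("id") == element_id:
                    return candidate
    return None


def resolve_conrefs(root, documents):
    """Fill every element that has a conref attribute with a copy of the referenced content.

    documents maps a file name to the parsed root element of that file
    (hypothetical structure; a real tool would load files from disk).
    """
    # Take a snapshot of the elements first, because the loop adds children.
    for element in list(root.iter()):
        conref = element.get("conref")
        if conref is None or "#" not in conref:
            continue
        file_name, fragment = conref.split("#", 1)
        if "/" not in fragment:
            continue  # references to whole topics are not handled in this sketch
        topic_id, element_id = fragment.split("/", 1)
        # An empty file name means the reference points into the same document.
        source = documents.get(file_name) if file_name else root
        if source is None:
            continue  # unknown file: leave the reference unresolved
        target = find_target(source, topic_id, element_id)
        if target is None:
            continue
        element.text = target.text
        for child in target:
            element.append(copy.deepcopy(child))
    return root


if __name__ == "__main__":
    task = resolve_conrefs(
        ET.fromstring(TASK),
        {"ui_reference.dita": ET.fromstring(UI_REFERENCE)},
    )
    print("".join(task.find(".//cmd").itertext()))
```

After resolution, the text extracted from the <cmd> element is a complete sentence instead of a sentence with a hole where the referenced text belongs.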
The customization problem
DITA includes a set of DTDs and XML Schemas that contain almost all elements and attributes needed in a standard documentation project. Nevertheless, sometimes the standard set of elements and attributes is not enough and custom extensions are needed.
DITA has a standard extension mechanism known as "specialization". DITA users are allowed to modify the default set of DTDs and XML Schemas, following certain rules, to incorporate the pieces they need.
As DITA is becoming more and more popular, many translation tool vendors include configuration files for the XML filters of their tools that facilitate text extraction from standard DITA documents. Unfortunately, not all tools support DITA specializations.
If you use specialization in your DITA projects, the translation tool used to process your files should:
- Allow you to customize the list of translatable elements and attributes;
- Allow you to incorporate your custom DTDs and XML Schemas in the tool's XML catalog (if it uses one).
Even if you don't use specializations, you may still require customized translations. For example, the standard <draft-comment> element is normally used for internal consumption and readers of the published documentation almost never see its content. Therefore, the element <draft-comment> is usually treated as untranslatable by CAT tools. However, you may still need a translation of <draft-comment> for your content reviewers. Only if you or your LSP use customizable CAT tools will you be able to get the desired translations.
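The following Python sketch illustrates what a customizable list of translatable elements means in practice. The configuration sets (BLOCK_ELEMENTS, DEFAULT_SKIPPED), the sample topic and the extract_segments function are all hypothetical; real tools store this kind of configuration in their own filter files. Translatable attributes and nested block elements are not handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical filter configuration: elements extracted as segments, and
# elements that the default configuration treats as untranslatable.
BLOCK_ELEMENTS = {"title", "p", "draft-comment"}
DEFAULT_SKIPPED = {"draft-comment"}

# Hypothetical sample topic (DOCTYPE omitted to avoid needing the DTDs).
TOPIC = """<concept id="sample">
  <title>Sample</title>
  <conbody>
    <p>Published paragraph.</p>
    <draft-comment>Check this wording with the product team.</draft-comment>
  </conbody>
</concept>"""


def extract_segments(root, block_elements, skipped):
    """Return (element name, text) pairs for the elements configured as translatable."""
    segments = []
    for element in root.iter():
        if element.tag in block_elements and element.tag not in skipped:
            # Join inline content and normalize whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                segments.append((element.tag, text))
    return segments


if __name__ == "__main__":
    root = ET.fromstring(TOPIC)
    # Default behaviour: <draft-comment> is left out.
    print(extract_segments(root, BLOCK_ELEMENTS, DEFAULT_SKIPPED))
    # Customized behaviour: reviewers also get the draft comments translated.
    print(extract_segments(root, BLOCK_ELEMENTS, set()))
```

The same function produces both results; only the configuration changes. A tool that hard-codes the skipped elements cannot offer the second option.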
Dealing with the translate attribute
Sometimes you will include portions of text in your DITA files that should not be translated. To mark those pieces as untranslatable you simply set the value of the translate attribute to no, as shown below in Listing 3.
<p translate="no">Warning: this text should not be translated.</p>
Listing 3 - Untranslatable text
Some translation tools simply ignore the translate attribute and extract the text for translation anyway.
Note that the translate attribute should be used with block level elements (those that contain full paragraphs or sentences), like <p>. Setting the translate attribute to no in an element that appears in the middle of a sentence is a bad idea, as the translator working with the surrounding text still needs to see the element content for context. Listing 4 shows how you can safely protect untranslatable text that appears in the middle of a sentence by referencing a copy stored in an untranslatable element.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="locking">
  <title translate="no">Untranslatable Title</title>
  <conbody>
    <p>This sentence contains <ph conref="#locking/lock"/> text.</p>
    <draft-comment translate="no"><ph id="lock">untranslatable</ph></draft-comment>
  </conbody>
</concept>
Listing 4 - Untranslatable inline text protected in <draft-comment>
A translation tool parsing Listing 4 should be able to:
- Ignore the <title> element;
- Include the word "untranslatable" when extracting the <p> element;
- Ignore the <draft-comment> element.
Below, in Figure 5, Figure 6 and Figure 7, you can see how three translation tools interpreted the content of Listing 4.
- All respected the translate attribute in <title>;
- Only one was able to include the referenced text in <p> for context;
- One of them presents the <draft-comment> element with nothing to translate in it.
Make sure your translation tool can ignore block elements that have the translate attribute set to no.
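As a rough illustration of that check, the Python sketch below walks a topic and collects only the block elements that remain translatable. It assumes that a translate value set on an element also applies to its descendants unless one of them overrides it; the sample topic, the BLOCK_ELEMENTS set and the translatable_blocks function are hypothetical and do not describe any specific tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic (DOCTYPE omitted to avoid needing the DTDs).
TOPIC = """<concept id="sample">
  <title translate="no">Untranslatable Title</title>
  <conbody>
    <p>This paragraph should be translated.</p>
    <p translate="no">Warning: this text should not be translated.</p>
    <draft-comment translate="no"><p>Internal note.</p></draft-comment>
  </conbody>
</concept>"""

# Elements treated as block level in this sketch (assumption, not a full list).
BLOCK_ELEMENTS = {"title", "p"}


def translatable_blocks(element, inherited=True, found=None):
    """Collect the text of block elements whose effective translate value is not "no"."""
    if found is None:
        found = []
    value = element.get("translate")
    if value == "yes":
        translate = True
    elif value == "no":
        translate = False
    else:
        translate = inherited  # attribute absent: use the value of the parent
    if element.tag in BLOCK_ELEMENTS:
        if translate:
            found.append(" ".join("".join(element.itertext()).split()))
        return found  # inline children belong to this block; do not descend
    for child in element:
        translatable_blocks(child, translate, found)
    return found


if __name__ == "__main__":
    print(translatable_blocks(ET.fromstring(TOPIC)))
```

The paragraph nested in <draft-comment> is skipped even though it has no translate attribute of its own, because the value set on its parent applies to it.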
The file handling problem
A DITA project may contain hundreds of small files. That is not unusual, but it makes file handling cumbersome.
When working with a large number of files, DITA teams may opt for using a Content Management System (CMS) or a version management system like CVS or SVN. A CMS is not really required for working with DITA but it may simplify project management.
A CMS may help you separate the files referenced by a DITA map and prepare a package for translation. If you don't have a CMS, you may use a DITA-enabled translation tool for separating the files that need translation from those that don't.
A DITA-enabled translation tool should be able to parse a DITA map and resolve the references to all topics and subtopics, preparing a unified package that you can send to your LSP.
If your LSP charges you for file management, you can reduce cost by preparing a consolidated translation package in-house.
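A first step towards such a package is listing the files a DITA map references. The Python sketch below shows one way to do it, under stated assumptions: the map content and file names are invented for the example, referenced_topics is a hypothetical helper, and nested maps, keys and content references inside the topics themselves are not followed.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical DITA map; the file names are made up for this example and the
# DOCTYPE declaration is omitted so the DTDs are not needed.
DITA_MAP = """<map>
  <title>User Guide</title>
  <topicref href="tasks/xsl_transform.dita">
    <topicref href="reference/ui_reference.dita"/>
  </topicref>
  <topicref href="tasks/xsl_transform.dita#task_hdj_drv_bh"/>
  <topicref href="http://www.example.com/external.html" scope="external" format="html"/>
</map>"""


def referenced_topics(map_root, base_dir=""):
    """Return the local files referenced by topicref elements, without duplicates."""
    files = []
    for ref in map_root.iter("topicref"):  # iter() also visits nested topicrefs
        href = ref.get("href")
        if not href or ref.get("scope") == "external":
            continue  # grouping topicrefs and external links are not packaged
        path = href.split("#", 1)[0]  # drop the topic id, keep the file part
        if not path:
            continue
        full_path = posixpath.normpath(posixpath.join(base_dir, path))
        if full_path not in files:
            files.append(full_path)
    return files


if __name__ == "__main__":
    for name in referenced_topics(ET.fromstring(DITA_MAP), "docs"):
        print(name)
```

The resulting list can then be copied or zipped into a single package, leaving out every file in the project folder that the map does not use.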
Resources
- Download a test package containing the files shown in Listing 1, Listing 2 and Listing 4 plus a DITA map and verify if your translation tool can:
  - resolve conref and href attributes;
  - understand the translate attribute;
  - generate a unified package by parsing a DITA map.
- Read Using XLIFF to Translate DITA Projects, an article prepared by the OASIS DITA Adoption Technical Committee and learn how to improve your translation workflow.
- Download a copy of Fluenta DITA Translation Manager, a tool that implements the translation workflows suggested by the DITA Adoption TC at OASIS.
About the author
Rodolfo Raya is Maxprograms' CTO (Chief Technical Officer), where he develops multi-platform translation/localisation and content publishing tools using XML and Java technology. He can be reached at rmraya@maxprograms.com.