Organizing Your DITA Files

By Rodolfo M. Raya (rmraya@maxprograms.com)
Chief Technical Officer, Maxprograms
July 2016

Introduction

If you need to work with DITA, chances are that you have to deal with many files of different types, shapes and colors. I’m writing this article to help you avoid the most common problems related to file management and DITA that I’ve seen in the last few years.

Keeping your files well organized is easy, but sometimes it takes a few mistakes to find the right way to manage all the pieces in your documentation puzzle.

Prefixes or Folders

It is quite common to find teams that prefix their topics with “C_” when referring to a Concept, “T_” when the topic is a Task and “R_” when the file is intended for Reference. If your project includes hundreds of files, finding them in a single folder is not trivial. The prefix helps a little but very often it is not enough.

If you store your files in separate folders according to their content types, you have fewer files in each folder and the prefix is not needed. Use meaningful file names and locating content becomes easier.
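As a rough sketch of what this looks like in practice, the map below points to topics stored in one folder per content type. The folder names (concepts, tasks, references), the map name and the topic file names are hypothetical and chosen only for illustration; the DOCTYPE declaration is omitted for brevity.

```xml
<!-- user_guide.ditamap: hypothetical map stored at the root of the project -->
<map>
  <title>User Guide</title>
  <!-- each href starts with the folder that holds that content type -->
  <topicref href="concepts/licensing.dita"/>
  <topicref href="tasks/installing_the_product.dita"/>
  <topicref href="references/command_line_options.dita"/>
</map>
```

With this layout the folder already tells you the topic type, so the file name can describe the content instead of carrying a prefix.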

The basic folder structure that I always use looks like this:

Figure: Basic folder structure

A DITA project exists because you want to generate output from your topics, probably in more than one format. Keep the output folder separate and store each publication type in its own subfolder. The basic structure changes to this:

Figure: Basic folders including publications

Multilingual Projects

You may not have planned it, but one day you may be required to translate your DITA project. If you have organized your files using folders as suggested above, publishing your project in a different language will be easy.

Your final goal is to have a folder structure that looks the same for every language in your project.

Figure: Multilingual folder structure

After translating your topics, all you would have to do is copy the pictures from the Images folder of the original file set to the new Images folder of the translation. After doing that, your translated project should be ready for publishing.

You may want to separate untranslatable images, like logos or background pictures, and content that you don't want to translate. Put those files in a separate folder tree and adjust the references in your topics. Doing so avoids unnecessary duplicate files in your structure.

Figure: Multilingual folder structure with folder for untranslatable files
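Adjusting the references could look like the following minimal sketch. It assumes a hypothetical layout in which each language has its own folder (en here) and shared, untranslatable files live in a sibling folder named common; all folder and file names are illustrative, and the DOCTYPE declaration is omitted.

```xml
<!-- en/concepts/product_overview.dita (hypothetical location) -->
<concept id="product_overview" xml:lang="en">
  <title>Product Overview</title>
  <conbody>
    <!-- translatable screenshot: every language folder holds its own copy -->
    <p><image href="../images/main_window.png"/></p>
    <!-- untranslatable logo: a single shared copy outside the language folders -->
    <p><image href="../../common/images/logo.png"/></p>
  </conbody>
</concept>
```

Because both paths are relative, the same references keep working in the folder of any other language, as long as that folder mirrors the structure of the original.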

 

Translating Images

If you are documenting software products, you may have to capture screens for every language in your project and place them in the corresponding folder before publishing.

If you create your own graphics for your publications, prepare them in SVG (Scalable Vector Graphics) format. Text in SVG files can easily be extracted for translation. If you use SVG graphics, all you might have to do after translation is resize the text in your images to account for differences in length.
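The small graphic below is an illustrative sketch, not taken from a real project. It shows why extraction is easy: the label is stored as ordinary text inside a text element. This only holds when the drawing tool keeps labels as text instead of converting them to outlines.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="240" height="80" viewBox="0 0 240 80">
  <rect x="10" y="10" width="220" height="60" rx="8" fill="#e8f0fe" stroke="#3367d6"/>
  <!-- the label is a real text node, so a translation tool can extract it -->
  <text x="120" y="46" text-anchor="middle" font-family="sans-serif" font-size="16">Save settings</text>
</svg>
```

If the translated label turns out longer than the original, lowering the font-size value or widening the rectangle is usually enough to make it fit.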

Keys instead of @conref

When the mechanism for content referencing using keys was included in DITA, I didn't like it. I was wrong: after dealing with some massive projects, I now prefer content referencing via keys to using the @conref attribute.

If you have a topic with content referenced from multiple files using @conref attributes, renaming or moving that topic can be a pain because you would have to update all documents that reuse its content. If you use keys and @conkeyref instead, all you have to do after moving or renaming a topic is update the place where the corresponding key is defined.
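The difference can be sketched with three fragments that would live in different files. The key name (warnings), the element id (power_off) and all paths are hypothetical, and DOCTYPE declarations are omitted.

```xml
<!-- With @conref, every reusing topic stores the path to the source file -->
<note conref="../common/warnings.dita#warnings/power_off"/>

<!-- With keys, the path is stored once, in the map -->
<map>
  <title>User Guide</title>
  <keydef keys="warnings" href="common/warnings.dita"/>
  <topicref href="tasks/replacing_the_battery.dita"/>
</map>

<!-- The reusing topics only mention the key and the id of the element -->
<note conkeyref="warnings/power_off"/>
```

If warnings.dita is later moved or renamed, only the href of the keydef element changes; the topics that use @conkeyref stay untouched.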

Version Control Systems

Content changes all the time. It seems that the quest for improving topics never ends. One day you realize that you need something as it was before the last change and the quest for old content recovery begins.

If you use a version control system, like Git, Subversion (SVN) or CVS, you should be able to retrieve old versions of your content from the system repository when the need to go back in time arises.

If you work in a team and many writers may potentially introduce simultaneous changes in a topic, a version control system becomes essential for resolving text conflicts.

Years ago CVS was the best tool available. Subversion, also known as SVN, was created to solve some management problems present in CVS. Git appeared later as a distributed alternative.

I used to work with CVS and then adopted SVN because CVS did not track moved files and folders. Both SVN and Git work very well for controlling DITA sources.

Some XML editors capable of working with DITA include support for SVN. If your editor supports SVN, don’t think twice: use it.

CVS, SVN and Git are free tools. There is no excuse for not using one of them.

Common Sense

The ideas explained above may seem simple and just a matter of common sense. Nevertheless, I’ve seen many translation projects that involve DITA fail because the principles outlined in this article were not considered by the technical writers.

Apply the concepts presented here in your DITA projects, adjusting them to fit your particular needs, and you will not regret making any of the required changes.

About the author

Rodolfo M. Raya

Rodolfo Raya is the CTO (Chief Technical Officer) of Maxprograms, where he develops multi-platform translation/localisation and content publishing tools using XML and Java technology. He can be reached at rmraya@maxprograms.com.