Testing for common problems

All professional function testers have their favorite problem areas on which they concentrate, such as bounds testing, stress testing, and common path testing. This is also the case for the translation test professional, who has mastered the language for which they test and developed a mental list of common problem areas. Most of these apply to all target language translations, but others are specific to a given language, culture, or country. Let's focus on the former, since these points can be addressed without first mastering the target language. They can also be applied early in the verification test phase, that is, before your translation test experts arrive and time and money start to slip away.

Tip: Entering accented characters

Windows support for the US-International keyboard layout is helpful for entering accented characters on a QWERTY keyboard (go to Control Panel > Keyboard > Input Locales). It enables so-called "dead key" input of accented characters. For example, typing the single quote followed by the letter you want to accent produces that letter with the acute accent ('+e = é, '+a = á, etc.). Similarly, the caret (^) adds the circumflex (â), the grave accent key (`) adds the grave accent (à), and the double quote (") adds the dieresis (ä).

However, while this enables QWERTY keyboard users to enter accented characters, it does not assure that the equivalent accented key on a "native" keyboard will work. This is true simply because the key scan code sequences will be different in the two cases, permitting the possibility that an input error is not detected on a non-native keyboard. Input errors for double-byte characters are equally problematic. And while Windows does support double-byte character input on a non-DBCS machine, the setup is more intrusive than simply toggling keyboard layouts -- it involves installing DBCS fonts and using multi-stroke key entry.

Testing for failure to use locale-sensitive functions

This type of problem centers around the locale-specific display of data, covered in detail in How to Internationalize your Eclipse Plug-in and the Java Tutorial: Internationalization trail. Developers can test this in advance by changing their regional settings for numbers, currency, time, and dates, then validating that these fields are indeed displayed using the current locale and that input is accepted as expected. Verify that sorted lists are correct, even if they include accented characters. The precise collation order may not be obvious to a non-native speaker and thus will have to wait for native translation testers, but generally you can expect unaccented characters and their accented counterparts to sort near each other. A failure to use the Collator class or its equivalent for sorting is therefore easy to spot, because a binary compare will typically place the accented character quite far from its counterpart (for example, a = \u0061 but á = \u00e1, which a binary compare places after z = \u007a).
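As a minimal sketch of the sorting check described above, the class below sorts the same three strings twice: once with the default binary comparison and once with a Collator. The class name, method names, and the choice of the French locale are illustrative assumptions, not part of any product code.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class CollationCheck {

    /** Sorts by UTF-16 code unit value, which is what String.compareTo does. */
    static List<String> binarySort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts using the collation rules of the given locale. */
    static List<String> localeSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        Collator collator = Collator.getInstance(locale);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // \u00e1 is the letter a with an acute accent.
        List<String> words = List.of("z", "\u00e1", "a");
        System.out.println("binary:   " + binarySort(words));
        System.out.println("collator: " + localeSort(words, Locale.FRENCH));
    }
}
```

With the binary sort the accented letter lands after "z"; with the Collator it lands directly after "a". A list that shows the first ordering in a translated product is a strong hint that the sort is not locale-sensitive.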

The good news is that failures to use Java's locale-sensitive classes are less common than those problems that are listed further on in this article, owing perhaps to the fact that locale-dependent fields are in the minority. Programmers also tend to have a more intuitive appreciation for the nature of NL problems solved by these classes, as opposed to those introduced by the translation process itself. Before moving on to the more subtle testing concerns below, remember to consider input field entry. Verify that you can enter accented characters and that they are not corrupted by codepage transformations (or lack thereof) when re-read from permanent storage. Verify that double-byte characters are not corrupted (split) by non-Unicode aware manipulations.
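The codepage corruption mentioned above can be reproduced in a few lines. The sketch below assumes a simplified "storage" step that is nothing more than an encode followed by a decode; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CodepageRoundTrip {

    /** Simulates writing text with one charset and reading it back with another. */
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        byte[] stored = text.getBytes(writeWith);
        return new String(stored, readWith);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset on both sides: the text survives.
        String ok = roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(input));

        // Mismatched charsets: the two UTF-8 bytes of the accented e
        // are read back as two separate Latin-1 characters.
        String bad = roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.equals(input) + ", length " + bad.length());
    }
}
```

A test that enters only unaccented text will never expose this, because ASCII characters are encoded identically in both charsets.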

Testing for new code problems introduced by the translation

It is difficult to generalize about the characteristics of this type of problem, but common threads point to the erroneous assumption that the form of NL data adheres to the programmer's native language, or that the data is not NL sensitive at all. Examples include parsing text under the assumption that the separator will be a period or space, or inadvertently using NL data to define or modify character set-sensitive data like database column names without coding appropriate user entry field validation.
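To illustrate the separator assumption, here is a small sketch that parses a number written with German conventions. It contrasts Double.parseDouble, which always expects a period as the decimal separator, with a locale-aware NumberFormat. The class and method names are assumptions made for the example.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class LocaleAwareParsing {

    /** Parses using the separators of the given locale. */
    static double parseAmount(String text, Locale locale) throws ParseException {
        return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
    }

    /** Returns true if the text is accepted by the fixed, locale-independent parser. */
    static boolean acceptedByFixedFormat(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ParseException {
        String german = "1.234,56"; // period groups thousands, comma is the decimal separator
        System.out.println(acceptedByFixedFormat(german));
        System.out.println(parseAmount(german, Locale.GERMANY));
    }
}
```

The same idea applies to dates, times, and list separators: code that parses user-visible text should obtain its separators from the locale rather than hardcode them.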

Testing for hardcoded strings

Hardcoding a string is surely the "granddaddy" of all TVT errors, outnumbering all others by far. There are two basic approaches to ferreting them out: the black-box approach and the white-box approach. The white-box approach relies on scanning the Java source code, XML, and HTML looking for hardcoded strings. The External Strings wizard of the Eclipse Workbench Java Development Tooling (JDT) automates the Java source code scanning process.

The black-box approach is manually intensive, as it relies on executing testcase scenarios that display all the product's user interface elements (views, menus, dialogs, etc.) on a non-US English workstation to a tester who validates all text is translated.

There is, however, a gray-box approach that falls somewhere in between. It involves translating all the text into an easily recognizable, benign string with the same number of characters and words. For example, the information message below:

Import resources from the local file system

would become:

****** ********* **** *** ***** **** ******

If your product test team was wise enough to use an automated test tool, their scripts can be modified to detect the cases where strings are hardcoded and were not detected by the black-box approach. These omissions could be because of untranslated third-party products that your product uses, composed strings, etc. If you don't have the benefit of automated testing, this approach will still simplify manual hardcoded string detection. Furthermore, it has the advantage of requiring no specific language skills. Some testers prefer the "Pig Latin" approach. Our example above would then be:

Importway esourcesray omfray ethay ocallay ilefay ystemsay

This has the advantage of retaining some level of readability while making it equally clear what text has not been translated.
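Both substitutions are simple enough to script. The following is a rough sketch, not a production tool: the class and method names are hypothetical, and the Pig Latin rule used here (move the leading consonants to the end and add "ay", or add "way" when the word starts with a vowel) is one common variant chosen because it reproduces the example above.

```java
// Hypothetical pseudo-translation helper, for illustration only.
final class PseudoTranslator {

    /** Replaces every letter with '*', leaving spaces, digits, and punctuation alone. */
    static String mask(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(Character.isLetter(c) ? '*' : c);
        }
        return out.toString();
    }

    /** Treats a, e, i, o, u as vowels, and y as a vowel except at the start of a word. */
    private static boolean isVowelAt(String word, int index) {
        char c = Character.toLowerCase(word.charAt(index));
        return "aeiou".indexOf(c) >= 0 || (c == 'y' && index > 0);
    }

    static String pigLatinWord(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int firstVowel = 0;
        while (firstVowel < word.length() && !isVowelAt(word, firstVowel)) {
            firstVowel++;
        }
        if (firstVowel == 0) {
            return word + "way";            // starts with a vowel
        }
        if (firstVowel == word.length()) {
            return word + "ay";             // no vowel at all
        }
        return word.substring(firstVowel) + word.substring(0, firstVowel) + "ay";
    }

    /** Converts each space-separated word independently. */
    static String pigLatin(String text) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            words[i] = pigLatinWord(words[i]);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String message = "Import resources from the local file system";
        System.out.println(mask(message));
        System.out.println(pigLatin(message));
    }
}
```

A real script would also need to leave property keys, message parameters such as {0}, mnemonic markers such as &, and escape sequences untouched. The mask above happens to preserve {0} because it only replaces letters, but the Pig Latin conversion makes no such allowance.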

Testing for inadvertently translated strings

The gray-box approach is perhaps the most rigorous method to validate that all strings have been translated. But it also is helpful in detecting that there are not any inadvertent translations. Consider, for example, an extract of the properties file org.eclipse.jdt.internal.formatter\options.properties below:

style.reuseExistingLayout.name=&Reuse existing layout
style.reuseExistingLayout.possibleValues=2|Reuse|Do not reuse
style.reuseExistingLayout.description=If the user formatted code a certain way, ...

Translating all the apparent user text to asterisks would include the key and value below:

style.reuseExistingLayout.possibleValues=2|*****|** *** *****

While this may not cause a run-time bug per se, it is not evident that these values actually represent the key used to persist the user's preference (in other words, "Reuse|Do not reuse" are never displayed to the end user). If the translator unknowingly translated two terms that are different in English into the same word in their target language, it is possible that changing one preference would overwrite an unrelated preference.

That is the risk of mixing translatable text and run-time parameterization in the same properties files. This is certainly a valid programming technique, but it requires that the translator be aware of it, at a minimum by adding comments to the properties file itself. Better still, use values that are clearly programmatic. Returning to our example:


Here it is obvious to both the translator and their translation tools that the values after the equal sign should not be modified.

Testing for text expansion problems

The average text expansion of English to several European languages is around 40%. Consider these examples. First, one English word translated to two German words:

Restart -> Neu starten

Not a problem, an expansion of seven characters to eleven. But here's a surprise. Two English words translated to one (long) German word:

Counter Logs -> Leistungsindikatorenprotokolle

Ouch, an expansion of 12 characters to 30! While the German language is well recognized for text expansion relative to English, it is not the only one. Consider the Académie française's official French equivalent of "air bag": coussin gonflable de sécurité. To address this in development before a translation is available, you can modify the text of your properties files to double their length. To make it obvious that this is a testcase, a simple script can double each word. Taking our example from above once again:

Import import resources resources from from the the local local file file system system

Now rerun your application and verify that the page layouts are still appropriate, that text is not truncated, etc. If a page is resizable, resize it from each of the four corners. Recognize that even this test has a weakness, since it assumes that phrases will include spaces and can word wrap; this is not true for some non-Latin character based languages.
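The word-doubling script mentioned above could look something like the sketch below. The class and method names are hypothetical, and lowercasing the repeated word is simply a choice made to match the doubled example shown earlier.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical text-expansion helper, for illustration only.
final class WordDoubler {

    /** Repeats each space-separated word once, lowercasing the repeated copy. */
    static String doubleWords(String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : text.split(" ")) {
            if (word.isEmpty()) {
                continue; // skip the empty tokens produced by consecutive spaces
            }
            out.add(word);
            out.add(word.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleWords("Import resources from the local file system"));
    }
}
```

As with any pseudo-translation, apply it only to the values in a properties file, never to the keys, and take care not to duplicate message parameters such as {0}.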

For those cases where the layout demands that the translated text be minimal, document the limit in the original language file. For example, adding a comment:

# translation note: Text below is limited to approximately 60
# characters, see testcase 1.13 to validate

Here the translator is alerted to the fact that verbosity is not allowed. The testcase reference will allow the translation tester to validate that the text is not truncated, because the choice of letters affects the final text width in proportional fonts. Translators may also need the ability to specify layout constraints, such as a column width in a table, especially those that are not resizable:

# translation note: Width in pixels of the "Completed"
# column. The text is a 'C' in US English, it should
# be as short as possible in the translation.

Here the translator or translation tester can provide an optimal size without resorting to unnaturally terse translations, while taking into consideration the default fonts associated with the language and operating system.


Text truncation implies the inability to access the complete text, even employing alternative user interface mechanisms like scrolling, displaying another dialog, or providing a "More >>" button. Clearly text truncation with no means to visualize it is more serious than simply requiring that the user scroll the viewing area. For these reasons and for reasons of accessibility, avoid creating text areas that cannot scroll or wrap.

Testing for font size changes

Font pixel size for different operating systems and languages can change in both height and width. The majority of these cases are handled by the base widgets; for example, a text field will resize automatically to accommodate a larger font, or autoscroll if necessary. But if you draw your own graphics and text, the appropriate system metrics must be queried in order to avoid arbitrary text truncation.

This is especially true for double-byte language fonts, where the minimum height is generally larger than that of single-byte language fonts. As an aside, handling font size changes is not required for Section 508 accessibility compliance, but it is surely appreciated by those who wish to choose a larger font because of a visual impairment. You can perform a quick test on Windows by increasing the size of the default font and restarting your machine (Display > Settings > Advanced > General > Font Size > Large Fonts). You will be surprised how many of the applications you use every day fail this simple test. Don't follow their example!

Testing for out-of-context translation errors

One aspect that directly impacts the quality of these translations is the clarity of the context. This is readily apparent when translating property files:

eojMessage = {0} at {1}

This example is fictitious, but it demonstrates the point. What is the context in which this message is displayed? The first parameter could be the date and the second the time. But maybe the first is a resource and the second is the URL of the server where it was stored. Without a comment to indicate which applies, the translator will simply translate it literally, guessing at the most likely meaning.
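To see why the pattern alone cannot settle the question, consider this small sketch. It feeds the same fictitious pattern two entirely different pairs of arguments; the class name and all argument values are made up for the illustration.

```java
import java.text.MessageFormat;

// Hypothetical demonstration, for illustration only.
final class AmbiguousPattern {

    // The fictitious pattern from the properties file above.
    static final String EOJ_MESSAGE = "{0} at {1}";

    static String format(Object... arguments) {
        return MessageFormat.format(EOJ_MESSAGE, arguments);
    }

    public static void main(String[] args) {
        // Reading 1: a date and a time.
        System.out.println(format("2002-06-14", "10:30"));
        // Reading 2: a resource and the server where it was stored.
        System.out.println(format("report.txt", "http://example.com/files"));
    }
}
```

Both calls are valid uses of the pattern, yet many languages would translate "at" differently in the two cases. Only a comment in the properties file tells the translator which one the code actually intends.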

Here is an actual example from the Eclipse Workbench:

WorkbenchPreference.autobuild =
  Perform build automatically on resource modification

In this context, "on" means "when a resource is modified." This may be clear to a Workbench user, but the first translation is done by a central organization without specific product knowledge. Since "resource modification" means little to them, "on" could be interpreted literally, as in, "on top of," or quite narrowly in a programmatic sense, as in, "as a super task."

Consider adding notes to the property files so translators will know the context of a given message. It is especially important in those cases where the subject is implied, since many languages must explicitly know the subject in order to choose the appropriate adjective and verb forms. Here are a few messages from the JDT that are displayed as markers within the Java source code editor, augmented with example translator-friendly comments:

# Note: The error messages below are displayed in the Java source
#     editor with an "X" next to the offending line. They are also
#     displayed in the Task List with a reference back to the file.
#     Double-clicking the error in the Task List will open the editor
#     and scroll to the corresponding line number.

# The subject is a method, i.e., "Method must return a result of type 'x'"
108 = Must return a result of type {0}

# "Variable (or field) must provide either dimension expressions...
159 = Must provide either dimension expressions or an array initializer

# "Class must implement the inherited abstract method 'x'"
400 = Must implement the inherited abstract method {0}

# "This class overrides deprecated method from 'its superclass'"
412 = Overrides deprecated method from {0}

Nobody expects a developer writing such a message to be aware of the subtle interpretations of a phrase. The examples above demonstrate that even given the developer's best efforts, there is no substitute for translation testing.

Testing for missed translations

In addition to performing the TVT on the running version of the translated product, you can use the Property Files Compare view to speed up the testing cycle and detect common errors, resulting in a higher quality NL version of your product.

Briefly, the objectives of the tool are two-fold:

  • Allow a speedy verification of the translation by displaying the content of the translated file side-by-side with its corresponding original language file. This is an "out of context" verification, but gives the testers a chance to rapidly review the translation and guarantee that 100% of the files are translated and converted to the correct codepage.

  • Ensure your NL product, once released, will not throw an exception due to missing keywords from the translated files.

Remember there is a gap of time between the moment the original source language files were sent for translation and the time TVT started. It is during this period that the code -- as well as some of the original source language files -- could have changed. New keywords may have been added to them that are not in the translation, resulting in a run-time error.

To resolve this situation, use this view to compare side-by-side the source language files against their corresponding translated files. The tool flags missing keywords and gives you a chance to correct the files.
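The missing-keyword check itself is easy to approximate outside the tool. The sketch below is not the Property Files Compare view; it is a minimal stand-in that assumes both files have already been read into strings, and the class name, method names, and sample keys are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class MissingKeyCheck {

    /** Parses properties-file content that is already in memory. */
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Returns the keys present in the original file but absent from the translation. */
    static Set<String> missingKeys(Properties original, Properties translated) {
        Set<String> missing = new TreeSet<>(original.stringPropertyNames());
        missing.removeAll(translated.stringPropertyNames());
        return missing;
    }

    public static void main(String[] args) {
        Properties original = parse("open.label=Open\nclose.label=Close\nsave.label=Save\n");
        // The translation was produced before save.label was added.
        Properties translated = parse("open.label=Ouvrir\nclose.label=Fermer\n");
        System.out.println(missingKeys(original, translated));
    }
}
```

Running the same comparison in the other direction, with the arguments swapped, reveals keys that were removed from the original after the files were sent for translation.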
