Localizing Your iOS/OS X Apps the Right Way: Stringsdict to the Rescue!

The Problem of Plurals

Borys Zibrov
Universal Language

--

Translation is hard. Localization is harder. There are many nuances and intricacies to deal with, and each platform and library adds its own complications. Unfortunately, developers often don’t think about internationalization until the main coding is done, and even when they do, they may not be aware of the details of other languages, so they make costly blunders.

So, engineers, read this article! I’ll discuss using Apple’s .stringsdict file format to solve one of the most persistent problems in localization: allowing strings with gendered nouns and plurals to be properly localized and used in your iOS/OS X apps. I’ll show you some basic and more advanced usages of this file format, explain its strong and weak points, and show you how we parse and handle those files at Smartling. Please note that I won’t say much about genders here, because the same principles apply as for plurals and because stringsdict gender support is not as well defined or mature.

But first I need to define the problem to make sure we all understand why Apple decided to create yet another localization format.

Remember the last time you saw a message with a “file(s)”-style plural in it?

Hopefully a long time ago or, perhaps, never (well, the Windows command line comes from MS-DOS times, right?). That’s an easy example in plain English, but there’s something disturbing here: a single message is used for any number of files processed.

To understand why this is disturbing, you need to know about the plural rules of different languages. In English, we write ‘file’ for one file and ‘files’ for any other number, but what if a language has more than two forms? This topic is widely covered elsewhere, so please see the CLDR website for a quick explanation and a chart for all the languages here (you can find other useful internationalization charts and articles on the website).

So, in the example above, the string is not only somewhat hard to read but also impossible to translate into languages that have a different number of plural forms. Well, you could try something like “1 файл(и)|(ів) було успішно опрацьовано” for Ukrainian, with its 4 plural forms, but that’s just ugly!

Now, back to our example: let’s discuss implementation. You can of course code all the rules for this case with a simple if-then-else statement:

if (number of files == 0) then
    show the message for no files
else if (number of files == 1) then
    show the message with ‘file’
else
    show the message with ‘files’
But that’s only for English. For Ukrainian, you will have 4 branches, for Arabic six! That’s messy. This is where all of those different internationalization libraries come in. You ask the library to get a specific plural form for a specific locale, and it does all the dirty work. Depending on the language and platform, your mileage might vary.
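To make the branching concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) of the CLDR plural rules for Ukrainian, restricted to whole-number counts. The names uk_plural_category and uk_file_word are hypothetical and exist only for this illustration. The fourth CLDR category for Ukrainian, ‘other’, applies only to fractional numbers, so it is left out here.

```c
#include <stdio.h>

/* CLDR plural categories reachable for whole numbers in Ukrainian.
   The "other" category covers fractions only and is omitted (assumption:
   counts are non-negative integers). */
typedef enum { PLURAL_ONE, PLURAL_FEW, PLURAL_MANY } PluralCategory;

/* Hypothetical helper: maps a count to its Ukrainian plural category. */
PluralCategory uk_plural_category(unsigned long n) {
    unsigned long mod10 = n % 10;
    unsigned long mod100 = n % 100;

    /* one: ends in 1, but not in 11 (1, 21, 31, ...) */
    if (mod10 == 1 && mod100 != 11) {
        return PLURAL_ONE;
    }
    /* few: ends in 2-4, but not in 12-14 (2, 3, 4, 22, ...) */
    if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
        return PLURAL_FEW;
    }
    /* many: everything else (0, 5-20, 25-30, ...) */
    return PLURAL_MANY;
}

/* Hypothetical helper: picks the form of the word "file" for a count. */
const char *uk_file_word(unsigned long n) {
    switch (uk_plural_category(n)) {
        case PLURAL_ONE: return "файл";
        case PLURAL_FEW: return "файли";
        default:         return "файлів";
    }
}
```

Writing and maintaining a function like this for every supported language is exactly the work an internationalization library takes off your hands.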

The final missing piece is how localization messages are stored. I assume you use some common format for messages and not a custom solution like storing all the localizable strings in a database and exporting and importing translations all the time (obviously not the best approach). Different file formats were developed for different platforms: .properties files for Java messages, .strings files for iOS/OS X apps, XML files with a specific schema for Android, YAML for Ruby, and so on. You ask your localization library for a string, it does a lookup in the message source, and it returns the string.

Now, the catch is that not all file formats were developed with plurals and genders in mind (remember my comment at the beginning that the people writing software might not be familiar with the intricacies of how different languages work?). If you’ve chosen a file format with zero support for language-specific rules (say, Java properties, XLIFF, or JSON), you’re pretty much stuck, unless someone has created an extended library for your file format that you can leverage (like Smartling did for iOS/OS X .strings files back when there was no stringsdict format, see here). Now it should be clear why Apple came up with a new file format supporting plural and gender rules (one of only a few formats to support genders!) in iOS 7 and OS X 10.9 (release notes).

Stringsdict overview

You can read about the file format here and see how it’s used here. I’ll also give a quick example and explain the key parts:

<key>%tu match(es) found</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@tu_matches_found@</string>
    <key>tu_matches_found</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>tu</string>
        <key>zero</key>
        <string>No matches found</string>
        <key>one</key>
        <string>%tu match found</string>
        <key>other</key>
        <string>%tu matches found</string>
    </dict>
</dict>
  • First, there’s a key (%tu match(es) found) that will be referenced in your NSLocalizedString macro. Then, there’s a dict with all the crucial information.
  • Inside the dict there’s an NSStringLocalizedFormatKey whose value may contain one or many variables (in our example the variable is %#@tu_matches_found@) and text. Other valid examples are: %#@chapters@, Book a room for %#@num_nights@, %1$#@geese@ landed on %2$#@fields@.
  • Then follows a dictionary for every variable (in our example just one) that contains information on plural / gender rules. NSStringFormatSpecTypeKey gives the type of the rule: plural (NSStringPluralRuleType) or gender (the official documentation doesn’t say what value should be used for genders, and gender support seems to have started working only in iOS 9). NSStringFormatValueTypeKey (in our example, tu) gives the string format specifier. Then the plural / gender rules follow (in our example, the zero / one / other rules). Note that although zero is not an official CLDR plural form for English, you can still use it, as it might be convenient.

You then retrieve your string from the dictionary with the NSLocalizedString macro:

NSString *localizedString = [NSString localizedStringWithFormat:NSLocalizedString(@"%tu match(es) found", nil), count];

That’s a very good example of separation of concerns: you don’t have to think about plurals (well, almost) when you write the code. But, as you can see, the format is a bit noisy, and it’s quite easy to mess things up during translation (as a translator you have to identify the translatable parts, insert all the special words in their respective places, and perhaps even add or remove plural forms). I’m not a linguist or a translator, but I guess stringsdict can look daunting the first time you see it.
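To make the lookup less mysterious, here is a conceptual model, in plain C, of what the zero / one / other entries in the example dictionary amount to for English. It is only a sketch: the function name format_matches is hypothetical, it uses C’s %zu specifier with size_t instead of %tu, and it says nothing about how Foundation actually implements the lookup.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical model of the English entries in the example dictionary.
   Writes the formatted message to `out` and returns snprintf's result. */
int format_matches(char *out, size_t out_size, size_t count) {
    if (count == 0) {
        /* the optional "zero" entry */
        return snprintf(out, out_size, "No matches found");
    }
    if (count == 1) {
        /* the "one" entry */
        return snprintf(out, out_size, "%zu match found", count);
    }
    /* the "other" entry covers every remaining count */
    return snprintf(out, out_size, "%zu matches found", count);
}
```

With stringsdict, these branches live in the localized resource file, so a language with more plural forms simply gets more entries while the calling code stays the same.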

So, assuming you now understand how to use a simple stringsdict file with one variable and one dictionary, let’s move on to more complex examples.

Having some text in NSStringLocalizedFormatKey

Consider an example:

<key>%d geese landed on a field</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@geese@ landed on a field</string>
    <key>geese</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>tu</string>
        <key>one</key>
        <string>One goose</string>
        <key>other</key>
        <string>%tu geese</string>
    </dict>
</dict>

There’s one translatable string here with 2 plural forms (English) but it’s scattered across 3 different places. It’s a bad idea to present these strings as different entities to the translator as context could be lost, conjugations might go awry etc. So, when Smartling processes a file like this, we first move all the text out of NSStringLocalizedFormatKey to the dictionary, leaving only one variable there, like this:

<key>%d geese landed on a field</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@geese@</string>
    <key>geese</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>tu</string>
        <key>one</key>
        <string>One goose landed on a field</string>
        <key>other</key>
        <string>%tu geese landed on a field</string>
    </dict>
</dict>

From an internationalization perspective, both files are equivalent. We could have used the first form and done all the transformations internally in our system, but that would be much more complex, and I don’t see any real benefits.

In the Smartling Translation Interface it looks like this:

It will even work if the localized format key has a variable in some random position, say “On a field %#@geese@ landed”.
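The move described above is essentially a substitution: each plural form replaces the variable inside the format string, wherever the variable sits. A minimal sketch in plain C follows. The function name inline_plural_form is hypothetical, and this illustrates the idea rather than Smartling’s actual implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replaces the first occurrence of `token`
   (e.g. "%#@geese@") in `format` with one plural `form`, writing the
   result to `out`. Returns 0 on success, -1 if the token is missing
   or the buffer is too small. */
int inline_plural_form(const char *format, const char *token,
                       const char *form, char *out, size_t out_size) {
    const char *hit = strstr(format, token);
    if (hit == NULL) {
        return -1;
    }

    size_t prefix_len = (size_t)(hit - format);   /* text before the variable */
    const char *suffix = hit + strlen(token);     /* text after the variable */

    /* The pieces are passed as %s arguments, so any '%' they contain
       is copied verbatim rather than interpreted. */
    int written = snprintf(out, out_size, "%.*s%s%s",
                           (int)prefix_len, format, form, suffix);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Calling it once per plural form with the format "%#@geese@ landed on a field" yields "One goose landed on a field" and "%tu geese landed on a field", and the format key itself is then reduced to the bare variable.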

Multiple Plurals

Now, what if we have a couple of plurals in one string? Consider this more complex example:

<key>geese.landed.ct</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%1$#@geese@ landed on %2$#@fields@</string>
    <key>geese</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>d</string>
        <key>one</key>
        <string>A goose</string>
        <key>other</key>
        <string>%d geese</string>
    </dict>
    <key>fields</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>d</string>
        <key>one</key>
        <string>1 field</string>
        <key>other</key>
        <string>%d fields</string>
    </dict>
</dict>

You can see that here we have some text, and two variables. For each of the variables there’s a dict with plural rules. To process such files at Smartling we do what we’ve done in the previous example: move text and all but one variable out of the NSStringLocalizedFormatKey.

<key>geese.landed.ct</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%1$#@geese@</string>
    <key>geese</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>d</string>
        <key>one</key>
        <string>A goose landed on %2$#@fields@</string>
        <key>other</key>
        <string>%1$d geese landed on %2$#@fields@</string>
    </dict>
    <key>fields</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>d</string>
        <key>one</key>
        <string>1 field</string>
        <key>other</key>
        <string>%3$d fields</string>
    </dict>
</dict>

Let’s see how this looks in the Translation Interface:

As you can see, for the translator %d and %#@fields@ are placeholders. Placeholders are not translated; they can only be moved around (or deleted, but that’s usually an error). We apply ordering to placeholders (like this: %1$d, %2$#@fields@) in case they need to be swapped (for instance, %1$#@fields@, %2$d). But from a stringsdict perspective, %#@fields@ is not a placeholder; it’s just a variable.

So there is a trade-off here for us: either leave %#@fields@ alone and risk a mess if this variable is not copied to the translation (which basically means the translator needs to be ‘stringsdict-aware’), or convert it to a placeholder and risk non-consecutive numbering (assuming we’re not willing to make a special type of placeholder just for stringsdict).

You still retrieve the formatted string like this:

NSString *localizedString = [NSString localizedStringWithFormat:NSLocalizedString(@"geese.landed.ct", nil), geeseCount, fieldsCount];

Here geeseCount is supplied for the %1$d format specifier and fieldsCount for the %3$d format specifier, even though the two are non-consecutive. This works if they are swapped, too. So here we rely on implementation-dependent Objective-C behavior, which is not guaranteed and may change in the future. And we are being cautious! Strictly speaking, according to IEEE Std 1003.1, specifying the Nth argument requires that all the leading arguments, from the first to the (N-1)th, are specified in the format string. This means you need to write a workaround function if you want to omit some arguments (which is often the case).
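For comparison, here is the numbered-argument behavior that POSIX does define, sketched in plain C with a hypothetical function name. Both arguments are referenced in the format string, so the numbering has no gaps and the order of appearance can differ freely from the order of the arguments. Skipping a number, as in the %1$d / %3$d case above, is precisely what this sketch avoids.

```c
#include <stdio.h>

/* Hypothetical helper: formats a sentence that mentions the second
   argument before the first, using POSIX numbered argument specifiers.
   Every argument (1 and 2) appears in the format string, as POSIX requires. */
int format_fields_first(char *out, size_t out_size, int geese, int fields) {
    return snprintf(out, out_size,
                    "%2$d fields received %1$d geese", geese, fields);
}
```

Numbered specifiers are a POSIX extension rather than ISO C, so this sketch assumes a POSIX-conforming C library such as those on OS X and Linux.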

To sum it up, we have covered the theory of localizing plurals, the structure of the stringsdict format, simple and more advanced usage examples, and some details of our implementation. Even though you can create more complex expressions inside NSStringLocalizedFormatKey involving paragraphs of text and dozens of variables, I suggest you keep things nice and simple when possible. Split messages into smaller units of high cohesion, and please do make use of the nice and smart rule-based features of stringsdict files.

--
