I was quite enthusiastic the first time I came across Online Quality Models. The idea of Reviewers providing quality feedback in the Translation Management System (TMS) where the translations reside always seemed to make sense to me. It is essential to the Linguistic Review process that feedback becomes available to all participants in the supply chain, a part of the Global Reference Material and not a piece of individual criticism from one linguist to another.
Online quality evaluation also opens the way to more scalable reporting and meaningful quality trend monitoring. If Translation Management Systems could offer a linguistic quality dashboard, there would be no need to periodically collect data, run macros and pivot tables to support translation quality management.
The reality, however, is that very few of the TMS setups I’ve come across use Online QA Models. In this post, we look at the reasons for this apparent failure and what the future of interactive quality evaluation might be.
Some of the comments and conclusions below are based on the results of a small survey we ran recently. Thanks to all who participated or sent screen caps and comments directly!
What are Quality Models?
Quality models are standards and metrics used to attempt to quantify linguistic quality. As we all know, quality is somewhat subjective. Linguistic quality is possibly even more so. And everyone who has worked with translation review knows that there are times when linguists will just not be able to agree on a particular translation. The balancing act between grammatical accuracy and engaging style is a prime example, in the area of Gaming or Social Media for instance.
In order to be able to evaluate translation quality, how it meets the customer’s requirements and very importantly how it evolves over time, it is necessary to assemble a predefined set of references and select a measuring scale.
In Localisation, the set of references is traditionally made up of a number of building blocks:
- the pre-existing translations: local websites, applications, video games, perhaps the competition’s content
- the tools used by linguists to capture this content and ensure consistency (glossaries, translation memories etc.)
- the client’s expectations, which can be captured in style guides, marketing instructions, end-user feedback
- and the language rules: grammar, vocabulary, field of speech, trends.
Quality models are the scale used to quantify a translation’s success at meeting these requirements. They should not be confused with Quality Checks, which refer to more objective parameters such as punctuation rules, adherence to Glossary, in-line formatting (bold, URLs placement) and which can usually be run automatically. Quality models set the rules by which linguistic reviewers evaluate translations.
Two of the most recognised translation quality standards are the LISA model and SAE J2450. The first was developed by the now defunct Localization Industry Standards Association and uses 3 ratings against 39 categories and subcategories (see screenshot). The second comes from the Automotive industry and, while it is often listed as an alternative to LISA, its 7 categories and 2 ratings are restrictive due to a lack of emphasis on Style and an overemphasis on accuracy.
A large proportion of Linguistic Review service providers use a simplified, more manageable version of the LISA model. Typically, the Ratings are Critical, Major, Minor, and perhaps the much debated Preferential. The number of points associated with each varies, even between 2 content types for the same language team. Those ratings are often applied to 3 or 4 Categories such as Accuracy, Style, Language rules and Consistency with reference material.
What are Online QA Models?
Some translation tools, mostly the online translation interface of Translation Management Systems, offer a facility for reviewers to rate and categorise errors, usually at segment level.
In other words, instead of having to create a report, reviewers can enter their feedback directly in the online translation tool. Not only can they enter comments, there is an interface for them to select pre-defined error categories and severity levels in the very same interface where the translators translated.
The ratings and categories are completely customisable. The system administrators can set up projects to use out-of-the-box settings, such as the aforementioned LISA QA model.
They can also create a completely new set of categories and ratings and may in some cases assign a different amount of points to be deducted for certain errors depending on the content type. For example a high severity error in the Style category might fail a project immediately if it is high visibility content such as Marketing. But the same error category and rating might cause a much lower amount of points to be deducted for a content type where Style is less of a priority, for example Support content.
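The content-type weighting described above can be sketched in a few lines of Python. This is only an illustration: the `Profile` and `Error` classes, the point values, the pass threshold and the auto-fail rule are all assumptions made up for the example, not settings taken from any particular TMS or from the LISA model.

```python
from dataclasses import dataclass, field


@dataclass
class Error:
    category: str   # e.g. "Style", "Accuracy"
    severity: str   # e.g. "Critical", "Major", "Minor"


@dataclass
class Profile:
    # (category, severity) -> points deducted; values are hypothetical
    penalties: dict
    # (category, severity) pairs that fail the job outright
    auto_fail: set = field(default_factory=set)
    # maximum points a job may accumulate and still pass (hypothetical)
    max_points: int = 10


def evaluate(errors, profile):
    """Return (passed, points) for a list of errors under one profile."""
    points = sum(profile.penalties.get((e.category, e.severity), 0) for e in errors)
    auto_failed = any((e.category, e.severity) in profile.auto_fail for e in errors)
    return (not auto_failed and points <= profile.max_points), points


# Same error, two content types: Style matters more for Marketing than Support.
MARKETING = Profile(
    penalties={("Style", "Critical"): 10, ("Style", "Major"): 5},
    auto_fail={("Style", "Critical")},
)
SUPPORT = Profile(penalties={("Style", "Critical"): 3, ("Style", "Major"): 1})

errors = [Error("Style", "Critical")]
print(evaluate(errors, MARKETING))  # fails immediately
print(evaluate(errors, SUPPORT))    # passes with a small deduction
```

Keeping the penalties in a per-content-type profile, rather than hard-coding them, mirrors the way administrators are described as adjusting the model per project.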
Another advantage for the reviewers is that they no longer need to copy & paste source segments and existing translations to a spreadsheet-based report or scorecard. They no longer need to copy & paste their suggested new version from that scorecard to the comments section, or indeed back to the target segment in the TMS when they are in charge of implementation.
Instead the scorecard becomes a virtual concept within the online translation project, immediately available to the Translators, assuming the project gets back to them after review.
Failure to Launch
Unfortunately, a variety of factors have prevented online QA Models from realising their full potential. Although our survey was too small for the figures to have real statistical meaning, a few of these factors clearly emerged in the replies.
User acceptance and Lack of Visibility during quality follow-up were the top 2 answers, with equal votes, to the question “What would you say are the biggest obstacles to deploying and using Online QA Models?” Loss of Reviewer Productivity came close behind. Respondents were Client-side Managers, LSP Project Managers, Reviewers and TMS vendors in almost equal numbers, so it is significant that usability from the linguists’ point of view was seen as the main issue.
This is not really a surprise. Like Translators, Reviewers prefer to work offline because:
- they need a copy of their deliveries to use during quality follow-up, including after the online job has been submitted to the next step in the workflow or indeed completed.
- desktop CAT tools offer far more automatic quality checks and productivity tools. For example, WorldServer’s Browser Workbench doesn’t even have a Spell checker or Terminology consistency checker.
- limited internet bandwidth can make working online cumbersome for users who are geographically far from where the TMS is hosted.
- they may need a record of their work in case of escalations and for invoicing.
- finally, the User Interface for inputting quality data is always click-intensive and lacking in shortcuts.
Spreadsheets are a resilient species
Scorecards are often the main deliverable for Review services. Reviewers don’t always implement changes, or indeed review complete jobs. Instead they take samples of significant size (1,000 words is often considered the standard). They might implement all changes, but only rate and categorise errors on samples.
Furthermore, the amount of review might differ depending on the level of confidence a translation team has achieved in the previous sampling period. Or, different again, they might be measuring work which is only partly done through the TMS. The rest, for example DTP projects, might be translated completely offline.
The virtual scorecard has to offer at least the same level of flexibility to all participants in the program.
Reporting and Monitoring
In most if not all cases, reporting is not available out-of-the-box in the TMS. It has to be customised, using skills such as SQL or InfoPath, which are less readily available to small and mid-size LSPs than the ability to create Excel macros. All Review Services run macros on batches of scorecards to deliver periodic reports detailing trend analysis, quality improvement plans etc.
Achieving the same results in a TMS report environment requires the initial support of the TMS Administrator (or even the TMS vendor themselves) and almost inevitably proves less achievable than the old reliable macro and pivot table approach.
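To make the comparison concrete, the kind of aggregation a macro or pivot table typically performs over a batch of scorecards can be sketched as below. The row layout (`language`, `period`, `words`, `points`), the sample figures and the "points per 1,000 words" metric are assumptions chosen for illustration, not a format used by any specific Review service.

```python
from collections import defaultdict


def trend(rows):
    """Aggregate scorecard rows into error points per 1,000 reviewed words,
    keyed by (language, period)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [points, words]
    for row in rows:
        key = (row["language"], row["period"])
        totals[key][0] += row["points"]
        totals[key][1] += row["words"]
    return {
        key: round(points * 1000 / words, 2)
        for key, (points, words) in sorted(totals.items())
        if words  # skip periods with no reviewed words
    }


# Hypothetical scorecards: two samples in one quarter, one in the next.
scorecards = [
    {"language": "fr-FR", "period": "Q1", "words": 1000, "points": 8},
    {"language": "fr-FR", "period": "Q1", "words": 1000, "points": 4},
    {"language": "fr-FR", "period": "Q2", "words": 2000, "points": 6},
]

for key, score in trend(scorecards).items():
    print(key, score)
```

Normalising by word count is one way to keep periods with different sample sizes comparable; other programs may prefer a pass rate or a weighted score.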
How can we move past this status quo?
All these permutations and moving goal posts lead us to the conclusion that the Translation Management Systems are not necessarily the best home for a Quality Center.
Before it can gain momentum, Online Quality Evaluation must deliver on 3 levels:
- there must be sufficient Return-on-investment for all involved to input this information into a TMS: why add it, if it then has to be exported again before it can be reused?
- the Standards issue must be resolved with 1 set of flexible rules creating sufficient consensus within the industry, and overcoming the static nature often attributed to the LISA Model.
- the Technology must also offer a standard for Quality Evaluation data to be efficiently exchanged between online and offline tools.
Regarding the Standards, the Localisation industry is notorious for its inability to agree on them, but several initiatives from TAUS, W3C and Interoperability Now are aiming in the right direction. The demise of LISA has left a gap for a new Organisation, let alone new Standards, to emerge. The industry desperately lacks globally accepted Standards and Quality Measurement is only one of the affected areas.
TAUS’s recent release of the Dynamic Quality Framework, which was covered in great detail in an article by Sharon O’Brien, lecturer at Dublin City University, aims at addressing the issues in existing Models by offering an array of new modular standards which can be selected and adjusted to the needs of a particular Program. It tackles the growing need for evaluating the quality of MT outputs, the changing nature of what linguistic quality means in modern content like Gaming, and offers an independent platform with dedicated tools.
Standardisation work is also in progress within the context of the World Wide Web Consortium, where the MultilingualWeb-LT Working Group is finalising the Internationalization Tag Set 2.0. ITS 2.0 provides, among other things, protocols to embed Localization Quality ratings within web file formats. These are used by participating companies such as Okapi and VistaTEC in the development of standalone editing tools like VistaTEC’s Reviewer’s Workbench, where reviewers can add and edit these tags. The metadata can then be aggregated, for example by crawlers, to build automatic and scalable reports.
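As a rough illustration of that last step, the sketch below counts quality issues by type in an HTML fragment annotated with the ITS 2.0 Localization Quality Issue attributes. The attribute names (`its-loc-quality-issue-type`, `its-loc-quality-issue-severity`, `its-loc-quality-issue-comment`) are quoted as I understand them from the ITS 2.0 specification and should be checked against it; the sample markup and the `IssueCollector` class are invented for the example, and standoff annotations are ignored.

```python
from collections import Counter
from html.parser import HTMLParser


class IssueCollector(HTMLParser):
    """Counts inline ITS 2.0 quality issue annotations by issue type."""

    def __init__(self):
        super().__init__()
        self.issues = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        issue_type = dict(attrs).get("its-loc-quality-issue-type")
        if issue_type:
            self.issues[issue_type] += 1


def count_issues(html):
    collector = IssueCollector()
    collector.feed(html)
    return collector.issues


# Hypothetical reviewed page with three inline annotations.
SAMPLE = """
<p>Bienvenue dans <span its-loc-quality-issue-type="misspelling"
   its-loc-quality-issue-severity="50">l'aplication</span>.
<span its-loc-quality-issue-type="style"
   its-loc-quality-issue-comment="Too formal for a game">Veuillez procéder</span></p>
<p><span its-loc-quality-issue-type="misspelling">conection</span></p>
"""

print(count_issues(SAMPLE))
```

A crawler would run the same collection over every published page and feed the counts into a report, which is what makes the approach scale without manual scorecard collection.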
There is definitely a need to improve the automation around translation quality measurement. The proliferation of TMSs works against standardisation in general. The focus of TMSs tends to be their clients’ needs: connection to Content Management Systems, business analysis, macro reporting. There is often a disconnect with the requirements of the linguists, which are assumed to be fulfilled by the CAT tool. It seems then that the Quality Evaluation data must be embedded either in the bilingual files (e.g. XLIFF) or in project packages via some open-source XML format (is .scx for Score Card eXchange already trademarked?). Quality Evaluation data must be efficiently transferred from the offline translation environment to an online Quality Center, whether this is via the TMS or not, as suggested in our end-to-end workflow at the top of this article.
- Online QA Models Survey (localizationlocalisation.wordpress.com)