Localization, Localisation

Practical and concise answers to common questions in G11N, I18N and L10N

Archive for the ‘Linguistic Review’ Category

Online QA Models: Untapped Potential

Posted by Nick Peris on February 5, 2013

End-to-end Quality Evaluation Integration

I was quite enthusiastic the first time I came across Online Quality Models. The idea of Reviewers providing quality feedback in the Translation Management System (TMS) where the translations reside always seemed to make sense to me. It is essential to the Linguistic Review process that feedback becomes available to all participants in the supply chain: part of the Global Reference Material rather than a piece of individual criticism from one linguist to another.

Online quality evaluation also opens the way to more scalable reporting and meaningful quality trend monitoring.  If Translation Management Systems could offer a linguistic quality dashboard, there would be no need to periodically collect data, run macros and pivot tables to support translation quality management.

Adam Houser, AVP Intellectual Property & Localization at FM Global

The reality however is that very few of the TMS setups I’ve come across use Online QA Models. In this post, we look at the reasons for this apparent failure and what might be the future of interactive quality evaluation.

Some of the comments and conclusions below are based on the results of a small survey we ran recently. Thanks to all who participated or sent screen caps and comments directly!

What are Quality Models?

Quality models are standards and metrics used to attempt to quantify linguistic quality. As we all know, quality is somewhat subjective. Linguistic quality is possibly even more so. And everyone who has worked with translation review knows that there are times when linguists will just not be able to agree on a particular translation. The balancing act between grammatical accuracy and engaging style is a prime example, in areas such as Gaming or Social Media for instance.

In order to evaluate translation quality, how well it meets the customer’s requirements and, very importantly, how it evolves over time, it is necessary to assemble a predefined set of references and to select a measuring scale.

In Localisation, the set of references is traditionally made up of a number of building blocks:

  • the pre-existing translations: local websites, applications, video games, perhaps the competition’s content
  • the tools used by linguists to capture this content and ensure consistency (glossaries, translation memories etc.)
  • the client’s expectations, which can be captured in style guides, marketing instructions and end-user feedback
  • the language rules: grammar, vocabulary, field of speech, trends.

Quality models are the scale used to quantify a translation’s success at meeting these requirements. They should not be confused with Quality Checks, which refer to more objective parameters such as punctuation rules, adherence to the Glossary, or in-line formatting (bold, URL placement), and which can usually be run automatically. Quality models set the rules by which linguistic reviewers evaluate translations.
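To make the distinction concrete, here is a minimal sketch of the kind of objective checks a tool can run automatically. It is not taken from any particular TMS; the glossary, segments and rules are invented for the example.

```python
# A minimal sketch of automatic Quality Checks (glossary adherence and end
# punctuation), as opposed to the judgement calls covered by a Quality Model.
# The glossary and segments are invented for illustration.

GLOSSARY = {"dashboard": "tableau de bord", "workflow": "flux de travail"}

def check_segment(source: str, target: str) -> list[str]:
    """Return the objective issues found in one translated segment."""
    issues = []
    # Glossary adherence: if a source term appears, its approved translation should too.
    for src_term, tgt_term in GLOSSARY.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"Glossary: expected '{tgt_term}' for '{src_term}'")
    # Punctuation: source and target should end with the same punctuation mark.
    if source.rstrip().endswith((".", "!", "?", ":")) and source.rstrip()[-1] != target.rstrip()[-1:]:
        issues.append("Punctuation: end punctuation differs from source")
    return issues

print(check_segment("Open the dashboard.", "Ouvrez le panneau"))
# ["Glossary: expected 'tableau de bord' for 'dashboard'", "Punctuation: end punctuation differs from source"]
```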

The LISA QA Model Online in SDL TMS

Two of the most recognised translation quality standards are the LISA model and SAE J2450. The first was developed by the now defunct Localization Industry Standards Association and uses 3 ratings against 39 categories and subcategories (see screenshot). The second comes from the Automotive industry and, while it is often listed as an alternative to LISA, its 7 categories and 2 ratings are restrictive due to a lack of emphasis on Style and an over-emphasis on accuracy.

A large proportion of Linguistic Review service providers use a simplified, more manageable version of the LISA model. Typically, the Ratings are Critical, Major, Minor, and perhaps the much debated Preferential. The number of points associated with each varies, even between two content types for the same language team. Those ratings are often applied to 3 or 4 Categories covering Accuracy, Style, Language rules and Consistency with reference material.

What are Online QA Models?

Lionbridge Translation Workspace Online Review

Some translation tools, mostly the online translation interface of Translation Management Systems, offer a facility for reviewers to rate and categorise errors, usually at segment level.

In other words, instead of having to create a report, reviewers can enter their feedback directly in the online translation tool. Not only can they enter comments, they can also select pre-defined error categories and severity levels in the very same interface where the translators worked.

Rating and Categorising Errors in SDL TMS

The ratings and categories are completely customisable. The system administrators can set up projects to use out-of-the-box settings, such as the aforementioned LISA QA model.

They can also create a completely new set of categories and ratings, and may in some cases assign a different number of points to be deducted for certain errors depending on the content type. For example, a high-severity error in the Style category might fail a project immediately if it is high-visibility content such as Marketing. But the same error category and rating might cause a much smaller number of points to be deducted for a content type where Style is less of a priority, for example Support content.
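As a rough sketch of how such a weighted setup could be expressed, assuming invented categories, ratings and point values:

```python
# Hypothetical QA model configuration: the same error rating costs more points
# in content types where the category matters more. All figures are invented.

BASE_PENALTY = {"Minor": 1, "Major": 5, "Critical": 10}

# Multiplier applied per category, depending on the content type.
CONTENT_WEIGHTS = {
    "Marketing": {"Style": 2.0, "Accuracy": 1.0},
    "Support":   {"Style": 0.5, "Accuracy": 1.5},
}

def penalty(content_type: str, category: str, rating: str) -> float:
    weight = CONTENT_WEIGHTS[content_type].get(category, 1.0)
    return BASE_PENALTY[rating] * weight

# A Critical Style error is costly in high-visibility Marketing content...
print(penalty("Marketing", "Style", "Critical"))  # 20.0
# ...but far less so in Support content, where Style is a lower priority.
print(penalty("Support", "Style", "Critical"))    # 5.0
```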

Another advantage to the reviewers is they no longer need to copy & paste source segments and existing translations to a spreadsheet-based report or scorecard. They no longer need to copy & paste their suggested new version from that scorecard to the comments section, or indeed back to the target segment in the TMS – when they are in charge of implementation.

Reporting Errors in WorldServer Browser Workbench

Instead the scorecard becomes a virtual concept within the online translation project, immediately available to the Translators, assuming the project gets back to them after review.

Failure to Launch

Unfortunately, a variety of factors have prevented Online QA Models from realising their full potential. Although our survey was too small for the figures to have real statistical meaning, a few of these factors clearly emerged in the replies.

User acceptance

User acceptance and Lack of Visibility during quality follow-up were the top two answers to the question “What would you say are the biggest obstacles to deploying and using Online QA Models?”, with equal votes. Loss of Reviewer Productivity came close behind. Respondents were Client-side Managers, LSP Project Managers, Reviewers and TMS vendors in almost equal numbers, so it is significant that usability from the linguists’ point of view was seen as the main issue.

This is not really a surprise. Like Translators, Reviewers prefer to work offline because:

  1. they need a copy of their deliveries to use during quality follow-up, including after the online job has been submitted to the next step in the workflow or indeed completed.
  2. desktop CAT tools offer far more automatic quality checks and productivity tools. For example, WorldServer’s Browser Workbench doesn’t even have a Spell checker or Terminology consistency checker.
  3. limited internet bandwidth can make working online cumbersome for users who are geographically far from where the TMS is hosted.
  4. they may need a record of their work in case of escalations and for invoicing.
  5. finally, the User Interface for inputting quality data is always click-intensive and lacking in shortcuts.

Adam Houser, AVP Intellectual Property & Localization at FM Global

Spreadsheets are a resilient species

Scorecards are often the main deliverable for Review services. Reviewers don’t always implement changes, or indeed review complete jobs. Instead they take samples of significant size (1,000 words is often considered the standard). They might implement all changes, but only rate and categorise errors on samples.

Furthermore, the amount of review might differ depending on the level of confidence a translation team has achieved in the previous sampling period. Different again, they might be measuring work which is only partly done through the TMS. The rest, for example DTP projects, might be translated completely offline.

The virtual scorecard has to offer at least the same level of flexibility to all participants in the program.

Reporting and Monitoring

Creating a Custom QA Model in WorldServer

In most if not all cases, reporting is not available out-of-the-box in the TMS. It has to be customised, using skills such as SQL or InfoPath, which are less readily available to small and mid-size LSPs than the ability to create Excel macros. All Review Services run macros on batches of scorecards to deliver periodic reports detailing trend analysis, quality improvement plans etc.

Achieving the same results in a TMS report environment requires the initial support of the TMS Administrator (or even the TMS vendor themselves) and almost inevitably proves less achievable than the old reliable macro and pivot table approach.
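For what it is worth, the same periodic trend report can be produced from a batch of exported scorecards with a few lines of scripting rather than Excel macros. A rough sketch, assuming the scorecards are exported as CSV with the column names shown:

```python
# Rough sketch of a trend report built from exported scorecards instead of
# Excel macros and pivot tables. File layout and column names are assumptions.
import glob
import pandas as pd

frames = [pd.read_csv(path) for path in glob.glob("scorecards/*.csv")]
errors = pd.concat(frames, ignore_index=True)
# Assumed columns: language, month, category, rating, points

trend = errors.pivot_table(
    index="language", columns="month", values="points", aggfunc="sum", fill_value=0
)
print(trend)  # points deducted per language per month, to spot quality trends
```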

How can we move past this status quo?

All these permutations and moving goal posts lead us to the conclusion that Translation Management Systems are not necessarily the best home for a Quality Center.

Before it can gain momentum, Online Quality Evaluation must deliver on 3 levels:

  1. there must be sufficient Return-on-investment for all involved to input this information into a TMS: why add it, if it then has to be exported again before it can be reused?
  2. the Standards issue must be resolved, with one set of flexible rules creating sufficient consensus within the industry and overcoming the static nature often attributed to the LISA Model.
  3. the Technology must also offer a standard for Quality Evaluation data to be efficiently exchanged between online and offline tools.

Liam Armstrong, Localization Manager at Symantec

Regarding the Standards, the Localisation industry is notorious for its inability to agree on them, but several initiatives from TAUS, W3C and Interoperability Now! are aiming in the right direction. The demise of LISA has left a gap for a new Organisation, let alone new Standards, to emerge. The industry desperately lacks globally accepted Standards, and Quality Measurement is only one of the affected areas.
TAUS’s recent release of the Dynamic Quality Framework, which was covered in great detail in an article by Sharon O’Brien, lecturer at Dublin City University, aims at addressing the issues in existing Models by offering an array of new modular standards which can be selected and adjusted to the needs of a particular Program. It tackles the growing need to evaluate the quality of MT output and the changing nature of what linguistic quality means in modern content like Gaming, and offers an independent platform with dedicated tools.

ITS 2.0 implementation – Prototype Interface for Reviewer’s Workbench

Standardisation work is also in progress within the World Wide Web Consortium, where the MultilingualWeb-LT Working Group are finalising the Internationalization Tag Set 2.0. ITS 2.0 provides, among other things, protocols to embed Localization Quality ratings within web file formats. These are used by participating companies such as Okapi and VistaTEC in the development of standalone editing tools like VistaTEC’s Reviewer’s Workbench, where reviewers can add and edit these tags. The metadata can then be aggregated, for example by crawlers, to build automatic and scalable reports.
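To give an idea of what such embedded metadata looks like, here is a small sketch that wraps a reviewed segment in ITS 2.0 Localization Quality Issue attributes. The attribute names follow my reading of the ITS 2.0 data category; the segment, severity value and comment are invented.

```python
# Sketch of ITS 2.0 Localization Quality Issue markup on a reviewed segment.
# Attribute names are based on my reading of the ITS 2.0 specification;
# the sample text, severity and comment are invented.

ITS_NS = "http://www.w3.org/2005/11/its"

def annotate(target_text: str, issue_type: str, severity: int, comment: str) -> str:
    """Wrap a target segment in a span carrying the reviewer's quality metadata."""
    return (
        f'<span xmlns:its="{ITS_NS}" '
        f'its:locQualityIssueType="{issue_type}" '
        f'its:locQualityIssueSeverity="{severity}" '
        f'its:locQualityIssueComment="{comment}">'
        f"{target_text}</span>"
    )

print(annotate("flux de traduction", "terminology", 50, "Glossary term not used"))
```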

There is definitely a need to improve the automation around translation quality measurement. The proliferation of TMSs works against standardisation in general. The focus of TMSs tends to be their clients’ needs: connection to Content Management Systems, business analysis, macro reporting. There is often a disconnect with the requirements of the linguists, which are assumed to be fulfilled by the CAT tool. It seems then that Quality Evaluation data must be embedded either in the bilingual files (e.g. XLIFF) or in project packages via some open-source XML format (is .scx for Score Card eXchange already trademarked?). Quality Evaluation data must be efficiently transferred from the offline translation environment to an online Quality Center, whether this is via the TMS or not, as suggested in our end-to-end workflow at the top of this article.
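Purely as a thought experiment, a minimal version of that hypothetical scorecard exchange format might carry little more than the sample details, the errors found and the resulting score. Every element and value below is invented:

```python
# Entirely hypothetical "Score Card eXchange" payload, invented to make the
# idea concrete; no such format or schema exists as far as I know.
import xml.etree.ElementTree as ET

scorecard = ET.Element(
    "scorecard", project="Spring campaign", language="fr-FR", sample_words="1000"
)
error = ET.SubElement(
    scorecard, "error", segment="42", category="Style", rating="Major", points="5"
)
error.text = "Tone too literal for marketing copy"
ET.SubElement(scorecard, "score").text = "95"

# This payload could travel inside a project package alongside the bilingual files.
print(ET.tostring(scorecard, encoding="unicode"))
```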

The final point I wanted to bring up is that the volumes of content translated are exploding. This is progressively leading to a polarisation of translation projects between very high visibility content on one hand and very high volume on the other. Translators’ skills are pulled in the two corresponding directions of copywriting/transcreation and MT post-editing respectively. Quality Evaluation must respond to that by providing flexible Standards. The measuring scale we’ve been discussing must be able to correctly rate the work of Translator A, who is expected to free him or herself from the shackles of segment-level thinking and the rigorous matching of source content, in order to create localised games or ad campaigns that do not read like they were translated. But this same measuring scale must also fairly rate the work of Translator B (possibly the same person working on a different project), who must translate millions of words by the end of yesterday at the lowest cost possible because the target text will only ever be read once.

Posted in Linguistic Review, Quality Management

The Value of Professional Linguistic Review

Posted by Nick Peris on December 19, 2011

Basic Translation and Review Workflow

All Translators I know are consummate professionals, who take great pride in the quality of their work. They are well used to using various sources of reference material to ensure they meet the expectations of their customers, and they systematically proofread their work before delivery. Most of them use CAT tools, which allow them to maximise consistency and partly automate quality control.

Translation agencies and Language Service Providers all offer what is known as TEP (Translation, Editing and Proofreading) as their most basic level of service. TEP provides a systematic Quality Assurance process, often involving several linguists with various levels of seniority.

And yet independent linguistic review services are one of the most dynamic sectors in our industry. This article explains why it is so successful and what you should take into consideration if you are ready to take this particular plunge.

Scalability

I am not always a strong supporter of outsourcing, but in the case of linguistic review there are compelling arguments in its favour.

Let’s first ask ourselves, who typically are the in-house reviewers? Two of the most common categories are linguists on one hand, and in-country Marketing and Brand staff on the other. It can be difficult for a company which purchases translation services to keep dedicated linguists in full-time employment. Product releases are often seasonal, or at least vary in pace from one month to the next, and the associated translation requirements follow the development cycles. Conversely, it may be difficult for in-country staff who are not linguists to commit to Localisation schedules. Review is a secondary task for them and they cannot drop everything else when review activity peaks. Moreover, they are unlikely to have the tools and skills a professional linguist employs.

A third-party linguistic review partner can provide the best of both worlds:

Translation, Linguistic and SME Review Workflow

  • in-country linguists who will become familiar with your international and local brand identity,
  • dedicated resources who can develop expertise based on your existing content
  • flexible workload to meet your peaks in translation activity
  • staff working on multiple accounts so they are easily redeployed when you do not need them full-time.

Sectors like the Life Sciences or heavy vehicle industries even require SMEs (Subject Matter Experts) as an alternative or additional Review step, to ensure your translations are not only of the highest linguistic quality but also technically and legally accurate.

Error categorisation

Professional review services use customisable error categorisation. Often based on the LISA model, these categories are used to classify errors and better decide on corrective and preventive actions.

Here are a few examples of categories and possible actions:

  • Terminology
    • Ensure Glossaries are used
    • Review the Terminology maintenance process (new Terms should be proposed continuously, approved periodically)
    • Root out the use of local copies by providing a Portal
    • Use a tool to automate Terminology checks
  • Style
    • Ensure Style guides are used
    • Review Style guides periodically (once or twice a year)
    • Root out the use of local copies by providing a Portal
    • Put in place a system to advertise Style guide updates
  • Consistency
    • Provide access to Global TMs for Concordance search
    • Provide a searchable linguistic query management tool (please see section on Query Management below)
    • Encourage communications between linguists during the translation process
  • Accuracy
    • Agree on linguistic references
    • Improve the translators’ proofreading process
    • Use tools to automate grammar or spell checks

Error Ratings

Measuring quality requires clearly defined and pre-agreed criteria, independence of the rater, and historical data analysis so judgments can be made according to trends and not just levels.

As with categorisation, error rating is often based on industry-standard classifications like the LISA QA Model. The reviewer inputs the rating for each error found. This is mostly reported using QA report spreadsheets but can also be fully integrated in Workflow technology such as WorldServer or SDL TMS. Each rating is associated with a number of points which is often deducted from a starting score of 100%.

A score can then be calculated for a project, job or sample. A Pass/Fail rate can even be decided in advance, with the Fails prompting for different levels of corrective actions, especially if they are repeated.
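As a worked example, with invented point values, sample size and pass mark:

```python
# Worked example of the scoring described above: start from 100%, deduct a
# number of points per error, then compare against a pre-agreed pass mark.
# Point values and threshold are invented for illustration.

PENALTY = {"Minor": 1, "Major": 5, "Critical": 10}
PASS_THRESHOLD = 95  # percent

def score(error_ratings: list[str]) -> int:
    return 100 - sum(PENALTY[rating] for rating in error_ratings)

sample = ["Minor", "Minor", "Major"]       # errors found in a 1,000-word sample
result = score(sample)                     # 100 - (1 + 1 + 5) = 93
print(result, "PASS" if result >= PASS_THRESHOLD else "FAIL")  # 93 FAIL
```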

Reviewer Implementation Workflow

Corrective actions

Implementation may be the responsibility of the Translator or the Reviewer. Letting the Translators implement the changes ensures they are aware of every change recommended by the Reviewer. On the other hand, allowing the Reviewer to implement their own changes speeds up the overall process, because the translation does not have to “change hands” again before it is delivered.

Whatever the choice is, a solid arbitration process must be in place. Translators must have an opportunity to discuss the Reviewer’s recommendations but it is advisable to set in advance the number of times this feedback loop is allowed to happen on a particular project, or the schedule will be affected by excessive discussions.

In the case of repeated concerns with one language or one set of Translators an escalation of the corrective actions may be needed. This may take the shape of closer collaboration between Translators and Reviewers, detailed training and improvement plans. Change in personnel or similar sanctions can occur as a last resort.

The proactive approach

Reviewers can bring a great deal of value by taking part during the translation process rather than only after it. Think of it as prevention instead of cure.

Query Management

An efficient Query process promotes communication between Reviewers and Translators, and enables the Translators to consult with the Reviewer during the translation process. The aim is to avoid their having to make decisions which may or may not be approved during Review. The challenge in setting this up is that the Reviewer’s work becomes more difficult to measure and price. However, the use of a Query database should allow linguists to research previously answered Queries and compensate Reviewers based on the number of Queries answered.

Integrated Query Management and Sampling Workflow

A slightly different process needs to be set up for Source Queries. Answering those questions about the source text may be an area where your in-country Brand and Marketing staff, as well as content creators and other stakeholders, remain involved with the Translation supply chain. Ideally this should happen through the same Query database as Linguistic Queries.
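A minimal sketch of what such a shared, searchable Query database could look like, assuming a simple invented schema (the real thing would typically sit behind a web portal or the TMS):

```python
# Minimal sketch of a searchable Query database. Schema and data are invented.
import sqlite3

db = sqlite3.connect("queries.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS queries (
        id INTEGER PRIMARY KEY,
        project TEXT, language TEXT, kind TEXT,  -- 'linguistic' or 'source'
        question TEXT, answer TEXT, answered_by TEXT
    )
""")
db.execute(
    "INSERT INTO queries (project, language, kind, question, answer, answered_by) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("Spring campaign", "de-DE", "linguistic",
     "Should 'dashboard' stay in English?", "No, use the glossary term.", "reviewer_a"),
)
db.commit()

# Translators research previously answered Queries before raising new ones...
for question, answer in db.execute(
    "SELECT question, answer FROM queries WHERE question LIKE ?", ("%dashboard%",)
):
    print(question, "->", answer)

# ...and Reviewers can be compensated per Query answered.
print(db.execute("SELECT answered_by, COUNT(*) FROM queries GROUP BY answered_by").fetchall())
```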

Linguistic Asset Management

Reviewers may also be the ideal people to have the responsibility for maintaining Linguistic Assets such as Glossaries, Translation Memories or Style guides.

While Translators are the first linguists to get exposed to new content, the Reviewers should have a more global overview of your content, particularly if you use more than one LSP. A suggestion process is required for Translators to request new Terminology, Global changes in legacy translations or standardisation through Style guide updates. But the Reviewers are likely to be the only ones who can coordinate feedback from multiple sources. Professional Reviewers are experienced Translators and they often double-up as Terminologists.

For this to be successful, it is essential to have a central repository where all involved can access the latest version of each piece of reference material at any time. This can be a Translation Management System or a separate repository like SharePoint, eRoom etc. It should prevent the use of local copies as much as possible, and an email notification system can be used to advertise updates, at least for the more stable elements like the Style guides.

The update process may also need to be scheduled with clear cut-off and update publication dates if failure to comply results in errors  measurable during Review.

Cost effectiveness

Reviewers are usually experienced Translators and the hourly cost of a Reviewer can be substantially higher than that of a Translator.

This is easily offset by the value they bring if the process is set up correctly, even if you don’t move from a setup where review was done by in-house staff.

Professional review will lower the volume and therefore cost of error fixing. It will increase the quality and consistency of your content, and reinforce in-country brand integrity.

In more mature translation chains, the ratings are sometimes used to target languages where full review is required versus those where sampling might be enough because quality has been observed to be consistently high. In such cases, the make-up of the Reviewer’s role should transition to less review work and more production support activity through Query and Asset Management.

Posted in Linguistic Review, Quality Management