TM match categories

Description

The CAT tool should fully support the match classification system used by WorldServer (e.g. ICE matches and repaired matches).

Environment

None

Activity

Chase Tingley
January 13, 2017, 5:29 PM

We've dealt with this before, and my memory is that it's not really supported well in the Okapi XLIFF filter. If you want to support both formats, you will need to check for markup specific to SDLXLIFF and to the older Idiom XLIFF (iws:).

Phil Ritchie
February 20, 2017, 4:03 PM

There are two items of metadata which seem relevant: <iws:segment-metadata tm-score="76"></iws:segment-metadata> and <iws:status match-quality="fuzzy"></iws:status>.
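As a rough illustration of reading those two items, plus the state-qualifier attribute, here is a minimal Python sketch. It is not how Ocelot or Okapi does it. The namespace URI in the sample is a placeholder, the nesting of iws:status inside iws:segment-metadata is assumed for the example, and `read_match_metadata` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. "urn:example:iws" is a placeholder namespace URI,
# and the nesting of iws:status inside iws:segment-metadata is an assumption.
SAMPLE = """\
<trans-unit xmlns:iws="urn:example:iws" id="1">
  <source>Hello</source>
  <target state-qualifier="fuzzy-match">Bonjour</target>
  <iws:segment-metadata tm-score="76">
    <iws:status match-quality="fuzzy"/>
  </iws:segment-metadata>
</trans-unit>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def read_match_metadata(xml_text):
    """Collect the TM score, match quality and state-qualifier from one unit.

    Elements are matched on local name so the sketch does not depend on the
    exact namespace URI bound to the iws: prefix.
    """
    result = {"tm_score": None, "match_quality": None, "state_qualifier": None}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name == "segment-metadata" and "tm-score" in elem.attrib:
            result["tm_score"] = float(elem.attrib["tm-score"])
        elif name == "status" and "match-quality" in elem.attrib:
            result["match_quality"] = elem.attrib["match-quality"]
        elif name == "target" and "state-qualifier" in elem.attrib:
            result["state_qualifier"] = elem.attrib["state-qualifier"]
    return result


if __name__ == "__main__":
    print(read_match_metadata(SAMPLE))
```

Matching on local names keeps the sketch independent of the real namespace declaration, at the cost of possibly picking up same-named elements from other namespaces.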

Ocelot already reads and renders the state-qualifier attribute (e.g. <target state-qualifier="fuzzy-match"></target>) via a configuration in the rules.properties file: the colour defined there is applied to the segment labels.

Is your requirement to be able to "see" the exact match score (e.g. 76%), or is just the category of the match enough (e.g. "fuzzy")? If just the category, is colouring the segment labels a satisfactory mechanism?

Francesco Pugliano
February 21, 2017, 6:35 PM

Phil, I would like translators to be able to:

1) See the percentage if it's a fuzzy match (e.g. 76%).
2) See the same color scheme as WorldServer (e.g. 100% repaired matches displayed with a striped blue line).

Chase Tingley
February 21, 2017, 10:40 PM

I think the repair status can be determined by looking at the iws:is-repaired-match value in the alt-trans for the specific match. The rest will require handling additional iws:status/@match-quality values such as "guaranteed".

If we did that we could probably expand the segment labelling mechanism that Phil mentioned to get the color-coding. Those colors are already customizable, although supporting a dotted line (which I think WS does?) would require extra work.

The other issue is that as with everything related to the iws: metadata, this only works for XLIFF generated from files that were filtered with the legacy WS filters. Files filtered with the newer FTS filters will generate SDLXLIFF and use a different set of flags for things like ICE matches.

The best place to support this metadata is probably in the Okapi XLIFF filter, because otherwise Ocelot will need to use regular expressions to scrape the data out of the skeleton.

Chase Tingley
May 12, 2017, 6:05 PM

I like Phil's suggestion of starting from the state-qualifier color-coding functionality that we already have.

However, I propose that we take it a step further and turn it into a feature called "match quality". Ocelot will use various sources of information from XLIFF to assign a segment to one of several match categories:

  • No match

  • Fuzzy match

  • Exact match

  • Repaired exact match

  • Context match

  • ID match

  • MT match

Then, we can assign a segment to a category in various ways:

  • For generic XLIFF, via state-qualifier (when present)

  • For IWS XLIFF, based on the iws:status information

  • For SDLXLIFF, based on the sdl:seg information

Not all categories are possible for all types of XLIFF.
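The assignment described above could be sketched roughly as follows, in Python for brevity. This is an illustration under stated assumptions, not the proposed Ocelot implementation. The state-qualifier values are the standard XLIFF 1.2 ones, the "fuzzy" and "guaranteed" match-quality values come from this thread, and the "exact" value, the mapping of "guaranteed" to a context match, and all function and enum names are assumptions. SDLXLIFF (sdl:seg) handling is left out.

```python
from enum import Enum


class MatchCategory(Enum):
    NO_MATCH = "No match"
    FUZZY = "Fuzzy match"
    EXACT = "Exact match"
    REPAIRED_EXACT = "Repaired exact match"
    CONTEXT = "Context match"
    ID = "ID match"
    MT = "MT match"


# Generic XLIFF 1.2 state-qualifier values that correspond to a match category.
STATE_QUALIFIER_MAP = {
    "exact-match": MatchCategory.EXACT,
    "fuzzy-match": MatchCategory.FUZZY,
    "id-match": MatchCategory.ID,
    "mt-suggestion": MatchCategory.MT,
    "leveraged-mt": MatchCategory.MT,
}

# iws:status/@match-quality values. "exact" and the choice of CONTEXT for
# "guaranteed" are assumptions made for this sketch.
IWS_QUALITY_MAP = {
    "fuzzy": MatchCategory.FUZZY,
    "exact": MatchCategory.EXACT,
    "guaranteed": MatchCategory.CONTEXT,
}


def categorize(state_qualifier=None, iws_match_quality=None, iws_repaired=False):
    """Pick a match category, preferring IWS metadata over state-qualifier."""
    if iws_match_quality is not None:
        category = IWS_QUALITY_MAP.get(
            iws_match_quality.lower(), MatchCategory.NO_MATCH
        )
        # A repaired match is only distinguished for exact matches here.
        if category is MatchCategory.EXACT and iws_repaired:
            return MatchCategory.REPAIRED_EXACT
        return category
    if state_qualifier is not None:
        return STATE_QUALIFIER_MAP.get(state_qualifier, MatchCategory.NO_MATCH)
    return MatchCategory.NO_MATCH


if __name__ == "__main__":
    print(categorize(state_qualifier="fuzzy-match").value)
    print(categorize(iws_match_quality="exact", iws_repaired=True).value)
```

Giving the IWS metadata priority over state-qualifier is a design choice of the sketch. Generic XLIFF has no value for repaired or context matches, so those two categories are only reachable through the IWS branch, which is one instance of a category not being available for every XLIFF type.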

Depending on match quality, we will assign a highlight color based on configuration data, just like we do now with state-qualifier.
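A configuration-driven color lookup of this kind might look like the sketch below. The `matchQuality.*` keys, the color values, and the function names are all hypothetical. They are not the actual format of Ocelot's rules.properties file, which this thread does not show.

```python
# Hypothetical properties-style configuration: one "key = value" pair per line.
SAMPLE_CONFIG = """\
# match quality highlight colors (illustrative values)
matchQuality.fuzzy = #FFD966
matchQuality.exact = #93C47D
matchQuality.repairedExact = #6FA8DC
"""


def parse_color_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comment lines."""
    colors = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line; ignore it
        colors[key.strip()] = value.strip()
    return colors


def highlight_color(category, colors, default=None):
    """Return the configured color for a category, or the default if unset."""
    return colors.get("matchQuality." + category, default)


if __name__ == "__main__":
    colors = parse_color_config(SAMPLE_CONFIG)
    print(highlight_color("fuzzy", colors))
    print(highlight_color("mt", colors, default="(no highlight)"))
```

Falling back to a default for unconfigured categories means a segment without a configured color is simply left unhighlighted.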

Assignee

Marta Borriello

Reporter

Francesco Pugliano

Labels

None

Components

Priority

Major