Insights on Developing a Machine Translation Post-Editing Training Protocol
Vivetta Jean and Lucia Guerrero discuss the creation of an MTPE training protocol, the role of SIG, and the importance of collaboration in the translation industry.
File
Machine Translation Post-editing Training Protocol by the GALA MTPE Special Interest Group
Added on 09/27/2024

Speaker 1: Hello everyone and welcome to the Gala MTPE interview with Vivetta Jean, who is Translation and Localization Industry Specialist at Intertranslations, and Lucia Guerrero, who is MT Specialist at CPSL. Welcome, both of you. Let's start with the first question. So you, together with the rest of the special interest group on machine translation post-editing at Gala, have been working very hard on developing an MTPE training protocol. So tell us a little bit about how the work was organized.

Speaker 2: Okay, so I start first. Machine translation post-editing special interest group is an initiative born after my webcast for Gala, the Management and Training Challenges of Machine Translation Post-Editing. And this revealed the need for training not only for post-editors but for all stakeholders, LSPs, academia and clients. So I'm really thankful to Gala for embracing this initiative and organizing the machine translation post-editing training special interest group and to Lucia Guerrero, my co-moderator, for believing in the vision of this SIG and working with me. It's been more than one year. So now in this scope, the SIG could be considered as a common intelligence workshop where we want all the voices to be heard and all the groups to be represented. To do so, we have appointed one representative for each group, post-editors, academia, clients and LSPs to moderate the groups and keep the minutes. We organize interviews with the members of the group to shed light on the industrial workflows and practices, share the perspectives of the post-editors and understand the challenges of the clients.

Speaker 1: A lot of work. Lucia, do you want to add anything? Yes.

Speaker 3: Apart from the monthly calls, we also have a place where we can continue this conversation and this place is Basecamp. Basecamp is an online platform where all the SIG members can connect to post questions, topics for discussions and also to comment on anyone else's topics. And Vivetta and I also upload there the presentations and the minutes and any other reference material related to the monthly calls such as academic papers. But you know, sometimes one hour and a half is not enough and unfortunately, we don't have the time to listen to everyone. So I think in this sense, Basecamp has proved to be really effective because it has helped us to bring new ideas and new points of view to the calls and it also creates this sense of community. So it's not just, let's meet once monthly and then let's forget about each other until next month. On the contrary, Basecamp helps us feel that we are part of a group and that we all collaborate towards the same objective. And I think it also adds continuity.

Speaker 2: Yeah, and with the contribution of all the members, our vision is by September to draft and produce a common machine translation post-editing training protocol, which actually will represent the current state of machine translation post-editing in the industry with the perspectives of all groups and will shape the machine translation post-editing service and training.

Speaker 1: Okay, well, we'll get to the protocol in a minute, of course, but first of all, let me play devil's advocate. Machine translation is now integrated in all CAT tools, in all the platforms, and it can be used fairly easily. So my question would be, does it still make sense to talk about post-editing as a task that is completely different or fairly different from translation?

Speaker 2: Well, for the machine translation post-editing training SIG, post-editing is not a different task from translation. It is translation in the sense that it communicates the message of the text from the source language to the target language, from the author to the target audience. Post-editing is based on the translation process in terms of cognitive background and skills. However, it needs an extensive knowledge and use of technology and the upgrading and re-skilling of the linguists.

Speaker 3: Yeah, yeah, and I think, based on our discussions in the SIG, that maybe we can say that post-editing is not exactly a task which is different from translation, but maybe a type of translation task. I think in the past, especially with statistical machine translation, some people supported the idea that post-editing was just a matter of swapping words here and there. You know about that, Isabela, because I know you've been involved in a lot of things related to post-editing guidelines and so on. And, you know, some people thought that to post-edit you didn't need to be proficient in the source language. So I'm not supporting this. I'm just saying that it was a common conception before. But I think this idea has changed a lot, especially since the use of neural machine translation, which is prevalent now. And in the SIG, we've seen that now it is widely accepted that in order to post-edit, you really need to be proficient not only in your target language, but also in your source language. And this means that you need translation skills first, as Vivetta said, plus other skills.

Speaker 1: Exactly. So no monolingual post-editing, basically. So you mentioned skills. So what kind of skills should be developed for post-editing?

Speaker 2: OK, let me just start somehow. I read that post-editing is by itself a new skill. So the technicality of the task reveals the need for the development of technical skills and some additional ones like problem solving and decision making. Based on our discussions in the special interest group, experience in CAT tools and translation of more than three years seems to work well for the profile of the post-editor in terms of performance, quality and speed.

Speaker 3: And then I would say that apart from these skills, we must also speak about competencies. So, for example, in the first place, instrumental competencies, which are those related to the tools or, for example, basic information about how the MT system works. And then there's also strategic competence, which is, as Vivetta said, being able to make quick decisions, such as which is going to be faster for me, fixing a machine translated segment or deleting it and translating from scratch, that type of decision. And finally, I mean, last but not least, attitudinal competence, because research has proved that a negative attitude towards machine translation can even influence the whole post-editing experience. And of course, this doesn't mean that we must accept bad quality MT, but it only means that if we start a post-editing task with a negative attitude, a negative preconception, then it is more likely that we end up deleting most of the MT suggestions and translating from scratch, even if those proposals were appropriate or at least acceptable. And I think this applies not only to machine translation post-editing. I think that some people, some of us, tend to apply many more changes than needed, even when we are reviewing human translation. And that is an attitude, actually.

Speaker 1: Yeah, exactly. Maybe some more objectivity, both in post-editing and in a reviewing task would help, and maybe also clear requirements. So let's go through for a moment the current state of MT training. What is offered by academia and what is offered by, for example, language service providers? What are the differences, if there are differences?

Speaker 3: I'll take this one first. I think one of the main takeaways from our monthly calls is that we must accept that academia cannot absorb the whole burden of training post-editors. It is simply impossible, first, because the variety of use cases is huge and also because the amount and type of corrections depend on too many factors, such as the purpose of the translation, the target audience, and those define the quality requirements, which in turn define which aspects need to be fixed or not. So too many things to be taken into account by academia, we think. And in the SIG, we agreed that LSPs and customers, which are in fact the translation requesters, should also take the responsibility of training post-editors, preparing onboarding and training processes and feedback loops to add something else to the academia curriculum, which is the basis. For example, that would include giving information about customer specific needs and particular workflows.

Speaker 2: Yeah, and I would add here, based on our discussions in the group, we would say that machine translation post-editing training is now starting to take its first steps. For academia, it is included in some master's programs of the universities. However, this depends on the country, as for some countries, talking about translators using MT is still not acceptable. So we also have individual trainers and training courses provided by LSPs and organizations like TAUS. The training provided by academia focuses more on the theory of machine translation, the errors, and not on the development of the specific skills that differentiate translation from post-editing. It seems like a gray area. The training of the LSPs focuses on the speed of the MT, with gaps in the definitions of the service and the how-to, sometimes being generic, not presenting the errors of the specific engine and not specifying the quality that is needed, as this is specified by the customer. So what we are possibly missing in the training is cooperation, the cooperation of the universities with LSPs to boost the practical aspect of machine translation and post-editing: on the one hand, feed it with the aspects from the academic perspective, and on the other hand, balance it with the practical aspects of productivity, technology, and real life from the industrial perspective. So the cooperation of LSPs and universities, with internships or other collaborative projects, will also bring to the front the role of the expert, the post-editor in this case, in the workflows, and the importance of being able to handle the technology available.

Speaker 1: Yeah, so that's pretty clear. More collaboration across the whole industry. A question about standards. So there is a post-editing standard, the ISO 18587, which will probably be called for review very soon, I think. Maybe?

Speaker 3: Actually, so soon that I just received the email announcing that it's just been called for a review. I think it came either this morning or yesterday, because I'm an ISO expert member representing my company and my country. And, you know, it's the perfect time to talk about that.

Speaker 1: OK, excellent. So this ISO standard, a good number of LSPs are already certified for this ISO standard on machine translation post-editing. In what way has this standard helped the development of post-editing as a task or the development of post-editing training?

Speaker 3: Well, you know, this standard, the post-editing standard, was published in 2017, and that was a time when Google Translate had just launched their NMT, the Neural Machine Translation services, only a few months before that. So I think we can see that the whole standard is more or less based on working with statistical machine translation. Right. And we know that things have changed a lot since then. I've also seen that there is a lot of talk going on about whether we should still talk about post-editing or not, if there should be a systematic review or a total revamp of the post-editing standard, or even if we need it at all. So anything can happen at this point. But related to the SIG, you know, Vivetta and I are very familiar with this standard and other standards. And regardless of what happens to the post-editing standard, the truth is that, as you said, it's been there for a while and many companies have been certified with it. So we could not simply ignore it. And actually, it has inspired, of course, some of our discussions. So I remember, for example, when we defined the two profiles of the post-editors, the junior and the expert post-editor, at the SIG, we took into account the requirements and recommendations from this standard. And we also added an additional session to discuss what post-editing is from a more academic point of view, because, you know, in the Gala SIG, LSPs are naturally more represented than other groups. And we had come up with a more industry-oriented definition.

Speaker 2: OK. And regarding the standard, this works as a frame for our work. Now, the question is, how narrow or broad do we want this frame to be? And in the last call of the machine translation post-editing training SIG, we discussed the five W's and one H, who, what, where, why, when and how, of machine translation post-editing. Yeah, we tried to reshape and redefine the service and have an agreed definition of the basic elements, to somehow eliminate some grey areas that, you know, are allowed by the standard. So based on the input and the voices heard, it seems that there is a lot of room for the machine translation post-editing standard to be reviewed as a service. And there is a long way to go.

Speaker 1: OK, well, we'll come back to this then in a little while. Another hot topic that's usually a lightning rod for discussion, remuneration of post-editors. What can you tell us about that?

Speaker 3: Yeah, it's always a controversial topic, Isabela, but maybe precisely it's also the reason why we thought that we had to talk about it at the SIG. You know, in Spanish, we have this idiom which says, coger el toro por los cuernos,

Speaker 1: prendere il toro per le corna.

Speaker 3: So, you know, it literally means to take the bull by the horns, exactly, which means to be brave and take control of a challenging situation. And that is what we tried to do with the post-editing compensation topic. So we devoted one full session to the topic, where we presented first some examples of real comments from post-editors found on social media. We also discussed the aspects which can affect MTPE pricing, such as language pair and quality of the MT output, of course. And we also presented the compensation options that we can currently find in the market, such as pay per hour, pay per word or even based on effort, how they combine with translation memories, et cetera. But I think it's important to note that in our SIG, we didn't promote any of them specifically, but instead we discussed the pros and cons of each of them. And then in the discussion per group, there was one thing that we all agreed with. This is very important. And it is that any compensation method chosen needs to be transparent and supported by data.

Speaker 2: Yeah, this is to say that what is important is not the compensation method itself, but the values behind it that support it. And this is probably why we need a code of ethics for machine translation post-editing. And this is also going to be the topic of the last call in the machine translation post-editing training SIG.

Speaker 1: OK, so that's another point we'll come back to later on. Final question for now. So this training protocol, when will it be ready and available?

Speaker 2: OK, this is really ambitious and we like ambitious things here, so we'll have the training protocol ready by the end of September 2022. And in the course of April, May and June, we that's the quintessence of the post-editing. And these are the calls which will shape the training protocol for machine translation post-editing. And of course, we invite all members to join and be part of this vision with their contributions and ideas.

Speaker 1: Well, I for one am really looking forward to this training protocol. So thank you so much, Lucia and Vivetta, and we'll speak again as soon as the training protocol is live.

Speaker 2: Thank you. Thank you very much.
