Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation
We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs that match transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the RBMT system uses its linguistic resources to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat the RBMT system as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing hybrid approaches, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. Combining this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm yields significantly higher translation quality than that of a baseline PBSMT system; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by systems built with hand-crafted rules.
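The core idea of enriching a phrase table with RBMT-derived phrase pairs can be illustrated with a minimal sketch. This is not the authors' implementation: the `PhrasePair` type, the `enrich_phrase_table` function, the fixed `rbmt_score`, the `from_rbmt` flag, and the rule that corpus pairs take precedence over duplicates are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    source: str
    target: str
    score: float      # placeholder translation score (hypothetical)
    from_rbmt: bool   # marks pairs that come from RBMT resources (hypothetical)


def enrich_phrase_table(corpus_pairs, rbmt_pairs, rbmt_score=1.0):
    """Merge corpus-extracted phrase pairs with RBMT-derived ones.

    corpus_pairs: iterable of (source, target, score) tuples.
    rbmt_pairs:   iterable of (source, target) tuples, e.g. generated from
                  dictionary entries or from segments matched by transfer rules.
    Assumption: a pair already seen in the corpus keeps its corpus score.
    """
    table = {}
    for src, tgt, score in corpus_pairs:
        table[(src, tgt)] = PhrasePair(src, tgt, score, from_rbmt=False)
    for src, tgt in rbmt_pairs:
        if (src, tgt) not in table:
            table[(src, tgt)] = PhrasePair(src, tgt, rbmt_score, from_rbmt=True)
    # Sort for a deterministic, easy-to-inspect result.
    return sorted(table.values(), key=lambda p: (p.source, p.target))


if __name__ == "__main__":
    corpus = [("casa", "house", 0.8)]
    rbmt = [("casa", "house"), ("gato", "cat")]
    for pair in enrich_phrase_table(corpus, rbmt):
        print(pair)
```

In this toy example the pair for "gato" exists only in the RBMT resources, so it is the kind of entry that would help when the training corpus is small or the input is out of domain. A real system would also need properly estimated feature scores, which this sketch does not attempt.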