Cross-domain Constituency Parsing by Leveraging Heterogeneous Data | Journal of Artificial Intelligence Research


Published: Dec 1, 2024

DOI: https://doi.org/10.1613/jair.1.15736

Keywords:

constituent parsing, dependency parsing, syntactic parsing, knowledge transfer

Peiming Guo

Meishan Zhang

Harbin Institute of Technology (Shenzhen), China

Yulong Chen

Westlake University

Jianling Li

Tianjin University

Min Zhang

Harbin Institute of Technology (Shenzhen), China

Yue Zhang

Westlake University

Abstract

Knowledge transfer has been investigated in various natural language processing tasks, but not yet in cross-domain constituency parsing. In this paper, we leverage heterogeneous data to transfer cross-domain and cross-task knowledge to constituency parsing. Concretely, we first select language modeling, named entity recognition, CCG supertagging, and dependency parsing as auxiliary tasks, and collect corpora for these tasks covering various domains as cross-domain and cross-task heterogeneous data. Second, we use three types of prefixes (shared, task, and domain prefixes) to merge the cross-domain and cross-task data and to decompose the general, task-specific, and domain-specific representations in the pretrained language model. Third, we convert the data formats of the multi-source heterogeneous datasets and the loss objectives of the auxiliary tasks into a consistent formalization closer to constituency parsing. Finally, we jointly train the model to transfer task and domain knowledge to cross-domain constituency parsing. We verify the effectiveness of the proposed model on five target domains of MCTB. Experimental results show that our knowledge transfer model outperforms various baselines, including conventional chart-based and transition-based parsers and a current large-scale language model under zero-shot and few-shot settings.
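To make the prefix decomposition more concrete, the sketch below shows one way a shared prefix, a per-task prefix, and a per-domain prefix could be composed for each training example. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes prefix-tuning-style prefixes that are simply concatenated along the sequence dimension, represents vectors as plain Python lists (real prefix-tuning typically learns per-layer key/value prefixes), and uses hypothetical names (`PrefixPool`, `build`, `compose`) and illustrative task and domain labels.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PrefixPool:
    """Hypothetical store of prefix vectors: one shared, one per task, one per domain.

    Assumption: a prefix is a short sequence of `prefix_len` vectors of size
    `dim` that would be prepended to the encoder input (prefix-tuning style).
    """
    prefix_len: int
    dim: int
    shared: list = field(default_factory=list)
    task: dict = field(default_factory=dict)
    domain: dict = field(default_factory=dict)

    def _new_prefix(self, rng):
        # Small random initialization; in practice these would be trainable parameters.
        return [[rng.gauss(0.0, 0.02) for _ in range(self.dim)]
                for _ in range(self.prefix_len)]

    def build(self, tasks, domains, seed=0):
        rng = random.Random(seed)
        self.shared = self._new_prefix(rng)
        self.task = {t: self._new_prefix(rng) for t in tasks}
        self.domain = {d: self._new_prefix(rng) for d in domains}
        return self

    def compose(self, task, domain):
        """Concatenate shared + task + domain prefixes for one (task, domain) example.

        The same vector objects are reused across examples, so every task and
        domain sees the identical shared prefix, while task and domain prefixes
        are reused only by examples carrying that label.
        """
        return self.shared + self.task[task] + self.domain[domain]


# Illustrative labels only; they are not taken from the paper's data.
pool = PrefixPool(prefix_len=4, dim=8).build(
    tasks=["constituency", "dependency", "ner", "ccg", "lm"],
    domains=["news", "review"],
)
prefix = pool.compose("dependency", "review")
print(len(prefix), len(prefix[0]))  # 12 prefix vectors, each of size 8
```

Under this assumption, a dependency-parsing sentence from one domain and a constituency-parsing sentence from the same domain share both the general and the domain prefix and differ only in the task prefix, which is one simple way the general, task, and domain representations can be kept apart during joint training.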

Issue

Vol. 81 (2024)

Section

Articles
