
Journal of Artificial Intelligence Research 9 (1998), pp. 247-293. Submitted 5/98; published 11/98
© 1998 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

An Empirical Approach to Temporal Reference Resolution

Janyce M. Wiebe, Thomas P. O'Hara,
Thorsten Öhrström-Sandgren, and Kenneth J. McKeever

wiebe@cs.nmsu.edu, tomohara@cs.nmsu.edu,
sandgren@lucent.com, kmckeeve@redwood.dn.hac.com

Department of Computer Science and
Computing Research Laboratory
New Mexico State University
Las Cruces, NM 88003

Abstract:

Scheduling dialogs, during which people negotiate the times of appointments, are common in everyday life. This paper reports the results of an in-depth empirical investigation of resolving explicit temporal references in scheduling dialogs. The work has four phases: data annotation and evaluation, model development, system implementation and evaluation, and model evaluation and analysis. The system and model were developed primarily on one set of data and later applied to a much more complex data set to assess how well the model generalizes for the task being performed. Many different types of empirical methods are applied to pinpoint the strengths and weaknesses of the approach. Detailed annotation instructions were developed and an intercoder reliability study was performed, showing that naive annotators can reliably perform the targeted annotations. A fully automatic system has been developed and evaluated on unseen test data, with good results on both data sets. We adopt a pure realization of a recency-based focus model to identify precisely when it is and is not adequate for the task being addressed. In addition to system results, an in-depth evaluation of the model itself is presented, based on detailed manual annotations. The results show that few errors occur specifically because of the focus model being used, and that the set of anaphoric relations defined in the model is low in ambiguity for both data sets.



 