Annotator Rationales for Labeling Tasks in Crowdsourcing

Mucahid Kutlu; Tyler McDonnell; Tamer Elsayed; Matthew Lease

doi:10.1613/jair.1.12012

Published: Sep 23, 2020

Mucahid Kutlu

TOBB University of Economics and Technology

Tyler McDonnell

Tamer Elsayed

Qatar University

Matthew Lease

University of Texas at Austin

Abstract

When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on Amazon’s Mechanical Turk suggests a win-win. Firstly, rationales yield a multitude of benefits: more reliable judgments, greater transparency for evaluating both human raters and their judgments, reduced need for expert gold, the opportunity for dual-supervision from ratings and rationales, and added value from the rationales themselves. Secondly, once experienced in the task, crowd workers provide rationales with almost no increase in task completion time. Consequently, we can realize the above benefits with minimal additional cost.

Issue

Vol. 69 (2020)

Section

Articles
