Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes | Journal of Artificial Intelligence Research

Published: Dec 14, 2020

DOI: https://doi.org/10.1613/jair.1.12312

Keywords:

data mining, neural networks, machine learning

Ricardo Cardoso Pereira

University of Coimbra

https://orcid.org/0000-0003-1735-0771

Miriam Seoane Santos

University of Coimbra

Pedro Pereira Rodrigues

University of Porto

Pedro Henriques Abreu

University of Coimbra

Abstract

Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, among them the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new values to replace the missing ones. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis focuses mainly on patterns and recommendations for the architecture, hyperparameters and training settings of the network, and provides a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often-used statistical methods.
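
As a rough illustration of the idea described in the abstract, the following is a minimal sketch of imputation with a denoising autoencoder, written with NumPy only. It is not taken from the surveyed works: the function name `dae_impute`, the single tanh hidden layer, the mean pre-imputation, the random input masking and all hyperparameter values are assumptions chosen for brevity. The sketch assumes numeric tabular data with no column that is entirely missing.

```python
import numpy as np

def dae_impute(X, hidden=8, epochs=500, lr=0.1, corrupt=0.2, seed=0):
    """Fill NaNs in X using a one-hidden-layer denoising autoencoder (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is present

    # Standardise each column using its observed values only.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Pre-impute missing entries with the column mean (0 after standardising).
    Z0 = np.where(observed, Z, 0.0)

    n, d = Z0.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)

    for _ in range(epochs):
        # Denoising step: randomly zero out a fraction of the inputs.
        keep = rng.random((n, d)) > corrupt
        Zin = Z0 * keep
        H = np.tanh(Zin @ W1 + b1)       # encoder
        out = H @ W2 + b2                # linear decoder

        # Mean squared error, counted on observed entries only.
        err = (out - Z0) * observed
        g_out = 2.0 * err / observed.sum()
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        gH = (g_out @ W2.T) * (1.0 - H ** 2)
        gW1 = Zin.T @ gH
        gb1 = gH.sum(axis=0)

        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    # Reconstruct from the uncorrupted, pre-imputed input.
    recon = np.tanh(Z0 @ W1 + b1) @ W2 + b2
    # Keep observed values untouched; fill only the missing entries.
    return np.where(observed, X, recon * std + mean)
```

Calling `dae_impute(X)` on an array containing `np.nan` returns an array of the same shape in which observed entries are unchanged and missing entries are replaced by the network's reconstruction. Restricting the loss to observed entries is one common design choice, since the true values of the missing entries are unknown during training.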

Issue

Vol. 69 (2020)

Section

Articles
