A Selective Under-Sampling (SUS) Method for Imbalanced Regression

Jovana Aleksic; Miguel García-Remesal

doi:10.1613/jair.1.16062


Published: Jan 14, 2025


Keywords:

Imbalanced data, Artificial Neural Networks, Generalized regression neural network, Data Mining

Jovana Aleksic

Universidad Politécnica de Madrid

Miguel García-Remesal

Universidad Politécnica de Madrid, Spain

Abstract

Many mainstream machine learning approaches, such as neural networks, are not well suited to imbalanced data. Yet imbalance is common in real-world data sets. Collection methods are imperfect and often fail to capture enough data in a specific range of the target variable. Furthermore, in certain tasks the data is inherently imbalanced, with many more normal events than edge cases. This problem is well studied in the classification context. However, only a few methods have been proposed for regression tasks. In addition, the proposed methods often perform poorly on high-dimensional data, and imbalanced high-dimensional regression has scarcely been explored. In this paper we present a selective under-sampling (SUS) algorithm for imbalanced regression, together with its iterative version, SUSiter. We assessed these methods on 15 regression data sets from different imbalanced domains, 5 synthetic high-dimensional imbalanced data sets, and 2 more complex imbalanced age estimation image data sets. Our results suggest that SUS and SUSiter typically outperform other state-of-the-art techniques, such as SMOGN or random under-sampling, when used with neural networks as learners.
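For readers unfamiliar with the random under-sampling baseline named in the abstract, the following is a minimal illustrative sketch. It is not the SUS or SUSiter algorithm, which the abstract does not specify. The function name `random_undersample`, the equal-width binning of the target, and the rule of capping every bin at the size of the rarest non-empty bin are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def random_undersample(X, y, n_bins=10, seed=0):
    """Hypothetical baseline: randomly drop samples from dense target ranges.

    Assumption: the continuous target is split into equal-width bins, and
    every non-empty bin is cut down to the size of the rarest non-empty bin.
    """
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    # Fall back to a width of 1.0 when all targets are identical.
    width = (hi - lo) / n_bins or 1.0

    # Group sample indices by the bin their target value falls into.
    bins = defaultdict(list)
    for i, target in enumerate(y):
        b = min(int((target - lo) / width), n_bins - 1)
        bins[b].append(i)

    # Keep the same number of randomly chosen samples from each bin.
    cap = min(len(idx) for idx in bins.values())
    keep = []
    for idx in bins.values():
        keep.extend(rng.sample(idx, cap))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 90 targets crowded near 0 and 10 rare targets near 9 to 10.
X = [[v] for v in range(100)]
y = [i / 100 for i in range(90)] + [9 + i / 10 for i in range(10)]
X_bal, y_bal = random_undersample(X, y, n_bins=10, seed=0)
```

On this toy data the dense range is reduced to the size of the rare range, so a learner sees both regions of the target equally often. The discarded samples are chosen blindly, which is the limitation a selective approach aims to address.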

Issue

Vol. 82 (2025)

Section

Articles
