Diffusion Model Based on Reverse Guidance of Regional Samples

Guan Yang; Zhiyong Zeng; Ren Duan

doi:10.1613/jair.1.18916

Published: Mar 17, 2026

Keywords:

Data mining, machine learning, synthetic data generation, unbalanced datasets

Guan Yang

Yunnan University of Finance and Economics

https://orcid.org/0009-0002-8144-0472

Zhiyong Zeng

Ren Duan

Abstract

In the classification of imbalanced datasets, the scarcity of the minority class biases traditional classification methods toward the majority class and reduces minority-class recognition. This article takes a data-level approach: it enlarges the minority class with a generative model to improve classification accuracy and reduce misclassification cost. Motivated by the complex distribution of the minority class and the strengths of diffusion models, this article proposes a Local Regional Samples Guidance Denoising Diffusion Probabilistic Model (LReDDPM). The method first divides the minority-class samples into types, uses the gradient information of the samples in each region as the condition, and then generates minority-class examples with a denoising diffusion probabilistic model. The generated examples are added to the training set, enlarging the minority class and enriching its local sample density. In addition, we explore diffusion models guided by gradients derived from samples in different regions. The experimental results show that examples generated under guidance from different regions improve classification performance to varying degrees, with the largest gains for the safe and boundary regions. This further indicates that the complex distribution of the minority class plays a crucial role in classification results. We conduct experiments on ten datasets and compare LReDDPM with five methods to evaluate its advantages and effectiveness. The experimental results show that the proposed method can significantly improve classification performance.
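The first step of the method divides minority-class samples into region types before any generation happens. The abstract does not state the rule used, so the following is a minimal sketch assuming the commonly used k-nearest-neighbour typology from the imbalanced-learning literature (k = 5; safe, borderline, rare, outlier), with "borderline" taken as roughly corresponding to the abstract's boundary region. The function names `region_type` and `minority_regions` and the toy data are hypothetical. The abstract also does not say how LReDDPM derives gradient guidance from a region's samples, so the sketch stops at selecting those regional samples.

```python
import numpy as np

K = 5  # neighbourhood size assumed by the typology thresholds below


def region_type(minority_neighbours: int) -> str:
    """Map the number of minority-class neighbours (out of K = 5) to a region type."""
    if minority_neighbours >= 4:
        return "safe"
    if minority_neighbours >= 2:
        return "borderline"
    if minority_neighbours == 1:
        return "rare"
    return "outlier"


def minority_regions(X, y, minority_label):
    """Group minority-class sample indices by the region type of each sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples (fine for small data).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    regions = {"safe": [], "borderline": [], "rare": [], "outlier": []}
    for i in np.flatnonzero(y == minority_label):
        # Indices of the K nearest neighbours of minority sample i (either class).
        neighbours = np.argsort(dist[i], kind="stable")[:K]
        same = int(np.sum(y[neighbours] == minority_label))
        regions[region_type(same)].append(int(i))
    return regions


# Hypothetical 1-D toy data; label 1 is the minority class.
values = (
    list(range(10))                    # majority cluster
    + [100, 101, 102, 103, 104, 105]   # minority cluster -> safe
    + [4.5]                            # lone minority point among majority -> outlier
    + [200, 201, 202]                  # minority next to majority -> borderline
    + [203, 204, 205, 299]             # majority
    + [300, 301]                       # minority pair among majority -> rare
    + [302, 303, 304]                  # majority
)
labels = [0] * 10 + [1] * 6 + [1] + [1] * 3 + [0] * 4 + [1] * 2 + [0] * 3
X = np.array(values, dtype=float).reshape(-1, 1)
y = np.array(labels)

regions = minority_regions(X, y, minority_label=1)
print({name: len(idx) for name, idx in regions.items()})
# {'safe': 6, 'borderline': 3, 'rare': 2, 'outlier': 1}

# The samples of one region, e.g. X[regions["safe"]], are what a region-guided
# generator would condition on; turning them into gradients is not shown here.
```

Brute-force pairwise distances keep the sketch free of dependencies beyond NumPy; for larger datasets a k-d tree or scikit-learn's `NearestNeighbors` would be the usual replacement for the distance matrix.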

Issue

Vol. 85 (2026)

Section

Articles