Computational Approaches to Automatic Poetry Generation and Evaluation: A Survey
Abstract
This survey provides a comprehensive synthesis of research on automatic poetry generation and evaluation from 2017 to 2025. We examine computational approaches that leverage pre-trained large language models (LLMs), multimodal architectures, and specialized algorithms for handling poetic constraints such as meter, rhyme, and stanza structure. In addition to surveying generative methods, we analyze practices in data engineering, including corpus construction, annotation, and preprocessing tools tailored to poetry. Evaluation receives particular attention: we review automatic metrics, LLM-as-a-judge methods, and human-centered protocols, discussing their strengths and limitations. Compared with prior surveys, our work emphasizes (1) the dominant role of LLMs in both generation and evaluation, (2) a taxonomy of poetry generation tasks categorized by interaction modality, (3) systematic coverage of dataset engineering challenges, and (4) a comprehensive analysis of automatic and human evaluation approaches and their respective drawbacks. By consolidating advances across diverse research lines, we show how poetry serves as a challenging benchmark for controllable text generation, multimodal grounding, and human-aligned evaluation. Building on this perspective, the survey summarizes current methods and open challenges in the generation, control, and evaluation of poetic and lyrical text.