Journal of Artificial Intelligence Research

The State of Computer Vision Research in Africa

Wed, 11 Sep 2024 00:00:00 -0700

Despite significant efforts to democratize artificial intelligence (AI), computer vision, a sub-field of AI, still lags in Africa. A significant contributing factor is limited access to computing resources, datasets, and collaborations. As a result, Africa’s contribution to top-tier publications in this field has been only 0.06% over the past decade. Towards improving the computer vision field and making it more accessible and inclusive, this study analyzes 63,000 Scopus-indexed computer vision publications from Africa. We use large language models to automatically parse their abstracts and to identify and categorize topics and datasets. This resulted in a list of more than 100 African datasets. Our objective is to provide a comprehensive taxonomy of dataset categories to facilitate better understanding and utilization of these resources. We also analyze collaboration trends of researchers within and outside the continent. Additionally, we conduct a large-scale questionnaire among African computer vision researchers to identify the structural barriers they believe require urgent attention. In conclusion, our study offers a comprehensive overview of the current state of computer vision research in Africa, to empower marginalized communities to participate in the design and development of computer vision systems.

Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

Xiaoyu Wen, Xudong Yu, Rui Yang, Haoyuan Chen, Chenjia Bai, Zhen Wang — Wed, 13 Nov 2024 00:00:00 -0800

To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online RL provides a paradigm for improving an offline-trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in offline-to-online adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates a Q-ensemble for the uncertainty penalty and adversarial samples for policy and value smoothness, which enable RO2O to maintain a consistent learning procedure in online adaptation without requiring special changes to the learning objective. Theoretical analyses in linear MDPs demonstrate that uncertainty and smoothness lead to a tighter optimality bound in offline-to-online adaptation under distribution shift. Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions.
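One generic way to turn a Q-ensemble into an uncertainty penalty is to subtract a multiple of the ensemble's standard deviation from its mean, so that state-action pairs on which the ensemble members disagree receive a lower value. The sketch below illustrates only that general idea. The function name `pessimistic_q`, the coefficient `beta`, and the mean-minus-deviation form are assumptions made for illustration and are not taken from the RO2O abstract.

```python
import statistics


def pessimistic_q(q_estimates, beta=1.0):
    """Return a penalized value from an ensemble of Q estimates.

    q_estimates: Q-values predicted by each ensemble member for one
                 (state, action) pair.
    beta:        hypothetical penalty coefficient (an assumption).
    """
    mean = statistics.fmean(q_estimates)
    # Population standard deviation: a simple measure of ensemble disagreement.
    disagreement = statistics.pstdev(q_estimates)
    return mean - beta * disagreement


# Members that agree yield no penalty; members that disagree are penalized.
agree = pessimistic_q([2.0, 2.0, 2.0])
disagree = pessimistic_q([1.0, 2.0, 3.0])
```

With `beta = 0` the penalty vanishes and the value reduces to the plain ensemble mean.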

Understanding What Affects the Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu — Wed, 11 Sep 2024 00:00:00 -0700

Recently, there have been many efforts to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., distractors may be present during deployment. Many practical algorithms have been proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we address this gap by theoretically identifying the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical factor in reducing the generalization gap. Our theoretical results are supported by empirical evidence on the DMControl Generalization Benchmark (DMC-GB).

Expected 1.x Makespan-Optimal Multi-Agent Path Finding on Grid Graphs in Low Polynomial Time

Teng Guo, Jingjin Yu — Thu, 31 Oct 2024 00:00:00 -0700

Multi-Agent Path Finding (MAPF) is NP-hard to solve optimally, even on graphs, suggesting that no polynomial-time algorithm can compute exact optimal solutions. This raises a natural question: how close to optimal can polynomial-time algorithms get? Whereas algorithms for computing constant-factor optimal solutions have been developed, the constant factor is generally very large, limiting their application potential. In this work, among other breakthroughs, we propose the first low-polynomial-time MAPF algorithms delivering 1-1.5 (resp., 1-1.67) asymptotic makespan optimality guarantees for 2D (resp., 3D) grids for random instances at a very high 1/3 agent density, with high probability. Moreover, when regularly distributed obstacles are introduced, our methods experience no performance degradation. These methods generalize to support 100% agent density.

Regardless of the dimensionality and density, our high-quality methods are enabled by a unique hierarchical integration of two key building blocks. At the higher level, we apply the labeled Grid Rearrangement Algorithm (GRA), capable of performing efficient reconfiguration on grids through row/column shuffles. At the lower level, we devise novel methods that efficiently simulate row/column shuffles returned by GRA. Our implementations of GRA-based algorithms are highly effective in extensive numerical evaluations, demonstrating excellent scalability compared to other SOTA methods. For example, in 3D settings, GRA-based algorithms readily scale to grids with over 370,000 vertices and over 120,000 agents and consistently achieve conservative makespan optimality approaching 1.5, as predicted by our theoretical analysis.

Counting Complexity for Reasoning in Abstract Argumentation

Johannes K. Fichte, Markus Hecher, Arne Meier — Sun, 23 Jun 2024 00:00:00 -0700

In this paper, we consider counting and projected model counting of extensions in abstract argumentation for various semantics, including credulous reasoning. When asking for projected counts, we are interested in counting the number of extensions of a given argumentation framework, while multiple extensions that are identical when restricted to the projected arguments count as only one projected extension. We establish classical complexity results and parameterized complexity results when the problems are parameterized by the treewidth of the undirected argumentation graph. To obtain upper bounds for counting projected extensions, we introduce novel algorithms that exploit small treewidth of the undirected argumentation graph of the input instance by dynamic programming. Our algorithms run in double or triple exponential time in the treewidth, depending on the semantics under consideration. Finally, we establish lower bounds for bounded-treewidth algorithms for counting extensions and projected extensions under the exponential time hypothesis (ETH).
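The difference between counting extensions and counting projected extensions can be seen on a tiny framework. The sketch below enumerates extensions by brute force and then collapses those that coincide on the projected arguments. It uses stable semantics purely as an illustrative choice, and the function names and the example framework are assumptions; it is not the treewidth-based dynamic programming described in the abstract.

```python
from itertools import combinations


def stable_extensions(arguments, attacks):
    """Enumerate stable extensions by brute force (exponential time).

    A set S is stable if it is conflict-free and attacks every
    argument outside S.
    """
    args = sorted(arguments)
    found = []
    for size in range(len(args) + 1):
        for subset in combinations(args, size):
            s = set(subset)
            conflict_free = not any((a, b) in attacks for a in s for b in s)
            attacks_rest = all(
                any((a, b) in attacks for a in s) for b in set(args) - s
            )
            if conflict_free and attacks_rest:
                found.append(frozenset(s))
    return found


def projected_count(extensions, projection):
    """Count extensions up to equality on the projected arguments."""
    projection = frozenset(projection)
    return len({ext & projection for ext in extensions})


# Two independent mutual attacks: a <-> b and c <-> d.
arguments = {"a", "b", "c", "d"}
attacks = {("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}

extensions = stable_extensions(arguments, attacks)
total = len(extensions)                              # {a,c}, {a,d}, {b,c}, {b,d}
projected = projected_count(extensions, {"a", "b"})  # only {a} and {b} remain
```

Here the framework has four stable extensions, but projecting onto `{a, b}` leaves two, because extensions that differ only on `c` and `d` are counted once.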

Digraph k-Coloring Games: New Algorithms and Experiments

Andrea D'Ascenzo, Mattia D'Emidio, Michele Flammini, Gianpiero Monaco — Mon, 23 Sep 2024 00:00:00 -0700

We study digraph k-coloring games where strategic agents are vertices of a digraph and arcs represent agents' mutual unidirectional conflicts/idiosyncrasies. Each agent can select, as strategy, one of k different colors, and her payoff in a given state (a k-coloring) is given by the number of outgoing neighbors with a color different from her own. Such games model many strategic real-world scenarios and are related to several fundamental classes of anti-coordination games. Unfortunately, the problem of understanding whether an instance of the game admits a pure Nash equilibrium (NE), i.e., a state where no agent can improve her payoff by changing strategy, is NP-complete. Thus, in this paper, we focus on algorithms to compute an approximate NE: informally, a coloring is an approximate γ-NE, for some γ ≥ 1, if no agent can improve her payoff, by changing strategy, by a multiplicative factor of γ.
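The payoff and the informal γ-NE condition above can be checked directly on small instances. The following is a minimal sketch, assuming agents are numbered from 0, the digraph is given as a dictionary of out-neighbor lists without self-loops, and the γ-NE condition is read as "no deviation yields more than γ times the current payoff". That reading, and the names `payoff` and `is_gamma_ne`, are assumptions for illustration.

```python
def payoff(agent, color, coloring, out_neighbors):
    """Number of out-neighbors whose color differs from `color`."""
    return sum(1 for v in out_neighbors[agent] if coloring[v] != color)


def is_gamma_ne(coloring, out_neighbors, k, gamma=1.0):
    """Check that no agent can exceed gamma times her current payoff."""
    for agent in range(len(coloring)):
        current = payoff(agent, coloring[agent], coloring, out_neighbors)
        best = max(payoff(agent, c, coloring, out_neighbors) for c in range(k))
        if best > gamma * current:
            return False
    return True


# Directed 3-cycle: 0 -> 1 -> 2 -> 0.
cycle = {0: [1], 1: [2], 2: [0]}
# Agent 0 has two out-neighbors; agents 1 and 2 have none.
star = {0: [1, 2], 1: [], 2: []}

exact_on_cycle = is_gamma_ne([0, 1, 2], cycle, k=3)            # every agent differs
not_ne = is_gamma_ne([0, 1, 0], cycle, k=2)                    # agent 2 can improve
two_approx = is_gamma_ne([0, 0, 1], star, k=3, gamma=2.0)      # payoff 1 vs best 2
```

With `gamma = 1.0` the check reduces to the exact pure NE condition.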

Our contribution is manifold and of both theoretical and experimental nature. First, we characterize the hardness of finding pure and approximate equilibria in both general and special classes of digraphs. Second, we design and analyze three approximation algorithms with different theoretical guarantees on the approximation ratio, under different conditions: (i) algorithm APPROX-1 which computes, for any k ≥ 3, a Δ_o-NE for any n-vertex digraph having a maximum outdegree of Δ_o, in polynomial time; (ii) algorithm LLL-SPE, a randomized algorithm that, for any constant k ≥ 2, determines a γ-NE for some constant γ but only in digraphs whose minimum outdegree is sufficiently large, in polynomial time in expectation; (iii) algorithm APPROX-3 which, for any ε, computes a (1+ε)-NE by using O(log(n)/ε) colors, for any n-vertex digraph. Note that the latter shows that a (1+ε)-NE exists and can be computed in polynomial time for k = O(log(n)).

Finally, to assess how the proposed algorithms behave in the typical case, we complete our study with an extensive experimental evaluation showing that, while the newly introduced algorithms achieve bounded worst-case behavior, they generally perform poorly in practice. Motivated by this unsatisfactory performance, we shift our attention to the best-response paradigm, successfully applied to other classes of games, and design and experimentally evaluate a heuristic based on this paradigm. Our experiments provide strong evidence that this approach outperforms all other methods in terms of approximation and computational time, and hence identify it as the most suitable candidate for practical usage. More remarkably, it is also able to compute exact, pure NE in the great majority of cases. This suggests that, while these games are known to not always possess a pure NE, such an equilibrium often exists and can be efficiently computed, even by a distributed uncoordinated interaction of the agents.
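The abstract does not specify the heuristic, so the sketch below shows only plain sequential best-response dynamics as one generic instance of the best-response paradigm. The round cap `max_rounds`, the visiting order, the tie-breaking toward the smallest color, and all function names are assumptions for illustration.

```python
def payoff(agent, color, coloring, out_neighbors):
    """Number of out-neighbors whose color differs from `color`."""
    return sum(1 for v in out_neighbors[agent] if coloring[v] != color)


def best_response_dynamics(coloring, out_neighbors, k, max_rounds=100):
    """Let agents switch, in turn, to a strictly better color.

    Returns (coloring, converged). `converged` is True when a full round
    passes with no agent changing color, i.e. the state is a pure NE.
    """
    coloring = list(coloring)
    for _ in range(max_rounds):
        changed = False
        for agent in range(len(coloring)):
            current = payoff(agent, coloring[agent], coloring, out_neighbors)
            # Smallest color achieving the maximum payoff for this agent.
            best_color = max(
                range(k),
                key=lambda c: payoff(agent, c, coloring, out_neighbors),
            )
            if payoff(agent, best_color, coloring, out_neighbors) > current:
                coloring[agent] = best_color
                changed = True
        if not changed:
            return coloring, True
    return coloring, False


# Directed path 0 -> 1 -> 2 with two colors: the dynamics settle.
path = {0: [1], 1: [2], 2: []}
path_result, path_converged = best_response_dynamics([0, 0, 0], path, k=2)

# Directed 3-cycle with two colors: no pure NE exists, so the cap is hit.
cycle = {0: [1], 1: [2], 2: [0]}
_, cycle_converged = best_response_dynamics([0, 0, 0], cycle, k=2)
```

The round cap matters because, as the abstract notes, a pure NE does not always exist; on such instances the dynamics can keep cycling.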

Uncertainty as a Fairness Measure

Selim Kuzucu, Jiaee Cheong, Hatice Gunes, Sinan Kalkan — Sun, 13 Oct 2024 00:00:00 -0700

Unfair predictions of machine learning (ML) models impede their broad acceptance in real-world settings. Tackling this arduous challenge first necessitates defining what it means for an ML model to be fair. This has been addressed by the ML community with various measures of fairness that depend on the prediction outcomes of the ML models, either at the group level or the individual level. These fairness measures are limited in that they utilize point predictions, neglecting their variances, or uncertainties, making them susceptible to noise, missingness and shifts in data. In this paper, we first show that an ML model may appear to be fair with existing point-based fairness measures but biased against a demographic group in terms of prediction uncertainties. Then, we introduce new fairness measures based on different types of uncertainties, namely, aleatoric uncertainty and epistemic uncertainty. We demonstrate on many datasets that (i) our uncertainty-based measures are complementary to existing measures of fairness, and (ii) they provide more insights about the underlying issues leading to bias.

A Unified Perspective on Value Backup and Exploration in Monte-Carlo Tree Search

Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen — Wed, 13 Nov 2024 00:00:00 -0800

Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decision-making problems through the synergy of Monte-Carlo planning and Reinforcement Learning (RL). The highly combinatorial nature of the problems commonly addressed by MCTS requires the use of efficient exploration strategies for navigating the planning tree and quickly convergent value backup methods. These crucial problems are particularly evident in recent advances that combine MCTS with deep neural networks for function approximation. In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization. We provide strong theoretical guarantees to bound convergence rate, approximation error, and regret of our methods. Moreover, we introduce a mathematical framework based on the use of the α-divergence for backup and exploration in MCTS. We show that this theoretical formulation unifies different approaches, including our newly introduced ones, under the same mathematical framework, allowing different methods to be obtained by simply changing the value of α. In practice, our unified perspective offers a flexible way to balance between exploration and exploitation by tuning the single α parameter according to the problem at hand. We validate our methods through a rigorous empirical study ranging from basic toy problems to the complex Atari games, including both MDP and POMDP problems.

The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models

Mon, 26 Aug 2024 00:00:00 -0700

In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of Deep Neural Networks (DNNs). We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing (NLP). RL4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal.

Unifying SAT-Based Approaches to Maximum Satisfiability Solving

Hannes Ihalainen, Jeremias Berg, Matti Järvisalo — Sun, 07 Jul 2024 00:00:00 -0700

Maximum satisfiability (MaxSAT), employing propositional logic as the declarative language of choice, has turned into a viable approach to solving NP-hard optimization problems arising from artificial intelligence and other real-world settings. A key contributing factor to the success of MaxSAT is the rise of increasingly effective exact solvers that are based on iterative calls to a Boolean satisfiability (SAT) solver. The three types of SAT-based MaxSAT solving approaches, each with its distinguishing features, implemented in current state-of-the-art MaxSAT solvers are the core-guided, the implicit hitting set (IHS), and the objective-bounding approaches. The objective-bounding approach is based on directly searching over the objective function range by iteratively querying a SAT solver as to whether the MaxSAT instance at hand has a solution under different bounds on the objective. In contrast, both core-guided and IHS are so-called unsatisfiability-based approaches that employ a SAT solver as an unsatisfiable core extractor to determine sources of inconsistencies, but critically differ in how the extracted unsatisfiable cores are used towards finding a provably optimal solution. Furthermore, a variety of different algorithmic variants of the core-guided approach in particular have been proposed and implemented in solvers. It is well-acknowledged that each of the three approaches has its advantages and disadvantages, which is also witnessed by instance and problem-domain specific runtime performance differences (and at times similarities) of MaxSAT solvers implementing variants of the approaches. However, the questions of to what extent the approaches are fundamentally different and how the benefits of the individual methods could be combined in a single algorithmic approach are currently not fully understood. In this work, we approach these questions by developing UniMaxSAT, a general unifying algorithmic framework.
Based on the recent notion of abstract cores, UniMaxSAT captures in general core-guided, IHS and objective-bounding computations. The framework offers a unified way of establishing quite generally the correctness of the current approaches. We illustrate this by formally showing that UniMaxSAT can simulate the computations of various algorithmic instantiations of the three types of MaxSAT solving approaches. Furthermore, UniMaxSAT can be instantiated in novel ways giving rise to new algorithmic variants of the approaches. We illustrate this aspect by developing a prototype implementation of an algorithmic variant for MaxSAT based on the framework.
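The objective-bounding idea described above (repeatedly asking whether a solution exists under a given bound on the objective, then tightening the bound) can be illustrated on a toy weighted instance. In the sketch below, a brute-force enumeration stands in for the SAT solver call, clauses use DIMACS-style integer literals, and all function names are assumptions for illustration. It is not UniMaxSAT, and it does not cover the core-guided or IHS approaches.

```python
from itertools import product


def satisfies(clause, assignment):
    """True if any literal of the clause holds (literal -v means 'v is False')."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def cost(soft, assignment):
    """Total weight of the soft clauses falsified by the assignment."""
    return sum(w for clause, w in soft if not satisfies(clause, assignment))


def oracle(n_vars, hard, soft, bound):
    """Stand-in for a SAT call: find an assignment with cost <= bound, or None."""
    for values in product([False, True], repeat=n_vars):
        assignment = dict(zip(range(1, n_vars + 1), values))
        if all(satisfies(c, assignment) for c in hard) and cost(soft, assignment) <= bound:
            return assignment
    return None


def objective_bounding(n_vars, hard, soft):
    """Tighten the upper bound on the cost until the oracle reports no solution."""
    bound = sum(w for _, w in soft)  # trivially valid initial upper bound
    best = None
    while bound >= 0:
        model = oracle(n_vars, hard, soft, bound)
        if model is None:
            break
        best = model
        bound = cost(soft, model) - 1  # demand a strictly better solution next
    return best, (cost(soft, best) if best is not None else None)


# Hard: x1 or x2. Soft: prefer x1 false (weight 2), prefer x2 false (weight 3).
hard = [[1, 2]]
soft = [([-1], 2), ([-2], 3)]
model, optimum = objective_bounding(2, hard, soft)
```

On this instance the loop first finds a solution of cost 3, then one of cost 2, and stops when no solution of cost at most 1 exists, so the last model found is optimal.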