Budget-Constrained Learning and Optimization with Bandit Feedback, Semih Cayci; IDS2 seminar series
From Oluwasanmi Koyejo on 10/9/2020
Abstract: In numerous decision-making processes, such as algorithm portfolio design and adaptive routing, each action incurs a random cost and yields a random, potentially correlated reward, and the process continues until the total cost exceeds a resource constraint. Motivated by these applications, we first investigate the general budgeted bandit problem, in which the controller aims to maximize the total reward subject to a budget constraint on the total cost. We show that logarithmic regret is achievable for unbounded cost and reward under mild moment conditions. In particular, we propose robust learning algorithms that incorporate linear estimators to extract and exploit the correlation between the cost and reward of an arm for optimal sample complexity. We prove that these algorithms achieve tight regret bounds, which are optimal up to a constant factor in the Gaussian case.

In the second part, we focus on the time-constrained bandit problem and allow the controller to interrupt an ongoing task, forgoing its reward, in favor of a potentially faster alternative. We show that this controlled interrupt mechanism improves the total reward linearly over time for a broad class of completion-time distributions. We propose learning algorithms that learn the optimal arm and interrupt strategy with order-optimal regret. We show that these learning algorithms provide efficient solutions for boosting the time efficiency of stochastic local search methods in important applications that require early stopping, such as the k-SAT problem.
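To make the budgeted setting from the first part concrete, below is a minimal Python sketch of a budget-constrained bandit loop: each pull draws a correlated (cost, reward) pair, and a ratio-based index with an exploration bonus guides arm selection until the budget is spent. The arm parameters, the correlation model, and the specific index are illustrative assumptions, not the algorithm analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arms: (mean_cost, mean_reward); values chosen only for illustration.
arm_params = [(1.0, 0.9), (1.5, 1.6), (0.8, 0.5)]

def pull(arm):
    """Draw a correlated (cost, reward) sample for the given arm (illustrative)."""
    mc, mr = arm_params[arm]
    z = rng.normal()                       # shared noise induces cost-reward correlation
    cost = max(0.1, mc + 0.2 * z + 0.1 * rng.normal())
    reward = max(0.0, mr + 0.3 * z + 0.1 * rng.normal())
    return cost, reward

def budgeted_ucb(budget, n_arms=3):
    counts = np.zeros(n_arms)
    cost_sum = np.zeros(n_arms)
    reward_sum = np.zeros(n_arms)
    spent, total_reward, t = 0.0, 0.0, 0

    while spent < budget:
        t += 1
        if t <= n_arms:                    # pull each arm once to initialize estimates
            arm = t - 1
        else:
            # Empirical reward-per-cost ratio plus a UCB-style exploration bonus;
            # a stand-in for the confidence-bound indices discussed in the talk.
            ratio = reward_sum / cost_sum
            bonus = np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ratio + bonus))
        cost, reward = pull(arm)
        counts[arm] += 1
        cost_sum[arm] += cost
        reward_sum[arm] += reward
        spent += cost
        total_reward += reward
    return total_reward

print(budgeted_ucb(budget=500.0))
```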
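The interrupt mechanism from the second part can likewise be sketched for a randomized solver with heavy-tailed completion times: abandon any run that exceeds a cutoff and restart it, so that occasional very long runs do not dominate the time budget. The cutoff value, the toy solver, and the Pareto completion-time model below are placeholders for illustration, not the interrupt strategy or applications studied by the speaker.

```python
import random

def run_with_interrupts(solve_once, cutoff, step_budget):
    """Restart-based interrupt strategy (illustrative): cap each run at `cutoff`
    steps and restart until solved or the total step budget is exhausted."""
    steps_used = 0
    while steps_used < step_budget:
        allowed = min(cutoff, step_budget - steps_used)
        solved, steps = solve_once(max_steps=allowed)
        steps_used += steps
        if solved:
            return True, steps_used
    return False, steps_used

def toy_solver(max_steps):
    """Hypothetical randomized local search whose completion time is heavy-tailed:
    most runs finish quickly, a few take very long, so interrupting helps."""
    target = int(random.paretovariate(1.1) * 10)
    steps = min(target, max_steps)
    return steps == target, steps

random.seed(1)
print(run_with_interrupts(toy_solver, cutoff=50, step_budget=10_000))
```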