Abstract: In numerous decision-making processes, such as algorithm portfolio design and adaptive routing, each action incurs a random cost and yields a random, potentially correlated reward, and the process continues until the total cost exceeds a resource constraint. Motivated by
these applications, we first investigate the general budgeted bandit problem in
which the controller aims to maximize the total reward subject to a budget
constraint on the total cost. We show that logarithmic regret is achievable for
unbounded cost and reward under mild moment conditions. In particular, we
propose robust learning algorithms that incorporate linear estimators to extract and exploit the correlation between the cost and reward of an arm, attaining optimal sample complexity. We prove that these algorithms achieve tight regret
bounds, which are optimal up to a constant factor in the Gaussian case. In the
second part, we focus on the time-constrained bandit problem, in which the controller may interrupt an ongoing task and forgo its reward in favor of a potentially faster alternative. We show that this controlled interrupt mechanism improves
the total reward linearly over time for a broad class of completion time
distributions. We propose algorithms that learn the optimal arm and interrupt strategy with order-optimal regret. We show that these algorithms provide efficient solutions for improving the time-efficiency of stochastic local search methods in various important applications that require early stopping, such as the k-SAT problem.