Portfolio Allocation for Bayesian Optimization (2011). Mathew Hoffman, Eric Brochu, and Nando de Freitas.
Most of the literature seems to suggest UCB (or LCB) as the optimal acquisition function in many cases, yet this research proposes the use of a portfolio of acquisition functions governed by a multi-armed bandit strategy, as opposed to only using a single acquisition function.
- The authors suggest that there may be no single acquisition function that will perform best over an entire optimization.
- "This can be treated as a hierarchical multi-armed bandit problem, in which each of the N arms is itself an infinite-armed bandit problem".
- Three strategies are suggested, but Hedge is recommended. "Hedge is an algorithm which at each time step t selects an action i with probability p_t(i) based on the cumulative rewards (gain) for that action." A gain vector is then updated from these rewards.
- You can't necessarily compare convergence rates of the portfolio method directly to the single acquisition functions, since "decisions made at iteration t affect the state of the problem and the resulting rewards at all future iterations". The authors suggest an approach (Theorem 1) for setting bounds on the cumulative regret. These bounds are generated in relation to points proposed by UCB, and the authors suggest possible refinements to this theorem to take into account bounds of other acquisition functions in the portfolio.
- Are the improvements cited enough for us to notice a difference in applied cases?