Abstract
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
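Going from $(\ln t)^3$ to $(\ln t)^2$ removes exactly one factor of $\ln t$ from the horizon dependence. The short Python sketch below evaluates both factors for a few horizons to make this concrete. It ignores all constants and gap-dependent terms, so it only compares the two horizon factors named above and is not the regret bound from the paper.

```python
import math


def horizon_factors(t: float) -> tuple:
    """Return ((ln t)^3, (ln t)^2, ratio) for a time horizon t > 1.

    Illustrative only: constants and gap-dependent terms of the actual
    regret bounds are deliberately ignored.
    """
    log_t = math.log(t)
    old = log_t ** 3  # horizon dependence before the improvement
    new = log_t ** 2  # horizon dependence after the improvement
    return old, new, old / new  # the ratio equals ln t


if __name__ == "__main__":
    for t in (1e3, 1e6, 1e9):
        old, new, ratio = horizon_factors(t)
        print(f"t = {t:.0e}: (ln t)^3 = {old:.0f}, (ln t)^2 = {new:.0f}, ratio = {ratio:.1f}")
```

The gap between the two factors grows only logarithmically with $t$. Even so, at a horizon of $10^6$ rounds the older factor is roughly 14 times larger.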
| Original language | English |
|---|---|
| Title of host publication | Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands |
| Editors | Satyen Kale, Ohad Shamir |
| Publisher | Proceedings of Machine Learning Research |
| Publication date | 2017 |
| Pages | 1743-1759 |
| Status | Published - 2017 |
| Event | The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands. Duration: 7 Jul 2017 → 10 Jul 2017. Conference number: 30. http://www.learningtheory.org/colt2017/ |
Conference
| Conference | The 30th Annual Conference on Learning Theory (COLT) |
|---|---|
| Number | 30 |
| Country/Territory | Netherlands |
| City | Amsterdam |
| Period | 7 Jul 2017 → 10 Jul 2017 |
| Web address | http://www.learningtheory.org/colt2017/ |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Volume | 65 |
| ISSN | 1938-7228 |