Learning in structured MDPs with convex cost function: improved regret bounds for inventory management, Shipra Agrawal; iDS2

From Oluwasanmi Koyejo  
