Downes.ca ~ Stephen's Web ~ How reinforcement learning chooses the ads you see

Stephen Downes

Knowledge, Learning, Community

I actually do whatever I can to avoid viewing advertisements, so I can't really speak to the effectiveness of ad optimization. But the term 'multi-armed bandit' has been appearing in the literature on recommendation systems recently (which probably explains why it's also in this article), and with similar technologies being used to support digital learning platforms, this article offers some useful insight. The multi-armed bandit problem has been around since the 1950s. It describes a scenario in which a gambler has to choose which of several slot machines to play, where one machine has the highest payout percentage but the player doesn't know which one. A multi-armed bandit algorithm must choose between staying with the best-known option ('exploitation') or searching for a better one ('exploration'). Researchers might think of it as a sophisticated alternative to simple A/B testing. Image: Academic Gamer (good YouTube explainer).
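To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy. It is not taken from the article; the function name `epsilon_greedy`, the payout probabilities, and the parameter values are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy(payout_probs, rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on a multi-armed bandit.

    payout_probs: assumed true win probability of each machine.
                  The player never reads these directly; it only sees rewards.
    Returns (estimated payout rates, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms          # how often each machine was played
    estimates = [0.0] * n_arms    # running average reward per machine
    total_reward = 0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: try a machine at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the machine with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated payout: 1 for a win, 0 for a loss.
        reward = 1 if rng.random() < payout_probs[arm] else 0

        # Incremental update of the running average for the chosen machine.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return estimates, pulls, total_reward


if __name__ == "__main__":
    # Three hypothetical machines; the third pays out most often.
    estimates, pulls, total = epsilon_greedy([0.1, 0.2, 0.9])
    print("estimated payout rates:", [round(e, 2) for e in estimates])
    print("pulls per machine:", pulls)
    print("total reward:", total)
```

Unlike a simple A/B test, which splits traffic evenly for the whole experiment, this player shifts most of its plays toward the machine that appears to pay best while still spending a small fraction (`epsilon`) on the others.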



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Nov 03, 2024 3:07 p.m.

Creative Commons License.
