Downes.ca ~ Stephen's Web ~ How reinforcement learning chooses the ads you see

How reinforcement learning chooses the ads you see

Ben Dickson, TechTalks, Feb 22, 2021
Commentary by Stephen Downes

I actually do whatever I can to avoid viewing advertisements, so I can't really speak to the effectiveness of ad optimization. But the term 'multi-armed bandit' has been appearing in the literature on recommendation systems recently (which probably explains why it's also in this article), and since similar technologies are being used to support digital learning platforms, this article offers some useful insight. The multi-armed bandit problem has been around since the 1950s. It describes a scenario in which a gambler has to choose which of several slot machines to play, where one machine has the highest payout percentage but the player doesn't know which one. A multi-armed bandit algorithm must choose between staying with the best-known option ('exploitation') and searching for a better one ('exploration'). Researchers might think of it as a sophisticated alternative to simple A/B testing. Image: Academic Gamer (good YouTube explainer).
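
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy. It is not taken from the article; the function name `epsilon_greedy`, the payout rates, and the parameter values are all assumptions chosen for illustration. Each "machine" pays out 1 with a fixed hidden probability, and the player keeps a running estimate of each machine's payout rate.

```python
import random


def epsilon_greedy(payout_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Play a simulated multi-armed bandit with an epsilon-greedy player.

    payout_rates: hidden win probability of each machine (illustrative values).
    epsilon: fraction of rounds spent exploring a random machine.
    Returns (pull counts, estimated payout rates, total reward).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_arms = len(payout_rates)
    counts = [0] * n_arms        # how often each machine was played
    estimates = [0.0] * n_arms   # running mean reward per machine
    total_reward = 0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Exploration: try a machine at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the machine with the best estimate so far
            # (ties go to the lowest-numbered machine).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The machine pays 1 with its hidden probability, else 0.
        reward = 1 if rng.random() < payout_rates[arm] else 0

        # Update the running mean for the machine that was played.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    # Hypothetical payout rates; the player never sees these directly.
    counts, estimates, total = epsilon_greedy([0.02, 0.05, 0.10])
    print("pulls per machine:", counts)
    print("estimated rates:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

The comparison with A/B testing shows up in `counts`: a plain A/B test would split plays evenly for the whole run, while this player shifts most plays toward whichever machine currently looks best and spends only about `epsilon` of its rounds checking the others.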
