Machine learning / R News

Getting AI smarter with Q-learning: a simple first step in Python

by Sara · September 12, 2016

This article is originally published at http://firsttimeprogrammer.blogspot.com/

Yesterday I found an “old” script I wrote during a morning in the last semester. I remember being a little bored and interested in the concept of Q-learning. That was about the time Alpha-Go had beaten the world champion of Go and by reading here and there I found out that a bit of Q-learning mixed with deep learning might have been involved.

Indeed Q-learning seems an interesting concept, perhaps even more fascinating than traditional supervised machine learning since in this case the machine is basically learning from scratch how to perform a certain task in oder to optimize future rewards.

If you would like to read a, quote, “Painless Q-learning tutorial”, I suggest you to read the following explanation: A Painless Q-learning Tutorial. In this article the concept of Q-learning is explained through a simple example and a clear walk-through. After having read the article I decided to put into code the example shown. The example shows a maze through which the agent should go and find its way up to the goal stage (stage 5). Basically, the idea is to train an algorithm to find the optimal path, given an (often random) initial condition, in order to maximize a certain outcome. In this simple example, as you can see from the picture shown in the article, the possible choices are all known and the outcome of each choice is deterministic. A best path exists and can be found easily regardless of the initial condition. Furthemore, the maze is time invariant. These nice theoretical hypothesis are usually not true when dealing when real world problems and this makes using Q-learning hard in practice, even though the concept behind it is relatively simple.

Before taking a look at the code, I suggest to read the article mentioned above, where you will get familiar with the problem tackled below. I’m not going deep in explaining what is going on since the author of that article has already done a pretty good job doing it and my work is just a (probably horribly inefficient) translation in Python of the algorithm.

Given an initial condition of, say, state 2, the optimal sequence path is clearly 2 - 3 – 1 - 5. Let’s see if the algorithm finds it!

And sure enough there it is! Bear in mind this is a very simplified problem. At the same time though, keep in mind that Alpha Go is powered in part by a similar concept. I wonder what other application will come from Q-learning.

The concept of Q-learning is more vast than what I showed here, nevertheless I hope my post was an interesting insight.

Thanks for visiting r-craft.org
This article is originally published at http://firsttimeprogrammer.blogspot.com/
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Getting AI smarter with Q-learning: a simple first step in Python

You may also like...

Categories

Getting AI smarter with Q-learning: a simple first step in Python

You may also like...

Confessions of a Data Scientist: Why I quit social media and still cut my own grass

Tidy evaluation in ggplot2

R Weekly 2023-W09 ggiraph, forester, Excel vs R

Categories