Evaluating the Project
Q-Tank seems to work quite well, but it is not as interesting to watch as I had hoped. The robot takes about 60 steps to build up reasonable Q-Values; after that, it begins acting rationally and tries to face the brightest light in the room. It's not perfect, though. In practice it has a hard time actually driving toward a light, because the light reading rises only slightly as the robot moves toward the source. We could make the light sensor more sensitive: the Lego light sensor can report its raw value, which ranges from 0 to 1023 (rather than 0 to 100), but this causes problems of its own. At that sensitivity, someone shifting position in the room is enough to change the ambient light value, which fools the robot into thinking it has moved (even when it might be hung up in a corner). I've monitored the raw light value, and it jumps around all over the place, so using the 0 to 100 scale helps stabilize the readings.
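The stabilizing effect of the coarser scale can be illustrated with a short sketch. It assumes a hypothetical linear mapping from raw (0 to 1023) to percent (0 to 100); the actual firmware conversion is not necessarily linear, and the sample readings are invented for illustration, not measured from Q-Tank.

```python
RAW_MAX = 1023  # raw readings span 0..1023


def to_percent(raw: int) -> int:
    """Map a raw 0..1023 reading onto 0..100.

    Hypothetical linear scaling for illustration only; it is not
    necessarily how the sensor firmware converts raw values.
    """
    if not 0 <= raw <= RAW_MAX:
        raise ValueError(f"raw reading out of range: {raw}")
    return raw * 100 // RAW_MAX


# Invented raw samples taken while the robot sits still
# and people move around the room.
samples = [612, 615, 609, 618, 611, 614, 607, 616]
percents = [to_percent(r) for r in samples]

print("raw spread:", max(samples) - min(samples))          # raw spread: 11
print("percent spread:", max(percents) - min(percents))    # percent spread: 1
print("distinct raw:", len(set(samples)), "distinct percent:", len(set(percents)))
```

Eight different raw values collapse to two percent values here. Quantizing does not remove noise entirely: a reading that sits near a bucket boundary can still flicker between neighbouring values.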
In a typical rectangular room, with many objects of different colors creating lots of local shadows and bright spots, Q-Tank has less chance of finding the strongest light. If the room were all white with a domed (half-sphere) ceiling, and the light sensor pointed roughly at the ceiling, the robot could move toward the light much more easily. Also, if the brightest direction in the room produces a light value of 68, then once the robot is facing that direction it has nowhere better to go. So the best thing for it to do is sit still and point at the light (which is one reason why I added a negative reinforcement if the light value remains static for more than five moves).
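A reward with that stagnation penalty might look like the sketch below. The delta-based reward (change in light reading since the last move), the penalty size of -1.0, and the class name `LightReward` are hypothetical choices; only the "more than five unchanged moves" rule comes from the text above.

```python
class LightReward:
    """Reward = change in light reading, with a penalty for stagnation.

    Hypothetical sketch: the delta-based reward and the penalty value are
    illustrative assumptions, not Q-Tank's actual reward function.
    """

    def __init__(self, max_static_moves: int = 5, penalty: float = -1.0):
        self.max_static_moves = max_static_moves
        self.penalty = penalty
        self.previous = None   # last light reading seen
        self.static_moves = 0  # consecutive moves with an unchanged reading

    def __call__(self, light: int) -> float:
        if self.previous is None:
            # First reading: nothing to compare against yet.
            self.previous = light
            return 0.0
        delta = light - self.previous
        self.previous = light
        self.static_moves = self.static_moves + 1 if delta == 0 else 0
        # Punish sitting still for more than max_static_moves moves in a row.
        if self.static_moves > self.max_static_moves:
            return self.penalty
        return float(delta)


reward = LightReward()
readings = [60, 64, 68] + [68] * 6
rewards = [reward(r) for r in readings]
print(rewards)  # [0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```

The penalty only appears on the sixth unchanged reading, so short pauses at the brightest heading are tolerated while parking there indefinitely is not.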
In terms of observing the behavior, Q-Tank would be easier to understand if it had two light sensors, since it could then compare both sides without turning. The quick left-right movements that begin each step make it hard to see what the robot is actually doing. The left-right movement may also throw off the robot, because its start and end positions may not match exactly, which could contaminate the sensor data it receives.
Ironically, the most powerful part of the Q-Learning equation, the ability to take into account the state the robot is delivered to, does not get much use in a problem as simple as light-seeking. The main problem is that Q-Tank is not given a very rich description of its state: it receives only a few scant details from the light sensor. If you, as an intelligent person, were given such sparse data about your surroundings, you wouldn't be able to do anything very interesting either. The Q-Learning algorithm would really shine if the robot kept track of Cartesian coordinates (x, y, and direction). It could then learn the best route, for a particular room layout, from any point in the room to a target area such as a bright light. The drawback is that this would greatly increase the number of states the Q-Table needs to keep track of.
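For reference, the textbook form of the Q-Learning update is shown below, with learning rate α and discount γ; the term γ max Q(s', a') is the part that accounts for the state the robot is delivered to. This is the standard formulation, not necessarily the exact variant Q-Tank uses. The table-size arithmetic is purely hypothetical: it assumes a room discretized into a 20 × 20 grid, 8 headings, and 3 actions.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \Big[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\Big]

|S| = N_x \cdot N_y \cdot N_\theta = 20 \cdot 20 \cdot 8 = 3200,
\qquad
|S| \cdot |A| = 3200 \cdot 3 = 9600 \ \text{table entries}
```

Every extra coordinate multiplies the table size, which is the cost of the richer state description.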
Q-Learning is a wonderful technique. It allows a programmer to state the high-level goals of the robot without having to rely on assumptions about the environment. The model requires just two things on the part of the programmer: perceptions and actions. The overall program can be very generic, with no hard-coded behavior. The real challenge is to find a project worthy of the algorithm.
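To show how little the learner itself needs to know, here is a minimal generic tabular Q-Learning agent in Python. It is a sketch, not Q-Tank's source: the class name `QAgent`, the parameter values, epsilon-greedy exploration, and the five-cell corridor standing in for a room (with the "light" in the last cell) are all hypothetical. The agent sees only a state, a list of actions, and a reward.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


class QAgent:
    """Generic tabular Q-Learning agent (hypothetical sketch).

    It knows nothing about light sensors or rooms: only a hashable
    perception (the state), a list of actions, and a numeric reward.
    """

    def __init__(self, actions: Sequence[str], alpha: float = 0.5,
                 gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount on the next state's value
        self.epsilon = epsilon  # probability of a random (exploratory) action
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> value, default 0.0

    def best_action(self, state: Hashable) -> str:
        # Greedy choice; ties are broken at random so no action is favoured.
        best = max(self.q[(state, a)] for a in self.actions)
        return self.rng.choice([a for a in self.actions if self.q[(state, a)] == best])

    def choose(self, state: Hashable) -> str:
        # Epsilon-greedy: usually exploit, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def learn(self, state, action, reward: float, next_state, done: bool) -> None:
        # Standard Q-Learning update; a terminal state contributes no future value.
        future = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Toy stand-in for a room: a 5-cell corridor with the "light" in cell 4.
GOAL = 4


def step(state: int, action: str):
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


agent = QAgent(["left", "right"])
for _ in range(500):
    state = 0
    for _ in range(50):  # cap the episode length
        action = agent.choose(state)
        next_state, reward, done = step(state, action)
        agent.learn(state, action, reward, next_state, done)
        state = next_state
        if done:
            break

print([agent.best_action(s) for s in range(GOAL)])  # ['right', 'right', 'right', 'right']
```

Swapping the corridor for real perceptions and motor commands would not require touching `QAgent` at all; only `step` and the action list change.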