In this video, a rotary inverted pendulum learns to balance purely through trial and error, using reinforcement learning. Only a few selected stages of learning are shown, since training ran non-stop for a few days!
The swing-up uses simple energy-based control, and balancing is learned with the Q(λ)-learning algorithm. Once the pendulum has been balanced for 1 minute, the swing-down is sped up by inverting the swing-up controller, i.e. driving the motor so that it decreases the pendulum's total (kinetic + potential) energy.
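A minimal Python sketch of the two ingredients described above, under stated assumptions. The energy-pumping law follows the Åström and Furuta form for a pendulum whose pivot is accelerated horizontally (θ = 0 upright), and flipping its sign gives the energy-removing swing-down. The learner is textbook tabular Watkins's Q(λ) with replacing eligibility traces. The physical constants, gains, sign conventions, and the names `pendulum_energy`, `energy_control` and `WatkinsQLambda` are hypothetical. The video does not say how the state was discretized, what reward was used, or which Q(λ) variant was run.

```python
import math
import random
from collections import defaultdict


# ---- Energy-based swing-up and its inverse (swing-down) ----
# Assumed (hypothetical) convention: theta = 0 is upright, the pendulum is a
# point mass m at distance l, and u is the pivot's horizontal acceleration,
# so that dE/dt = -m * l * u * theta_dot * cos(theta).

def pendulum_energy(theta, theta_dot, m=0.024, l=0.064, g=9.81):
    """Kinetic + potential energy: 0 at rest upright, -2*m*g*l hanging down."""
    inertia = m * l * l
    return 0.5 * inertia * theta_dot ** 2 + m * g * l * (math.cos(theta) - 1.0)


def energy_control(theta, theta_dot, k=50.0, u_max=5.0, swing_down=False):
    """Pump energy toward the upright level (E = 0); swing_down=True flips
    the sign so the same law removes energy instead."""
    e = pendulum_energy(theta, theta_dot)
    pump = theta_dot * math.cos(theta)
    direction = (pump > 0) - (pump < 0)  # sign(theta_dot * cos(theta))
    u = k * e * direction  # with E < 0 this makes dE/dt > 0
    if swing_down:
        u = -u  # inverse controller: dE/dt < 0
    return max(-u_max, min(u_max, u))  # actuator saturation


# ---- Tabular Watkins's Q(lambda) with replacing traces ----
class WatkinsQLambda:
    def __init__(self, n_actions, alpha=0.1, gamma=0.99, lam=0.9,
                 epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.epsilon = epsilon
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.traces = {}             # (state, action) -> eligibility
        self.rng = random.Random(seed)

    def greedy(self, state):
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        return values.index(max(values))

    def act(self, state):
        """Epsilon-greedy selection: the 'trial and error' exploration."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, s, a, r, s_next, a_next, done):
        """One step; a_next is the action actually chosen in s_next."""
        a_star = self.greedy(s_next)
        if self.q[(s_next, a_next)] == self.q[(s_next, a_star)]:
            a_star = a_next  # ties count as greedy
        target = r if done else r + self.gamma * self.q[(s_next, a_star)]
        delta = target - self.q[(s, a)]
        self.traces[(s, a)] = 1.0  # replacing trace
        # Every recently visited pair shares the TD error, scaled by its trace.
        for key, e in self.traces.items():
            self.q[key] += self.alpha * delta * e
        if done or a_next != a_star:
            self.traces = {}  # Watkins: cut traces after exploratory action
        else:
            decay = self.gamma * self.lam
            self.traces = {k: e * decay for k, e in self.traces.items()
                           if e * decay > 1e-6}
        return delta
```

In a control loop, the energy controller would presumably run until the pendulum nears upright, after which `act` and `update` take over on some discretized state (arm and pendulum angles and velocities); that hand-off and the discretization are assumptions, not details from the video. Cutting traces after an exploratory action is what distinguishes Watkins's variant: credit only flows back along sequences that the greedy policy would also have taken.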