Bain, M., Sammut, C.: A framework for behavioral cloning. In: Muggleton, S.H., Furukawa, K., Michie, D. (eds.) Machine Intelligence, vol. 15, pp. 103–129. Oxford University Press (1995)
Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13, 834–846 (1983)
Barto, A.G., Bradtke, S.J., Singh, S.: Learning to act using real-time dynamic programming. Artificial Intelligence 72(1–2), 81–138 (1995)
Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vols. 1 and 2. Athena Scientific, Belmont (1995)
Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Bonet, B., Geffner, H.: Faster heuristic search algorithms for planning with uncertainty and full feedback. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1233–1238 (2003a)
Bonet, B., Geffner, H.: Labeled RTDP: Improving the convergence of real-time dynamic programming. In: Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pp. 12–21 (2003b)
Boutilier, C.: Knowledge Representation for Stochastic Decision Processes. In: Veloso, M.M., Wooldridge, M.J. (eds.) Artificial Intelligence Today. LNCS (LNAI), vol. 1600, pp. 111–152. Springer, Heidelberg (1999)
Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11, 1–94 (1999)
Brafman, R.I., Tennenholtz, M.: R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research (JMLR) 3, 213–231 (2002)
Dean, T., Kaelbling, L.P., Kirman, J., Nicholson, A.: Planning under time constraints in stochastic domains. Artificial Intelligence 76, 35–74 (1995)
Dixon, K.R., Malak, M.J., Khosla, P.K.: Incorporating prior knowledge and previously learned information into reinforcement learning agents. Tech. rep., Institute for Complex Engineered Systems, Carnegie Mellon University (2000)
Dorigo, M., Colombetti, M.: Robot Shaping: An Experiment in Behavior Engineering. The MIT Press, Cambridge (1997)
Drescher, G.: Made-Up Minds: A Constructivist Approach to Artificial Intelligence. The MIT Press, Cambridge (1991)
Ferguson, D., Stentz, A.: Focussed dynamic programming: Extensive comparative results. Tech. Rep. CMU-RI-TR-04-13, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania (2004)
Främling, K.: Bi-memory model for guiding exploration by pre-existing knowledge. In: Driessens, K., Fern, A., van Otterlo, M. (eds.) Proceedings of the ICML-2005 Workshop on Rich Representations for Reinforcement Learning, pp. 21–26 (2005)
Großmann, A.: Adaptive state-space quantisation and multi-task reinforcement learning using constructive neural networks. In: From Animals to Animats: Proceedings of The International Conference on Simulation of Adaptive Behavior (SAB), pp. 160–169 (2000)
Hansen, E.A., Zilberstein, S.: LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence 129, 35–62 (2001)
Howard, R.A.: Dynamic Programming and Markov Processes. The MIT Press, Cambridge (1960)
Kaelbling, L.P.: Learning in Embedded Systems. The MIT Press, Cambridge (1993)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
Kearns, M., Singh, S.: Near-optimal reinforcement learning in polynomial time. In: Proceedings of the International Conference on Machine Learning (ICML) (1998)
Koenig, S., Liu, Y.: The interaction of representations and planning objectives for decision-theoretic planning. Journal of Experimental and Theoretical Artificial Intelligence 14(4), 303–326 (2002)
Konda, V., Tsitsiklis, J.: Actor-critic algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)
Konidaris, G.: A framework for transfer in reinforcement learning. In: ICML-2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)
Kushmerick, N., Hanks, S., Weld, D.S.: An algorithm for probabilistic planning. Artificial Intelligence 76(1-2), 239–286 (1995)
Littman, M.L., Dean, T., Kaelbling, L.P.: On the complexity of solving Markov decision problems. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 394–402 (1995)
Mahadevan, S.: Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning 22, 159–195 (1996)
Maloof, M.A.: Incremental rule learning with partial instance memory for changing concepts. In: Proceedings of the International Joint Conference on Neural Networks, pp. 2764–2769 (2003)
Mataric, M.J.: Reward functions for accelerated learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 181–189 (1994)
Matthews, W.H.: Mazes and Labyrinths: A General Account of their History and Developments. Longmans, Green and Co., London (1922); reprinted as Mazes & Labyrinths: Their History & Development. Dover Publications, New York (1970)
McMahan, H.B., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 569–576 (2005)
Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1), 103–130 (1993)
Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: Theory and application to reward shaping. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 278–287 (1999)
Peng, J., Williams, R.J.: Incremental multi-step Q-learning. Machine Learning 22, 283–290 (1996)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)
Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Management Science 24, 1127–1137 (1978)
Ratitch, B.: On characteristics of Markov decision processes and reinforcement learning in large domains. PhD thesis, The School of Computer Science, McGill University, Montreal (2005)
Reynolds, S.I.: Reinforcement learning with exploration. PhD thesis, The School of Computer Science, The University of Birmingham, UK (2002)
Rummery, G.A.: Problem solving with reinforcement learning. PhD thesis, Cambridge University, Engineering Department, Cambridge, England (1995)
Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. Rep. CUED/F-INFENG/TR 166, Cambridge University, Engineering Department (1994)
Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, New Jersey (2003)
Schaeffer, J., Plaat, A.: Kasparov versus Deep Blue: The re-match. International Computer Chess Association Journal 20(2), 95–101 (1997)
Schwartz, A.: A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 298–305 (1993)
Singh, S., Jaakkola, T., Littman, M., Szepesvári, C.: Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning 38(3), 287–308 (2000)
Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 216–224 (1990)
Sutton, R.S.: DYNA, an integrated architecture for learning, planning and reacting. In: Working Notes of the AAAI Spring Symposium on Integrated Intelligent Architectures, pp. 151–155 (1991a)
Sutton, R.S.: Reinforcement learning architectures for animats. In: From Animals to Animats: Proceedings of The International Conference on Simulation of Adaptive Behavior (SAB), pp. 288–296 (1991b)
Sutton, R.S.: Generalization in reinforcement learning: Successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems (NIPS), pp. 1038–1044 (1996)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)
Tash, J., Russell, S.J.: Control strategies for a stochastic planner. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1079–1085 (1994)
Watkins, C.J.C.H.: Learning from delayed rewards. PhD thesis, King’s College, Cambridge, England (1989)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3–4), 279–292 (1992); Special Issue on Reinforcement Learning
Wiering, M.A.: Explorations in efficient reinforcement learning. PhD thesis, Faculty of Mathematics, Computer Science, Physics and Astronomy, University of Amsterdam (1999)
Wiering, M.A.: Model-based reinforcement learning in dynamic environments. Tech. Rep. UU-CS-2002-029, Institute of Information and Computing Sciences, University of Utrecht, The Netherlands (2002)
Wiering, M.A.: QV(λ)-Learning: A new on-policy reinforcement learning algorithm. In: Proceedings of the 7th European Workshop on Reinforcement Learning (2005)
Wiering, M.A., Schmidhuber, J.H.: Efficient model-based exploration. In: From Animals to Animats: Proceedings of The International Conference on Simulation of Adaptive Behavior (SAB), pp. 223–228 (1998a)
Wiering, M.A., Schmidhuber, J.H.: Fast online Q(λ). Machine Learning 33(1), 105–115 (1998b)
Winston, W.L.: Operations Research: Applications and Algorithms, 2nd edn. Thomson Information/Publishing Group, Boston (1991)
Witten, I.H.: An adaptive optimal controller for discrete-time Markov environments. Information and Control 34, 286–295 (1977)