dc.contributor.author | Marom, O |
dc.contributor.author | Rosman, Benjamin S |
dc.date.accessioned | 2018-06-15T08:50:04Z |
dc.date.available | 2018-06-15T08:50:04Z |
dc.date.issued | 2018-02 |
dc.identifier.citation | Marom, O. and Rosman, B.S. 2018. Belief reward shaping in reinforcement learning. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2-7 February 2018, Hilton New Orleans Riverside, New Orleans, Louisiana, USA | en_US
dc.identifier.uri | https://www.benjaminrosman.com/papers/aaai18.pdf |
dc.identifier.uri | https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16912/16598 |
dc.identifier.uri | http://hdl.handle.net/10204/10263 |
dc.description | Copyright: 2018 AAAI. | en_US
dc.description.abstract | A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, it can cause convergence problems. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and on a more complex backgammon domain show that tasks can be learned significantly faster when intuitive priors are specified over the reward distribution. | en_US
dc.language.iso | en | en_US
dc.publisher | AAAI | en_US
dc.relation.ispartofseries | Worklist;20909 |
dc.subject | Reinforcement learning | en_US
dc.subject | Reward shaping | en_US
dc.title | Belief reward shaping in reinforcement learning | en_US
dc.type | Conference Presentation | en_US
dc.identifier.apacitation | Marom, O., & Rosman, B. S. (2018). Belief reward shaping in reinforcement learning. AAAI. http://hdl.handle.net/10204/10263 | en_ZA
dc.identifier.chicagocitation | Marom, O., and Benjamin S. Rosman. "Belief reward shaping in reinforcement learning." (2018): http://hdl.handle.net/10204/10263 | en_ZA
dc.identifier.vancouvercitation | Marom O, Rosman BS. Belief reward shaping in reinforcement learning; AAAI; 2018. http://hdl.handle.net/10204/10263. | en_ZA
dc.identifier.ris |
TY - CONF
AU - Marom, O
AU - Rosman, Benjamin S
AB - A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, it can cause convergence problems. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and on a more complex backgammon domain show that tasks can be learned significantly faster when intuitive priors are specified over the reward distribution.
DA - 2018-02
DB - ResearchSpace
DP - CSIR
KW - Reinforcement learning
KW - Reward shaping
LK - https://researchspace.csir.co.za
PY - 2018
T1 - Belief reward shaping in reinforcement learning
TI - Belief reward shaping in reinforcement learning
UR - http://hdl.handle.net/10204/10263
ER -
| en_ZA
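
The abstract above outlines the core mechanism: shaping rewards with prior beliefs whose influence decays with experience. As a rough illustration only (this is not the authors' algorithm; the weighted-blend rule, the visit-count decay scheme, and all names below are assumptions inferred solely from the abstract), a tabular Q-learning update using such decaying belief shaping might look like this in Python:

import numpy as np

n_states, n_actions = 25, 4          # small gridworld-sized problem
alpha, gamma = 0.1, 0.95             # learning rate and discount factor

Q = np.zeros((n_states, n_actions))           # action-value estimates
visits = np.zeros((n_states, n_actions))      # visit counts per state-action pair
prior_mean = np.zeros((n_states, n_actions))  # designer's prior belief about rewards
prior_strength = 5.0                          # pseudo-counts backing the prior (assumed scheme)

def shaped_reward(s, a, r):
    # Blend the prior belief with the observed reward; the prior's weight
    # shrinks as visits grow, so the shaping decays with experience and
    # the agent eventually learns from the true rewards alone.
    w = prior_strength / (prior_strength + visits[s, a])
    return w * prior_mean[s, a] + (1.0 - w) * r

def q_update(s, a, r, s_next):
    # Standard Q-learning backup, applied to the belief-shaped reward.
    visits[s, a] += 1
    target = shaped_reward(s, a, r) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

Under these assumptions, an optimistic prior_mean (e.g. small positive values along states believed to lie on a good path) would steer early exploration, while the decay keeps the shaping from biasing what is learned in the limit, consistent with the consistency result described in the abstract.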