dc.contributor.author | Marom, O |
dc.contributor.author | Rosman, Benjamin S |
dc.date.accessioned | 2018-06-15T08:50:04Z |
dc.date.available | 2018-06-15T08:50:04Z |
dc.date.issued | 2018-02 |
dc.identifier.citation | Marom, O. and Rosman, B.S. 2018. Belief reward shaping in reinforcement learning. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2-7 February 2018, Hilton New Orleans Riverside, New Orleans, Louisiana, USA | en_US
dc.identifier.uri | https://www.benjaminrosman.com/papers/aaai18.pdf |
dc.identifier.uri | https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16912/16598 |
dc.identifier.uri | http://hdl.handle.net/10204/10263 |
dc.description | Copyright: 2018 AAAI. | en_US
dc.description.abstract | A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, it can cause convergence problems. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and on a more complex backgammon domain show that tasks can be learned significantly faster when intuitive priors are specified over the reward distribution. | en_US
dc.language.iso | en | en_US
dc.publisher | AAAI | en_US
dc.relation.ispartofseries | Worklist;20909 |
dc.subject | Reinforcement learning | en_US
dc.subject | Reward shaping | en_US
dc.title | Belief reward shaping in reinforcement learning | en_US
dc.type | Conference Presentation | en_US
dc.identifier.apacitation | Marom, O., & Rosman, B. S. (2018). Belief reward shaping in reinforcement learning. AAAI. http://hdl.handle.net/10204/10263 | en_ZA
dc.identifier.chicagocitation | Marom, O., and Benjamin S. Rosman. "Belief reward shaping in reinforcement learning." (2018): http://hdl.handle.net/10204/10263 | en_ZA
dc.identifier.vancouvercitation | Marom O, Rosman BS. Belief reward shaping in reinforcement learning; AAAI; 2018. http://hdl.handle.net/10204/10263. | en_ZA
dc.identifier.ris |
TY - CONF
AU - Marom, O
AU - Rosman, Benjamin S
AB - A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, it can cause convergence problems. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and on a more complex backgammon domain show that tasks can be learned significantly faster when intuitive priors are specified over the reward distribution.
DA - 2018-02
DB - ResearchSpace
DP - CSIR
KW - Reinforcement learning
KW - Reward shaping
LK - https://researchspace.csir.co.za
PY - 2018
T1 - Belief reward shaping in reinforcement learning
TI - Belief reward shaping in reinforcement learning
UR - http://hdl.handle.net/10204/10263
ER -
| en_ZA
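
The abstract above outlines the core mechanism: shaping rewards with prior beliefs whose influence decays with experience. As a rough illustration only (this is not the authors' algorithm; the weighted-blend rule, the visit-count decay scheme, and all names below are assumptions inferred solely from the abstract), a tabular Q-learning update using such decaying belief shaping might look like this in Python:

import numpy as np

n_states, n_actions = 25, 4          # small gridworld-sized problem
alpha, gamma = 0.1, 0.95             # learning rate and discount factor

Q = np.zeros((n_states, n_actions))           # action-value estimates
visits = np.zeros((n_states, n_actions))      # visit counts per state-action pair
prior_mean = np.zeros((n_states, n_actions))  # designer's prior belief about rewards
prior_strength = 5.0                          # pseudo-counts backing the prior (assumed scheme)

def shaped_reward(s, a, r):
    # Blend the prior belief with the observed reward; the prior's weight
    # shrinks as visits grow, so the shaping decays with experience and
    # the agent eventually learns from the true rewards alone.
    w = prior_strength / (prior_strength + visits[s, a])
    return w * prior_mean[s, a] + (1.0 - w) * r

def q_update(s, a, r, s_next):
    # Standard Q-learning backup, applied to the belief-shaped reward.
    visits[s, a] += 1
    target = shaped_reward(s, a, r) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

Under these assumptions, an optimistic prior_mean (e.g. small positive values along states believed to lie on a good path) would steer early exploration, while the decay keeps the shaping from biasing what is learned in the limit, consistent with the consistency result described in the abstract.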