This paper addresses the problem of online decision making in continually changing and complex environments, with inherent incompleteness in models of change. A fully general version of this problem is intractable but many interesting domains are rendered manageable by the fact that all instances of a task can be generated from a finite set of qualitatively meaningful contexts. We present an approach to online decision making that exploits this decomposability in a two part procedure. In a task independent exploratory process, our algorithm running on an autonomous agent learns the set of structural landmark contexts which compose its domain, and reduces this set through the use of the symmetry structure of permutation groups. To each reduced landmark we then associate a set of policies independent of global context. This enables an efficient online policy instantiation process that composes from already learnt policy templates. This is illustrated on a spatial navigation domain where the learning agent is shown to be able to play a pursuit-evasion game in random environments with unknown dynamic obstacles.
Reference:
Rosman, B.S. 2014. Context-based online policy instantiation for multiple tasks and changing environments. In: Pattern Recognition Association of South Africa (PRASA)/RobMech/6th Workshop on African Language Technology (AfLaT), Cape Town, 27-28 November 2014
Rosman, B. S. (2014). Context-based online policy instantiation for multiple tasks and changing environments. PRASA. http://hdl.handle.net/10204/7835
Rosman, Benjamin S. "Context-based online policy instantiation for multiple tasks and changing environments." (2014): http://hdl.handle.net/10204/7835
Rosman BS, Context-based online policy instantiation for multiple tasks and changing environments; PRASA; 2014. http://hdl.handle.net/10204/7835 .