A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning

Yunsick Sung and Kyungeun Cho
Volume: 8, No: 3, Page: 409 ~ 420, Year: 2012
10.3745/JIPS.2012.8.3.409
Keywords: Reinforcement Learning, Monte Carlo Method, Behavior Generation Model, Programming by Demonstration, Macro-Action, Multi-Step Action

Abstract
Decision-making by agents in games is commonly based on reinforcement learning. To improve the quality of agents, the learning time and state space that learning requires must be reduced. Macro-Actions, which are defined and executed as sequences of primitive actions, address these problems: they reduce the learning time by cutting down the number of policy decisions that an agent makes. Macro-Actions were originally defined as combinations of the same primitive actions. Following studies that generated Macro-Actions by learning, Macro-Actions are now thought to consist of diverse kinds of primitive actions. However, an enormous amount of learning time and state space is required to generate Macro-Actions in this way. To resolve these issues, insights from studies on learning tasks through Programming by Demonstration (PbD) can be applied to generate Macro-Actions in a way that reduces the learning time and state space. In this paper, we propose a method to define and execute Macro-Actions. Macro-Actions are learned from a human subject via PbD, and a policy is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data was collected from the driving control of a human subject, and the Macro-Actions required for running a car were then generated. Furthermore, the policy necessary for driving on a track was learned. Acquiring Macro-Actions by PbD reduced the driving time by about 16% compared to the case in which Macro-Actions were directly defined by a human subject. In addition, the learning time was reduced because the optimum policies converged faster.
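
As an illustration of the idea that a Macro-Action is a sequence of primitive actions executed under a single policy decision, the following minimal Python sketch may help. It is not the authors' method: the abstract does not say how Macro-Actions are extracted from demonstrations, so frequency counting over a demonstrated trace is used here purely as a stand-in assumption. The names `extract_macro_actions`, `run_macro_action`, and `toy_step`, the primitive actions, and the toy car model are all hypothetical, and the reinforcement learning step that would choose among Macro-Actions is omitted.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Primitive = str
MacroAction = Tuple[Primitive, ...]


def extract_macro_actions(
    demonstration: Sequence[Primitive], length: int, top_k: int
) -> List[MacroAction]:
    """Return the top_k most frequent runs of `length` primitives in a trace.

    Assumption: frequency counting stands in for the paper's PbD procedure,
    which the abstract does not specify.
    """
    windows = [
        tuple(demonstration[i : i + length])
        for i in range(len(demonstration) - length + 1)
    ]
    return [macro for macro, _ in Counter(windows).most_common(top_k)]


def run_macro_action(
    state: int,
    macro: MacroAction,
    step: Callable[[int, Primitive], Tuple[int, float]],
) -> Tuple[int, float]:
    """Execute each primitive in order and accumulate the reward.

    The agent makes one decision per Macro-Action instead of one per primitive.
    """
    total_reward = 0.0
    for primitive in macro:
        state, reward = step(state, primitive)
        total_reward += reward
    return state, total_reward


def toy_step(speed: int, primitive: Primitive) -> Tuple[int, float]:
    """Hypothetical car model: the state is the speed, the reward is the distance covered."""
    if primitive == "accelerate":
        speed += 1
    elif primitive == "brake":
        speed = max(0, speed - 1)
    # Any other primitive (e.g. "coast") leaves the speed unchanged.
    return speed, float(speed)


# Hypothetical demonstration trace recorded from a human driver.
demo = [
    "accelerate", "accelerate", "brake",
    "accelerate", "accelerate", "coast",
    "accelerate", "accelerate",
]
macros = extract_macro_actions(demo, length=2, top_k=1)
final_speed, reward = run_macro_action(0, macros[0], toy_step)
```

In this toy trace the pair ("accelerate", "accelerate") occurs most often, so it becomes the single Macro-Action. A policy that selects it makes one decision where a policy over primitive actions would make two, which is the source of the reduction in policy decisions that the abstract describes.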

Cite this article
IEEE Style
Y. Sung and K. Cho, "A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning," Journal of Information Processing Systems, vol. 8, no. 3, pp. 409~420, 2012. DOI: 10.3745/JIPS.2012.8.3.409.

ACM Style
Yunsick Sung and Kyungeun Cho. 2012. A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning, Journal of Information Processing Systems, 8, 3, (2012), 409~420. DOI: 10.3745/JIPS.2012.8.3.409.