A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning


Yunsick Sung, Kyungeun Cho, Journal of Information Processing Systems Vol. 8, No. 3, pp. 409-420, Sep. 2012  

10.3745/JIPS.2012.8.3.409
Keywords: Reinforcement Learning, Monte Carlo Method, Behavior Generation Model, Programming by Demonstration, Macro-Action, Multi-Step Action

Abstract

Decision-making by agents in games is commonly based on reinforcement learning. To improve the quality of agents, it is necessary to reduce the time and state space that learning requires. Macro-Actions, which are defined and executed as sequences of primitive actions, address these problems: they reduce learning time by cutting down the number of policy decisions that an agent must make. Macro-Actions were originally defined as combinations of the same primitive action. Following studies that showed Macro-Actions can be generated by learning, they are now thought to consist of diverse kinds of primitive actions. However, generating Macro-Actions by learning itself requires an enormous amount of learning time and state space. To resolve these issues, insights from studies on learning tasks through Programming by Demonstration (PbD) can be applied to generate Macro-Actions that reduce the learning time and state space. In this paper, we propose a method to define and execute Macro-Actions. Macro-Actions are learned from a human subject via PbD, and a policy is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data was collected from the driving control of a human subject, and the Macro-Actions required for running a car were then generated. Furthermore, the policy necessary for driving on a track was learned. The acquisition of Macro-Actions by PbD reduced the driving time by about 16% compared to the case in which Macro-Actions were directly defined by a human subject. In addition, the learning time was reduced because the optimum policies converged faster.
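
As a rough illustration of the idea that a Macro-Action is a sequence of primitive actions taken from a demonstration, the sketch below picks the most frequent contiguous action sequence from a logged demonstration and executes it as one unit. It is a minimal sketch under stated assumptions, not the method proposed in the paper: the frequency-based extraction, the action names, the 1-D track, and the helpers extract_macro_actions, execute_macro, and step are all hypothetical.

```python
from collections import Counter

# Hypothetical primitive actions on a 1-D track: each moves the car a fixed distance.
EFFECT = {"accelerate": 2, "coast": 1, "brake": 0}


def step(position, action):
    """Apply one primitive action and return the new position."""
    return position + EFFECT[action]


def extract_macro_actions(demonstration, length, top_k):
    """Return the top_k most frequent contiguous action sequences of a given length.

    Assumption: frequency of occurrence stands in for whatever selection
    criterion a real PbD system would use.
    """
    ngrams = Counter(
        tuple(demonstration[i:i + length])
        for i in range(len(demonstration) - length + 1)
    )
    return [sequence for sequence, _ in ngrams.most_common(top_k)]


def execute_macro(position, macro):
    """Run every primitive action in the macro; the policy decides only once."""
    for action in macro:
        position = step(position, action)
    return position


# Hypothetical demonstration log recorded from a human driver.
demo = [
    "accelerate", "accelerate", "coast", "brake",
    "accelerate", "accelerate", "coast", "brake",
    "accelerate", "accelerate", "coast",
]

macros = extract_macro_actions(demo, length=3, top_k=1)
final_position = execute_macro(0, macros[0])
```

Here a single policy decision (choosing macros[0]) replaces three primitive decisions, which is the sense in which Macro-Actions cut down the number of policy decisions an agent makes.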


Cite this article
[APA Style]
Sung, Y. & Cho, K. (2012). A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning. Journal of Information Processing Systems, 8(3), 409-420. DOI: 10.3745/JIPS.2012.8.3.409.

[IEEE Style]
Y. Sung and K. Cho, "A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning," Journal of Information Processing Systems, vol. 8, no. 3, pp. 409-420, 2012. DOI: 10.3745/JIPS.2012.8.3.409.

[ACM Style]
Yunsick Sung and Kyungeun Cho. 2012. A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning. Journal of Information Processing Systems, 8, 3, (2012), 409-420. DOI: 10.3745/JIPS.2012.8.3.409.