Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the quality of policy learning. Self-supervised auxiliary tasks allow the agent to enhance its visual representations in a targeted manner, improving both policy performance and RL sample efficiency. Prior advanced self-supervised RL methods focus on designing better auxiliary objectives to extract more information from agent experience, while ignoring the training-data constraints imposed by the limited experience available during RL training. In this article, we break through this auxiliary-training-data constraint by proposing a novel RL auxiliary task, learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, the frame mask, is proposed to synthesize novel observations that may contain future information. Next, latent nearest-neighbor clip (LNC) is proposed to mitigate the impact of unqualified, noisy synthetic observations. The remaining synthetic observations, together with real observations, then serve as the auxiliary training data for a clustering-based temporal association task that drives advanced representation learning. LFS allows the agent to access and learn from observations that are absent from its current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS relies on neither rewards nor actions, so it has a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation.
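The two data-side steps above (frame-mask synthesis and LNC filtering) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the mixing rule (randomly swapping stacked frames between two real observations) and the filtering rule (keeping the synthetic samples whose latent nearest real neighbor is closest) are assumptions made for illustration; function names, the `mask_ratio` and `keep_ratio` parameters, and all shapes are hypothetical.

```python
import numpy as np

def frame_mask(obs_a, obs_b, mask_ratio=0.5, rng=None):
    """Hypothetical frame-mask synthesis: mix the stacked frames of two
    real observations (shape: [n_frames, H, W]) to form a novel synthetic
    observation, with no training required."""
    rng = np.random.default_rng() if rng is None else rng
    n_frames = obs_a.shape[0]
    # Per-frame binary mask: True -> take the frame from obs_b, else obs_a.
    mask = rng.random(n_frames) < mask_ratio
    return np.where(mask[:, None, None], obs_b, obs_a)

def latent_nn_clip(synthetic_latents, real_latents, keep_ratio=0.5):
    """Hypothetical LNC filter: score each synthetic sample by the distance
    to its nearest real neighbor in latent space, and keep only the
    closest fraction (dropping likely-noise samples)."""
    # Pairwise Euclidean distances, shape (n_synthetic, n_real).
    d = np.linalg.norm(
        synthetic_latents[:, None, :] - real_latents[None, :, :], axis=-1
    )
    nn_dist = d.min(axis=1)                      # nearest-real distance
    n_keep = max(1, int(keep_ratio * len(nn_dist)))
    return np.argsort(nn_dist)[:n_keep]          # indices of kept samples
```

The kept synthetic observations would then be pooled with real observations as auxiliary training data for the temporal association objective.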
The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next-best method by 1.51×).