Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. Bilevel optimization (BO) modeling, together with a host of efficient BO algorithms, has proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, an implicit zeroth-order stochastic algorithm is developed on the basis of a bilevel-structured AC problem model. A locally randomized spherical smoothing technique is introduced that applies to nonsmooth, nonconvex implicit AC formulations and avoids the need for a closed-form lower-level mapping. In the proposed zeroth-order scheme, the gradient of the implicit function is approximated through inexact, practically available lower-level value estimates. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method enjoys convergence guarantees with a fixed stepsize and a fixed smoothing parameter. Moreover, the proposed algorithm attains an overall iteration complexity of ■. The convergence performance of the proposed algorithm is verified through numerical simulations.
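To make the smoothing idea concrete, the sketch below illustrates a generic two-point zeroth-order gradient estimator based on randomized spherical smoothing, driven only by (possibly inexact) function values. This is a minimal illustration of the general technique, not the paper's specific bilevel AC algorithm; the oracle `f` here stands in for the inexact lower-level value estimate, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def spherical_zo_gradient(f, x, eta, rng):
    """Two-point zeroth-order estimate of the gradient of the spherically
    smoothed surrogate f_eta(x) = E_{u ~ S^{n-1}}[f(x + eta * u)].

    `f` is a (possibly inexact) zeroth-order value oracle; no closed-form
    expression for f or its gradient is required.
    """
    n = x.size
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)  # uniformly random direction on the unit sphere
    # Finite-difference of two function evaluations along the random direction
    return (n / (2.0 * eta)) * (f(x + eta * u) - f(x - eta * u)) * u

def zo_descent(f, x0, eta=1e-2, step=1e-2, iters=500, seed=0):
    """Zeroth-order descent with a fixed stepsize and fixed smoothing
    parameter, mirroring the fixed-parameter setting described above."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * spherical_zo_gradient(f, x, eta, rng)
    return x
```

As a usage example, running `zo_descent` on a simple smooth test function such as `f(x) = ||x||^2` drives the iterate toward the minimizer using function values alone; in the bilevel AC setting, each oracle call would instead return an inexact estimate of the lower-level (critic) value.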