ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

Published in: arXiv

Chunyu Xuan^{1,2}, Yazhe Niu^{2,3}, Yuan Pu^{2}, Shuai Hu^{3}, Yu Liu^{2,3}, Jing Yang^{1}

Abstract

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency, albeit at a significant cost in wall-clock time. To address this issue, we propose a general approach named ReZero to boost MCTS-based algorithms. Specifically, we propose a new scheme that simplifies data collection and reanalysis, which significantly reduces the search cost while preserving performance. Furthermore, to accelerate each search process, we devise a method to reuse subsequent information in the trajectory. A corresponding analysis on the bandit model provides auxiliary theoretical support for our design. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.

