

Monte Carlo Tree Search for “Dou Di Zhu” Based on Hand Splitting


Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Volume:: 42
Issue:: 2019, No. 03

Pages:: 107-114

Column:: Special Column of Papers from the National Machine Learning Conference

Publication date:: 2019-09-30

Article Info

Title:: Monte Carlo Tree Search for “Dou Di Zhu” Based on Splitting

Article ID:: 1001-4616(2019)03-0107-08

Author(s) (Chinese):: 彭啟文; 王以松; 于小民; 刘满义; 徐方婧; School of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025

Author(s):: Peng Qiwen; Wang Yisong; Yu Xiaoming; Liu Manyi; Xu Fangjing; School of Computer Science and Technology, Guizhou University, Guiyang 550025, China

Keywords:: Dou Di Zhu; computer game; reinforcement learning; Monte Carlo tree search

CLC number:: TP311

DOI:: 10.3969/j.issn.1001-4616.2019.03.014

Document code:: A

Abstract:: “Dou Di Zhu” is a typical multiplayer game with imperfect information in which players cooperate, and Monte Carlo tree search is an important tool for solving games such as Go and chess. This paper first proposes a hand-splitting algorithm based on the rules of “Dou Di Zhu”; by choosing smaller splits, it addresses the game's large action space. Second, Monte Carlo sampling is used to sample and simulate the imperfect-information game repeatedly; once certain preset conditions are met, the node with the best payoff is chosen as the decision for the current move. The experimental results show that Monte Carlo tree search based on hand splitting can play “Dou Di Zhu” automatically reasonably well.
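The hand-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which this page does not give. It assumes that a “smaller” split means one with the fewest groups, that cards are encoded as integer ranks (3 to 14 for 3 through A, 15 for the 2), and that only solos, pairs, trios, bombs and single straights of five or more cards count as groups. The name `min_split` and both constants are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Assumed encoding: ranks 3..14 are 3..A and may appear in straights; 15 is the "2".
STRAIGHT_MAX_RANK = 14
MIN_STRAIGHT_LEN = 5


def min_split(hand):
    """Split `hand` (a list of int ranks) into the fewest groups.

    Simplified rule set (an assumption): a group is 1-4 cards of one rank,
    or a straight of at least MIN_STRAIGHT_LEN consecutive ranks.
    """

    @lru_cache(maxsize=None)
    def solve(cards):
        if not cards:
            return ()
        counts = Counter(cards)
        low = cards[0]  # the lowest card must belong to some group
        candidates = []
        # Groups of identical rank: solo, pair, trio or bomb of the lowest rank.
        for k in range(1, counts[low] + 1):
            candidates.append((low,) * k)
        # Straights that start at the lowest rank.
        end = low
        while end <= STRAIGHT_MAX_RANK and counts[end] > 0:
            if end - low + 1 >= MIN_STRAIGHT_LEN:
                candidates.append(tuple(range(low, end + 1)))
            end += 1
        best = None
        for group in candidates:
            rest = list(cards)
            for card in group:
                rest.remove(card)
            split = (group,) + solve(tuple(rest))
            if best is None or len(split) < len(best):
                best = split
        return best

    return [list(group) for group in solve(tuple(sorted(hand)))]


print(min_split([3, 4, 5, 6, 7, 7, 7, 9, 9]))  # [[3, 4, 5, 6, 7], [7, 7], [9, 9]]
```

Restricting the search to the groups of such a split, rather than to every legal card combination, is one way a tree search could keep its branching factor small. How the paper actually scores and selects splits may differ.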


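Memo

Memo:: Received: 2019-07-05. Funding: Key Program of the Joint Funds of the National Natural Science Foundation of China (U1836205). Corresponding author: Wang Yisong, Ph.D., professor, doctoral supervisor; research interests: knowledge representation and reasoning, artificial intelligence, machine learning. E-mail: yswang@gzu.edu.cn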


Last Update: 2019-09-30