[1] Peng Qiwen, Wang Yisong, Yu Xiaoming, et al. Monte Carlo Tree Search for “Dou Di Zhu” Based on Splitting[J]. Journal of Nanjing Normal University (Natural Science Edition), 2019, 42(03): 107-114. [doi:10.3969/j.issn.1001-4616.2019.03.014]

Monte Carlo Tree Search for “Dou Di Zhu” Based on Hand Splitting

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Vol. 42
Issue:
2019, No. 03
Pages:
107-114
Section:
· Special Column: Papers from the National Conference on Machine Learning ·
Publication date:
2019-09-30

Article Info

Title:
Monte Carlo Tree Search for “Dou Di Zhu” Based on Splitting
Article ID:
1001-4616(2019)03-0107-08
Author(s):
Peng Qiwen (彭啟文), Wang Yisong (王以松), Yu Xiaoming (于小民), Liu Manyi (刘满义), Xu Fangjing (徐方婧)
School of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
Keywords:
Dou Di Zhu; computer game; reinforcement learning; Monte Carlo tree search
CLC number:
TP311
DOI:
10.3969/j.issn.1001-4616.2019.03.014
Document code:
A
Abstract:
“Dou Di Zhu” is a typical multiplayer cooperative game with imperfect information, and Monte Carlo tree search is an important tool for solving games such as Go and chess. This paper first proposes a hand-splitting algorithm based on the rules of “Dou Di Zhu”, which addresses the game's large action space by selecting smaller splits. Second, it uses Monte Carlo sampling to simulate the game repeatedly; once certain preset conditions are met, the node with the best payoff is chosen as the decision for the current move. Experimental results show that Monte Carlo tree search based on hand splitting plays “Dou Di Zhu” automatically and reasonably well.
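
The abstract does not spell out the hand-splitting algorithm, so the following is only a minimal sketch of the general idea of preferring smaller splits, under simplifying assumptions that are not taken from the paper: only solos, pairs, trios, bombs, and straights of five or more cards are recognized, ranks are encoded as integers (3 to 14 for 3 through A, 15 for 2, 16 and 17 for the jokers), and the function name `min_split` and the constants are hypothetical.

```python
from collections import Counter
from functools import lru_cache

STRAIGHT_MAX_RANK = 14   # ace; 2 (15) and jokers (16, 17) cannot join a straight
MIN_STRAIGHT_LEN = 5     # assumed minimum length of a solo straight


def min_split(hand):
    """Return one split of `hand` into the fewest groups (simplified rules)."""

    @lru_cache(maxsize=None)
    def solve(cards):
        # `cards` is a sorted tuple of integer ranks.
        if not cards:
            return ()
        counts = Counter(cards)
        low = cards[0]  # the lowest card must belong to exactly one group
        best = None

        # Option 1: a same-rank group (solo, pair, trio, bomb) of the lowest rank.
        for k in range(1, counts[low] + 1):
            rest = list(cards)
            for _ in range(k):
                rest.remove(low)
            candidate = ((low,) * k,) + solve(tuple(rest))
            if best is None or len(candidate) < len(best):
                best = candidate

        # Option 2: a straight that starts at the lowest rank.
        if low <= STRAIGHT_MAX_RANK:
            end = low
            while end + 1 <= STRAIGHT_MAX_RANK and counts[end + 1] > 0:
                end += 1
            for last in range(low + MIN_STRAIGHT_LEN - 1, end + 1):
                straight = tuple(range(low, last + 1))
                rest = list(cards)
                for rank in straight:
                    rest.remove(rank)
                candidate = (straight,) + solve(tuple(rest))
                if len(candidate) < len(best):
                    best = candidate
        return best

    return [list(group) for group in solve(tuple(sorted(hand)))]
```

For example, `min_split([3, 4, 5, 6, 7, 8, 8, 8, 15])` splits the hand into three groups (a straight, a trio, and a solo) rather than nine solos. Each group in a split is a candidate action, so a search that expands only the groups of a small split has far fewer branches to consider than one that enumerates every legal combination.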


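Memo

Received: 2019-07-05. Funding: Key Program of the Joint Funds of the National Natural Science Foundation of China (U1836205). Corresponding author: Wang Yisong, Ph.D., professor, doctoral supervisor; research interests: knowledge representation and reasoning, artificial intelligence, machine learning. E-mail: yswang@gzu.edu.cn
Last Update: 2019-09-30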