Topics

Various games

Various games

Rogue

We study many intresting properties in reinforcement learning in Rogue, including limited observation, exploration in maze, survival, and generalization.

Catan

Catan is a multi-player imperfect-information game, involving negotiation among players.

Ceramic

Azul is a multi-player perfect-information game.

Half Field Offense

Half Field Offense

Reinforcement Learning in Half Field Offense

Half field offense RoboCup 2D Soccer (HFO) is a subtask in RoboCup simulated soccer. Reinforcement learning in HFO is a challenging research topic. We developed hierarchical advantage and present HA-PPO to properly handle parameterized actions (e.g., turn (action) 30 degrees (parameter)) in PPO. The successful goal rate on one v.s. one task (agent v.s. the built-in goal keeper) is about 71% that is the best achieved by reinfoncement learning agents.

Reinforcement learning in games

In reinforcement learning (RL), agents improve their ability through interaction with an environment. The goal of RL in games is to make good agents without giving domain knowledge. AlphaZero is such an application of RL into games.

Domains

Exploration

DEIR

AlphaZero

  • Nakayashiki, T. and Kaneko, T. “Maximum entropy reinforcement learning in two-player perfect information games,” IEEE SSCI, pp. 1-8. 2021 doi 10.1109/SSCI50451.2021.9659991

Option/skill

Atari

  • Zhu, H. and T. Kaneko “Residual Network for Deep Reinforcement Learning with Attention Mechanism,” J. Inf. Sci. Eng., Vol. 37, No. 3, pp. 517–533 (2021), DOI: 10.6688/JISE.20210537(3) .0002.
  • Hyunwoo, O. and T. Kaneko “Deep Recurrent Q-Network with Truncated History,” in IEEE Technologies and Applications of Artificial Intelligence, pp. 34–39 (2018), DOI: 10.1109/TAAI.2018. 00017.

note

for general introduction for RL studies, see e.g., openai’s documents

Go

Go

Computer Go.

Papers

  • Mandai, Y. and T. Kaneko “RankNet for evaluation functions of the game of Go,” ICGA Jour- nal, Vol. 41, No. 2, pp. 78–91 (2019), DOI: 10.3233/ICG-190108.
  • Evaluation of Game Tree Search Methods by Game Records Takeuchi, S.; Kaneko, T.; Yamaguchi, K.; IEEE Transactions on Computational Intelligence and AI in Games, 2 (4), 288 - 302, Dec. 2010.
  • H. Yoshimoto, K. Yoshizoe, T. Kaneko, A. Kishimoto, and K. Taura: Monte Carlo Go Has a Way to Go, Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1070-1075, 2006

Migo

Game tree search

Game tree search

Shogi

Shogi

Learning algorithms

Playing styles

Denou-sen (2013)

GPS Shogi won against Miura 8-dan (now 9-dan) in April, 2013

(Japan Shogi Association)