This package provides a generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind's AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. We believe that this methodology can have exciting applications in many different research areas.
Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++) and optimized for highly distributed computing environments. This makes them largely inaccessible to students, researchers and hackers.
The motivation for this project is to provide an implementation of AlphaZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources. We found the Julia language to be instrumental in achieving this goal.
To download AlphaZero.jl and start training a Connect Four agent, just run:

```sh
export GKSwstype=100  # To avoid an occasional GR bug
git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project -e 'using AlphaZero; Scripts.train("connect-four")'
```
Each training iteration takes about one hour on a desktop computer with an Intel Core i5 9600K processor and an 8GB Nvidia RTX 2070 GPU. We plot below the evolution of the win rate of our AlphaZero agent against two baselines (a vanilla MCTS baseline and a minmax agent that plans at depth 5 using a handcrafted heuristic):
Note that the AlphaZero agent is not exposed to the baselines during training and learns purely from self-play, without any form of supervision or prior knowledge.
We also evaluate the performance of the neural network alone against the same baselines. Instead of plugging it into MCTS, we play the action that is assigned the highest prior probability at each state:
Unsurprisingly, the network alone is initially unable to win a single game. However, it ends up significantly stronger than the minmax baseline despite not being able to perform any search.
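The "network only" baseline described above can be sketched as follows. This is a minimal, library-agnostic illustration, not the AlphaZero.jl API: `policy` and `legal_actions` are hypothetical stand-ins for a trained policy head and the game's move generator.

```python
import numpy as np

def raw_network_action(policy, state, legal_actions):
    """Skip MCTS entirely: pick the legal action with the highest prior."""
    priors = policy(state)  # hypothetical: maps a state to one prior per action
    # Keep (prior, action) pairs for legal moves only, then take the argmax.
    return max((priors[a], a) for a in legal_actions)[1]

# Toy usage: a fixed prior over 7 actions (the width of a Connect Four board).
fixed_priors = np.array([0.05, 0.10, 0.20, 0.40, 0.15, 0.07, 0.03])
policy = lambda state: fixed_priors
print(raw_network_action(policy, None, range(7)))  # → 3
```

Since no tree search is involved, this baseline isolates how much playing strength resides in the learned policy itself.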
For more information on training a Connect Four agent using AlphaZero.jl, see our full tutorial.
Contributions to AlphaZero.jl are most welcome. Many contribution ideas are available in our contribution guide. Please do not hesitate to open a GitHub issue to share any idea, feedback or suggestion.
If you want to support this project and help it gain visibility, please consider starring the repository. Doing well on such metrics may also help us secure academic funding in the future. Also, if you use this software as part of your research, we would appreciate it if you included the following citation in your paper.
This material is based upon work supported by the United States Air Force and DARPA under Contract Nos. FA9550-16-1-0288 and FA8750-18-C-0092. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.