WIP: Effective Umpire AI

Generate state-action-reward triples en masse and directly model the Q-value function. Then add Monte Carlo Tree Search.
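To make the first step concrete, here is a minimal sketch of the idea on a toy problem. Nothing in it comes from Umpire itself: the five-cell corridor, the `step`, `rollout`, and `fit_q` helpers, and the discount `GAMMA` are all hypothetical stand-ins. It plays many random episodes, records (state, action, discounted return) triples, and "models" Q by simply averaging the return per state-action pair. A real version would presumably swap the lookup table for a learned function approximator.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (NOT Umpire): a corridor of cells 0..4.
# Reaching cell 4 pays +1, reaching cell 0 pays -1; either ends the episode.
LEFT_GOAL, RIGHT_GOAL, START = 0, 4, 2
ACTIONS = (-1, +1)   # step left or step right
GAMMA = 0.9          # assumed discount factor


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == RIGHT_GOAL:
        return nxt, 1.0, True
    if nxt == LEFT_GOAL:
        return nxt, -1.0, True
    return nxt, 0.0, False


def rollout(rng):
    """Play one episode with a uniformly random policy.

    Returns a list of (state, action, discounted_return) triples,
    one per move made in the episode.
    """
    trajectory = []
    state, done = START, False
    while not done:
        action = rng.choice(ACTIONS)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state

    # Walk backwards so each move is credited with the discounted
    # sum of all rewards that followed it.
    triples = []
    ret = 0.0
    for state, action, reward in reversed(trajectory):
        ret = reward + GAMMA * ret
        triples.append((state, action, ret))
    return triples


def fit_q(n_episodes=5000, seed=0):
    """Estimate Q(state, action) as the mean observed return."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_episodes):
        for state, action, ret in rollout(rng):
            totals[(state, action)] += ret
            counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


q = fit_q()
for (state, action), value in sorted(q.items()):
    print(f"Q(state={state}, action={action:+d}) = {value:+.3f}")
```

Note that this estimates the Q-values of the random policy that generated the data, not of an optimal one. Even so, the estimates should already rank "step toward the +1 end" above "step toward the -1 end", which is the kind of signal a tree search could then sharpen.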

AlphaGo cost Google around $35 million, so it may be hard to reproduce that success. But could I make it do something reasonable?

https://github.com/joshhansen/Umpire
