Effective Umpire AI [10 Jun 2020]
Generate state-action-reward triples en masse and directly model the Q-value function. Add Monte Carlo Tree Search.
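As a rough sketch of the first half of that plan, the snippet below collects triples from random play and fits a tabular Q estimate by averaging. Everything in it is an assumption made for illustration: the toy corridor environment, the names `step`, `rollout`, `fit_q` and `greedy_action`, the discount `GAMMA`, and the choice to regress on discounted returns rather than raw rewards. None of it is Umpire's actual game logic, and the tree search step is left out.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (NOT Umpire): a corridor of cells 0..N_CELLS-1.
# Reaching the last cell ends the episode with reward 1.
N_CELLS = 5
ACTIONS = (-1, +1)  # step left / step right
GAMMA = 0.9         # assumed discount factor


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    return next_state, (1.0 if done else 0.0), done


def rollout(rng, max_steps=20):
    """Play one uniformly random episode from cell 0.

    Returns (state, action, discounted return) triples.
    """
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = rng.choice(ACTIONS)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    # Turn immediate rewards into discounted returns, working backwards.
    triples, ret = [], 0.0
    for s, a, r in reversed(trajectory):
        ret = r + GAMMA * ret
        triples.append((s, a, ret))
    return triples


def fit_q(n_episodes=5000, seed=0):
    """Estimate Q(s, a) of the random policy by averaging observed returns."""
    rng = random.Random(seed)
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_episodes):
        for s, a, g in rollout(rng):
            totals[(s, a)] += g
            counts[(s, a)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def greedy_action(q, state):
    """Pick the action with the highest estimated Q-value in `state`."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


q = fit_q()
print([greedy_action(q, s) for s in range(N_CELLS - 1)])
```

The table here stands in for whatever function approximator would be needed on a real game, where the state space is far too large to enumerate. Note that averaging returns from random play estimates the Q-values of the random policy, not of an optimal one; acting greedily on that estimate is only a first improvement step.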
AlphaGo cost Google around $35 million, so it may be hard to reproduce that success. But could I at least make an Umpire AI do something reasonable?