Josh Hansen @me@joshhansen.tech

Hacker-poet. Learn Something cofounder & CTO. Creator of Seattle Poetry Meetup. To me everything is craft; everything is creativity.

Jack of all trades, but master of some. TypeScript, Rust, Kotlin, Python—collector of programming languages since age 11. Deep neural networks, reinforcement learning, AI if you must...

Project concepts and progress; kind-hearted smackdowns; quesadilla technique if you're lucky.

GitHub

https://github.com/joshhansen

LinkedIn

https://www.linkedin.com/in/hansen-josh


Umpire 0.5 progress: I've been hard at work. Networking support is very nearly done: I've successfully initiated a multiplayer game over the Internet, though it remains too slow. I think all notable refactors are behind me; just a few methods still need to be re-implemented to take advantage of local caching of observations and to minimize round-trips.
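To give a rough sense of what that caching buys, here's a minimal sketch (Python, purely illustrative; `ObservationCache`, `fetch_tile`, and the rest are hypothetical names, not Umpire's actual API): repeat queries about tiles the client has already observed are answered locally, so only a cache miss costs a network round-trip.

```python
# Hypothetical sketch of client-side observation caching; not Umpire's code.
class ObservationCache:
    def __init__(self, server):
        self.server = server    # remote game connection (hypothetical)
        self.observations = {}  # (x, y) -> last observation of that tile

    def observe(self, loc):
        # Serve repeat queries from the local cache, avoiding a round-trip.
        if loc in self.observations:
            return self.observations[loc]
        obs = self.server.fetch_tile(loc)  # one round-trip, only on a miss
        self.observations[loc] = obs
        return obs

    def invalidate(self, loc):
        # Drop a cached tile when the server pushes an update for it.
        self.observations.pop(loc, None)
```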

In addition, I've revived and updated the AI training infrastructure; I now have a preliminary AlphaGo Zero-style player algorithm trained, and a good library of self-play data to build on.

What that means (in case you need a reminder) is that we have simple AIs (random baselines, in fact) play games against each other and track who wins. Then a supervised neural network is trained to estimate the probability of victory given the game state and the action taken. It's similar to a Q-learning state-action value model, but instead of the "soft" (and somewhat arbitrary) supervision of a reward function, the "hard" supervision of actual wins and losses is used exclusively.
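In code terms, the pipeline boils down to something like this sketch (PyTorch, purely illustrative; Umpire's actual training code may look quite different). Each self-play record is assumed to be a (state-action features, outcome) pair, where the outcome is 1.0 if the player who took the action went on to win:

```python
# Illustrative sketch, not Umpire's actual training code: a model trained to
# predict P(win | state, action) from self-play outcomes, using binary
# cross-entropy on win/loss labels rather than a hand-crafted reward.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class WinProbModel(nn.Module):
    """Maps a (state, action) feature vector to a victory logit."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # logit of P(win)
        )

    def forward(self, x):
        return self.net(x)

def train(model, dataset, epochs=10, lr=1e-3):
    # `dataset` yields (features, outcome) pairs; outcome is 1.0 when the
    # acting player ultimately won the game, 0.0 otherwise. That's the
    # "hard" supervision: real wins and losses, no shaped reward function.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    loader = DataLoader(dataset, batch_size=256, shuffle=True)
    for _ in range(epochs):
        for feats, outcome in loader:
            loss = loss_fn(model(feats).squeeze(-1), outcome)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

At play time, a (hypothetical) agent would score each legal action's features with such a model and pick the action with the highest predicted win probability.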

Of course, I don't have Google's budget, so I'm not sure how far I'll get with this. But it's something I've wanted to try, and now that I have 24GB of VRAM to throw at it I thought I'd see what I can do.
