We have liftoff!
We're now beating the random baseline on a 10x10 board, with a greedy policy trained only on self-play data:
Evaluating umpire AIs:
r wins: 459
./ai/agz/6.agz wins: 541
Draws: 0
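A head-to-head tally like the one above can be reproduced in miniature. Everything below is a stand-in, not the Umpire engine: a simple subtraction game ("take 1-3 flags, last flag wins") plays the role of the game, and an optimal policy plays the role of the trained model. The evaluation shape - alternate who moves first, count wins per side - is the same.

```python
import random

def random_policy(count, rng):
    # Baseline: take a uniformly random legal number of flags.
    return rng.randint(1, min(3, count))

def optimal_policy(count, rng):
    # Take (count mod 4) flags when that's a legal winning move,
    # otherwise fall back to a random legal move.
    take = count % 4
    return take if 1 <= take <= min(3, count) else random_policy(count, rng)

def play_game(first, second, rng, start=21):
    """Players alternately take 1-3 flags; whoever takes the last flag
    wins. Returns 0 if `first` wins, 1 if `second` wins."""
    count, turn = start, 0
    policies = (first, second)
    while count > 0:
        count -= policies[turn](count, rng)
        if count == 0:
            return turn
        turn = 1 - turn

def evaluate(policy_a, policy_b, n_games, seed=0):
    rng = random.Random(seed)
    wins = [0, 0]  # [policy_a wins, policy_b wins]
    for g in range(n_games):
        # Alternate who moves first to remove first-mover bias.
        if g % 2 == 0:
            wins[play_game(policy_a, policy_b, rng)] += 1
        else:
            wins[1 - play_game(policy_b, policy_a, rng)] += 1
    return wins
```

Alternating the first mover matters in any turn-based evaluation; without it, a first-move advantage can masquerade as a stronger policy.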
The model is a basic convolutional neural network whose input is the context surrounding the city or unit taking the next action. The weights serialize to 166KB, so it's easy to ship alongside the game.
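It's easy to see how a small CNN lands in that weight-file ballpark by counting parameters. The layer sizes below are purely illustrative - the post only says "basic CNN", so none of these channel counts or window sizes are the actual Umpire architecture:

```python
def conv2d_params(c_in, c_out, k):
    # One k x k kernel per (input, output) channel pair, plus a bias per output channel.
    return c_in * c_out * k * k + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Hypothetical architecture: two 3x3 conv layers over a featurized
# tile context, flattened over a 5x5 window, then a small dense head
# producing a scalar action value.
params = (
    conv2d_params(16, 32, 3)
    + conv2d_params(32, 32, 3)
    + dense_params(32 * 5 * 5, 48)
    + dense_params(48, 1)
)
bytes_f32 = params * 4  # one f32 per weight
print(params, "params,", bytes_f32 / 1024, "KB")
```

A few tens of thousands of f32 weights come out to the low hundreds of kilobytes before any compression, which is the right order of magnitude for a model bundled with a game binary.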
The key to training was to increase the number of training instances - "The Unreasonable Effectiveness of Data", after all. With purely random algorithms doing the self-play, data generation can run extremely fast.
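The data-collection loop for random self-play can be sketched with a toy game in place of the Umpire engine. Here Nim stands in: record every (state, action, player) transition, then label each record with the outcome from that player's point of view once the game ends.

```python
import random

def random_selfplay_episode(rng, piles=(3, 4, 5)):
    """One episode of random-vs-random Nim. Returns a list of
    (state, action, outcome) training instances, where outcome is
    +1 if the acting player went on to win and -1 otherwise."""
    state = list(piles)
    player, transitions = 0, []
    while any(state):
        pile = rng.choice([i for i, n in enumerate(state) if n > 0])
        take = rng.randint(1, state[pile])
        transitions.append((tuple(state), (pile, take), player))
        state[pile] -= take
        if not any(state):
            winner = player  # normal play: taking the last object wins
        player = 1 - player
    return [(s, a, 1 if p == winner else -1) for s, a, p in transitions]

def generate_dataset(n_episodes, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        data.extend(random_selfplay_episode(rng))
    return data
```

Because the players are random, each episode costs almost nothing beyond the engine's own move execution, so the instance count scales linearly with however much compute you throw at it.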
The next obstacle will be the transport mechanic: land units in Umpire must board a transport ship to move between continents. Since air and sea units cannot capture cities, an AI must understand this mechanic at some level to operate on the full 180x90 map, where large stretches of ocean separate the land masses.
With purely random players, the likelihood of this mechanic getting triggered and thus entering the training data seems fairly low. But if we throw enough training episodes at it, we'll see it eventually. This may necessitate multithreading the self-play code to run multiple games simultaneously, and optimizing the game engine for throughput.
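The multithreaded self-play idea can be sketched as fanning episodes out across a worker pool and counting how often the rare event shows up. The per-turn transport probability below is an invented stand-in for the engine's actual behavior, and Python threads are only illustrative here - a Rust engine would get true parallelism from native threads or processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Assumed per-turn odds of random play triggering the transport
# mechanic; a biased coin stands in for the real engine check.
TRANSPORT_CHANCE = 0.002

def plays_out_transport(seed, turns=300):
    """Simulate one random-play episode and report whether the rare
    transport event fired at least once."""
    rng = random.Random(seed)
    return any(rng.random() < TRANSPORT_CHANCE for _ in range(turns))

def count_transport_episodes(n_episodes, workers=4):
    # Each episode is independent, so they parallelize trivially:
    # map seeds to workers and sum the hits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(plays_out_transport, range(n_episodes)))
```

Even at long odds per turn, enough parallel episodes surface the mechanic: at 0.2% per turn over 300 turns, roughly 45% of episodes contain at least one transport event, so the bottleneck is raw episode throughput rather than luck.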