2024-02

Going on vacation for ten days of an already short month meant this was never going to be the best month productivity-wise, but it’s nice to look back and still notice that some progress has been made.

Last month’s work introducing sample team scraping for data.pkmn.cc carried into this month, much to my chagrin. The logic for fetching teams works 100% of the time on all of my computers, but about 50% of the scheduled runs fail on GitHub for reasons I still don’t fully understand (no amount of low-level retrying or throttling seems to resolve it). Given that I have PTSD at this point from debugging anything GitHub Actions-related, I’ve been procrastinating on the flakiness issues, but inevitably I’ll be forced to sink some time into them to preserve the signal-to-noise ratio of my notifications. At one point a GitHub notification might have meant someone had reviewed my code or responded to an issue, but with the pkmn project it’s pretty much always something being broken, which is somewhat demoralizing.
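
For the curious, the kind of low-level retrying referred to above looks something like the following sketch — the wrapper and its parameters are illustrative, not the actual data.pkmn.cc code:

```ts
// Sketch of a retry-with-backoff wrapper; fn stands in for the real
// team-fetching call (names and defaults here are illustrative).
async function withRetries<T>(fn: () => Promise<T>, attempts = 5): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      // Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
      const wait = 1000 * 2 ** i + 1000 * Math.random();
      await new Promise(resolve => setTimeout(resolve, wait));
    }
  }
  throw new Error('unreachable');
}
```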

Also on the data.pkmn.cc front — API docs! I mentioned this in the State of the Union so it’s nice to be able to check it off the list. There are still some potentially interesting ideas for future work (e.g., extending sample team scraping to look for pokepast.es URLs across Smogon with iterative crawling on GitHub Actions, or building a public replay database), but none that I plan to work on in the near future. I think the concept of ‘team canonicalization’ that is now publicly showcased by the sample teams endpoint is worth elaborating on in a post here at some point — one more thing to add to the backlog!
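
As a rough preview of what I mean by ‘team canonicalization’ before that post materializes: teams that differ only in incidental ordering should collapse to a single canonical form. A minimal sketch, assuming an illustrative Pokemon shape rather than the actual data.pkmn.cc schema (and glossing over details like lead order that a real implementation has to care about):

```ts
// Minimal sketch of team canonicalization: teams that differ only in
// incidental ordering should serialize identically. The Pokemon type is
// illustrative, not the actual data.pkmn.cc schema.
interface Pokemon { species: string; item?: string; moves: string[]; }

function canonicalize(team: Pokemon[]): string {
  const sets = team.map(p => ({
    species: p.species,
    item: p.item,
    moves: [...p.moves].sort(), // move order doesn't affect gameplay
  }));
  // Collapse permutations of team members into one deterministic order
  // (a real implementation must decide whether lead position matters).
  sets.sort((a, b) => a.species.localeCompare(b.species));
  return JSON.stringify(sets);
}
```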

One of the less interesting and thus less talked-about pkmn projects, pkmn/eslint-config, was also forced into a major release this month due to upstream changes. ESLint v8.53.0 caused all “stylistic” rules (a distinction that’s about as clear as mud, as evidenced by the migration commit) to silently stop working several months ago, which made the breakage easy to miss. I don’t know who thought silently breaking things in a minor version was the play (as opposed to loudly breaking in a major version), but in a way it feels par for the course for the JavaScript ecosystem.
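
For reference, the fix amounts to the kind of migration sketched below, assuming the community @stylistic/eslint-plugin that ESLint points to as the new home for its deprecated formatting rules (the specific rules shown are illustrative, not pkmn/eslint-config’s actual set):

```js
// eslint.config.js — sketch of migrating deprecated core stylistic rules
// to @stylistic/eslint-plugin; the rules shown are illustrative.
import stylistic from '@stylistic/eslint-plugin';

export default [{
  plugins: {'@stylistic': stylistic},
  rules: {
    // Before: 'semi': ['error', 'always'], 'quotes': ['error', 'single']
    // After: identical options, just under the plugin's namespace.
    '@stylistic/semi': ['error', 'always'],
    '@stylistic/quotes': ['error', 'single'],
  },
}];
```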

The big focus this month was unsurprisingly still on writing. I extracted the code shared between pkmn.cc and pkmn.ai out into a common repository so it can also be used for a personal blog that will serve as a better home for more general programming content. Perhaps I’m paranoid or have just internalized Haskell’s “avoid success at all costs” motto, but I don’t really want general attention on the pkmn projects at this time, and having an extra degree of separation between them and anything that has the slightest chance of drawing broad interest makes me more comfortable. While I don’t know when I’ll write the posts I have planned for the external site, at the very least the refactoring was beneficial because I had been doing a lot of copying to keep the two pkmn sister sites roughly in sync.

The initial version of the pkmn.ai glossary has been completed, along with about 60% of the summaries for the pkmn.ai projects page. The summaries are meant to serve as a way to quickly glean relevant information about each project, with pointers on where to dive in. They’re different from the papers’ abstracts, which are geared towards academics and usually treat the competitive Pokémon agent as merely a testbed for a particular algorithm or approach; on pkmn.ai the focus is on the implementation and specifics of the competitive Pokémon agent itself. The summaries written so far might look very different post-editing — as with development, part of the process of research involves figuring out what’s important as you go — but right now the goal is just getting words on the page.

My writing process has improved while working on these various sites. For example, on the projects page, I read each project’s paper, skim its codebase, and then go back over both more carefully before distilling everything into a summary.

This is… involved. It takes perhaps 1–2 hours minimum when I’m able to dedicate focused time to it, and it’s tiring enough that doing more than a handful of summaries on a good day is hard. In many cases I’ve already read the paper or skimmed the codebase at some point in the past after initially discovering the project, which simplifies things when I go back with a highlighter and a fine-tooth comb for my second pass. Why bother? I’ve wanted to do this for a long time, it’s something that interests me, and it helps me solidify my ideas or think of new avenues of research. The biggest reason, though, is to eventually be able to write a high-level meta-analysis of the field.

My ideas aren’t fully formed here yet (I still have 40% of the projects to cover!), but I have two big takeaways at this point. First, there are probably reasons to be optimistic about future agents achieving much higher levels of performance — a lot of the attempts thus far haven’t been very serious, simulator speed has continuously been cited as a bottleneck, and few projects have done the work to grind out improvements after implementing the most basic versions of their approaches. Second, people are trying to skip steps that are probably necessary. I don’t think Pokémon is going to fall easily to an off-the-shelf zero-knowledge solution unless someone wants to throw a couple million dollars at the problem; I think we’re going to need to do the same legwork the chess and poker fields have done to reason about the game and apply every technique and enhancement under the sun to improve our agents. Naively shoehorning half-assed facsimiles of cutting-edge research from other games in hopes of getting lucky is the technical equivalent of buying lottery tickets.

One place where I am warming to cutting-edge research from other fields is the application of LLMs as tools. Having initially dismissed them, I’ve been trying to incorporate ChatGPT and Gemini more and more as I get better at figuring out what they can and can’t do consistently well. With the glossary, I would occasionally ask them where commas should go in definitions I had written that felt awkward, or probe them to compare and contrast concepts I wasn’t sure I had fully grasped. With the projects, I found them useful for translating code or papers written in Portuguese or Chinese or, most impressively, for explaining the architecture of a neural network when given the code. Finding areas where hallucinations and misinformation aren’t that costly because they’re trivially detectable by even an uninformed questioner seems to be the sweet spot for my own personal LLM use at this time, though I’m more bullish on LLM interfaces that have less friction than typing into a chat box.

The current (ever-shifting) “frontier” of immediate interests and work includes finishing the pkmn.ai projects page’s summaries (and possibly the overview) as well as updating the glossary with terms that have come up while continuing to expand the site. I’d like to pull the RandomPlayer and MaxDamagePlayer definitions out into a new “Baselines” article that expands on them in greater detail and also covers a proposed BestMatchupPlayer baseline that I feel is a logical progression that’s currently absent. My work on these baselines has also led me to find more subtle issues with Pokémon Showdown’s protocol that I’ll need to account for in @pkmn/client, integration tests, and my eventual comprehensive Protocol explainer on pkmn.ai (Pokémon Showdown, the gift that just keeps giving).
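
To make the distinction between these baselines concrete, MaxDamagePlayer is about as simple as an agent gets. A hypothetical sketch, with stand-in types rather than actual @pkmn APIs:

```ts
// Hypothetical sketch of a MaxDamagePlayer-style baseline; Move and Choice
// are stand-in types, not actual @pkmn APIs.
interface Move { id: string; basePower: number; pp: number; }
interface Choice { type: 'move'; id: string; }

// Greedily pick the usable move with the highest base power, ignoring type
// effectiveness, accuracy, and items entirely. That naivety is the point:
// it gives stronger agents a floor to be measured against.
function maxDamageChoice(moves: Move[]): Choice | undefined {
  const usable = moves.filter(m => m.pp > 0);
  if (!usable.length) return undefined; // out of PP (Struggle, etc.)
  const best = usable.reduce((a, b) => (b.basePower > a.basePower ? b : a));
  return {type: 'move', id: best.id};
}
```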

The now comically delayed “5 Years In” post (notably having missed the February 19 anniversary of my first Pokémon Showdown commit) sits half-finished at several thousand words. It has been difficult to write due to its size and the reflection it forces (though the latter is the whole point…), but it remains something I’d like to get done soon. Similarly, returning to strict protocol validation and addressing the open pkmn/ps issues around mods and formats hasn’t fallen off the radar; when I return to “regularly scheduled programming” these should be among the first things to get addressed. Recent Zig changes seem like they will make it possible to add back test parallelism, though I probably won’t play around with that until I’m back in the thick of engine work.

Along with an abstract frontier of potential work, I also tend to have a (sometimes overlapping) frontier of things I mull over when my mind starts to wander. Currently, the two most common of these are quantifying and minimizing variance, and endgame tablebases. The former is interesting because better accounting for variance means better results and more efficient training — improved rating systems tailored for games with variance can make the ladder less frustrating for players, and things like AIVAT from computer poker can provide better objectivity when evaluating various AI approaches without needing to rely on the law of large numbers to save us. I’m even non-ironically considering how one might calculate and control for hax — something that harkens back to Smogon’s most historic April Fools’ Day prank. Endgame tablebases are perhaps even more interesting — basket’s work on perfect information solving feels on the verge of opening up a lot of exciting opportunities, and figuring out how to efficiently encode and utilize hypothetical solutions to 1v1 perfect information match-ups in a 6v6 game of imperfect information Pokémon is a fascinating open problem.
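
As a back-of-the-envelope illustration of why leaning on the law of large numbers alone is painful, consider how slowly a raw win rate converges (a quick sketch, not tied to any pkmn codebase):

```ts
// Normal-approximation 95% confidence interval for an observed win rate
// over n games.
function winRateCI(wins: number, n: number): [number, number] {
  const p = wins / n;
  const halfWidth = 1.96 * Math.sqrt((p * (1 - p)) / n);
  return [p - halfWidth, p + halfWidth];
}

// Even after 1000 games, an observed 55% win rate is only pinned down to
// roughly +/-3%, which is why variance-reduction techniques like AIVAT
// (tighter estimates without playing more games) are so appealing.
console.log(winRateCI(550, 1000)); // ~ [0.519, 0.581]
```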
