Many systems, such as autonomous vehicle fleets and drone swarms, can be modeled as multi-agent reinforcement learning (MARL) tasks, in which multiple machines learn to collaborate, coordinate, or compete with one another. Machine learning algorithms, particularly reinforcement learning algorithms, have been shown to be well suited to MARL tasks. But scaling them efficiently to hundreds or even thousands of machines is often challenging.
One solution is a technique called centralized training and decentralized execution (CTDE), which allows an algorithm to train using data from multiple machines but make decisions for each machine individually (e.g., when a driverless car should turn left). QMIX is a popular algorithm that implements CTDE, and many research groups claim to have designed QMIX-based algorithms that perform well on difficult benchmarks. But a new paper claims that these algorithms’ improvements might be the result of code-level optimizations, or “tricks,” rather than design innovations.
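To make the CTDE idea concrete, here is a minimal sketch in Python with NumPy. It is not the paper's code or a full QMIX implementation: the per-agent Q-values, the global state, and the `w_hyper` and `b_hyper` matrices are random, hypothetical stand-ins for learned networks, and the mixer is a single linear layer rather than QMIX's nonlinear mixing network. It illustrates only the core property: if the mixing weights are non-negative, each agent can act greedily on its own Q-values at execution time and the resulting joint action still maximizes the centrally trained joint value.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, state_dim = 3, 4, 5

# Hypothetical stand-ins for learned quantities (random here, networks in practice).
agent_qs = rng.normal(size=(n_agents, n_actions))   # Q_i(o_i, a) for each agent
state = rng.normal(size=state_dim)                  # global state, used in training only
w_hyper = rng.normal(size=(state_dim, n_agents))    # maps state to mixing weights
b_hyper = rng.normal(size=state_dim)                # maps state to a mixing bias


def decentralized_actions(qs):
    """Execution: each agent greedily picks an action from its own Q-values."""
    return qs.argmax(axis=1)


def q_tot(actions):
    """Training: mix the chosen per-agent Q-values into one joint value.

    abs() keeps the weights non-negative, so the joint value never decreases
    when any single agent's Q-value increases (the monotonicity constraint).
    """
    chosen = agent_qs[np.arange(n_agents), actions]
    weights = np.abs(state @ w_hyper)
    bias = state @ b_hyper
    return float(weights @ chosen + bias)


greedy = decentralized_actions(agent_qs)

# Brute-force check over every joint action (4**3 = 64 combinations).
best_joint = max(
    q_tot(np.array(joint))
    for joint in itertools.product(range(n_actions), repeat=n_agents)
)
print("greedy joint action:", greedy)
print("matches best joint value:", np.isclose(q_tot(greedy), best_joint))
```

The brute-force search is there only to check the property on a toy problem. In a real system the joint action space grows exponentially with the number of agents, which is why per-agent greedy selection matters.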