We introduced P2P (peer-to-peer) mode for leaderboard ranking in the SmartBots Coding Challenge 2023. P2P mode uses the TrueSkill algorithm developed by Microsoft. The top 8 participants on the P2P leaderboard will move on to the next “Super 8” round.

Before P2P, participants' bots fought only Bhoos' bot on the leaderboard. Each bot got a score based on its wins and other metrics, and bots were ranked by those scores. Playing only against Bhoos' bot is neither the best nor the most reliable way to rank. As participants improve their bots, the scores saturate, and a difference of a few points can't reliably measure a difference in performance. The better way is to have participants' bots compete with each other. In P2P mode, we measure the bots' performance by making them fight one another.

How is the skill calculated?

In P2P mode, a bot's skill is modelled as a normal distribution. The mean (μ) of the distribution is the bot's estimated relative skill. The standard deviation (σ) is how uncertain the system is about that estimate: the smaller the σ, the more confident the system is. The bot's skill is a measure of its performance, so on the P2P leaderboard the bots are ranked according to their skill scores.

When a new bot enters the arena, the system is uncertain of its skill and has no clue how the bot will perform. It assigns the default values μ = 25 and σ = 25/3. The new bot then fights 4 random bots.
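
As a rough illustration of what "μ = 25 and σ = 25/3" means in practice, the sketch below stores a rating as a small Python class and compares a newcomer with an established bot. The `Rating` class, its `density` method, and the veteran's values (μ = 30, σ = 1.5) are made up for this example; they are not taken from the SmartBots implementation.

```python
import math
from dataclasses import dataclass

DEFAULT_MU = 25.0            # default mean given to a new bot
DEFAULT_SIGMA = 25.0 / 3.0   # default standard deviation given to a new bot


@dataclass(frozen=True)
class Rating:
    """Skill belief for one bot: a normal distribution N(mu, sigma^2)."""
    mu: float = DEFAULT_MU
    sigma: float = DEFAULT_SIGMA

    def density(self, skill: float) -> float:
        """Probability density the system assigns to a given skill value."""
        z = (skill - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))


new_bot = Rating()                    # newcomer: wide, uncertain belief
veteran = Rating(mu=30.0, sigma=1.5)  # hypothetical established bot

# The newcomer's curve is flat and wide; the veteran's is tall and narrow.
print(new_bot.density(new_bot.mu), veteran.density(veteran.mu))
```

The peak of the newcomer's curve is much lower than the veteran's, which is just another way of saying the system has not yet narrowed down where the new bot's skill lies.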

For example, a new bot "A" fights an existing bot, "B", which has been in the leaderboard for a while. The system has a good idea of the relative skill of "B". Meanwhile, it assigns the default values to "A". The initial distributions could look like this:

New Bot "A" default skills compared to existing Bot "B"

In this situation, the distributions show that "B" is more likely to win than "A".
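
To put a number on "more likely to win", here is a minimal sketch of the win probability for a two-player game with no draws, following the TrueSkill model. It assumes a performance noise of β = 25/6, the value usually paired with the defaults μ = 25 and σ = 25/3; the setting SmartBots actually uses is not stated here, and the values for "B" (μ = 30, σ = 1.5) are invented for illustration.

```python
import math

BETA = 25.0 / 6.0  # assumed performance noise; not confirmed for SmartBots


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    """Probability that A outperforms B in a single game, ignoring draws."""
    spread = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return normal_cdf((mu_a - mu_b) / spread)


# New bot "A" with default values against a hypothetical established bot "B".
p = win_probability(25.0, 25.0 / 3.0, 30.0, 1.5)
print(f"P(A beats B) = {p:.2f}")
```

With these numbers the probability comes out below one half, so "B" is the favourite, but "A" still has a realistic chance because the system knows so little about it.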

Let's assume that "A" wins the fight. The system now adjusts the skill scores of both bots. The new distributions could look like this:

After bot "A" wins, the skill scores change for both bots
Disclaimer: The values in the images are illustrative, not computed exactly. They are meant to show in general what the process looks like. In the actual implementation, everything is calculated using exact mathematical equations.
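
For readers who want to see the arithmetic, the following is a minimal sketch of the standard TrueSkill update for one two-player game with no draws. It is not the SmartBots code: it assumes β = 25/6, leaves out the small per-game uncertainty increase (τ) that full TrueSkill adds, and uses invented values for "B".

```python
import math

BETA = 25.0 / 6.0  # assumed performance noise; not confirmed for SmartBots


def pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one game."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # larger when the result was more surprising
    w = v * (v + t)       # fraction of uncertainty removed, between 0 and 1
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


bot_a = (25.0, 25.0 / 3.0)  # new bot with default values
bot_b = (30.0, 1.5)         # hypothetical established bot
new_a, new_b = update(winner=bot_a, loser=bot_b)
print("A:", new_a)
print("B:", new_b)
```

Each bot's mean shifts in proportion to its own variance, so the upset moves "A" much further than it moves "B", and both standard deviations shrink because the game gave the system new information.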

Bot "A" will now fight other bots, and its σ value will change as it fights more. After 4 fights, the system has a good estimate of the rank of "A". From then on, "A" fights only its closest neighbors. This process continues as long as "A" keeps changing ranks. At some point, "A" has fought both of its immediate neighbors and its rank has stayed the same. After that, "A" fights again only when its neighboring bots change.

How does P2P ranking work in SmartBots?

1. Your bot will play a set number of matches against 4 random players in the leaderboard.
2. After these fights, an approximate rank for your bot is calculated and your bot is placed at that rank.
3. Your bot plays against one of its neighbours, i.e., the bots one rank above or below its current position.
4. Based on that game, a new rank is calculated for the bot.
5. This process is repeated until your bot's rank is stable.
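
The opponent selection in the steps above can be sketched as follows. This is only an illustration: the function names, the list-based leaderboard, and the bot names are hypothetical, and the real scheduler may differ.

```python
import random


def pick_placement_opponents(leaderboard, bot, count=4, rng=random):
    """Choose `count` random opponents for a newly submitted bot (step 1)."""
    others = [name for name in leaderboard if name != bot]
    return rng.sample(others, min(count, len(others)))


def neighbours(leaderboard, bot):
    """Return the bots ranked directly above and below `bot` (step 3)."""
    i = leaderboard.index(bot)
    return leaderboard[max(i - 1, 0):i] + leaderboard[i + 1:i + 2]


# Hypothetical leaderboard, best rank first.
leaderboard = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]

print(pick_placement_opponents(leaderboard, "charlie"))
print(neighbours(leaderboard, "charlie"))
print(neighbours(leaderboard, "alpha"))
```

A bot in the middle of the board has two neighbours, while the bots at the top and bottom have only one each.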

How do you run P2P in SmartBots?

You upload a Docker image. It will play 1000 games against Bhoos' bot. You can watch your gameplay and review your changes. When you want to test your bot against other bots, click the "Submit to P2P ranking" button on the Docker submission page. This sends your latest submission to P2P.

References

Computing your skill
Original paper