Decoding the Winning Team Strategies | Smart Bots 2023
Smart Bots is a coding competition. This year, it challenged participants to build a 29-point card game bot from scratch, aiming to learn from IT professionals and peers and eventually win the competition. The competition was a three-month-long quest, during which participants used their coding skills and various out-of-the-box tactics to create a bot for the 29 points card game.
This year we had 725 teams consisting of 870 participants, of which 94 teams made it to the leaderboard. At the end of the Registration and Qualifier phase, we selected the Top 8 teams from the leaderboard. Those eight teams had intense competition for a week, and then we closed the submission for Finals. After two days of continuous bot evaluation, we finally announced the winner of Smart Bots 2023.
This blog lists the tactics the winner used to win the competition.
The winning team (Team Top G)(Kushal Subedi and Manoj Khatri) used a combination of innovative strategies and techniques to outperform the other participants. They are discussed below:
Starting Small with Rule-Based Algorithms
Sometimes the thought of tackling complex algorithms can be daunting and overwhelming, causing many people to hesitate and even give up before starting. Starting with smaller, more straightforward tasks can also help build momentum and confidence.
In the Smart Bots Coding Challenge, the winner knew that the first step was to create a strong foundation by learning and mastering the rules and tricks of the 29 points card game. They started with simple rule-based algorithms, building a bot that could perform basic game logic with ease.
They added simple logic to their code, such as(about the 29 points game):
- Play winning cards, if possible; else, play the lowest.
- If the partner is winning, throw the highest value card, but in the case of trump, throw the lowest.
- Show the trump card only if value cards are present in the currently played cards.
- Never show Trump if the partner is winning.
Winning isn't about going grand right away - starting small is the key to success.
Exploring Alternative Approaches
After starting with rule-based algorithms, the winner explored alternative algorithms such as Multi-Armed Bandit, Monte Carlo Tree Search (MCTS), and Information Set Monte Carlo Tree Search (ISMCTS). This allowed the winner to create a bot capable of handling more complex tasks.
What is the Multi-Armed Bandit algorithm?
In coding, a "multi-armed bandit" refers to a problem where one has many options (arms) and the goal is to find the best one (best reward). Different techniques are used to determine which option gives the best result, just like one would try different snacks to find their favorite.
Once the winning team, TopG, mastered the basic algorithms, they took on a multi-armed bandit approach. In the 29-points card game, it involves four players, and each player is dealt a hand of eight cards. To use the multi-armed bandit approach, for each choices (valid moves), one estimates the outcome of choosing that card (either win or loss) by randomly simulating the gameplay after that choice. Then after a lot of iterations, choose the card with highest score.
For each iteration, TopG used the following tactic:
- Play the move for which bandit is maximum.
- Randomly simulate.
- Update the scores for each move.
Multi-Armed Bandit is a simple algorithm and is typically used when the number of options is small and the outcomes are straightforward. At the same time, Determinized MCTS is a complex algorithm and the outcomes are complex.
To use Determinized Monte Carlo Tree Search, the team TopG constructed a game tree and simulated possible outcomes to select the move, updated their move to root nodes, and played the move with the highest visits.
However, the effectiveness of DMCTS depends on the quality of the game tree and the number of simulations. A well-constructed game tree with many simulations will result in better decision-making. Additionally, DMCTS is a computationally expensive algorithm and strategy fusion; hence they moved on to exploring other algorithms.
Information Set Monte Carlo Tree Search (ISMCTS)
ISMCTS deals with games of imperfect information, where players do not have access to all information about the game state. In contrast, DMCTS treats the games as if it has perfect information about the game state. In a game of cards, one might have no idea about the opponent’s cards or strategies they use, and ISMCTS is an advantageous approach.
Top G constructed information sets for each player based on the cards they hold and simulated possible moves and outcomes based on those information sets.
Putting Course Concepts to Use
The 3rd year aspiring BCT students used Bipartite Matching to optimize their bot's performance.
They used interpolation to derive the bidding strategy. They used following tactic:
- Find out the highest repeated suit
- Provide offset according to number of same suit as that of highest suit (1,2,3,4): For highest_count 4, 3, 2, 1, the offset is 17, 16, 15, 14 respectively
- Bid = offset value + 50% of most repeated suit cards value + 34% of remaining cards value
“Bipartite matching is a graph theory concept used to find a matching between two disjoint sets of vertices in a graph, where each vertex in one set is adjacent to at most one vertex in the other set.” This comes directly from the coursebook of the Computer Science syllabus.
They used a Bipartite Matching algorithm to determine which cards have been played and which are still in the hands of the other players. Using depth-first search (DFS) instead of breadth-first search (BFS), one can more efficiently find a match between the cards and the players based on the suits they have lost. By keeping track of the cards that have been played, the suits that each player has lost, and the potential missing suits of the other players, players can make more informed decisions about which cards to play and which ones to hold onto. This increased their chances of winning the game and maximizing their score.
Reading, Researching, and Implementing
While the Top 8 teams followed almost the exact mechanisms while building their bot, this particular strategy sets the winner apart from the rest. The winning team did a lot of research and read papers on various coding and algorithms topics. This allowed the winner to stay up-to-date and implement maximum tweaks to their bot codes in the Qualifier Phase.
For instance, while many participants opted for UCB tuned or UCB 1 to estimate the rewards of each action, this team took a dig and found Thompson Sampling.
Thompson Sampling is a probabilistic algorithm that balances exploration and exploitation by choosing actions based on an updated probability distribution as new information becomes available. UCB Tuned, on the other hand, uses upper confidence bounds to estimate the reward of each action and chooses the one with the highest estimated reward.
They used both these strategies for different numbers of simulations and found that the Thompson Sampling had the upper hand.
While adding complex algorithms to their code, one must be aware of time and space complexity. Hence, time and again, codes must be optimized to enhance good results. The winning team used:
The team added heuristics for opponent play during simulations. They also guessed the trump suit by assigning a probability to each suit based on cards played by the bidder.
When exploring the wrong path, there's no solution to be found. Hence, they set criteria not to visit the branch if it loses 90% of the time in 100 iterations.
For speed, the team opted for C++ for finals. Python and Node.js both take time, and C++ saves time. They also added their hash functions to optimize their speed.
Ultimately, the competition was quite challenging this year, and we also learned a lot from them. The journey of each Top 8 team was incredible as each came up with smart strategies and techniques.
Keynote: “There’s only one growth strategy: work hard.” William Hague