/******************************************************************************************************************/
Submission by: Max Lee and Shreyash Patodia.
Student Numbers: 719577 and 767336
File: comments.txt
/******************************************************************************************************************/

Introduction:
------------------------------------------------------------------------------------------------------------------------
In this comments.txt file we talk about how we went about making an agent that plays the Slider game and does an extremely good job of beating its makers at it (without using too much space, at that). Our final submission contains a myriad of agents, strategies and scorers. Our final agent, Shreya, uses three strategies (AlphaBetaGreedy, AlphaBetaBlocking and AlphaBetaEnd) and two scorers (BlockingScorer and EndGameScorer). We decided that using different approaches at different stages of the game would be best, because it lets us leverage domain-specific knowledge and score the game with the aspects that matter most at each stage in mind. We first detail the file structure of our submission to make it a little easier to go through, and then explain in detail the creative aspects of not just our final submission but also the things we tried that didn't work so well for us. Our main focus in this project was to find a winning strategy for the game. As we progressed through the creation of our AI, the best strategy in our heads changed from "just move forward", to "whoever makes more lateral moves loses", and finally to trying to make the opponent make more lateral moves than us by blocking their forward direction. We explain the workings of our strategies and heuristics in the creativity section that follows.

File Structure:
------------------------------------------------------------------------------------------------------------------------
We have a number of packages and files which help us produce the most modular and re-usable project possible. This allowed us to try out many different things before arriving at our final submission. The various levels of our file structure are detailed below:

|__com.teammaxine
   |__agents - This package contains all our agents.
      |__Agent - The abstract superclass of all our agents. The init and update operations for our agents are defined in this class, allowing us to create new agents on the fly without having to deal with the tedious aspects of doing so. We think this is one of the first creative aspects of our project, because it frees us to be creative in other ways (a rough sketch of this pattern follows the listing below).
      |__JungEun - Our first agent, which just played (kind of) random moves.
      |__Shreya - Our final AI (more about this later).
      |__ShreyaCache - An AI that used transposition tables; not our final submission due to memory constraints.
      |__Shreyundo - Our first AI that did not deep-copy the board; it allowed us to make significant improvements in terms of speed and helped us build our final AI.
      |__UpMan - An agent that makes the best forward move, and if there are no forward moves, the best lateral move. Surprisingly hard to beat.
      |__UserAgent - An agent that allows us (users) to play against our AI; another one of the creative things we did in order to aid development.
      |__Xena - An AI that uses Monte-Carlo Tree Search; it helped us come up with a greedy strategy for the start of the game.
   |__board - Contains the different components of the board representation in our game.
      |__actions
         |__AgentAction - A class that extends Move in order to give us more information about the move; something creative that we did but underutilized due to the simple nature of the game. We reckon such an extension could be useful for games that have different types of pieces for each player.
      |__elements
         |__Board
         |__BoardAgent
         |__Cell
         |__CompressedBoard - A creative approach that did not make its way into our final submission (explanation later).
         |__Horizontal - The horizontal player's info as seen by the board.
         |__Vertical - The vertical player's info as seen by the board.
      |__helpers
         |__BoardCompressor - Another creative approach that did not pay off.
         |__Compressor - The generic parent class of BoardCompressor.
         |__CompressionTestDriver
         |__Generator - Generates random boards.
         |__LRUCache - Our attempt at transposition tables, which did not meet the memory requirements of the project.
         |__MathHelper
         |__Permute
         |__State - Part of our implementation of the transposition table.
         |__Vector2 - Our coordinate system.
   |__scorer - Contains the different scorers we created to evaluate the game.
      |__AlphaScorer
      |__BetsScorer
      |__BlockingScorer - One of our final scorers.
      |__DeltaScorer
      |__EndGameScorer - The other of our final scorers.
      |__MonteCarloScorer
      |__MontyPythonScorer
      |__ProximityScorer
      |__Scorer - The parent class all scorers derive from.
      |__ScorerTestDriver
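To illustrate the pattern behind the Agent superclass and the swappable strategies and scorers, here is a rough sketch. The type and method names (Strategy, Scorer, Board.fromString, applyMove, and so on) are illustrative placeholders, not the exact signatures in our code:

    // Rough sketch only: interface and method names are illustrative, not the
    // exact signatures used in our submission.
    interface Scorer   { double score(Board board, char player); }
    interface Strategy { Move bestMove(Board board, char player); }

    abstract class Agent {
        protected Board board;
        protected char player;

        // Common setup shared by every concrete agent (JungEun, Shreya, ...).
        public void init(int dimension, String boardText, char player) {
            this.board  = Board.fromString(dimension, boardText);  // hypothetical helper
            this.player = player;
        }

        // Keep our internal board in sync with the opponent's moves.
        public void update(Move move) {
            board.applyMove(move);
        }

        // Each concrete agent decides which strategy fits the current game phase,
        // e.g. a greedy search early in the game and an end-game search later.
        protected abstract Strategy currentStrategy();

        public Move move() {
            Move chosen = currentStrategy().bestMove(board, player);
            if (chosen != null) {
                board.applyMove(chosen);
            }
            return chosen;
        }
    }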
Creative points:
------------------------------------------------------------------------------------------------------------------------
Algorithmic creativity: We chose "minimax with alpha-beta pruning" as our final algorithm for the assignment, but we included elements of "Monte-Carlo Tree Search" for our early game policy. We also tried MTD(f) (https://en.wikipedia.org/wiki/MTD-f), an iterative-deepening variant of alpha-beta pruning, which was implemented in AlphaBetaCache but was too resource hungry to work within the running-time limits of the assignment. We tried removing the iterative deepening aspect so that it would take less time (this is the state that file is in right now), but the memory requirements of the transposition table were far too high and caused our program to run very slowly. The algorithm did prune a lot of our nodes, and were it not for the memory restrictions of the project it would have been part of our final submission. The alpha-beta searches we do use are also modified. The first, AlphaBetaGreedy, is greedy in that it only evaluates some of the branches, reducing the branching factor and thus the search time (similar to how a Monte-Carlo search would work). Our other two implementations, AlphaBetaBlocking and AlphaBetaEnd, are close to pure alpha-beta, but the third one also counts lateral moves made by our player and passes them to the EndGameScorer to penalise those moves (lateral moves are bad!).

Early Game Policy (uses AlphaBetaGreedy and BlockingScorer): Early in the game, we know that both agents are going to make forward moves (right if we are H, up if we are V). This is because those moves drive them closer to the goal of winning the game, and it is very unlikely to encounter a situation early in the game where one of the players cannot make a forward move. With this knowledge we choose only "optimistically" good moves to evaluate in AlphaBetaGreedy, i.e. it only looks at forward moves and expands the tree on these optimistically good nodes (like a Monte-Carlo tree search would). We not only pick forward moves, we pick the least congested forward moves, i.e. ones that don't have too many other pieces in their proximity. This decreases the branching factor from a possible 3*(n - 1) to less than (n - 1) most of the time. We have played the game a number of times, and our greedy approach does not lead to any loss in playing strength; on the contrary, it allows us to search a bit deeper during the more critical times in the game. We think this is a great use of domain-specific knowledge and allows us to prune off a lot of nodes (and time).
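A minimal sketch of this kind of candidate selection is shown below. It assumes hypothetical Board/Move helpers (legalMoves, neighbourCount, destination) and a Direction enum, so it illustrates the idea rather than the exact code in AlphaBetaGreedy:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of the early-game greedy candidate selection: expand only forward
    // moves, least congested first. Helper names are illustrative placeholders.
    class GreedyCandidates {
        static List<Move> forPlayer(Board board, char player) {
            List<Move> forward = new ArrayList<>();
            for (Move m : board.legalMoves(player)) {
                boolean isForward =
                        (player == 'H' && m.direction() == Direction.RIGHT) ||
                        (player == 'V' && m.direction() == Direction.UP);
                if (isForward) {
                    forward.add(m);
                }
            }
            // Fewer neighbouring pieces around the destination = less congested,
            // so those moves are tried (and expanded) first.
            forward.sort(Comparator.comparingInt(m -> board.neighbourCount(m.destination())));
            // Only fall back to the full move list if no forward move exists.
            return forward.isEmpty() ? board.legalMoves(player) : forward;
        }
    }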
Mid-Game Policy (uses AlphaBetaBlocking and BlockingScorer): We use a mid-game policy that focuses mainly on blocking the other player, because as we see the game, winning is just being able to make the other player make more lateral moves than us. In this part of the game we not only consider moving towards the goal but also focus on blocking the other player, setting the game up so that the opponent has no forward moves left. This allows us to choke the opponent's pieces, and if the opposing agent is not built to counter this, we win games even when we go second. We also don't mind making some lateral moves at this stage in order to move pieces that aren't blocking any opposing pieces. This strategy may look like it contradicts our notion of how to win the game, but we focus on making the opposing player make more lateral moves than us rather than on making the fewest lateral moves possible. This strategy works really well in practice and our AI wins most of the games it has played. Our heuristic function in this part of the game does not value taking a piece off the board too much, since blocking works best with more pieces on the board, and it also doesn't penalise lateral moves. This slightly contrasts with our end-game policy, as we explain in the next paragraph.

End-Game Policy (uses AlphaBetaEnd and EndGameScorer): At the end of the game, we are much more careful about making lateral moves in order to block someone, because a single wasted move could be the difference between winning and losing. We consider a few more aspects in our end-game policy than in our mid-game policy. First of all, we weight actions that take a piece off the board a lot more here. We also penalise any lateral moves made by our player, to promote moving forward and finishing off the game. Our blocking strategy still exists in this scorer, but the value of a block is far lower, because by this time we should already have set up the game by making the opposition do lateral moves. We also differentiate between lateral moves, because there is one sensible lateral move and one that is just bad. We will use the mock board below to explain what we mean by this:

+ + + H + +
+ + + V + +

Say we are V in this example. Moving left is more sensible for us than moving right, because the H piece can follow us if we move right, meaning we made a lateral move to no avail. We tax the left move a little less than the right move in order to encourage our pieces to move in the right lateral direction. If the left side is also blocked, this is reflected through the blocking value in our scorers, allowing us to make the best lateral move when we are out of forward moves. We also use the information from the root node of the search to count how many pieces we have taken off the board, in order to value that to a higher extent.
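The rough shape of this end-game evaluation is sketched below. The weights and helper names (piecesFinished, forwardProgress, blockedOpponentMoves, and the lateral-move counts fed in by the search) are illustrative, not the tuned values from EndGameScorer:

    // Sketch of the end-game evaluation described above. Constants and helper
    // names are illustrative placeholders, not the tuned values in EndGameScorer.
    class EndGameSketch {
        static double score(Board board, char us, int lateralAway, int lateralToward) {
            double score = 0;
            score += 100.0 * board.piecesFinished(us);        // finishing pieces matters most now
            score +=  10.0 * board.forwardProgress(us);       // keep driving toward the edge
            score +=   2.0 * board.blockedOpponentMoves(us);  // blocking still helps, but far less
            // Lateral moves are taxed; the direction the blocker cannot follow
            // (left, in the mock board above) is taxed less than the one it can shadow.
            score -=   3.0 * lateralAway;
            score -=   6.0 * lateralToward;
            return score;
        }
    }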
Design: We believe our design is extremely creative: in our framework, an agent has a strategy which can be swapped out every time the agent needs to make a move. Each strategy uses a scorer for its evaluation, and these scorers can also be swapped around, giving a polymorphic design. We leverage this by switching our strategy from AlphaBetaGreedy to AlphaBetaBlocking when transitioning from the early game to the middle of the game, and then to AlphaBetaEnd at the end of the game. Such a design means that if in the future we wanted to use NegaMax as our strategy, all we would need to do is write the algorithm, pick a scorer and plug it into one of our agents, making extension as painless as possible. This is especially important in a project like this, where we were iterating on our design and had to try different things to find the best combination.

Performance: We wrote extremely efficient code for Part A, where counting the number of legal moves was an O(n) operation rather than an O(n^2) operation. For this part we need to compute legal moves potentially hundreds of times each game, and we believe the efficiency of our Part A code allows us to go deeper into the tree and be faster in this part of the project.

Other creative aspects: If we ever reach a terminal state where we win the game, we return INT_MAX - max_depth + depth, meaning that a game state where we won without having to travel deeper is given a higher value. Similarly, we return INT_MIN + max_depth - depth so that a state where we lost only after travelling the most depth gets a higher value. This allows us to distinguish between winning states and take how quickly we win into consideration. We also have in our scorer a routine that taxes a state if the next move is not ours, since the other player will probably move closer to their goal on that move; this lets us almost simulate looking one level deeper than we actually do, giving us more insight into the game. One last point for this section is that we don't ever create a new copy of the board or a snapshot of the state of the game: we make moves on a single board and undo them before returning from the search, saving us a significant amount of memory (memory that would have been required had we chosen to deep-copy the board).
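A skeleton of one node of such a search is sketched below, to show how the depth-aware terminal values and the make/undo idea fit together. US, OPPONENT, MAX_DEPTH, the scorer field and the Board/Move helpers are illustrative placeholders; here depth counts down from MAX_DEPTH as the search goes deeper:

    // Sketch of one alpha-beta node. Names (US, OPPONENT, MAX_DEPTH, scorer,
    // Board/Move helpers) are illustrative; depth counts down as we go deeper.
    int alphaBeta(Board board, char toMove, int depth, int alpha, int beta) {
        if (board.hasWon(US)) {
            return Integer.MAX_VALUE - MAX_DEPTH + depth;  // quicker wins score higher
        }
        if (board.hasWon(OPPONENT)) {
            return Integer.MIN_VALUE + MAX_DEPTH - depth;  // later losses score higher
        }
        if (depth == 0) {
            return (int) scorer.score(board, US);          // evaluate the leaf
        }

        boolean maximising = (toMove == US);
        int best = maximising ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Move m : board.legalMoves(toMove)) {
            board.applyMove(m);                            // move on the shared board...
            int value = alphaBeta(board, other(toMove), depth - 1, alpha, beta);
            board.undoMove(m);                             // ...and undo it: no deep copies
            if (maximising) {
                best = Math.max(best, value);
                alpha = Math.max(alpha, best);
            } else {
                best = Math.min(best, value);
                beta = Math.min(beta, best);
            }
            if (beta <= alpha) {
                break;                                     // prune the remaining siblings
            }
        }
        return best;
    }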
------------------------------------------------------------------------------------------------------------------------
Things we tried that didn't make it to our final submission:

Monte-Carlo Search and Machine Learning: Monte-Carlo search was used in early versions of our AI. It essentially counted the number of wins, losses and draws after a certain number of random branches deep in the tree had been visited. This strategy was highly effective in the early stages of our development, but it was quickly overtaken by smart heuristics and we ended up purely using minimax with alpha-beta pruning. A potential improvement here would be to have it play not purely 'random' games but make more sensible choices instead. A Learner class was developed to tweak the Monte-Carlo agent by playing it against itself or other agents, but we ended up not pursuing this due to time constraints.

Iterative Deepening and Transposition Tables: We tried to implement move ordering using iterative deepening and transposition tables, built on an LRU cache, a CompressedBoard as key (the compressed board is just a byte array, much smaller than the actual board) and a State class to store the score, depth, lower bound and upper bound associated with the board. We successfully got this to work, but the memory requirements of the cache were far too high for it to function well, so we couldn't submit it as part of our final AI. Even though these classes are not part of our final submission, they required a significant amount of effort, and there was a lot of creative thinking involved in using an LRU cache instead of a standard hash table and in compressing the board to optimise memory usage; sadly we couldn't get the memory usage of a large enough cache low enough to include it in our final submission. (A minimal sketch of the LRU cache idea appears at the end of this file.)

Sorting one level deep: We also tried looking one level deeper in the tree in order to sort the moves according to the value of the board at the next level. But this led to a decrease in speed and didn't allow us to look any deeper, so we ended up not using this idea.
------------------------------------------------------------------------------------------------------------------------
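As a closing illustration of the transposition-table idea mentioned above, here is a minimal LRU cache sketch in the spirit of our LRUCache class (the real one stores State objects keyed by a compressed board; this generic version only shows the eviction mechanism):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU cache sketch: a LinkedHashMap in access order evicts the
    // least-recently-used entry once the capacity is exceeded.
    class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCacheSketch(int capacity) {
            super(16, 0.75f, true);       // accessOrder = true so get() refreshes recency
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;     // drop the stalest entry beyond capacity
        }
    }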