Matches in SemOpenAlex for { <https://semopenalex.org/work/W2254629359> ?p ?o ?g. }
- W2254629359 abstract "Monte Carlo Tree Search (MCTS) is a method for making optimal decisions in artificial intelligence (AI) problems, typically for move planning in combinatorial games. It combines the generality of random simulation with the precision of tree search. Research interest in MCTS has risen sharply due to its spectacular success in computer Go and its potential application to a number of other difficult problems. Its application extends beyond games: MCTS can theoretically be applied to any domain that can be described in terms of (state, action) pairs, and it can be used to simulate and forecast outcomes in areas such as decision support, control, delayed-reward problems, and complex optimization. The main advantages of the MCTS algorithm are that, on the one hand, it does not require any strategic or tactical knowledge about the given domain to make reasonable decisions and, on the other hand, it can be halted at any time to return the current best estimate. Research so far has shown that the algorithm can be parallelized on multiple CPUs. This work was motivated by emerging GPU-based systems and their high computational potential combined with relatively low power usage compared to CPUs. As the problems to be solved, I chose to develop a GPU (Graphics Processing Unit)-based AI agent for the game of Reversi (Othello) and the SameGame puzzle, which provide sufficiently complex tree-search problems with non-uniform structure. The importance of this research is that, if the MCTS algorithm can be efficiently parallelized on GPUs, it can also be applied to other similar problems on modern multi-CPU/GPU systems such as the TSUBAME 2.0 supercomputer. Tree search algorithms are hard to parallelize, especially on GPUs. Finding an algorithm that is suitable for GPUs is crucial if tree search is to be performed on recent supercomputers. 
Conventional algorithms do not provide good performance because of the limitations of the GPU architecture and programming scheme, including thread communication boundaries. One problem is the SIMD execution scheme that a GPU applies to each group of threads, which means that standard CPU parallel implementations such as root parallelism fail. Another problem is the difficulty of generating pseudo-random numbers on the GPU, which is important for Monte Carlo methods; available methods are usually very time-consuming. Third, no current research discusses the scalability of the algorithm to millions of threads (when multiple GPUs are considered), so it is important to estimate to what extent the parallelism can be increased. In this thesis I propose an efficient parallel GPU MCTS implementation based on a newly introduced ‘block-parallelism’ scheme, which combines GPU SIMD thread groups and performs independent searches without any need for intra-GPU or inter-GPU communication. I compare it with a simple leaf-parallel scheme, which has certain performance limitations. The results show that, with my GPU MCTS implementation on the TSUBAME 2.0 system, the performance of one GPU is comparable to that of 50-100 CPU threads, depending on factors such as the search time and other MCTS parameters. The block-parallel algorithm provides better results than the naive leaf-parallel scheme, which fails to scale well beyond 1000 threads on a single GPU. The block-parallel algorithm is approximately 4 times more efficient, measured by the number of CPU threads needed to produce results comparable with the GPU implementation. To avoid generating random numbers on the GPU, I introduce an algorithm in which the numbers are transferred from the CPU and made accessible to each GPU block as a look-up table. This approach makes the time needed for random-sequence generation negligible. In this thesis I discuss, for the first time, the scalability of the algorithm to millions of threads. 
The program is designed so that it can run on many nodes using the Message Passing Interface (MPI) standard. To evaluate my results, I compared multiple CPU cores and GPUs playing against the standard sequential CPU implementation; the algorithm’s scalability is thus analyzed for multiple CPUs and GPUs. My results show that this algorithm incurs almost no inter-node communication overhead and scales linearly in terms of the number of simulations performed in a given time period. However, beyond a certain number of running threads, no further performance improvement was observed. I concluded that this limit depends on the algorithm’s implementation and can be raised to some extent by tuning the parameters or adjusting the algorithm itself. The improvements I propose and analyze are variance-based error estimation and simultaneous CPU/GPU execution. With these two modifications to the MCTS algorithm, the overall effectiveness can be increased by a further 10-50% compared to the basic block-parallel implementation. Another factor considered is the criterion for estimating performance, which is the overall outcome of the game (the win percentage or the score). Not all parameters of the MCTS algorithm are analyzed thoroughly with regard to the GPU implementation and their importance for scalability. This is due to certain limitations of the proposed evaluation method: because it is based on the average score, multiple games have to be played to get accurate results, and the time needed to acquire them is relatively long. I also state the remaining problems to be solved, such as estimating the algorithm’s scalability for hundreds of GPUs and overcoming the GPU latency of a single task execution. CPUs need very little time to perform a single search, whereas GPUs need to be loaded with data and run thousands of simulations at once. 
This implies different tree structures in the MCTS algorithm and different characteristics of the obtained scores. I therefore also present results of my research for up to millions of GPU threads. I also discuss several further issues, such as differences between the GPU and CPU implementations and power-usage comparisons. The GPU implementation consumes less power in total when the number of CPU threads needed to get comparable results is considered." @default.
- W2254629359 created "2016-06-24" @default.
- W2254629359 creator A5080202677 @default.
- W2254629359 date "2012-01-01" @default.
- W2254629359 modified "2023-09-26" @default.
- W2254629359 title "Large Scale Monte Carlo Tree Search on GPU" @default.
- W2254629359 cites W1504466744 @default.
- W2254629359 cites W1509593372 @default.
- W2254629359 cites W1512315140 @default.
- W2254629359 cites W1513645366 @default.
- W2254629359 cites W1530458829 @default.
- W2254629359 cites W1549167910 @default.
- W2254629359 cites W1560689031 @default.
- W2254629359 cites W156577148 @default.
- W2254629359 cites W1573483709 @default.
- W2254629359 cites W1589918049 @default.
- W2254629359 cites W1640303844 @default.
- W2254629359 cites W1714211023 @default.
- W2254629359 cites W1857467995 @default.
- W2254629359 cites W1969483458 @default.
- W2254629359 cites W1994584977 @default.
- W2254629359 cites W2025856884 @default.
- W2254629359 cites W2042714866 @default.
- W2254629359 cites W2083659666 @default.
- W2254629359 cites W2095595785 @default.
- W2254629359 cites W2110195531 @default.
- W2254629359 cites W2122410182 @default.
- W2254629359 cites W2148043549 @default.
- W2254629359 cites W2150470619 @default.
- W2254629359 cites W2154080909 @default.
- W2254629359 cites W2157803532 @default.
- W2254629359 cites W2202514763 @default.
- W2254629359 cites W3142540905 @default.
- W2254629359 cites W84222572 @default.
- W2254629359 cites W91122409 @default.
- W2254629359 hasPublicationYear "2012" @default.
- W2254629359 type Work @default.
- W2254629359 sameAs 2254629359 @default.
- W2254629359 citedByCount "1" @default.
- W2254629359 countsByYear W22546293592014 @default.
- W2254629359 crossrefType "journal-article" @default.
- W2254629359 hasAuthorship W2254629359A5080202677 @default.
- W2254629359 hasConcept C105795698 @default.
- W2254629359 hasConcept C107673813 @default.
- W2254629359 hasConcept C111350023 @default.
- W2254629359 hasConcept C113174947 @default.
- W2254629359 hasConcept C11413529 @default.
- W2254629359 hasConcept C121683094 @default.
- W2254629359 hasConcept C13153151 @default.
- W2254629359 hasConcept C134306372 @default.
- W2254629359 hasConcept C154945302 @default.
- W2254629359 hasConcept C15744967 @default.
- W2254629359 hasConcept C173608175 @default.
- W2254629359 hasConcept C19499675 @default.
- W2254629359 hasConcept C2779851693 @default.
- W2254629359 hasConcept C2780767217 @default.
- W2254629359 hasConcept C33923547 @default.
- W2254629359 hasConcept C36503486 @default.
- W2254629359 hasConcept C41008148 @default.
- W2254629359 hasConcept C46149586 @default.
- W2254629359 hasConcept C542102704 @default.
- W2254629359 hasConcept C80444323 @default.
- W2254629359 hasConceptScore W2254629359C105795698 @default.
- W2254629359 hasConceptScore W2254629359C107673813 @default.
- W2254629359 hasConceptScore W2254629359C111350023 @default.
- W2254629359 hasConceptScore W2254629359C113174947 @default.
- W2254629359 hasConceptScore W2254629359C11413529 @default.
- W2254629359 hasConceptScore W2254629359C121683094 @default.
- W2254629359 hasConceptScore W2254629359C13153151 @default.
- W2254629359 hasConceptScore W2254629359C134306372 @default.
- W2254629359 hasConceptScore W2254629359C154945302 @default.
- W2254629359 hasConceptScore W2254629359C15744967 @default.
- W2254629359 hasConceptScore W2254629359C173608175 @default.
- W2254629359 hasConceptScore W2254629359C19499675 @default.
- W2254629359 hasConceptScore W2254629359C2779851693 @default.
- W2254629359 hasConceptScore W2254629359C2780767217 @default.
- W2254629359 hasConceptScore W2254629359C33923547 @default.
- W2254629359 hasConceptScore W2254629359C36503486 @default.
- W2254629359 hasConceptScore W2254629359C41008148 @default.
- W2254629359 hasConceptScore W2254629359C46149586 @default.
- W2254629359 hasConceptScore W2254629359C542102704 @default.
- W2254629359 hasConceptScore W2254629359C80444323 @default.
- W2254629359 hasLocation W22546293591 @default.
- W2254629359 hasOpenAccess W2254629359 @default.
- W2254629359 hasPrimaryLocation W22546293591 @default.
- W2254629359 hasRelatedWork W107377490 @default.
- W2254629359 hasRelatedWork W1463737727 @default.
- W2254629359 hasRelatedWork W1557489584 @default.
- W2254629359 hasRelatedWork W1562728818 @default.
- W2254629359 hasRelatedWork W1577195333 @default.
- W2254629359 hasRelatedWork W2252427423 @default.
- W2254629359 hasRelatedWork W2257842633 @default.
- W2254629359 hasRelatedWork W2259316968 @default.
- W2254629359 hasRelatedWork W2262854951 @default.
- W2254629359 hasRelatedWork W2263376079 @default.
- W2254629359 hasRelatedWork W2267544373 @default.
- W2254629359 hasRelatedWork W2271329658 @default.
- W2254629359 hasRelatedWork W2591722715 @default.
- W2254629359 hasRelatedWork W2616131264 @default.
- W2254629359 hasRelatedWork W2806898911 @default.