SNU Research Team Led by Professor Byung-Gon Chun Develops “Parallax,” a Deep Learning Distributed Training System
Author: Administrator
Date: 2019.01.29
Views: 998
▲ SNU Department of Computer Science and Engineering Professor Byung-Gon Chun’s Team: (From Bottom Left) Soojeong Kim (Ph.D.), Professor Chun, Gyeong-In Yu (Ph.D.), Joo Seong Jeong (Ph.D.), Hojin Park (Ph.D.), Hyeonmin Ha (Ph.D.), Eunji Jeong (Ph.D.), Sanha Lee (M.S.), Sungwoo Cho (M.S.)
SNU College of Engineering reported on the 3rd that the research team led by Professor Byung-Gon Chun of the Department of Computer Science and Engineering has developed a deep learning distributed training system called “Parallax.”
Deep learning technology is applied in various fields including image processing, speaker recognition, and autonomous driving. Recently, research on distributed training using GPUs has been actively conducted to reduce the training time of deep learning models.
Most research so far has leaned heavily toward image processing models, which use dense tensors rather than sparse tensors. Unlike image processing models, natural language processing models rely on sparse variables. However, conventional distributed training systems fail to exploit this sparsity; as a result, the distributed training performance of models with sparse variables lags behind that of models with dense variables.
Hence, the team proposed Parallax, a distributed training system that takes the sparsity of variables into consideration. Parallax employs a hybrid distributed training architecture that applies a different synchronization method depending on whether each variable is dense or sparse.
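The per-variable dispatch described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Parallax API: the class and function names are invented for exposition, and the real system makes this decision inside its TensorFlow graph transformation.

```python
# Hedged sketch of Parallax-style hybrid dispatch: dense variables are
# synchronized with AllReduce, while sparse variables (e.g. embedding
# tables updated at only a few rows per step) go through a parameter
# server. All names here are illustrative assumptions.

class DenseVar:
    def __init__(self, name):
        self.name = name

class SparseVar:
    def __init__(self, name):
        self.name = name

def choose_architecture(variable):
    """Pick a synchronization strategy based on variable sparsity."""
    if isinstance(variable, SparseVar):
        # Only the touched rows need to be exchanged, which a
        # parameter server handles efficiently.
        return "parameter_server"
    # Dense gradients are aggregated collectively across GPUs.
    return "allreduce"

model_vars = [DenseVar("conv1/kernel"), SparseVar("embedding/table")]
plan = {v.name: choose_architecture(v) for v in model_vars}
print(plan)
# → {'conv1/kernel': 'allreduce', 'embedding/table': 'parameter_server'}
```

In a real model the dense variables are typically convolution or fully connected weights, while the sparse ones are embedding lookups whose gradients touch only a small fraction of rows each step.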
The system uses “partitioning” to process large sparse variables efficiently: each variable is split across machines so that each machine handles a smaller piece, and only the necessary pieces are transferred between machines, minimizing inter-machine traffic. As a result, Parallax improved the training performance of natural language processing models by up to six times over conventional systems while maintaining the performance of image processing models. In addition, the team greatly increased the system’s usability by automatically converting models written for a single GPU so that they can be trained on multiple GPUs.
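The partitioning idea can be illustrated with a small sketch. Assume (hypothetically) that a large embedding table is split row-wise across several parameter-server shards; an update step that touches only a few rows then generates traffic only to the shards owning those rows. The shard assignment and function names below are assumptions for illustration, not the Parallax implementation.

```python
# Hedged sketch of partitioning a large sparse variable across
# parameter-server shards. Rows of an embedding table are assigned to
# shards round-robin; a sparse update is routed only to the shards that
# own the touched rows, keeping inter-machine traffic small.

NUM_SHARDS = 4

def shard_of(row_id):
    # Round-robin assignment of table rows to shards (an assumption;
    # real systems may use range- or hash-based partitioning).
    return row_id % NUM_SHARDS

def route_updates(sparse_updates):
    """Group per-row gradient updates by destination shard."""
    per_shard = {s: {} for s in range(NUM_SHARDS)}
    for row_id, grad in sparse_updates.items():
        per_shard[shard_of(row_id)][row_id] = grad
    return per_shard

# One training step touches only 3 rows of a large table.
updates = {2: [0.1], 6: [0.2], 7: [0.3]}
routed = route_updates(updates)
# → rows 2 and 6 go to shard 2, row 7 to shard 3; shards 0 and 1
#   receive no traffic at all this step.
```

Because each machine holds and serves only its own slice of the variable, no single machine needs to store or communicate the full table, which is what makes very large sparse variables tractable.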
The research findings will be presented at EuroSys (European Conference on Computer Systems), one of the largest conferences in the field, to be held in Dresden, Germany, in March.
[Reference]
Preprint of the research article: https://arxiv.org/abs/1808.02621 (The final version will be made public in March 2019.)
An illustration of Parallax’s hybrid distributed training architecture: during distributed training, an AllReduce-based architecture is used for dense variables and a parameter server architecture is used for sparse variables.