Professor Jaejin Lee's Research Team of the SNU Department of Computer Science and Engineering Develops a Deep Learning Core Software in the AI Field
Author: Administrator
Date posted: 2021.03.23.
- A core deep learning software technology that outperforms the frameworks of Google and NVIDIA
Deep learning technology has recently become essential in the fields of artificial intelligence (AI) and big data, and domestic researchers have now developed a deep learning compiler framework, the core of this technology. A deep learning compiler framework is software that is key to improving the inference and training performance of a given deep learning model.
A research team led by Professor Jaejin Lee of Seoul National University's Graduate School of Data Science (Department of Data Science) and College of Engineering (Department of Computer Science and Engineering) has announced that it has developed a deep learning compiler framework, a core piece of software in the field of AI, that outperforms the commercial deep learning compiler frameworks of Google and NVIDIA, which had previously offered the highest level of performance.
Deep learning technology requires high-performance AI semiconductors such as NVIDIA GPUs in order to perform inference and training within a practical amount of time. However, this has meant relying heavily on the commercial deep learning compiler frameworks provided by AI semiconductor manufacturers.
Using only publicly available GPU hardware information, the deep learning compiler framework developed by Professor Jaejin Lee's research team achieves high performance by directly generating GPU-optimized code that executes a given deep learning model.
The research team applied the technology to several widely used deep learning benchmark models (ResNet, BERT, etc.) and tested it. The results showed that it achieves performance similar to or higher than that of state-of-the-art deep learning compiler frameworks such as Google's TensorFlow XLA, NVIDIA's TensorRT and Apache's TVM. The research team plans to release the technology publicly as open software.
“Until now, the development of domestic technologies has been limited because it relies heavily on commercial deep learning compiler frameworks for which foreign hardware manufacturers have not released the source code. Our research has demonstrated that, even for advanced technologies that have already been commercialized overseas, it is possible to develop more advanced technologies that surpass them by using creative and original methods,” said researcher Wookeun Jung of the Department of Computer Science and Engineering, who made a major contribution to this research.
“This research result is an encouraging case for securing cutting-edge core software technology in the field of deep learning in Korea. AI semiconductor development is currently booming in Korea as well as around the world, and the achievement of this research is an essential technology for the use and commercialization of AI semiconductors,” explained Professor Jaejin Lee of Seoul National University.
The research results will be presented at PLDI (Programming Language Design and Implementation), an international academic conference in the field of programming languages that is scheduled to be held in June this year.
[Research Results]
A Deep Learning Optimization Framework for Versatile GPU Workloads
Wookeun Jung, Thanh Tuan Dao, and Jaejin Lee
(PLDI’21, conditionally accepted with shepherding)
Widely used deep learning frameworks such as PyTorch and TensorFlow rely heavily on the cuDNN library provided by NVIDIA. However, a pre-implemented library such as cuDNN has to cover diverse types of deep learning operators, so it becomes harder for it to achieve high performance as hardware grows more diverse, and optimizations such as fusion cannot be applied.
This study proposes DeepCuts, a deep learning optimization framework that generates optimized code from the given deep learning operations and GPU hardware information. DeepCuts generates efficient code through fusion-aware code generation and performance modeling, and achieves higher performance than existing state-of-the-art deep learning optimization frameworks (Apache TVM, Google TensorFlow XLA, NVIDIA TensorRT).
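As a rough illustration of what "fusion" means in this context, the following minimal Python sketch compares two operators run as separate passes with the same two operators combined into a single pass. It is not DeepCuts code and it does not run on a GPU; the function names (`bias_add`, `relu`, `fused_bias_relu`) and the choice of operators are assumptions made purely for illustration.

```python
from typing import List


def bias_add(x: List[float], bias: float) -> List[float]:
    """First operator: one full pass over the data, producing an intermediate list."""
    return [v + bias for v in x]


def relu(x: List[float]) -> List[float]:
    """Second operator: another full pass that reads the intermediate list."""
    return [v if v > 0.0 else 0.0 for v in x]


def unfused(x: List[float], bias: float) -> List[float]:
    # Two passes and one intermediate buffer, as when each operator
    # is a separate pre-implemented library call.
    return relu(bias_add(x, bias))


def fused_bias_relu(x: List[float], bias: float) -> List[float]:
    # One pass and no intermediate buffer: both operators are applied
    # to each element before moving on to the next one.
    out = []
    for v in x:
        t = v + bias
        out.append(t if t > 0.0 else 0.0)
    return out


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5]
    # Both versions compute the same result; only the number of passes differs.
    assert unfused(data, 1.0) == fused_bias_relu(data, 1.0)
    print(fused_bias_relu(data, 1.0))  # prints [0.0, 0.5, 1.0, 2.5]
```

On a GPU, the equivalent saving would come from avoiding an extra round trip of the intermediate result through memory, which a library of fixed, separately implemented operators cannot do across operator boundaries.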
[Explanation of Key Terms]
1. Deep learning compiler framework
Software that takes a hardware-independent deep learning application as input and generates optimized code for execution on the underlying hardware accelerator (e.g. a GPU). Representative examples include TensorFlow XLA developed by Google, TensorRT developed by NVIDIA and Apache's TVM.
2. GPU
An abbreviation for Graphics Processing Unit. GPUs were originally used only for graphics processing, but they are now also used for general-purpose computation to improve performance and power efficiency, and they are the de facto standard platform for deep learning inference and training in the field of AI.
3. PyTorch and TensorFlow
The most widely used software frameworks for developing and running deep learning models in the field of AI. Deep learning models developed with them can be optimized through Google's TensorFlow XLA, NVIDIA's TensorRT, and Apache's TVM.
The overall workflow is a series of steps that generates a GPU program from a computation graph, received from PyTorch or TensorFlow, that represents the deep learning operations. The candidate generator determines the candidate operations, or clusters of operations formed through fusion, for which optimized code will be generated. The performance estimator takes these candidates and, using GPU hardware information and implementation parameters, predicts through a simple model how they would perform if implemented as real GPU code. Based on the predictions, implementations with very poor expected performance are filtered out, and the remaining ones are passed to the code generator. The code generator generates and runs several versions of actual code and finally selects the best-performing one.
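The estimate, filter, then measure flow described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the research team's implementation: the `Candidate` fields, the cost formula in `estimate_cost`, the `shared_mem_limit` parameter, and `fake_measure` (a stand-in for compiling and timing real GPU code) are all hypothetical names and values introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """One hypothetical implementation choice for an operation or fused cluster."""
    fused: bool      # whether neighbouring operations are fused into one kernel
    tile_size: int   # an illustrative implementation parameter


def generate_candidates(tile_sizes: List[int]) -> List[Candidate]:
    # Candidate generator: enumerate unfused and fused variants per parameter.
    return [Candidate(fused, t) for fused in (False, True) for t in tile_sizes]


def estimate_cost(c: Candidate, shared_mem_limit: int = 1024) -> float:
    # Performance estimator: a made-up analytic model (lower is better).
    # Tiles that exceed the assumed hardware limit are treated as infeasible.
    if c.tile_size * c.tile_size > shared_mem_limit:
        return math.inf
    memory_passes = 1 if c.fused else 2  # fusion avoids one trip through memory
    return memory_passes * 100.0 / c.tile_size


def fake_measure(c: Candidate) -> float:
    # Stand-in for generating, running, and timing real code. It deliberately
    # disagrees with the estimate so that measurement can change the winner.
    return estimate_cost(c) + abs(c.tile_size - 16)


def select_best(
    candidates: List[Candidate],
    estimate: Callable[[Candidate], float],
    measure: Callable[[Candidate], float],
    keep: int,
) -> Candidate:
    # Filter out infeasible candidates, keep the `keep` best by estimate,
    # then measure only that shortlist and return the fastest one.
    feasible = [c for c in candidates if math.isfinite(estimate(c))]
    shortlist = sorted(feasible, key=estimate)[:keep]
    if not shortlist:
        raise ValueError("no feasible candidate")
    return min(shortlist, key=measure)


if __name__ == "__main__":
    cands = generate_candidates([4, 8, 16, 32, 64])
    best = select_best(cands, estimate_cost, fake_measure, keep=3)
    print(best)  # prints Candidate(fused=True, tile_size=16)
```

In this toy run the estimator ranks the fused candidate with tile size 32 first, but the measured step picks the fused candidate with tile size 16. That is the point of combining a cheap predictive model, used only to discard clearly poor options, with actual execution of the few candidates that remain.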