Matches in SemOpenAlex for { <https://semopenalex.org/work/W4379410147> ?p ?o ?g. }
- W4379410147 endingPage "962" @default.
- W4379410147 startingPage "935" @default.
- W4379410147 abstract "Computer vision aims to build computational models that approximate the human visual system. With the development of deep neural networks (DNNs), the analysis and understanding of high-level semantics in computer vision has become a research focus. High-level semantics in computer vision are human-understandable, expressible descriptors of the content of images, videos, and other media signals; typical high-level semantic analysis tasks include image classification, object detection, instance segmentation, semantic segmentation, video scene recognition, and object tracking. Algorithms based on deep neural networks have steadily improved the performance of computer vision tasks, but growing model size and declining computational efficiency have come with them. Model distillation is a model-compression scheme based on transfer learning. Such schemes typically use a pre-trained model as a teacher, extract its effective representations, such as model outputs, hidden-layer features, or inter-feature similarities, and use these representations as additional supervision signals for training another, smaller student model with faster inference, so that the small model's performance improves enough to replace the large one. Model distillation offers a good trade-off between model performance and computational complexity, and is therefore increasingly used in deep-learning-based high-level semantic analysis. Since the concept of model distillation was introduced in 2014, researchers have developed a large number of distillation methods for high-level semantic analysis, applied most widely to image classification, object detection, and semantic segmentation. This paper surveys and summarizes representative model distillation schemes for these typical tasks, organized by visual task. First, starting from the most mature and most widely applied distillation methods for classification, we introduce their different design ideas and application scenarios, present comparisons of selected experimental results, and point out how the conditions for applying model distillation differ between classification and detection or segmentation tasks. Next, we introduce several distillation methods specially designed for object detection and semantic segmentation, explain their design goals and ideas in the context of model structure, and provide comparisons and analysis of selected experimental results. Finally, we summarize the current state of model distillation in high-level semantic analysis, point out remaining difficulties and shortcomings, and envision possible future research directions.;Computer vision tasks aim to construct computational models that approximate functions of the human visual system. Current deep learning models are progressively raising the performance upper bounds of multiple computer vision tasks, especially the analysis and understanding of high-level semantics, i.e., human-interpretable descriptors of multimedia content. Typical tasks for understanding high-level semantics include image classification, object detection, instance segmentation, semantic segmentation, and video scene recognition and object tracking. With the development of convolutional neural networks (CNNs), deep-learning-based high-level semantic understanding has benefited from increasingly deep and cumbersome models, which in turn raises storage and computational costs. To obtain lighter structures and computational efficiency, many model-compression strategies have been proposed, e.g., pruning, weight quantization, and low-rank factorization. However, these strategies may alter the network structure or cause severe performance drops when deployed on computer vision tasks.
Model distillation is one of the typical compression methods, applying transfer learning to model compression. In general, model distillation uses a large, complicated pre-trained model as the "teacher" and takes its effective representations, e.g., model outputs, hidden-layer features, or similarities between feature maps. These representations serve as extra supervision signals, together with the original ground truth, for training a lighter and faster model called the "student". Because model distillation provides a favorable balance between model performance and efficiency, it is being rapidly explored across different computer vision tasks. This paper investigates the progress of model distillation methods since their introduction in 2014 and describes their different strategies in various applications. We review popular distillation strategies and current distillation algorithms deployed on image classification, object detection, and semantic segmentation. First, we introduce distillation methods for image classification, where model distillation is already mature. The fundamentals of model distillation start from using the teacher classifier's output logits as soft labels, providing the student with inter-category structural information that is not available in conventional one-hot ground truths. Furthermore, hint learning exploits the hierarchical structure of neural networks and takes feature maps from hidden layers as another form of teacher representation. Most distillation strategies are designed and derived from these approaches. In terms of framework design and application scenarios, this paper introduces typical distillation strategies for classification models. Some methods mainly propose novel supervision-signal designs, e.g., ensembles that differ from conventional classification soft labels or feature maps.
Newly developed features for the student to mimic are usually computed from attention or similarity maps of different layers, from data augmentations, or from sampled images. Other methods add noise or perturbation to the teacher classifier's outputs, or use probabilistic inference to minimize the gap between teacher and student. These specially designed features or logits aim to represent the teacher's knowledge more appropriately than plain features taken from intermediate layers. Moreover, some methods alter the distillation procedure itself, introducing more complicated schemes to transfer the teacher's knowledge instead of simply training the student with generated labels or features. Also, as generative adversarial networks (GANs) achieve promising performance in image synthesis, some distillation methods introduce adversarial mechanisms into classifier distillation, where the teacher's features are regarded as "real" samples and the student is expected to "generate" similar features. In many practical scenarios such as model compression, self-training, and parallel computing, classifier distillation is also used in coordination with specific processes, e.g., fine-tuning networks with full-precision teachers, distilling a student with its own earlier versions during training, or using models from different nodes as teachers. After introducing these approaches to model distillation for image classification, we summarize the performance of popular strategies in a table, comparing their improvements to classifiers' top-1 accuracies on several typical classification datasets. The second part of the paper focuses on distillation methods specially developed for computer vision tasks more complicated than classification, e.g., object detection, instance segmentation, and semantic segmentation.
Unlike classifiers, models for these tasks contain more redundant structures with heterogeneous outputs. Hence, recent works on distilling detectors and segmentation models are relatively fewer than those on classifier distillation. The paper describes current challenges in designing distillation frameworks for detection and segmentation tasks, and then introduces typical distillation methods for detectors and segmentation models according to the different tasks and their multifaceted structures. Since few works target instance segmentation models specifically, the paper simply introduces the related distillation methods for object detectors at the beginning of the second part. For detectors, the demands of localization require special concentration on local information around foreground objects. Meanwhile, images in object detection datasets generally contain more complicated scenes in which many different objects may appear. Hence, distillation strategies borrowed from classifiers may cause undesired performance degradation in object detection. Because of the more complex structures in detectors, previous distillation methods may not be directly applicable. As the "backbone with task heads" structure is widely used in modern computer vision models, researchers have developed novel distillation methods mainly based on this typical framework. The introduced detector distillation strategies address the issues above and mainly focus on acquiring specific output logits and designing dedicated loss functions for different parts of the detector. To highlight foreground regions before distillation, feature maps from the backbone are often selected through regions of interest (RoIs) using masking operations. Different methods select various output logits from the teacher's task heads, guiding the training of the student's task heads through specific matching and imitation schemes.
Semantic segmentation requires more global information than object detection or instance segmentation, focusing on pixel-wise classification over the whole image. One critical factor in classifying pixels correctly is the analysis of inter-pixel relationships. Hence, distillation methods for semantic segmentation also exploit pixel-level information in both the output masks and the feature maps of hidden layers. The distillation strategies introduced in the paper mainly apply hierarchical distillation to different parts of the model, e.g., imitation of the full output classification mask, imitation of full feature maps, computation of similarity matrices, and use of conditional GANs (cGANs) for auxiliary imitation. The former two approaches are fundamental practices in model distillation. In contrast, to make the segmentation model's pixel-wise knowledge more "compact" after compression, some distillation methods use compressed features instead of the original ones to compute similarities with the student. When cGANs are used to make the student segmentation model imitate the teacher's features, researchers introduce the Wasserstein distance as a better metric for adversarial training. In the final part of this paper, previous works on model distillation for high-level semantic understanding are summarized. We review obstacles and unsolved problems in the current development of model distillation, and predict future research directions as well." @default.
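The soft-label mechanism the abstract describes (teacher logits softened by a temperature, combined with the ordinary cross-entropy term against the one-hot label) can be sketched in a few lines of plain Python. This is a minimal illustration of the classic formulation, not code from any surveyed paper; the function names and the default temperature and weighting are assumed values chosen for the example.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T yields a softer distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """Soft-label distillation loss: cross-entropy with the one-hot label
    plus KL divergence from the teacher's temperature-softened outputs.
    T and alpha here are illustrative hyperparameter choices."""
    p_s = softmax(student_logits)            # student probabilities at T = 1
    ce = -math.log(p_s[true_label])          # ordinary cross-entropy term
    p_t_soft = softmax(teacher_logits, T)    # the teacher's "soft labels"
    p_s_soft = softmax(student_logits, T)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t_soft, p_s_soft))
    # The T^2 factor keeps the soft-target gradients comparable in
    # magnitude to the hard-label gradients as T grows.
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

The soft targets carry the inter-category structure mentioned above: a teacher that assigns 20% probability to a visually similar class tells the student something a one-hot label cannot.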
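The similarity-matrix transfer mentioned for semantic segmentation can likewise be illustrated with a minimal sketch. Each row below stands for the feature vector at one spatial location; the function names and the plain-Python cosine formulation are assumptions for illustration, not the exact construction of any surveyed method. The key property the sketch shows is that the two similarity matrices have the same shape even when teacher and student feature dimensions differ, which is what makes this kind of relational transfer practical.

```python
import math

def pairwise_cosine(features):
    """Pairwise cosine-similarity matrix of a list of feature vectors,
    e.g. one vector per spatial location of a feature map."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    us = [unit(v) for v in features]
    return [[sum(a * b for a, b in zip(u, w)) for w in us] for u in us]

def similarity_distill_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student similarity
    matrices; the raw feature dimensions may differ, only the number
    of locations must match."""
    st = pairwise_cosine(teacher_feats)
    ss = pairwise_cosine(student_feats)
    n = len(st)
    return sum((st[i][j] - ss[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

A student whose features preserve the teacher's inter-pixel relationships incurs zero loss here even if its channel dimension is much smaller, matching the abstract's point that inter-pixel relationships, not raw features, are what segmentation distillation transfers.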
- W4379410147 created "2023-06-06" @default.
- W4379410147 creator A5028963874 @default.
- W4379410147 creator A5042485246 @default.
- W4379410147 date "2023-01-01" @default.
- W4379410147 modified "2023-09-26" @default.
- W4379410147 title "Model distillation for high-level semantic understanding:a survey" @default.
- W4379410147 cites W1861492603 @default.
- W4379410147 cites W1903029394 @default.
- W4379410147 cites W2031489346 @default.
- W4379410147 cites W2046589280 @default.
- W4379410147 cites W2058641082 @default.
- W4379410147 cites W2097117768 @default.
- W4379410147 cites W2107709024 @default.
- W4379410147 cites W2108598243 @default.
- W4379410147 cites W2112796928 @default.
- W4379410147 cites W2144794286 @default.
- W4379410147 cites W2194775991 @default.
- W4379410147 cites W2233116163 @default.
- W4379410147 cites W2294370754 @default.
- W4379410147 cites W2317851288 @default.
- W4379410147 cites W2340897893 @default.
- W4379410147 cites W2520549145 @default.
- W4379410147 cites W2531409750 @default.
- W4379410147 cites W2543539599 @default.
- W4379410147 cites W2549139847 @default.
- W4379410147 cites W2560023338 @default.
- W4379410147 cites W2601564443 @default.
- W4379410147 cites W2618530766 @default.
- W4379410147 cites W2739879705 @default.
- W4379410147 cites W2750432752 @default.
- W4379410147 cites W2794284562 @default.
- W4379410147 cites W2903711666 @default.
- W4379410147 cites W2936864631 @default.
- W4379410147 cites W2952787292 @default.
- W4379410147 cites W2954054736 @default.
- W4379410147 cites W2955192706 @default.
- W4379410147 cites W2959289524 @default.
- W4379410147 cites W2963037989 @default.
- W4379410147 cites W2963140444 @default.
- W4379410147 cites W2963150697 @default.
- W4379410147 cites W2963163009 @default.
- W4379410147 cites W2963351448 @default.
- W4379410147 cites W2963575695 @default.
- W4379410147 cites W2963785012 @default.
- W4379410147 cites W2963881378 @default.
- W4379410147 cites W2964241181 @default.
- W4379410147 cites W2964268168 @default.
- W4379410147 cites W2964309882 @default.
- W4379410147 cites W2981537897 @default.
- W4379410147 cites W2981819252 @default.
- W4379410147 cites W2981884310 @default.
- W4379410147 cites W2982242214 @default.
- W4379410147 cites W2983943451 @default.
- W4379410147 cites W2986015886 @default.
- W4379410147 cites W2986445670 @default.
- W4379410147 cites W2987861506 @default.
- W4379410147 cites W2991662170 @default.
- W4379410147 cites W2997006708 @default.
- W4379410147 cites W3016719260 @default.
- W4379410147 cites W3034342078 @default.
- W4379410147 cites W3034619943 @default.
- W4379410147 cites W3034695001 @default.
- W4379410147 cites W3035163969 @default.
- W4379410147 cites W3049435445 @default.
- W4379410147 cites W3105676814 @default.
- W4379410147 cites W3113410735 @default.
- W4379410147 cites W3173270634 @default.
- W4379410147 cites W4214524539 @default.
- W4379410147 cites W639708223 @default.
- W4379410147 cites W753847829 @default.
- W4379410147 doi "https://doi.org/10.11834/jig.210337" @default.
- W4379410147 hasPublicationYear "2023" @default.
- W4379410147 type Work @default.
- W4379410147 citedByCount "0" @default.
- W4379410147 crossrefType "journal-article" @default.
- W4379410147 hasAuthorship W4379410147A5028963874 @default.
- W4379410147 hasAuthorship W4379410147A5042485246 @default.
- W4379410147 hasBestOaLocation W43794101471 @default.
- W4379410147 hasConcept C127413603 @default.
- W4379410147 hasConcept C185592680 @default.
- W4379410147 hasConcept C204030448 @default.
- W4379410147 hasConcept C204321447 @default.
- W4379410147 hasConcept C21880701 @default.
- W4379410147 hasConcept C23123220 @default.
- W4379410147 hasConcept C41008148 @default.
- W4379410147 hasConcept C43617362 @default.
- W4379410147 hasConceptScore W4379410147C127413603 @default.
- W4379410147 hasConceptScore W4379410147C185592680 @default.
- W4379410147 hasConceptScore W4379410147C204030448 @default.
- W4379410147 hasConceptScore W4379410147C204321447 @default.
- W4379410147 hasConceptScore W4379410147C21880701 @default.
- W4379410147 hasConceptScore W4379410147C23123220 @default.
- W4379410147 hasConceptScore W4379410147C41008148 @default.
- W4379410147 hasConceptScore W4379410147C43617362 @default.
- W4379410147 hasIssue "4" @default.
- W4379410147 hasLocation W43794101471 @default.
- W4379410147 hasOpenAccess W4379410147 @default.