Matches in SemOpenAlex for { <https://semopenalex.org/work/W3168463823> ?p ?o ?g. }
- W3168463823 endingPage "2970" @default.
- W3168463823 startingPage "2939" @default.
- W3168463823 abstract "The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and textual content. Extracting relevant patterns from this kind of data is still a motivating goal for researchers in deep learning. In this paper, we seek to improve the understanding of key concepts and algorithms of deep multimodal learning for the computer vision community by exploring how to generate deep models that consider the integration and combination of heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research." @default.
- W3168463823 created "2021-06-22" @default.
- W3168463823 creator A5010803356 @default.
- W3168463823 creator A5070390073 @default.
- W3168463823 creator A5076712211 @default.
- W3168463823 creator A5081011085 @default.
- W3168463823 date "2021-06-10" @default.
- W3168463823 modified "2023-10-17" @default.
- W3168463823 title "A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets" @default.
- W3168463823 cites W1498436455 @default.
- W3168463823 cites W1528056001 @default.
- W3168463823 cites W1536680647 @default.
- W3168463823 cites W1539811621 @default.
- W3168463823 cites W1578285471 @default.
- W3168463823 cites W182048296 @default.
- W3168463823 cites W1903029394 @default.
- W3168463823 cites W1916445035 @default.
- W3168463823 cites W1946093182 @default.
- W3168463823 cites W1951319388 @default.
- W3168463823 cites W1977942424 @default.
- W3168463823 cites W2005756025 @default.
- W3168463823 cites W2012485971 @default.
- W3168463823 cites W2060166505 @default.
- W3168463823 cites W2064675550 @default.
- W3168463823 cites W2085411191 @default.
- W3168463823 cites W2088049833 @default.
- W3168463823 cites W2100235303 @default.
- W3168463823 cites W2102605133 @default.
- W3168463823 cites W2105071316 @default.
- W3168463823 cites W2112796928 @default.
- W3168463823 cites W2114367267 @default.
- W3168463823 cites W2125838338 @default.
- W3168463823 cites W2134383396 @default.
- W3168463823 cites W2136166259 @default.
- W3168463823 cites W2136922672 @default.
- W3168463823 cites W2137356002 @default.
- W3168463823 cites W2156222070 @default.
- W3168463823 cites W2163345210 @default.
- W3168463823 cites W2163922914 @default.
- W3168463823 cites W2168195382 @default.
- W3168463823 cites W2176950688 @default.
- W3168463823 cites W2197258718 @default.
- W3168463823 cites W2204747942 @default.
- W3168463823 cites W22229905 @default.
- W3168463823 cites W2251324968 @default.
- W3168463823 cites W2341569833 @default.
- W3168463823 cites W2405546895 @default.
- W3168463823 cites W2466618734 @default.
- W3168463823 cites W2521738011 @default.
- W3168463823 cites W2565639579 @default.
- W3168463823 cites W2572559801 @default.
- W3168463823 cites W2584992898 @default.
- W3168463823 cites W2593501769 @default.
- W3168463823 cites W2604878161 @default.
- W3168463823 cites W2605649771 @default.
- W3168463823 cites W2618530766 @default.
- W3168463823 cites W2742847802 @default.
- W3168463823 cites W2760327656 @default.
- W3168463823 cites W2765811365 @default.
- W3168463823 cites W2767290858 @default.
- W3168463823 cites W2770233088 @default.
- W3168463823 cites W2770472008 @default.
- W3168463823 cites W2777460464 @default.
- W3168463823 cites W2782730293 @default.
- W3168463823 cites W2790047899 @default.
- W3168463823 cites W2792919579 @default.
- W3168463823 cites W2793236744 @default.
- W3168463823 cites W2800017313 @default.
- W3168463823 cites W2805954242 @default.
- W3168463823 cites W2886816990 @default.
- W3168463823 cites W2888631859 @default.
- W3168463823 cites W2889162782 @default.
- W3168463823 cites W2890686698 @default.
- W3168463823 cites W2896171544 @default.
- W3168463823 cites W2899642385 @default.
- W3168463823 cites W2900323013 @default.
- W3168463823 cites W2900362520 @default.
- W3168463823 cites W2904106524 @default.
- W3168463823 cites W2905544595 @default.
- W3168463823 cites W2908425794 @default.
- W3168463823 cites W2913340405 @default.
- W3168463823 cites W2914914077 @default.
- W3168463823 cites W2915599493 @default.
- W3168463823 cites W2916052001 @default.
- W3168463823 cites W2916103538 @default.
- W3168463823 cites W2916723116 @default.
- W3168463823 cites W2917807569 @default.
- W3168463823 cites W2919115771 @default.
- W3168463823 cites W2919768272 @default.
- W3168463823 cites W2922319990 @default.
- W3168463823 cites W2922915988 @default.
- W3168463823 cites W2936721683 @default.
- W3168463823 cites W2943893946 @default.
- W3168463823 cites W2945356051 @default.
- W3168463823 cites W2946165673 @default.
- W3168463823 cites W2947223061 @default.
- W3168463823 cites W2950697717 @default.
- W3168463823 cites W2950959996 @default.