Matches in SemOpenAlex for { <https://semopenalex.org/work/W4375958194> ?p ?o ?g. }
Showing items 1 to 81 of 81, with 100 items per page.
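The triple pattern in the heading can be run programmatically against a SPARQL endpoint. The sketch below, in Python using only the standard library, builds the equivalent `SELECT` query and posts it; the endpoint URL `https://semopenalex.org/sparql` is an assumption based on the public SemOpenAlex service and is not confirmed by this listing.

```python
import json
import urllib.parse
import urllib.request

WORK_IRI = "https://semopenalex.org/work/W4375958194"
# Assumed endpoint for the public SemOpenAlex SPARQL service.
ENDPOINT = "https://semopenalex.org/sparql"


def build_query(work_iri: str) -> str:
    """Build a SELECT query equivalent to the pattern shown above."""
    return f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"


def fetch_triples(work_iri: str) -> list:
    """POST the query and return (predicate, object) pairs from the JSON results."""
    data = urllib.parse.urlencode({"query": build_query(work_iri)}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in results["results"]["bindings"]
    ]
```

Calling `fetch_triples(WORK_IRI)` should return the 81 predicate/object pairs listed below, assuming the endpoint accepts form-encoded `query` parameters per the SPARQL 1.1 Protocol.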
- W4375958194 abstract "In recent years, visual question answering (VQA) has attracted attention from the research community because of its many potential applications (such as virtual assistance in intelligent cars, assistive devices for blind people, or information retrieval from document images using natural language as queries) and its inherent challenge. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets, which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task to an answer-selection or answer-classification task. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect of the VQA task by merely selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consisting of 11,000+ images associated with 37,000+ question-answer pairs (QAs). Moreover, we propose FST, QuMLAG, and MLPAG, which fuse information from images and answers, then use these fused features to construct answers iteratively, as humans do. Our proposed methods achieve results that are competitive with SOTA models such as SAAA, MCAN, LORA, and M4C. The dataset is available to encourage the research community to develop more generalized algorithms, including transformers, for low-resource languages such as Vietnamese." @default.
- W4375958194 created "2023-05-10" @default.
- W4375958194 creator A5020382119 @default.
- W4375958194 creator A5033137339 @default.
- W4375958194 creator A5046262541 @default.
- W4375958194 creator A5059371432 @default.
- W4375958194 date "2023-05-06" @default.
- W4375958194 modified "2023-10-05" @default.
- W4375958194 title "OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese" @default.
- W4375958194 doi "https://doi.org/10.1016/j.inffus.2023.101868" @default.
- W4375958194 hasPublicationYear "2023" @default.
- W4375958194 type Work @default.
- W4375958194 citedByCount "0" @default.
- W4375958194 crossrefType "posted-content" @default.
- W4375958194 hasAuthorship W4375958194A5020382119 @default.
- W4375958194 hasAuthorship W4375958194A5033137339 @default.
- W4375958194 hasAuthorship W4375958194A5046262541 @default.
- W4375958194 hasAuthorship W4375958194A5059371432 @default.
- W4375958194 hasBestOaLocation W43759581941 @default.
- W4375958194 hasConcept C103621254 @default.
- W4375958194 hasConcept C119599485 @default.
- W4375958194 hasConcept C119857082 @default.
- W4375958194 hasConcept C127413603 @default.
- W4375958194 hasConcept C134306372 @default.
- W4375958194 hasConcept C138885662 @default.
- W4375958194 hasConcept C141353440 @default.
- W4375958194 hasConcept C154945302 @default.
- W4375958194 hasConcept C162324750 @default.
- W4375958194 hasConcept C187736073 @default.
- W4375958194 hasConcept C195324797 @default.
- W4375958194 hasConcept C199360897 @default.
- W4375958194 hasConcept C204321447 @default.
- W4375958194 hasConcept C23123220 @default.
- W4375958194 hasConcept C2780451532 @default.
- W4375958194 hasConcept C2780801425 @default.
- W4375958194 hasConcept C2993776861 @default.
- W4375958194 hasConcept C33923547 @default.
- W4375958194 hasConcept C36503486 @default.
- W4375958194 hasConcept C41008148 @default.
- W4375958194 hasConcept C41895202 @default.
- W4375958194 hasConcept C44291984 @default.
- W4375958194 hasConcept C59650362 @default.
- W4375958194 hasConceptScore W4375958194C103621254 @default.
- W4375958194 hasConceptScore W4375958194C119599485 @default.
- W4375958194 hasConceptScore W4375958194C119857082 @default.
- W4375958194 hasConceptScore W4375958194C127413603 @default.
- W4375958194 hasConceptScore W4375958194C134306372 @default.
- W4375958194 hasConceptScore W4375958194C138885662 @default.
- W4375958194 hasConceptScore W4375958194C141353440 @default.
- W4375958194 hasConceptScore W4375958194C154945302 @default.
- W4375958194 hasConceptScore W4375958194C162324750 @default.
- W4375958194 hasConceptScore W4375958194C187736073 @default.
- W4375958194 hasConceptScore W4375958194C195324797 @default.
- W4375958194 hasConceptScore W4375958194C199360897 @default.
- W4375958194 hasConceptScore W4375958194C204321447 @default.
- W4375958194 hasConceptScore W4375958194C23123220 @default.
- W4375958194 hasConceptScore W4375958194C2780451532 @default.
- W4375958194 hasConceptScore W4375958194C2780801425 @default.
- W4375958194 hasConceptScore W4375958194C2993776861 @default.
- W4375958194 hasConceptScore W4375958194C33923547 @default.
- W4375958194 hasConceptScore W4375958194C36503486 @default.
- W4375958194 hasConceptScore W4375958194C41008148 @default.
- W4375958194 hasConceptScore W4375958194C41895202 @default.
- W4375958194 hasConceptScore W4375958194C44291984 @default.
- W4375958194 hasConceptScore W4375958194C59650362 @default.
- W4375958194 hasLocation W43759581941 @default.
- W4375958194 hasOpenAccess W4375958194 @default.
- W4375958194 hasPrimaryLocation W43759581941 @default.
- W4375958194 hasRelatedWork W204133468 @default.
- W4375958194 hasRelatedWork W2075451754 @default.
- W4375958194 hasRelatedWork W2391533720 @default.
- W4375958194 hasRelatedWork W2395174199 @default.
- W4375958194 hasRelatedWork W2951097643 @default.
- W4375958194 hasRelatedWork W2991310128 @default.
- W4375958194 hasRelatedWork W3215363805 @default.
- W4375958194 hasRelatedWork W4226441484 @default.
- W4375958194 hasRelatedWork W4307481286 @default.
- W4375958194 hasRelatedWork W4309395021 @default.
- W4375958194 isParatext "false" @default.
- W4375958194 isRetracted "false" @default.
- W4375958194 workType "article" @default.