Matches in SemOpenAlex for { <https://semopenalex.org/work/W2883077790> ?p ?o ?g. }
- W2883077790 abstract "A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment. To this end, recent advances at the intersection of language and vision have made incredible progress – from being able to generate natural language descriptions of images/videos, to answering questions about them, to even holding free-form conversations about visual content! However, while these agents can passively describe images or answer (a sequence of) questions about them, they cannot act in the world (what if I cannot answer a question from my current view, or I am asked to move or manipulate something?). Thus, the challenge now is to extend this progress in language and vision to embodied agents that take actions and actively interact with their visual environments. To reduce the entry barrier for new researchers, this tutorial will provide an overview of the growing number of multimodal tasks and datasets that combine textual and visual understanding. We will comprehensively review existing state-of-the-art approaches to selected tasks such as image captioning, visual question answering (VQA) and visual dialog, presenting the key architectural building blocks (such as co-attention) and novel algorithms (such as cooperative/adversarial games) used to train models for these tasks. We will then discuss some of the current and upcoming challenges of combining language, vision and actions, and introduce some recently released interactive 3D simulation environments designed for this purpose." @default.
- W2883077790 created "2018-08-03" @default.
- W2883077790 creator A5020091255 @default.
- W2883077790 creator A5042265238 @default.
- W2883077790 creator A5060958969 @default.
- W2883077790 date "2018-01-01" @default.
- W2883077790 modified "2023-09-27" @default.
- W2883077790 title "Connecting Language and Vision to Actions" @default.
- W2883077790 cites W1514535095 @default.
- W2883077790 cites W1548663377 @default.
- W2883077790 cites W1731062554 @default.
- W2883077790 cites W177796033 @default.
- W2883077790 cites W1793121960 @default.
- W2883077790 cites W1832693441 @default.
- W2883077790 cites W1889081078 @default.
- W2883077790 cites W1889268436 @default.
- W2883077790 cites W1895577753 @default.
- W2883077790 cites W1933349210 @default.
- W2883077790 cites W1956340063 @default.
- W2883077790 cites W1971301773 @default.
- W2883077790 cites W1990128172 @default.
- W2883077790 cites W2005708641 @default.
- W2883077790 cites W202277873 @default.
- W2883077790 cites W2040916592 @default.
- W2883077790 cites W2053299703 @default.
- W2883077790 cites W2085337304 @default.
- W2883077790 cites W2099471712 @default.
- W2883077790 cites W2119465010 @default.
- W2883077790 cites W2120279123 @default.
- W2883077790 cites W2120615054 @default.
- W2883077790 cites W2122865749 @default.
- W2883077790 cites W2126725946 @default.
- W2883077790 cites W2130942839 @default.
- W2883077790 cites W2131340601 @default.
- W2883077790 cites W2131357087 @default.
- W2883077790 cites W2139694477 @default.
- W2883077790 cites W2149746394 @default.
- W2883077790 cites W2150295085 @default.
- W2883077790 cites W2153579005 @default.
- W2883077790 cites W2155132542 @default.
- W2883077790 cites W2157191138 @default.
- W2883077790 cites W2158028897 @default.
- W2883077790 cites W2160424986 @default.
- W2883077790 cites W2161066414 @default.
- W2883077790 cites W2165467443 @default.
- W2883077790 cites W2165596530 @default.
- W2883077790 cites W2185083674 @default.
- W2883077790 cites W2186845332 @default.
- W2883077790 cites W2236233024 @default.
- W2883077790 cites W2247119764 @default.
- W2883077790 cites W2250333922 @default.
- W2883077790 cites W2250728538 @default.
- W2883077790 cites W2250750514 @default.
- W2883077790 cites W2251509039 @default.
- W2883077790 cites W2251894552 @default.
- W2883077790 cites W2251939518 @default.
- W2883077790 cites W2260776682 @default.
- W2883077790 cites W2267186426 @default.
- W2883077790 cites W2270364989 @default.
- W2883077790 cites W2280395961 @default.
- W2883077790 cites W2293004735 @default.
- W2883077790 cites W2295227292 @default.
- W2883077790 cites W2296283641 @default.
- W2883077790 cites W2302086703 @default.
- W2883077790 cites W2324083987 @default.
- W2883077790 cites W2400166611 @default.
- W2883077790 cites W2402268235 @default.
- W2883077790 cites W2403380339 @default.
- W2883077790 cites W2404416160 @default.
- W2883077790 cites W2408271274 @default.
- W2883077790 cites W2463565445 @default.
- W2883077790 cites W2474390746 @default.
- W2883077790 cites W2506483933 @default.
- W2883077790 cites W2560730294 @default.
- W2883077790 cites W2561715562 @default.
- W2883077790 cites W2603266952 @default.
- W2883077790 cites W2604728175 @default.
- W2883077790 cites W2740109156 @default.
- W2883077790 cites W2742113707 @default.
- W2883077790 cites W2745461083 @default.
- W2883077790 cites W2757369719 @default.
- W2883077790 cites W2768661419 @default.
- W2883077790 cites W2774005037 @default.
- W2883077790 cites W2783375473 @default.
- W2883077790 cites W2805216012 @default.
- W2883077790 cites W2806617565 @default.
- W2883077790 cites W2915219238 @default.
- W2883077790 cites W2950276680 @default.
- W2883077790 cites W2950577311 @default.
- W2883077790 cites W2962684798 @default.
- W2883077790 cites W2962716332 @default.
- W2883077790 cites W2962887844 @default.
- W2883077790 cites W2962968835 @default.
- W2883077790 cites W2963084599 @default.
- W2883077790 cites W2963143606 @default.
- W2883077790 cites W2963184844 @default.
- W2883077790 cites W2963224792 @default.
- W2883077790 cites W2963367210 @default.
- W2883077790 cites W2963383024 @default.
- W2883077790 cites W2963398599 @default.