Matches in SemOpenAlex for { <https://semopenalex.org/work/W3135367836> ?p ?o ?g. }
- W3135367836 abstract "State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at this https URL." @default.
- W3135367836 created "2021-03-15" @default.
- W3135367836 creator A5006446297 @default.
- W3135367836 creator A5007406730 @default.
- W3135367836 creator A5028772381 @default.
- W3135367836 creator A5030305998 @default.
- W3135367836 creator A5031107879 @default.
- W3135367836 creator A5041662257 @default.
- W3135367836 creator A5043641795 @default.
- W3135367836 creator A5051250767 @default.
- W3135367836 creator A5057289323 @default.
- W3135367836 creator A5059260582 @default.
- W3135367836 creator A5061984381 @default.
- W3135367836 creator A5064685592 @default.
- W3135367836 date "2021-02-26" @default.
- W3135367836 modified "2023-10-02" @default.
- W3135367836 title "Learning Transferable Visual Models From Natural Language Supervision" @default.
- W3135367836 cites W1522301498 @default.
- W3135367836 cites W1527575280 @default.
- W3135367836 cites W1677182931 @default.
- W3135367836 cites W1812645736 @default.
- W3135367836 cites W1849277567 @default.
- W3135367836 cites W1880262756 @default.
- W3135367836 cites W1964806982 @default.
- W3135367836 cites W1967134278 @default.
- W3135367836 cites W1977295328 @default.
- W3135367836 cites W1981283549 @default.
- W3135367836 cites W2018881137 @default.
- W3135367836 cites W2041616772 @default.
- W3135367836 cites W2066255970 @default.
- W3135367836 cites W2080171500 @default.
- W3135367836 cites W2081613070 @default.
- W3135367836 cites W2100031962 @default.
- W3135367836 cites W2101234009 @default.
- W3135367836 cites W2102765684 @default.
- W3135367836 cites W2103163130 @default.
- W3135367836 cites W2108598243 @default.
- W3135367836 cites W2109586012 @default.
- W3135367836 cites W2112912048 @default.
- W3135367836 cites W2117876524 @default.
- W3135367836 cites W2119775030 @default.
- W3135367836 cites W2124219775 @default.
- W3135367836 cites W2131744502 @default.
- W3135367836 cites W2132339004 @default.
- W3135367836 cites W2134270519 @default.
- W3135367836 cites W2137471889 @default.
- W3135367836 cites W2142996775 @default.
- W3135367836 cites W2145215286 @default.
- W3135367836 cites W2145607950 @default.
- W3135367836 cites W2149557440 @default.
- W3135367836 cites W2150066425 @default.
- W3135367836 cites W2153579005 @default.
- W3135367836 cites W2157487986 @default.
- W3135367836 cites W2163284576 @default.
- W3135367836 cites W2164587673 @default.
- W3135367836 cites W2167905777 @default.
- W3135367836 cites W2170973209 @default.
- W3135367836 cites W2172191903 @default.
- W3135367836 cites W2185175083 @default.
- W3135367836 cites W2194775991 @default.
- W3135367836 cites W2250384498 @default.
- W3135367836 cites W2250539671 @default.
- W3135367836 cites W2251939518 @default.
- W3135367836 cites W2274287116 @default.
- W3135367836 cites W2277195237 @default.
- W3135367836 cites W2284646714 @default.
- W3135367836 cites W2328078142 @default.
- W3135367836 cites W2335728318 @default.
- W3135367836 cites W2338908902 @default.
- W3135367836 cites W2402144811 @default.
- W3135367836 cites W24089286 @default.
- W3135367836 cites W2462831000 @default.
- W3135367836 cites W2483215953 @default.
- W3135367836 cites W2518108298 @default.
- W3135367836 cites W2555897561 @default.
- W3135367836 cites W2562153041 @default.
- W3135367836 cites W2606220156 @default.
- W3135367836 cites W2612573399 @default.
- W3135367836 cites W2758782048 @default.
- W3135367836 cites W2763421725 @default.
- W3135367836 cites W2774267535 @default.
- W3135367836 cites W2775461895 @default.
- W3135367836 cites W2784121710 @default.
- W3135367836 cites W2787214294 @default.
- W3135367836 cites W2787560479 @default.
- W3135367836 cites W2788481061 @default.
- W3135367836 cites W27961112 @default.
- W3135367836 cites W2799269579 @default.
- W3135367836 cites W2804935296 @default.
- W3135367836 cites W2806857275 @default.
- W3135367836 cites W2809324505 @default.
- W3135367836 cites W2842511635 @default.
- W3135367836 cites W2886604692 @default.
- W3135367836 cites W2886641317 @default.
- W3135367836 cites W2888166343 @default.
- W3135367836 cites W2895392434 @default.
- W3135367836 cites W2898970033 @default.
- W3135367836 cites W2899136066 @default.
- W3135367836 cites W2899663614 @default.
- W3135367836 cites W2910458567 @default.
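
Each entry in this listing pairs the fixed subject with one binding of `?p`, `?o`, and `?g` from the pattern in the header. A minimal sketch of reading such lines back into quads is shown below. It assumes the line shape `- <subject> <predicate> <object> @<graph>.` seen in this listing; the names `Quad`, `parse_line`, and `predicate_counts` are hypothetical helpers introduced for illustration and are not part of any SemOpenAlex tooling.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed line shape: "- <subject> <predicate> <object> @<graph>."
# The object is either a bare identifier or a double-quoted literal.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>".*"|\S+)\s+@(?P<g>\w+)\.\s*$',
    re.DOTALL,
)


def parse_line(line: str) -> Optional[Quad]:
    """Return a Quad for a listing line, or None if the line has another shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    obj = m.group("o")
    # Strip the surrounding quotes from literal values such as dates and titles.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return Quad(m.group("s"), m.group("p"), obj, m.group("g"))


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count how many matched lines use each predicate."""
    return Counter(q.predicate for q in map(parse_line, lines) if q is not None)


sample = [
    "Matches in SemOpenAlex for { <https://semopenalex.org/work/W3135367836> ?p ?o ?g. }",
    '- W3135367836 created "2021-03-15" @default.',
    "- W3135367836 creator A5006446297 @default.",
    "- W3135367836 cites W1522301498 @default.",
    "- W3135367836 cites W1527575280 @default.",
]
counts = predicate_counts(sample)
print(counts)
```

The header line does not match the assumed shape, so `parse_line` returns `None` for it and it is left out of the counts.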