Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312691946> ?p ?o ?g. }
- W4312691946 endingPage "308" @default.
- W4312691946 startingPage "290" @default.
- W4312691946 abstract "In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image. To address these tasks, we propose X-DETR, whose architecture has three major components: an object detector, a language encoder, and vision-language alignment. The vision and language streams are independent until the end and they are aligned using an efficient dot-product operation. The whole network is trained end-to-end, such that the detector is optimized for the vision-language tasks instead of an off-the-shelf component. To overcome the limited size of paired object-language annotations, we leverage other weak types of supervision to expand the knowledge coverage. This simple yet effective architecture of X-DETR shows good accuracy and fast speeds for multiple instance-wise vision-language tasks, e.g., 16.4 AP on LVIS detection of 1.2K categories at $\sim$20 frames per second without using any LVIS annotation during training. The code is available at https://github.com/amazon-research/cross-modal-detr." @default.
- W4312691946 created "2023-01-05" @default.
- W4312691946 creator A5001760915 @default.
- W4312691946 creator A5010126054 @default.
- W4312691946 creator A5027282999 @default.
- W4312691946 creator A5038328783 @default.
- W4312691946 creator A5046318233 @default.
- W4312691946 creator A5047671906 @default.
- W4312691946 creator A5087812675 @default.
- W4312691946 date "2022-01-01" @default.
- W4312691946 modified "2023-10-17" @default.
- W4312691946 title "X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks" @default.
- W4312691946 cites W1861492603 @default.
- W4312691946 cites W1933349210 @default.
- W4312691946 cites W2102605133 @default.
- W4312691946 cites W2106277773 @default.
- W4312691946 cites W2194775991 @default.
- W4312691946 cites W2251512949 @default.
- W4312691946 cites W2277195237 @default.
- W4312691946 cites W2489434015 @default.
- W4312691946 cites W2565639579 @default.
- W4312691946 cites W2568262903 @default.
- W4312691946 cites W2606473278 @default.
- W4312691946 cites W2745461083 @default.
- W4312691946 cites W2886641317 @default.
- W4312691946 cites W2891999386 @default.
- W4312691946 cites W2948672349 @default.
- W4312691946 cites W2962964995 @default.
- W4312691946 cites W2963037989 @default.
- W4312691946 cites W2963109634 @default.
- W4312691946 cites W2963150697 @default.
- W4312691946 cites W2963518342 @default.
- W4312691946 cites W2963735856 @default.
- W4312691946 cites W2963854535 @default.
- W4312691946 cites W2963936013 @default.
- W4312691946 cites W2963954913 @default.
- W4312691946 cites W2964241181 @default.
- W4312691946 cites W2964345792 @default.
- W4312691946 cites W2970231061 @default.
- W4312691946 cites W2983943451 @default.
- W4312691946 cites W2987734933 @default.
- W4312691946 cites W3034727271 @default.
- W4312691946 cites W3090449556 @default.
- W4312691946 cites W3091588028 @default.
- W4312691946 cites W3092198590 @default.
- W4312691946 cites W3095670406 @default.
- W4312691946 cites W3096609285 @default.
- W4312691946 cites W3106250896 @default.
- W4312691946 cites W3159619744 @default.
- W4312691946 cites W3171668871 @default.
- W4312691946 cites W3173220247 @default.
- W4312691946 cites W3173859428 @default.
- W4312691946 cites W3182683290 @default.
- W4312691946 cites W3192845000 @default.
- W4312691946 cites W4312563428 @default.
- W4312691946 cites W4312956471 @default.
- W4312691946 doi "https://doi.org/10.1007/978-3-031-20059-5_17" @default.
- W4312691946 hasPublicationYear "2022" @default.
- W4312691946 type Work @default.
- W4312691946 citedByCount "3" @default.
- W4312691946 countsByYear W43126919462023 @default.
- W4312691946 crossrefType "book-chapter" @default.
- W4312691946 hasAuthorship W4312691946A5001760915 @default.
- W4312691946 hasAuthorship W4312691946A5010126054 @default.
- W4312691946 hasAuthorship W4312691946A5027282999 @default.
- W4312691946 hasAuthorship W4312691946A5038328783 @default.
- W4312691946 hasAuthorship W4312691946A5046318233 @default.
- W4312691946 hasAuthorship W4312691946A5047671906 @default.
- W4312691946 hasAuthorship W4312691946A5087812675 @default.
- W4312691946 hasBestOaLocation W43126919462 @default.
- W4312691946 hasConcept C107457646 @default.
- W4312691946 hasConcept C111919701 @default.
- W4312691946 hasConcept C118505674 @default.
- W4312691946 hasConcept C123657996 @default.
- W4312691946 hasConcept C142362112 @default.
- W4312691946 hasConcept C153083717 @default.
- W4312691946 hasConcept C153180895 @default.
- W4312691946 hasConcept C153349607 @default.
- W4312691946 hasConcept C154945302 @default.
- W4312691946 hasConcept C195324797 @default.
- W4312691946 hasConcept C204321447 @default.
- W4312691946 hasConcept C2776151529 @default.
- W4312691946 hasConcept C31972630 @default.
- W4312691946 hasConcept C41008148 @default.
- W4312691946 hasConcept C76155785 @default.
- W4312691946 hasConcept C94915269 @default.
- W4312691946 hasConceptScore W4312691946C107457646 @default.
- W4312691946 hasConceptScore W4312691946C111919701 @default.
- W4312691946 hasConceptScore W4312691946C118505674 @default.
- W4312691946 hasConceptScore W4312691946C123657996 @default.
- W4312691946 hasConceptScore W4312691946C142362112 @default.
- W4312691946 hasConceptScore W4312691946C153083717 @default.
- W4312691946 hasConceptScore W4312691946C153180895 @default.
- W4312691946 hasConceptScore W4312691946C153349607 @default.
- W4312691946 hasConceptScore W4312691946C154945302 @default.
- W4312691946 hasConceptScore W4312691946C195324797 @default.
- W4312691946 hasConceptScore W4312691946C204321447 @default.
- W4312691946 hasConceptScore W4312691946C2776151529 @default.