Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386076227> ?p ?o ?g. }
Showing items 1 to 90 of 90 with 100 items per page.
- W4386076227 abstract "Assessing the aesthetics of an image is challenging, as it is influenced by multiple factors including composition, color, style, and high-level semantics. Existing image aesthetic assessment (IAA) methods primarily rely on human-labeled rating scores, which oversimplify the visual aesthetic information that humans perceive. Conversely, user comments offer more comprehensive information and are a more natural way to express human opinions and preferences regarding image aesthetics. In light of this, we propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations. Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels. To efficiently adapt the pretrained model for downstream IAA tasks, we further propose a lightweight rank-based adapter that employs text as an anchor to learn the aesthetic ranking concept. Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic tasks such as zero-shot style classification and zero-shot IAA, surpassing many supervised baselines. With only minimal finetuning parameters using the proposed adapter module, our model achieves state-of-the-art IAA performance over the AVA dataset. Our model is available at https://github.com/google-research/google-research/tree/master/VILA" @default.
- W4386076227 created "2023-08-23" @default.
- W4386076227 creator A5002085979 @default.
- W4386076227 creator A5010755869 @default.
- W4386076227 creator A5030485695 @default.
- W4386076227 creator A5048405389 @default.
- W4386076227 creator A5058759451 @default.
- W4386076227 creator A5071396730 @default.
- W4386076227 date "2023-06-01" @default.
- W4386076227 modified "2023-09-26" @default.
- W4386076227 title "VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining" @default.
- W4386076227 cites W1956340063 @default.
- W4386076227 cites W2056380823 @default.
- W4386076227 cites W2060277733 @default.
- W4386076227 cites W2078807908 @default.
- W4386076227 cites W2080754665 @default.
- W4386076227 cites W2101105183 @default.
- W4386076227 cites W2217895792 @default.
- W4386076227 cites W2467531333 @default.
- W4386076227 cites W2524036617 @default.
- W4386076227 cites W2560647685 @default.
- W4386076227 cites W2623430765 @default.
- W4386076227 cites W2756217618 @default.
- W4386076227 cites W2779483295 @default.
- W4386076227 cites W2807107013 @default.
- W4386076227 cites W2931027027 @default.
- W4386076227 cites W2962883796 @default.
- W4386076227 cites W2970231061 @default.
- W4386076227 cites W2976886057 @default.
- W4386076227 cites W2988824574 @default.
- W4386076227 cites W3012364787 @default.
- W4386076227 cites W3035523707 @default.
- W4386076227 cites W3035595647 @default.
- W4386076227 cites W3035712445 @default.
- W4386076227 cites W3091249416 @default.
- W4386076227 cites W3093501041 @default.
- W4386076227 cites W3103635814 @default.
- W4386076227 cites W3172942063 @default.
- W4386076227 cites W3193689960 @default.
- W4386076227 cites W4214745154 @default.
- W4386076227 cites W4285606417 @default.
- W4386076227 cites W4309933612 @default.
- W4386076227 cites W4312353506 @default.
- W4386076227 cites W4312574244 @default.
- W4386076227 cites W4382462760 @default.
- W4386076227 doi "https://doi.org/10.1109/cvpr52729.2023.00968" @default.
- W4386076227 hasPublicationYear "2023" @default.
- W4386076227 type Work @default.
- W4386076227 citedByCount "0" @default.
- W4386076227 crossrefType "proceedings-article" @default.
- W4386076227 hasAuthorship W4386076227A5002085979 @default.
- W4386076227 hasAuthorship W4386076227A5010755869 @default.
- W4386076227 hasAuthorship W4386076227A5030485695 @default.
- W4386076227 hasAuthorship W4386076227A5048405389 @default.
- W4386076227 hasAuthorship W4386076227A5058759451 @default.
- W4386076227 hasAuthorship W4386076227A5071396730 @default.
- W4386076227 hasConcept C115961682 @default.
- W4386076227 hasConcept C154945302 @default.
- W4386076227 hasConcept C157657479 @default.
- W4386076227 hasConcept C184337299 @default.
- W4386076227 hasConcept C195324797 @default.
- W4386076227 hasConcept C199360897 @default.
- W4386076227 hasConcept C204321447 @default.
- W4386076227 hasConcept C23123220 @default.
- W4386076227 hasConcept C41008148 @default.
- W4386076227 hasConceptScore W4386076227C115961682 @default.
- W4386076227 hasConceptScore W4386076227C154945302 @default.
- W4386076227 hasConceptScore W4386076227C157657479 @default.
- W4386076227 hasConceptScore W4386076227C184337299 @default.
- W4386076227 hasConceptScore W4386076227C195324797 @default.
- W4386076227 hasConceptScore W4386076227C199360897 @default.
- W4386076227 hasConceptScore W4386076227C204321447 @default.
- W4386076227 hasConceptScore W4386076227C23123220 @default.
- W4386076227 hasConceptScore W4386076227C41008148 @default.
- W4386076227 hasLocation W43860762271 @default.
- W4386076227 hasOpenAccess W4386076227 @default.
- W4386076227 hasPrimaryLocation W43860762271 @default.
- W4386076227 hasRelatedWork W159132833 @default.
- W4386076227 hasRelatedWork W2293457016 @default.
- W4386076227 hasRelatedWork W2794386110 @default.
- W4386076227 hasRelatedWork W2977842567 @default.
- W4386076227 hasRelatedWork W3111379637 @default.
- W4386076227 hasRelatedWork W3165078055 @default.
- W4386076227 hasRelatedWork W3197904250 @default.
- W4386076227 hasRelatedWork W4283315784 @default.
- W4386076227 hasRelatedWork W87581401 @default.
- W4386076227 hasRelatedWork W1872130062 @default.
- W4386076227 isParatext "false" @default.
- W4386076227 isRetracted "false" @default.
- W4386076227 workType "article" @default.