Matches in SemOpenAlex for { <https://semopenalex.org/work/W3175486514> ?p ?o ?g. }
Showing items 1 to 88 of 88 with 100 items per page.
- W3175486514 endingPage "13569" @default.
- W3175486514 startingPage "13561" @default.
- W3175486514 abstract "Models that top leaderboards often perform unsatisfactorily when deployed in real world applications; this has necessitated rigorous and expensive pre-deployment model testing. A hitherto unexplored facet of model performance is: Are our leaderboards doing equitable evaluation? In this paper, we introduce a task-agnostic method to probe leaderboards by weighting samples based on their 'difficulty' level. We find that leaderboards can be adversarially attacked and top performing models may not always be the best models. We subsequently propose alternate evaluation metrics. Our experiments on 10 models show changes in model ranking and an overall reduction in previously reported performance, thus rectifying the overestimation of AI systems' capabilities. Inspired by behavioral testing principles, we further develop a prototype of a visual analytics tool that enables leaderboard revamping through customization, based on an end user's focus area. This helps users analyze models' strengths and weaknesses, and guides them in the selection of a model best suited for their application scenario. In a user study, members of various commercial product development teams, covering 5 focus areas, find that our prototype reduces pre-deployment development and testing effort by 41% on average." @default.
- W3175486514 created "2021-07-05" @default.
- W3175486514 creator A5063722751 @default.
- W3175486514 creator A5087561934 @default.
- W3175486514 date "2021-05-18" @default.
- W3175486514 modified "2023-10-18" @default.
- W3175486514 title "How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation" @default.
- W3175486514 hasPublicationYear "2021" @default.
- W3175486514 type Work @default.
- W3175486514 sameAs 3175486514 @default.
- W3175486514 citedByCount "4" @default.
- W3175486514 countsByYear W31754865142021 @default.
- W3175486514 crossrefType "proceedings-article" @default.
- W3175486514 hasAuthorship W3175486514A5063722751 @default.
- W3175486514 hasAuthorship W3175486514A5087561934 @default.
- W3175486514 hasConcept C105339364 @default.
- W3175486514 hasConcept C111472728 @default.
- W3175486514 hasConcept C115903868 @default.
- W3175486514 hasConcept C119857082 @default.
- W3175486514 hasConcept C120665830 @default.
- W3175486514 hasConcept C121332964 @default.
- W3175486514 hasConcept C126838900 @default.
- W3175486514 hasConcept C127413603 @default.
- W3175486514 hasConcept C136764020 @default.
- W3175486514 hasConcept C138885662 @default.
- W3175486514 hasConcept C154945302 @default.
- W3175486514 hasConcept C183003079 @default.
- W3175486514 hasConcept C183115368 @default.
- W3175486514 hasConcept C189430467 @default.
- W3175486514 hasConcept C192209626 @default.
- W3175486514 hasConcept C201995342 @default.
- W3175486514 hasConcept C2780451532 @default.
- W3175486514 hasConcept C41008148 @default.
- W3175486514 hasConcept C42475967 @default.
- W3175486514 hasConcept C63882131 @default.
- W3175486514 hasConcept C71924100 @default.
- W3175486514 hasConceptScore W3175486514C105339364 @default.
- W3175486514 hasConceptScore W3175486514C111472728 @default.
- W3175486514 hasConceptScore W3175486514C115903868 @default.
- W3175486514 hasConceptScore W3175486514C119857082 @default.
- W3175486514 hasConceptScore W3175486514C120665830 @default.
- W3175486514 hasConceptScore W3175486514C121332964 @default.
- W3175486514 hasConceptScore W3175486514C126838900 @default.
- W3175486514 hasConceptScore W3175486514C127413603 @default.
- W3175486514 hasConceptScore W3175486514C136764020 @default.
- W3175486514 hasConceptScore W3175486514C138885662 @default.
- W3175486514 hasConceptScore W3175486514C154945302 @default.
- W3175486514 hasConceptScore W3175486514C183003079 @default.
- W3175486514 hasConceptScore W3175486514C183115368 @default.
- W3175486514 hasConceptScore W3175486514C189430467 @default.
- W3175486514 hasConceptScore W3175486514C192209626 @default.
- W3175486514 hasConceptScore W3175486514C201995342 @default.
- W3175486514 hasConceptScore W3175486514C2780451532 @default.
- W3175486514 hasConceptScore W3175486514C41008148 @default.
- W3175486514 hasConceptScore W3175486514C42475967 @default.
- W3175486514 hasConceptScore W3175486514C63882131 @default.
- W3175486514 hasConceptScore W3175486514C71924100 @default.
- W3175486514 hasIssue "15" @default.
- W3175486514 hasLocation W31754865141 @default.
- W3175486514 hasOpenAccess W3175486514 @default.
- W3175486514 hasPrimaryLocation W31754865141 @default.
- W3175486514 hasRelatedWork W2012132303 @default.
- W3175486514 hasRelatedWork W2073987314 @default.
- W3175486514 hasRelatedWork W2089373480 @default.
- W3175486514 hasRelatedWork W2222859420 @default.
- W3175486514 hasRelatedWork W2319083155 @default.
- W3175486514 hasRelatedWork W2559798383 @default.
- W3175486514 hasRelatedWork W2794713557 @default.
- W3175486514 hasRelatedWork W2945804722 @default.
- W3175486514 hasRelatedWork W2968673376 @default.
- W3175486514 hasRelatedWork W2974688375 @default.
- W3175486514 hasRelatedWork W298569435 @default.
- W3175486514 hasRelatedWork W3008933805 @default.
- W3175486514 hasRelatedWork W3037368595 @default.
- W3175486514 hasRelatedWork W3081464307 @default.
- W3175486514 hasRelatedWork W3104523900 @default.
- W3175486514 hasRelatedWork W3122812598 @default.
- W3175486514 hasRelatedWork W3158434099 @default.
- W3175486514 hasRelatedWork W3167512058 @default.
- W3175486514 hasRelatedWork W3172775359 @default.
- W3175486514 hasRelatedWork W3214484773 @default.
- W3175486514 hasVolume "35" @default.
- W3175486514 isParatext "false" @default.
- W3175486514 isRetracted "false" @default.
- W3175486514 magId "3175486514" @default.
- W3175486514 workType "article" @default.