Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387006422> ?p ?o ?g. }
- W4387006422 abstract "Over the last years, the ever-growing number of Machine Learning (ML) and Artificial Intelligence (AI) applications deployed in the Cloud has led to high demands on the computing resources required for efficient processing. Multiple users deploy multiple applications on the same server node to maximize Quality of Service (QoS); however, this leads to increased interference. In addition, Cloud providers aim to minimize their operating costs by efficiently utilizing the available resources. These conflicting optimization goals form a complex paradigm where efficient scheduling is required. In this work, we present IRIS, an interference- and resource-aware predictive inference scheduling framework for ML inference serving in the cloud. We target the multi-objective problem of QoS maximization with effective CPU utilization based on Queries per Second (QPS) predictions by proposing a model-less ML-based solution and integrating it into the Kubernetes platform. Our approach is evaluated over real hardware infrastructure and a set of ML applications. Our experimental analysis shows that under various QoS constraints, the model-specific interference-aware scheduler violates QoS constraints less frequently by achieving 1.8x fewer violations, on average, compared to over-provisioning and 3.1x fewer violations compared to under-provisioning, through efficient exploitation of available CPU resources. The model-less feature is able to cause, on average, 1.5x fewer violations compared to the model-specific scheduler, while further reducing the average CPU utilization by $\approx 30\%$." @default.
- W4387006422 created "2023-09-26" @default.
- W4387006422 creator A5002172931 @default.
- W4387006422 creator A5028774103 @default.
- W4387006422 creator A5043131021 @default.
- W4387006422 creator A5076498836 @default.
- W4387006422 creator A5090343962 @default.
- W4387006422 creator A5092442462 @default.
- W4387006422 date "2023-07-01" @default.
- W4387006422 modified "2023-09-26" @default.
- W4387006422 title "IRIS: Interference and Resource Aware Predictive Orchestration for ML Inference Serving" @default.
- W4387006422 cites W1982063824 @default.
- W4387006422 cites W1988184485 @default.
- W4387006422 cites W2060032722 @default.
- W4387006422 cites W2062017159 @default.
- W4387006422 cites W2064290949 @default.
- W4387006422 cites W2084226860 @default.
- W4387006422 cites W2084819966 @default.
- W4387006422 cites W2156077332 @default.
- W4387006422 cites W2194775991 @default.
- W4387006422 cites W2490603845 @default.
- W4387006422 cites W2772948367 @default.
- W4387006422 cites W2794670651 @default.
- W4387006422 cites W2892341857 @default.
- W4387006422 cites W2934208298 @default.
- W4387006422 cites W2944614352 @default.
- W4387006422 cites W2953169926 @default.
- W4387006422 cites W2977714483 @default.
- W4387006422 cites W2986864338 @default.
- W4387006422 cites W2999580716 @default.
- W4387006422 cites W3017091196 @default.
- W4387006422 cites W3039010666 @default.
- W4387006422 cites W3046986616 @default.
- W4387006422 cites W3095488153 @default.
- W4387006422 cites W3096411172 @default.
- W4387006422 cites W3100676516 @default.
- W4387006422 cites W3121263745 @default.
- W4387006422 cites W3137917418 @default.
- W4387006422 cites W3156127671 @default.
- W4387006422 cites W3165292244 @default.
- W4387006422 cites W3204216247 @default.
- W4387006422 cites W3205373118 @default.
- W4387006422 cites W3210776666 @default.
- W4387006422 cites W4214690606 @default.
- W4387006422 cites W4226109017 @default.
- W4387006422 cites W4236491543 @default.
- W4387006422 cites W4253824360 @default.
- W4387006422 cites W4283727273 @default.
- W4387006422 cites W4320067926 @default.
- W4387006422 doi "https://doi.org/10.1109/cloud60044.2023.00021" @default.
- W4387006422 hasPublicationYear "2023" @default.
- W4387006422 type Work @default.
- W4387006422 citedByCount "0" @default.
- W4387006422 crossrefType "proceedings-article" @default.
- W4387006422 hasAuthorship W4387006422A5002172931 @default.
- W4387006422 hasAuthorship W4387006422A5028774103 @default.
- W4387006422 hasAuthorship W4387006422A5043131021 @default.
- W4387006422 hasAuthorship W4387006422A5076498836 @default.
- W4387006422 hasAuthorship W4387006422A5090343962 @default.
- W4387006422 hasAuthorship W4387006422A5092442462 @default.
- W4387006422 hasConcept C111919701 @default.
- W4387006422 hasConcept C119857082 @default.
- W4387006422 hasConcept C120314980 @default.
- W4387006422 hasConcept C126255220 @default.
- W4387006422 hasConcept C154945302 @default.
- W4387006422 hasConcept C172191483 @default.
- W4387006422 hasConcept C206729178 @default.
- W4387006422 hasConcept C2776214188 @default.
- W4387006422 hasConcept C31258907 @default.
- W4387006422 hasConcept C33923547 @default.
- W4387006422 hasConcept C41008148 @default.
- W4387006422 hasConcept C5119721 @default.
- W4387006422 hasConcept C77088390 @default.
- W4387006422 hasConcept C79974875 @default.
- W4387006422 hasConceptScore W4387006422C111919701 @default.
- W4387006422 hasConceptScore W4387006422C119857082 @default.
- W4387006422 hasConceptScore W4387006422C120314980 @default.
- W4387006422 hasConceptScore W4387006422C126255220 @default.
- W4387006422 hasConceptScore W4387006422C154945302 @default.
- W4387006422 hasConceptScore W4387006422C172191483 @default.
- W4387006422 hasConceptScore W4387006422C206729178 @default.
- W4387006422 hasConceptScore W4387006422C2776214188 @default.
- W4387006422 hasConceptScore W4387006422C31258907 @default.
- W4387006422 hasConceptScore W4387006422C33923547 @default.
- W4387006422 hasConceptScore W4387006422C41008148 @default.
- W4387006422 hasConceptScore W4387006422C5119721 @default.
- W4387006422 hasConceptScore W4387006422C77088390 @default.
- W4387006422 hasConceptScore W4387006422C79974875 @default.
- W4387006422 hasLocation W43870064221 @default.
- W4387006422 hasOpenAccess W4387006422 @default.
- W4387006422 hasPrimaryLocation W43870064221 @default.
- W4387006422 hasRelatedWork W1982410425 @default.
- W4387006422 hasRelatedWork W2004072179 @default.
- W4387006422 hasRelatedWork W2084667297 @default.
- W4387006422 hasRelatedWork W2157044008 @default.
- W4387006422 hasRelatedWork W2347158830 @default.
- W4387006422 hasRelatedWork W2550639320 @default.
- W4387006422 hasRelatedWork W2560090078 @default.
- W4387006422 hasRelatedWork W2905824599 @default.
- W4387006422 hasRelatedWork W2949516016 @default.