Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387609097> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4387609097 endingPage "15" @default.
- W4387609097 startingPage "1" @default.
- W4387609097 abstract "Deep neural network (DNN) foundation models are currently exhibiting high prediction accuracy and strong adaptability to broad tasks with remarkably large model scales. They are increasingly becoming the backend support of DNN-driven real-time online services, e.g., Siri and Instagram. Such services require low-latency and cost-efficiency for quality-of-service and commercial competitiveness. When deployed in a cloud environment, these services call for an appropriate selection of cloud configurations (i.e., specific types of VM instances), as well as a considerate device placement plan that places the operations of the model to multiple GPUs via model parallelism for cost-efficiency. Currently, the deployment mainly relies on service providers’ manual efforts, which is not only onerous but also far from satisfactory oftentimes due to the huge joint search space of cloud configurations and device placement plans (for a same service, a poor deployment can incur significantly more costs by tens of times). In this paper, we attempt to efficiently automate the cloud deployment for real-time foundation model inference with minimum costs under the constraint of acceptably low latency. This attempt is enabled by 1) jointly leveraging the Bayesian Optimization and Deep Reinforcement Learning to adaptively unearth the (nearly) optimal cloud configuration and device placement with limited search time, and 2) enhancing the cost-efficiency of the deployment based on the probing-informed block multiplexing mechanism and Tensor Algebra SuperOptimizer. We implement a prototype system based on TensorFlow, conduct extensive experiments on top of Microsoft Azure, and demonstrate the generality and scalability of our solution. Results show that for lightweight DNN models and foundation models, our solution essentially saves inference costs by up to 15% and 47% with 57% and 38% lower search overheads respectively, compared with non-trivial baselines." @default.
- W4387609097 created "2023-10-14" @default.
- W4387609097 creator A5002867627 @default.
- W4387609097 creator A5003951925 @default.
- W4387609097 creator A5049750274 @default.
- W4387609097 creator A5077532419 @default.
- W4387609097 creator A5085281913 @default.
- W4387609097 date "2023-01-01" @default.
- W4387609097 modified "2023-10-14" @default.
- W4387609097 title "Automating Cloud Deployment for Real-Time Online Foundation Model Inference" @default.
- W4387609097 doi "https://doi.org/10.1109/tnet.2023.3321967" @default.
- W4387609097 hasPublicationYear "2023" @default.
- W4387609097 type Work @default.
- W4387609097 citedByCount "0" @default.
- W4387609097 crossrefType "journal-article" @default.
- W4387609097 hasAuthorship W4387609097A5002867627 @default.
- W4387609097 hasAuthorship W4387609097A5003951925 @default.
- W4387609097 hasAuthorship W4387609097A5049750274 @default.
- W4387609097 hasAuthorship W4387609097A5077532419 @default.
- W4387609097 hasAuthorship W4387609097A5085281913 @default.
- W4387609097 hasConcept C105339364 @default.
- W4387609097 hasConcept C111919701 @default.
- W4387609097 hasConcept C115903868 @default.
- W4387609097 hasConcept C119857082 @default.
- W4387609097 hasConcept C120314980 @default.
- W4387609097 hasConcept C154945302 @default.
- W4387609097 hasConcept C177606310 @default.
- W4387609097 hasConcept C18903297 @default.
- W4387609097 hasConcept C31258907 @default.
- W4387609097 hasConcept C41008148 @default.
- W4387609097 hasConcept C48044578 @default.
- W4387609097 hasConcept C5119721 @default.
- W4387609097 hasConcept C77088390 @default.
- W4387609097 hasConcept C79974875 @default.
- W4387609097 hasConcept C86803240 @default.
- W4387609097 hasConceptScore W4387609097C105339364 @default.
- W4387609097 hasConceptScore W4387609097C111919701 @default.
- W4387609097 hasConceptScore W4387609097C115903868 @default.
- W4387609097 hasConceptScore W4387609097C119857082 @default.
- W4387609097 hasConceptScore W4387609097C120314980 @default.
- W4387609097 hasConceptScore W4387609097C154945302 @default.
- W4387609097 hasConceptScore W4387609097C177606310 @default.
- W4387609097 hasConceptScore W4387609097C18903297 @default.
- W4387609097 hasConceptScore W4387609097C31258907 @default.
- W4387609097 hasConceptScore W4387609097C41008148 @default.
- W4387609097 hasConceptScore W4387609097C48044578 @default.
- W4387609097 hasConceptScore W4387609097C5119721 @default.
- W4387609097 hasConceptScore W4387609097C77088390 @default.
- W4387609097 hasConceptScore W4387609097C79974875 @default.
- W4387609097 hasConceptScore W4387609097C86803240 @default.
- W4387609097 hasFunder F4320321001 @default.
- W4387609097 hasFunder F4320321543 @default.
- W4387609097 hasFunder F4320333993 @default.
- W4387609097 hasFunder F4320335777 @default.
- W4387609097 hasFunder F4320336567 @default.
- W4387609097 hasLocation W43876090971 @default.
- W4387609097 hasOpenAccess W4387609097 @default.
- W4387609097 hasPrimaryLocation W43876090971 @default.
- W4387609097 hasRelatedWork W2016108640 @default.
- W4387609097 hasRelatedWork W2047454415 @default.
- W4387609097 hasRelatedWork W2070040999 @default.
- W4387609097 hasRelatedWork W2348924972 @default.
- W4387609097 hasRelatedWork W2357124094 @default.
- W4387609097 hasRelatedWork W2365736347 @default.
- W4387609097 hasRelatedWork W2387293848 @default.
- W4387609097 hasRelatedWork W2387399993 @default.
- W4387609097 hasRelatedWork W2389739210 @default.
- W4387609097 hasRelatedWork W3121791438 @default.
- W4387609097 isParatext "false" @default.
- W4387609097 isRetracted "false" @default.
- W4387609097 workType "article" @default.
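
The rows above follow a regular "- subject predicate object @graph." layout, so they can be read back into tuples mechanically. The following is a minimal sketch, assuming that layout holds for every row; it is inferred from this listing, not taken from SemOpenAlex documentation. The names `ROW`, `parse_row`, and `group_by_predicate` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# Assumed row layout (inferred from the listing, not an official format):
#   - SUBJECT PREDICATE OBJECT @GRAPH.
# OBJECT is either a double-quoted literal or a bare identifier.
ROW = re.compile(r'^- (\S+) (\S+) (".*"|\S+) @(\S+)\.$', re.DOTALL)


def parse_row(line):
    """Return (subject, predicate, object, graph, is_literal), or None if the line is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    subject, predicate, obj, graph = m.groups()
    is_literal = obj.startswith('"')
    if is_literal:
        obj = obj[1:-1]  # drop the surrounding quotes
    return subject, predicate, obj, graph, is_literal


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that are not rows."""
    grouped = defaultdict(list)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            grouped[row[1]].append(row[2])
    return dict(grouped)


sample = [
    '- W4387609097 title "Automating Cloud Deployment for Real-Time Online Foundation Model Inference" @default.',
    '- W4387609097 creator A5002867627 @default.',
    '- W4387609097 creator A5003951925 @default.',
    '- W4387609097 citedByCount "0" @default.',
]
grouped = group_by_predicate(sample)
print(grouped["creator"])
```

Grouping by predicate makes multi-valued properties such as creator, hasConcept, and hasRelatedWork available as lists, while single-valued ones such as title come back as one-element lists.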