Resource scheduling techniques in cloud from a view of coordination: a holistic survey

Yuzhao WANG; Junqing YU; Zhibin YU

doi:10.1631/FITEE.2100298


Regular Papers | Updated: 2023-01-28

- Chinese title (translated): On cloud resource scheduling techniques from a coordination perspective: a survey
- Frontiers of Information Technology & Electronic Engineering Vol. 24, Issue 1, Pages: 1-40(2023)
- Affiliations:
  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  2. Center of Heterogeneous Intelligent Computer Architecture and Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Author bio:
  E-mail: yuzhao_w@hust.edu.cn;
  ‡Corresponding authors
  zb.yu@siat.ac.cn
- DOI: 10.1631/FITEE.2100298
  CLC: TP39
- Received: 24 June 2021; Accepted: 02 November 2021; Published: January 2023
Yuzhao WANG, Junqing YU, Zhibin YU. Resource scheduling techniques in cloud from a view of coordination: a holistic survey[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(1): 1-40. DOI: 10.1631/FITEE.2100298.

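Abstract (translated from the Chinese version)

Managing resource contention in public clouds remains an open problem. The development and deployment of new application frameworks (such as deep learning and microservices) and specialized hardware (such as GPUs and TPUs) pose new challenges for the design of resource management systems. Existing solutions often sacrifice cluster efficiency to guarantee application performance, for example through resource over-allocation, which leads to low utilization. Because different modules in the software stack are involved, breaking out of this dilemma is not easy. Nevertheless, industry and academia have carried out extensive research in search of efficient performance isolation and resource scheduling. This paper gives a comprehensive overview of the related work from the perspective of coordination and reveals the technology trends within it. In brief, the paper covers four topics: resource isolation mechanisms at different levels (microarchitecture, system, and virtualization levels), including GPU multitasking; resource scheduling techniques at the machine and cluster levels, including GPU scheduling for deep learning applications; adaptive resource management techniques, including the latest microservice-related research; and, finally, future research directions. We hope this paper helps researchers gain an overview of resource management techniques in public clouds and better grasp their development trends.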

Abstract

Nowadays, the management of resource contention in shared cloud remains a pending problem. The evolution and deployment of new application paradigms (e.g., deep learning training and microservices) and custom hardware (e.g., graphics processing unit (GPU) and tensor processing unit (TPU)) have posed new challenges in resource management system design. Current solutions tend to trade cluster efficiency for guaranteed application performance, e.g., resource over-allocation, leaving a lot of resources underutilized. Overcoming this dilemma is not easy, because different components across the software stack are involved. Nevertheless, massive efforts have been devoted to seeking effective performance isolation and highly efficient resource scheduling. The goal of this paper is to systematically cover related aspects to deliver the techniques from the coordination perspective, and to identify the corresponding trends they indicate. Briefly, four topics are involved. First, isolation mechanisms deployed at different levels (micro-architecture, system, and virtualization levels) are reviewed, including GPU multitasking methods. Second, resource scheduling techniques within an individual machine and at the cluster level are investigated, respectively. Particularly, GPU scheduling for deep learning applications is described in detail. Third, adaptive resource management including the latest microservice-related research is thoroughly explored. Finally, future research directions are discussed in the light of advanced work. We hope that this review paper will help researchers establish a global view of the landscape of resource management techniques in shared cloud, and see technology trends more clearly.


references

Achermann R , Panwar A , Bhattacharjee A , et al. , 2020 . Mitosis: transparently self-replicating page-tables for large-memory machines . Proc 25 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 283 - 300 . doi: 10.1145/3373376.3378468 http://doi.org/10.1145/3373376.3378468

Akkus IE , Chen RC , Rimac I , et al. , 2018 . SAND: towards high-performance serverless computing . Proc USENIX Annual Technical Conf , p. 923 - 935 .

Alibaba , 2020 . Fuxi 2.0—The Core Dispatching System of Ali Economy Towards the Big Data and Cloud Computing Scheduling Challenge ( in Chinese ). https://developer.aliyun.com/article/760083 https://developer.aliyun.com/article/760083 [Accessed on July 1, 2021 ].

Ananthanarayanan G , Douglas C , Ramakrishnan R , et al. , 2012 . True elasticity in multi-tenant data-intensive compute clusters . Proc 3 rd ACM Symp on Cloud Computing , p. 1 - 7 .

Asmussen N , Völp M , Nöthen B , et al. , 2016 . M3: a hardware/operating-system co-design to tame heterogeneous manycores . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 189 - 203 . doi: 10.1145/2872362.2872371 http://doi.org/10.1145/2872362.2872371

Ausavarungnirun R , Miller V , Landgraf J , et al. , 2018 . MASK: redesigning the GPU memory hierarchy to support multi-application concurrency . Proc 23 rd Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 503 - 518 . doi: 10.1145/3173162.3173169 http://doi.org/10.1145/3173162.3173169

Bao YX , Peng YH , Wu C , 2019 . Deep learning-based job placement in distributed machine learning clusters . Proc IEEE Conf on Computer Communications , p. 505 - 513 . doi: 10.1109/INFOCOM.2019.8737460 http://doi.org/10.1109/INFOCOM.2019.8737460

Bauman E , Ayoade G , Lin ZQ , 2015 . A survey on hypervisor-based monitoring: approaches, applications, and evolutions . ACM Comput Surv , 48 ( 1 ): 10 . doi: 10.1145/2775111 http://doi.org/10.1145/2775111

Baumann A , Barham P , Dagand PE , et al. , 2009 . The multi-kernel: a new OS architecture for scalable multicore systems . Proc ACM SIGOPS 22 nd Symp on Operating Systems Principles , p. 29 - 44 . doi: 10.1145/1629575.1629579 http://doi.org/10.1145/1629575.1629579

Berger DS , Berg B , Zhu T , et al. , 2018 . RobinHood: tail latency-aware caching—dynamically reallocating from cache-rich to cache-poor . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 195 - 212 .

Bhadauria M , McKee SA , 2010 . An approach to resource-aware co-scheduling for CMPs . Proc 24 th ACM Int Conf on Supercomputing , p. 189 - 199 . doi: 10.1145/1810085.1810113 http://doi.org/10.1145/1810085.1810113

Bitirgen R , Ipek E , Martinez JF , 2008 . Coordinated management of multiple interacting resources in chip multi-processors: a machine learning approach . Proc 41 st IEEE/ACM Int Symp on Microarchitecture , p. 318 - 329 . doi: 10.1109/MICRO.2008.4771801 http://doi.org/10.1109/MICRO.2008.4771801

Blagodurov S , Zhuravlev S , Fedorova A , et al. , 2010 . A case for NUMA-aware contention management on multicore systems . Proc 19 th Int Conf on Parallel Architectures and Compilation Techniques , p. 557 - 558 . doi: 10.1145/1854273.1854350 http://doi.org/10.1145/1854273.1854350

Boucher S , Kalia A , Andersen DG , et al. , 2018 . Putting the “micro” back in microservice . Proc USENIX Annual Technical Conf , p. 645 - 650 .

Boutin E , Ekanayake J , Lin W , et al. , 2014 . Apollo: scalable and coordinated scheduling for cloud-scale computing . Proc 11 th USENIX Symp on Operating Systems Design and Implementation , p. 285 - 300 .

Cadden J , Unger T , Awad Y , et al. , 2020 . SEUSS: skip redundant paths to make serverless fast . Proc 15 th European Conf on Computer Systems , p. 1 - 15 . doi: 10.1145/3342195.3392698 http://doi.org/10.1145/3342195.3392698

Carastan-Santos D , de Camargo RY , 2017 . Obtaining dynamic scheduling policies with simulation and machine learning . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , p. 1 - 13 . doi: 10.1145/3126908.3126955 http://doi.org/10.1145/3126908.3126955

Carvalho M , Cirne W , Brasileiro F , et al. , 2014 . Long-term SLOs for reclaimed cloud computing resources . Proc ACM Symp on Cloud Computing , p. 1 - 13 . doi: 10.1145/2670979.2670999 http://doi.org/10.1145/2670979.2670999

Castelló A , Peña AJ , Mayo R , 2018 . Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models . J Supercomput , 74 ( 11 ): 5628 - 5642 . doi: 10.1007/s11227-016-1791-y http://doi.org/10.1007/s11227-016-1791-y

Chandra D , Guo F , Kim S , et al. , 2005 . Predicting inter-thread cache contention on a chip multi-processor architecture . Proc 11 th Int Symp on High-Performance Computer Architecture , p. 340 - 351 . doi: 10.1109/HPCA.2005.27 http://doi.org/10.1109/HPCA.2005.27

Chaudhary S , Ramjee R , Sivathanu M , et al. , 2020 . Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning . Proc 15 th European Conf on Computer Systems , p. 1 - 16 . doi: 10.1145/3342195.3387555 http://doi.org/10.1145/3342195.3387555

Chen L , Lingys J , Chen K , et al. , 2018 . AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization . Proc Conf of the ACM Special Interest Group on Data Communication , p. 191 - 205 . doi: 10.1145/3230543.3230551 http://doi.org/10.1145/3230543.3230551

Chen Q , Yang HL , Mars J , et al. , 2016 . Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 681 - 696 . doi: 10.1145/2872362.2872368 http://doi.org/10.1145/2872362.2872368

Chen Q , Yang HL , Guo MY , et al. , 2017 . Prophet: precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers . Proc 22 nd Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 17 - 32 . doi: 10.1145/3037697.3037700 http://doi.org/10.1145/3037697.3037700

Chen Q , Wang ZN , Leng JW , et al. , 2019 . Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters . Proc ACM Int Conf on Supercomputing , p. 272 - 283 . doi: 10.1145/3330345.3330370 http://doi.org/10.1145/3330345.3330370

Chen W , Rao J , Zhou XB , 2017 . Preemptive, low latency datacenter scheduling via lightweight virtualization . Proc USENIX Annual Technical Conf , p. 251 - 263 .

Cherkasova L , Gupta D , Vahdat A , 2007 . Comparison of the three CPU schedulers in Xen . ACM SIGMETRICS Perform Eval Rev , 35 ( 2 ): 42 - 51 . doi: 10.1145/1330555.1330556 http://doi.org/10.1145/1330555.1330556

Cho S , Jin L , 2006 . Managing distributed, shared L2 caches through OS-level page allocation . Proc 39 th Annual IEEE/ACM Int Symp on Microarchitecture , p. 455 - 468 . doi: 10.1109/MICRO.2006.31 http://doi.org/10.1109/MICRO.2006.31

Curino C , Difallah DE , Douglas C , et al. , 2014 . Reservation-based scheduling: if you’re late don’t blame us! Proc ACM Symp on Cloud Computing , p. 1 - 14 . doi: 10.1145/2670979.2670981 http://doi.org/10.1145/2670979.2670981

Dai GH , Huang TH , Chi YZ , et al. , 2017 . ForeGraph: exploring large-scale graph processing on multi-FPGA architecture . Proc ACM/SIGDA Int Symp on Field-Programmable Gate Arrays , p. 217 - 226 . doi: 10.1145/3020078.3021739 http://doi.org/10.1145/3020078.3021739

Dean J , Barroso LA , 2013 . The tail at scale . Commun ACM , 56 ( 2 ): 74 - 80 . doi: 10.1145/2408776.2408794 http://doi.org/10.1145/2408776.2408794

Delgado P , Dinu F , Kermarrec AM , et al. , 2015 . Hawk: hybrid datacenter scheduling . Proc USENIX Annual Technical Conf , p. 499 - 510 .

Delgado P , Didona D , Dinu F , et al. , 2016 . Job-aware scheduling in Eagle: divide and stick to your probes . Proc 7 th ACM Symp on Cloud Computing , p. 497 - 509 . doi: 10.1145/2987550.2987563 http://doi.org/10.1145/2987550.2987563

Delimitrou C , Kozyrakis C , 2014 . Quasar: resource-efficient and QoS-aware cluster management . Proc 19 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 127 - 144 . doi: 10.1145/2541940.2541941 http://doi.org/10.1145/2541940.2541941

Delimitrou C , Kozyrakis C , 2016 . HCloud: resource-efficient provisioning in shared cloud systems . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 473 - 488 . doi: 10.1145/2872362.2872365 http://doi.org/10.1145/2872362.2872365

Delimitrou C , Sanchez D , Kozyrakis C , 2015 . Tarcil: reconciling scheduling speed and quality in large shared clusters . Proc 6 th ACM Symp on Cloud Computing , p. 97 - 110 . doi: 10.1145/2806777.2806779 http://doi.org/10.1145/2806777.2806779

Dhakal A , Kulkarni SG , Ramakrishnan KK , 2020 . GSLICE: controlled spatial sharing of GPUs for a scalable inference platform . Proc 11 th ACM Symp on Cloud Computing , p. 492 - 506 . doi: 10.1145/3419111.3421284 http://doi.org/10.1145/3419111.3421284

Ebrahimi E , Lee CJ , Mutlu O , et al. , 2010 . Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems . Proc 15 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 335 - 346 . doi: 10.1145/1736020.1736058 http://doi.org/10.1145/1736020.1736058

Engler DR , Kaashoek MF , O’Toole J , 1995 . Exokernel: an operating system architecture for application-level resource management . Proc 15 th ACM Symp on Operating Systems Principles , p. 251 - 266 . doi: 10.1145/224056.224076 http://doi.org/10.1145/224056.224076

Eyerman S , Eeckhout L , 2010 . Probabilistic job symbiosis modeling for SMT processor scheduling . Proc 15 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 91 - 102 . doi: 10.1145/1736020.1736033 http://doi.org/10.1145/1736020.1736033

Facebook , 2015 . Facebook Disaggregated Rack . http://goo.gl/6h2Ut http://goo.gl/6h2Ut [Accessed on July 1, 2021 ].

Feliu J , Sahuquillo J , Petit S , et al. , 2013 . L1-bandwidth aware thread allocation in multicore SMT processors . Proc 22 nd Int Conf on Parallel Architectures and Compilation Techniques , p. 123 - 132 . doi: 10.1109/PACT.2013.6618810 http://doi.org/10.1109/PACT.2013.6618810

Feliu J , Eyerman S , Sahuquillo J , et al. , 2016 . Symbiotic job scheduling on the IBM POWER8 . Proc IEEE Int Symp on High Performance Computer Architecture , p. 669 - 680 . doi: 10.1109/HPCA.2016.7446103 http://doi.org/10.1109/HPCA.2016.7446103

Firestone D , Putnam A , Mundkur S , et al. , 2018 . Azure accelerated networking: smartnics in the public cloud . Proc 15 th USENIX Symp on Networked Systems Design and Implementation , p. 51 - 66 .

Fowers J , Ovtcharov K , Papamichael M , et al. , 2018 . A configurable cloud-scale DNN processor for real-time AI . Proc ACM/IEEE 45 th Annual Int Symp on Computer Architecture , p. 1 - 14 . doi: 10.1109/ISCA.2018.00012 http://doi.org/10.1109/ISCA.2018.00012

Gan Y , Zhang YQ , Cheng DL , et al. , 2019a . An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems . Proc 24 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 3 - 18 . doi: 10.1145/3297858.3304013 http://doi.org/10.1145/3297858.3304013

Gan Y , Zhang YQ , Hu K , et al. , 2019b . Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices . Proc 24 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 19 - 33 . doi: 10.1145/3297858.3304004 http://doi.org/10.1145/3297858.3304004

Gan Y , Liang MY , Dev S , et al. , 2021 . Sage: practical and scalable ML-driven performance debugging in microservices . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 135 - 151 . doi: 10.1145/3445814.3446700 http://doi.org/10.1145/3445814.3446700

Giceva J , 2016 . Database/Operating System Co-design . PhD Thesis, ETH Zurich , Switzerland .

Goder A , Spiridonov A , Wang Y , 2015 . Bistro: scheduling data-parallel jobs against live production systems . Proc USENIX Annual Technical Conf , p. 459 - 471 .

Gog I , Schwarzkopf M , Gleave A , et al. , 2016 . Firmament: fast, centralized cluster scheduling at scale . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 99 - 115 .

Goglin B , Furmento N , 2009 . Enabling high-performance memory migration for multithreaded applications on LINUX . Proc IEEE Int Symp on Parallel & Distributed Processing , p. 1 - 9 . doi: 10.1109/IPDPS.2009.5161101 http://doi.org/10.1109/IPDPS.2009.5161101

Grandl R , Ananthanarayanan G , Kandula S , et al. , 2014 . Multi-resource packing for cluster schedulers . Proc ACM Conf on SIGCOMM , p. 455 - 466 . doi: 10.1145/2619239.2626334 http://doi.org/10.1145/2619239.2626334

Grandl R , Chowdhury M , Akella A , et al. , 2016a . Altruistic scheduling in multi-resource clusters . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 65 - 80 .

Grandl R , Kandula S , Rao S , et al. , 2016b . Graphene: packing and dependency-aware scheduling for data-parallel clusters . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 81 - 97 .

Grulich PM , Nawab F , 2018 . Collaborative edge and cloud neural networks for real-time video processing . Proc VLDB Endow , 11 ( 12 ): 2046 - 2049 . doi: 10.14778/3229863.3236256 http://doi.org/10.14778/3229863.3236256

Gu JC , Chowdhury M , Shin KG , et al. , 2019 . Tiresias: a GPU cluster manager for distributed deep learning . Proc 16 th USENIX Symp on Networked Systems Design and Implementation , p. 485 - 500 .

Guo F , Li YK , Lui JCS , et al. , 2019 . DCUDA: dynamic GPU scheduling with live migration support . Proc ACM Symp on Cloud Computing , p. 114 - 125 . doi: 10.1145/3357223.3362714 http://doi.org/10.1145/3357223.3362714

Gysi T , Bär J , Hoefler T , 2016 . dCUDA: hardware supported overlap of computation and communication . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , p. 609 - 620 . doi: 10.1109/SC.2016.51 http://doi.org/10.1109/SC.2016.51

Han J , Jeon S , Choi YR , et al. , 2016 . Interference management for distributed parallel applications in consolidated clusters . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 443 - 456 . doi: 10.1145/2872362.2872388 http://doi.org/10.1145/2872362.2872388

Haque E , Eom YH , He YX , et al. , 2015 . Few-to-Many: incremental parallelism for reducing tail latency in interactive services . Proc 20 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 161 - 175 . doi: 10.1145/2694344.2694384 http://doi.org/10.1145/2694344.2694384

Herbst NR , Kounev S , Reussner R , 2013 . Elasticity in cloud computing: what it is, and what it is not . Proc 10 th Int Conf on Autonomic Computing , p. 23 - 27 .

Hindman B , Konwinski A , Zaharia M , et al. , 2011 . Mesos: a platform for fine-grained resource sharing in the data center . Proc 8 th USENIX Conf on Networked Systems Design and Implementation , p. 295 - 308 .

Hong CH , Spence I , Nikolopoulos DS , 2017 . GPU virtualization and scheduling methods: a comprehensive survey . ACM Comput Surv , 50 ( 3 ): 35 . doi: 10.1145/3068281 http://doi.org/10.1145/3068281

Hou XF , Li C , Liu JC , et al. , 2020 . ANT-Man: towards agile power management in the microservice era . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , Article 78 .

Hsu CH , Zhang YQ , Laurenzano MA , et al. , 2015 . Adrenaline: pinpointing and reining in tail queries with quick voltage boosting . Proc IEEE 21 st Int Symp on High Performance Computer Architecture , p. 271 - 282 . doi: 10.1109/HPCA.2015.7056039 http://doi.org/10.1109/HPCA.2015.7056039

Hu ZM , Tu J , Li BC , 2019 . Spear: optimized dependency-aware task scheduling with deep reinforcement learning . Proc IEEE 39 th Int Conf on Distributed Computing Systems , p. 2037 - 2046 . doi: 10.1109/ICDCS.2019.00201 http://doi.org/10.1109/ICDCS.2019.00201

Ibanez S , Shahbaz M , McKeown N , 2019 . The case for a network fast path to the CPU . Proc 18 th ACM Workshop on Hot Topics in Networks , p. 52 - 59 . doi: 10.1145/3365609.3365851 http://doi.org/10.1145/3365609.3365851

Intel , 2016 . Intel Cache Allocation Technique . https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology [Accessed on July 1, 2021 ].

Isard M , Prabhakaran V , Currey J , et al. , 2009 . Quincy: fair scheduling for distributed computing clusters . Proc ACM SIGOPS 22 nd Symp on Operating Systems Principles , p. 261 - 276 . doi: 10.1145/1629575.1629601 http://doi.org/10.1145/1629575.1629601

Islam S , Venugopal S , Liu AN , 2015 . Evaluating the impact of fine-scale burstiness on cloud elasticity . Proc 6 th ACM Symp on Cloud Computing , p. 250 - 261 . doi: 10.1145/2806777.2806846 http://doi.org/10.1145/2806777.2806846

Jeon M , He YX , Kim H , et al. , 2016 . TPC: target-driven parallelism combining prediction and correction to reduce tail latency in interactive services . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 129 - 141 . doi: 10.1145/2872362.2872370 http://doi.org/10.1145/2872362.2872370

Jeon M , Venkataraman S , Phanishayee A , et al. , 2018 . Multi-Tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications . Technical Report No. MSR-TR-2018-13 , Microsoft Research , USA .

Jeon M , Venkataraman S , Phanishayee A , et al. , 2019 . Analysis of large-scale multi-tenant GPU clusters for DNN training workloads . Proc USENIX Annual Technical Conf , p. 947 - 960 .

Jeong EY , Woo S , Jamshed M , et al. , 2014 . mTCP: a highly scalable user-level TCP stack for multicore systems . Proc 11 th USENIX Conf on Networked Systems Design and Implementation , p. 489 - 502 .

Jeyakumar V , Alizadeh M , Mazières D , et al. , 2013 . EyeQ: practical network performance isolation at the edge . Proc 10 th USENIX Symp on Networked Systems Design and Implementation , p. 297 - 311 .

Jia ZP , Witchel E , 2021 . Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 152 - 166 . doi: 10.1145/3445814.3446701 http://doi.org/10.1145/3445814.3446701

Jyothi SA , Curino C , Menache I , et al. , 2016 . Morpheus: towards automated SLOs for enterprise clusters . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 117 - 134 .

Kakivaya G , Xun L , Hasha R , et al. , 2018 . Service fabric: a distributed platform for building microservices in the cloud . Proc 13 th EuroSys Conf , Article 33 . doi: 10.1145/3190508.3190546 http://doi.org/10.1145/3190508.3190546

Kalia A , Kaminsky M , Andersen DG , 2016 . FaSST: fast, scalable and simple distributed transactions with two-sided (RDMA) datagram RPCs . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 185 - 201 .

Kalia A , Kaminsky M , Andersen D , 2019 . Datacenter RPCs can be general and fast . Proc 16 th USENIX Symp on Networked Systems Design and Implementation , p. 1 - 16 .

Kang YP , Hauswald J , Gao C , et al. , 2017 . Neurosurgeon: collaborative intelligence between the cloud and mobile edge . Proc 22 nd Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 615 - 629 . doi: 10.1145/3037697.3037698 http://doi.org/10.1145/3037697.3037698

Kannan RS , Subramanian L , Raju A , et al. , 2019 . GrandSLAm: guaranteeing SLAs for jobs in microservices execution frameworks . Proc 14 th EuroSys Conf , Article 34 . doi: 10.1145/3302424.3303958 http://doi.org/10.1145/3302424.3303958

Kannan S , Gavrilovska A , Gupta V , et al. , 2017 . HeteroOS: OS design for heterogeneous memory management in datacenter . Proc 44 th Annual Int Symp on Computer Architecture , p. 521 - 534 . doi: 10.1145/3079856.3080245 http://doi.org/10.1145/3079856.3080245

Kannan S , Ren YJ , Bhattacharjee A , 2021 . KLOCs: kernel-level object contexts for heterogeneous memory systems . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 65 - 78 . doi: 10.1145/3445814.3446745 http://doi.org/10.1145/3445814.3446745

Kapoor R , Porter G , Tewari M , et al. , 2012 . Chronos: predictable low latency for data center applications . Proc 3 rd ACM Symp on Cloud Computing , Article 9 . doi: 10.1145/2391229.2391238 http://doi.org/10.1145/2391229.2391238

Karanasos K , Rao S , Curino C , et al. , 2015 . Mercury: hybrid centralized and distributed scheduling in large shared clusters . Proc USENIX Annual Technical Conf , p. 485 - 497 .

Kasture H , Sanchez D , 2014 . Ubik: efficient cache sharing with strict QoS for latency-critical workloads . Proc 19 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 729 - 742 . doi: 10.1145/2541940.2541944 http://doi.org/10.1145/2541940.2541944

Khawaja A , Landgraf J , Prakash R , et al. , 2018 . Sharing, protection, and compatibility for reconfigurable fabric with AMORPHOS . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 107 - 127 .

Khorasani F , Esfeden HA , Farmahini-Farahani A , et al. , 2018 . RegMutex: inter-warp GPU register time-sharing . Proc ACM/IEEE 45 th Annual Int Symp on Computer Architecture , p. 816 - 828 . doi: 10.1109/ISCA.2018.00073 http://doi.org/10.1109/ISCA.2018.00073

Klimovic A , Kozyrakis C , Thereska E , et al. , 2016 . Flash storage disaggregation . Proc 11 th European Conf on Computer Systems , Article 29 . doi: 10.1145/2901318.2901337 http://doi.org/10.1145/2901318.2901337

Knauerhase R , Brett P , Hohlt B , et al. , 2008 . Using OS observations to improve performance in multicore systems . IEEE Micro , 28 ( 3 ): 54 - 66 . doi: 10.1109/MM.2008.48 http://doi.org/10.1109/MM.2008.48

Korolija D , Roscoe T , Alonso G , 2020 . Do OS abstractions make sense on FPGAs? Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 991 - 1010 .

Kotra JB , Zhang HB , Alameldeen AR , et al. , 2018 . CHAMELEON: a dynamically reconfigurable heterogeneous memory system . Proc 51 st Annual IEEE/ACM Int Symp on Microarchitecture , p. 533 - 545 . doi: 10.1109/MICRO.2018.00050 http://doi.org/10.1109/MICRO.2018.00050

Lazarev N , Xiang SJ , Adit N , et al. , 2021 . Dagger: efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 36 - 51 . doi: 10.1145/3445814.3446696 http://doi.org/10.1145/3445814.3446696

Le TN , Sun X , Chowdhury M , et al. , 2020 . AlloX: compute allocation in hybrid clusters . Proc 15 th European Conf on Computer Systems , Article 31 . doi: 10.1145/3342195.3387547 http://doi.org/10.1145/3342195.3387547

Le YF , Chang H , Mukherjee S , et al. , 2017 . UNO: uniflying host and smart NIC offload for flexible packet processing . Proc Symp on Cloud Computing , p. 506 - 519 . doi: 10.1145/3127479.3132252 http://doi.org/10.1145/3127479.3132252

Li CL , Andersen DG , Fu Q , et al. , 2017 . Workload analysis and caching strategies for search advertising systems . Proc Symp on Cloud Computing , p. 170 - 180 . doi: 10.1145/3127479.3129255 http://doi.org/10.1145/3127479.3129255

Li J , Agrawal K , Elnikety S , et al. , 2016 . Work stealing for interactive services to meet target latency . Proc 21 st ACM SIGPLAN Symp on Principles and Practice of Parallel Programming , Article 14 . doi: 10.1145/2851141.2851151 http://doi.org/10.1145/2851141.2851151

Li JL , Sharma NK , Ports DRK , et al. , 2014 . Tales of the tail: hardware, OS, and application-level sources of tail latency . Proc ACM Symp on Cloud Computing , p. 1 - 14 . doi: 10.1145/2670979.2670988 http://doi.org/10.1145/2670979.2670988

Lim K , Chang JC , Mudge T , et al. , 2009 . Disaggregated memory for expansion and sharing in blade servers . Proc 36 th Annual Int Symp on Computer Architecture , p. 267 - 278 . doi: 10.1145/1555754.1555789 http://doi.org/10.1145/1555754.1555789

Linux Community , 2016 . Linux Kernel Namespace . https://en.wikipedia.org/wiki/Linux_namespaces https://en.wikipedia.org/wiki/Linux_namespaces [Accessed on Feb. 23, 2021 ].

Liu M , Peter S , Krishnamurthy A , et al. , 2019 . E3: energy-efficient microservices on SmartNIC-accelerated servers . Proc USENIX Annual Technical Conf , p. 363 - 378 .

Lo D , Cheng LQ , Govindaraju R , et al. , 2015 . Heracles: improving resource efficiency at scale . Proc 42 nd Annual Int Symp on Computer Architecture , p. 450 - 462 . doi: 10.1145/2749469.2749475 http://doi.org/10.1145/2749469.2749475

Luo QY , Lin JK , Zhuo YW , et al. , 2019 . Hop: heterogeneity-aware decentralized training . Proc 24 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 893 - 907 . doi: 10.1145/3297858.3304009 http://doi.org/10.1145/3297858.3304009

Ma JC , Zuo GF , Loughlin K , et al. , 2020 . A hypervisor for shared-memory FPGA platforms . Proc 25 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 827 - 844 . doi: 10.1145/3373376.3378482 http://doi.org/10.1145/3373376.3378482

Madhavapeddy A , Scott DJ , 2014 . Unikernels: the rise of the virtual library operating system . Commun ACM , 57 ( 1 ): 61 - 69 . doi: 10.1145/2541883.2541895 http://doi.org/10.1145/2541883.2541895

Mahajan K , Balasubramanian A , Singhvi A , et al. , 2020 . Themis: fair and efficient GPU cluster scheduling . Proc 17 th USENIX Symp on Networked Systems Design and Implementation , p. 289 - 304 .

Manco F , Lupu C , Schmidt F , et al. , 2017 . My VM is lighter (and safer) than your container . Proc 26 th Symp on Operating Systems Principles , p. 218 - 233 . doi: 10.1145/3132747.3132763 http://doi.org/10.1145/3132747.3132763

Mao HZ , Alizadeh M , Menache I , et al. , 2016 . Resource management with deep reinforcement learning . Proc 15 th ACM Workshop on Hot Topics in Networks , p. 50 - 56 . doi: 10.1145/3005745.3005750 http://doi.org/10.1145/3005745.3005750

Mao HZ , Schwarzkopf M , Venkatakrishnan SB , et al. , 2019 . Learning scheduling algorithms for data processing clusters . Proc Special Interest Group on Data Communication , p. 270 - 288 . doi: 10.1145/3341302.3342080 http://doi.org/10.1145/3341302.3342080

Mars J , Tang LJ , 2013 . Whare-Map: heterogeneity in “homogeneous” warehouse-scale computers . Proc 40 th Annual Int Symp on Computer Architecture , p. 619 - 630 . doi: 10.1145/2485922.2485975 http://doi.org/10.1145/2485922.2485975

Min C , Kang W , Kumar M , et al. , 2018 . Solros: a data-centric operating system architecture for heterogeneous computing . Proc 13 th EuroSys Conf , Article 36 . doi: 10.1145/3190508.3190523 http://doi.org/10.1145/3190508.3190523

Moon Y , Lee S , Jamshed MA , et al. , 2020 . AccelTCP: accelerating network applications with stateful TCP offloading . Proc 17 th USENIX Symp on Networked Systems Design and Implementation , p. 77 - 92 .

Moritz P , Nishihara R , Wang S , et al. , 2018 . Ray: a distributed framework for emerging AI applications . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 561 - 577 . doi: 10.48550/arXiv.1712.05889 http://doi.org/10.48550/arXiv.1712.05889

Multicluster Special Interest Group , 2020 . Kubernetes Multicluster . https://github.com/kubernetes/community/tree/master/sigmulticluster https://github.com/kubernetes/community/tree/master/sigmulticluster [Accessed on July 1, 2021 ].

Mutlu O , Moscibroda T , 2008 . Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems . Proc Int Symp on Computer Architecture , p. 63 - 74 . doi: 10.1109/ISCA.2008.7 http://doi.org/10.1109/ISCA.2008.7

Nagaraj K , Bharadia D , Mao HZ , et al. , 2016 . NUMFabric: fast and flexible bandwidth allocation in datacenters . Proc ACM SIGCOMM Conf , p. 188 - 201 . doi: 10.1145/2934872.2934890 http://doi.org/10.1145/2934872.2934890

Narayanan D , Santhanam K , Kazhamiaka F , et al. , 2020 . Heterogeneity-aware cluster scheduling policies for deep learning workloads . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 481 - 498 .

Nightingale EB , Hodson O , McIlroy R , et al. , 2009 . Helios: heterogeneous multiprocessing with satellite kernels . Proc ACM SIGOPS 22 nd Symp on Operating Systems Principles , p. 221 - 234 . doi: 10.1145/1629575.1629597 http://doi.org/10.1145/1629575.1629597

Novaković D , Vasić N , Novaković S , et al. , 2013 . DeepDive: transparently identifying and managing performance interference in virtualized environments . Proc USENIX Annual Technical Conf , p. 219 - 230 .

Novaković S , Daglis A , Bugnion E , et al. , 2014 . Scale-out NUMA . Proc 19 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 3 - 18 . doi: 10.1145/2541940.2541965 http://doi.org/10.1145/2541940.2541965

Ousterhout K , Wendell P , Zaharia M , et al. , 2013 . Sparrow: distributed, low latency scheduling . Proc 24 th ACM Symp on Operating Systems Principles , p. 69 - 84 . doi: 10.1145/2517349.2522716 http://doi.org/10.1145/2517349.2522716

Ousterhout K , Canel C , Ratnasamy S , et al. , 2017 . Monotasks: architecting for performance clarity in data analytics frameworks . Proc 26 th Symp on Operating Systems Principles , p. 184 - 200 . doi: 10.1145/3132747.3132766 http://doi.org/10.1145/3132747.3132766

Panda A , Zheng WT , Hu XH , et al. , 2017 . SCL: simplifying distributed SDN control planes . Proc 14 th USENIX Symp on Networked Systems Design and Implementation , p. 329 - 345 .

Park JJK , Park Y , Mahlke S , 2015 . Chimera: collaborative preemption for multitasking on a shared GPU . Proc 20 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 593 - 606 . doi: 10.1145/2694344.2694346 http://doi.org/10.1145/2694344.2694346

Peng X , Shi XH , Dai HL , et al. , 2020 . Capuchin: tensor-based GPU memory management for deep learning . Proc 25 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 891 - 905 . doi: 10.1145/3373376.3378505 http://doi.org/10.1145/3373376.3378505

Peng YH , Bao YX , Chen YR , et al. , 2018 . Optimus: an efficient dynamic resource scheduler for deep learning clusters . Proc 20 th EuroSys Conf , Article 3 . doi: 10.1145/3190508.3190517 http://doi.org/10.1145/3190508.3190517

Popov M , Jimborean A , Black-Schaffer D , 2019 . Efficient thread/page/parallelism autotuning for NUMA systems . Proc ACM Int Conf on Supercomputing , p. 342 - 353 . doi: 10.1145/3330345.3330376 http://doi.org/10.1145/3330345.3330376

Pothukuchi RP , Greathouse JL , Rao K , et al. , 2019 . Tangram: integrated control of heterogeneous computers . Proc 52 nd Annual IEEE/ACM Int Symp on Microarchitecture , p. 384 - 398 . doi: 10.1145/3352460.3358285 http://doi.org/10.1145/3352460.3358285

Pratheek B , Jawalkar N , Basu A , 2021 . Improving GPU multi-tenancy with page walk stealing . Proc IEEE Int Symp on High-Performance Computer Architecture , p. 626 - 639 . doi: 10.1109/HPCA51647.2021.00059 http://doi.org/10.1109/HPCA51647.2021.00059

Qiu HR , Banerjee SS , Jha S , et al. , 2020 . FIRM: an intelligent fine-grained resource management framework for SLO-oriented microservices . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 805 - 825 .

Qureshi MK , Patt YN , 2006 . Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches . Proc 39 th Annual IEEE/ACM Int Symp on Microarchitecture , p. 423 - 432 . doi: 10.1109/MICRO.2006.49 http://doi.org/10.1109/MICRO.2006.49

Rao J , Wang K , Zhou XB , et al. , 2013 . Optimizing virtual machine scheduling in NUMA multicore systems . Proc IEEE 19 th Int Symp on High Performance Computer Architecture , p. 306 - 317 . doi: 10.1109/HPCA.2013.6522328 http://doi.org/10.1109/HPCA.2013.6522328

Reiss C , Tumanov A , Ganger GR , et al. , 2012 . Heterogeneity and dynamicity of clouds at scale: Google trace analysis . Proc 3 rd ACM Symp on Cloud Computing , Article 7 . doi: 10.1145/2391229.2391236 http://doi.org/10.1145/2391229.2391236

Rhu M , Gimelshein N , Clemons J , et al. , 2016 . vDNN: virtualized deep neural networks for scalable, memory-efficient neural network design . Proc 49 th Annual IEEE/ACM Int Symp on Microarchitecture , p. 1 - 13 . doi: 10.1109/MICRO.2016.7783721 http://doi.org/10.1109/MICRO.2016.7783721

Rossbach CJ , Currey J , Silberstein M , et al. , 2011 . PTask: operating system abstractions to manage GPUs as compute devices . Proc 23 rd ACM Symp on Operating Systems Principles , p. 233 - 248 . doi: 10.1145/2043556.2043579 http://doi.org/10.1145/2043556.2043579

Sanchez D , Kozyrakis C , 2011 . Vantage: scalable and efficient fine-grain cache partitioning . Proc 38 th Annual Int Symp on Computer Architecture , p. 57 - 68 . doi: 10.1145/2000064.2000073 http://doi.org/10.1145/2000064.2000073

Schwarzkopf M , Konwinski A , Abd-El-Malek M , et al. , 2013 . Omega: flexible, scalable schedulers for large compute clusters . Proc 8 th ACM European Conf on Computer Systems , p. 351 - 364 . doi: 10.1145/2465351.2465386 http://doi.org/10.1145/2465351.2465386

Sengupta D , Belapure R , Schwan K , 2013 . Multi-tenancy on GPGPU-based servers . Proc 7 th Int Workshop on Virtualization Technologies in Distributed Computing , p. 3 - 10 . doi: 10.1145/2465829.2465830 http://doi.org/10.1145/2465829.2465830

Sengupta D , Goswami A , Schwan K , et al. , 2014 . Scheduling multi-tenant cloud workloads on accelerator-based systems . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , p. 513 - 524 . doi: 10.1109/SC.2014.47 http://doi.org/10.1109/SC.2014.47

Shan YZ , Huang YT , Chen YL , et al. , 2018 . LegoOS: a disseminated, distributed OS for hardware resource disaggregation . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 69 - 87 .

Sharma NK , Zhao CXY , Liu M , et al. , 2020 . Programmable calendar queues for high-speed packet scheduling . Proc 17 th USENIX Symp on Networked Systems Design and Implementation , p. 685 - 699 .

Sharma P , Guo T , He X , et al. , 2016 . Flint: batch-interactive data-intensive processing on transient servers . Proc 11 th European Conf on Computer Systems , Article 6 . doi: 10.1145/2901318.2901319 http://doi.org/10.1145/2901318.2901319

Shen ZM , Sun Z , Sela GE , et al. , 2019 . X-Containers: breaking down barriers to improve performance and isolation of cloud-native containers . Proc 24 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 121 - 135 . doi: 10.1145/3297858.3304016 http://doi.org/10.1145/3297858.3304016

Shillaker S , Pietzuch P , 2020 . Faasm: lightweight isolation for efficient stateful serverless computing . Proc USENIX Annual Technical Conf , p. 419 - 433 . doi: 10.48550/arXiv.2002.09344 http://doi.org/10.48550/arXiv.2002.09344

Sigelman BH , Barroso LA , Burrows M , et al. , 2010 . Dapper, a Large-Scale Distributed Systems Tracing Infrastructure . https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36356.pdf [Accessed on July 1, 2021 ].

Singh S , Chana I , 2016 . A survey on resource scheduling in cloud computing: issues and challenges . J Grid Comput , 14 ( 2 ): 217 - 264 . doi: 10.1007/s10723-015-9359-2 http://doi.org/10.1007/s10723-015-9359-2

Snavely A , Tullsen DM , 2000 . Symbiotic jobscheduling for a simultaneous multithreaded processor . ACM SIGOPS Oper Syst Rev , 34 ( 5 ): 234 - 244 . doi: 10.1145/378993.379244 http://doi.org/10.1145/378993.379244

Song X , Shi JC , Chen HB , et al. , 2013 . Schedule processes, not VCPUs . Proc 4 th Asia-Pacific Workshop on Systems , p. 1 - 7 . doi: 10.1145/2500727.2500736 http://doi.org/10.1145/2500727.2500736

Sriraman A , Dhanotia A , 2020 . Accelerometer: understanding acceleration opportunities for data center overheads at hyperscale . Proc 25 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 733 - 750 . doi: 10.1145/3373376.3378450 http://doi.org/10.1145/3373376.3378450

Sriraman A , Wenisch TF , 2018 . μTune: auto-tuned threading for OLDI microservices . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 177 - 194 .

Sriraman A , Dhanotia A , Wenisch TF , 2019 . SoftSKU: optimizing server architectures for microservice diversity @scale . Proc 46 th Int Symp on Computer Architecture , p. 513 - 526 . doi: 10.1145/3307650.3322227 http://doi.org/10.1145/3307650.3322227

Staples G , 2006 . TORQUE resource manager . Proc ACM/IEEE Conf on Supercomputing . doi: 10.1145/1188455.1188464 http://doi.org/10.1145/1188455.1188464

Subramanian L , Seshadri V , Ghosh A , et al. , 2015 . The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory . Proc 48 th Int Symp on Microarchitecture , p. 62 - 75 . doi: 10.1145/2830772.2830803 http://doi.org/10.1145/2830772.2830803

Tanasic I , Gelado I , Cabezas J , et al. , 2014 . Enabling preemptive multiprogramming on GPUs . Proc ACM/IEEE 41 st Int Symp on Computer Architecture , p. 193 - 204 . doi: 10.1109/ISCA.2014.6853208 http://doi.org/10.1109/ISCA.2014.6853208

Tang CQ , Yu K , Veeraraghavan K , et al. , 2020 . Twine: a unified cluster management system for shared infrastructure . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 787 - 803 .

Tang LJ , Mars J , Vachharajani N , et al. , 2011 . The impact of memory subsystem resource sharing on datacenter applications . Proc 38 th Annual Int Symp on Computer Architecture , p. 283 - 294 .

Tembey P , Gavrilovska A , Schwan K , 2014 . Merlin: application- and platform-aware resource allocation in consolidated server systems . Proc ACM Symp on Cloud Computing , p. 1 - 14 . doi: 10.1145/2670979.2670993 http://doi.org/10.1145/2670979.2670993

Thinakaran P , Gunasekaran JR , Sharma B , et al. , 2017 . Phoenix: a constraint-aware scheduler for heterogeneous datacenters . Proc IEEE 37 th Int Conf on Distributed Computing Systems , p. 977 - 987 . doi: 10.1109/ICDCS.2017.262 http://doi.org/10.1109/ICDCS.2017.262

Tirmazi M , Barker A , Deng N , et al. , 2020 . Borg: the next generation . Proc 15 th European Conf on Computer Systems , Article 30 . doi: 10.1145/3342195.3387517 http://doi.org/10.1145/3342195.3387517

Tumanov A , Zhu T , Park JW , et al. , 2016 . TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters . Proc 11 th European Conf on Computer Systems , Article 35 . doi: 10.1145/2901318.2901355 http://doi.org/10.1145/2901318.2901355

Vanga M , Gujarati A , Brandenburg BB , 2018 . Tableau: a high-throughput and predictable VM scheduler for high-density workloads . Proc 13 th EuroSys Conf , Article 28 . doi: 10.1145/3190508.3190557 http://doi.org/10.1145/3190508.3190557

Vasić N , Novaković D , Miučin S , et al. , 2012 . DejaVu: accelerating resource allocation in virtualized environments . Proc 17 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 423 - 436 . doi: 10.1145/2150976.2151021 http://doi.org/10.1145/2150976.2151021

Vavilapalli VK , Murthy AC , Douglas C , et al. , 2013 . Apache Hadoop YARN: yet another resource negotiator . Proc 4 th Annual Symp on Cloud Computing , Article 5 . doi: 10.1145/2523616.2523633 http://doi.org/10.1145/2523616.2523633

Verma A , Pedrosa L , Korupolu M , et al. , 2015 . Large-scale cluster management at Google with Borg . Proc 10 th European Conf on Computer Systems , Article 18 . doi: 10.1145/2741948.2741964 http://doi.org/10.1145/2741948.2741964

Vulimiri A , Curino C , Godfrey PB , et al. , 2015 . WANalytics: geo-distributed analytics for a data intensive world . Proc ACM SIGMOD Int Conf on Management of Data , p. 1087 - 1092 . doi: 10.1145/2723372.2735365 http://doi.org/10.1145/2723372.2735365

Wang JJ , Balazinska M , 2017 . Elastic memory management for cloud data analytics . Proc USENIX Annual Technical Conf , p. 745 - 758 .

Wang JY , Pan JL , Esposito F , et al. , 2019 . Edge cloud offloading algorithms: issues, methods, and perspectives . ACM Comput Surv , 52 ( 1 ): 2 . doi: 10.1145/3284387 http://doi.org/10.1145/3284387

Wang LN , Ye JM , Zhao YM , et al. , 2018 . SuperNeurons: dynamic GPU memory management for training deep neural networks . Proc 23 rd ACM SIGPLAN Symp on Principles and Practice of Parallel Programming , p. 41 - 53 . doi: 10.1145/3178487.3178491 http://doi.org/10.1145/3178487.3178491

Wang LP , Weng QZ , Wang W , et al. , 2020 . Metis: learning to schedule long-running applications in shared container clusters at scale . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , Article 68 .

Wang SQ , Gonzalez OJ , Zhou XB , et al. , 2020 . An efficient and non-intrusive GPU scheduling framework for deep learning training systems . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , Article 90 .

Wang ZN , Yang J , Melhem R , et al. , 2016 . Simultaneous multikernel GPU: multi-tasking throughput processors via fine-grained sharing . Proc IEEE Int Symp on High Performance Computer Architecture , p. 358 - 369 . doi: 10.1109/HPCA.2016.7446078 http://doi.org/10.1109/HPCA.2016.7446078

Weerasiri D , Barukh MC , Benatallah B , et al. , 2017 . A taxonomy and survey of cloud resource orchestration techniques . ACM Comput Surv , 50 ( 2 ): 26 . doi: 10.1145/3054177 http://doi.org/10.1145/3054177

Williams D , Koller R , 2016 . Unikernel monitors: extending minimalism outside of the box . Proc 8 th USENIX Workshop on Hot Topics in Cloud Computing , p. 1 - 6 .

Xiao WC , Bhardwaj R , Ramjee R , et al. , 2018 . Gandiva: introspective cluster scheduling for deep learning . Proc 13 th USENIX Conf on Operating Systems Design and Implementation , p. 595 - 610 .

Xiao WX , Ren SR , Li Y , et al. , 2020 . AntMan: dynamic scaling on GPU clusters for deep learning . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 533 - 548 .

Xu QM , Jeon H , Kim K , et al. , 2016 . Warped-Slicer: efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming . Proc ACM/IEEE 43 rd Annual Int Symp on Computer Architecture , p. 230 - 242 . doi: 10.1109/ISCA.2016.29 http://doi.org/10.1109/ISCA.2016.29

Xu YJ , Musgrave Z , Noble B , et al. , 2013 . Bobtail: avoiding long tails in the cloud . Proc 10 th USENIX Symp on Networked Systems Design and Implementation , p. 329 - 341 .

Yan Y , Gao YJ , Chen Y , et al. , 2016 . TR-Spark: transient computing for big data analytics . Proc 7 th ACM Symp on Cloud Computing , p. 484 - 496 . doi: 10.1145/2987550.2987576 http://doi.org/10.1145/2987550.2987576

Yang HL , Breslow A , Mars J , et al. , 2013 . Bubble-Flux: precise online QoS management for increased utilization in warehouse scale computers . Proc 40 th Annual Int Symp on Computer Architecture , p. 607 - 618 . doi: 10.1145/2485922.2485974 http://doi.org/10.1145/2485922.2485974

Yang X , Blackburn SM , McKinley KS , 2016 . Elfen scheduling: fine-grain principled borrowing from latency-critical workloads using simultaneous multithreading . Proc USENIX Annual Technical Conf , p. 309 - 322 .

Yang Y , Kim GW , Song WW , et al. , 2017 . Pado: a data processing engine for harnessing transient resources in datacenters . Proc 12 th European Conf on Computer Systems , p. 575 - 588 . doi: 10.1145/3064176.3064181 http://doi.org/10.1145/3064176.3064181

Yeh TT , Sabne A , Sakdhnagool P , et al. , 2017 . Pagoda: fine-grained GPU resource virtualization for narrow tasks . Proc 22 nd ACM SIGPLAN Symp on Principles and Practice of Parallel Programming , p. 221 - 234 . doi: 10.1145/3018743.3018754 http://doi.org/10.1145/3018743.3018754

Yeh TT , Sinclair MD , Beckmann BM , et al. , 2021 . Deadline-aware offloading for high-throughput accelerators . Proc IEEE Int Symp on High-Performance Computer Architecture , p. 479 - 492 . doi: 10.1109/HPCA51647.2021.00048 http://doi.org/10.1109/HPCA51647.2021.00048

Zellweger G , Gerber S , Kourtis K , et al. , 2014 . Decoupling cores, kernels, and operating systems . Proc 11 th USENIX Symp on Operating Systems Design and Implementation , p. 17 - 31 .

Zha Y , Li J , 2020 . Virtualizing FPGAs in the cloud . Proc 25 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 845 - 858 . doi: 10.1145/3373376.3378491 http://doi.org/10.1145/3373376.3378491

Zha Y , Li J , 2021 . When application-specific ISA meets FPGAs: a multi-layer virtualization framework for heterogeneous cloud FPGAs . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 123 - 134 . doi: 10.1145/3445814.3446699 http://doi.org/10.1145/3445814.3446699

Zhang D , Dai D , He YB , et al. , 2020 . RLScheduler: an automated HPC batch job scheduler using reinforcement learning . Proc Int Conf for High Performance Computing, Networking, Storage and Analysis , p. 1 - 15 . doi: 10.1109/SC41405.2020.00035 http://doi.org/10.1109/SC41405.2020.00035

Zhang JS , Xiong YQ , Xu NY , et al. , 2017 . The Feniks FPGA operating system for cloud computing . Proc 8 th Asia-Pacific Workshop on Systems , Article 22 . doi: 10.1145/3124680.3124743 http://doi.org/10.1145/3124680.3124743

Zhang X , Dwarkadas S , Shen K , 2009 . Towards practical page coloring-based multicore cache management . Proc 4 th ACM European Conf on Computer Systems , p. 89 - 102 . doi: 10.1145/1519065.1519076 http://doi.org/10.1145/1519065.1519076

Zhang X , Tune E , Hagmann R , et al. , 2013 . CPI² : CPU performance isolation for shared compute clusters . Proc 8 th ACM European Conf on Computer Systems , p. 379 - 391 . doi: 10.1145/2465351.2465388 http://doi.org/10.1145/2465351.2465388

Zhang XT , Zheng X , Wang Z , et al. , 2019 . Fast and scalable VMM live upgrade in large cloud infrastructure . Proc 24 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 93 - 105 . doi: 10.1145/3297858.3304034 http://doi.org/10.1145/3297858.3304034

Zhang YQ , Laurenzano MA , Mars J , et al. , 2014 . SMiTe: precise QoS prediction on real-system SMT processors to improve utilization in warehouse scale computers . Proc 47 th Annual IEEE/ACM Int Symp on Microarchitecture , p. 406 - 418 . doi: 10.1109/MICRO.2014.53 http://doi.org/10.1109/MICRO.2014.53

Zhang YQ , Prekas G , Fumarola GM , et al. , 2016 . History-based harvesting of spare cycles and storage in large-scale datacenters . Proc 12 th USENIX Conf on Operating Systems Design and Implementation , p. 755 - 770 .

Zhang YQ , Hua WZ , Zhou ZZ , et al. , 2021 . Sinan: ML-based and QoS-aware resource management for cloud microservices . Proc 26 th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 167 - 181 . doi: 10.1145/3445814.3446693 http://doi.org/10.1145/3445814.3446693

Zhao HY , Han ZH , Yang Z , et al. , 2020 . HiveD: sharing a GPU cluster for deep learning with guarantees . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 515 - 532 .

Zhao M , Cabrera J , 2018 . RTVirt: enabling time-sensitive computing on virtualized systems through cross-layer CPU scheduling . Proc 13 th EuroSys Conf , Article 27 . doi: 10.1145/3190508.3190527 http://doi.org/10.1145/3190508.3190527

Zheng L , Li XL , Zheng YH , et al. , 2020 . Scaph: scalable GPU-accelerated graph processing with value-driven differential scheduling . Proc USENIX Annual Technical Conf , p. 573 - 588 .

Zhou H , Chen M , Lin Q , et al. , 2018 . Overload control for scaling WeChat microservices . Proc ACM Symp on Cloud Computing , p. 149 - 161 . doi: 10.1145/3267809.3267823 http://doi.org/10.1145/3267809.3267823

Zhou ZY , Benson TA , 2019 . Composing SDN controller enhancements with Mozart . Proc ACM Symp on Cloud Computing , p. 351 - 363 . doi: 10.1145/3357223.3362712 http://doi.org/10.1145/3357223.3362712

Zhu H , Kaffes K , Chen ZX , et al. , 2020 . RackSched: a microsecond-scale scheduler for rack-scale computers . Proc 14 th USENIX Symp on Operating Systems Design and Implementation , p. 1225 - 1240 .

Zhu HS , Erez M , 2016 . Dirigent: enforcing QoS for latency-critical tasks on shared multicore systems . Proc 21 st Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 33 - 47 . doi: 10.1145/2872362.2872394 http://doi.org/10.1145/2872362.2872394

Zhu T , Kozuch MA , Harchol-Balter M , 2017 . WorkloadCompactor: reducing datacenter cost while providing tail latency SLO guarantees . Proc Symp on Cloud Computing , p. 598 - 610 . doi: 10.1145/3127479.3132245 http://doi.org/10.1145/3127479.3132245

Zhuravlev S , Blagodurov S , Fedorova A , 2010 . Addressing shared resource contention in multicore processors via scheduling . Proc 15 th Int Conf on Architectural Support for Programming Languages and Operating Systems , p. 129 - 142 . doi: 10.1145/1736020.1736036 http://doi.org/10.1145/1736020.1736036
