Admit or preserve? Addressing server failures in cloud computing task management

Nadav Lavi*, Hanoch Levy

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Cloud computing task management plays a critical role in the efficient operation of cloud resources, i.e., the servers. Task management must make critical and complicated decisions, coping with the inherently dynamic nature of cloud computing systems and with the additional complexity introduced by their scale (tens of thousands of servers). Because servers may fail, task management must make both task admission and task preservation decisions. Moreover, both kinds of decisions require considering future system trajectories and the interplay between preservation and admission. In this paper we study the combined problem of task admission and preservation in the dynamic environment of cloud computing systems by analyzing a queueing system formulated as a Markov decision process (MDP). We show that the optimal operational policy is of a double switching curve type. At face value, extracting the optimal policy is rather complicated, yet our analysis reveals that the optimal policy can be reduced to a single rule, since the two rules can effectively be decoupled. Based on this result, we propose two heuristic approaches that approximate the optimal rule for the system settings most relevant to cloud computing. Our results provide a simple policy scheme for the combined admission and preservation problem that can be applied in complex cloud computing environments, eliminating the need for sophisticated real-time control mechanisms.
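The abstract does not give the model's states, rates, or rewards, so the following is only a hypothetical toy MDP in the same spirit, not the authors' formulation. It assumes: a state (n, f) with n tasks running and f failed servers out of C; Poisson arrivals (rate `lam`) that earn reward `R` if admitted; task completions (rate `mu` per task); failures of busy servers only (rate `gamma` per busy server), after which the displaced task is either preserved on a free healthy server or dropped at cost `P`; and repairs (rate `beta` per failed server). All names (`build_states`, `bellman`, `solve`) and parameter values are illustrative. The sketch solves the discounted problem by uniformized value iteration and reads off the greedy admit and preserve decisions, i.e., the two rules that the abstract refers to.

```python
"""Toy admission/preservation MDP solved by value iteration.

Hypothetical model for illustration only; not the paper's formulation."""


def build_states(C):
    # (n, f): n tasks running on healthy servers, f failed servers under repair
    return [(n, f) for f in range(C + 1) for n in range(C - f + 1)]


def bellman(V, C, lam, mu, gamma, beta, R, P, alpha):
    """One application of the uniformized, discounted Bellman operator.

    Returns the updated values plus the greedy admit/preserve decisions."""
    Lam = lam + C * (mu + gamma + beta)  # bound on the total event rate
    newV, admit, preserve = {}, {}, {}
    for (n, f), v in V.items():
        healthy = C - f
        # Arrival (rate lam): admit for reward R only if a healthy server is free.
        admit[(n, f)] = n < healthy and R + V[(n + 1, f)] > v
        arr = R + V[(n + 1, f)] if admit[(n, f)] else v
        # Completion (rate n*mu): one running task leaves.
        dep = V[(n - 1, f)] if n > 0 else v
        # Failure of a busy server (rate n*gamma, assumed): the server goes down;
        # its task is preserved on a free healthy server or dropped at cost P.
        if n > 0:
            keep = V[(n, f + 1)] if n < healthy else float("-inf")
            drop = -P + V[(n - 1, f + 1)]
            preserve[(n, f)] = keep > drop
            fail = max(keep, drop)
        else:
            fail = v
        # Repair (rate f*beta): one failed server returns to service.
        rep = V[(n, f - 1)] if f > 0 else v
        # Leftover uniformization rate is a fictitious self-transition.
        rate_self = Lam - lam - n * mu - n * gamma - f * beta
        newV[(n, f)] = alpha * (lam * arr + n * mu * dep + n * gamma * fail
                                + f * beta * rep + rate_self * v) / Lam
    return newV, admit, preserve


def solve(C=5, lam=1.0, mu=0.3, gamma=0.05, beta=0.5, R=1.0, P=0.4,
          alpha=0.99, tol=1e-9, max_iter=200_000):
    params = (C, lam, mu, gamma, beta, R, P, alpha)
    V = {s: 0.0 for s in build_states(C)}
    for _ in range(max_iter):
        newV, _, _ = bellman(V, *params)
        diff = max(abs(newV[s] - V[s]) for s in V)
        V = newV
        if diff < tol:
            # Greedy decisions with respect to the converged values.
            _, admit, preserve = bellman(V, *params)
            return V, admit, preserve
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    C = 5
    V, admit, preserve = solve(C=C)
    for f in range(C + 1):
        admits = [n for n in range(C - f + 1) if admit[(n, f)]]
        keeps = [n for n in range(1, C - f + 1) if preserve[(n, f)]]
        print(f"f={f}: admit at n in {admits}, preserve at n in {keeps}")
```

Printing the admit and preserve sets for each f lets one inspect whether, in this toy model, each decision behaves like a threshold in n that shifts with f (a switching curve). Any such observation holds only for the assumed rates and rewards, not for the paper's model.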

Original language: English
Pages (from-to): 279-325
Number of pages: 47
Journal: Queueing Systems
Volume: 94
Issue number: 3-4
DOIs
State: Published - 1 Apr 2020

Keywords

  • Admission control
  • Cloud computing
  • Markov decision processes
  • Task management
  • Task preservation
