Publications

Project Acronym: ScaleSciComp
Title: Scale Scientific Computations
Affiliation: Democritus University of Thrace
PI: George Gravvanis
Research Field: Mathematics and Computer Sciences

A framework for simulating large scale cloud infrastructures
by Christos K. Filelis-Papadopoulos, George A. Gravvanis and Panagiotis E. Kyziropoulos
Abstract:
Cloud infrastructures are continuously growing in size, since more cloud nodes are added to already existing hyper-scale infrastructures. These hyper-scale infrastructures are also becoming heterogeneous as different types of accelerators are added in order to increase performance per watt for certain types of applications and allow for various HPC workloads to migrate to Cloud environments. The introduction of diverse workloads that migrate in the Cloud along with increasing volume of incoming tasks results in phenomena of network congestion, underutilization and resource fragmentation. Simulators are used to analyze, study and possibly improve Cloud environments. However, existing Cloud simulation tools lack the ability to handle heterogeneous resources and tasks that span across multiple Cloud nodes. Moreover, they are mostly sequential and cannot scale to large numbers of Cloud nodes. Furthermore, they do not support over-commitment, which is a common practice in real-world Cloud environments. A framework for simulating large numbers of heterogeneous cloud nodes organized in Cells and executing large numbers of HPC tasks is proposed. The framework is inherently parallel and designed for hybrid distributed memory parallel systems, supporting CPU, memory and network over-commitment. The simulation framework is based on a time advancing loop, allowing dynamic change of the granularity of the simulator and minimizing memory requirements, since data related to the current time-step is stored. Moreover, a latency model for the currency of data in the Gateway Service and Broker is also supported. Implementation details along with discussions concerning the extensibility of the framework are given. Numerical results for simulating large number of heterogeneous resources and incoming tasks are also presented.
Reference:
A framework for simulating large scale cloud infrastructures (Christos K. Filelis-Papadopoulos, George A. Gravvanis and Panagiotis E. Kyziropoulos), In Future Generation Computer Systems, 2017.
Bibtex Entry:
@article{FILELISPAPADOPOULOS2017,
 title = {A framework for simulating large scale cloud infrastructures},
 journal = {Future Generation Computer Systems},
 year = {2017},
 bibyear = {2017},
 issn = {0167-739X},
 doi = {10.1016/j.future.2017.06.017},
 url = {http://www.sciencedirect.com/science/article/pii/S0167739X17303230},
 author = {Christos K. Filelis-Papadopoulos and George A. Gravvanis and Panagiotis E. Kyziropoulos},
 abstract = {Cloud infrastructures are continuously growing in size, since more cloud nodes are added to already existing hyper-scale infrastructures. These hyper-scale infrastructures are also becoming heterogeneous as different types of accelerators are added in order to increase performance per watt for certain types of applications and allow for various HPC workloads to migrate to Cloud environments. The introduction of diverse workloads that migrate in the Cloud along with increasing volume of incoming tasks results in phenomena of network congestion, underutilization and resource fragmentation. Simulators are used to analyze, study and possibly improve Cloud environments. However, existing Cloud simulation tools lack the ability to handle heterogeneous resources and tasks that span across multiple Cloud nodes. Moreover, they are mostly sequential and cannot scale to large numbers of Cloud nodes. Furthermore, they do not support over-commitment, which is a common practice in real-world Cloud environments. A framework for simulating large numbers of heterogeneous cloud nodes organized in Cells and executing large numbers of HPC tasks is proposed. The framework is inherently parallel and designed for hybrid distributed memory parallel systems, supporting CPU, memory and network over-commitment. The simulation framework is based on a time advancing loop, allowing dynamic change of the granularity of the simulator and minimizing memory requirements, since data related to the current time-step is stored. Moreover, a latency model for the currency of data in the Gateway Service and Broker is also supported. Implementation details along with discussions concerning the extensibility of the framework are given. Numerical results for simulating large number of heterogeneous resources and incoming tasks are also presented.},
}