AI for Clouds: Predictive and Learning Centric Solutions for Edge and Cloud Computing Systems
Contemporary distributed systems such as Edge and Cloud platforms are large-scale, highly interconnected, and complex infrastructures distributed over multiple networks. At the same time, workloads from diverse users come with demanding Service Level Agreements (SLAs) that must be met to satisfy application requirements. Resource management systems should account for these factors to manage workloads and resources efficiently. However, because of the massive complexity of these interconnected systems and the heterogeneous characteristics of workloads, it is infeasible to manually fine-tune the controllable parameters so that resources are managed efficiently while workload requirements are satisfied. Hence, innovative data-driven, Artificial Intelligence (AI)-centric solutions are necessary. Such solutions can capture complex non-linear relationships between different elements and configure the system for efficiency.
Team Members @ Melbourne CLOUDS Lab
- Prof. Rajkumar Buyya
- Mr. Shashikant Ilager
- Mr. Mohammad Goudarzi
- Ms. Amanda Jayanetti
- Mr. Siddharth Agarwal
Publications
- Maria A. Rodriguez, Ramamohanarao Kotagiri, and Rajkumar Buyya, Detecting Performance Anomalies in Scientific Workflows using Hierarchical Temporal Memory, Future Generation Computer Systems, Volume 88, Pages: 624-635, ISSN: 0167-739X, Elsevier Press, Amsterdam, The Netherlands, November 2018.
- Shashikant Ilager, Kotagiri Ramamohanarao, and Rajkumar Buyya, ETAS: Energy and Thermal-Aware Dynamic Virtual Machine Consolidation in Cloud Data Center with Proactive Hotspot Mitigation, Concurrency and Computation: Practice and Experience (CCPE), Volume 31, No. 17, Pages: 1-15, ISSN: 1532-0626, Wiley Press, New York, USA, September 2019.
- Shashikant Ilager, Rajeev Muralidhar, and Rajkumar Buyya, Artificial Intelligence (AI)-Centric Management of Resources in Modern Distributed Computing Systems, Proceedings of the IEEE Cloud Summit 2020 (IEEE CS Press, USA), Harrisburg, PA, USA, October 21-22, 2020.
- Shreshth Tuli, Shashikant Ilager, Kotagiri Ramamohanarao, and Rajkumar Buyya, Dynamic Scheduling for Stochastic Edge-Cloud Computing Environments using A3C Learning and Residual Recurrent Neural Networks, IEEE Transactions on Mobile Computing (TMC), Volume ?, Number ?, Pages: ??, ISSN: 1536-1233, IEEE Computer Society Press, USA (in press, accepted on Aug 13, 2020).
- Shashikant Ilager, Rajeev Muralidhar, Kotagiri Ramamohanarao, and Rajkumar Buyya, A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs), Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2020, IEEE CS Press, USA), Melbourne, Australia, May 11-14, 2020.
- Sara Kardani Moghaddam, Rajkumar Buyya, and Ramamohanarao Kotagiri, ADRL: A Hybrid Anomaly-aware Deep Reinforcement Learning-based Resource Scaling in Clouds, IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, No. 3, Pages: 514-526, ISSN: 1045-9219, IEEE CS Press, USA, March 2021.
- Shashikant Ilager, Kotagiri Ramamohanarao, and Rajkumar Buyya, Thermal Prediction for Efficient Energy Management of Clouds using Machine Learning, IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, No. 5, Pages: 1044-1056, ISSN: 1045-9219, IEEE CS Press, USA, May 2021.
- Siddharth Agarwal, Maria A. Rodriguez, and Rajkumar Buyya, A Reinforcement Learning Approach to Reduce Serverless Function Cold Start Frequency, Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2021, IEEE CS Press, USA), Melbourne, Australia, May 10-13, 2021.
Datasets and other useful pointers
- UniMelb Cloud Dataset (password for download: Dataset@1234)
Please note: the complete tutorial is under preparation. For brief information about the dataset, please refer to the TPDS paper listed above (Ilager et al., Thermal Prediction for Efficient Energy Management of Clouds using Machine Learning). The following parameters are computed externally: Ambient_Temperature (inlet + max CPU temp); Cooling_Power (calculated based on the lumped RC model used in the CCPE paper); Total_Power (cooling + server power); CPU_Temp_Max (max of CPU1 and CPU2). All remaining parameters are collected from the datacentre monitoring infrastructure. For any queries, contact: email@example.com
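As a rough illustration of the two simplest externally computed parameters, Total_Power and CPU_Temp_Max, the sketch below derives them for a single record. It is only a sketch under stated assumptions: the input key names `CPU1_Temp`, `CPU2_Temp`, and `Server_Power` are hypothetical and may not match the column names in the dataset, the sample values are made up for illustration, and `Cooling_Power` is treated as already computed (the lumped RC model is not reproduced here).

```python
from typing import Dict


def derive_parameters(row: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `row` with Total_Power and CPU_Temp_Max added.

    Assumed (hypothetical) input keys: "CPU1_Temp", "CPU2_Temp",
    "Server_Power", and an already computed "Cooling_Power".
    """
    derived = dict(row)  # copy so the input record is left unchanged
    # CPU_Temp_Max: maximum of the two per-socket CPU temperatures
    derived["CPU_Temp_Max"] = max(row["CPU1_Temp"], row["CPU2_Temp"])
    # Total_Power: cooling power plus server power
    derived["Total_Power"] = row["Cooling_Power"] + row["Server_Power"]
    return derived


# Made-up values for illustration only; not taken from the dataset.
sample = {
    "CPU1_Temp": 61.0,
    "CPU2_Temp": 64.5,
    "Server_Power": 210.0,
    "Cooling_Power": 35.0,
}
result = derive_parameters(sample)
```

The function returns a new dictionary rather than modifying its input, so the same record can be reused if the assumed key names need adjusting to the actual dataset headers.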