iLLM: Melbourne Innovations for Large Language Model (LLM) Platforms and Systems
Large Language Models (LLMs), such as ChatGPT, Qwen, and DeepSeek, have emerged as powerful AI models. Trained on massive amounts of text data, they understand and generate human-like language, with applications in chatbots, text summarization, question-answering systems, and travel planning. They can also power smart software systems and tools that help create software code and test suites, support dynamic configuration of systems and networks, and automate schedule creation. Our iLLM initiative explores (1) the historical evolution of large AI models, (2) challenges in deploying LLMs in Cloud computing environments, (3) novel techniques for the efficient execution of LLM operations in Clouds, (4) LLM-guided approaches for the efficient deployment of microservices-based applications in Clouds, and (5) future directions for LLM- and Quantum-powered software systems and applications.
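As a concrete illustration of one application named above, the short Python sketch below performs text summarization through the Hugging Face transformers pipeline API. The model checkpoint (sshleifer/distilbart-cnn-12-6) is an assumption chosen for illustration only, not a model used by the iLLM initiative; any summarization-capable checkpoint can be substituted.

```python
# A minimal sketch of one application mentioned above (text summarization),
# using the Hugging Face `transformers` pipeline API. The checkpoint name is
# an illustrative assumption, not part of the iLLM work itself.
from transformers import pipeline

# Load a small pretrained summarization model (downloaded on first run).
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Large Language Models are trained on massive text corpora and can "
    "summarize documents, answer questions, and assist in generating "
    "software code and test suites."
)

# Generate a short, deterministic summary of the input text.
result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```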
Team Members @ Melbourne qCLOUDS Lab
- Rajkumar Buyya
- Muhammed Tawfiqul Islam
- Adel Toosi
- Yifan Sun
- Haoyu Bai
- Qifan Deng
External Collaborators
- Ruhui Ma, SJTU, Shanghai
- Minxian Xu, CAS, Shenzhen
- All co-authors of the papers listed below!
Publications
- Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma and Rajkumar Buyya, Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices, Proceedings of the 42nd IEEE International Conference on Computer Design (ICCD 2024, IEEE CS Press, USA), Milan, Italy, November 18-20, 2024.
- Jing Bi, Ziqi Wang, Haitao Yuan, Xiankun Shi, Ziyue Wang, Jia Zhang, MengChu Zhou, and Rajkumar Buyya, Large AI Models and Their Applications: Classification, Limitations, and Potential Solutions, Software: Practice and Experience (SPE), Volume 55, Issue 6, Pages: 1003-1017, ISSN: 0038-0644, Wiley Press, New York, USA, June 2025.
- Tharindu B. Hewage, Shashikant Ilager, Maria Rodriguez Read, and Rajkumar Buyya, Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference, Proceedings of the 16th ACM International Conference on Future and Sustainable Energy Systems (E-ENERGY 2025, ACM Press, USA), Rotterdam, Netherlands, June 17-20, 2025.
- Renjun Zhang, Tianming Zhang, Zinuo Cai, Dongmei Li, Ruhui Ma, and Rajkumar Buyya, MemoriaNova: Optimizing Memory-Aware Model Inference for Edge Computing, ACM Transactions on Architecture and Code Optimization (TACO), Volume 22, Number 1, Article No. 3, Pages: 1-25, ISSN: 1544-3566, ACM Press, New York, USA, March 2025.
- Jie Ou, Jinyu Guo, Shuaihong Jiang, Xu Li, Ruini Xue, Wenhong Tian, and Rajkumar Buyya, Accelerating Long-Context Inference of Large Language Models via Dynamic Attention Load Balancing, Knowledge-Based Systems, Volume 333, Pages: 1-21, ISSN: 0950-7051, Elsevier, Amsterdam, The Netherlands, January 2026.
