AISoLA 2025

Bridging the Gap Between AI and Reality • Rhodes, Greece

Talk

Federated Learning in Hybrid Cloud - Trends and Challenges for Fine-tuning Large Language Models

Time: Sunday, 2.11

Room: Room A

Authors: Jude Vivek Joseph Arokiarajan

Abstract: Federated Learning (FL) has emerged as a promising paradigm for distributed and privacy-preserving machine learning. By training models across decentralised datasets without requiring raw data to leave its source, FL addresses some of the most pressing enterprise concerns, such as data governance, regulatory compliance, and confidentiality. With the rise of Large Language Models (LLMs), FL has become especially relevant for fine-tuning models on sensitive, domain-specific data, often spread across hybrid infrastructures that combine multi-cloud services and on-premises datacentres.

However, existing FL frameworks are not well suited to the complexity of hybrid cloud infrastructures, given the variability in compute, storage, and networking capabilities [5]. Although progress has been made on the algorithmic aspects of FL, the architectural and operational dimensions remain underdeveloped. Specifically, most frameworks lack modularity, which prevents easy extension or replacement of components; standardised orchestration workflows, which are critical for coordinating the ML lifecycle; and monitoring and observability features, which are essential for sustaining FL in long-running enterprise operations [4].

The overall research aims to design a cloud-native reference architecture that supports observable, scalable, and maintainable FL deployments for LLM fine-tuning in hybrid cloud environments, addressing architectural gaps that hinder the adoption of production-grade FL. This work further explores two guiding sub-questions:

– RQ1: What are the key trends, requirements, and architectural challenges related to observability, modularity, and scalability in fine-tuning LLMs in federated learning environments within hybrid cloud infrastructures?

– RQ2: How do current FL frameworks fall short in supporting LLM fine-tuning across hybrid cloud environments?
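The core FL idea described in the abstract, clients train locally and share only model updates, never raw data, can be sketched with federated averaging (FedAvg), the canonical FL aggregation scheme. This is a minimal illustrative sketch, not the talk's own method: the linear-model setup, the function names, and all hyperparameters are assumptions for the example.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training step: plain gradient descent on MSE for
    a linear model. The raw data (X, y) never leaves this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(weights, client_data, rounds=10):
    """Server loop: broadcast the global weights, collect each client's
    locally trained weights, and average them weighted by sample count."""
    for _ in range(rounds):
        updates = [local_update(weights, X, y) for X, y in client_data]
        sizes = np.array([len(y) for _, y in client_data], dtype=float)
        weights = np.average(updates, axis=0, weights=sizes)
    return weights

# Two clients holding disjoint private datasets drawn from the same
# underlying linear model (illustrative synthetic data).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (40, 60):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

w = fedavg(np.zeros(2), clients)
print(np.round(w, 2))  # the aggregated model recovers approximately true_w
```

In a hybrid cloud deployment of the kind the talk targets, `local_update` would run on-premises or in a client's cloud tenancy and only the weight vectors would cross the network to the aggregator, which is what makes the approach privacy-preserving by construction.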