
Yoon and Choi: Clustered Federated Learning Based on Mahalanobis Distance for Sequential Medical Data

Tae Hwan Yoon and Bong Jun Choi

Clustered Federated Learning Based on Mahalanobis Distance for Sequential Medical Data

Abstract: In hospitals, metadata typically contains patients' personal information derived from doctors' diagnoses. Sniffers or hijackers could therefore launch attacks to steal sensitive information from hospitals or patients. For this reason, hospital data must be anonymized and protected by specialized systems to ensure its safe use, especially when multiple hospitals share data. If hospitals deploy systems that can share data securely while preserving privacy, researchers and clinicians can leverage large amounts of distributed data to train deep learning models more effectively. In this context, we adopt a solution based on clustered federated learning (CFL). In typical CFL scenarios, forming appropriate clusters helps build more personalized models for different groups. However, previous CFL approaches still suffer from model heterogeneity. To further mitigate this problem, we propose a Mahalanobis distance-based clustered federated learning (MD-CFL) method, which reduces model heterogeneity and improves clustering performance by correcting for feature skew in non-normalized data. Our experiments show that MD-CFL achieves accurate clustering, with a higher silhouette score than cosine-based FedAvg.

Keywords: Clustering, Detecting Emotion and Stress, Federated Learning, Mahalanobis Distance

1. Introduction

In recent years, researchers have been able to aggregate metadata from medical devices using various sensors. This metadata includes users' private data from distributed sources, making privacy a critical concern when training artificial intelligence (AI) models. Federated learning (FL) allows a model to be trained by sharing only parameters with a central server, without exposing the user's actual data. This property helps protect user privacy while still allowing researchers to train AI models [1]. However, federated learning has limitations, especially regarding model heterogeneity. This paper addresses model heterogeneity by proposing a clustering method based on Mahalanobis distance, called Mahalanobis distance-based clustered federated learning (MD-CFL). We compare the performance of cosine-based FedAvg with our MD-CFL-based FedAvg on the wearable stress and affect detection (WESAD) and K-EmoCon datasets. Experimental results show that under model heterogeneity, our MD-CFL clustering method outperforms the cosine-based method and achieves a higher silhouette score.

2. Related Work

Clustered federated learning (CFL) is one of the fundamental methods for addressing data heterogeneity in distributed environments. In scenarios with data heterogeneity, plain federated learning typically performs worse than CFL because a single global model cannot fit all the heterogeneous distributed data. This section provides an overview of federated learning and CFL.

2.1 Federated Learning

Federated learning is an approach in machine learning that enables multiple devices or organizations to collaboratively train a model without centralizing the training data. Rather than transferring all data to a central server, each participant (such as a mobile device or organization) trains a model locally on its own data and shares only the model updates (e.g., gradients or weights) with a central server. The central server aggregates these updates to refine the global model. This cycle continues iteratively until the global model achieves a desirable level of performance. FedAvg [1] is a basic federated learning architecture that combines local stochastic gradient descent (SGD) on each client (or device) with a central averaging step to update the global model (Fig. 1). This method efficiently aggregates updates from multiple clients to iteratively improve the global model.
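The central averaging step described above can be sketched as follows. This is a minimal illustration of the standard FedAvg weighting by client sample counts, not the authors' implementation; the parameter vectors and sizes are hypothetical.

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg server step: average client parameters weighted by data volume."""
    weights = np.array(client_sizes, dtype=float) / sum(client_sizes)
    return weights @ np.stack(client_params)   # weighted mean over clients

# Toy round: three clients with flattened parameter vectors of length 2.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
global_params = fedavg_aggregate(params, client_sizes=[10, 10, 20])
```

The third client holds half the data, so the global model is pulled toward its parameters; this is the sense in which FedAvg "efficiently aggregates updates from multiple clients."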

Fig. 1.

Federated learning architecture.
2.2 Clustered Federated Learning

Recently, various CFL algorithms have been developed in the medical field. Jiang et al. [2, 3] proposed a clustering algorithm that incorporates a silhouette score adjustment method. In this direction, Yoo et al. [4-6] introduced personalized federated learning using a clustering approach. Although these works aim to address the challenges posed by heterogeneous data, they still face challenges related to model parameter heterogeneity. In federated learning applications, model parameters often contain different features during the aggregation process, which leads to parameter heterogeneity and negatively affects the performance of AI models.

To date, general clustered federated learning has been proposed as a solution to improve performance in the presence of data heterogeneity. Most approaches have focused primarily on data heterogeneity. However, this paper shifts the focus to model heterogeneity issues. In general, as noted by Yoo et al. [5, 6], model heterogeneity and data heterogeneity coexist in non-IID situations. Non-IID scenarios can be categorized into five major types, as shown in Table 1.

Table 1.

Non-IID cases in federated learning with data heterogeneity [5,6]
Non-IID case Description
Feature distribution skew Marginal distributions of data features vary. For instance, even if two people wear the same smartwatch model and exercise for the same duration, the measured features will differ due to individual characteristics, like variations in their gait.
Label distribution skew Marginal distributions of data labels vary. For instance, frostbite commonly occurs in colder regions because it is caused by prolonged exposure to extreme cold, leading to tissue damage in affected body parts. Consequently, it is infrequent in areas with relatively warmer climates.
Same label but different features Conditional distributions of data features vary. For example, healthcare data is measured using medical devices like neuroimaging tools and patient biomarkers. However, hospitals do not all utilize the same brands of medical devices.
Same feature but different labels The conditional distributions of data labels differ. For instance, lung images affected by the recent COVID-19 virus can be challenging to differentiate from pneumonia, as they share similar characteristics in many lesions.
Quantity skew The volume of data from each patient or hospital varies. For example, if hospital A has seen five times as many patients as hospital B, the amount of data available from each hospital will differ significantly as well.

In this paper, a non-IID situation characterized by feature distribution skew was adopted to establish a data heterogeneous environment and model heterogeneous conditions.

Algorithm 1 outlines the basic clustered federated learning algorithm. In line 14, the selected distance metric is cosine distance. Typically, clustering is performed on the server side using the cosine similarity between the target parameters and the local parameters. The other steps follow the same principles as FedAvg, including the aggregation method, local training, uploading updates, and iteration. The algorithm clusters with the K-means method. K-means, initially developed in signal processing, seeks to divide n observations into k clusters; each observation is assigned to the cluster with the nearest center (or centroid), which then acts as the representative point or prototype for that cluster. In CFL, K-means is typically run with cosine similarity scores. However, in the presence of model heterogeneity, Mahalanobis distance proves more effective: it accounts for model heterogeneity and decreases model bias by normalizing model vectors according to the data distribution [7]. If all clients had the same parameter importance and features, cosine similarity would be an appropriate choice. In real-world medical applications, however, the importance and features of client models may vary significantly due to differences in data volume, which leads to model heterogeneity. The cosine similarity between two models is measured as follows:

(1)
[TeX:] $$\boldsymbol{d}\left(\Delta \boldsymbol{\theta}_i, \Delta \boldsymbol{\theta}_j\right)=\frac{\left\langle\Delta \boldsymbol{\theta}_i, \Delta \boldsymbol{\theta}_j\right\rangle}{\left\|\Delta \boldsymbol{\theta}_i\right\|\left\|\Delta \boldsymbol{\theta}_j\right\|}$$

where [TeX:] $$\Delta \theta_i \text { and } \Delta \theta_j$$ denote the model parameters of clients i and j, respectively. The similarity ranges over [-1, 1], approaching 1 for high similarity and -1 for low similarity. Cosine similarity, like other similarity measures, is typically applied to data. Under federated learning, however, we have no direct access to client data, only to model parameters. Therefore, we apply cosine similarity between each client's model parameters and the global model.
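Eq. (1) can be sketched on flattened parameter-update vectors as follows; `delta_i` and `delta_j` here are hypothetical flattened updates, not values from the paper's experiments.

```python
import numpy as np

def cosine_similarity(delta_i, delta_j):
    """Eq. (1): cosine similarity between two flattened parameter updates."""
    return float(np.dot(delta_i, delta_j)
                 / (np.linalg.norm(delta_i) * np.linalg.norm(delta_j)))

# Updates in the same direction score 1.0; orthogonal updates score 0.0.
s_same = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
s_orth = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Note that cosine similarity ignores vector magnitude, which is exactly why it cannot reflect differences in client data volume; the proposed method addresses this in Section 3.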

Algorithm 1
Basic clustered federated learning algorithm

3. Proposed Method

To address the problems of model parameter heterogeneity, we propose MD-CFL. Mahalanobis distance is a similarity measure that calculates a specific normalized distance after preprocessing, allowing it to reflect the correlation between variables through data normalization, calculated as:

(2)
[TeX:] $$d\left(\Delta \theta_i, \Delta \theta_j\right)=\left[\left(\Delta \theta_i-\Delta \theta_j\right) S^{-1}\left(\Delta \theta_i-\Delta \theta_j\right)^T\right]^{1 / 2}$$

where S is the covariance matrix of the target model parameters and [TeX:] $$S^{-1}$$ its inverse. Mahalanobis distance offers performance advantages in heterogeneous data conditions because it can adjust for feature skew in non-normalized data. By using this distance, we can also correct the skew in the model features. Therefore, we modified Eq. (2) by incorporating the normalization of the parameters based on the number of samples, as shown in Eq. (3).

This adjustment enables us to reflect the level of importance of the model, normalize heterogeneous model features, and calculate the distance between the model parameters of two specific clients. As a result, this modification enhances both model performance and clustering performance.

(3)
[TeX:] $$\begin{aligned} d(X, Y) & =\left[(X-Y) S^{-1}(X-Y)^T\right]^{1 / 2}, \\ X & =\frac{c_i}{C} \Delta \theta_i, \quad Y=\frac{c_j}{C} \Delta \theta_j, \end{aligned}$$

where [TeX:] $$c_i \text { and } c_j$$ denote the numbers of samples of clients i and j, respectively, and C is the total number of samples in the cluster. After the server clusters the models, we select which clusters to use for aggregation based on the number of clients in each cluster. This approach can skew the global model features toward clusters with a larger number of clients. Challenges may arise when each cluster is weighted during the aggregation process by its number of clients or other metrics. In this paper, we focus only on clusters that include many clients, excluding the remaining clusters from the aggregation.
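A minimal sketch of the distance in Eq. (3), assuming flattened parameter updates and a precomputed covariance matrix of the target model parameters; all numbers are hypothetical.

```python
import numpy as np

def md_distance(delta_i, delta_j, c_i, c_j, c_total, cov):
    """Eq. (3): Mahalanobis distance between sample-weighted parameter updates."""
    x = (c_i / c_total) * delta_i      # weight each update by its share of samples
    y = (c_j / c_total) * delta_j
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# With an identity covariance, the distance reduces to the Euclidean distance
# between the sample-weighted updates (illustrative values only).
d = md_distance(np.array([3.0, 0.0]), np.array([0.0, 4.0]),
                c_i=5, c_j=5, c_total=10, cov=np.eye(2))
```

The c_i/C factor is what lets the distance "reflect the level of importance of the model": a client holding more samples contributes a larger-magnitude vector, while the inverse covariance normalizes away feature skew.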

Algorithm 2 differs from Algorithm 1, in line 14, in that it applies Mahalanobis distance instead of cosine similarity for clustering the client model parameters on the server.

Algorithm 2
Proposed MD-CFL based clustered federated learning algorithm

4. Experiments

4.1 Dataset

This paper focuses on medical analysis data for emotion detection. Such data takes the form of sequential metadata from wearable sensors or multimodal measurements, including audiovisual recordings. Emotion detection is a valuable task in the medical field that impacts the study of human cognitive development. This paper considers two types of emotional data and describes them below.

4.1.1 WESAD data

The WESAD dataset contains stress levels and metadata aggregated through experiments on specific wearable device users [8]. The metadata was collected from devices worn on the wrist and chest. Signals measured from the chest include the 3-axis accelerometer (ACC), electrocardiogram (ECG), electromyogram (EMG), electrodermal activity (EDA), skin temperature (Temp), and respiration (Resp), all sampled at 700 Hz. We use the WESAD data measured from the chest of specific users, and all users' metadata are time series. To select input variables for stress detection, where stress is the output variable of the AI model, we compute a correlation index between each candidate input and the output. The correlation index indicates how strongly an input variable affects the deep learning result and hence its importance for training.
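The correlation-based variable selection described above can be sketched as follows. The signals are toy stand-ins for the chest-worn channels, and the values and the strong/weak contrast are illustrative only, not the real WESAD measurements.

```python
import numpy as np

def correlation_with_target(signals, target):
    """Pearson correlation of each input channel with the stress label."""
    return {name: float(np.corrcoef(x, target)[0, 1])
            for name, x in signals.items()}

# Hypothetical stand-ins for two chest-worn channels.
rng = np.random.default_rng(0)
stress = rng.integers(0, 2, size=200).astype(float)   # binary stress label
signals = {
    "EDA": stress + 0.1 * rng.standard_normal(200),   # strongly related channel
    "Temp": rng.standard_normal(200),                 # unrelated noise channel
}
corr = correlation_with_target(signals, stress)
```

Channels whose absolute correlation with the label is high would be kept as model inputs; this mirrors the selection of ACC, EDA, and Temp in Fig. 2(a).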

Fig. 2.

Data characteristics of WESAD: (a) correlation between input variables and stress, (b) time-series data of selected data.

Fig. 2(a) shows that ACC, EDA, and Temp are important variables for stress detection. Fig. 2(b) shows the main features of the time series data (top: input data, bottom: stress level). Note that high ACC and EDA values tend to indicate stress reduction, whereas high Temp values tend to indicate increased stress. Data correlation analysis is used to select significant variables for stress detection and to analyze the user's stress pattern as the input variables change.

4.1.2 K-EmoCon

K-EmoCon is a rich, multimodal dataset that provides detailed, continuous emotion annotations recorded during natural conversations. The dataset includes various multimodal measurements such as audiovisual recordings, EEG data, and peripheral physiological signals, collected with commercially accessible equipment across 16 sessions of roughly 10-minute paired debates on social issues. Notably, it provides emotion annotations from three distinct perspectives: self-assessment, assessment by the debate partner, and by an external observer. While watching the debate footage, raters annotated the emotional expressions every 5 seconds, evaluating them in terms of arousal-valence as well as 18 other categorical emotions. As a result, K-EmoCon is the first publicly available dataset that supports multi-perspective emotion assessments within social interactions [9]. In this paper, ACC, EDA, and Temp from K-EmoCon are used to classify stressful or non-stressful tasks.

4.2 Neural Network Model

In this paper, the gated recurrent unit (GRU) model, widely used in time series analysis, was adopted to capture the features of time series data. GRU is a type of recurrent neural network (RNN) and therefore has a structure suited to processing sequential data (e.g., sentences and time series). GRU offers better training stability than earlier RNN models.
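For illustration, a single GRU step can be written out as follows. This is a minimal sketch of the standard gating equations with bias terms omitted; it is not the exact network used in the experiments, and the weight shapes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU step over input x and previous hidden state h (biases omitted)."""
    z = sigmoid(W_z @ x + U_z @ h)               # update gate
    r = sigmoid(W_r @ x + U_r @ h)               # reset gate
    h_cand = np.tanh(W_h @ x + U_h @ (r * h))    # candidate hidden state
    return (1 - z) * h + z * h_cand              # interpolate old and candidate

# With all-zero weights both gates are 0.5 and the candidate is 0, so h halves.
Z = np.zeros((2, 2))
h_new = gru_cell(np.zeros(2), np.array([1.0, 1.0]), Z, Z, Z, Z, Z, Z)
```

The update gate z is what gives the GRU its stability relative to a plain RNN: it lets the cell carry the previous hidden state forward unchanged when the gate stays near zero.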

4.3 Result

As shown in Table 2, CFL algorithms outperform non-CFL algorithms in situations with data heterogeneity. In general, traditional federated learning algorithms such as FedAvg show higher loss than centralized learning (CL), since data collected on a single centralized server exhibits less heterogeneity. However, as discussed above, CL algorithms do not consider privacy issues. The evaluation metrics are defined as:

(4)
[TeX:] $$\begin{gathered} \text { Loss }=-\sum_{i=1}^n y_i \log p\left(x_i\right) \text { (cross entropy) } \\ \text { Accuracy }=\frac{T P+T N}{T P+T N+F P+F N} \\ \text { Precision }=\frac{T P}{T P+F P} \\ \text { Recall }=\frac{T P}{T P+F N} \\ F 1 \text {-score }=2 \times \frac{\text { Precision } \times \text { Recall }}{\text { Precision }+\text { Recall }} \end{gathered}$$

where [TeX:] $$y_i$$ is the true label and [TeX:] $$p(x_i)$$ the predicted probability for sample i; TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, with stress as the positive label and non-stress as the negative label.
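The accuracy, precision, recall, and F1-score in Eq. (4) can be computed directly from confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Eq. (4) metrics from confusion-matrix counts (stress = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)                    # correct among predicted stress
    recall = tp / (tp + fn)                       # stress episodes actually caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 8 stress hits, 5 correct non-stress, 2 false alarms, 5 misses.
acc, prec, rec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=5)
```

F1 is the harmonic mean of precision and recall, which is why it sits well below accuracy in Tables 2 and 3 when the stress class is detected imperfectly.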

When clustered FedAvg based on cosine similarity is applied, models can achieve more accurate performance under heterogeneous data conditions compared to FedAvg (non-CFL). In other words, CFL has been shown to be more effective than non-CFL in situations with data heterogeneity [10].

Table 2.

Stress versus non-stress detection models performance for WESAD
Algorithm Epoch/Round Accuracy (%) F1-score (%) Loss
Centralized learning 60E/- 76.50 43.34 0.8467
FedAvg [1] 2E/30R 75.24 42.93 0.8756
CFL
Cosine based CFL [3] 2E/30R 77.79 43.74 0.8115
MD-CFL (Proposed) 2E/30R 77.79 43.74 0.8044

The bold font indicates the best results for CFL performance.

In Table 2, our MD-CFL achieved the best performance in terms of cross-entropy loss. This is because MD-CFL reduces model heterogeneity and bias, which in turn improves both clustering performance and model performance.

In clustering, the silhouette score is an important metric because it measures how well each client model fits its assigned cluster relative to the other clusters. The higher the score, the more closely the client models follow their cluster's representative model.

Table 3.

Stress versus non-stress detection models performance for K-EmoCon
Algorithm Epoch/Round Accuracy (%) F1-score (%) Loss
Centralized learning 60E/- 64.24 47.72 0.6093
FedAvg [1] 2E/30R 50.22 39.71 0.6110
CFL
Cosine based CFL [3] 2E/30R 49.12 39.59 0.6112
MD-CFL (Proposed) 2E/30R 50.99 40.14 0.6109

The bold font indicates the best results for CFL performance.

Fig. 3.

Comparison of model performance on K-EmoCon: (a) accuracy, (b) precision, and (c) recall.

Table 3 and Fig. 3 show that MD-CFL achieves the best performance in CFL when applied to the K-EmoCon dataset. In this case, the K-EmoCon dataset presents more heterogeneous conditions than WESAD.

(5)
[TeX:] $$S(i)=\frac{b(i)-a(i)}{\max \{a(i), b(i)\}}, \quad a(i)=\underset{j \in C_i, j \neq i}{\operatorname{mean}} d(i, j), \quad b(i)=\min _{C \neq C_i} \underset{j \in C}{\operatorname{mean}} d(i, j)$$

where S(i) is the silhouette score of data point i, d is the distance between vectors, [TeX:] $$C_i$$ is the cluster containing i, and C ranges over the other clusters. The silhouette score is used to evaluate clustering quality [11].
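Eq. (5) can be sketched per point from a pairwise distance matrix and cluster labels. This is a minimal illustration of the silhouette computation, not the authors' implementation; the 1-D toy positions are hypothetical.

```python
import numpy as np

def silhouette(i, labels, dist):
    """Eq. (5): silhouette score of point i from a pairwise distance matrix."""
    own = int(labels[i])
    same = labels == own
    same[i] = False                                   # exclude the point itself
    a = dist[i, same].mean()                          # mean intra-cluster distance
    b = min(dist[i, labels == c].mean()               # nearest other cluster
            for c in set(labels.tolist()) - {own})
    return (b - a) / max(a, b)

# Toy 1-D example: cluster 0 at positions 0 and 1, cluster 1 at position 10.
pos = np.array([0.0, 1.0, 10.0])
dist = np.abs(pos[:, None] - pos[None, :])
labels = np.array([0, 0, 1])
score = silhouette(0, labels, dist)   # a = 1, b = 10, so S = 0.9
```

Scores near 1 indicate tight, well-separated clusters; in our setting the "points" are client model parameter vectors and d is the chosen model distance.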

Fig. 4.

Comparison of silhouette score for (a) WESAD and (b) K-EmoCon.

In Fig. 4, MD-CFL shows a consistent silhouette score pattern because normalizing the model distances reduces the variation in performance. Therefore, our algorithm suffers less from poor clustering and model overfitting than cosine similarity-based algorithms.

5. Conclusion

In this paper, we propose the MD-CFL method to mitigate model heterogeneity problems in federated learning. The proposed method uses Mahalanobis distance and normalizes model parameters to mitigate model heterogeneity. As a result, the client's model parameters help reduce model heterogeneity, which in turn improves both model and clustering performance. In the experiments, the Mahalanobis distance-based method shows a higher silhouette score compared to the cosine-based method in both the WESAD and K-EmoCon datasets. As future work, we plan to consider various parameters related to clustering, such as cluster weights, and to incorporate other attention-based models, such as transformers, to further improve performance.

Conflict of Interest

The authors declare that they have no competing interests.

Funding

This research was supported by MSIT Korea under NRF Korea (RS-2025-00557379, 80%), by the KIAT grant funded by the Korean government (MOTIE) (P0017123, The Competency Development Program for Industry Specialist, 10%), and by the Convergence Security Core Talent Training Business Support Program (IITP-2025-RS-2024-00426853, 10%) supervised by the IITP.

Acknowledgements

A preliminary version of this work, titled “Stress Affect Detection at Wearable Devices via Clustered Federated Learning based on Number of Samples Mahalanobis Distance,” was published in the Proceedings of the Annual Symposium of KIPS in May 2024.

Biography

Tae Hwan Yoon
https://orcid.org/0009-0002-7548-1155

He received his B.Sc. at the School of Computer Science and Engineering, Korea Bible University, Seoul, Korea, in 2017 and 2022. Since March 2023, he has been an M.S. student at the School of Computer Science and Engineering at Soongsil University in Korea. His current research interests include distributed artificial intelligence, medical data analysis, security, and LLM.

Biography

Bong Jun Choi
https://orcid.org/0000-0002-6550-749X

He is an associate professor at the School of Computer Science & Engineering and jointly at the School of Electronic Engineering, Soongsil University, Seoul, Korea. Previously, he was an assistant professor at the Department of Computer Science, State University of New York Korea, Korea, and concurrently a research assistant professor at the Department of Computer Science, Stony Brook University, USA. He received his B.Sc. and M.Sc. degrees from Yonsei University, Korea, both in Electrical and Electronics Engineering, and his Ph.D. from the University of Waterloo, Canada, in Electrical and Computer Engineering. His research focuses on distributed artificial intelligence, intelligent energy networks, federated learning, and security. He is a senior member of IEEE and a member of ACM.

References

  • 1 B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," Proceedings of Machine Learning Research, vol. 54, pp. 1273-1282, 2017. https://proceedings.mlr.press/v54/mcmahan17a
  • 2 S. Jiang, Y. Li, F. Firouzi, and K. Chakrabarty, "Federated clustered multi-domain learning for health monitoring," Scientific Reports, vol. 14, no. 1, article no. 903, 2024. https://doi.org/10.1038/s41598-024-51344-9
  • 3 S. Jiang, F. Firouzi, and K. Chakrabarty, "Low-overhead clustered federated learning for personalized stress monitoring," IEEE Internet of Things Journal, vol. 11, no. 3, pp. 4335-4347, 2024. https://doi.org/10.1109/JIOT.2023.3299736
  • 4 J. H. Yoo, H. M. Son, H. Jeong, E. H. Jang, A. Y. Kim, H. Y. Yu, H. J. Jeon, and T. M. Chung, "Personalized federated learning with clustering: non-IID heart rate variability data application," in Proceedings of 2021 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 2021, pp. 1046-1051. https://doi.org/10.1109/ICTC52510.2021.9620852
  • 5 J. H. Yoo, H. Jeong, J. Lee, and T. M. Chung, "Federated learning: issues in medical application," in Future Data and Security Engineering. Cham, Switzerland: Springer, 2021, pp. 3-22. https://doi.org/10.1007/978-3-030-91387-8_1
  • 6 J. H. Yoo, H. Jeong, J. Lee, and T. M. Chung, "Open problems in medical federated learning," International Journal of Web Information Systems, vol. 18, no. 2-3, pp. 77-99, 2022. https://doi.org/10.1108/IJWIS-04-2022-0080
  • 7 M. Lee, "Multivariate spatial cluster analysis using Mahalanobis distance," Journal of the Korean Cartographic Association, vol. 12, no. 2, pp. 37-46, 2012. https://journal.kci.go.kr/jkca/archive/articleView?artiId=ART001692589
  • 8 P. Schmidt, A. Reiss, R. Duerichen, C. Marberger, and K. Van Laerhoven, "Introducing WESAD, a multimodal dataset for wearable stress and affect detection," in Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA, 2018, pp. 400-408. https://doi.org/10.1145/3242969.3242985
  • 9 C. Y. Park, N. Cha, S. Kang, A. Kim, A. H. Khandoker, L. Hadjileontiadis, A. Oh, Y. Jeong, and U. Lee, "K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations," Scientific Data, vol. 7, no. 1, article no. 293, 2020. https://doi.org/10.1038/s41597-020-00630-y
  • 10 F. Sattler, K. R. Muller, and W. Samek, "Clustered federated learning: model-agnostic distributed multitask optimization under privacy constraints," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3710-3722, 2021. https://doi.org/10.1109/TNNLS.2020.3015958
  • 11 G. Vardakas, I. Papakostas, and A. Likas, "Deep clustering using the soft silhouette score: towards compact and well-separated clusters," 2024 (Online). Available: https://arxiv.org/abs/2402.00608