1 Introduction

ClustAll is an R package designed for patient stratification in complex diseases. It addresses common challenges encountered in clinical data analysis and provides a versatile framework for identifying patient subgroups.

Patient stratification is essential in biomedical research for understanding disease heterogeneity, identifying prognostic factors, and guiding personalized treatment strategies. The concept underlying ClustAll is that a robust stratification should be reproducible across different clustering methods. To this end, ClustAll combines two distance metrics (correlation-based distance and Gower distance) with three clustering methods (K-means, K-medoids, and hierarchical clustering).
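The reproducibility idea can be illustrated with a minimal base R sketch. This is not ClustAll's internal code: the built-in iris measurements stand in for patient data, and the number of groups (k = 2), the linkage method, and the use of a sample-wise correlation distance are all assumptions made for illustration. The sketch stratifies the same samples with two different methods and cross-tabulates the resulting labels.

```r
# Illustrative sketch only; none of this is ClustAll API.
# The iris measurements are a stand-in for a patient-by-variable matrix.
x <- scale(iris[, 1:4])

# Stratification 1: K-means with k = 2 (k chosen for illustration).
set.seed(1)
km <- kmeans(x, centers = 2, nstart = 10)$cluster

# Stratification 2: hierarchical clustering on a correlation-based
# distance between samples, cut into the same number of groups.
d_cor <- as.dist(1 - cor(t(x)))
hc <- cutree(hclust(d_cor, method = "average"), k = 2)

# Cross-tabulate the two label vectors. When the two methods agree,
# most samples in each row concentrate in a single column.
agreement <- table(kmeans = km, hclust = hc)
print(agreement)
```

A stratification that reappears under several such method combinations is a more credible candidate than one produced by a single algorithm.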

1.1 ClustAll Key Features

  • Handles diverse data, including missing values, mixed data types, and correlated variables.
  • Provides multiple stratification solutions, enabling exploration of different clustering algorithms and parameters.
  • Performs robustness analysis to identify stable and reproducible clusters.
  • Supports validation of clustering results against clinical phenotypes (ground truth), when available.
  • Offers visualization functions for interpreting clustering results and comparing different stratifications.

1.2 Interpreting ClustAll Stratification Output

The name of each ClustAll stratification output contains a letter followed by a number, as in cuts_a_9. The letter denotes the combination of distance metric and clustering method used to generate that stratification (see Table 1). The number identifies the embedding, which is determined by the depth at which the dendrogram of grouped variables was cut.

Table 1: ClustAll Stratification Output Interpretation
Nomenclature  Distance metric  Clustering method
a             Correlation      K-means
b             Correlation      Hierarchical clustering
c             Gower            K-medoids
d             Gower            Hierarchical clustering
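
The naming scheme can be decoded mechanically. The base R sketch below assumes that every name follows the pattern of the cuts_a_9 example (the prefix cuts_, one letter from a to d, then an integer). Both the `nomenclature` lookup table and the `decode_stratification()` helper are hypothetical names introduced here; they are not part of the ClustAll package.

```r
# Lookup table transcribed from Table 1.
nomenclature <- data.frame(
  letter   = c("a", "b", "c", "d"),
  distance = c("Correlation", "Correlation", "Gower", "Gower"),
  method   = c("K-means", "Hierarchical clustering",
               "K-medoids", "Hierarchical clustering"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a ClustAll function): split a name such as
# "cuts_a_9" into its distance metric, clustering method, and embedding.
decode_stratification <- function(name) {
  m <- regmatches(name, regexec("^cuts_([a-d])_([0-9]+)$", name))[[1]]
  if (length(m) != 3) stop("unexpected stratification name: ", name)
  row <- nomenclature[nomenclature$letter == m[2], ]
  list(distance  = row$distance,
       method    = row$method,
       embedding = as.integer(m[3]))
}

# Decode the example name used in the text.
str(decode_stratification("cuts_a_9"))
```

If the package emits names in a different form, only the regular expression needs to change.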