Principal component analysis (PCA) provides an effective statistical approach for exploiting the patterns in CD4 count and viral load data over time. The method uses the feature-extraction properties of PCA to define a new variable, related to adherence to antiretroviral therapy (ART), that captures information from the longitudinal data. It is demonstrated using data from patients who acquired HIV-1 during follow-up in an ART cohort and were subsequently followed prospectively from early infection.
The PCA scores obtained for each patient by this method serve as informative summary statistics for the CD4-count and viral-load trajectories. Like baseline CD4 count or viral load, the first PCA score can be interpreted as a single-value summary of an individual’s overall response to ART. Unlike most single-value summaries of CD4-count or viral-load trajectories, however, the first PCA score summarizes the dynamics of these quantities and reveals specific features of the trajectories associated with adherence to ART. Moreover, PCA scores can be a more powerful prognostic factor than other common summaries when used in predictive analysis.
Doing a PCA under tidy principles requires running the function PCA() from the FactoMineR package on the matrix of scaled numeric predictor variables, and then visualizing the result using the factoextra package.
In general, when performing PCA, three things are of interest: 1) the data in PC coordinates; 2) the rotation matrix; and 3) the variance explained by each principal component (PC).
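library(FactoMineR)  # PCA()
library(factoextra)  # get_eigenvalue() and plotting helpers
data(viral_new, package = "viruslearner")
# scale.unit = TRUE is the PCA() default, so the variables are standardized
res_pca <- PCA(viral_new, graph = FALSE)
print(res_pca)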
#> **Results for the Principal Component Analysis (PCA)**
#> The analysis was performed on 87 individuals, described by 9 variables
#> *The results are available in the following objects:
#>
#> name description
#> 1 "$eig" "eigenvalues"
#> 2 "$var" "results for the variables"
#> 3 "$var$coord" "coord. for the variables"
#> 4 "$var$cor" "correlations variables - dimensions"
#> 5 "$var$cos2" "cos2 for the variables"
#> 6 "$var$contrib" "contributions of the variables"
#> 7 "$ind" "results for the individuals"
#> 8 "$ind$coord" "coord. for the individuals"
#> 9 "$ind$cos2" "cos2 for the individuals"
#> 10 "$ind$contrib" "contributions of the individuals"
#> 11 "$call" "summary statistics"
#> 12 "$call$centre" "mean of the variables"
#> 13 "$call$ecart.type" "standard error of the variables"
#> 14 "$call$row.w" "weights for the individuals"
#> 15 "$call$col.w" "weights for the variables"
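As a rough cross-check of where the three quantities listed above live, the following sketch uses base R's prcomp() in place of FactoMineR's PCA(). The data are simulated and the column names are hypothetical stand-ins, not the cohort data, so the numbers carry no clinical meaning.

```r
# Simulated data; column names are hypothetical stand-ins for the cohort variables.
set.seed(123)
toy <- data.frame(
  cd_early = rnorm(30, mean = 500, sd = 120),
  cd_late  = rnorm(30, mean = 600, sd = 120),
  vl_early = rnorm(30, mean = 4.5, sd = 1),
  vl_late  = rnorm(30, mean = 3.0, sd = 1)
)

# Centre and scale so that every variable has unit variance before rotation.
fit <- prcomp(toy, center = TRUE, scale. = TRUE)

scores   <- fit$x                          # 1) data in PC coordinates (30 x 4)
rotation <- fit$rotation                   # 2) rotation matrix of unit-length eigenvectors (4 x 4)
var_expl <- fit$sdev^2 / sum(fit$sdev^2)   # 3) proportion of variance per PC

round(var_expl, 3)
```

In prcomp() the rotation matrix holds unit-length eigenvectors, which is worth keeping in mind when comparing against the variable coordinates reported by FactoMineR.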
Looking at the data in PC coordinates requires combining the PC coordinates with the original dataset. This is done via the res_pca$ind$coord object. The columns containing the fitted coordinates are called Dim.1, Dim.2, etc.
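The combining step can be sketched as a column bind of the original data and the scores. The example below is a minimal illustration that assumes simulated data with hypothetical column names and uses base R's prcomp(), whose score columns are named PC1, PC2, ... rather than Dim.1, Dim.2, ...

```r
# Simulated stand-in for the cohort data; values and names are hypothetical.
set.seed(1)
toy <- data.frame(
  cd_a = rnorm(20, mean = 500, sd = 100),
  cd_b = rnorm(20, mean = 550, sd = 100),
  vl_a = rnorm(20, mean = 4, sd = 1)
)

# PCA on standardized variables.
fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# One row of scores per individual, in the same row order as the input,
# so a plain cbind() keeps each patient's data and scores aligned.
scores   <- as.data.frame(fit$x)
combined <- cbind(toy, scores)

head(combined)
```

With the FactoMineR fit used in this section, the analogous call would presumably be cbind(viral_new, res_pca$ind$coord), since the coordinates are returned in the row order of the input.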
The transformation of variables shown in Figure @ref(fig:plotcoords), in which PCA scores are prognostic factors, reflects the clinical belief that adherence to ART may be affected by changes and other patterns in CD4 counts and viral loads over time. The approach applies PCA to data from the entire cohort to extract the primary structure in individual trajectories. It provides a concise summary of each trajectory (the PCA score) that reveals clinical features of the CD4 and viral load trajectories and serves as a transformed predictor variable for ART adherence.
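The rotation matrix is stored as the res_pca$var$coord object. Strictly, these are the variable coordinates (loadings): each eigenvector scaled by the square root of its eigenvalue, which for standardized variables equals the correlation between the variable and the PC. They are essential for understanding the relationship between original variables and principal components.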
res_pca$var$coord
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> cd_2018 0.27870555 -0.04062273 -0.26163183 0.873137932 0.21538696
#> cd_2019 0.64518533 0.03420379 -0.12376197 0.065834990 0.24795163
#> vl_2019 -0.07573479 -0.07003117 0.78505366 0.031053421 0.60176292
#> cd_2021 0.76117069 0.13314906 0.25218852 -0.036074808 -0.08770293
#> vl_2021 -0.34552296 0.74720041 -0.07602424 0.071507915 0.08454595
#> cd_2022 0.85936771 0.18392938 0.10428379 -0.005222839 -0.15593741
#> vl_2022 -0.17860498 0.82464513 0.11759695 0.051856613 -0.02878516
#> cd_2023 0.86609530 0.11313863 0.11020329 -0.100126342 -0.15036423
#> vl_2023 -0.30954010 -0.10341100 0.54940982 0.422680202 -0.58356162
A negative value indicates an inverse relationship between the variable and the PC, and a positive value indicates a positive relationship. Patients with lower baseline CD4 counts contribute more to the negative direction of the PCs, while patients with higher baseline viral loads contribute more to the positive direction.
Figure @ref(fig:plotrot) shows the rotation matrix as a plot. The correlation between a variable and a PC is used as the coordinate of the variable on that PC. The correlation circle plot shows the relationships between all variables: positively correlated variables are grouped together, negatively correlated variables are positioned on opposite sides of the plot origin (opposed quadrants), and the distance between a variable and the origin measures the quality of its representation on the factor map.
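The quantities behind the correlation circle can be recomputed by hand from the printed coordinates. The sketch below copies the Dim.1 and Dim.2 columns of the res_pca$var$coord output and assumes they come from a PCA on standardized variables. Under that assumption the squared coordinates give each variable's quality of representation, and their column sums can be compared with the first two eigenvalues reported for this fit (2.8147923 and 1.3211254).

```r
# Dim.1 and Dim.2 coordinates copied from the res_pca$var$coord output.
coord <- rbind(
  cd_2018 = c( 0.27870555, -0.04062273),
  cd_2019 = c( 0.64518533,  0.03420379),
  vl_2019 = c(-0.07573479, -0.07003117),
  cd_2021 = c( 0.76117069,  0.13314906),
  vl_2021 = c(-0.34552296,  0.74720041),
  cd_2022 = c( 0.85936771,  0.18392938),
  vl_2022 = c(-0.17860498,  0.82464513),
  cd_2023 = c( 0.86609530,  0.11313863),
  vl_2023 = c(-0.30954010, -0.10341100)
)
colnames(coord) <- c("Dim.1", "Dim.2")

# Squared coordinates: share of each variable's variance carried by each PC.
cos2 <- coord^2

# Column sums of the squared coordinates, one value per PC.
eig_from_coord <- colSums(cos2)

# Squared distance from the origin on the Dim.1-Dim.2 plane (at most 1).
quality <- rowSums(cos2)

round(eig_from_coord, 4)
sort(round(quality, 3), decreasing = TRUE)
```

Variables close to the unit circle (here the later CD4 counts and the 2021 and 2022 viral loads) are well represented by the first two PCs, whereas variables near the origin are mostly described by later components.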
Patients with lower baseline values are positioned more towards the negative direction of PC1 and patients with higher baseline values are positioned more towards the positive direction of PC1. Patients with slightly decreasing baseline values are positioned more towards the negative direction of PC2, and those with slightly increasing baseline values are positioned more towards the positive direction of PC2.
Patients with similar patterns in CD4 counts and viral loads are likely to cluster together in the scatter plot of PC1 against PC2, with the direction of the contribution values indicating how each variable influences the position of patients along the PC1 and PC2 axes. Lower baseline CD4 counts, higher baseline viral loads, decreasing CD4 counts, and decreasing viral loads tend to be grouped together based on the negative directions of PC1 and PC2. Conversely, slightly decreasing baseline CD4 counts, slightly increasing baseline viral loads, decreasing CD4 counts, and increasing viral loads are grouped together based on the positive directions of PC1 and PC2.
The variance explained by each PC can be extracted via the function get_eigenvalue() from the factoextra package.
eig_val <- get_eigenvalue(res_pca)
eig_val
#> eigenvalue variance.percent cumulative.variance.percent
#> Dim.1 2.8147923 31.275470 31.27547
#> Dim.2 1.3211254 14.679171 45.95464
#> Dim.3 1.1081563 12.312848 58.26749
#> Dim.4 0.9654834 10.727593 68.99508
#> Dim.5 0.8731286 9.701429 78.69651
#> Dim.6 0.6741441 7.490490 86.18700
#> Dim.7 0.5923136 6.581263 92.76826
#> Dim.8 0.4196698 4.662998 97.43126
#> Dim.9 0.2311864 2.568738 100.00000
The variance explained is also shown as a plot in Figure @ref(fig:plotvarpc).
The first component captures 31.3% of the variation in the data, and together with the second component approximately 46% of the variability is captured; reaching approximately 80% takes the first five components. Eigenvalues greater than 1 indicate PCs that account for more variance than one of the original variables in standardized data, so this is a commonly used cutoff for deciding which PCs to retain. Here the first three PCs meet it.
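The percentages and the eigenvalue cutoff can be recomputed directly from the eigenvalue column of the table. The sketch below copies those nine eigenvalues and uses only base R; the variable names are illustrative.

```r
# Eigenvalues copied from the get_eigenvalue() output for Dim.1 to Dim.9.
eig <- c(2.8147923, 1.3211254, 1.1081563, 0.9654834, 0.8731286,
         0.6741441, 0.5923136, 0.4196698, 0.2311864)

# With 9 standardized variables the eigenvalues sum to (about) 9,
# so each percentage is the eigenvalue divided by that total.
pct     <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

# Number of PCs with eigenvalue above 1.
n_keep <- sum(eig > 1)

data.frame(
  eigenvalue = eig,
  variance.percent = round(pct, 2),
  cumulative.variance.percent = round(cum_pct, 2),
  row.names = paste0("Dim.", seq_along(eig))
)
n_keep
```

The eigenvalue cutoff is a rule of thumb; a scree plot or a target for cumulative variance may suggest keeping a different number of components.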
The association of the outcome features (cd_2022 and vl_2022) of the data with the first two principal components is shown in the scatter plots of Figure @ref(fig:outvspc).
The extracted coordinates for the individual observations serve as the transformed variable for scoring the adherence to ART. These values are stored in the res_pca$ind$coord object.
res_pca$ind$coord
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> 1 -0.98870520 -0.5405155102 -0.383502511 0.11758005 0.3573502180
#> 2 -0.68151256 -0.3917371875 -0.536657820 0.63130683 0.2126490172
#> 3 0.57755011 0.0003973339 -0.100869418 -0.09672156 -0.0800287624
#> 4 2.43507136 0.4466142907 0.699071390 -0.52886808 -0.6855489745
#> 5 0.19217912 -0.1142184061 -0.028523841 0.21826272 -0.1810635238
#> 6 -1.51331100 -0.5031996885 -0.448863209 -0.04448105 0.2568046717
#> 7 1.40048059 -0.5320656166 -2.337653919 7.53067393 1.9670461536
#> 8 1.55749377 0.1253492882 0.284136941 -0.07535803 -0.2796376847
#> 9 -1.64727635 -0.5781447220 -0.482786717 -0.27219243 0.1612452440
#> 10 -1.67709265 -0.6461470595 -0.616179382 -0.36691068 0.4267772631
#> 11 -2.68138741 -0.7014226723 -0.533090186 -0.42818750 -0.0446504283
#> 12 -0.34904282 -0.3529301054 -0.553990858 -0.11787367 0.4174126575
#> 13 3.05438607 0.3235781005 -0.323566625 0.40517362 0.1227363909
#> 14 -1.31307839 0.5159140137 0.900200003 -0.16693948 1.1948880288
#> 15 0.41711257 -0.1168868326 -0.043978876 -0.17905349 -0.1864074227
#> 16 0.40319571 -0.1713163002 -0.557921669 0.61280572 0.6199189632
#> 17 -1.44667518 0.0311466462 -0.224201788 -0.37717503 0.1422834188
#> 18 1.32279186 0.0716430706 -0.023473630 0.11602289 -0.0850725955
#> 19 -1.18967950 -0.4688077427 -0.384614984 -0.31641020 0.0140775889
#> 20 2.44633402 0.2508076295 0.039074793 0.18016519 0.0383323379
#> 21 0.62139003 0.0502922717 0.322809044 -0.81226127 -0.7921883905
#> 22 0.38001562 -0.1303155323 -0.250956909 0.62257892 -0.0560720836
#> 23 0.48687394 0.0806233493 -0.027685311 -0.51955124 -0.3001967630
#> 24 -1.79190896 -0.5895877612 -0.429458041 -0.47314270 -0.0829786697
#> 25 0.12407424 -0.3117160968 -0.296564871 -0.04460880 0.2570393396
#> 26 1.55601245 -0.0069240447 -0.653396942 0.36426775 0.5093770890
#> 27 0.22951246 -0.1510644844 0.019957383 -0.28813007 -0.1457492868
#> 28 0.57390223 0.0046820600 0.046540072 -0.43762365 -0.3902294005
#> 29 0.73309854 -0.1101504847 -0.276041977 1.01379683 -0.0271487190
#> 30 -4.62647652 -1.5830336096 5.353325336 3.81577233 -5.0454807764
#> 31 0.53180432 -0.0395805398 0.115362542 -0.72131663 -0.4635237465
#> 32 -0.87666139 -0.4228828798 -0.276286449 -0.21336956 0.0129220628
#> 33 1.77438416 0.1777073962 -0.069467327 0.48157466 -0.0812141341
#> 34 0.74548656 -0.0367837441 -0.059557020 0.01273757 -0.0785535028
#> 35 -0.62274769 -0.3441353458 -0.202906963 -0.12698436 -0.0827287253
#> 36 2.36639454 0.2830740834 0.482457920 -0.16748537 -0.4061329146
#> 37 -0.25158190 -0.3276692868 -0.486333352 -0.01520842 0.7739732192
#> 38 -1.31337910 -0.4043508899 -0.129280826 -0.60200783 -0.1982142235
#> 39 -2.14767808 8.3465508538 0.930016658 0.52344357 -0.5234965121
#> 40 0.85515110 0.0266804074 0.265865181 -0.40060584 -0.4398120854
#> 41 0.55270440 -0.0792691160 0.007468191 -0.14163775 -0.2321123096
#> 42 -1.20220290 -0.4900904558 -0.372030506 -0.42609660 0.0352072741
#> 43 -0.83946495 -0.3412849549 -0.161729926 -0.42599418 -0.1711841934
#> 44 0.39343074 -0.1939855459 -0.346757257 -0.13461730 0.2219048244
#> 45 -1.45420619 -0.5016847594 -0.196978869 -0.57656854 -0.0177663974
#> 46 0.48486528 -0.0204741505 -0.069404591 -0.47503675 -0.0887807342
#> 47 -4.91847286 5.4350935095 -1.126008885 0.51424312 0.9472752765
#> 48 1.13479028 0.1370867673 0.264652052 -0.07473402 -0.8316267457
#> 49 2.27879028 0.2770138821 -0.120278510 -0.16561978 -0.4332859531
#> 50 2.71605796 0.2629746063 0.185099586 0.27460016 0.0956020524
#> 51 0.23240636 -0.0369567364 0.339201950 -0.74403402 -0.5957382520
#> 52 -0.40654486 -0.2129938055 0.007364296 -0.51889936 -0.4270906197
#> 53 1.24428239 0.1493784888 -0.031411269 -0.48587751 -0.2972151901
#> 54 -0.29539979 -0.2117351780 -0.012914211 -0.13535101 -0.3680289293
#> 55 -0.83044701 -0.3450922252 -0.229181356 -0.26270782 -0.0421493294
#> 56 -1.05362652 -0.4926583759 -0.453822550 0.07096787 0.1738621313
#> 57 0.01031647 -0.2307256977 -0.338203562 0.22398656 0.1209286166
#> 58 1.60733633 0.1558112991 0.141661709 -0.45410522 -0.2835030517
#> 59 -2.73214608 -0.8254271651 -0.614658499 -0.46792962 0.3046034945
#> 60 0.74642343 -0.0768789780 0.067742274 -0.19745048 -0.2340123804
#> 61 -1.22977262 -0.4115264528 -0.063764598 -0.62886181 -0.2280905536
#> 62 0.71621951 -0.0294710947 -0.012392441 -0.18788655 -0.2850253030
#> 63 -1.23238510 -0.4517404722 -0.547464672 0.07551622 0.1127870993
#> 64 -1.10159090 -0.5078295697 -0.719780641 -0.14225313 0.5863538021
#> 65 0.58735640 -0.0685325227 -0.044355792 -0.26500383 -0.0040050172
#> 66 1.00314451 -0.0288961090 0.071213248 -0.18368757 -0.0972814613
#> 67 0.79295082 0.0678617926 0.112018151 -0.30775994 -0.1565951400
#> 68 -2.71981554 -0.7944818559 -0.695617684 0.03478114 0.0283567255
#> 69 -0.16434052 -0.3126216383 -0.547949389 -0.13846149 0.2903203341
#> 70 3.93328382 0.6072039993 0.313854170 0.52356243 -0.3724959129
#> 71 -2.06398342 -0.6271244583 -0.266960841 -0.58047286 -0.1158176662
#> 72 0.40636505 -0.1103113427 -0.299305868 0.07230995 0.2284518215
#> 73 -0.30923708 0.2077146091 -0.252024774 -0.06269505 0.2561422396
#> 74 -1.90970358 -0.4929084137 -0.131379749 -0.52761419 -0.4296979594
#> 75 1.98265024 0.3976080356 0.850942882 -0.74286984 -1.1541846965
#> 76 -3.09730941 -0.8451052258 -0.605125093 0.06885556 0.3872850584
#> 77 -1.37492155 -0.5191609467 -0.374699021 -0.25565095 0.0003256777
#> 78 3.09281666 0.4757441197 0.492893156 -0.11716075 -0.5384380202
#> 79 2.20764458 0.3445328342 0.095728043 -0.07602299 -0.3573542786
#> 80 -0.86556815 -0.4551266749 0.853133716 -0.25119432 1.1167943694
#> 81 4.36106362 0.6709446328 0.494252043 0.33963788 -0.2263232092
#> 82 -0.72834646 -0.3606536878 -0.358613386 -0.15867011 0.1403901733
#> 83 0.85408000 1.0833476133 0.248846014 -0.28400861 -0.2581958862
#> 84 -0.80503884 -0.5735560973 -1.026307994 0.48184166 1.1058518237
#> 85 -0.67080711 -0.4005282405 -0.354056712 -0.37497694 0.1570103861
#> 86 -0.14576387 -0.5495637998 6.891136868 -0.16009620 5.6521120979
#> 87 1.14561551 0.1666053059 0.214964455 -0.40199106 -0.5442704268
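One way such scores might enter a predictive analysis is as a covariate in a regression model. The sketch below is purely illustrative: the trajectories and the outcome are simulated, all names are hypothetical, and base R's prcomp() and lm() stand in for whatever modelling approach is actually used with the cohort data.

```r
set.seed(42)
n <- 50

# Simulated repeated measurements that share a common latent level per patient.
latent <- rnorm(n)
traj <- data.frame(
  t1 = latent + rnorm(n, sd = 0.5),
  t2 = latent + rnorm(n, sd = 0.5),
  t3 = latent + rnorm(n, sd = 0.5)
)

# Hypothetical outcome that depends on the same latent level.
outcome <- 2 * latent + rnorm(n)

# First PC score as a single-value summary of each simulated trajectory.
fit <- prcomp(traj, center = TRUE, scale. = TRUE)
pc1 <- fit$x[, 1]

# Use the score as the only predictor in a linear model.
model <- lm(outcome ~ pc1)
summary(model)$r.squared
```

The sign of a principal component is arbitrary, so the sign of the fitted slope carries no meaning on its own and should be read together with the loadings.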