# Multivariate Methods {#sec-multivariate-methods}
In the previous section on [ANOVA](#sec-analysis-of-variance-anova), we examined how to compare means across multiple groups. However, [ANOVA](#sec-analysis-of-variance-anova) primarily deals with a **single response variable**. In many business and financial applications, we often need to analyze multiple interrelated variables simultaneously. For instance:
- In **marketing**, customer purchase behavior, brand perception, and loyalty scores are often studied together.
- In **finance**, portfolio risk assessment involves analyzing correlations between different asset returns.
To handle such cases, we use [multivariate methods](#sec-multivariate-methods), which extend classical statistical techniques to multiple dependent variables. At the core of multivariate analysis lies the [covariance matrix](#sec-covariance-matrix-multivariate), which captures relationships between multiple random variables.
## Basic Understanding
### Multivariate Random Vectors
Let $y_1, \dots, y_p$ be random variables, possibly correlated, with means $\mu_1, \dots, \mu_p$. We define the random vector:
$$
\mathbf{y} =
\begin{bmatrix}
y_1 \\
\vdots \\
y_p
\end{bmatrix}
$$
The expected value (mean vector) is:
$$
E(\mathbf{y}) =
\begin{bmatrix}
\mu_1 \\
\vdots \\
\mu_p
\end{bmatrix}
$$
### Covariance Matrix {#sec-covariance-matrix-multivariate}
The covariance between any two variables $y_i$ and $y_j$ is:
$$
\sigma_{ij} = \text{cov}(y_i, y_j) = E[(y_i - \mu_i)(y_j - \mu_j)]
$$
This leads to the **variance-covariance matrix**, also called the **dispersion matrix**:
$$
\mathbf{\Sigma} = (\sigma_{ij}) =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \dots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \dots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp}
\end{bmatrix}
$$
where $\sigma_{ii} = \text{Var}(y_i)$ represents the variance of $y_i$. Since covariance is symmetric, we have:
$$
\sigma_{ij} = \sigma_{ji}, \quad \forall i, j.
$$
If we consider two random vectors $\mathbf{u}_{p \times 1}$ and $\mathbf{v}_{q \times 1}$ with means $\mu_u$ and $\mu_v$, their **cross-covariance matrix** is:
$$
\mathbf{\Sigma}_{uv} = \text{cov}(\mathbf{u}, \mathbf{v}) = E[(\mathbf{u} - \mu_u)(\mathbf{v} - \mu_v)']
$$
where $\mathbf{\Sigma}_{uv} \neq \mathbf{\Sigma}_{vu}$, but they satisfy:
$$
\mathbf{\Sigma}_{uv} = \mathbf{\Sigma}_{vu}'.
$$
#### Properties of Covariance Matrices
A valid covariance matrix $\mathbf{\Sigma}$ satisfies the following properties:
1. **Symmetry**:\
$$\mathbf{\Sigma}' = \mathbf{\Sigma}.$$
2. **Non-negative definiteness**:\
$$\mathbf{a}'\mathbf{\Sigma} \mathbf{a} \geq 0, \quad \forall \mathbf{a} \in \mathbb{R}^p,$$ which implies that the **eigenvalues** $\lambda_1, \dots, \lambda_p$ satisfy: $$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0.$$
3. **Generalized variance** (determinant of $\mathbf{\Sigma}$):\
$$|\mathbf{\Sigma}| = \lambda_1 \lambda_2 \dots \lambda_p \geq 0.$$
4. **Total variance** (trace of $\mathbf{\Sigma}$):\
$$\text{tr}(\mathbf{\Sigma}) = \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \sigma_{ii}.$$
5. **Positive definiteness** (a common assumption in multivariate analysis):
- All eigenvalues of $\mathbf{\Sigma}$ are strictly positive.
- $\mathbf{\Sigma}$ has an inverse $\mathbf{\Sigma}^{-1}$, satisfying: $$\mathbf{\Sigma}^{-1} \mathbf{\Sigma} = \mathbf{I}_{p \times p} = \mathbf{\Sigma} \mathbf{\Sigma}^{-1}.$$
#### Correlation Matrices
The **correlation matrix** provides a standardized measure of linear relationships between variables. The correlation between two variables $y_i$ and $y_j$ is defined as:
$$
\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii} \sigma_{jj}}}
$$
where $\sigma_{ij}$ is the covariance and $\sigma_{ii}$ and $\sigma_{jj}$ are variances.
Thus, the **correlation matrix** $\mathbf{R}$ is:
$$
\mathbf{R} =
\begin{bmatrix}
\rho_{11} & \rho_{12} & \dots & \rho_{1p} \\
\rho_{21} & \rho_{22} & \dots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \dots & \rho_{pp}
\end{bmatrix}
$$
where $\rho_{ii} = 1$ for all $i$.
Alternatively, the correlation matrix can be expressed as:
$$
\mathbf{R} = [\text{diag}(\mathbf{\Sigma})]^{-1/2} \mathbf{\Sigma} [\text{diag}(\mathbf{\Sigma})]^{-1/2}
$$
where:
- $\text{diag}(\mathbf{\Sigma})$ is a diagonal matrix with elements $\sigma_{ii}$ on the diagonal and zeros elsewhere.
- $\mathbf{A}^{1/2}$ (the square root of a symmetric matrix) is a symmetric matrix satisfying $\mathbf{A} = \mathbf{A}^{1/2} \mathbf{A}^{1/2}$.
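
To make these quantities concrete, the following sketch computes a sample covariance matrix, converts it to a correlation matrix with `cov2cor()` (the matrix form given above), and checks the eigenvalue identities for the generalized and total variance. The `mtcars` columns are used purely for convenience and are not part of the tree data analyzed later in this chapter.

```{r}
# Sample covariance and correlation matrices for three mtcars variables
y <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

Sigma_hat <- cov(y)              # sample covariance (dispersion) matrix
R_hat     <- cov2cor(Sigma_hat)  # diag(S)^(-1/2) %*% S %*% diag(S)^(-1/2)
all.equal(R_hat, cor(y))         # agrees with the correlation matrix

eig_vals <- eigen(Sigma_hat)$values
eig_vals                         # non-negative for a valid covariance matrix

det(Sigma_hat)                   # generalized variance ...
prod(eig_vals)                   # ... equals the product of eigenvalues

sum(diag(Sigma_hat))             # total variance ...
sum(eig_vals)                    # ... equals the sum of eigenvalues
```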
------------------------------------------------------------------------
### Equalities in Expectation and Variance
Let:
- $\mathbf{x}$ and $\mathbf{y}$ be random vectors with means $\mu_x$ and $\mu_y$ and covariance matrices $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_y$.
- $\mathbf{A}$ and $\mathbf{B}$ be matrices of constants, and $\mathbf{c}$ and $\mathbf{d}$ be vectors of constants.
Then the following properties hold:
1. **Expectation transformations**: $$
E(\mathbf{Ay + c}) = \mathbf{A} \mu_y + \mathbf{c}
$$
2. **Variance transformations**: $$
\text{Var}(\mathbf{Ay + c}) = \mathbf{A} \text{Var}(\mathbf{y}) \mathbf{A}' = \mathbf{A \Sigma_y A'}
$$
3. **Covariance of linear transformations**: $$
\text{Cov}(\mathbf{Ay + c}, \mathbf{By + d}) = \mathbf{A \Sigma_y B'}
$$
4. **Expectation of combined variables**: $$
E(\mathbf{Ay + Bx + c}) = \mathbf{A} \mu_y + \mathbf{B} \mu_x + \mathbf{c}
$$
5. **Variance of combined variables**: $$
\text{Var}(\mathbf{Ay + Bx + c}) =
\mathbf{A \Sigma_y A' + B \Sigma_x B' + A \Sigma_{yx} B' + B\Sigma'_{yx}A'}
$$
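
As a quick numerical check of the transformation rules above, the sketch below simulates a trivariate normal sample with `MASS::mvrnorm()` and compares the empirical mean and variance of $\mathbf{Ay + c}$ with $\mathbf{A}\mu_y + \mathbf{c}$ and $\mathbf{A \Sigma_y A'}$; the particular $\mathbf{A}$, $\mathbf{c}$, and parameter values are arbitrary illustrations.

```{r, message = FALSE}
library(MASS)  # for mvrnorm()

set.seed(1)
mu_y    <- c(1, 2, 3)
Sigma_y <- matrix(c(2.0, 0.5, 0.3,
                    0.5, 1.0, 0.2,
                    0.3, 0.2, 1.5), nrow = 3)
y_sim <- mvrnorm(1e5, mu_y, Sigma_y)      # 100,000 draws of y

A  <- matrix(c(1, 0,  1,
               0, 2, -1), nrow = 2, byrow = TRUE)
c0 <- c(1, -1)

z <- sweep(y_sim %*% t(A), 2, c0, "+")    # rows are A y_i + c

colMeans(z)                               # empirical E(Ay + c)
drop(A %*% mu_y + c0)                     # theoretical A mu_y + c

var(z)                                    # empirical Var(Ay + c)
A %*% Sigma_y %*% t(A)                    # theoretical A Sigma_y A'
```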
------------------------------------------------------------------------
### Multivariate Normal Distribution
The **multivariate normal distribution (MVN)** is fundamental in multivariate analysis. Let $\mathbf{y}$ be a multivariate normal random variable with mean $\mu$ and covariance matrix $\mathbf{\Sigma}$. Then its **probability density function (PDF)** is:
$$
f(\mathbf{y}) = \frac{1}{(2\pi)^{p/2} |\mathbf{\Sigma}|^{1/2}}
\exp \left(-\frac{1}{2} (\mathbf{y} - \mu)' \mathbf{\Sigma}^{-1} (\mathbf{y} - \mu) \right).
$$
We denote this distribution as:
$$
\mathbf{y} \sim N_p(\mu, \mathbf{\Sigma}).
$$
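
For reference, a minimal sketch of evaluating this density: the value computed directly from the formula above is compared against `mvtnorm::dmvnorm()`. The `mvtnorm` package is an assumption here (it is not used elsewhere in this chapter), and the parameter values are arbitrary.

```{r, message = FALSE}
library(mvtnorm)  # assumed available; provides dmvnorm()

mu    <- c(0, 0)
Sigma <- matrix(c(1.0, 0.5,
                  0.5, 2.0), nrow = 2)
y0 <- c(1, 1)

# Density evaluated directly from the MVN formula
p_dim <- length(mu)
quad  <- t(y0 - mu) %*% solve(Sigma) %*% (y0 - mu)
dens_manual <- exp(-0.5 * quad) / ((2 * pi)^(p_dim / 2) * sqrt(det(Sigma)))

drop(dens_manual)
dmvnorm(y0, mean = mu, sigma = Sigma)  # should agree
```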
------------------------------------------------------------------------
#### Properties of the Multivariate Normal Distribution
The multivariate normal distribution has several important properties that are fundamental to multivariate statistical methods.
- **Linear Transformations**:\
Let $\mathbf{A}_{r \times p}$ be a fixed matrix. Then:
$$
\mathbf{Ay} \sim N_r (\mathbf{A \mu}, \mathbf{A \Sigma A'})
$$
where $r \leq p$. Additionally, for $\mathbf{A \Sigma A'}$ to be **non-singular**, the rows of $\mathbf{A}$ must be **linearly independent**.
- **Standardization using Precision Matrix**:\
Let $\mathbf{G}$ be a matrix such that:
$$
\mathbf{\Sigma}^{-1} = \mathbf{GG}'
$$
Then:
$$
\mathbf{G'y} \sim N_p(\mathbf{G' \mu}, \mathbf{I})
$$
and:
$$
\mathbf{G'(y-\mu)} \sim N_p (0,\mathbf{I}).
$$
This transformation **whitens** the data, converting it into an identity covariance structure.
- **Linear Combinations**:\
Any fixed linear combination of $y_1, \dots, y_p$, say $\mathbf{c'y}$, follows:
$$
\mathbf{c'y} \sim N_1 (\mathbf{c' \mu}, \mathbf{c' \Sigma c}).
$$
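
The whitening property can be checked numerically. The sketch below takes $\mathbf{G}$ as the transposed Cholesky factor of $\mathbf{\Sigma}^{-1}$, which is one valid choice satisfying $\mathbf{\Sigma}^{-1} = \mathbf{GG}'$; the data are simulated with `MASS::mvrnorm()` and the parameter values are arbitrary.

```{r, message = FALSE}
library(MASS)

set.seed(42)
mu    <- c(2, -1)
Sigma <- matrix(c(4.0, 1.2,
                  1.2, 2.0), nrow = 2)
y <- mvrnorm(5e4, mu, Sigma)

# One choice of G with Sigma^{-1} = G G'
G <- t(chol(solve(Sigma)))

# Rows of z are G'(y_i - mu)
z <- sweep(y, 2, mu) %*% G

colMeans(z)  # approximately the zero vector
var(z)       # approximately the identity matrix
```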
------------------------------------------------------------------------
#### Partitioning the MVN Distribution
Consider a partitioned random vector:
$$
\mathbf{y} =
\begin{bmatrix}
\mathbf{y}_1 \\
\mathbf{y}_2
\end{bmatrix}
\sim
N_p
\left(
\begin{bmatrix}
\mu_1 \\
\mu_2
\end{bmatrix},
\begin{bmatrix}
\mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12} \\
\mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22}
\end{bmatrix}
\right).
$$
where:
- $\mathbf{y}_1$ is $p_1 \times 1$,
- $\mathbf{y}_2$ is $p_2 \times 1$,
- $p_1 + p_2 = p$,
- and $p_1, p_2 \geq 1$.
The marginal distributions of $\mathbf{y}_1$ and $\mathbf{y}_2$ are:
$$
\mathbf{y}_1 \sim N_{p_1}(\mathbf{\mu_1}, \mathbf{\Sigma_{11}})
\quad \text{and} \quad
\mathbf{y}_2 \sim N_{p_2}(\mathbf{\mu_2}, \mathbf{\Sigma_{22}}).
$$
Each component $y_i$ follows:
$$
y_i \sim N_1(\mu_i, \sigma_{ii}).
$$
The **conditional distribution** of $\mathbf{y}_1$ given $\mathbf{y}_2$ is also normal:
$$
\mathbf{y}_1 | \mathbf{y}_2 \sim N_{p_1} \Big(
\mathbf{\mu_1 + \Sigma_{12} \Sigma_{22}^{-1}(y_2 - \mu_2)},
\mathbf{\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}}
\Big).
$$
This equation shows that **knowing** $\mathbf{y}_2$ adjusts the mean of $\mathbf{y}_1$, and the variance is reduced.\
Similarly, the conditional distribution of $\mathbf{y}_2$ given $\mathbf{y}_1$ follows the same structure.
- $\mathbf{y}_1$ and $\mathbf{y}_2$ are **independent** if and only if:
$$
\mathbf{\Sigma}_{12} = 0.
$$
If $\mathbf{y} \sim N(\mathbf{\mu}, \mathbf{\Sigma})$ and $\mathbf{\Sigma}$ is **positive definite**, then:
$$
(\mathbf{y} - \mu)' \mathbf{\Sigma}^{-1} (\mathbf{y} - \mu) \sim \chi^2_p.
$$
This property is essential in **hypothesis testing** and **Mahalanobis distance calculations**.
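
The conditional-distribution formulas can be applied directly. The sketch below partitions a 3-dimensional normal vector into a scalar $\mathbf{y}_1$ and a bivariate $\mathbf{y}_2$ and computes the conditional mean and variance of $\mathbf{y}_1$ given an observed $\mathbf{y}_2$; the parameter values and the observed $\mathbf{y}_2$ are arbitrary illustrations.

```{r}
# Partition y into y1 (first component) and y2 (last two components)
mu    <- c(1, 2, 3)
Sigma <- matrix(c(4.0, 1.0, 0.5,
                  1.0, 2.0, 0.3,
                  0.5, 0.3, 1.5), nrow = 3)

idx1   <- 1
idx2   <- 2:3
y2_obs <- c(2.5, 2.0)   # a hypothetical observed value of y2

S11 <- Sigma[idx1, idx1, drop = FALSE]
S12 <- Sigma[idx1, idx2, drop = FALSE]
S22 <- Sigma[idx2, idx2, drop = FALSE]

# E(y1 | y2) and Var(y1 | y2) from the partitioned-MVN formulas
cond_mean <- mu[idx1] + S12 %*% solve(S22) %*% (y2_obs - mu[idx2])
cond_var  <- S11 - S12 %*% solve(S22) %*% t(S12)

cond_mean
cond_var        # smaller than the marginal variance Sigma[1, 1]
Sigma[1, 1]
```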
------------------------------------------------------------------------
#### Summation of Independent MVN Variables
If $\mathbf{y}_i$ are independent random vectors following:
$$
\mathbf{y}_i \sim N_p (\mathbf{\mu}_i , \mathbf{\Sigma}_i),
$$
then for fixed matrices $\mathbf{A}_{i(m \times p)}$, the sum:
$$
\sum_{i=1}^k \mathbf{A}_i \mathbf{y}_i
$$
follows:
$$
\sum_{i=1}^k \mathbf{A}_i \mathbf{y}_i \sim N_m \Big(
\sum_{i=1}^{k} \mathbf{A}_i \mathbf{\mu}_i, \sum_{i=1}^k \mathbf{A}_i \mathbf{\Sigma}_i \mathbf{A}_i'
\Big).
$$
This property underpins **multivariate regression** and **linear discriminant analysis**.
------------------------------------------------------------------------
#### Multiple Regression
In multivariate analysis, multiple regression extends simple regression to cases where multiple predictor variables influence a response variable. Suppose:
$$
\left(
\begin{array}
{c}
Y \\
\mathbf{x}
\end{array}
\right)
\sim
N_{p+1}
\left(
\left[
\begin{array}
{c}
\mu_y \\
\mathbf{\mu}_x
\end{array}
\right]
,
\left[
\begin{array}
{cc}
\sigma^2_Y & \mathbf{\Sigma}_{yx} \\
\mathbf{\Sigma}_{yx}' & \mathbf{\Sigma}_{xx}
\end{array}
\right]
\right)
$$
where:
- $Y$ is a scalar response variable.
- $\mathbf{x}$ is a $p \times 1$ vector of predictors.
- $\mu_y$ and $\mathbf{\mu}_x$ are the respective means.
- $\sigma_Y^2$ is the variance of $Y$.
- $\mathbf{\Sigma}_{xx}$ is the covariance matrix of $\mathbf{x}$.
- $\mathbf{\Sigma}_{yx}$ is the covariance vector between $Y$ and $\mathbf{x}$.
From the properties of the **multivariate normal distribution**, the conditional expectation of $Y$ given $\mathbf{x}$ is:
$$
\begin{aligned}
E(Y| \mathbf{x}) &= \mu_y + \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1} (\mathbf{x}- \mathbf{\mu}_x) \\
&= \mu_y - \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1} \mathbf{\mu}_x + \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1} \mathbf{x} \\
&= \beta_0 + \mathbf{\beta' x},
\end{aligned}
$$
where:
- $\beta_0 = \mu_y - \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1} \mathbf{\mu}_x$ (intercept).
- $\mathbf{\beta} = (\beta_1, \dots, \beta_p)' = \mathbf{\Sigma}_{xx}^{-1} \mathbf{\Sigma}_{yx}'$ (regression coefficients).
This resembles the least squares estimator:
$$
\mathbf{\beta} = (\mathbf{x'x})^{-1} \mathbf{x'y},
$$
except that here the coefficients are defined through the population covariances rather than estimated from sample data.
The **conditional variance** of $Y$ given $\mathbf{x}$ is:
$$
\text{Var}(Y | \mathbf{x}) = \sigma^2_Y - \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1} \mathbf{\Sigma'}_{yx}.
$$
This shows that knowing $\mathbf{x}$ **reduces uncertainty** in predicting $Y$.
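
The connection between the conditional-mean formula and least squares can be verified on simulated data: the sketch below builds the intercept and slopes from the sample analogues $\bar{y}$, $\bar{\mathbf{x}}$, $\mathbf{S}_{xx}$, and $\mathbf{S}_{yx}$ and compares them with `lm()`. The data-generating values are arbitrary.

```{r, message = FALSE}
library(MASS)

set.seed(2025)
n <- 500
x <- mvrnorm(n, mu = c(0, 0),
             Sigma = matrix(c(1, 0.4, 0.4, 1), nrow = 2))
y <- 1 + 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

S_xx <- cov(x)       # sample analogue of Sigma_xx (p x p)
S_yx <- cov(y, x)    # sample analogue of Sigma_yx (1 x p)

beta_hat  <- solve(S_xx) %*% t(S_yx)                               # slopes
beta0_hat <- mean(y) - drop(S_yx %*% solve(S_xx) %*% colMeans(x))  # intercept

c(beta0_hat, beta_hat)
coef(lm(y ~ x))      # identical estimates
```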
------------------------------------------------------------------------
#### Samples from Multivariate Normal Populations
Suppose we have a random sample of size $n$, denoted as:
$$
\mathbf{y}_1, \dots, \mathbf{y}_n \sim N_p (\mathbf{\mu}, \mathbf{\Sigma}).
$$
Then:
1. **Sample Mean**: The sample mean is given by:
$$
\bar{\mathbf{y}} = \frac{1}{n} \sum_{i=1}^n \mathbf{y}_i.
$$
Since $\mathbf{y}_i$ are independent and identically distributed (iid), it follows that:
$$
\bar{\mathbf{y}} \sim N_p (\mathbf{\mu}, \mathbf{\Sigma} / n).
$$
This implies that $\bar{\mathbf{y}}$ is an unbiased estimator of $\mathbf{\mu}$.
2. **Sample Covariance Matrix**: The $p \times p$ sample variance-covariance matrix is:
$$
\mathbf{S} = \frac{1}{n-1} \sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})'.
$$
Expanding this:
$$
\mathbf{S} = \frac{1}{n-1} \left( \sum_{i=1}^n \mathbf{y}_i \mathbf{y}_i' - n \bar{\mathbf{y}} \bar{\mathbf{y}}' \right).
$$
- $\mathbf{S}$ is symmetric.
- $\mathbf{S}$ is an unbiased estimator of $\mathbf{\Sigma}$.
- $\mathbf{S}$ contains $p(p+1)/2$ unique random variables.
3. **Wishart Distribution**: The scaled sample covariance matrix follows a Wishart distribution:
$$
(n-1) \mathbf{S} \sim W_p(n-1, \mathbf{\Sigma}).
$$
where:
- $W_p(n-1, \mathbf{\Sigma})$ is a Wishart distribution with $n-1$ degrees of freedom.
- $E[(n-1) \mathbf{S}] = (n-1) \mathbf{\Sigma}$.
The Wishart distribution is a multivariate generalization of the chi-square distribution.
4. **Independence of** $\bar{\mathbf{y}}$ and $\mathbf{S}$: The sample mean $\bar{\mathbf{y}}$ and sample covariance matrix $\mathbf{S}$ are independent:
$$
\bar{\mathbf{y}} \perp \mathbf{S}.
$$
This result is crucial for inference in multivariate hypothesis testing.
5. **Sufficiency of** $\bar{\mathbf{y}}$ and $\mathbf{S}$: The pair $(\bar{\mathbf{y}}, \mathbf{S})$ are sufficient statistics for $(\mathbf{\mu}, \mathbf{\Sigma})$.\
That is, all the information about $\mathbf{\mu}$ and $\mathbf{\Sigma}$ in the sample is contained in $\bar{\mathbf{y}}$ and $\mathbf{S}$, regardless of sample size.
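
A short simulation sketch of points 1-3 above: repeated samples are drawn from a fixed bivariate normal population, and the sampling variance of $\bar{\mathbf{y}}$ and the average of $\mathbf{S}$ are compared with $\mathbf{\Sigma}/n$ and $\mathbf{\Sigma}$. The population values and number of replications are arbitrary choices.

```{r, message = FALSE}
library(MASS)

set.seed(7)
p     <- 2
n     <- 20
mu    <- c(0, 0)
Sigma <- matrix(c(1.0, 0.6,
                  0.6, 2.0), nrow = p)

# Each column stores (ybar, vec(S)) for one simulated sample
reps <- replicate(2000, {
  y <- mvrnorm(n, mu, Sigma)
  c(colMeans(y), cov(y))
})

ybar_draws <- t(reps[1:p, ])
S_draws    <- reps[-(1:p), ]

var(ybar_draws)                  # sampling variance of ybar ...
Sigma / n                        # ... close to Sigma / n

matrix(rowMeans(S_draws), p, p)  # average S ...
Sigma                            # ... close to Sigma (S is unbiased)
```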
------------------------------------------------------------------------
#### Large Sample Properties
Consider a random sample $\mathbf{y}_1, \dots, \mathbf{y}_n$ drawn from a population with mean $\mathbf{\mu}$ and variance-covariance matrix $\mathbf{\Sigma}$.
**Key Properties**
- **Consistency of Estimators**:
- The sample mean $\bar{\mathbf{y}}$ is a consistent estimator of $\mathbf{\mu}$.
- The sample covariance matrix $\mathbf{S}$ is a consistent estimator of $\mathbf{\Sigma}$.
- Multivariate [Central Limit Theorem]:
- Similar to the univariate case, the sample mean follows approximately:
$$
\sqrt{n}(\bar{\mathbf{y}} - \mu) \dot{\sim} N_p (\mathbf{0}, \mathbf{\Sigma})
$$
This approximation holds when the sample size is large relative to the number of variables ($n \geq 25p$).
- Equivalently, the sample mean follows:
$$
\bar{\mathbf{y}} \dot{\sim} N_p (\mathbf{\mu}, \mathbf{\Sigma} / n).
$$
- **Wald's Theorem**:
- When $n$ is large relative to $p$:
$$
n(\bar{\mathbf{y}} - \mathbf{\mu})' \mathbf{S}^{-1} (\bar{\mathbf{y}} - \mathbf{\mu}) \sim \chi^2_p.
$$
This is useful for hypothesis testing about $\mathbf{\mu}$.
------------------------------------------------------------------------
#### Maximum Likelihood Estimation for MVN
Suppose $\mathbf{y}_1, \dots, \mathbf{y}_n$ are iid random vectors from:
$$
\mathbf{y}_i \sim N_p (\mathbf{\mu}, \mathbf{\Sigma}).
$$
The likelihood function for the sample is:
$$
\begin{aligned}
L(\mathbf{\mu}, \mathbf{\Sigma}) &= \prod_{j=1}^n \left[ \frac{1}{(2\pi)^{p/2}|\mathbf{\Sigma}|^{1/2}}
\exp \left(-\frac{1}{2} (\mathbf{y}_j - \mathbf{\mu})' \mathbf{\Sigma}^{-1} (\mathbf{y}_j - \mathbf{\mu}) \right) \right] \\
&= \frac{1}{(2\pi)^{np/2}|\mathbf{\Sigma}|^{n/2}}
\exp \left(-\frac{1}{2} \sum_{j=1}^n (\mathbf{y}_j - \mathbf{\mu})' \mathbf{\Sigma}^{-1} (\mathbf{y}_j - \mathbf{\mu}) \right).
\end{aligned}
$$
Taking the log-likelihood function and differentiating with respect to $\mathbf{\mu}$ and $\mathbf{\Sigma}$ leads to the maximum likelihood estimators:
The MLE for the mean is simply the sample mean:
$$
\hat{\mathbf{\mu}} = \bar{\mathbf{y}}.
$$
The MLE for the covariance matrix is:
$$
\hat{\mathbf{\Sigma}} = \frac{n-1}{n} \mathbf{S}.
$$
where:
$$
\mathbf{S} = \frac{1}{n-1} \sum_{j=1}^n (\mathbf{y}_j - \bar{\mathbf{y}})(\mathbf{y}_j - \bar{\mathbf{y}})'.
$$
This differs from $\mathbf{S}$ by the factor $\frac{n-1}{n}$, making $\hat{\mathbf{\Sigma}}$ a **biased estimator** of $\mathbf{\Sigma}$.
------------------------------------------------------------------------
##### Properties of Maximum Likelihood Estimators
MLEs have several important theoretical properties:
1. **Invariance**:
- If $\hat{\theta}$ is the MLE of $\theta$, then the MLE of any function $h(\theta)$ is:
$$
h(\hat{\theta}).
$$
2. **Consistency**:
- MLEs are consistent estimators, meaning they converge to the true parameter values as $n \to \infty$.
- However, they can be biased for finite samples.
3. **Efficiency**:
- MLEs are asymptotically efficient, meaning they achieve the Cramér-Rao lower bound for variance in large samples.
- No other estimator has a smaller variance asymptotically.
4. **Asymptotic Normality**:
- Suppose $\hat{\theta}_n$ is the MLE for $\theta$ based on $n$ independent observations.
- Then, for large $n$:
$$
\hat{\theta}_n \dot{\sim} N(\theta, \mathbf{H}^{-1}),
$$
where $\mathbf{H}$ is the [Fisher Information Matrix], defined as:
$$
\mathbf{H}_{ij} = -E\left(\frac{\partial^2 l(\mathbf{\theta})}{\partial \theta_i \partial \theta_j}\right).
$$
- The [Fisher Information Matrix] measures the amount of information in the data about $\theta$.
- It can be estimated by evaluating the second derivatives of the log-likelihood function at $\hat{\theta}_n$.
------------------------------------------------------------------------
##### Likelihood Ratio Testing
MLEs allow us to construct likelihood ratio tests for hypothesis testing.
- Suppose we test a null hypothesis $H_0$:
$$
H_0: \mathbf{\theta} \in \Theta_0 \quad \text{vs.} \quad H_A: \mathbf{\theta} \in \Theta.
$$
- The likelihood ratio statistic is:
$$
\Lambda = \frac{\max_{\theta \in \Theta_0} L(\mathbf{\mu}, \mathbf{\Sigma} | \mathbf{Y})}
{\max_{\theta \in \Theta} L(\mathbf{\mu}, \mathbf{\Sigma} | \mathbf{Y})}.
$$
- Under large sample conditions, we use the Wilks' theorem, which states:
$$
-2 \log \Lambda \sim \chi^2_v,
$$
where:
- $v$ is the difference in the number of parameters between the unrestricted and restricted models.
- This allows us to approximate the distribution of $-2 \log \Lambda$ using the chi-square distribution.
------------------------------------------------------------------------
### Test of Multivariate Normality
Assessing multivariate normality is essential for many statistical techniques, including multivariate regression, principal component analysis, and MANOVA. Below are key methods for testing MVN.
#### Univariate Normality Checks
Before testing for multivariate normality, it is useful to check for univariate normality in each variable separately:
- Normality Assessment: Visual checks (histograms, Q-Q plots) and formal tests (e.g., Shapiro-Wilk, Kolmogorov-Smirnov) can be applied to each variable.
- Key Property: If any univariate distribution is not normal, then the joint multivariate distribution cannot be normal.
- Important Caveat: Even if all univariate distributions are normal, this does not guarantee multivariate normality.
Thus, **univariate normality is a necessary but not sufficient condition** for MVN.
------------------------------------------------------------------------
#### Mardia's Test for Multivariate Normality {#sec-mardias-test-for-multivariate-normality}
@mardia1970measures proposed two measures for assessing MVN:
**1. Multivariate Skewness**
Defined as:
$$
\beta_{1,p} = E\left\{ \left[ (\mathbf{y} - \mathbf{\mu})' \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \right]^3 \right\},
$$
where $\mathbf{x}$ and $\mathbf{y}$ are independent and identically distributed.
**2. Multivariate Kurtosis**
Defined as:
$$
\beta_{2,p} = E\left\{ \left[ (\mathbf{y} - \mathbf{\mu})' \mathbf{\Sigma}^{-1} (\mathbf{y} - \mathbf{\mu}) \right]^2 \right\}.
$$
For a **true multivariate normal distribution**:
$$
\beta_{1,p} = 0, \quad \beta_{2,p} = p(p+2).
$$
**Sample Estimates**
For a random sample of size $n$, we estimate:
$$
\hat{\beta}_{1,p} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} g^3_{ij},
$$
$$
\hat{\beta}_{2,p} = \frac{1}{n} \sum_{i=1}^{n} g^2_{ii},
$$
where:
- $g_{ij} = (\mathbf{y}_i - \bar{\mathbf{y}})' \mathbf{S}^{-1} (\mathbf{y}_j - \bar{\mathbf{y}})$,
- $g_{ii} = d_i^2$, the squared Mahalanobis distance of observation $i$ from the sample mean.
@mardia1970measures derived the following large-sample approximations:
$$
\kappa_1 = \frac{n \hat{\beta}_{1,p}}{6} \dot{\sim} \chi^2_{p(p+1)(p+2)/6},
$$
$$
\kappa_2 = \frac{\hat{\beta}_{2,p} - p(p+2)}{\sqrt{8p(p+2)/n}} \sim N(0,1).
$$
**Interpretation**
- $\kappa_1$ and $\kappa_2$ are test statistics for the null hypothesis of MVN.
- Departures from normality that distort inference about means show up mainly in the skewness measure ($\beta_{1,p}$).
- Departures that distort inference about variances and covariances show up mainly in the kurtosis measure ($\beta_{2,p}$).
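
The sample versions of Mardia's statistics can be computed directly from the definitions above, as in the sketch below (simulated multivariate normal data, so neither test should reject in expectation; the sample size and dimension are arbitrary). Packaged implementations such as `MVN::mvn()`, used later in this section, may apply small-sample corrections, so results can differ slightly.

```{r, message = FALSE}
library(MASS)

set.seed(99)
n <- 200
p <- 3
y <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))

ybar <- colMeans(y)
S    <- cov(y)
Yc   <- sweep(y, 2, ybar)              # centered observations
G    <- Yc %*% solve(S) %*% t(Yc)      # matrix of g_ij values

b1 <- sum(G^3) / n^2                   # sample multivariate skewness
b2 <- sum(diag(G)^2) / n               # sample multivariate kurtosis

kappa1 <- n * b1 / 6
kappa2 <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)

# Approximate p-values under the null hypothesis of MVN
pchisq(kappa1, df = p * (p + 1) * (p + 2) / 6, lower.tail = FALSE)
2 * pnorm(abs(kappa2), lower.tail = FALSE)
```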
------------------------------------------------------------------------
#### Doornik-Hansen Test
- This test transforms variables to approximate normality using skewness and kurtosis corrections [@doornik2008omnibus].
- Recommended when sample sizes are small.
#### Chi-Square Q-Q Plot
The Chi-Square Q-Q plot is a graphical method for assessing MVN:
1. Compute Mahalanobis distances:
$$
d_i^2 = (\mathbf{y}_i - \bar{\mathbf{y}})' \mathbf{S}^{-1} (\mathbf{y}_i - \bar{\mathbf{y}}).
$$
2. The transformed variables:
$$
\mathbf{z}_i = \mathbf{\Sigma}^{-1/2}(\mathbf{y}_i - \mathbf{\mu})
$$
are iid from $N_p(\mathbf{0}, \mathbf{I})$, and thus:
$$
d_i^2 \sim \chi^2_p.
$$
3. Plot ordered $d_i^2$ values against the theoretical quantiles of the $\chi^2_p$ distribution.
**Interpretation**
- If the data are MVN, the plot should resemble a straight line at 45°.
- Deviations suggest non-normality, especially in the tails.
**Limitations**
- Requires a large sample size.
- Even when data are truly MVN, the tails may deviate.
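
A minimal sketch of this plot with base R, using `mahalanobis()` for the distances and `qqplot()` against $\chi^2_p$ quantiles (shown on simulated MVN data for illustration):

```{r, message = FALSE}
library(MASS)

set.seed(11)
p <- 4
y <- mvrnorm(200, mu = rep(0, p), Sigma = diag(p))

# Squared Mahalanobis distances from the sample mean
d2 <- mahalanobis(y, center = colMeans(y), cov = cov(y))

# Ordered distances against theoretical chi-square(p) quantiles
qqplot(qchisq(ppoints(length(d2)), df = p), sort(d2),
       xlab = "Theoretical chi-square quantiles",
       ylab = "Ordered squared Mahalanobis distances")
abline(0, 1)  # 45-degree reference line
```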
------------------------------------------------------------------------
#### Handling Non-Normality
If data **fail** the multivariate normality tests, possible approaches include:
1. **Ignoring non-normality** (acceptable for large samples due to the [Central Limit Theorem]).
2. **Using nonparametric methods** (e.g., permutation tests).
3. **Applying approximate models** (e.g., [Generalized Linear Mixed Models]).
4. **Transforming the data** (e.g., log, Box-Cox, or rank transformations; see Section \@ref(variable-transformation)).
------------------------------------------------------------------------
```{r, message = FALSE}
# Load necessary libraries
library(heplots) # Multivariate hypothesis tests
library(ICSNP) # Multivariate tests
library(MVN) # Multivariate normality tests
library(tidyverse) # Data wrangling & visualization
# Load dataset
trees <- read.table("images/trees.dat")
names(trees) <-
c("Nitrogen", "Phosphorous", "Potassium", "Ash", "Height")
# Structure of dataset
str(trees)
# Summary statistics
summary(trees)
# Pearson correlation matrix
cor(trees, method = "pearson")
# Q-Q plots for each variable
gg <- trees %>%
pivot_longer(everything(), names_to = "Var", values_to = "Value") %>%
ggplot(aes(sample = Value)) +
geom_qq() +
geom_qq_line() +
facet_wrap( ~ Var, scales = "free")
print(gg)
# Shapiro-Wilk test for univariate normality
sw_tests <- apply(trees, MARGIN = 2, FUN = shapiro.test)
sw_tests
# Kolmogorov-Smirnov test for normality
ks_tests <- map(trees, ~ ks.test(scale(.x), "pnorm"))
ks_tests
# Mardia's test for multivariate normality
mardia_test <-
mvn(
trees,
mvnTest = "mardia",
covariance = FALSE,
multivariatePlot = "qq"
)
mardia_test$multivariateNormality
# Doornik-Hansen test
dh_test <-
mvn(
trees,
mvnTest = "dh",
covariance = FALSE,
multivariatePlot = "qq"
)
dh_test$multivariateNormality
# Henze-Zirkler test
hz_test <-
mvn(
trees,
mvnTest = "hz",
covariance = FALSE,
multivariatePlot = "qq"
)
hz_test$multivariateNormality
# Royston's test (only for 3 < obs < 5000)
royston_test <-
mvn(
trees,
mvnTest = "royston",
covariance = FALSE,
multivariatePlot = "qq"
)
royston_test$multivariateNormality
# Energy test
estat_test <-
mvn(
trees,
mvnTest = "energy",
covariance = FALSE,
multivariatePlot = "qq"
)
estat_test$multivariateNormality
```
### Mean Vector Inference
#### Univariate Case
In the univariate normal distribution, we test:
$$
H_0: \mu = \mu_0
$$
using the t-test statistic:
$$
T = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}.
$$
**Decision Rule**
- If $H_0$ is true, then $T$ follows a t-distribution with $n-1$ degrees of freedom.
- We reject $H_0$ if:
$$
|T| > t_{(1-\alpha/2, n-1)}
$$
because an extreme value suggests that observing $\bar{y}$ under $H_0$ is unlikely.
**Alternative Formulation**
Squaring $T$, we obtain:
$$
T^2 = \frac{(\bar{y} - \mu_0)^2}{s^2/n} = n(\bar{y} - \mu_0) (s^2)^{-1} (\bar{y} - \mu_0).
$$
Under $H_0$:
$$
T^2 \sim f_{(1,n-1)}.
$$
This formulation allows for a direct extension to the multivariate case.
------------------------------------------------------------------------
#### Multivariate Generalization: Hotelling's $T^2$ Test
For a p-dimensional mean vector, we test:
$$
\begin{aligned}
&H_0: \mathbf{\mu} = \mathbf{\mu}_0, \\
&H_a: \mathbf{\mu} \neq \mathbf{\mu}_0.
\end{aligned}
$$
Define the Hotelling's $T^2$ test statistic:
$$
T^2 = n(\bar{\mathbf{y}} - \mathbf{\mu}_0)' \mathbf{S}^{-1} (\bar{\mathbf{y}} - \mathbf{\mu}_0).
$$
where:
- $\bar{\mathbf{y}}$ is the sample mean vector,
- $\mathbf{S}$ is the sample covariance matrix,
- $T^2$ can be interpreted as a generalized squared distance between $\bar{\mathbf{y}}$ and $\mathbf{\mu}_0$.
Under multivariate normality, the test statistic follows an F-distribution:
$$
F = \frac{n-p}{(n-1)p} T^2 \sim f_{(p, n-p)}.
$$
We reject $H_0$ if:
$$
F > f_{(1-\alpha, p, n-p)}.
$$
------------------------------------------------------------------------
**Key Properties of Hotelling's** $T^2$ Test
1. **Invariance to Measurement Scale**:
- If we apply a linear transformation to the data:
$$
\mathbf{z} = \mathbf{C} \mathbf{y} + \mathbf{d},
$$
where $\mathbf{C}$ and $\mathbf{d}$ do not depend on $\mathbf{y}$, then:
$$
T^2(\mathbf{z}) = T^2(\mathbf{y}).
$$
This ensures that unit changes (e.g., inches to centimeters) do not affect the test results.
2. **Likelihood Ratio Test**:
- The $T^2$ test can be derived as a likelihood ratio test for $H_0: \mathbf{\mu} = \mathbf{\mu}_0$.
------------------------------------------------------------------------
```{r, message = FALSE}
# Load required packages
library(MASS) # For multivariate analysis
library(ICSNP) # For Hotelling's T^2 test
# Simulated dataset (5 variables, 30 observations)
set.seed(123)
n <- 30 # Sample size
p <- 5 # Number of variables
mu <- rep(0, p) # Population mean vector
Sigma <- diag(p) # Identity covariance matrix
# Generate multivariate normal data
data <- mvrnorm(n, mu, Sigma)
colnames(data) <- paste0("V", 1:p)
# Compute sample mean and covariance
sample_mean <- colMeans(data)
sample_cov <- cov(data)
# Perform Hotelling's T^2 test (testing against mu_0 = rep(0, p))
hotelling_test <- HotellingsT2(data, mu = rep(0, p))
# Print results
print(hotelling_test)
```
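
As a cross-check, the sketch below rebuilds the statistic from its definition using `sample_mean`, `sample_cov`, `n`, and `p` computed in the chunk above, converts it to the F scale, and computes the corresponding p-value; it should agree with the `HotellingsT2()` output.

```{r}
# Manual Hotelling's T^2 for H0: mu = 0, using quantities from the chunk above
mu0 <- rep(0, p)
T2  <- n * t(sample_mean - mu0) %*% solve(sample_cov) %*% (sample_mean - mu0)

F_stat <- (n - p) / ((n - 1) * p) * T2
p_val  <- pf(F_stat, df1 = p, df2 = n - p, lower.tail = FALSE)

c(T2 = drop(T2), F = drop(F_stat), p.value = drop(p_val))
```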
#### Confidence Intervals
##### Confidence Region for the Mean Vector
An exact $100(1-\alpha)\%$ confidence region for the population mean vector $\mathbf{\mu}$ is the set of all vectors $\mathbf{\mu}_0$ that are "close enough" to the observed mean vector $\bar{\mathbf{y}}$ in the sense that:
$$
n(\bar{\mathbf{y}} - \mathbf{\mu}_0)' \mathbf{S}^{-1} (\bar{\mathbf{y}} - \mathbf{\mu}_0) \leq \frac{(n-1)p}{n-p} f_{(1-\alpha, p, n-p)}.
$$
**Interpretation**
- The confidence region consists of all mean vectors $\mathbf{\mu}_0$ for which we fail to reject $H_0$ in the Hotelling's $T^2$ test.
- For $p = 2$ this region is an ellipse; in general it is a $p$-dimensional ellipsoid centered at $\bar{\mathbf{y}}$.
**Why Use Confidence Regions?**
- They provide a joint assessment of plausible values for $\mathbf{\mu}$.
- However, in practice, we often prefer individual confidence intervals for each mean component.
------------------------------------------------------------------------
##### Simultaneous Confidence Intervals
We want simultaneous confidence statements, ensuring that all individual confidence intervals hold simultaneously with high probability.
**Simultaneous Confidence Intervals (General Form)**
By projecting the confidence region onto the coordinate axes, we obtain simultaneous confidence intervals:
$$
\bar{y}_{i} \pm \sqrt{\frac{(n-1)p}{n-p} f_{(1-\alpha, p, n-p)} \frac{s_{ii}}{n}}, \quad \text{for } i = 1, \dots, p.
$$
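
A sketch of these simultaneous intervals for the simulated data from the Hotelling's $T^2$ example above (reusing `data`, `n`, and `p` from that chunk; since the true mean there is $\mathbf{0}$, each interval should typically cover 0):

```{r}
# T^2 simultaneous confidence intervals for each mean component
alpha <- 0.05
ybar  <- colMeans(data)
S     <- cov(data)

crit <- (n - 1) * p / (n - p) * qf(1 - alpha, df1 = p, df2 = n - p)
half <- sqrt(crit * diag(S) / n)

cbind(lower = ybar - half, upper = ybar + half)
```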