# Nonlinear and Generalized Linear Mixed Models {#sec-nonlinear-and-generalized-linear-mixed-models}
Nonlinear Mixed Models (NLMMs) and Generalized Linear Mixed Models (GLMMs) extend traditional models by incorporating both fixed effects and random effects, allowing for greater flexibility in modeling complex data structures.
- NLMMs extend nonlinear models to include both fixed and random effects, accommodating nonlinear relationships in the data.
- GLMMs extend generalized linear models to include random effects, allowing for correlated data and non-constant variance structures.
------------------------------------------------------------------------
## Nonlinear Mixed Models {#sec-nonlinear-mixed-models}
A general form of a nonlinear mixed model is:
$$
Y_{ij} = f(\mathbf{x}_{ij}, \boldsymbol{\theta}, \boldsymbol{\alpha}_i) + \epsilon_{ij}
$$
for the $j$-th response from the $i$-th cluster (or subject), where:
- $i = 1, \ldots, n$ (number of clusters/subjects),
- $j = 1, \ldots, n_i$ (number of observations per cluster),
- $\boldsymbol{\theta}$ represents the fixed effects,
- $\boldsymbol{\alpha}_i$ are the random effects for cluster $i$,
- $\mathbf{x}_{ij}$ are the regressors or design variables,
- $f(\cdot)$ is a nonlinear mean response function,
- $\epsilon_{ij}$ represents the residual error, often assumed to be normally distributed with mean 0.
NLMMs are particularly useful when the relationship between predictors and the response cannot be adequately captured by a linear model.
------------------------------------------------------------------------
## Generalized Linear Mixed Models {#sec-generalized-linear-mixed-models}
GLMMs extend GLMs by incorporating random effects, which allows for modeling data with hierarchical or clustered structures.
The conditional distribution of $y_i$ given the random effects $\boldsymbol{\alpha}_i$ is:
$$
y_i \mid \boldsymbol{\alpha}_i \sim \text{independent } f(y_i \mid \boldsymbol{\alpha})
$$
where $f(y_i \mid \boldsymbol{\alpha})$ belongs to the exponential family of distributions:
$$
f(y_i \mid \boldsymbol{\alpha}) = \exp \left( \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} - c(y_i, \phi) \right)
$$
- $\theta_i$ is the canonical parameter,
- $a(\phi)$ is a dispersion parameter,
- $b(\theta_i)$ and $c(y_i, \phi)$ are specific functions defining the exponential family.
The conditional mean of $y_i$ is related to $\theta_i$ by:
$$
\mu_i = \frac{\partial b(\theta_i)}{\partial \theta_i}
$$
Applying a link function $g(\cdot)$, we relate the mean response to both fixed and random effects:
$$
\begin{aligned}
E(y_i \mid \boldsymbol{\alpha}) &= \mu_i \\
g(\mu_i) &= \mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha}
\end{aligned}
$$
- $g(\cdot)$ is a known link function,
- $\mathbf{x}_i$ and $\mathbf{z}_i$ are design matrices for fixed and random effects, respectively,
- $\boldsymbol{\beta}$ represents fixed effects, and $\boldsymbol{\alpha}$ represents random effects.
We also specify the distribution of the random effects:
$$
\boldsymbol{\alpha} \sim f(\boldsymbol{\alpha})
$$
This distribution is often assumed to be multivariate normal, although other choices can be made when the context warrants it.
------------------------------------------------------------------------
## Relationship Between NLMMs and GLMMs
NLMMs can be viewed as a special case of GLMMs when the inverse link function corresponds to a nonlinear transformation of the linear predictor:
$$
\begin{aligned}
\mathbf{Y}_i &= \mathbf{f}(\mathbf{x}_i, \boldsymbol{\theta}, \boldsymbol{\alpha}_i) + \boldsymbol{\epsilon}_i \\
\mathbf{Y}_i &= g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha}_i) + \boldsymbol{\epsilon}_i
\end{aligned}
$$
Here, $g^{-1}(\cdot)$ represents the inverse link function, corresponding to a nonlinear transformation of the fixed and random effects.
Note:\
We cannot derive an analytical form of the marginal distribution, because a nonlinear combination of normal variables is not itself normally distributed, even when the additive error ($\boldsymbol{\epsilon}_i$) and the random effects ($\boldsymbol{\alpha}_i$) are both normal.
------------------------------------------------------------------------
## Marginal Properties of GLMMs
### Marginal Mean of $y_i$
The marginal mean is obtained by integrating over the distribution of the random effects:
$$
E(y_i) = E_{\boldsymbol{\alpha}}(E(y_i \mid \boldsymbol{\alpha})) = E_{\boldsymbol{\alpha}}(\mu_i) = E\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right)
$$
Since $g^{-1}(\cdot)$ is nonlinear, this expectation cannot be simplified further without specific distributional assumptions.
#### Special Case: Log Link Function
For a log-link function, $g(\mu) = \log(\mu)$, the inverse link is the exponential function:
$$
E(y_i) = E\left(\exp(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right)
$$
Using properties of the moment-generating function (MGF):
$$
E(y_i) = \exp(\mathbf{x}_i' \boldsymbol{\beta}) \cdot E\left(\exp(\mathbf{z}_i' \boldsymbol{\alpha})\right)
$$
Here, $E(\exp(\mathbf{z}_i' \boldsymbol{\alpha}))$ is the MGF of $\boldsymbol{\alpha}$ evaluated at $\mathbf{z}_i$.
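As a quick sanity check of this identity, the following small simulation (with made-up values; a single normal random intercept $\alpha \sim N(0, \sigma^2_{\alpha})$ and $\mathbf{z}_i = 1$) compares the simulated marginal mean with the MGF-based closed form $\exp(\mathbf{x}_i' \boldsymbol{\beta} + \sigma^2_{\alpha}/2)$:

```{r}
# Simulation check of the log-link marginal mean (illustrative values only)
set.seed(123)
beta0  <- 0.5                          # x'beta, here just an intercept
sigma2 <- 0.8                          # variance of the random intercept
alpha  <- rnorm(1e5, 0, sqrt(sigma2))

mean(exp(beta0 + alpha))               # simulated marginal mean E(y)
exp(beta0 + sigma2 / 2)                # MGF-based closed form
```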
------------------------------------------------------------------------
### Marginal Variance of $y_i$
The variance decomposition formula applies:
$$
\begin{aligned}
\operatorname{Var}(y_i) &= \operatorname{Var}_{\boldsymbol{\alpha}}\left(E(y_i \mid \boldsymbol{\alpha})\right) + E_{\boldsymbol{\alpha}}\left(\operatorname{Var}(y_i \mid \boldsymbol{\alpha})\right) \\
&= \operatorname{Var}(\mu_i) + E\left(a(\phi) V(\mu_i)\right)
\end{aligned}
$$
Expressed explicitly:
$$
\operatorname{Var}(y_i) = \operatorname{Var}\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right) + E\left(a(\phi) V\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right)\right)
$$
Without specific assumptions about $g(\cdot)$ and the distribution of $\boldsymbol{\alpha}$, this is the most general form.
------------------------------------------------------------------------
### Marginal Covariance of $\mathbf{y}$
Random effects induce correlation between observations within the same cluster. The covariance between $y_i$ and $y_j$ is:
$$
\begin{aligned}
\operatorname{Cov}(y_i, y_j) &= \operatorname{Cov}_{\boldsymbol{\alpha}}\left(E(y_i \mid \boldsymbol{\alpha}), E(y_j \mid \boldsymbol{\alpha})\right) + E_{\boldsymbol{\alpha}}\left(\operatorname{Cov}(y_i, y_j \mid \boldsymbol{\alpha})\right) \\
&= \operatorname{Cov}(\mu_i, \mu_j) + E(0) \\
&= \operatorname{Cov}\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha}), g^{-1}(\mathbf{x}_j' \boldsymbol{\beta} + \mathbf{z}_j' \boldsymbol{\alpha})\right)
\end{aligned}
$$
The second term vanishes when $y_i$ and $y_j$ are conditionally independent given $\boldsymbol{\alpha}$. This dependency structure is a hallmark of mixed models.
------------------------------------------------------------------------
**Example: Repeated Measurements with a Poisson GLMM**
Consider repeated count measurements for subjects:
- Let $y_{ij}$ be the $j$-th count for subject $i$.
- Assume $y_{ij} \mid \alpha_i \sim \text{independent } \text{Poisson}(\mu_{ij})$.
The model is specified as:
$$
\log(\mu_{ij}) = \mathbf{x}_{ij}' \boldsymbol{\beta} + \alpha_i
$$
where:
- $\alpha_i \sim \text{i.i.d. } N(0, \sigma^2_{\alpha})$ represents subject-specific random effects,
- This is a log-link GLMM with random intercepts for subjects.
The inclusion of $\alpha_i$ accounts for subject-level heterogeneity, capturing unobserved variability across individuals.
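A minimal simulation of this model, fit with `lme4::glmer` (all parameter values below are illustrative), is sketched here:

```{r}
# Simulate a Poisson GLMM with subject-specific random intercepts and fit it
library(lme4)

set.seed(42)
n_subj <- 50                                          # subjects
n_rep  <- 6                                           # repeated counts per subject
dat <- data.frame(
  id = factor(rep(1:n_subj, each = n_rep)),
  x  = rnorm(n_subj * n_rep)
)
alpha <- rep(rnorm(n_subj, sd = 0.5), each = n_rep)   # random intercepts
dat$y <- rpois(nrow(dat), lambda = exp(0.2 + 0.4 * dat$x + alpha))

fit_pois <- glmer(y ~ x + (1 | id), family = poisson, data = dat)
summary(fit_pois)
```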
------------------------------------------------------------------------
## Estimation in Nonlinear and Generalized Linear Mixed Models
In [Linear Mixed Models](#sec-linear-mixed-models), the marginal likelihood of the observed data $\mathbf{y}$ is derived by integrating out the random effects from the hierarchical formulation:
$$
f(\mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\alpha}) \, f(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha}
$$
For LMMs, both component distributions---
- the conditional distribution $f(\mathbf{y} \mid \boldsymbol{\alpha})$, and
- the random effects distribution $f(\boldsymbol{\alpha})$---
are typically assumed to be Gaussian with linear relationships. These assumptions imply that the marginal distribution of $\mathbf{y}$ is also Gaussian, allowing the integral to be solved analytically using properties of the multivariate normal distribution.
In contrast:
- For GLMMs, the conditional distribution $f(\mathbf{y} \mid \boldsymbol{\alpha})$ belongs to the exponential family but is not Gaussian in general.
- For NLMMs, the relationship between the mean response and the random (and fixed) effects is nonlinear, complicating the integral.
In both cases, the marginal likelihood integral:
$$
L(\boldsymbol{\beta}; \mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\alpha}) \, f(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha}
$$
cannot be solved analytically. Consequently, estimation requires:
- [Numerical Integration](#estimation-by-numerical-integration)
- [Linearization of the Model](#sec-estimation-by-linearization-glmm)
- [Bayesian Hierarchical Models](#estimation-by-bayesian-hierarchical-models)
------------------------------------------------------------------------
### Estimation by Numerical Integration {#estimation-by-numerical-integration}
The marginal likelihood for parameter estimation is given by:
$$
L(\boldsymbol{\beta}; \mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\alpha}) \, f(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha}
$$
To estimate the fixed effects $\boldsymbol{\beta}$, we often maximize the log-likelihood:
$$
\ell(\boldsymbol{\beta}) = \log L(\boldsymbol{\beta}; \mathbf{y})
$$
Optimization requires the score function (gradient):
$$
\frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
$$
Since the integral in $L(\boldsymbol{\beta}; \mathbf{y})$ is generally intractable, we rely on numerical techniques to approximate it.
------------------------------------------------------------------------
#### Methods for Numerical Integration
1. Gaussian Quadrature
- Suitable for low-dimensional random effects ($\dim(\boldsymbol{\alpha})$ is small).
- Approximates the integral using weighted sums of function evaluations at specific points (nodes).
- Gauss-Hermite quadrature is commonly used when random effects are normally distributed (a small numerical sketch follows this list).
Limitation: Computational cost grows exponentially with the dimension of $\boldsymbol{\alpha}$ (curse of dimensionality).
2. Laplace Approximation
- Approximates the integral by expanding the log-likelihood around the mode of the integrand (i.e., the most likely value of $\boldsymbol{\alpha}$).
- Provides accurate results for moderate-sized random effects and large sample sizes.
- First-order Laplace approximation is commonly used; higher-order versions improve accuracy but increase complexity.
Key Idea: Approximate the integral as:
$$
\int e^{h(\boldsymbol{\alpha})} d\boldsymbol{\alpha} \approx e^{h(\hat{\boldsymbol{\alpha}})} \sqrt{\frac{(2\pi)^q}{|\mathbf{H}|}}
$$
where:
- $\hat{\boldsymbol{\alpha}}$ is the mode of $h(\boldsymbol{\alpha})$,
- $\mathbf{H}$ is the negative of the Hessian matrix of second derivatives of $h(\boldsymbol{\alpha})$ at $\hat{\boldsymbol{\alpha}}$ (positive definite at the mode),
- $q$ is the dimension of $\boldsymbol{\alpha}$.
3. Monte Carlo Integration
- Uses random sampling to approximate the integral.
- Importance Sampling improves efficiency by sampling from a distribution that better matches the integrand.
- Markov Chain Monte Carlo methods, such as Gibbs sampling or Metropolis-Hastings, are used when the posterior distribution is complex.
Advantage: Scales better with high-dimensional random effects compared to quadrature methods.
Limitation: Computationally intensive, and variance of estimates can be high without careful tuning.
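To make the Gauss-Hermite idea from item 1 concrete, the sketch below (assuming the `statmod` package is available; all data values are made up) approximates the marginal likelihood contribution of a single cluster in a random-intercept Poisson model and compares it with brute-force numerical integration:

```{r}
# Gauss-Hermite approximation of
# L_i = integral over a of prod_j Pois(y_ij | exp(b0 + a)) * N(a | 0, sigma^2) da
library(statmod)

y_i   <- c(2, 4, 3)    # counts for one cluster (illustrative)
b0    <- 1             # fixed intercept, treated as known here
sigma <- 0.5           # SD of the random intercept, treated as known here

gh <- gauss.quad(20, kind = "hermite")   # 20 nodes and weights

# Change of variables a = sqrt(2) * sigma * x maps the N(0, sigma^2) integral
# to the Gauss-Hermite form with weight exp(-x^2)
vals <- sapply(gh$nodes, function(x) {
  a <- sqrt(2) * sigma * x
  prod(dpois(y_i, lambda = exp(b0 + a)))
})
L_gh <- sum(gh$weights * vals) / sqrt(pi)

# Brute-force check with adaptive quadrature
L_int <- integrate(function(a) {
  sapply(a, function(ai) prod(dpois(y_i, exp(b0 + ai)))) * dnorm(a, 0, sigma)
}, -Inf, Inf)$value

c(gauss_hermite = L_gh, integrate = L_int)
```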
------------------------------------------------------------------------
#### Choosing an Integration Method
| Method | Dimensionality | Accuracy | Computational Cost |
|:-----------------|:------------------|:-----------------|:-----------------|
| Gaussian Quadrature | Low-dimensional ($q \leq 3$) | High (with sufficient nodes) | High (exponential growth with $q$) |
| Laplace Approximation | Moderate-dimensional | Moderate to High | Moderate |
| Monte Carlo Methods | High-dimensional | Variable (depends on sample size) | High (but scalable) |
- For small random effect dimensions, quadrature methods are effective.
- For moderate dimensions, Laplace approximation offers a good balance.
- For high dimensions or complex models, Monte Carlo techniques are often the method of choice.
------------------------------------------------------------------------
### Estimation by Linearization {#sec-estimation-by-linearization-glmm}
When estimating parameters in [NLMMs](#sec-nonlinear-mixed-models) and [GLMMs](#sec-generalized-linear-mixed-models), one common and effective approach is linearization. This technique approximates the nonlinear or non-Gaussian components with linear counterparts, enabling the use of standard LMM estimation methods. Linearization not only simplifies the estimation process but also allows for leveraging well-established statistical tools and methods developed for linear models.
#### Concept of Linearization
The core idea is to create a linearized version of the response variable, known as the *working response* or *pseudo-response*, denoted as $\tilde{y}_i$. This pseudo-response is designed to approximate the original nonlinear relationship in a linear form, facilitating easier estimation of model parameters. The conditional mean of this pseudo-response is expressed as:
$$
E(\tilde{y}_i \mid \boldsymbol{\alpha}) = \mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha}
$$
Here:
- $\mathbf{x}_i$ is the design matrix for fixed effects,
- $\boldsymbol{\beta}$ represents the fixed effect parameters,
- $\mathbf{z}_i$ is the design matrix for random effects,
- $\boldsymbol{\alpha}$ denotes the random effects.
In addition to the conditional mean, it is essential to estimate the conditional variance of the pseudo-response to fully characterize the linearized model:
$$
\operatorname{Var}(\tilde{y}_i \mid \boldsymbol{\alpha})
$$
This variance estimation ensures that the model accounts for the inherent variability in the data, maintaining the integrity of statistical inferences.
#### Application of Linearization
Once linearized, the model structure closely resembles that of a [linear mixed model](#sec-linear-mixed-models), allowing us to apply standard estimation techniques from [LMMs](#sec-linear-mixed-models). These techniques include methods such as MLE and REML, which are computationally efficient and statistically robust.
The primary difference between various linearization-based methods lies in how the linearization is performed. This often involves expanding the nonlinear function $f(\mathbf{x}, \boldsymbol{\theta}, \boldsymbol{\alpha})$ or the inverse link function $g^{-1}(\cdot)$. The goal is to approximate these complex functions with simpler linear expressions while retaining as much of the original model's characteristics as possible.
##### Taylor Series Expansion
A widely used method for linearization is the Taylor series expansion. This approach approximates the nonlinear mean function around initial estimates of the random effects. The first-order Taylor series expansion of the nonlinear function is given by:
$$
f(\mathbf{x}_{ij}, \boldsymbol{\theta}, \boldsymbol{\alpha}_i) \approx f(\mathbf{x}_{ij}, \boldsymbol{\theta}, \hat{\boldsymbol{\alpha}}_i) + \frac{\partial f}{\partial \boldsymbol{\alpha}_i} \bigg|_{\hat{\boldsymbol{\alpha}}_i} (\boldsymbol{\alpha}_i - \hat{\boldsymbol{\alpha}}_i)
$$
In this expression:
- $f(\mathbf{x}_{ij}, \boldsymbol{\theta}, \hat{\boldsymbol{\alpha}}_i)$ is the function evaluated at the initial estimates of the random effects $\hat{\boldsymbol{\alpha}}_i$,
- $\frac{\partial f}{\partial \boldsymbol{\alpha}_i} \big|_{\hat{\boldsymbol{\alpha}}_i}$ represents the gradient (or derivative) of the function with respect to the random effects, evaluated at $\hat{\boldsymbol{\alpha}}_i$,
- $(\boldsymbol{\alpha}_i - \hat{\boldsymbol{\alpha}}_i)$ captures the deviation from the initial estimates.
The initial estimates $\hat{\boldsymbol{\alpha}}_i$ are often set to zero for simplicity, especially in the early stages of model fitting. This approximation reduces the model to a linear form, making it amenable to standard LMM estimation techniques.
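As a concrete illustration, the sketch below linearizes a hypothetical exponential-decay mean function $f(x, \boldsymbol{\theta}, \alpha) = \theta_1 \exp(-(\theta_2 + \alpha) x)$ around $\alpha = 0$; all numerical values are made up:

```{r}
# First-order Taylor linearization of a nonlinear mean function around alpha = 0
f <- function(x, theta, alpha) theta[1] * exp(-(theta[2] + alpha) * x)

theta <- c(10, 0.3)                  # "fixed effects" (illustrative)
x     <- seq(0, 10, by = 0.5)
alpha <- 0.1                         # one particular random-effect value

# Analytic derivative of f with respect to alpha, evaluated at alpha = 0
df_dalpha <- -theta[1] * x * exp(-theta[2] * x)

exact  <- f(x, theta, alpha)
linear <- f(x, theta, 0) + df_dalpha * alpha   # first-order approximation

max(abs(exact - linear))             # small error for small alpha
```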
##### Advantages and Considerations
Linearization offers several advantages:
1. Simplified Computation: By transforming complex nonlinear relationships into linear forms, linearization reduces computational complexity.
2. Flexibility: Despite the simplification, linearized models retain the ability to capture key features of the original data structure.
3. Statistical Robustness: The use of established LMM estimation techniques ensures robust parameter estimation.
However, linearization also comes with considerations:
- Approximation Error: The accuracy of the linearized model depends on how well the linear approximation captures the original nonlinear relationship.
- Choice of Expansion Point: The selection of initial estimates $\hat{\boldsymbol{\alpha}}_i$ can influence the quality of the approximation.
- Higher-Order Terms: In cases where the first-order approximation is insufficient, higher-order Taylor series terms may be needed, increasing model complexity.
------------------------------------------------------------------------
#### Penalized Quasi-Likelihood {#penalized-quasi-likelihood}
Penalized Quasi-Likelihood (PQL) is one of the most popular linearization-based estimation methods for [GLMMs](#sec-generalized-linear-mixed-models).
The linearization is achieved through a first-order Taylor expansion of the inverse link function around current estimates of the parameters. The working response at the $k$-th iteration is given by:
$$
\tilde{y}_i^{(k)} = \hat{\eta}_i^{(k-1)} + \left(y_i - \hat{\mu}_i^{(k-1)}\right) \cdot \left.\frac{d \eta}{d \mu}\right|_{\hat{\eta}_i^{(k-1)}}
$$
Where:
- $\eta_i = g(\mu_i)$ is the linear predictor,\
- $\hat{\eta}_i^{(k-1)}$ and $\hat{\mu}_i^{(k-1)}$ are the estimates from the previous iteration $(k-1)$,\
- $g(\cdot)$ is the link function, and $\mu_i = g^{-1}(\eta_i)$.
PQL Estimation Algorithm
1. Initialization:\
Start with initial estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ (commonly set to zeros).
2. Compute the Working Response:\
Use the formula above to compute $\tilde{y}_i^{(k)}$ based on current parameter estimates (see the sketch after this algorithm).
3. Fit a [Linear Mixed Model](#sec-linear-mixed-models):\
Apply standard LMM estimation techniques to the pseudo-response $\tilde{y}_i^{(k)}$ to update estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$.
4. Update Variance Components:\
Estimate $\operatorname{Var}(\tilde{y}_i \mid \boldsymbol{\alpha})$ based on updated parameter estimates.
5. Iteration:\
Repeat steps 2--4 until the estimates converge.
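To make the working-response formula concrete, here is a minimal sketch of a single update for a logit-link model; `y` and `eta_hat` are purely illustrative, and in an actual PQL iteration `eta_hat` would come from the previous LMM fit:

```{r}
# One PQL working-response update for a logistic model (illustrative values)
y       <- c(1, 0, 1, 1, 0)             # binary responses
eta_hat <- c(0.2, -0.5, 0.8, 0.1, -1)   # current linear-predictor estimates
mu_hat  <- plogis(eta_hat)              # inverse logit link

# For the logit link, d eta / d mu = 1 / (mu * (1 - mu))
deta_dmu <- 1 / (mu_hat * (1 - mu_hat))

# Pseudo-response passed to a standard LMM fit in the next step
y_tilde <- eta_hat + (y - mu_hat) * deta_dmu
y_tilde
```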
Comments on PQL:
- Advantages:
- Easy to implement using existing LMM software.
- Fast convergence for many practical datasets.
- Limitations:
- Inference is only asymptotically correct due to the linearization approximation.
- Biased estimates are common, especially:
- For binomial responses with small group sizes,
- In Bernoulli models (worst-case scenario),
- In Poisson models with small counts. [@faraway2016extending]
- Hypothesis testing and confidence intervals can be unreliable.
------------------------------------------------------------------------
#### Generalized Estimating Equations {#generalized-estimating-equations}
Generalized Estimating Equations (GEE) offer an alternative approach to parameter estimation in models with correlated data, particularly for marginal models where the focus is on population-averaged effects rather than subject-specific effects.
GEE estimates are obtained by solving estimating equations rather than maximizing a likelihood function.
Consider a marginal generalized linear model:
$$
\operatorname{logit}(E(\mathbf{y})) = \mathbf{X} \boldsymbol{\beta}
$$
Assuming a working covariance matrix $\mathbf{V}$ for the elements of $\mathbf{y}$, the estimating equation for $\boldsymbol{\beta}$ is:
$$
\mathbf{X}' \mathbf{V}^{-1} (\mathbf{y} - E(\mathbf{y})) = 0
$$
If $\mathbf{V}$ correctly specifies the covariance structure, the estimator is unbiased. In practice, we often assume a simple structure (e.g., independence) and obtain robust standard errors even when the covariance is misspecified.
------------------------------------------------------------------------
##### GEE for Repeated Measures
Let $y_{ij}$ denote the $j$-th measurement on the $i$-th subject, with:
$$
\mathbf{y}_i =
\begin{pmatrix}
y_{i1} \\
\vdots \\
y_{in_i}
\end{pmatrix},
\quad
\boldsymbol{\mu}_i =
\begin{pmatrix}
\mu_{i1} \\
\vdots \\
\mu_{in_i}
\end{pmatrix},
\quad
\mathbf{x}_{ij} =
\begin{pmatrix}
X_{ij1} \\
\vdots \\
X_{ijp}
\end{pmatrix}
$$
Define the working covariance matrix of $\mathbf{y}_i$ as:
$$
\mathbf{V}_i = \operatorname{Cov}(\mathbf{y}_i)
$$
Following [@liang1986longitudinal], the GEE for estimating $\boldsymbol{\beta}$ is:
$$
S(\boldsymbol{\beta}) = \sum_{i=1}^K \frac{\partial \boldsymbol{\mu}_i'}{\partial \boldsymbol{\beta}} \, \mathbf{V}_i^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_i) = 0
$$
Where:
- $K$ is the number of subjects (or clusters),
- $\boldsymbol{\mu}_i = E(\mathbf{y}_i)$,
- $\mathbf{V}_i$ is the working covariance matrix.
------------------------------------------------------------------------
##### Working Correlation Structures
The covariance matrix $\mathbf{V}_i$ is modeled as:
$$
\mathbf{V}_i = a(\phi) \, \mathbf{B}_i^{1/2} \, \mathbf{R}(\boldsymbol{c}) \, \mathbf{B}_i^{1/2}
$$
- $a(\phi)$ is a dispersion parameter,
- $\mathbf{B}_i$ is a diagonal matrix with variance functions $V(\mu_{ij})$ on the diagonal,
- $\mathbf{R}(\boldsymbol{c})$ is the working correlation matrix, parameterized by $\boldsymbol{c}$. If $\mathbf{R}(\boldsymbol{c})$ is the true correlation matrix of $\mathbf{y}_i$, then $\mathbf{V}_i$ is the true covariance matrix.
Common Working Correlation Structures:
- Independence: $\mathbf{R} = \mathbf{I}$ (simplest, but ignores correlation).
- Exchangeable: Constant correlation between all pairs within a cluster.
- Autoregressive (AR(1)): Correlation decreases with time lag.
- Unstructured: Each pair has its own correlation parameter.
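These working correlation structures can be compared directly in R. The sketch below (assuming the `geepack` package is available; the data are simulated, so all names and values are illustrative) fits the same marginal logistic model under two structures:

```{r}
# GEE fits of a marginal logistic model under different working correlations
library(geepack)

set.seed(1)
K  <- 30                                   # clusters
ni <- 5                                    # observations per cluster
dat <- data.frame(
  id   = rep(1:K, each = ni),
  time = rep(1:ni, times = K)
)
u <- rep(rnorm(K, sd = 0.8), each = ni)    # cluster effect inducing correlation
dat$y <- rbinom(nrow(dat), 1, plogis(-0.5 + 0.3 * dat$time + u))

fit_ind <- geeglm(y ~ time, id = id, data = dat,
                  family = binomial, corstr = "independence")
fit_exc <- geeglm(y ~ time, id = id, data = dat,
                  family = binomial, corstr = "exchangeable")

# Coefficients are similar; robust ("sandwich") standard errors protect
# against misspecification of the working correlation
summary(fit_exc)
```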
------------------------------------------------------------------------
##### Iterative Algorithm for GEE Estimation
1. Initialization:
- Compute an initial estimate of $\boldsymbol{\beta}$ using a GLM under the independence assumption ($\mathbf{R} = \mathbf{I}$).
2. Estimate the Working Correlation Matrix:
- Based on residuals from the initial fit, estimate $\mathbf{R}(\boldsymbol{c})$.
3. Update the Covariance Matrix:
- Calculate $\hat{\mathbf{V}}_i$ using the updated working correlation matrix.
4. Update $\boldsymbol{\beta}$:
$$
\boldsymbol{\beta}^{(r+1)} = \boldsymbol{\beta}^{(r)} + \left(\sum_{i=1}^K \frac{\partial \boldsymbol{\mu}_i'}{\partial \boldsymbol{\beta}} \, \hat{\mathbf{V}}_i^{-1} \, \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}} \right)^{-1}
\left( \sum_{i=1}^K \frac{\partial \boldsymbol{\mu}_i'}{\partial \boldsymbol{\beta}} \, \hat{\mathbf{V}}_i^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_i) \right)
$$
5. Iteration:
- Repeat steps 2--4 until convergence (i.e., when changes in $\boldsymbol{\beta}$ are negligible).
------------------------------------------------------------------------
Comments on GEE:
- Advantages:
- Provides consistent estimates of $\boldsymbol{\beta}$ even if the working correlation matrix is misspecified.
- Robust standard errors (also known as "sandwich" estimators) account for potential misspecification.
- Limitations:
- Not a likelihood-based method, so likelihood-ratio tests are not appropriate.
- Efficiency loss if the working correlation matrix is poorly specified.
- Estimation of random effects is not possible---GEE focuses on marginal (population-averaged) effects.
------------------------------------------------------------------------
### Estimation by Bayesian Hierarchical Models {#estimation-by-bayesian-hierarchical-models}
Bayesian methods provide a flexible framework for estimating parameters in [NLMMs](#sec-nonlinear-mixed-models) and [GLMMs](#sec-generalized-linear-mixed-models). Unlike frequentist approaches that rely on MLE or linearization techniques, Bayesian estimation fully incorporates prior information and naturally accounts for uncertainty in both parameter estimation and predictions.
In the Bayesian context, we are interested in the posterior distribution of the model parameters, given the observed data $\mathbf{y}$:
$$
f(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) \, f(\boldsymbol{\alpha}) \, f(\boldsymbol{\beta})
$$
Where:
- $f(\mathbf{y} \mid \boldsymbol{\alpha}, \boldsymbol{\beta})$ is the likelihood of the data,
- $f(\boldsymbol{\alpha})$ is the prior distribution for the random effects,
- $f(\boldsymbol{\beta})$ is the prior distribution for the fixed effects,
- $f(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid \mathbf{y})$ is the posterior distribution, which combines prior beliefs with observed data.
**Advantages of Bayesian Estimation**
- **No Need for Simplifying Approximations:** Bayesian methods do not require linearization or asymptotic approximations.
- **Full Uncertainty Quantification:** Provides credible intervals for parameters and predictive distributions for new data.
- **Flexible Modeling:** Easily accommodates complex hierarchical structures, non-standard distributions, and prior information.
**Computational Challenges**
Despite its advantages, Bayesian estimation can be computationally intensive and complex, especially for high-dimensional models. Key implementation issues include:
1. **Non-Valid Joint Distributions:**\
In some hierarchical models, specifying valid joint distributions for the data, random effects, and parameters can be challenging.
2. **Constraints from Mean-Variance Relationships:**\
The inherent relationship between the mean and variance in GLMMs, combined with random effects, imposes constraints on the marginal covariance structure.
3. **Computational Intensity:**\
Fitting Bayesian models often requires advanced numerical techniques like Markov Chain Monte Carlo, which can be slow to converge, especially for large datasets or complex models.
------------------------------------------------------------------------
#### Bayesian Estimation Methods
Bayesian estimation can proceed through two general approaches:
1. **Approximating the Objective Function (Marginal Likelihood)**
The marginal likelihood is typically intractable because it requires integrating over random effects:
$$
f(\mathbf{y} \mid \boldsymbol{\beta}) = \int f(\mathbf{y} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) \, f(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha}
$$
Since this integral cannot be solved analytically, we approximate it using the following methods:
- **Laplace Approximation**
- Approximates the integral by expanding the log-likelihood around the mode of the integrand.
- Provides an efficient, asymptotically accurate approximation when the posterior is approximately Gaussian near its mode.
- **Quadrature Methods**
- **Gaussian quadrature** (e.g., Gauss-Hermite quadrature) is effective for low-dimensional random effects.
- Approximates the integral by summing weighted evaluations of the function at specific points.
- **Monte Carlo Integration**
- Uses random sampling to approximate the integral.
- **Importance sampling** improves efficiency by drawing samples from a distribution that closely resembles the target distribution.
2. **Approximating the Model (Linearization)**
Alternatively, we can approximate the model itself using **Taylor series linearization** around current estimates of the parameters. This approach simplifies the model, making Bayesian estimation computationally more feasible, though at the cost of some approximation error.
------------------------------------------------------------------------
#### Markov Chain Monte Carlo Methods
The most common approach for fully Bayesian estimation is **MCMC**, which generates samples from the posterior distribution through iterative simulation. Popular MCMC algorithms include:
- **Gibbs Sampling:**\
Efficient when full conditional distributions are available in closed form.
- **Metropolis-Hastings Algorithm:**\
More general and flexible, used when full conditionals are not easily sampled.
- **Hamiltonian Monte Carlo (HMC):**\
Implemented in packages like `Stan`, it provides faster convergence for complex models by leveraging gradient information.
The posterior distribution is then approximated using the generated samples:
$$
f(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid \mathbf{y}) \approx \frac{1}{N} \sum_{i=1}^N \delta(\boldsymbol{\alpha} - \boldsymbol{\alpha}^{(i)}, \boldsymbol{\beta} - \boldsymbol{\beta}^{(i)})
$$
Where $N$ is the number of MCMC samples and $\delta(\cdot)$ is the Dirac delta function.
------------------------------------------------------------------------
### Practical Implementation in R
Several R packages facilitate Bayesian estimation for GLMMs and NLMMs:
- **GLMM Estimation:**
- `MASS::glmmPQL` --- Penalized Quasi-Likelihood for GLMMs.
- `lme4::glmer` --- Frequentist estimation for GLMMs using Laplace approximation.
- `glmmTMB` --- Handles complex random effects structures efficiently.
- **NLMM Estimation:**
- `nlme::nlme` --- Nonlinear mixed-effects modeling.
- `lme4::nlmer` --- Extends `lme4` for nonlinear mixed models.
- `brms::brm` --- Bayesian estimation via `Stan`, supporting NLMMs.
- **Bayesian Estimation:**
- `MCMCglmm` --- Implements MCMC algorithms for GLMMs.
- `brms::brm` --- High-level interface for Bayesian regression models, leveraging `Stan` for efficient MCMC sampling.
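As a quick illustration of the Bayesian route, here is a hedged `brms` sketch for a binomial random-intercept GLMM (not evaluated here, since `brms` compiles a `Stan` model; the prior is illustrative):

```{r, eval=FALSE}
# Bayesian binomial GLMM via brms / Stan (sketch; not run)
library(brms)
data(cbpp, package = "lme4")

bayes_brm <- brm(
  incidence | trials(size) ~ period + (1 | herd),
  data   = cbpp,
  family = binomial(link = "logit"),
  prior  = prior(normal(0, 5), class = "b"),   # weakly informative prior
  chains = 4, iter = 2000, seed = 123
)
summary(bayes_brm)
```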
------------------------------------------------------------------------
**Example: Non-Gaussian Repeated Measurements**
Consider the case of **repeated measurements**:
- **If the data are Gaussian:** Use [Linear Mixed Models](#sec-linear-mixed-models).
- **If the data are non-Gaussian:** Use [Nonlinear and Generalized Linear Mixed Models](#sec-nonlinear-and-generalized-linear-mixed-models).
## Application: Nonlinear and Generalized Linear Mixed Models
### Binomial Data: CBPP Dataset
We will use the **CBPP dataset** from the `lme4` package to demonstrate different estimation approaches for binomial mixed models.
```{r}
library(lme4)
data(cbpp, package = "lme4")
head(cbpp)
```
The data contain information about contagious bovine pleuropneumonia (CBPP) cases across different herds and periods.
1. **Penalized Quasi-Likelihood**
**Pros:**
- Linearizes the response to create a pseudo-response, similar to linear mixed models.
- Computationally efficient.
**Cons:**
- Biased for binary or Poisson data with small counts.
- Random effects must be interpreted on the link scale.
- AIC/BIC values are not interpretable since PQL does not rely on full likelihood.
```{r}
library(MASS)
pql_cbpp <- glmmPQL(
  cbind(incidence, size - incidence) ~ period,
  random = ~ 1 | herd,
  data = cbpp,
  family = binomial(link = "logit"),
  verbose = FALSE
)
summary(pql_cbpp)
```
Interpretation

```{r}
# 0.556 is (approximately) the estimated standard deviation of the herd
# random intercept, on the logit scale, reported in the summary above
exp(0.556)
```

Exponentiating the estimated random-intercept standard deviation shows how strongly the odds of a case vary from herd to herd, after accounting for the fixed effects.

The fixed effects are interpreted as in ordinary logistic regression. For example, with the logit link:

- The **log odds** of having a case in **period 2** are **1.016 lower** than in **period 1** (the baseline); the estimated coefficient is **-1.016**.
```{r}
summary(pql_cbpp)$tTable
```
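Because the link is the logit, exponentiating the fixed-effect estimates gives odds ratios (a one-line check, assuming `nlme::fixef` is used to extract the fixed effects from the `glmmPQL` fit):

```{r}
# Odds ratios for the fixed effects (logit link)
exp(nlme::fixef(pql_cbpp))
```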
2. **Numerical Integration with `glmer`**
**Pros:**
- More accurate estimation since the method directly integrates over random effects.
**Cons:**
- Computationally more expensive, especially with high-dimensional random effects.
- May struggle with convergence for complex models.
```{r}
numint_cbpp <- glmer(
  cbind(incidence, size - incidence) ~ period + (1 | herd),
  data = cbpp,
  family = binomial(link = "logit")
)
summary(numint_cbpp)
```
**Comparing PQL and Numerical Integration**
For small datasets, the difference between PQL and numerical integration may be minimal.
```{r}
library(rbenchmark)
benchmark(
  "PQL (MASS)" = {
    glmmPQL(
      cbind(incidence, size - incidence) ~ period,
      random = ~ 1 | herd,
      data = cbpp,
      family = binomial(link = "logit"),
      verbose = FALSE
    )
  },
  "Numerical Integration (lme4)" = {
    glmer(
      cbind(incidence, size - incidence) ~ period + (1 | herd),
      data = cbpp,
      family = binomial(link = "logit")
    )
  },
  replications = 50,
  columns = c("test", "replications", "elapsed", "relative"),
  order = "relative"
)
```
**Improving Accuracy with Gauss-Hermite Quadrature**
Setting `nAGQ > 1` increases the accuracy of the likelihood approximation:
```{r}
numint_cbpp_GH <- glmer(
  cbind(incidence, size - incidence) ~ period + (1 | herd),
  data = cbpp,
  family = binomial(link = "logit"),
  nAGQ = 20
)
summary(numint_cbpp_GH)$coefficients[, 1] -
  summary(numint_cbpp)$coefficients[, 1]
```
3. **Bayesian Approach with `MCMCglmm`**
**Pros:**
- Incorporates prior information and handles complex models with intractable likelihoods.
- Provides full posterior distributions for parameters.
**Cons:**
- Computationally intensive, especially with large datasets or complex hierarchical structures.
```{r}
library(MCMCglmm)
Bayes_cbpp <- MCMCglmm(
  cbind(incidence, size - incidence) ~ period,
  random = ~ herd,
  data = cbpp,
  family = "multinomial2",
  verbose = FALSE
)
summary(Bayes_cbpp)
```
- `MCMCglmm` fits a residual variance component (useful with dispersion issues).
Variance Component Analysis
```{r}
# posterior standard deviation of each variance component (herd and residual)
apply(Bayes_cbpp$VCV, 2, sd)
```
Posterior Summaries
```{r}
summary(Bayes_cbpp)$solutions
```
MCMC Diagnostics
```{r}
library(lattice)
xyplot(as.mcmc(Bayes_cbpp$Sol), layout = c(2, 2))
```
The trace plots show no trend, indicating that the chains are well mixed.
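Beyond the trace plots, the effective sample size of each fixed-effect chain gives a quick numerical check of mixing (using the `coda` package, which `MCMCglmm` builds on):

```{r}
library(coda)
effectiveSize(as.mcmc(Bayes_cbpp$Sol))   # effective sample sizes for fixed effects
```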
```{r}
xyplot(as.mcmc(Bayes_cbpp$VCV), layout = c(2, 1))
```
For the herd variance component, many of the sampled values are at or near 0, which indicates poor mixing. To address this instability in the herd-effect sampling, we can either:

- Modify the prior distributions, or
- Increase the number of iterations.
```{r}
Bayes_cbpp2 <- MCMCglmm(
  cbind(incidence, size - incidence) ~ period,
  random = ~ herd,
  data = cbpp,
  family = "multinomial2",
  nitt = 20000,
  burnin = 10000,
  prior = list(G = list(list(V = 1, nu = 0.1))),
  verbose = FALSE
)
xyplot(as.mcmc(Bayes_cbpp2$VCV), layout = c(2, 1))
```
To change the shape of the priors in `MCMCglmm`:

- `V` controls the location (scale) of the distribution (default = 1),
- `nu` controls the concentration around `V` (default = 0).
### Count Data: Owl Dataset
We'll now model count data using the **Owl dataset**
```{r}
library(glmmTMB)
library(dplyr)
data(Owls, package = "glmmTMB")
Owls <- Owls %>% rename(Ncalls = SiblingNegotiation)
```
1. **Poisson GLMM**
Modeling call counts with a Poisson distribution:
In a typical Poisson model, the Poisson mean $\lambda$ is modeled as:

$$
\log(\lambda) = \mathbf{x}' \boldsymbol{\beta}
$$

However, if the response variable represents a rate (e.g., counts per **BroodSize**), we can model it as:

$$
\log\left(\frac{\lambda}{b}\right) = \mathbf{x}' \boldsymbol{\beta}
$$

This is equivalent to:

$$
\log(\lambda) = \log(b) + \mathbf{x}' \boldsymbol{\beta}
$$

where $b$ represents **BroodSize**. In this formulation, we "offset" the mean by including the logarithm of $b$ as an offset term in the model. This adjustment accounts for the varying exposure or denominator in rate-based responses.
```{r}
owls_glmer <- glmer(
  Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent + (1 | Nest),
  family = poisson,
  data = Owls
)
summary(owls_glmer)
```
- Nest explains a relatively large proportion of the variability (its standard deviation is larger than some coefficients).
- The model fit isn't great (deviance of 5202 on 594 df).
2. **Negative Binomial Model**
Addressing overdispersion using the negative binomial distribution:
```{r}
owls_glmerNB <- glmer.nb(
  Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent + (1 | Nest),
  data = Owls
)
summary(owls_glmerNB)
```
The negative binomial model improves the fit by accounting for overdispersion.
```{r}
# The distribution of call counts is right-skewed, with many zeros
hist(Owls$Ncalls, breaks = 30)
```
3. **Zero-Inflated Model**
Handling excess zeros with a zero-inflated negative binomial model:
```{r}
library(glmmTMB)
# Baseline: negative binomial GLMM with no zero-inflation
owls_glmm <- glmmTMB(
  Ncalls ~ FoodTreatment * SexParent + offset(log(BroodSize)) +
    (1 | Nest),
  ziformula = ~ 0,
  family = nbinom2(link = "log"),
  data = Owls
)

# Constant (intercept-only) zero-inflation probability
owls_glmm_zi <- glmmTMB(
  Ncalls ~ FoodTreatment * SexParent + offset(log(BroodSize)) +
    (1 | Nest),
  ziformula = ~ 1,
  family = nbinom2(link = "log"),
  data = Owls
)

# Scale arrival time to use as a covariate for the zero-inflation parameter
Owls$ArrivalTime <- scale(Owls$ArrivalTime)

# Zero-inflation probability modeled as a function of arrival time
owls_glmm_zi_cov <- glmmTMB(
  Ncalls ~ FoodTreatment * SexParent +
    offset(log(BroodSize)) +
    (1 | Nest),
  ziformula = ~ ArrivalTime,
  family = nbinom2(link = "log"),
  data = Owls
)

# Compare the nested models
as.matrix(anova(owls_glmm, owls_glmm_zi))
as.matrix(anova(owls_glmm_zi, owls_glmm_zi_cov))
summary(owls_glmm_zi_cov)
```
`glmmTMB` can handle zero-inflated GLMMs because it combines automatic differentiation with the usual estimation strategies.

The zero-inflated model with an arrival-time covariate on the zero-inflation component fits best.

- Arrival time has a positive effect on observing a nonzero number of calls.
- The interaction terms are not significant; food treatment is significant (fewer calls after being fed).
- Nest-level variability is large in magnitude (omitting it changes the parameter estimates).
### Binomial Example: Gotway Hessian Fly Data
We will analyze the **Gotway Hessian Fly** dataset from the `agridat` package to model binomial outcomes using both frequentist and Bayesian approaches.
#### Data Visualization
```{r}
library(agridat)
library(ggplot2)
library(lme4)
library(spaMM)
data(gotway.hessianfly)
dat <- gotway.hessianfly
dat$prop <- dat$y / dat$n # Proportion of successes