Trend Test for Binary Data with Survivability and Clustering Adjustments
Simulated data sets were also generated with two positive levels of sibling correlation, 0.24 and
0.48; the higher level is comparable to the highest observed correlation in Table 2. Because
clustering decreases the effective sample size,⁴,¹⁴ ignoring clustering when it is present results in
inflated Type I error. Without an adjustment for clustering, the Type I error rates for the
Poly-3 adjustment increase with correlation values, especially when litters are likely to contain
tumor-bearing animals, such as with moderate-to-high tumor rates (leukemia/lymphoma tumor
rate) and/or low mortality rates (simulations with theta values of 0 or 1). Inflated error rates are
reduced using the clusterPoly-3 method (Figure 2; Table A-4).
To test the effect of the number of siblings per litter on Type I error, additional simulations
were run for dose groups of 10 litters with five siblings per litter, a sibling correlation of 0.48,
and a treatment-induced lethality of 4. This setting corresponds to the highest simulated sibling
correlation discussed in this report and the highest lethality level in the original paper.¹
As shown in Table A-5, using the larger number of five siblings does lead to higher Type I error
rates. The largest error rate is for the leukemia/lymphoma tumors (12.9% for litters with five
siblings compared with 7.6% for litters with two siblings).
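One way to see why more siblings per litter can worsen the inflation is the classic design effect for equal-sized clusters, 1 + (m - 1)ρ, where m is the litter size and ρ the sibling correlation. The sketch below is an illustration under that textbook formula only; it is not the report's equation [7], and the function names are hypothetical.

```python
def design_effect(litter_size: int, rho: float) -> float:
    """Textbook design effect for equal-sized clusters: 1 + (m - 1) * rho.

    Assumption: this is the standard survey-sampling approximation,
    not necessarily the adjustment used by the clusterPoly-3 method.
    """
    if litter_size < 1:
        raise ValueError("litter_size must be at least 1")
    return 1.0 + (litter_size - 1) * rho


def effective_n(n_litters: int, litter_size: int, rho: float) -> float:
    """Nominal animal count divided by the design effect."""
    return n_litters * litter_size / design_effect(litter_size, rho)


if __name__ == "__main__":
    rho = 0.48  # highest sibling correlation simulated in the report
    for m in (2, 5):
        print(f"siblings per litter = {m}: design effect = {design_effect(m, rho):.2f}")
    # 10 litters of five siblings, as in the additional simulations
    print(f"effective n for 10 x 5 animals: {effective_n(10, 5, rho):.1f}")
```

Under this approximation, a test that treats the animals as independent understates the variance by a larger factor with five siblings per litter than with two, which is consistent with the direction of the difference reported in Table A-5.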
As sibling correlation increases to 0.24 and 0.48, the power of the clusterPoly-3 method
decreases relative to that of the Poly-3 method in the same cases that showed Type I error inflation:
with higher tumor rates and high sibling correlation (Figure 3; Table A-4). Table A-6 shows the
effective dose group sizes estimated using just the Poly-3 adjustment as well as using the
clusterPoly-3 adjustment accounting for the clustering. For each tumor rate and lethality level,
the effective sample sizes decrease with correlation, as predicted by equation [7] in the text.
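As a reminder of what an effective dose group size means under the Poly-3 adjustment alone, the sketch below applies the usual Poly-k weighting: an animal that develops the tumor or survives to the end of the study counts as 1, and a tumor-free animal dying at time t counts as (t/T)^k. The animal records and the 104-week study length are made-up illustrative values, and the clustering correction of the clusterPoly-3 method is not included.

```python
def poly_k_weight(death_time: float, has_tumor: bool,
                  study_length: float, k: float = 3.0) -> float:
    """Poly-k survival weight for one animal (k = 3 gives Poly-3)."""
    if has_tumor or death_time >= study_length:
        return 1.0
    return (death_time / study_length) ** k


def effective_group_size(animals, study_length: float, k: float = 3.0) -> float:
    """Sum of Poly-k weights over (death_time, has_tumor) records."""
    return sum(poly_k_weight(t, tumor, study_length, k) for t, tumor in animals)


if __name__ == "__main__":
    # Hypothetical dose group: (week of death or sacrifice, tumor present)
    group = [(104, False), (104, True), (52, False), (78, True), (26, False)]
    n_eff = effective_group_size(group, study_length=104)
    print(f"nominal n = {len(group)}, Poly-3 effective n = {n_eff:.3f}")
```

Early tumor-free deaths contribute little to the denominator, so the effective group size falls below the nominal count as mortality increases.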
For the simulations in this report, the strongest factor in determining power is the background
tumor rate. As in Bailer and Portier¹ and Bieler and Williams,² the treatment effect was modeled
as increasing linearly with dose to a twofold increase at the highest dose.
For tumors with a high background rate like the leukemia/lymphoma tumors, a substantial tumor
rate of nearly 40% for the highest dose group results in good power. For very low background
rates, such as those for lung and pancreatic islet tumors, the doubled tumor rate is only about 2%
for the high dose, resulting in lower power. In NTP studies, to counter low power, the evaluation
of test articles includes pairwise testing as well as trend testing for many endpoints. For some
test articles, NTP studies use an increased sample size. Table A-5 includes a protocol with dose
group size of 30 litters with three siblings, a protocol also used in NTP studies.²⁰ This protocol
shows the highest power for all endpoints. However, these simulation results show that for any
reasonable sample size, detection of a twofold increase in the tumor rate is an unrealistic goal,
regardless of the distribution of animals or the statistical method used.
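The dependence of power on the background rate follows directly from the simulated dose-response. The sketch below assumes the tumor rate itself rises linearly from the background rate at dose 0 to twice that rate at the highest dose; the four equally spaced dose levels and the round background rates of 20% and 1% are illustrative assumptions chosen to match the "nearly 40%" and "about 2%" figures quoted above, not values taken from the simulation tables.

```python
def tumor_rates(background: float, doses):
    """Tumor rate per dose group, linear in dose, doubling at the top dose.

    Assumption: linearity is on the probability scale.
    """
    d_max = max(doses)
    if d_max <= 0:
        raise ValueError("the highest dose must be positive")
    return [background * (1.0 + d / d_max) for d in doses]


if __name__ == "__main__":
    doses = [0, 1, 2, 3]  # hypothetical, equally spaced dose levels
    for label, p0 in (("high background", 0.20), ("low background", 0.01)):
        rates = ", ".join(f"{r:.3f}" for r in tumor_rates(p0, doses))
        print(f"{label}: {rates}")
```

A doubling from 20% to 40% is an absolute difference of 20 percentage points, whereas a doubling from 1% to 2% is a difference of 1 point, which is far harder to detect at a fixed group size.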
The results from applying both the Poly-3 and clusterPoly-3 methods to observed data on
25 nonneoplastic lesions in a recent NTP perinatal chronic study¹¹ confirm the simulation
results. In the presence of positive correlation, the clusterPoly-3 p values tend to be higher than
those from the Poly-3 method. The normed distance between the p values for the two
methods increases as the estimated correlation increases (Figure 4).
As stated in the Bailer and Portier study,¹ the Poly-3 (and therefore the clusterPoly-3) test can be
modified by allowing the exponent of the score function to take on values other
than 3. Results can be improved by estimating the k-parameter from the probability distribution