The Need for ProUCL Software
EPA guidance documents (e.g., EPA [1989a, 1989b, 1992a, 1992b, 1994, 1996, 2000, 2002a, 2002b,
2002c, 2006a, 2006b, 2009a, and 2009b]) describe statistical methods including DQOs-based sample
size determination procedures; methods to compute decision statistics (UCL95, UPLs, and UTLs);
parametric and nonparametric hypothesis testing approaches; Oneway ANOVA; OLS regression; and
trend determination approaches. Specifically, EPA guidance documents (2000, 2002c, 2006a, 2006b)
describe DQOs-based parametric and nonparametric minimum sample size determination procedures
needed: to compute decision statistics (e.g., UCL95); to perform site versus background comparisons
(e.g., t-test, proportion test, WMW test); and to determine the number of discrete items (e.g., drums filled
with hazardous material) that need to be sampled to meet the DQOs (e.g., specified proportion, p0, of
defective items, allowable error margin in an estimate of mean). Statistical methods are used to compute
test statistics (e.g., S-W test, t-test, WMW test, T-S trend statistic) and decision statistics (e.g., 95% UCL,
95% UPL, UTL95-95) needed to address statistical issues associated with CERCLA and RCRA site
projects. For example, exposure and risk management and cleanup decisions in support of EPA projects
are often made based upon the mean concentrations of the contaminants/constituents of potential concern
(COPCs). Site-specific BTVs are used in site versus background evaluation studies. A UCL95 is used to
estimate the EPC terms (EPA 1992a, 2002a); and upper limits such as upper percentiles, UPLs, or UTLs
are used to estimate BTVs or not-to-exceed values (EPA 1992b, 2002b, and 2009). The estimated BTVs
are used to address several objectives: to identify the COPCs; to identify the site areas of concern
(AOCs); to perform intra-well comparisons to identify MWs not meeting specified standards; and to
compare onsite constituent concentrations with site-specific background level constituent concentrations.
Oneway ANOVA is used to perform inter-well comparisons, and OLS regression and trend tests are often
used to identify potential trends in constituent concentrations measured in GW monitoring wells
(MWs). Most of the methods described in this paragraph are available in the ProUCL 5.1 (ProUCL 5.0)
software package.
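As a rough illustration of a UCL95 of the mean used as a decision statistic, the Python sketch below computes a Chebyshev-inequality UCL for an uncensored data set. It is not ProUCL code: the function name chebyshev_ucl and the sample concentrations are hypothetical, and the formula assumed is mean + sqrt(1/alpha - 1) * s / sqrt(n), where s is the sample standard deviation and alpha = 1 - confidence.

```python
import math
import statistics


def chebyshev_ucl(data, confidence=0.95):
    """Chebyshev-inequality UCL of the mean from the sample mean and sd.

    Hypothetical helper for illustration only; assumes uncensored data
    (no nondetects) and at least two observations.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    alpha = 1.0 - confidence
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample sd, n - 1 in the denominator
    # Multiplier sqrt(1/alpha - 1) is about 4.36 for a 95% UCL.
    return mean + math.sqrt(1.0 / alpha - 1.0) * sd / math.sqrt(n)


# Made-up, right-skewed concentrations (not real site data).
sample = [1.2, 0.8, 2.5, 0.9, 7.4, 1.1, 3.3, 0.7]
ucl95 = chebyshev_ucl(sample)
print(f"mean = {statistics.fmean(sample):.3f}, Chebyshev UCL95 = {ucl95:.3f}")
```

Because the Chebyshev bound makes no distributional assumption, it is typically higher (more conservative) than a Student's t UCL computed from the same data, except at very small sample sizes.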
The guidance documents cited above offer little guidance on computing rigorous
UCLs, UPLs, and UTLs for moderately to highly skewed uncensored and left-censored data sets
containing NDs with multiple DLs, a common occurrence in environmental data sets. Several parametric
and nonparametric methods are available in the statistical literature (Singh, Singh, and Engelhardt 1997;
Singh, Singh, and Iaci 2002; Krishnamoorthy et al. 2008; Singh, Maichle, and Lee, 2006) to compute
UCLs and other upper limits that adjust for data skewness. Over the years, as new methods became
available to address statistical issues related to environmental projects, those methods were incorporated
in ProUCL software so that environmental scientists and decision makers can make more accurate and
informed decisions. Until 2006, not much guidance was provided on how to compute UCL95s of the
mean and other upper limits (e.g., UPLs and UTLs) based upon data sets containing NDs with multiple
DLs. For data sets with NDs, Singh, Maichle, and Lee (2006) conducted an extensive simulation study to
compare the performances of the various estimation methods (in terms of bias in the mean estimate) and
UCL computation methods (in terms of coverage provided by a UCL). They demonstrated that the
nonparametric KM method performs well in terms of bias in estimates of mean. They also concluded that
UCLs computed using the Student's t-statistic and percentile bootstrap method using the KM estimates do
not provide the desired coverage to the population mean of skewed data sets. They demonstrated that,
depending upon sample size and data skewness, UCLs computed using KM estimates together with the BCA
bootstrap method (mildly skewed data sets), the bootstrap-t method, or the Chebyshev inequality (moderately
to highly skewed data sets) provide better coverage (closer to the specified 95% coverage) to the population
mean than other UCL computation methods. Based upon their findings, during 2006-2007, several UCL
and other upper limits computation methods based upon KM and ROS estimates were incorporated in the
ProUCL 4.0 software. It is noted that since the inclusion of the KM method in ProUCL 4.0 (2007), the