Benchmark Dose Software (BMDS)

# II. Benchmark Dose (BMD) Methodology

**Introduction**

The first part of this training material discusses the computation of BMDs, their lower confidence limits (BMDLs), data requirements, dose-response analysis, and EPA reporting requirements that are specific to the use of BMDs or BMCs. The second part of this training provides an overview of the Agency's Benchmark Dose Software (BMDS), which is freely available on the Internet and can be downloaded from this link: BMD Software.

**The Point of Departure (POD) Approach**

- Definition of a point of departure (POD); and
- Extrapolation from the POD to low environmentally-relevant exposure levels.
**BMD/BMDL Terminology**

- BMD is used generically to refer to the Benchmark Dose approach.
- In the more specific cases, BMD and BMC refer to the central estimates, for example the EDx or ECx for dichotomous endpoints (with x referring to some level of response above background, e.g., 5% or 10%).
- BMDL or BMCL refers to the corresponding lower limit of a one-sided 95% confidence interval on the BMD or BMC, respectively. This is consistent with the terminology introduced by Crump (1995) and with that used in EPA's BMDS.
- This terminology is a change, however, from that used in previous Agency documents (e.g., EPA, 1995), but has been adopted because it more clearly conveys the fact that the BMDL refers to the lower confidence limit on the dose that would result in the required response.

**BMD/BMDL vs. NOAEL/LOAEL**
**Determination of Appropriate Studies and Endpoints on Which to Base BMD Calculations**

**Three types of endpoint data: dichotomous, continuous, and categorical**

- Dichotomous (Quantal) - A dichotomous response may be reported as either the presence or absence of an effect;
- Continuous - A continuous response may be reported as an actual measurement, or as a contrast (absolute change from control or relative change from control). In the case of continuous data, when individual data are not available, the number of subjects, mean of the response variable, and a measure of response variability (e.g., standard deviation (SD), standard error (SE), or variance) are needed for each group; and
- Categorical - For categorical data, the responses in the treatment groups are often characterized in terms of the severity of effect (e.g., mild, moderate, or severe histological change).
**Are the data appropriate for a BMD analysis?**

- There must be at least a statistically or biologically significant dose-related trend in the selected endpoint; and
- The data set should contain information relevant to dose-response for modeling. A determination of the amount of information about the dose-response that is available need not be quantitative or technical. For example, a data set in which all non-control doses have essentially the same response level provides limited information about the dose-response, since the complete range of response from background to maximum must occur somewhere below the lowest dose: the BMD may be just below the first dose, or orders of magnitude lower. When this situation arises in quantal data, especially if the maximum response is less than 100%, it is tempting to use a model like the Weibull with no restrictions on the power parameter, because such models reach a plateau of less than 100%. This can result in seriously distorted BMDs, because the model predictions jump rapidly from background levels to the maximum level. In principle, other models could be found that force the BMD to be anywhere between that extreme and the lowest administered dose. Thus the BMD computed here depends solely on the model selected, and goodness of fit provides no help in selecting among the possibilities. (See the quantal data examples in the Technical Guidance (EPA, 2000) Appendix for an example of this situation.)
**Can the data be excluded from BMD analysis for other reasons?**

**Selection of the Benchmark Response (BMR) Value**

**For Quantal Data:**

- An excess risk of 10% is the default BMR, since the 10% response is at or near the limit of sensitivity in most cancer bioassays and in some noncancer bioassays as well. If a study has greater than usual sensitivity, then a lower BMR can be used, although the ED10 and LED10 should always be presented for comparison purposes.
- For example, reproductive and developmental studies having nested study designs often have greater sensitivity, and for such studies a BMR of 5% has typically been used; and
- Similarly, epidemiology studies often have greater sensitivities and a BMR of 1% has typically been used for quantal human data.
**For Continuous Data:**

- If there is a minimal level of change in the endpoint that is generally considered to be biologically significant (for example, a change in average adult body weight of 10%, or the doubling of the average level for some liver enzyme), then that amount of change can be used to define the BMR. (The BMD [and BMDL] corresponding to a change in the mean response equal to one control standard deviation from the control mean should also be presented for comparison purposes.);
- If individual data are available and a decision can be made about which individual levels can reasonably be considered adverse (perhaps based on a quantile of the control distribution, for example), then the data can be "dichotomized" based on that cutoff value, and the BMR can be set as above for quantal data. (The BMD [and BMDL] corresponding to a change in the mean response equal to one control standard deviation from the control mean should also be presented for comparison purposes.); and
- In the absence of any other idea of what level of response to consider adverse, a change in the mean equal to one control standard deviation from the control mean can be used; the control standard deviation can be computed including historical control data, but the control mean must be from data concurrent with the treatments being considered (Crump, 1995). Crump (1995) found that this gives an excess risk of approximately 10% for the proportion of individuals below the 2nd percentile or above the 98th percentile of controls for normally distributed effects.
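As a concrete (hypothetical) illustration of the one-standard-deviation BMR: if a simple linear model m(d) = m0 + beta·d has been fit to the continuous data, the BMD is just the dose at which the fitted mean has shifted by one control standard deviation. The function name and numbers below are illustrative, not part of BMDS:

```python
def bmd_one_sd(beta, sd0):
    """BMD for a BMR of one control standard deviation, assuming a fitted
    linear mean model m(d) = m0 + beta*d: solve |m(d) - m0| = sd0 for d."""
    return sd0 / abs(beta)

# Illustrative values: slope of 0.5 response-units per dose-unit,
# control standard deviation of 2.0 response-units.
print(bmd_one_sd(0.5, 2.0))  # -> 4.0
```

For curved models the same idea applies, but the dose must be found by inverting the fitted curve numerically rather than by a one-line formula.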
**Choice of the Model to Use in Computing the BMD**

The goal of the mathematical modeling in BMD computation is to fit a model to dose-response data that describes the data set, especially at the lower end of the observable dose-response range. In practice, this involves:

- Selecting a family or families of models;
- Fitting the models;
- Assessing how well the model describes the data (global goodness-of-fit measures, scaled residuals at each dose level, and graphical displays);
- Making acceptable adjustments to the data; and
- Comparing models.

**Selecting a Family or Families of Models**

The initial selection of a group of models to fit to the data is governed by the nature of the measurement that represents the endpoint of interest and the experimental design used to generate the data. In addition, certain constraints on the models or their parameter values sometimes need to be observed, and may influence model selection. Finally, it may be desirable to model multiple endpoints at the same time. The diversity of possible endpoints and shapes of their dose-responses for different agents precludes specifying a small set of models to use for BMD computation. This will inevitably lead to the need for judgement and occasional ambiguity when selecting the final model and BMDL for dose-response assessment. It is hoped that, as experience using benchmark dose methodology in dose-response assessment accumulates, it will be possible to narrow the number of acceptable models. The sections that follow discuss the type of endpoint, experimental design, constraints, and covariates that govern model selection; methods for fitting the models (nonlinear least squares, maximum likelihood, and generalized estimating equations (GEE)); assessment of how well the model describes the data (global goodness-of-fit measures, scaled residuals at each dose level, and graphical displays); acceptable adjustments to the data; comparisons among models (within a family of models, the Akaike Information Criterion (AIC), and other considerations); and the use of confidence limits to obtain a BMDL.

**Reporting Requirements from the BMD/BMDL Calculations**

**Study or Studies Selected for BMD Calculation(s)**

- Rationale for study selection
- Rationale for endpoints (effects)
- List dose response data used
**Dose-Response Model(s) Chosen for Each Case**

- Rationale
- Estimation procedure (e.g., maximum likelihood, least squares, generalized estimating equations)
- Estimates of model parameters with standard errors
- Goodness-of-fit test statistics
- Standardized residuals ((observed minus predicted response)/standard deviation)

**Choice of BMR for Each Case**

- Rationale
- Procedure used for continuous data

**Computation of the BMD**

- List the BMD value(s)

**Calculation of the BMDL for Each Case**

- Confidence limit procedure (e.g., likelihood profile, delta method, bootstrap)
- List BMDL value(s)
**Graphics for Each Case**

- Plot of data points with error (standard deviation) bars
- Plot of fitted dose-response
- Plot of confidence limits for the fitted curve (optional; if included, the narrative should describe the methods used to compute them)
- Identify BMD and BMDL

**BMDs and BMDLs for Default BMRs**

- For dichotomous data, the BMD and BMDL for an extra risk of 0.10
- For continuous data, the BMD and BMDL corresponding to a change in the mean response equal to one control standard deviation from the control mean
**Decision Tree**

1. **Select the Appropriate BMR**, based on the type of data (i.e., quantal vs. continuous), sensitivity of study design, toxicity endpoint, and judgements about the adversity of the endpoint if continuous. (See Selection of the Benchmark Response (BMR) Value)
2. **Model the Dose-Response Data**, using appropriate model structures for the type of data (i.e., quantal vs. continuous, depending on how the BMR is defined) and study design (e.g., nested). For modeling cancer bioassay data, a specific default algorithm is generally used except for case-specific situations in which an alternate model may be superior (e.g., a time-to-tumor model, a biologically-based model). For other types of experimental animal data, curve-fitting can be attempted with any appropriate models. Human data are modeled in a case-specific way which may need to account for covariates, competing causes of mortality, etc. (See Selecting a Family or Families of Models)
3. **Assess the Fit of the Models.** Retain models that are not rejected using a p-value of 0.1. Examine the residuals and plot the data and models; check that the models adequately describe the data, especially in the region of the BMR. (Sometimes it may be necessary to transform the data in some way or to drop the highest exposure group(s) (e.g., if the behavior at high exposures can be attributed to early mortality or enzyme saturation effects) and repeat the modeling in order to get a good fit.) (See Assessing How Well the Model Describes the Data)
4. **Calculate 95% Lower Confidence Limits** on the candidate BMDs (i.e., BMDLs) using the models which adequately fit the data. (See Confidence Limits for the BMD (BMDL))
5. **Select from among the models which adequately fit the data.** If the BMDL estimates from these remaining models are within a factor of 3, they are considered indistinguishable, and the model with the lowest AIC can be selected to provide the BMDL. If the BMDL estimates are not within a factor of 3, some model dependence is assumed, and the model with the lowest BMDL estimate should be selected unless it appears to be an outlier, in which case further analysis may be appropriate. (See Comparing Models)
6. **Document the BMD Analysis** as outlined in the reporting requirements. (See Reporting Requirements)
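The final selection step can be sketched in a few lines of code. This is a hypothetical illustration of the factor-of-3 rule described above, not BMDS behavior; the function name and tuple layout are assumptions:

```python
def select_bmdl(candidates):
    """Pick a BMDL from among models that adequately fit the data.

    candidates: list of (model_name, bmdl, aic) tuples.
    If all BMDLs are within a factor of 3, they are treated as
    indistinguishable and the lowest-AIC model is used; otherwise the
    lowest BMDL is taken (absent evidence that it is an outlier).
    """
    bmdls = [bmdl for _, bmdl, _ in candidates]
    if max(bmdls) / min(bmdls) <= 3.0:
        return min(candidates, key=lambda c: c[2])  # lowest AIC
    return min(candidates, key=lambda c: c[1])      # lowest BMDL

# Illustrative candidates: BMDLs 1.2 and 1.8 are within a factor of 3,
# so the lower-AIC model is selected.
chosen = select_bmdl([("log-logistic", 1.2, 140.2), ("Weibull", 1.8, 138.9)])
```

A real analysis would still include the outlier check and documentation steps, which are judgement calls that do not reduce to code.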

The US EPA conducts risk assessments for an array of health effects that may result from exposure to environmental agents, and that require an analysis of the relationship between exposure and health-related outcomes. The dose-response assessment is essentially a two-step process: definition of a point of departure (POD), followed by extrapolation from the POD to low environmentally-relevant exposure levels.

The Benchmark Dose (BMD) approach provides a more quantitative alternative to the first step in the dose-response assessment than the current NOAEL/LOAEL process for noncancer health effects, and is similar to that for determining the POD proposed for cancer endpoints (EPA, 1996). As the Agency moves toward harmonization of approaches for cancer and noncancer risk assessment, the dichotomy between cancer and noncancer health effects is being replaced by consideration of mode of action and whether the effects of concern are likely to be linear or nonlinear at low doses. The purpose of this training material is to provide guidance for the Agency and the outside community on the application of the BMD approach in determining the POD for all types of health effects data, whether a linear or nonlinear low dose extrapolation is used.

The convention for terminology described above (see BMD/BMDL Terminology) has been adopted in this training material.

As indicated above, the BMD approach is an alternative to the NOAEL/LOAEL approach that has been used for many years in dose-response assessment. The development of this approach has been pursued because of recognized limitations in the NOAEL/LOAEL approach. However, it is likely that there will continue to be endpoints that are not amenable to modeling and for which a NOAEL/LOAEL approach must be used. In some cases, there may be a combination of BMDs and NOAELs to be considered in the assessment of a particular agent, and the choice of the most appropriate value to use for dose-response assessment must be made by the risk assessor on the basis of scientific judgement and the modeling results.

Following the hazard characterization and selection of appropriate endpoints to use for the dose-response assessment, the studies appropriate for modeling and BMD analysis can be evaluated. All studies that show a graded monotonic response with dose likely will be useful for BMD analysis, and the minimum data set for calculating a BMD should at least show a significant dose-related trend in the selected endpoint(s). It is preferable to have studies with one or more doses near the level of the BMR to give a better estimate of the BMD, and thus, a shorter confidence interval. Studies in which all the dose levels show changes compared with control values (i.e., there is no NOAEL) are readily usable in BMD analyses, unless the lowest response level is much higher than the BMR.

Once the critical endpoints have been selected, data sets are examined for the appropriateness of a BMD analysis. The following constraints on data sets to use for BMD calculations should be applied:

In general, an endpoint that has been judged by the risk assessor to be appropriate and relevant to the exposure should be modeled if its LOAEL is less than 10-fold above the lowest LOAEL for the entire set of endpoints being considered. This will help ensure that no endpoints with the potential to have the lowest BMDL are excluded from the analysis on the basis of the value of the LOAEL or NOAEL. Selected endpoints from different studies that are likely to be used in the dose-response assessment should all be modeled, especially if different uncertainty factors may be used for different studies and endpoints. As indicated above, the selection of the most appropriate BMDs and/or NOAELs (if some endpoints cannot be modeled) to use for determination of the POD must be made by the risk assessor using scientific judgement and principles of risk assessment, as well as the results of the modeling process.

The calculation of a BMD is directly determined by the selection of the BMR, which is defined differently for quantal and continuous data.

The kind of measurement variable that represents the endpoint of interest is an important consideration in selecting mathematical models. Commonly, such variables are either continuous, like liver weight or the activity of a given liver enzyme, or discrete, commonly dichotomous, like the presence or absence of abnormal liver status. However, other types are common in biological data; for example: ordered categorical, like a histology score that ranges from 1-normal to 5-extremely abnormal; counts, such as counts of deaths or the numbers of cases of illness per thousand person-years of exposure to a given exposure condition; waiting time, such as the time it takes for an illness to appear after exposure, or age at death, or multiple endpoints (such as survival, weight, and malformations in a developmental toxicity study) considered jointly. In this tutorial, we will focus on dichotomous and continuous variables.

**Dichotomous Variables**. Data on dichotomous variables are commonly presented as a fraction or percent of individuals that exhibit the given condition at a given dose or exposure level. For such endpoints, normally we select probability density models like logistic, probit, Weibull, and so forth, whose predictions lie between zero and one for any possible dose, including zero.
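As a sketch of how such a model yields a BMD, the fragment below assumes logistic parameters have already been estimated and inverts the fitted curve for a chosen extra risk. The parameter values and function names are illustrative, not BMDS output:

```python
import math

def logistic_p(d, a, b):
    """Logistic dose-response model: P(d) = 1 / (1 + exp(-(a + b*d)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * d)))

def extra_risk(d, a, b):
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = logistic_p(0.0, a, b)
    return (logistic_p(d, a, b) - p0) / (1.0 - p0)

def bmd(bmr, a, b, hi=1e6):
    """Solve extra_risk(d) = bmr for d by bisection (monotone when b > 0)."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if extra_risk(mid, a, b) < bmr else (lo, mid)
    return 0.5 * (lo + hi)

# Illustrative fitted parameters: intercept -3.0, slope 0.5 per dose-unit.
d10 = bmd(0.10, -3.0, 0.5)   # ED10: dose giving 10% extra risk
```

The same invert-the-curve logic applies to probit, Weibull, or any other quantal model; only `logistic_p` changes.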

**Continuous Variables**. Data for continuous variables are often presented as means and standard
deviations or standard errors, but may also be presented as a percent of control or some other standard. From a modeling standpoint, the most desirable form for such data is by individual. Unlike the usual situation for dichotomous variables, summarization of continuous variables results in a loss of information about the distribution of those variables.

The preferred approach to expressing the BMR will determine the approach to modeling continuous data. Two broad categories of approach have been proposed: 1) to express the BMR as a particular change in the mean response, possibly as a fraction of the control mean, a fraction of the range of the response (when there is a clear maximum response), a fraction of the standard deviation of the measurement from untreated individuals, or a level of the response that expert opinion holds is adverse; or 2) to decide on a level of the outcome to consider adverse, and treat the proportion of individuals with the adverse outcome much as one would a dichotomous variable.

Typical models to use in the first situation include linear and polynomial models, and power models or other nonlinear models such as Hill models. In the second situation, one approach is to classify each individual as affected or not, and model the resulting variable as dichotomous.
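A minimal sketch of one such nonlinear model for a continuous mean, the Hill model, is shown below; the parameter names are illustrative:

```python
def hill_mean(d, intercept, v, k, n):
    """Hill model for a continuous mean response:
    m(d) = intercept + v * d**n / (k**n + d**n).
    v is the maximum change, k the dose of half-maximal change,
    and n the shape (Hill) coefficient."""
    return intercept + v * d ** n / (k ** n + d ** n)

# Illustrative parameters: the change in mean is half-maximal at d = k,
# so hill_mean(2.0, ...) below is 10 + 8/2 = 14.
m = hill_mean(2.0, 10.0, 8.0, 2.0, 1.5)
```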

An alternative is to use a so-called "hybrid" approach, such as that described by Gaylor and Slikker (1990), Kodell et al. (1995), and Crump (1995), which fits continuous models to continuous data, and, presuming a distribution of the data, calculates a BMD in terms of the fraction affected. Refer to the 2000 - Development of Draft Benchmark Dose Technical Guidance Document for more on this approach.

The aspects of experimental design that bear on model selection include the total number of dose groups used and possible clustering of experimental subjects. The number of dose groups has a bearing on the number of parameters that can be estimated: the number of parameters that affect the overall shape of the dose-response curve generally cannot exceed the number of dose groups.

Clustering of experimental subjects is another design issue that can impact model choice. The most common situation in which clustering occurs is in developmental toxicity experiments, in which the agent is applied to the mother, and individual offspring are examined for adverse effects. Another example is for designs in which individuals yield multiple observations (repeated measures). This can happen, for example, when each subject receives both treatment and control (common in studies with human subjects), or each subject is observed multiple times after treatment (e.g., neurotoxicity studies). The issue in all of these examples is that individual observations cannot be taken as independent of each other. Most methods used for fitting models rely heavily on the assumption that the data are independent, and special fitting methods need to be used for data sets that exhibit more complicated patterns of dependence.

An obvious constraint on models for dichotomous data has already been discussed: probabilities are constrained to be positive numbers no greater than one. However, biological reality may impose other constraints on models. For example, most biological quantities are constrained to be positive, so models should be selected so that their predicted values, at least in the region of application, conform to that constraint. In models in which dose is raised to a power which is a parameter to be estimated (such as a Weibull model), if that parameter is allowed to be less than 1.0, the slope of the dose-response curve becomes infinite at a dose of zero. This often results in numerical problems in calculating the confidence interval. This is an undesirable situation, and the default is to constrain these parameters to be at least 1.0.
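The power constraint described above can be sketched with the quantal Weibull form; the function and its handling of the constraint are an illustration, not the BMDS implementation:

```python
import math

def weibull_p(d, g, slope, power):
    """Quantal Weibull model: P(d) = g + (1 - g)*(1 - exp(-slope * d**power)).
    The power parameter is restricted to >= 1 so that the slope of the
    dose-response curve stays finite at d = 0 (the default constraint
    discussed in the text)."""
    if power < 1.0:
        raise ValueError("power < 1 gives an infinite slope at dose zero")
    return g + (1.0 - g) * (1.0 - math.exp(-slope * d ** power))

# At d = 0 the model returns the background rate g:
p0 = weibull_p(0.0, 0.05, 0.2, 1.3)   # -> 0.05
```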

In quantal models, often a background parameter quantifies the probability that the outcome being modeled can occur in the absence of exposure. It may be tempting to reduce the number of parameters to be estimated by fixing the value of the background parameter to be zero. However, only when it is clear that an outcome is impossible in the absence of the exposure is it permissible to fix the value of the background to zero. It is preferred that a so-called "threshold" term not be included in the models used for BMD/C analysis because, while it is not an estimate of a biological threshold, it is easily confused with such because of confusing terminology, and because most data sets can be fit adequately without this parameter and the associated loss of a degree of freedom. The software currently distributed by EPA does not currently include this parameter. However, occasionally, it may happen that the increase in a response is so precipitous that including a threshold parameter facilitates the dose-response modeling, and in such cases it is acceptable to include the parameter.

It is sometimes desirable to include covariates on individuals when fitting dose-response models. For example, litter size has often been included as a covariate in modeling laboratory animal data in developmental toxicity studies. Another example is in modeling epidemiology data, when certain covariates (e.g., age, parity) are included that are expected to affect the outcome and might be correlated with exposure. In continuous models, if the covariate has an effect on the response, including it in a model may improve the precision of the overall estimate by accounting for variation that would otherwise end up in the residual variance. In any kind of model, any variable that is correlated (non-causally) with dose, and which affects outcome, would need to be included as a covariate.

The goal of the fitting process is to find values for all the model parameters so that the resulting fitted model describes those data as well as possible; this is termed "parameter estimation." In practice, this happens when the dose-group means predicted by the model come as close as possible to the data means. One way to achieve this is to write down a function (the objective function) of all the parameters and all the data, with the property that the parameter values that correspond either to an overall minimum (or, equivalently, an overall maximum) of the function, or that result in function values of zero, give the desired model predictions.

The actual fitting process is carried out iteratively, and starts with an initial guess for the parameter values. This guess is iteratively updated to produce a sequence of estimates that (usually) converge. Many models will converge to the right estimates for most data sets from just about any reasonable set of initial parameter values; however, some models, and some data sets, may require multiple guesses at initial values before the model converges. It also happens occasionally that the fitting procedure will converge to different estimates from different initial guesses. Only one of these sets of estimates will be "best". It is always good practice when fitting nonlinear models to try different initial values, just in case.
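The advice to try several starting values can be illustrated with a toy one-dimensional objective that has two local minima. Everything here is hypothetical and unrelated to any particular dose-response model:

```python
def local_search(f, x0, step=0.1, tol=1e-8):
    """Naive 1-D descent: move downhill while possible, then shrink the step.
    Converges to a local (not necessarily global) minimum near x0."""
    x = x0
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step *= 0.5
    return x

# Toy objective with local minima near x = -1 and x = 2; the 0.5*x term
# makes the minimum near -1 the global one.
f = lambda x: (x + 1.0) ** 2 * (x - 2.0) ** 2 + 0.5 * x

fits = [local_search(f, x0) for x0 in (-2.0, 0.0, 3.0)]
best = min(fits, key=f)   # different starts land in different minima;
                          # keep the one with the best objective value
```

Starting at 3.0 converges to the inferior minimum near 2; only by comparing objective values across starts is the better solution found, which is exactly the practice recommended above.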

There are a few common ways to construct objective functions: the methods of nonlinear least squares, maximum likelihood, and generalized estimating equations (GEE). The choice of objective function is determined in large part by the nature of the variability of the data around the fitted model.

The method of nonlinear least squares, where the objective function is the sum of the squared differences between the observed data values and the model-predicted values, is a common method for continuous variables when observations can be taken as independent. A basic assumption of this method is that the variance of individual observations around the dose-group means is a constant across doses. When this assumption is violated (commonly, when the variance of a continuous variable changes as a function of the mean, often proportional to the square of the mean, giving a constant coefficient of variation), a modification of the method may be used in which each term in the sum of squares is weighted by the reciprocal of an estimate of the variance at the corresponding dose. This method is especially appropriate when the data to be fitted can be supposed to be at least approximately normally distributed.
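A minimal sketch of the weighted least-squares objective for summarized continuous data follows; the names and example values are illustrative:

```python
def weighted_ss(params, mean_fn, doses, means, sds, ns):
    """Weighted sum of squares for summarized continuous data: each dose
    group's squared deviation is weighted by the reciprocal of the
    estimated variance of its mean, 1 / Var(group mean) = n / sd**2."""
    total = 0.0
    for d, m, sd, n in zip(doses, means, sds, ns):
        resid = m - mean_fn(d, *params)
        total += (n / sd ** 2) * resid ** 2
    return total

# Illustrative use with a linear mean model m(d) = b0 + b1*d; these group
# means lie exactly on the line 10 + 0.5*d, so the objective is 0.
linear = lambda d, b0, b1: b0 + b1 * d
ss = weighted_ss((10.0, 0.5), linear, [0, 1, 2], [10.0, 10.5, 11.0],
                 [1.0, 1.2, 1.5], [10, 10, 10])
```

A fitting routine would minimize `weighted_ss` over `params`; the weighting matters precisely when the group variances differ, as described above.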

Maximum likelihood is a general way of deriving an objective function when a reasonable supposition about the distribution of the data can be made. Because estimates derived by maximum likelihood methods have good statistical properties, such as asymptotic normality, maximum likelihood is often a preferred form of estimation when that assumption is reasonably close to the truth. An example of such a situation is the case of individual independently treated animals (e.g., not clustered in litters) scored for a dichotomous response. Here it is reasonable to suppose that the number of responding animals follows a binomial distribution with the probability of response expressed as a function of dose. Continuous variables, especially means of several observations, are often normal (gaussian) or log-normal. When variables are normally distributed with a constant variance, minimizing the sum of squares is equivalent to maximizing the likelihood, which explains in part why least squares methods are often used for continuous variables. In developmental toxicity data, the distribution of the number of animals with an adverse outcome is often taken to be approximately beta-binomial. This particular likelihood is used to accommodate for the lack of independence among littermates.

A third group of approaches to estimating parameters are the related quasi-likelihood method (McCullagh and Nelder, 1989) and the method of GEE (see Zeger and Liang, 1986), which require only that the mean, variance, and correlation structure of the data be specified. GEE methods are similar to maximum likelihood estimation procedures in that they require an iterative solution, provide estimates of standard errors and correlations of the parameter estimates, and estimates are asymptotically normal. Their use so far has primarily been to handle forms of lack of independence, as in litter data, and would be useful in any of a number of kinds of repeated measures designs, such as occur in clinical studies and repeated neurobehavioral testing.

An important criterion is that the selected model should describe the data, especially in the region of the BMR. Most fitting methods will provide a global goodness-of-fit measure, usually providing a P-value. These measures quantify the degree to which the dose-group means that are predicted by the model differ from the actual dose-group means, relative to how much variation of the dose-group means one might expect. Small P-values indicate that it would be unlikely to achieve a value of the goodness-of-fit statistic at least this extreme if the data were actually sampled from the model, and, consequently, the model is a poor fit to the data. Since it is particularly important that the data be adequately modeled for BMD calculation, it is recommended that P=0.1 be used to compute the critical value for goodness of fit, instead of the more conventional values of 0.05 or 0.01. P-values cannot be compared from one model to another since they assume the different models are correct; they can only identify those models that are consistent with the experimental results. When there are other covariates in the models, such as litter size, the idea is the same, just more complicated to calculate. In this case, the range of doses and other covariates is broken up into cells, and the number of observations that fall into each cell is compared to that predicted by the model.
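For quantal data, a simple version of such a global statistic is the Pearson chi-square. The sketch below uses illustrative counts and compares the statistic to the chi-square critical value for P = 0.1 with 2 residual degrees of freedom, which equals -2·ln(0.1) ≈ 4.605:

```python
import math

def pearson_chi2(obs, n, p_pred):
    """Pearson goodness-of-fit statistic for quantal data:
    sum over dose groups of (obs_i - n_i*p_i)**2 / (n_i*p_i*(1 - p_i))."""
    return sum((o - ni * p) ** 2 / (ni * p * (1.0 - p))
               for o, ni, p in zip(obs, n, p_pred))

# With 2 residual degrees of freedom, the chi-square 90th percentile is
# -2*ln(0.1) ~= 4.605; a statistic above that rejects the model at P = 0.1.
crit_df2 = -2.0 * math.log(0.1)

# Illustrative data: observed counts, group sizes, model-predicted rates.
stat = pearson_chi2([1, 5, 12], [20, 20, 20], [0.05, 0.24, 0.62])
fits_ok = stat < crit_df2
```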

It can happen that the model is never very far from the data points (so the P-value for the goodness-of-fit statistic is not too small), but is always on one side or the other of the dose-group means. Also, there could be a wide range in the response, and the model predicts the high responses well, but misses the low dose responses. In such cases, the goodness-of-fit statistic might not be significant, but the fit should be treated with caution. One way to detect such situations is with tables or plots of residuals at each dose level: measures of the deviation of the response predicted by the model from the actual data. If the residuals are scaled by an estimate of their standard deviation, then residuals that exceed 2.0 in absolute value warrant further examination of the model.
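A sketch of scaled residuals for quantal data, using illustrative counts; the flagging threshold of 2.0 follows the rule of thumb above:

```python
import math

def scaled_residuals(obs, n, p_pred):
    """Scaled residuals for quantal data: (obs - n*p) / sqrt(n*p*(1-p))
    at each dose group. Absolute values above ~2 flag doses where the
    model misses the data, even if the global fit statistic looks fine."""
    return [(o - ni * p) / math.sqrt(ni * p * (1.0 - p))
            for o, ni, p in zip(obs, n, p_pred)]

# Illustrative data: the model over-predicts in the middle group and
# under-predicts at the ends.
r = scaled_residuals([3, 4, 18], [20, 20, 20], [0.05, 0.25, 0.60])
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2.0]
```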

Another way to detect the form of these deviations from fit is with graphical displays. Plots should always supplement goodness-of-fit testing. It is extremely helpful that plots that include data points also include a measure of dispersion of those data points, such as confidence limits.


In certain cases, the typical models for a standard study design cannot be used with the observed data as, for example, when the data are not monotonic, or when the response rises abruptly after some lower doses that give only the background response. In these cases, adjustments to the data (e.g., a log-transformation of dose) or the model (e.g., adjustments for unrelated deaths) may be necessary. In the absence of a mechanistic understanding of the biological response to a toxic agent, data from exposures that give responses much more extreme than the BMR do not really tell us very much about the shape of the response in the region of the BMR. Such exposures, however, may very well have a strong effect on the shape of the fitted model in the region of the BMD. Thus, if lack of fit is due to characteristics of the dose-response data for high doses, the data may be adjusted by eliminating the high dose group. The practice carries with it the loss of a degree of freedom, but may be useful in cases where the response plateaus or drops off at high doses. Since the focus of the BMD analysis is on the low dose and response region, eliminating high dose groups is reasonable. Alternatively, an entirely different model could be fit.

It will often happen that several models provide an adequate fit to a given data set. These models may be essentially unrelated to each other (for example a logistic model and a probit model often do about as well at fitting dichotomous data) or they may be related to each other in the sense that they are members of the same family that differ in which parameters are fixed at some default value. For example, one can consider the log-logistic, the log-logistic with non-zero background, and the log-logistic with threshold and non-zero background to all be members of the same family of models. Goodness-of fit statistics are not designed to compare different models, so alternative approaches to selecting a model to use for BMDL computation need to be pursued.

Generally, within a family of models, the fit will appear to improve as additional parameters are introduced; this apparent improvement is due solely to the increased number of parameters. Likelihood ratio tests can be used to evaluate whether the improvement in fit afforded by estimating additional parameters is justified. Such tests cannot be applied to compare models from different families, however. Some statistics, notably Akaike's Information Criterion (AIC) (Akaike, 1973; Linhart and Zucchini, 1986; Stone, 1998; AIC is -2L + 2p, where L is the log-likelihood at the maximum likelihood estimates for the parameters and p is the number of model degrees of freedom), can be used to compare models with different numbers of parameters that were fitted by a similar method (for example, least squares or binomial maximum likelihood). Although such methods are not exact, they can provide useful guidance in model selection.
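Both comparisons above can be illustrated with a short sketch. The log-likelihoods and parameter counts below are hypothetical, standing in for two nested fits (a log-logistic model with and without a background parameter); the chi-squared tail for one degree of freedom is computed from the standard normal CDF:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical maximized log-likelihoods for two nested fits
# (illustrative values, not from a real data set).
L_reduced, p_reduced = -34.2, 2   # log-logistic
L_full,    p_full    = -33.9, 3   # log-logistic with non-zero background

def aic(L, p):
    """AIC = -2L + 2p, as defined in the text."""
    return -2.0 * L + 2.0 * p

# Likelihood ratio test for nested models: with one extra parameter the
# statistic is approximately chi-squared with 1 df, whose upper tail
# probability equals 2 * (1 - Phi(sqrt(x))) for the standard normal Phi.
lr_stat = 2.0 * (L_full - L_reduced)
p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(lr_stat)))

print(f"AIC reduced: {aic(L_reduced, p_reduced):.1f}")  # 72.4
print(f"AIC full:    {aic(L_full, p_full):.1f}")        # 73.8
print(f"LR statistic: {lr_stat:.2f}, p = {p_value:.2f}")
```

Here the extra parameter lowers the log-likelihood penalty only slightly, so the simpler model has the smaller AIC and the likelihood ratio test does not justify the additional parameter.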

When other data sets for similar endpoints exist, an external consideration can be applied. It may be possible to compare the result of BMDL computations across studies if all the data were fit using the same form of model, presuming that a model can be found that describes all the data sets. Another consideration is the existence of a conventional approach to fitting a kind of data. In this case, communication with specialists in that type of data is eased when a familiar model is used to fit the data. Neither of these considerations should be seen as justification for using ill-fitting models. Finally, it is generally considered preferable to use models with fewer parameters, when possible.

Confidence limits express the uncertainty in a parameter estimate that is due to sampling and/or
experimental error. The interval between two confidence limits is called a *confidence interval*. Confidence intervals can be two-sided, that is, bounding their corresponding parameter on both sides, or one-sided, bounding it on only one side. It may be convenient to think of a one-sided confidence interval as a two-sided interval in which one limit goes to either plus or minus infinity. For example, a one-sided 95% confidence interval for a parameter shares one limit with the two-sided 90% confidence interval for that parameter, and has plus or minus infinity (or, perhaps, 0 for a parameter such as the BMD that must be non-negative) as its second limit. Confidence limits bracket those values which, within a particular model family, are consistent with the data, but they do not account for or assume any correspondence between the modeled animal data and the human population of concern. The "confidence" or "coverage" associated with an interval is the percentage of repeated intervals, based on experiments of the same sort, that are expected to include the parameter being estimated (for example, the BMD). With rare but important exceptions, calculated confidence intervals are approximations, in the sense that the actual coverage usually diverges somewhat from the desired level. The choice of confidence level represents a tradeoff between data collection costs and the precision needed. Just as 0.05 is a convenient (but not necessarily appropriate for all data) significance level for tests, 95% is a convenient choice for most limits and is the default value recommended in this guidance.
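The correspondence between a one-sided 95% limit and one end of a two-sided 90% interval can be checked numerically. The sketch below uses a normal approximation with an illustrative estimate and standard error (not values from any real fit):

```python
from statistics import NormalDist

# Illustrative numbers only: a parameter estimate and its standard error.
est, se = 10.0, 1.5
z = NormalDist().inv_cdf(0.95)   # ~1.645, the 95th percentile of N(0, 1)

one_sided_lower_95 = est - z * se            # one-sided 95% lower limit
two_sided_90 = (est - z * se, est + z * se)  # two-sided 90% interval

# The one-sided 95% lower limit coincides with the lower end of the
# two-sided 90% interval, as described in the text.
assert one_sided_lower_95 == two_sided_90[0]
print(round(one_sided_lower_95, 3))
```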

A lower confidence limit is placed on the BMD to obtain a dose (the BMDL) that ensures with high confidence (e.g., 95%) that the BMR is not exceeded. This process rewards better experimental designs and procedures, which provide more precise estimates of the BMD, resulting in tighter confidence intervals and thus higher BMDLs. A detailed discussion of how BMDLs are calculated is beyond the scope of this tutorial. Some procedures and examples for calculating BMDLs or BMCLs are given by Gaylor et al. (1998) and are discussed in the 2000 draft Benchmark Dose Technical Guidance Document.
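The way greater precision raises the BMDL can be seen even in a crude sketch. The function below uses a simple normal approximation with hypothetical BMD estimates and standard errors; BMDS itself uses profile-likelihood methods, so treat this only as an illustration of the principle, not the actual computation:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)  # one-sided 95% confidence

def bmdl_normal_approx(bmd, se):
    """Illustrative lower limit under a normal approximation.

    BMDS computes BMDLs by profile likelihood, not this formula;
    this sketch only shows how precision affects the limit.
    """
    return max(bmd - z * se, 0.0)  # the BMD must be non-negative

# A more precise estimate of the same BMD yields a higher BMDL.
print(bmdl_normal_approx(5.0, se=2.0))  # less precise study
print(bmdl_normal_approx(5.0, se=0.5))  # more precise study
```

Both hypothetical studies estimate the same BMD, but the tighter confidence interval of the more precise study produces the higher BMDL, which is the reward for better design described above.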

EPA has a number of reporting requirements for the BMD and BMDL. These are considered important for enabling the risk assessor to judge whether the choice of studies and endpoints for modeling was appropriate and whether the most appropriate BMD and BMDL have been selected as the POD for low-dose extrapolation. The following elements should be included in any computation of a BMD or BMDL:

The following Decision Tree depicts the general progression of steps in a BMD calculation. A separate BMD calculation should be conducted for each endpoint/study combination that is a reasonable candidate for becoming the basis for a final quantitative risk estimate. Unlike comparing NOAELs or LOAELs across endpoints or studies, the relative values of potential BMDs are not readily apparent until after the modeling has been completed. For each candidate endpoint/study combination:
