Assessment and Remediation of Contaminated Sediments (ARCS) Program
Table of Contents
- Chapter 1
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Chapter 9
- Chapter 10
- List of Tables
- List of Figures
Assessment Guidance Document
US Environmental Protection Agency. 1994. ARCS Assessment Guidance Document. EPA 905-B94-002. Chicago, Ill.: Great Lakes National Program Office.
QUALITY ASSURANCE AND QUALITY CONTROL
- QUALITY ASSURANCE PROGRAM
- DEVELOPMENT OF DATA QUALITY OBJECTIVES AND MEASUREMENT QUALITY OBJECTIVES
- QUALITY ASSURANCE AND QUALITY CONTROL SAMPLES
- PREPARATION OF QUALITY ASSURANCE PLANS
- DEVELOPMENT OF A LABORATORY AUDIT PROGRAM
- DATABASE REQUIREMENTS AND DATA VERIFICATION/VALIDATION METHODS
It is USEPA policy that all environmental sampling and testing be conducted in accordance with a formalized quality assurance program. Quality assurance has been defined as "those operations and procedures which are undertaken to provide measurement data of stated quality with a stated probability of being right" (Taylor 1987). The purpose of the quality assurance program is to specify the policies, organization, objectives, and QA/QC activities needed to achieve the data quality requirements of the program. These specifications are used to assess and control measurement errors that may enter the system at various phases of the project, such as during sampling, preparation, and analysis. Therefore, QA/QC procedures implemented in any program should be designed to ensure that the best possible data are collected and that the quality of the resulting data can be evaluated and documented. Adherence to an overall quality assurance program is essential for large, multiparticipant programs, such as the ARCS Program, to ensure that the data collected by individual investigators will be comparable and congruous. Some of the QA/QC considerations specific to sediment toxicity tests are discussed in Chapter 6.
USEPA currently recognizes four categories of quality assurance programs. These categories differ according to the end use of the data. The following definitions of the four categories are modified from Simes (1989):
Category I Projects that produce results that can stand alone. These projects are of sufficient scope and substance that their results could be used directly, without additional support, for compliance or other litigation. Such projects are of critical importance to USEPA goals and must be able to withstand legal challenge. Accordingly, the quality assurance requirements for these projects will be the most rigorous and detailed to ensure that such goals are met.
Category II Projects that produce results that complement information from other projects. These projects are of sufficient scope and substance that their results could be combined with the results of other projects of similar scope to produce narratives that would be used for making rules, regulations, or policies. In addition, projects that do not fit this pattern, but have high public visibility, would also be included in this category.
Category III Projects that produce results for the purpose of evaluating and selecting basic options, or performing feasibility studies or reconnaissance of unexplored areas that might lead to further work.
Category IV Projects that produce intermediate results used in testing assumptions.
Each program, or individual project within a program, should be categorized at its inception. The quality assurance category selected will have a dramatic effect on the complexity of the quality assurance program, as well as on the writing requirements for the quality assurance project plans (QAPPs) that must be prepared (see Preparation of Quality Assurance Plans). Category I projects involve the most stringent data acceptance criteria, the most expansive quality assurance approach, and the most detailed QAPP, whereas Category IV projects involve the least stringent requirements for data acceptance, perhaps the fewest QA/QC samples, and the fewest issues to be addressed in the QAPP. Categories II and III fall progressively between these two extremes. The various projects completed during the ARCS Program were Category II and III projects. Generally, Category II or III projects are recommended for integrated sediment assessments in the Great Lakes. However, when developing DQOs, it is imperative to consider all potential uses of the data. For example, if the data might be used to support enforcement, a Category I project is recommended.
The general components of a good quality assurance program for any level of effort, ranging from the individual laboratory through the nationwide program level, should address the following issues:
- Development of the DQOs and MQOs
- Preparation of the quality assurance plans
- Development of a laboratory audit program
- Development of the database requirements and data verification/validation methods.
These issues should be addressed and in place prior to any sampling; however, the quality assurance program should be flexible enough to allow for changes during the study. Each of these issues is discussed in more detail in the following sections.
One of the initial activities in any environmental assessment program is the development of DQOs. DQOs are used to focus the initial design of the field and laboratory studies to provide the necessary data to guide selection of remedial alternatives (if necessary). The DQO process also provides a logical, objective, and quantitative framework for finding an appropriate balance between the time and resources that will be used to generate the data and the quality of the resulting data (Neptune et al. 1990). DQOs may be defined as the "qualitative and quantitative statements of the overall level of uncertainty that a decision-maker is willing to accept in results or decisions derived from environmental data" (USEPA 1987a). DQOs result from an iterative process of logical interaction between the decision-makers and the technical team involved in a given project.
The development of the DQOs can be divided into seven steps (Figure 2-1). The seven steps presented do not include all of the individual operational processes that may be involved at each step in the DQO process, but do provide guidance on the overall development of the program or laboratory DQOs. In Step 1, the problems that need to be resolved or studied are defined and the overall objectives for the program are formulated. For example, one of the questions to be answered in the ARCS Program was, "What is the nature and extent of bottom sediment contamination at the selected Great Lakes AOCs?"
The next step in the DQO process is to define the specific decisions to be made or questions to be answered based on the data collected (Step 2, Figure 2-1). For example, in the ARCS Program, one of the decisions that needed to be made to guide the selection of remedial alternatives for the Buffalo River AOC was, "Are the sediments contaminated with organic compounds?"
In Step 3 (Figure 2-1), the types of data required to make decisions, how the data will be obtained, and the use(s) of the collected data are defined. The types of data that may be required include, but are not limited to, physical, chemical, biological, and toxicological properties of the site. Data may be obtained by sampling and laboratory analysis, physical testing, or modeling studies. Potential uses of the data encompass site screenings and evaluations, human health and ecological risk assessments, regulatory compliance or violation assessments using predefined action limits, modeling efforts, and the determination of remedial process effectiveness and efficiency. To answer the example question identified in Step 2, sediment chemical data would be required for a variety of persistent organic compound classes such as PCBs, chlorinated pesticides, polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), and PAHs (specific compound names should be listed in the QAPP). The chemical data should be generated from recently collected sediment samples to accurately represent current conditions in the Buffalo River.
It is during this third step in the DQO process that the database manager(s) should become involved. Three general phases of database design and implementation are 1) initial design, 2) initial concepts, and 3) detailed design. During the initial design phase, the "structure" of the project and all potential data users should be identified. The "structure" is defined by site and sample identifiers, original data specifications, and user-identified data requirements; it addresses the experimental design in both the field and laboratory, the volume of data to be collected, and the required turnaround time for data entry and manipulation. In the initial concepts phase, sample tracking, data management (computer systems and/or hardcopy data recording and subsequent data entry), data collation methods, quality assurance checking (electronic vs. manual), and quality control data requirements (batch-wide vs. sample-specific quality control) should be addressed. Detailed design considerations include the following:
- Where the data will be stored, and the development of a system if an appropriate storage system is not available
- The specific format in which the data should be reported
- Which data are necessary during data reporting, and which data need to be maintained in the database
- How the data will be retrieved (hardcopy, PC-based systems, or mainframe computers)
- Which statistical tests will be available for subsequent data analysis.
Further information on system design and database development can be found in the USEPA systems design and development guidance documents (USEPA 1989).
Defining the boundaries of the study area (Step 4, Figure 2-1) is necessary to limit studies to a manageable area, without excluding any areas of significant interest identified from historical or other ongoing studies. This boundary definition incorporates not only spatial but temporal and demographic considerations based on past and present land use. For example, in the ARCS Program, the boundary of the Buffalo River AOC was defined to extend from the mouth of the river upstream to just above the confluence of the Buffalo River and Cazenovia Creek. Thus, the boundary of this AOC was restricted to the stretch of river in which a majority of the industrial outflows exist (or existed, if the companies no longer operate) that could have contributed to the potential contamination of the Buffalo River sediments.
The development of decision rules is the next important step (Step 5, Figure 2-1) in the DQO process. A decision rule is a restatement of the decision to be made that clearly indicates how the data to be collected will influence the outcome of the decision. The decision rule is typically formulated as an if-then statement, showing all the possible outcomes. For example, "If . . ." may specify exceedance of some criterion or action level and "then . . ." would state the action to be taken. These decision rules help decision-makers bring the study into sharper focus. An example of a decision rule developed for the ARCS Program was, "If concentrations of total PCBs exceed a particular level in sediments, as determined by USEPA SW-846 Method 8080 (USEPA 1986b), then the Buffalo River sediments will be classified as toxic and considered for remediation by the Engineering/Technology Work Group." A similar decision rule for biological analyses could be, "If exposure to whole sediment significantly (P<=0.05) reduces survival or growth of test organisms relative to appropriate reference conditions, then the sediment will be classified as toxic and considered for remediation by the Engineering/Technology Work Group." The use of either or both example decision rules may be appropriate in a program, depending on an assessment of uncertainty and cost addressed in the following two steps (as discussed in Step 7 below, a combination of these decision rules is recommended).
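The paired if-then decision rules above can be sketched as a small function. This is a minimal illustration only: the action level, variable names, and threshold values here are hypothetical assumptions, not values taken from the ARCS Program.

```python
# Hypothetical placeholder action level, NOT an ARCS Program value.
TOTAL_PCB_ACTION_LEVEL_MG_KG = 1.0

def classify_sediment(total_pcb_mg_kg, toxicity_p_value, alpha=0.05):
    """Apply the two example decision rules (chemical and biological);
    if either rule fires, the sediment is considered for remediation."""
    chemical_rule = total_pcb_mg_kg > TOTAL_PCB_ACTION_LEVEL_MG_KG
    biological_rule = toxicity_p_value <= alpha  # significant effect vs. reference
    if chemical_rule or biological_rule:
        return "consider for remediation"
    return "no action under these rules"
```

Combining the rules with a logical "or" reflects the recommendation in Step 7 that both chemical and biological lines of evidence feed the remediation decision.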
The sixth step in the DQO process (Figure 2-1) is to specify the constraints or levels of uncertainty that are acceptable in addressing the issues defined in Step 1. Uncertainty levels can be both qualitative and quantitative. It is in this step that the MQOs are established (a more complete discussion of the components of the MQOs is provided in the next section). These objectives determine how many samples to collect, where to sample in the AOC, what methodologies will be used for all phases of the program (including field sampling, sample preparation, and analysis), how reliable the resultant analyses need to be in terms of accuracy (bias and precision), and how to assemble the data to present the desired results. The effects of potential false positives (e.g., sample is uncontaminated yet chemical or biological results indicate contamination) and false negatives (e.g., sample is contaminated yet chemical or biological results indicate no contamination) should also be assessed during this DQO step.
An example of a constraint that was applied to sediment chemistry data in the ARCS Program is "Triplicate analyses of sediment samples analyzed for PCBs following USEPA SW-846 Method 8080 (USEPA 1986b) should have a precision, measured as percent relative standard deviation (%RSD), of less than or equal to 20 percent." The %RSD is the standard deviation of multiple (three or more) measurements divided by the mean of the measurements and multiplied by 100. The uncertainty associated with this constraint affects the ability to use the decision rule identified in Step 5 for sediment chemistry. Similarly, constraints applied to biological data will affect use of the decision rule developed in Step 5 for biological analyses.
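The %RSD calculation just described can be expressed directly. The triplicate values below are illustrative, not ARCS data.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Percent relative standard deviation: (sample SD / mean) x 100,
    for three or more replicate measurements."""
    return stdev(values) / mean(values) * 100.0

# Illustrative triplicate PCB results (mg/kg dry weight)
triplicate = [1.00, 1.10, 0.95]
meets_constraint = percent_rsd(triplicate) <= 20.0  # ARCS precision constraint
```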
Generally, it is during this assessment of uncertainty in the DQO process that budgetary constraints that limit the number of samples and analyses to be performed should be taken into account. If all the analyses cannot be performed under the budgetary constraints, the number of samples to be collected or the number of analyses to be performed on a given sample should be reevaluated, keeping in mind how the elimination of a given test affects uncertainty and the researcher's ability to answer critical questions (Step 2) as determined in the logic statements developed in Step 5.
The final step in the DQO process is to optimize the study design so that the most cost-effective decision rules with an acceptable degree of uncertainty are selected to meet the specified DQOs. For example, in the ARCS Program, biological testing (i.e., Microtox®) was determined to be a cost-effective way to identify hot spots (e.g., areas of high acute toxicity) and clean areas that exhibit no statistically significant (P<=0.05) response. Intermediate areas that exhibit moderate toxicity or conflicting toxicity results using well-accepted tests carry sufficient uncertainty to warrant chemical analysis and further evaluation by the integrated sediment assessment approach.
In addition, chemical analyses are needed in follow-up assessments of biological hot spots to identify the specific composition of chemicals that may be contributing to the observed toxicity as well as potential sources of the chemicals. Therefore, the decision rule concerning potential remediation requires the assessment of both sediment toxicity and chemical concentration data.
MQOs are specific goals defined by the data users that clearly describe the data quality that is sought for the project phase. The quality assurance program should focus on the definition, implementation, and assessment of MQOs that are specified for the sampling, analysis, and verification phases of the project. The MQOs should be defined according to the following six quality assurance objectives and attributes:
- Detection Limit--The lowest concentration of an analyte that a specified analytical procedure can reliably detect
- Bias--The difference between an observed value and the "true" value (or known concentration) of the parameter being measured; bias is the first component of accuracy, the ability to obtain a precise, unbiased (true) value
- Precision--The level of agreement among multiple measurements of the same characteristic; precision is the second component of accuracy
- Representativeness--The degree to which the data collected accurately represent the population of interest (e.g., contaminant concentrations)
- Comparability--The similarity of data from different sources included within individual or multiple data sets; the similarity of analytical methods and data from related projects across AOCs
- Completeness--The quantity of data that is successfully collected with respect to the amount intended in the experimental design.
Each of these objectives and attributes will be discussed separately in the following text. A list of MQOs for the ARCS Program is provided in Table 2-1.
All analytical laboratories should be required to determine the instrument detection limit (IDL) prior to any analysis of the routine samples. The IDL serves as a statistical estimate of the lowest concentration of an analyte that an instrument can reliably distinguish from the background noise. The target detection limit (TDL) is the concentration at which the presence of an analyte must be detected to properly assess and satisfy the DQOs. Method detection limits (MDLs) are based on a method's ability to determine the presence, qualitatively or quantitatively, of an analyte in a sample matrix, regardless of its origin (Glaser et al. 1981). MDLs may be determined by making repeated measurements (a minimum of seven) over several days of either a calibration blank (a blank consisting solely of the reagents mixed in the same proportions as those to be used during routine sample extraction/digestion) or a low-level standard with a concentration within 1-5 times the IDL. The MDL is calculated, at the 95 percent confidence level, as 3 times the standard deviation of the measured sample concentrations. Generally, the condition for accepting a laboratory's ability to determine small quantities of various analytes is that the MDL be less than or equal to the TDL. The detectability attribute is generally applicable only to quantitative physical and chemical analyses.
The advantage of determining MDLs by the analysis of spiked uncontaminated field samples is that the concentration of the analyte can be in the optimum range for quantification and the variance caused by the sample matrix and sample processing will be reflected in the MDLs. MDLs are affected by both matrix interferences and highly contaminated samples. The MDLs for highly contaminated samples will often be much greater than those for relatively "clean" samples.
In addition to MDLs, a second limit commonly associated with detectability in a sample matrix is the limit of quantification (LOQ). This limit is often arbitrarily defined as 5-10 times the standard deviation of the measured low-level standard or blank sample concentration. At the higher end of this concentration range, the relative confidence in the measured value is about +/-30 percent at the 95 percent probability level (Taylor 1987).
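The MDL and LOQ computations described above reduce to multiples of the standard deviation of repeated low-level measurements. The seven replicate values below are illustrative; the multipliers (3 for the MDL, 10 for the LOQ) follow the ranges given in the text.

```python
from statistics import stdev

def method_detection_limit(replicates, factor=3.0):
    """MDL: a multiple of the SD of repeated (>= 7) measurements of a
    calibration blank or low-level standard, per the text above."""
    return factor * stdev(replicates)

def limit_of_quantification(replicates, factor=10.0):
    """LOQ: often set at 5-10x the SD of the same measurements."""
    return factor * stdev(replicates)

# Illustrative low-level standard results (ug/L), measured over several days
reps = [0.48, 0.52, 0.50, 0.47, 0.53, 0.49, 0.51]
mdl = method_detection_limit(reps)
loq = limit_of_quantification(reps)
```

By construction the LOQ sits well above the MDL, consistent with the greater confidence required for quantification than for mere detection.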
Bias is the degree of agreement of a measured value with the true or expected value, and is the first component of accuracy. A highly biased value has low accuracy. Bias, for physical and chemical measurements, is commonly assessed through the use of certified reference materials (CRMs; a reference material certified by a technically competent organization), standard reference materials (SRMs; a reference material certified by the U.S. National Institute of Standards and Technology [U.S. NIST]), or other standards (either created internally by the laboratory or provided by another laboratory). In the absence of CRMs or SRMs, matrix spikes can be used to determine bias. Bias can be determined by comparing the analytical results to the known value of the reference material, plus or minus an established acceptance range either provided with the reference material or agreed upon as part of the DQO process. For example, in the ARCS Program, acceptable recovery values should be within 85-115 percent of the spiked values for metals and 70-130 percent for organic and organometallic compounds. Control samples for assessing bias should be analyzed at a rate of 1 per 20 environmental samples.
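A minimal sketch of the spike-recovery check, using the ARCS acceptance windows quoted above (85-115 percent for metals, 70-130 percent for organic and organometallic compounds); the measured and spiked values in the usage below are illustrative.

```python
# ARCS acceptance windows for spike recoveries, taken from the text above
ACCEPTANCE_PCT = {"metals": (85.0, 115.0), "organics": (70.0, 130.0)}

def percent_recovery(measured, spiked):
    """Recovery of a matrix spike or reference material, in percent."""
    return measured / spiked * 100.0

def bias_acceptable(measured, spiked, analyte_class):
    """True if the recovery falls within the class's acceptance window."""
    low, high = ACCEPTANCE_PCT[analyte_class]
    return low <= percent_recovery(measured, spiked) <= high
```

For example, an 80 percent recovery would fail the metals window but pass the wider organics window.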
For toxicity tests, bias can be assessed through the use of a control sediment (a "clean" sediment that contains only background quantities of the analyte[s] of interest), reference toxicants, and long-term monitoring of the coefficients of variation for the reference toxicants used in a given toxicity test. The assessment of organism response in the control sediment and reference toxicant establishes test validity and is similar to physical and chemical testing in that the results should fall within the bias window (e.g., mean plus or minus an acceptance range, percent survival greater than a set limit, a given number of young produced by the third brood) established for that reference material. Long-term monitoring of the coefficient of variation for a given reference toxicant provides the researcher with an assessment of the degree of temporal change in the test organism's response.
Precision is the degree of agreement among repeated independent measurements under specified conditions. Precision is the second component of accuracy; a measurement system with poor precision (high variability) cannot be consistently accurate. In contrast, measurement systems that have both low bias and good precision are always accurate. Precision is assessed by analyzing replicate samples and evaluating the statistical agreement of the results about their mean. Commonly, the coefficient of variation (standard deviation divided by the mean) for triplicate or greater replication, or the relative percent difference (RPD) for duplicate samples (the absolute difference between two duplicate measurements divided by the mean, and multiplied by 100), is calculated to rapidly assess the precision of a set of measurements. Precision is deemed acceptable when the obtained result is less than or equal to a value agreed upon during the DQO process. For example, in the ARCS Program, precision was based on analytical duplicates or triplicates analyzed at a rate of 1 per 20 samples; the acceptable coefficient of variation was <=20 percent.
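The two precision statistics named above (coefficient of variation for triplicate or greater replication, RPD for duplicates) can be written out directly; the duplicate values in the test are illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV for three or more replicates: sample SD divided by the mean."""
    return stdev(values) / mean(values)

def relative_percent_difference(a, b):
    """RPD for duplicates: |a - b| divided by the mean of the pair, x 100."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0
```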
Representativeness in the quality assurance program should be defined for both the field sampling and laboratory analysis aspects of the program. Representativeness may be defined as the degree to which the sampling data properly characterize the study environment. In the field sampling and characterization phase of a program, representativeness should be maintained by the collection of samples throughout the entire AOC (to address the spatial variability of the area) or at the locations identified by the decision-makers and technical work group members during their initial establishment of the program DQOs. In the analytical phase of the program, representativeness considerations include proper sample storage and preservation conditions (to ensure that the sample does not substantially change from the time of sampling until the time of analysis) and sample homogenization (to ensure that the subsample taken for analysis is no different from any other subsample).
Comparability is an important component of the MQOs because it states the confidence with which one data set can be compared to another. If data are not comparable, then conclusions drawn from the combination of two data sets will have an increased level of uncertainty. Comparability is enhanced by the consistent use of standardized sampling methods and specified protocols for the sampling phase and through the use of standard documented methodologies (e.g., USEPA, American Society for Testing and Materials [ASTM], U.S. Army Corps of Engineers [the Corps]) for analyte determinations. If a standard method is not available, the method selected should be clearly documented by reference or provided as a written standard operating procedure in the QAPP (to be discussed in a later section). Any deviations from the standardized, selected methods or protocols should be clearly documented because these changes may significantly affect the resultant data.
One issue that should be considered when evaluating comparability is the influence of temporal variation, especially if resampling events are planned in the same AOC. The influence of short-term discrete disturbances (e.g., storm events) and long-term changes (e.g., seasonal variations) can markedly change the sediment contaminant concentrations, biological communities, and toxicity of the sediments in the system. Therefore, temporal variability can play an important role when data are evaluated and compared between sampling events.
Completeness levels should be established during the DQO process. These levels state the minimum number of samples that must be obtained during the field sampling phase and the minimum amount of acceptable data (i.e., data that must meet and pass the QA/QC requirements of the program) that must be generated to be able to confidently resolve the identified program issues. Completeness is generally expressed as the amount of data actually obtained divided by the amount of data expected to be obtained, on a percentage basis. The ARCS Program used a 90 percent level of completeness.
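Completeness as defined above reduces to a simple ratio; the counts below are illustrative, while the 90 percent goal is the ARCS value quoted in the text.

```python
def percent_completeness(acceptable_results, planned_results):
    """Data actually obtained and passing QA/QC, divided by the amount
    planned in the experimental design, expressed as a percentage."""
    return acceptable_results / planned_results * 100.0

# e.g., 93 of 100 planned analyses passed the program's QA/QC requirements
meets_arcs_goal = percent_completeness(93, 100) >= 90.0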
To achieve the DQOs and MQOs, various types of measurement samples can be used to quantitatively assess and control the error associated with the results. These samples fall into two categories: quality assurance (QA) samples and quality control (QC) samples.
Quality assurance samples are samples incorporated into batches during sample collection or preparation. These samples provide data users with a means of independently assessing the quality of the data generated at a given analytical laboratory. These samples can be either double-blind samples (sample identity and analyte concentration are unknown to the laboratory) or single-blind samples (sample identity is known but analyte concentration is unknown to the analytical laboratory). Double-blind samples are preferable to single-blind samples. Examples of typical quality assurance samples can include reference materials, field replicates, field-prepared blanks (e.g., trip blanks), and preparation laboratory replicates, if sample preparation is performed at a separate laboratory.
Quality control samples are those samples prepared at the analytical laboratory, and hence the sample identity and analyte concentration, if applicable, are known to the laboratory personnel. These samples enable the laboratory to control measurement error and meet the program MQO requirements. Typically, quality control samples include blanks, controls, ongoing calibration check standards, analytical replicates, matrix spikes, and surrogate spikes.
The QA/QC samples should be analyzed along with the routine sample analyses. Each QA/QC sample should have specifications that must be met before the data are considered acceptable. These specifications include acceptance limits and a required frequency of use (e.g., 1 blank per 20 routine samples, with an acceptable measured concentration below the MDL). The use of QA/QC samples and their required frequency should be balanced against the data quality needs of the program.
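The frequency specification can be checked with a small helper; the 1-per-20 rate follows the example in the text, and the sample counts in the tests are illustrative.

```python
import math

def required_qc_count(n_routine_samples, per=20):
    """QC samples needed at a rate of 1 per `per` routine samples,
    rounding up so every partial batch of routine samples is covered."""
    return math.ceil(n_routine_samples / per)
```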
The following sections briefly describe the types and uses of the various QA/QC samples.
A replicate sample may be used to assess the precision MQO. Replicates can be obtained from the field, the preparation laboratory (if separate from the analytical laboratory), and the analytical laboratory. The most common form is the analytical replicate. These samples are created at the analytical laboratory by obtaining two or more subsamples from a single routine sample and analyzing them as separate individual samples. The results from the analytical replicates can be used to demonstrate or confirm that the analytical precision MQOs are being satisfied.
Field replicate samples are collected during the sampling phase of the program. A field replicate sample may be obtained by collecting two unique individual samples from the same location that are then treated as separate samples throughout the rest of the sample preparation and analysis phases. These samples are generally submitted to the preparation and/or analytical laboratory as blind samples (identities of the replicates are unknown to the laboratory personnel). The individual sets of samples are used to assess the overall (laboratory plus field) precision. Interpretation of the field replicate results can be difficult because significant variability may exist in the field. Generally, a failure to meet the MQOs for the field replicates would result in only a minor concern, indicating the existence of minor uncertainty in the data (assuming that the laboratory replicates show no major problem with analytical variability). Such concerns should not be used in isolation to disqualify data from the sample or sample batch (Papp et al. 1989).
Field split samples represent an additional form of replicate sample commonly collected in the field. The field split samples are similar to the field replicate samples except that the two samples are subsampled from a single collected sediment sample and not from two separately collected samples. These samples can be used to assess the same error components (or variances) as the field replicate samples but on a smaller spatial scale.
Preparation laboratory replicates can be created at the preparation laboratory if it is separate from the laboratories responsible for sampling and parameter analysis. These samples are prepared by splitting a randomly selected routine sample into two representative halves (i.e., after the sample is homogenized). Each half is then treated as a separate sample at the analytical laboratory. Preparation laboratory replicates can be used to assess the preparation laboratory within-batch precision.
Precision acceptance limits for the analytical replicates should be tighter (smaller allowable variability) than the precision limits for the field replicates and field splits. Preparation laboratory replicates should be expected to have a precision variability somewhere between those for the analytical replicates (i.e., splits of one sample) and field replicates (i.e., two samples from the same location). For example, in the ARCS Program, the precision requirement for the field replicates is an RPD of <=30 percent, while that for the analytical replicates is a %RSD of <=20 percent. It should be noted that the field and preparation laboratory replicates are forms of quality assurance samples and can only be checked by the data user. In contrast, the analytical replicates are quality control samples that the laboratory can use to immediately check the precision of its measurement system.
Blank samples are quality control samples that can be used for two purposes in a quality assurance program, as a calibration check and as a check for potential contamination of the measurement system. A calibration blank is defined as a zero mg/L or ug/L standard and contains only the solvent or acid used to dilute the calibration standards without any analyte present (Papp et al. 1989). The calibration blank is analyzed periodically to check for significant instrument baseline drift and should have results that are below the MDL.
To assess whether outside contamination has entered the measurement system, various blank samples can be used. Perhaps the two most common forms of blank samples are field and reagent blanks. Reagent blanks may be defined as a sample composed of all the reagents, in the same quantities, used to prepare an actual routine sample for analysis. A field blank generally consists of either "clean" water or reagents brought from the laboratory to the field and passed through all the sampling equipment used to obtain the routine samples. Both field and reagent blanks should undergo the same preparation and analysis procedures as an actual routine sample. Similar to the calibration blanks, these blanks should have measured concentrations that are below the MDL (Appendix B of 40 CFR Part 136).
For toxicity tests, the blank is better known as the control sample or the negative control. This sample simply consists of the water or sediment in which the organisms had either been cultured or raised. The negative control sample in these tests is used to assess organism health during the given toxicity test period and the influence of the "clean" water or sediment on the organism. The response of the organisms in the control samples should be required to equal or exceed a specified response limit (e.g., 90-percent survival, if survival is the toxicity test endpoint) that was determined during the DQO process.
Reference materials are analyzed to assess the bias of measurements being made at the analytical laboratories. These samples can be used either as quality assurance or quality control samples. The three major forms of reference materials commonly used are CRMs, SRMs, and standards. CRMs are those materials that have one or more of their property values established by a technically valid procedure and are accompanied by or traceable to a certificate or other documentation issued by the certifying body. For example, reference materials produced by USEPA, the U.S. Geological Survey (USGS), or the Canadian Centre for Mineral and Energy Technology are considered CRMs. SRMs are CRMs produced by the U.S. National Institute of Standards and Technology (NIST) and characterized for absolute analyte content independent of the analytical method. If reference materials, either certified or standard, are not available for a given analyte, the laboratory can assess bias by using a standard of known concentration created by the quality assurance officer or other member of the quality assurance staff at the analytical laboratory, or by using a standard provided by a different laboratory. The standards should be submitted as at least single-blind samples to the analyst, if possible. Bias can be determined by comparing the analytical results to the known value of the reference material, plus or minus an established acceptance range either provided with the reference material or agreed upon as part of the DQO process. For the ARCS Program, the accuracy requirement for bias in either SRMs or CRMs is that the measured value must be within +/-20 percent of the known concentration. The reference materials are used to control bias and reduce between-batch components of the measurement uncertainty.
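The ARCS +/-20 percent accuracy requirement for reference materials can be expressed as a simple acceptance check. The measured and certified values below are hypothetical:

```python
# Checking a measured CRM/SRM result against its certified value using the
# ARCS +/-20 percent accuracy requirement. Values are hypothetical.

def within_bias_limit(measured, certified, limit_pct=20.0):
    """True if the measured value is within +/-limit_pct of the certified value."""
    return abs(measured - certified) / certified * 100.0 <= limit_pct

assert within_bias_limit(46.0, 50.0)      # 8 percent low: acceptable
assert not within_bias_limit(38.0, 50.0)  # 24 percent low: outside +/-20 percent
```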
For toxicity tests, two forms of reference materials can be used to assess the bias of organism responses. The first is to expose the organism to a reference toxicant that has a known and quantifiable response in the organism. The reference toxicants can be used to test organism sensitivity to waterborne or sediment-associated contaminants. The reference toxicants can also be used to control bias and to assess the within- and between-batch components of the measurement uncertainty.
The second reference material for assessing the bias of toxicity tests is the control sediment (see Blank Samples above). The control sediment is a "clean" sediment that contains only background quantities of the analytes of interest and that has been routinely used to assess the acceptability of the test. The control sediment exposes the organism to a matrix similar to the sediments being assayed without elevated concentrations of contaminants. The acceptability of the toxicity tests can be assessed by the response (e.g., survival or growth) of the control organisms to the control sediment. The acceptance criteria for control sediment toxicity tests should be determined during the DQO process.
Matrix spike samples are quality control samples used to assess the efficiency of the extraction technique and as a form of accuracy testing. These samples are prepared by adding the spiking analyte to the routine sample prior to extraction or digestion and ensuring that the spiking solution is thoroughly mixed with the sample matrix. If no matrix is present and the spike is added to the reagents only, this is called a spiked reagent blank. The spike concentration should be approximately equal to the expected concentration (if known or can be reasonably estimated) of the analyte in the environmental sample or 10 times the detection limit, whichever is larger (Papp et al. 1989). The volume of the added spike should be negligible (i.e., <=1 percent of the sample aliquot volume) to avoid any dilution effects. Matrix spikes are analyzed in conjunction with an unspiked routine sample. Matrix spike analyses are generally reported as the percent spike recovery of the known quantity added to the sample for each analyte and calculated as follows:
% Recovery = 100 x [(S - U)/C]

where:

S = measured concentration in the spiked aliquot
U = measured concentration in the unspiked aliquot
C = actual concentration of spike added.
The MQOs for matrix spike recoveries should be 100 percent, plus or minus the acceptance range. For example, in the ARCS Program, the acceptance criterion for inorganic matrix spikes was limited to a percent recovery range of between 85 and 115 percent (100 +/- 15 percent).
Matrix spikes are used for all studies, even when SRMs or CRMs are used, because they can help determine potential, site-specific matrix problems.
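The percent recovery formula above, together with the ARCS inorganic acceptance window, can be sketched as follows (concentrations are hypothetical):

```python
# Percent spike recovery as defined in the text, checked against the ARCS
# inorganic acceptance window of 85-115 percent. Concentrations are hypothetical.

def percent_recovery(spiked, unspiked, spike_added):
    """100 x [(S - U)/C] from the text."""
    return 100.0 * (spiked - unspiked) / spike_added

rec = percent_recovery(spiked=14.8, unspiked=5.0, spike_added=10.0)
acceptable = 85.0 <= rec <= 115.0
```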
Surrogate spike analyses are only applicable to the organic analyses, such as for PCBs, chlorinated pesticides, PCDDs and PCDFs, and PAHs. A surrogate spike may be defined as an added organic compound that is similar to the analytes of interest in chemical composition, extraction, and chromatography, but that is not normally found in the environmental sample (USEPA 1986b). These compounds are spiked into blanks, standards, reference materials, routine samples, and matrix spike samples prior to extraction. Percent recoveries are calculated for each surrogate compound. For the ARCS Program, acceptable surrogate spike recoveries were set at 100 +/- 30 percent. Surrogate spikes are used to assess the efficiency of the extraction technique and as a form of accuracy testing, but without the confounding influence of the analyte of interest already present in the sample.
Surrogate spike compounds may be target compounds labeled with stable isotopes of carbon (i.e., 13C) or hydrogen (i.e., deuterium, 2H), or other compounds that are physically and chemically similar to the chemicals of interest but that do not typically occur in nature. For example, dibromooctafluorobiphenyl is sometimes used as a surrogate for PCBs, although this compound is not identical in structure to a PCB. Analyses for semivolatile organic compounds typically include the spiking of three neutral compounds (e.g., naphthalene-d8), two organic acid compounds (e.g., phenol-d5), and sometimes two organic base compounds (e.g., n-nitrosodiphenylamine-d6).
Compound-specific recovery corrections in each sample analyzed can be accomplished for organic analyses using the isotope dilution technique (e.g., USEPA Method 1625C for solids). This technique is appropriate only when sample results will be quantified using gas chromatography/mass spectrometry (GC/MS) analysis. Sample processing is identical whether or not this technique is used, except that a large number of isotopically labeled compounds (available in kits) are spiked into the sample matrix prior to extraction instead of the three to five surrogate spike compounds normally used. Ideally, there should be an isotopically labeled compound that matches each (unlabeled) target compound that will be quantified. Rather than acting simply as indicators of analytical recovery for the sample (as do surrogate spike compounds), these labeled compounds are used as analytical "recovery standards" for their unlabeled counterparts. Therefore, the final concentration calculated by the GC/MS system for the target compounds can incorporate a correction for the analytical recovery experienced by the corresponding isotopically labeled compound.
The isotope dilution technique has been routinely used for years in USEPA methods for the quantification of PCDDs and PCDFs (e.g., USEPA Methods 8280 and 8290), and is now an option for other semivolatile (and volatile) organic compounds in hazardous waste samples analyzed under USEPA's Contract Laboratory Program. This technique is designed to increase the accuracy of chemical analyses and the comparability of results among laboratories. In addition, the technique increases the confidence in the validity of reported detection limits for undetected target compounds. By forcing a search for every recovery standard in each sample extract, the technique also increases the efficiency of detection and reporting frequency of compounds that otherwise may be overlooked in complex extracts.
A potential disadvantage of this technique is that the addition of a large number of isotopically labeled compounds complicates some automated machine searches for new or unknown compounds (i.e., tentatively identified compounds), although the labeled compounds can also serve as markers to help identify and locate unknown compounds. Also, not all laboratories are familiar with the isotope dilution technique, which requires additional computer programming and can have a higher analysis price than routine GC/MS analyses.
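The recovery-correction idea behind isotope dilution can be illustrated with a minimal sketch. This is not any particular USEPA method's calculation, only the core scaling step, with hypothetical values:

```python
# Minimal sketch of the recovery correction central to isotope dilution: the
# native compound's result is scaled by the fractional recovery of its
# isotopically labeled counterpart in the same extract. Values are
# hypothetical and do not reproduce any specific USEPA method.

def recovery_corrected(measured_conc, labeled_recovered, labeled_spiked):
    """Scale the native result by the labeled analogue's fractional recovery."""
    fraction = labeled_recovered / labeled_spiked
    return measured_conc / fraction

# Labeled analogue recovered at 80 percent, so the native result is corrected up
corrected = recovery_corrected(measured_conc=8.0,
                               labeled_recovered=0.8,
                               labeled_spiked=1.0)
```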
Initial instrument calibration should be performed for all analytical instruments immediately prior to analysis of any samples. The initial calibration should be completed using a minimum of a three-point calibration curve (five-point calibration for semivolatile and volatile organic compound analyses) or following the instrument manufacturer's instructions for special analyses. For metals run by atomic absorption, these calibration standards should be analyzed as standard additions to the matrix. The standard concentrations tested should encompass the range of expected sample concentrations, including one standard near the LOQ. The acceptance criterion for the initial calibration curve is that all points used in the determination of the calibration curve should have a calculated coefficient of determination (r2) of some fixed value determined during the establishment of the MQOs for the project or program. For the ARCS Program, an r2 of >=0.97 was required for the determination of a properly calibrated instrument (in practice the values are generally better than 0.99). In addition, the %RSD of the relative response factor (RRF) obtained for each standard in the initial calibration should not exceed 30 percent. The RRF is the ratio of the response measured by the mass spectrometer to a known amount (mass) of an analyte relative to that of a known amount (mass) of an internal standard.
Ongoing calibration check samples should be analyzed to verify the calibration curve before, during, and after any routine sample analyses to check for instrument drift. The ongoing calibration check sample is a standard prepared by the laboratory at a concentration near the middle of the calibration range for the given analyte. The MQO for the ongoing calibration check samples should be the known concentration, plus or minus the acceptance range defined during the DQO process. The MQO for ongoing calibration check samples in the ARCS Program was set at +/-10 percent of the known concentration of the analyte. In addition, the RRF determined for PCBs, chlorinated pesticides, and selected semivolatile and volatile compounds should be within 25 percent of the RRF for those compounds in the initial calibration. Specific semivolatile and volatile compounds that should meet this requirement are listed in USEPA Contract Laboratory Program guidance.
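The two calibration acceptance checks described above can be sketched numerically: the coefficient of determination for the initial curve (ARCS: r2 >= 0.97) and the ongoing check against the known concentration (ARCS: within +/-10 percent). The standard and response values are hypothetical:

```python
# Sketch of the calibration acceptance checks described in the text.
# Standard concentrations and instrument responses are hypothetical.

def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line through (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Three-point initial calibration: concentration vs. instrument response
standards = [1.0, 5.0, 10.0]
responses = [0.11, 0.52, 1.01]
curve_ok = r_squared(standards, responses) >= 0.97

# Ongoing calibration check near mid-range: +/-10 percent of the known value
known, measured = 5.0, 5.2
ongoing_ok = abs(measured - known) / known * 100.0 <= 10.0
```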
Control charts, while not actually a type of QA/QC sample, are extremely useful for the monitoring of long-term bias within the measurement system. Control charts are generally constructed by plotting the individual analytical results from a quality assurance or quality control sample against the mean value with +/-2 and 3 times the standard deviation plotted as warning and action limits, respectively. Control charts can be created for accuracy samples, ongoing calibration check samples, replicate samples where individual values are plotted during the long-term use of replicates from the same source, reagent blanks, reference toxicants, cumulative mean LC50s, and control sediments. Ideally, control charts are updated after each day of analysis. Bias is indicated by the occurrence of seven or more consecutive points on one side of the cumulative mean.
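The control chart limits and the seven-point bias rule described above can be sketched as follows, using a hypothetical measurement series:

```python
# Sketch of control chart limits (+/-2 sd warning, +/-3 sd action) and the
# seven-consecutive-points bias rule described in the text. The measurement
# series is hypothetical.

def control_limits(mean, sd):
    """Warning limits at +/-2 sd and action limits at +/-3 sd about the mean."""
    return {"warning": (mean - 2 * sd, mean + 2 * sd),
            "action":  (mean - 3 * sd, mean + 3 * sd)}

def bias_flag(values, mean, run_length=7):
    """True if run_length or more consecutive points fall on one side of the mean."""
    run = 0
    last_side = 0
    for v in values:
        side = 1 if v > mean else (-1 if v < mean else 0)
        run = run + 1 if side == last_side and side != 0 else (1 if side != 0 else 0)
        last_side = side
        if run >= run_length:
            return True
    return False

limits = control_limits(mean=100.0, sd=2.0)    # warning 96-104, action 94-106
drifting = [101, 102, 101, 103, 101, 102, 101]  # seven points above the mean
assert bias_flag(drifting, mean=100.0)
```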
After the DQOs and MQOs have been determined and to meet the Federally mandated USEPA policy that all environmental sampling and testing programs have a formalized quality assurance program, a program-wide quality assurance management plan (QAMP) and laboratory-based QAPPs must be prepared (Costle 1979a,b). A program-wide QAMP was prepared for the ARCS Program (Schumacher 1991). The QAMP encompasses all of the quality assurance activities that will occur at each of the laboratories participating in the program. These activities include all data generation phases of the program from sample collection and mapping through the final sample analysis, as well as database verification and database management activities. A QAPP should be prepared by each laboratory and needs to address only the QA/QC concerns for the work that will be performed by that individual laboratory. It is in these documents that the DQOs and MQOs are clearly defined for the program.
USEPA Quality Assurance Management Staff guidelines (Stanley and Verner 1985) state that the QAMP and QAPPs should address, in detail or by reference, each of the following 16 items:
- Title page with provisions for approval signatures
- Table of contents
- Project description
- Project organization and responsibilities
- Quality assurance objectives for measurement data in terms of precision, accuracy, completeness, representativeness, and comparability
- Sampling procedures
- Sample custody
- Calibration procedures and frequency
- Analytical procedures and calibration
- Data reduction, validation, and reporting
- Internal quality control checks
- Performance and system audits
- Preventive maintenance procedures
- Calculation of data quality indicators
- Corrective actions
- QA/QC reports to management.
The preparation and approvals of the QAMP and QAPPs should take place prior to the initiation of any sample or data collection processes within the program. More specific information on the various requirements of the QAPP may be found in shortened format in the pocket guide titled Preparing Perfect Project Plans (Simes 1989) or in expanded format in the Preparation Aids for the Development of Category X Quality Assurance Project Plans, where X refers to Category I, II, III, or IV projects defined by Simes (1991).
A laboratory audit program is essential for the monitoring of all data generation phases of any project or program. The audit program should include the submittal of evaluation samples to each participating laboratory and the execution of laboratory performance and system audits by the funding agency. Each of these parts of the audit program is discussed in the following sections.
The submittal of evaluation or audit samples, if available, is an extremely useful technique for determining a laboratory's ability to successfully perform the required analyses of the program. Ideally, an initial set of "pre-award" evaluation samples should be sent to each participating laboratory prior to the awarding of a contract. The results from these samples will allow for the evaluation of the timeliness of sample analysis, the ability of the laboratory to follow the required quality assurance measures implemented in the program, and the ability of the laboratory to accurately perform the required analyses. Once the contract is awarded, a set of evaluation samples should be submitted with the routine samples to ensure that the laboratory is maintaining control of the analyses being performed. These samples will also allow for problem identification/resolution to occur during the analytical phase of the program.
Performance audits essentially consist of reviewing the ongoing quality assessment program of a laboratory (Taylor 1987). The objective of this form of audit is to evaluate the accuracy of the data being generated at the laboratory. The review should include data checks of the QA/QC samples as well as the examination of control charts for potential laboratory bias.
System audits should be conducted to ensure that all personnel are adhering to the protocols specified in the QAPP in a consistent manner. System audits should be conducted during both the field sampling and analytical laboratory phases of the data generation process.
On an individual laboratory basis, an internal performance and system audit should be performed at least once during the analysis of samples for each project or phase of a given project. An external combined performance/system audit should be conducted at least once (more frequently if major problems are identified) during a large multi-laboratory program.
During the initial phases of a program's development, consideration should be given to exactly what data should be received from each laboratory; which format is to be used for formal submittal of the data; which data acquisition methods are to be used (on-line computer feed vs. manual data entry); which methods of verification and/or validation are going to be performed on the submitted data; how the data are going to be analyzed; and where the data are going to be stored during and at the end of the program. These questions should be addressed at the beginning of the program to avoid confusion at the laboratory during the later phases of sample analysis and data report generation. These points should be clearly delineated in the program QAMP and in each participating laboratory's QAPP in the data reduction, validation, and reporting section.
It is suggested that all data generated in conjunction with the project (i.e., data from routine samples, QA/QC samples, instrument calibration, etc.) be obtained by the funding agency. The purpose of collecting all the data is to verify the calculations and final results, to check for transcription errors, to ensure that all the QA/QC requirements were addressed, and so that if any problems or questions arise concerning the data in the future, it will be possible to resolve issues without having to contact the original analytical laboratory. Collecting and storing laboratory data that supplement the results (i.e., meta-data) will improve the long-term viability of the data and will allow more secondary use of the data by those outside of the project. The data submitted from the laboratory should be in both hard-copy and computer-readable formats.
Whenever possible, a list of acceptable values should be developed for certain data elements. For example, the list for the chemical parameters might include methylmercury, total PCBs, benzo[a]pyrene, etc. Using such lists to simplify the task of data quality control can greatly reduce inconsistencies (spelling, synonyms, and data format) in both data reporting and in data entry.
Data verification should consist of those analyses or checks on the submitted data to assess the degree of success that a laboratory obtained in meeting the MQOs specified for their project. Data verification should be performed on both the field sampling and characterization data as well as the analytical laboratory data. Field data should be examined for consistency, relative accuracy, and completeness of the submitted data (as defined for the DQOs). For this discussion, consistency is defined as the use of the same descriptive terms, reporting units, and station coordinates (i.e., latitude and longitude) throughout the field database. Relative accuracy is defined as consistency within the reported measurements. If deficiencies are identified in the field data, the laboratories should be contacted and requested to provide missing data or correct erroneous data.
Verification of the analytical laboratory data should be performed to check all calculations and to check for missing data, proper use of QA/QC samples, proper sample identification, data transmittal errors, internal consistency, intralaboratory comparability (if similar analyses were performed on the same sample), and even temporal and spatial consistency. Statistical checking for outliers may be appropriate during data verification/validation.
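Simple automated checks can implement part of the verification described above, such as enforcing an allowed-value list for analyte names and flagging incomplete records. The field names and records below are hypothetical:

```python
# Sketch of basic data verification checks of the kind described in the text:
# an allowed-value list for analyte names and a completeness check on required
# fields. Field names and example records are hypothetical.

ALLOWED_ANALYTES = {"methylmercury", "total PCBs", "benzo[a]pyrene"}
REQUIRED_FIELDS = ("station_id", "analyte", "value", "units")

def verify_record(record):
    """Return a list of verification problems found in one data record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append("missing field: " + field)
    analyte = record.get("analyte")
    if analyte and analyte not in ALLOWED_ANALYTES:
        problems.append("analyte not on allowed-value list: " + analyte)
    return problems

good = {"station_id": "IH-01", "analyte": "total PCBs", "value": 1.2, "units": "mg/kg"}
bad = {"station_id": "IH-02", "analyte": "PCB's", "value": 0.8, "units": ""}
assert verify_record(good) == []
assert len(verify_record(bad)) == 2
```

Using a controlled vocabulary in this way catches the spelling and synonym inconsistencies mentioned above at entry time rather than during later data validation.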
A quality assurance program is an integrated system of activities involving planning, quality control, quality assessment, and reporting to ensure that the data generated in a program meet defined standards of quality with a stated level of confidence. Adherence to a well-defined quality assurance program is essential to ensure that the data collected will be of known and acceptable quality as well as comparable among laboratories.
The first step in the development of a good quality assurance program is to define the DQOs. The DQOs are clear, concise statements that delineate all aspects of a program. DQOs ensure that all parties understand the goals of the program and the "route" the program will take to meet the goals. Properly established DQOs help to eliminate unnecessary waste of time and money. Further, the DQOs allow for the upfront planning of the level of data quality needed to meet the program's goals.
There are many forms of QA/QC samples and measures that can be used to assess and control the sources of error throughout the sample processing stream. QA/QC samples used during chemical and physical analyses can identify system contamination, method extraction efficiency, and the accuracy (bias and precision) of the measurements. The health, sensitivity, and influence of "clean" water or sediments on test organisms, as well as the bias and test reproducibility, can be assessed through the use of quality control samples during sediment toxicity tests and bioaccumulation studies. In addition to these quality control samples prepared by the laboratory, samples can be incorporated by the data user into the study design to independently assess the quality of data generated by a laboratory. The selection and use of the QA/QC samples should be balanced in terms of quantity and acceptance limits to meet the needs of the program, as defined by the DQOs.
Upon completion of the DQO process and determination of which QA/QC samples are to be used, all the decisions need to be documented in a QAPP. It is USEPA policy that all environmental programs be performed in accordance with a formalized quality assurance program, and the QAPP is the formalized written statement of that program. The QAPP describes the management policies, objectives, principles, organizational authority, responsibilities, and implementation plan of the program (or laboratory) for ensuring the quality of the data.
A laboratory audit program is an integral part of a good quality assurance program. The use of evaluation or audit samples can provide valuable information on a laboratory's capabilities prior to awarding a contract and can be used to assess laboratory performance during the program. Laboratory system or performance audits should be conducted to ensure that the laboratory is adhering to the quality assurance program specified in the QAPP and to provide an assessment of the quality of the data being generated during the program.
If data are to be stored in an electronic database, procedures should be established to document all data sources and any changes made to the values over time. In addition, verification of any hand-entered data should be performed (by a second individual).