Assessment and Remediation of Contaminated Sediments (ARCS) Program
Assessment Guidance Document
US Environmental Protection Agency. 1994. ARCS Assessment Guidance Document. EPA 905-B94-002. Chicago, Ill.: Great Lakes National Program Office.
EVALUATION OF SEDIMENT TOXICITY
- EXPERIMENTAL DESIGN
- METHODS FOR SAMPLE COLLECTION AND EXPOSURE
- DATA ANALYSIS
- EVALUATION OF SEDIMENT TOXICITY TESTS IN THE ARCS PROGRAM
- EVALUATION OF TOP-RANKED TOXICITY TESTS
- CONCLUSIONS AND RECOMMENDATIONS
This chapter reviews methods commonly used to evaluate the toxicity of freshwater sediments and summarizes experiences from the toxicity tests conducted as part of the ARCS Program (Burton 1994; Ingersoll et al. 1993). Laboratory sediment toxicity tests described in Burton (1994) include elutriate and whole-sediment toxicity tests with various organisms including bacteria, algae, macrophytes, rotifers, cladocerans, amphipods, mayflies, and fish. Laboratory sediment toxicity tests described in Ingersoll et al. (1993) include exposures with elutriates (algae [Hall et al. 1993]; cladocerans and Microtox® bacteria [Coyle et al. 1993]) and whole-sediment samples (amphipods and chironomids [Nelson et al. 1993]). Up to 12 stations were sampled from each of three AOCs (Buffalo River, New York [Figure 1-1]; Indiana Harbor, Indiana [Figure 1-2]; and Saginaw River, Michigan [Figure 1-3]) and evaluated for toxicity to selected test organisms.
Selected results from the ARCS Program that are described in this chapter include 1) ranking of toxicity tests by their sensitivity and discriminatory power, 2) response similarity and correlations among toxicity tests, and 3) comparison of responses of the amphipod Hyalella azteca in acute and chronic exposures to whole sediments.
Conclusions and recommendations for sediment testing in this chapter include:
- For most applications, a battery consisting of two to three sediment toxicity tests should be used. Testing multiple species reduces uncertainty and limits the probability of false positive or false negative results. The importance of testing multiple species increases with the level of ecosystem protection desired and the need to define "significant" contamination in the "grey" zone (marginally contaminated sites).
- At least two test organisms, comprising at least three measured responses (i.e., survival, growth, or reproduction) for a total of three tests, should be used in integrated assessments of sediment contamination. Behavior as a measured response is a fourth possible endpoint that can be considered, but tests incorporating this endpoint are less well developed. Integrative studies should use both water column and benthic species in whole-sediment exposures as resources permit.
- In the ARCS Program, the testing of survival and growth endpoints in the Hyalella azteca exposures (14- to 28-day) was the most efficient approach because each endpoint in this toxicity test produced unique information that was correlated with other toxicity test responses. Additional toxicity tests that ranked highest in their sensitivity, discriminatory power, and ability to produce unique information included the midge Chironomus riparius (14-day, survival and growth), the cladocerans Ceriodaphnia dubia (7-day, survival and reproduction) and Daphnia magna (7-day, survival and reproduction), the fathead minnow Pimephales promelas (7-day, larval survival and growth), the amphipod Diporeia spp. (formerly Pontoporeia hoyi) (5-day, avoidance/preference), and the mayfly Hexagenia bilineata (10-day, survival and molting frequency). The latter two toxicity tests require field collection of test organisms and therefore have a more limited use than the other toxicity tests.
- Sediment preference and avoidance endpoints with Diporeia spp. were the most sensitive endpoints overall. However, this toxicity test is one of the least developed. Because Diporeia spp. is of critical importance in the Great Lakes, this toxicity test should be given high priority for additional methods development and testing.
- The Microtox® test (elutriate phase) is a useful tool for quickly processing large numbers of samples in reconnaissance surveys based on its ease of use, low cost, sensitivity, discriminatory power, and high correlation with other toxicity test responses.
- Interpretations of toxicity test data with the alga Selenastrum capricornutum were complicated by variable nutrient and inorganic carbon concentrations in the elutriate samples. The algal medium needs to be modified before this test can be used to evaluate toxicity in environmental samples with high nutrients.
- Whole-sediment toxicity tests were very sensitive and provided the most realistic exposure system. Exposures using only interstitial (pore) waters may be subject to misinterpretation due to alteration of physical or chemical gradients, which modifies exposure routes. Elutriate tests are more appropriate for evaluation of the effects of suspended sediments (e.g., dredged material evaluations) to assess effects within the water column, but are not appropriate for assessing the in situ toxicity of sediments.
- The duration of the exposure can have an influence on the response of organisms in sediment toxicity tests. For example, exposures of 28 days with Hyalella azteca have identified toxic sediment samples that were not toxic in exposures of 2 to 14 days.
- Further method development is needed on culturing and chronic sediment testing procedures for additional infaunal species with a variety of feeding habits, including suspension and deposit feeders. Results of chronic tests should be used to help correlate the structure and function of benthic communities to the presence of contaminants.
- An integrated sediment assessment evaluation using toxicity testing, measures of benthic community structure, and physicochemical characteristics is necessary for accurate evaluation of the degree of sediment contamination. Identification of cause-and-effect relationships for specific chemical contaminants requires further evaluation through the use of spiked sediment toxicity tests (see Lamberson and Swartz 1992) or Toxicity Identification Evaluation (TIE) procedures (Ankley and Thomas 1992).
Sediment toxicity testing is a relatively new approach used in ecological risk assessments. The first sediment tests were developed because of concerns in the late 1960s and early 1970s over dredged material contamination and its suitability for open-water disposal by the Corps (USEPA-USACOE 1977). There was relatively little testing until the 1980s, with a dramatic increase in the past 5-10 years (Burton 1991). The science has progressed at a relatively fast rate because of the similarities to, and the earlier development of, the water column and effluent toxicity tests. The USEPA is developing approaches for managing contaminated sediments and method standardization that will undoubtedly result in an even greater amount of sediment testing and research in the near future (Southerland et al. 1992; USEPA 1994).
Historically, the assessment of sediment quality was often limited to chemical characterizations. However, quantifying contaminant concentrations alone cannot provide enough information to adequately evaluate the potential adverse effects, interactions among chemicals, or the time-dependent availability of these materials to aquatic organisms. Because relationships between total concentrations of contaminants in sediment and bioavailable concentrations are poorly understood, determination of the effects of contaminated sediment on aquatic organisms requires controlled laboratory toxicity and bioaccumulation tests.
The objective of a sediment toxicity test is to determine whether sediment is potentially harmful to aquatic organisms. Because these tests measure biological responses directly, they account for interactive toxic effects of complex contaminant mixtures in sediment. These tests do not require knowledge of specific pathways of interactions among sediment and test organisms (Kemp and Swartz 1988). Toxicity testing of sediment can be used to 1) determine the relationship between toxic effects and bioavailability, 2) investigate interactions among contaminants, 3) determine the spatial and temporal distribution of toxicity, 4) evaluate hazards of dredged material, 5) rank areas for cleanup, and 6) monitor the effectiveness of remediation and management actions. Toxicity tests on sediments spiked with known concentrations of contaminants can be used to establish cause-and-effect relationships between chemicals and responses, but the behavior of contaminants in spiked sediments cannot necessarily be equated with that in field-contaminated sediments.
Test organisms that have been used to evaluate the toxicity of freshwater sediments include 1) microbial enzyme systems and bacteria, 2) algae, 3) macrophytes, 4) amphipods, 5) midges, 6) mayflies, 7) cladocerans, 8) oligochaetes, and 9) fish (Burton 1991). The choice of the test organism has a major influence on the ecological relevance, success, and interpretation of the test. Furthermore, no one species is best suited for all applications over the wide range of sediment characteristics. ASTM E 1525 and USEPA (1994) outline the following criteria to consider when selecting an organism for sediment testing (see also Table 6-1):
- A toxicity database exists to evaluate the relative sensitivity of the organism
- The organism lives in contact with the sediment
- The organism can be cultured in the laboratory
- The organism can be maintained in the laboratory under test conditions
- Taxonomic identification of the organism presents no problems
- The organism is ecologically important
- The geographical distribution of the organism includes the area of interest
- The organism is tolerant of a wide range of natural sediment physico-chemical conditions
- The organism is tolerant of a wide range of water quality conditions
- Round-robin laboratory studies have been conducted
- The test using that organism has been peer reviewed
- The test using that organism has been field validated.
Various methods have been developed to evaluate sediment toxicity. These procedures range in complexity from short-term lethality tests that measure effects of individual contaminants on single species to long-term tests that determine the effects of chemical mixtures on the structure and function of communities. The sediment phase tested may include whole sediment, suspended sediment, elutriates, or sediment extracts (Lamberson et al. 1992; Burton 1991). Burton (1992b) provided a comprehensive review of sediment toxicity test methods, their advantages and disadvantages, and considerations related to sampling and testing of sediments.
The ARCS Program evaluated 20 single-species and 5 community toxicity tests comprising a total of 55 endpoints (Table 6-2). Species used in the tests included bacteria, algae, macrophytes, rotifers, cladocerans, chironomids, amphipods, mayflies, and fish. Together, these species represent many of the major trophic groups in aquatic ecosystems (Table 6-2). The toxicity tests evaluated have been used successfully in previous studies of sediment contamination.
The specific experimental design of a sediment toxicity assessment depends on the objectives of the study. Therefore, it is essential that the study objectives be sufficiently detailed to adequately guide a sediment toxicity evaluation. In turn, the experimental design determines the success or failure of a testing program. If a study is not designed properly, the best field collection protocols, laboratory methods, and data analysis techniques may not provide an adequate assessment of sediment toxicity. Additional design specifications that are related to the study objectives include the general assessment strategy, the kind of toxicity tests to use, the number of sampling stations, the number of replicates, and the collection of ancillary information.
The strategy for a toxicity evaluation may include a tiered assessment plan. In a tiered approach, a sensitive screening evaluation precedes one or more detailed, definitive evaluations. For example, the definitive evaluations would be conducted only at stations where the screening evaluation has indicated the likelihood of significant sediment contamination. The tiered approach can focus most of the evaluation effort on a subset of high priority stations, thereby reducing the cost of the overall evaluation.
Within a tier, effects within an AOC may be evaluated by a reference area approach or a gradient approach. For the reference area approach, effects are evaluated by statistically comparing the toxicity results for test sediments from an AOC with those from a reference area or a control sediment. In a gradient approach, three or more stations are located along a suspected gradient of contamination, such as at increasing distances from a discharge point. Data analysis for the gradient approach may include graphical or statistical correlation analysis.
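As an illustration of the correlation analysis mentioned for the gradient approach, the following minimal sketch computes a Spearman rank correlation between distance from a discharge point and survival. The station distances and survival values are hypothetical, and the choice of a rank correlation is an assumption made for the example; the guidance does not prescribe a particular correlation statistic.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gradient: five stations at increasing distance from a discharge.
distance_m = [50, 200, 500, 1000, 2000]
survival_pct = [20, 45, 40, 80, 95]

rho = spearman(distance_m, survival_pct)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near +1 in this sketch would suggest that survival increases with distance from the source. With only a few stations, the coefficient should be read alongside a plot of the data rather than on its own.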
Some key considerations for selecting a toxicity test or battery of tests include the test species, the life stage tested, the test endpoints, the exposure period, and the reliability, ecological relevance, exposure relevance, and availability of the test. These criteria were used to evaluate the toxicity tests examined in the ARCS Program (Table 6-1). Available site-specific data on chemical and physical properties of the sediments can be useful in selecting test species that are sensitive to the presence of the contaminants of concern yet have minimal interferences from other properties of the sediment (e.g., grain size). Knowing what aquatic organisms would be expected to inhabit the study area can aid in selecting appropriate species. Other important information that should be assembled includes regional water quality data, habitat types, and seasonal patterns in biological or physical/chemical characteristics.
If the tests are to be conducted as part of a regulatory program, the selection of sediment toxicity tests should be based on thorough understanding of the applicable regulatory requirements. These factors can include specifications for lethal or sublethal tests, exposure duration, seasons for testing, the battery of species for testing, and DQOs. Guidelines for selecting toxicity tests can also be included as part of regulatory programs. Variables that need to be considered in the experimental design include the number of treatments and replicates, the number and type of control and reference sediments, and water quality characteristics (ASTM 1993). If the purpose of the study is to conduct a reconnaissance field survey to identify toxic stations for further investigation, experimental design might include only one composite sediment sample from each station to allow for maximum spatial coverage. Although composite sampling may be better than collecting one grab sample, compositing over a large area can dilute high contaminant concentrations and may produce false negatives. In a reconnaissance survey, the lack of replication usually limits statistical comparisons, but these surveys can be used to identify toxic stations for further study or can be used in correlation analyses.
The number of replicates per station should be based on the need for sensitivity or statistical power. For example, the purpose of the study might be to conduct a detailed quantitative sediment survey to determine statistically significant differences between effects of several test sediments, control, and reference sediments. In such a survey, replicates (separate samples from different grab samples collected at the same station) would need to be collected at each station. Sediment chemistry and physical characterizations would need to be performed on each of the grab samples. Separate subsamples might be used to determine within-sample variability (precision) or for comparisons of test procedures (e.g., comparative sensitivity among test species), but these subsamples cannot be considered to be true replicates for statistical comparisons among stations (ASTM 1993; USEPA 1994).
The application and interpretation of sediment toxicity tests can be limited by the presence of substances or conditions other than elevated concentrations of contaminants of concern (e.g., skewed sediment grain size distributions) that vary naturally and thereby interfere with the toxicity results. Information that may assist in the interpretation of the toxicity test results and in the selection of reference areas includes analyses of sediment conventional variables (e.g., organic carbon and grain size composition), sediment chemical concentrations, and in situ biological effects.
Laboratory sediment toxicity tests generally include the use of control and reference sediment samples. A control sediment is a sediment that is essentially free of contamination and is used routinely to assess the acceptability of a test, although control sediment is not necessarily collected near the site of concern (USEPA-USACOE 1991). Any contaminants in control sediment may originate from the global spread of chemicals from both natural and synthetic sources and do not reflect any substantial input from local point or non-point sources. In addition, a control sediment may consist of formulated components, such as clay, sand, and organic matter (USEPA 1994). A control sediment provides a measure of test acceptability, evidence of test organism health, and is one basis for interpreting data obtained from the test sediments. In contrast, a reference sediment is collected near a study site and is used to assess sediment conditions exclusive of the contaminant material(s) of interest (USEPA-USACOE 1991). Testing a reference sediment provides a site-specific basis for evaluating toxicity. Selection of a reference material is not trivial. If the physico-chemical characteristics of the test sediment exceed the tolerance range of the test organism, a reference or control sediment encompassing these characteristics should be evaluated (DeWitt et al. 1988) or another test organism should be chosen. Selection of an inappropriate reference sediment, which may result in a reduction in the ability to statistically determine the effects of the test sediments, can be a problem in the assessment of highly contaminated sites.
Sediments are semi-solid media composed of minerals, organic material, interstitial water, and a myriad of physico-chemical and biological components. The ASTM Standard E 1391-90 (ASTM 1991) provides guidance on methods for collection, storage, and manipulation of sediments for toxicity testing. The following paragraphs summarize methods outlined in this ASTM guide.
Sediments cannot be collected in the field, transported to the laboratory, stored, and then tested for toxicity without some alteration to their original structure. Some methods of sample collection and testing are more disruptive than others. For example, use of a sediment grab sampler (e.g., Ponar, Ekman, van Veen, Shipek, Peterson) is more disruptive than a sediment core sampler. A standard core sampler is more disruptive than a box core sampler.
The advantages and disadvantages of elutriate, interstitial-water (pore-water), and whole-sediment toxicity tests are listed in Table 6-3. Toxicity tests of sediment interstitial water were developed for evaluating the potential in situ effects of contaminated sediment on aquatic organisms (Ankley et al. 1991). For many benthic invertebrates, the toxicity and bioaccumulation of sediment-associated contaminants such as metals and nonionic organic contaminants have been correlated with concentrations of these chemicals in interstitial water (Di Toro et al. 1991; USEPA 1994). Interstitial water may be an important route of exposure for many infaunal benthic invertebrates in contaminated sediments. However, interstitial water may not be the relevant route of exposure for evaluations of organisms that ingest sediment.
Testing of the elutriate (water-extractable) fraction of the sediment is a commonly used technique. The elutriate test was developed for evaluating the potential short-term effects (hours or days) of open-water disposal of dredged material. Tests with elutriate samples measure the potential effects of the release of water-soluble constituents from sediment to the water column during the disposal of dredged material. Advantages of testing elutriates are similar to those for interstitial water because the test method is similar to water column testing and is easy to perform. Elutriate samples are generally less toxic than either whole-sediment or interstitial water samples (Sasson-Brickson and Burton 1991; Ankley et al. 1991).
Whole-sediment toxicity tests are most appropriate for organisms that live directly in or on the sediments and ingest sediment particles. Use of whole sediments for toxicity tests also requires less manipulation of the original sample and preparation of special sample phases for testing. Whole-sediment toxicity tests with field-collected sediments are of limited use for establishing cause-and-effect relationships, although spiking of clean sediments with individual chemicals can be useful for this purpose.
Manipulation or storage of whole-sediment samples can alter the bioavailability of contaminants in sediment; however, the alterations that occur may not substantially affect toxicity. Storage of field-collected sediment samples for several months at 4°C did not result in significant changes in chemistry or toxicity (Ankley 1994; pers. comm.); however, others have demonstrated changes in spiked sediment within days to weeks (e.g., Burton 1991; Stemmer et al. 1990). Sediments contaminated primarily with nonionic, semivolatile organic compounds will probably change little during storage at 4°C because of their relative resistance to biodegradation and sorption to solids. However, metals and metalloids may be affected by changing redox, oxidation, or microbial metabolism (such as with arsenic, selenium, mercury, lead, and tin; all of which are methylated by various bacteria and fungi). Metal-contaminated sediments may need to be tested relatively soon after collection with as little manipulation as possible.
Given that the contaminants of concern and the influencing sediment characteristics are not always known a priori, it is desirable to hold sediments in the dark at 4°C and start toxicity tests soon after collection from the field. Recommended sediment holding time ranges from less than 2 weeks (ASTM 1993) to less than 8 weeks (USEPA-USACOE 1993). If whole-sediment toxicity tests are started more than 2 weeks after collection, it is desirable to conduct additional characterizations of sediment to evaluate possible effects of storage on sediment. For example, concentrations of contaminants of concern could be measured in pore water (extracted from a subsample of the sediment separate from that used in the toxicity test) within 2 weeks of sediment collection and in pore water from a second subsample (again separate from that used in the toxicity test) at the start of the test (Kemble et al. 1993). Ingersoll et al. (1993) recommend conducting a toxicity test with pore water within 2 weeks of sediment collection. Freezing and longer term storage might further change sediment properties such as grain size or partitioning and should be avoided (ASTM 1990; Schuytema et al. 1989; Day et al. 1994). Sediment should be stored with no air over the sealed samples (no head space) at 4°C before the start of a test (Shuba et al. 1978; ASTM 1990). Sediment may be stored in containers constructed of suitable materials, as outlined in Chapter 3.
Characterization of sediment should include factors known to control the availability of contaminants in sediment because bulk chemical concentrations alone cannot be used to evaluate bioavailability (Di Toro et al. 1991). These measures should include sediment organic carbon, ammonia, percent water, and grain size (e.g., percent sand, silt, and clay). Depending on the experimental design, other analyses might include inorganic carbon, AVS, biochemical/sediment oxygen demand, chemical oxygen demand, dissolved organic carbon, pH, cation exchange capacity, oxidation-reduction potential, total volatile solids, metals, organosilicates, synthetic organic compounds, oil and grease, petroleum hydrocarbons, and chemical analysis of interstitial water (ASTM 1993). These characteristics should be measured in split samples related to those used for toxicity testing. For additional guidance on chemical analyses, see Chapter 5.
Currently, there are ASTM standards for several of the test species used in the ARCS Program (ASTM 1993). In addition, the USEPA is in the process of standardizing toxicity test methods for Hyalella azteca and Chironomus tentans, and the bioaccumulation assay using Lumbriculus variegatus (USEPA 1994). The Corps and the USEPA are also developing guidance for conducting dredged material evaluations (USEPA-USACOE 1993). These standard test procedures may vary slightly from those used in the ARCS Program. The most appropriate methods for meeting a specific program's objectives should be selected before starting any field sampling.
Water for culturing organisms and testing should be acceptable to the test organisms and uniform in quality. Acceptable water quality allows satisfactory survival, growth, and behavior of test organisms. Natural overlying water should be uncontaminated and of constant quality as specified by ASTM (1993). For certain applications, the experimental design might require water from the same site as the sediment.
The day before the test starts, sediment is generally mixed in the storage container and a subsample of the whole sediment is added to each test chamber. Sediment depth in the test chambers is dependent on experimental design and the test organism. Overlying water is then gently poured along the side of the test chambers to minimize the resuspension of sediment. Gentle aeration is started and the test chambers are left to equilibrate overnight in a water bath (ASTM 1993).
The pH, alkalinity, hardness, dissolved oxygen, conductivity, and ammonia of the overlying water samples should be measured at the beginning, end, and at least weekly during the test in each sediment treatment. Toxicity tests are typically conducted at 23°C (USEPA 1994). If the study objectives warrant monitoring changes in interstitial water or whole sediment during the test, separate test chambers should be set up and destructively sampled during the exposure (ASTM 1993).
In static tests, the volume of overlying water sampled for water quality determinations should be minimized and replaced with fresh overlying water. In static tests, the overlying water may have to be aerated throughout the exposure period. Evaporated water should be replaced at least weekly with deionized water.
In water-renewal tests with additions of one to four volumes of overlying water per day, water quality characteristics generally remain similar to the inflowing water (Ingersoll and Nelson 1990; Ankley et al. 1993). In static tests, however, water quality may change profoundly during the exposure (Ingersoll and Nelson 1990). Although contaminant concentrations are reduced in the overlying water in water-renewal tests, organisms in direct contact with sediment generally receive a substantial proportion of a contaminant dose directly from either the whole sediment or from the interstitial water.
Test animals should be handled as little as possible and should be introduced into the overlying water below the air-water interface. During the test, all chambers should be checked daily and observations should be made to assess test organism behavior such as sediment avoidance or reproductive behavior. Monitoring the behavior of burrowing test organisms is difficult because the animals are not normally visible during the exposure. At the end of an exposure, test organisms are typically removed from the chambers by wet-sieving the sediment.
General QA/QC considerations for sediment assessment programs are discussed in Chapter 2. QA/QC considerations for sediment toxicity tests are discussed in this section.
Before a toxicity test is conducted in a new facility, "non-contaminant" tests should be conducted in which all test chambers contain a control sediment and overlying water. This information is used to demonstrate that the facility, control sediment, water, and handling procedures provide acceptable species-specific responses. The within- and between-replicate variance should be determined and the statistical precision of the test should also be evaluated in relation to sample size (ASTM 1993). Performance-based criteria have been recommended for use in judging the quality of the culture and the test (USEPA 1994). For example, different culturing procedures would be acceptable if consistent organisms are produced for testing. Performance could be evaluated using criteria such as control survival and growth, and reference toxicant control charts.
It is the responsibility of a laboratory to demonstrate its ability to obtain precise results with reference toxicants before it performs toxicity tests. Intralaboratory precision, expressed as a coefficient of variation, should be determined for each type of test to be used in a laboratory by performing five or more tests with different batches of test organisms, using the same reference toxicant, at the same concentrations, with the same test conditions (e.g., the same test duration, type of water, age of test organisms, feeding), and the same data analysis methods. A reference toxicant concentration series (dilution factor of 0.5 or higher) should be selected that will consistently provide partial mortalities at two or more concentrations of the test chemical (USEPA 1994).
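The coefficient of variation used to express intralaboratory precision can be computed as in the following minimal sketch. The five LC50 values are hypothetical results from five reference toxicant tests, and the use of the sample standard deviation (n - 1 denominator) is an assumption made for the example.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical LC50s (mg/L) from five reference toxicant tests run with
# different batches of organisms under the same test conditions.
lc50s = [10.0, 12.0, 11.0, 9.0, 13.0]

cv = coefficient_of_variation(lc50s)
print(f"Intralaboratory CV = {cv:.1f}%")
```

A lower coefficient of variation indicates more repeatable results among batches; what value is acceptable depends on the test and on the program's data quality objectives.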
Before conducting toxicity tests with contaminated sediment, the laboratory should demonstrate its ability to conduct tests by conducting five exposures in control sediment. It is recommended that these five exposures with control sediment be conducted concurrently with the five reference toxicity tests (USEPA 1994).
The quality of test organisms obtained from an outside source must be verified by conducting a reference toxicity test concurrently with the sediment test. The supplier should provide data with the shipment describing the history of the sensitivity of organisms from the same source culture. If the supplier has not conducted five reference toxicity tests with the test organism, it is the responsibility of the testing laboratory to conduct five reference toxicity tests before starting a sediment test (USEPA 1994).
It is desirable to conduct reference toxicant toxicity tests in conjunction with sediment tests to evaluate the condition of the test species (Lee 1980). Deviations outside an established normal range (e.g., ±2 standard deviations) may indicate a change in the condition of the test organism population or a change in laboratory procedures. Results of reference toxicant tests also enable inter-laboratory comparisons of test responses. Reference toxicant tests are most often acute lethality tests performed in the absence of sediment (USEPA-USACOE 1991). Sediment spiked with a reference toxicant might also be included as a positive control for the sediment toxicity test. Many chemicals have been used as reference toxicants, including sodium chloride, potassium chloride, cadmium, copper, chromium, sodium lauryl sulfate, and phenol. No one reference toxicant can be used to measure the condition of test organisms with respect to another toxicant with a different mode of action. However, it is unrealistic to routinely test more than one reference toxicant.
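The check against an established normal range can be sketched as a simple control chart calculation. In the example below, the historical LC50 values are hypothetical, and the limits are taken as the mean plus or minus two sample standard deviations of those values; whether limits are set on raw or transformed values is a laboratory decision that this sketch does not address.

```python
from statistics import mean, stdev


def control_limits(history, k=2.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    centre = mean(history)
    spread = stdev(history)
    return centre - k * spread, centre + k * spread


def in_control(value, history, k=2.0):
    """True if a new result falls within the limits set by past results."""
    lower, upper = control_limits(history, k)
    return lower <= value <= upper


# Hypothetical LC50s (mg/L) from previous reference toxicant tests.
history = [4.0, 5.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0]

lower, upper = control_limits(history)
print(f"Control limits: {lower:.2f} to {upper:.2f} mg/L")

for new_result in (6.0, 7.0):
    status = "within" if in_control(new_result, history) else "outside"
    print(f"LC50 of {new_result} mg/L is {status} the normal range")
```

A result outside the limits does not by itself invalidate a test, but it is a prompt to review organism condition and laboratory procedures.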
The data analysis approach should be developed in conjunction with the study design specifications. Data analysis methods can then be tailored to the objectives and the level of detail in the assessment.
When developing a statistical approach, the first decision is whether to use parametric or nonparametric statistical methods. Typically, it is desirable to use parametric methods because they generally are more powerful than nonparametric methods in detecting significant differences. However, the assumptions that must be met by the data are generally stricter for parametric tests. Therefore, it is important that those assumptions be evaluated for each data set. If one or more parametric assumptions are not met, the data can be transformed and the assumptions can then be reevaluated for the transformed data. If the data still do not satisfy the assumptions, nonparametric methods should generally be used to evaluate the untransformed data.
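The transform-and-recheck step can be illustrated with a short Python sketch. The text does not name a transformation or an assumption test; the arcsine square-root transformation (often applied to proportions such as survival) and the ratio of largest to smallest group variance are used here purely as assumptions, and the data are hypothetical. A real analysis would apply formal tests of normality and homogeneity of variance.

```python
import math
from statistics import variance


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion between 0 and 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(p))


def variance_ratio(groups):
    """Largest group variance divided by the smallest (a crude screen only)."""
    variances = [variance(g) for g in groups]
    if min(variances) == 0:
        return math.inf
    return max(variances) / min(variances)


# Hypothetical survival proportions, four replicates per treatment
survival = {
    "control": [0.95, 1.00, 0.90, 0.95],
    "station_1": [0.50, 0.70, 0.40, 0.60],
    "station_2": [0.85, 0.90, 0.80, 0.95],
}

raw_ratio = variance_ratio(list(survival.values()))
transformed = {k: [arcsine_sqrt(p) for p in v] for k, v in survival.items()}
transformed_ratio = variance_ratio(list(transformed.values()))

print(f"Variance ratio, untransformed: {raw_ratio:.2f}")
print(f"Variance ratio, transformed:   {transformed_ratio:.2f}")
```

If the transformed data still failed the chosen assumption tests, the analysis would fall back to a nonparametric method applied to the untransformed values, as described above.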
The kind of statistical test to be used is usually determined by the study objectives. If the objective is to compare the toxicity results between test sites within an AOC or between each test site and a reference area, analysis of variance (ANOVA) can be used to conduct the evaluation. If the objective is to evaluate whether a gradient of toxicity exists with distance from a potential problem area, a correlation analysis or multivariate analysis approach can be used. For details of potential statistical approaches, refer to Gilbert (1987), Green (1979), and USEPA (1994).
USEPA (1994) provides the following guidance on statistical analysis of toxicity test data:
As the minimum difference between treatments that the test is required or designed to detect decreases, the number of replicates required to meet a given significance level and power increases. Because no consensus currently exists on what constitutes a biologically acceptable difference, the appropriate statistical minimum significant difference should be a DQO established by the individual user based on their data requirements, the logistics and economics of test design, and the ultimate use of the data.
Three replicates per treatment or control are the absolute minimum number of replicates for a sediment toxicity test. Eight replicates are recommended for each control or experimental treatment. It is always prudent to include as many replicates in the test design as economically and logistically possible.
Statistical tests of hypotheses can be designed to control for the chances of making incorrect decisions. Alpha represents the probability of making a Type I statistical error. A Type I statistical error in this testing situation results from the false conclusion that the treated sample is toxic or contains chemical residues not found in the control or reference sample. Beta (β) represents the probability of making a Type II statistical error, or the likelihood that one erroneously concludes there are no differences among the mean responses in the treatment, control, or reference samples. Traditionally, acceptable values for alpha have ranged from 0.1 to 0.01, with 0.05 (or 5 percent) used most commonly. This choice should depend upon the consequences of making a Type I error. Historically, having chosen alpha, environmental researchers have ignored β and the associated power of the test (1-β).
The consequences of a Type II statistical error in environmental studies should never be ignored and may in fact be the most important criterion to consider in experimental designs and data analyses that include statistical hypothesis testing. The critical components of the experimental design associated with the test of the hypothesis are 1) the required minimum detectable difference between the treatment and control or reference responses, 2) the variance among treatment and control replicate experimental units, 3) the number of replicate units for the treatment and control samples, 4) the number of animals exposed within a replicate exposure chamber, and 5) the selected probabilities of Type I (alpha) and Type II (β) errors.
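The relationship among minimum detectable difference, variance, replicate number, and the Type I and Type II error rates can be illustrated with a rough calculation. The Python sketch below uses the common normal-approximation formula for comparing two means, n = 2[(z_alpha + z_beta) × SD / difference]². This formula is an assumption introduced for illustration and is not quoted from USEPA (1994); it tends to understate the replicates needed when n is small, and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def replicates_needed(sd, min_detectable_diff, alpha=0.05, beta=0.20,
                      one_tailed=True):
    """Approximate replicates per treatment for a two-sample comparison of means."""
    if sd <= 0 or min_detectable_diff <= 0:
        raise ValueError("sd and min_detectable_diff must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)  # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / min_detectable_diff) ** 2
    return math.ceil(n)


# Hypothetical case: among-replicate SD of 10 percentage points of survival
for diff in (30, 20, 10):
    n = replicates_needed(sd=10, min_detectable_diff=diff)
    print(f"Detect a {diff}-point difference: about {n} replicates per treatment")
```

The output shows the pattern described above: as the difference to be detected shrinks, the number of replicates required at a fixed alpha and power rises sharply.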
In the ARCS Program, sediment toxicity tests were conducted with species or biotic communities representative of the major trophic levels in freshwater aquatic ecosystems (Table 6-2) in order to evaluate toxic effects of the sediments. Secondary objectives of the toxicity testing conducted as part of the ARCS Program were to:
- Evaluate the relative sensitivities of the various toxicity tests to sediment contaminants
- Evaluate the abilities of the various toxicity tests to discriminate between different degrees of sediment contamination
- Evaluate the degree of correlation between responses of the various toxicity tests and their redundancy
- Recommend toxicity tests for use in future studies of sediment contamination in the Great Lakes.
By conducting all laboratory toxicity tests on split sediment samples that were collected and processed in the same manner and by generally initiating testing within a 2-week period, the results of the various toxicity tests should be directly comparable.
The toxicity test methods are briefly described below. For a detailed description of the toxicity test methods used, see Burton (1994) and Ingersoll et al. (1993). Sediment samples for toxicity testing were collected from a number of stations in three of the priority AOCs: Buffalo River, New York (Figure 1-1); Indiana Harbor, Indiana (Figure 1-2); and Saginaw River, Michigan (Figure 1-3) (two separate sampling surveys in the Saginaw River AOC).
Toxicity tests were conducted with 1) fathead minnows (Pimephales promelas, whole sediment), 2) cladocerans (Daphnia magna and Ceriodaphnia dubia, elutriates or whole sediment), 3) amphipods (Hyalella azteca and Diporeia spp. [formerly Pontoporeia hoyi], whole sediment), 4) midges (Chironomus riparius and Chironomus tentans, whole sediment), 5) mayflies (Hexagenia bilineata, elutriates and whole sediment), 6) duckweed (Lemna minor, whole sediment), 7) macrophytes (Hydrilla verticillata, whole sediment), 8) rotifers (Brachionus calyciflorus, elutriates), 9) microbial enzymes (whole sediment, elutriates) and Microtox® (elutriates), and 10) algae (Selenastrum capricornutum, elutriates). In situ colonization of artificial substrates by benthic invertebrates was also evaluated at each AOC (see also Chapter 7).
Ideally, toxicity tests with liquid-phase exposures should be conducted with interstitial water. Toxicity tests with interstitial water are preferable to tests with elutriates for evaluating the potential in situ effects of contaminated sediment on aquatic organisms (Ankley et al. 1991). Elutriate tests are most appropriately used in the evaluation of dredged material. However, because of the large water volumes required for conducting this test battery and the difficulty of collecting sufficient undisturbed interstitial water, the decision was made to test elutriates instead of interstitial water. Elutriate samples are generally less toxic than either whole-sediment or interstitial-water samples (Sasson-Brickson and Burton 1991; Ankley et al. 1991). The various advantages and disadvantages of each test phase are listed in Table 6-3.
Sediment toxicity tests were conducted with macrobenthic organisms in static or water-renewal systems at temperatures of 20 to 25°C. Sediments were placed in the test chambers and overlying laboratory water was gently added. Test organisms were randomly added within 24 hours and the test was started. Numbers of replicates ranged from 3 to 10 depending on the toxicity test. Exposure water was moderately hard (hardness 134 mg/L as CaCO3; alkalinity 60 to 65 mg/L as CaCO3; pH 7.8 to 8.0; conductivity 300 µmhos/cm; sulfate 72 mg/L). Dissolved oxygen, temperature, alkalinity, pH, conductivity, and hardness were measured in the surface water either daily or at the start and end of the test, depending on the parameter. See Burton (1994) and Ingersoll et al. (1993) for further details on toxicity test protocols using macrobenthic organisms.
Several indigenous microbial enzyme systems have been used to measure cycling of key elements and degradation of organic matter (Griffiths et al. 1982). The usefulness of microbial tests in evaluations of contaminant effects is well established (Stotzky 1980; Babich and Stotzky 1983). Shifts in hydrolase activity (e.g., protease, amylase) can be construed as resulting from chemical exposure (Griffiths and Morita 1981).
The Microtox® test measures luminescence of the marine bacterium Photobacterium phosphoreum. Inhibition of this luminescence is considered a toxic response because it results from disruption of cellular energy transfer. Results of Microtox® tests have been compared to those of standard toxicity tests with rainbow trout (Oncorhynchus mykiss), fathead minnow (Pimephales promelas), bluegill (Lepomis macrochirus), sheepshead minnow (Cyprinodon variegatus), and cladoceran (Daphnia magna) for a variety of pure compounds and complex environmental samples. In most cases, Microtox® results showed similar sensitivity to the compounds tested (Bulich et al. 1981; Curtis et al. 1982; Qureshi et al. 1982).
The Selenastrum capricornutum test measures effects on photosynthesis by following cell growth or uptake of radioactively-labeled carbon (as bicarbonate). Inhibition or stimulation of photosynthesis is considered an abnormal response due to toxicant or nutrient presence. Some studies have shown the algal growth test to be more sensitive than other traditionally used surrogate species (DeZwart and Sloof 1983; LeBlanc 1984).
Rooted aquatic vascular plants (e.g., Hydrilla verticillata) occupy a unique niche in aquatic ecosystems. A major contributor to primary productivity in some systems, these plants are in direct contact and dynamic interaction with both the overlying water and the interstitial water of sediment. Thus, rooted aquatic macrophytes can be used to evaluate the entire aquatic system, not just the sediments or the water column.
Hyalella azteca and Diporeia spp. are two amphipods that have been used successfully to evaluate freshwater and estuarine sediments. These organisms play a dominant role in many aquatic ecosystems, assisting with the processing of organic matter (detritus), and represent a primary food source for many benthic-feeding fish species (Pennak 1989). Toxicity tests with H. azteca generally start with immature animals (less than 2 weeks old) and can be conducted for up to 4 weeks through reproductive maturation (ASTM 1993; USEPA 1994). Toxicity tests with Diporeia spp. are initiated with field-collected juveniles and can continue for up to 4 weeks (ASTM 1993, draft Annex #7). Endpoints measured in toxicity tests with amphipods include survival, growth, behavior, or reproductive maturation.
Chironomids (midges) are also important benthic macroinvertebrate species in many aquatic systems. They tend to be the dominant benthic macroinvertebrate taxon in systems where there is an ample supply of organic material associated with fine- to medium-grained sediments. In the past, midges were considered to be relatively insensitive in toxicity assessments (Ingersoll and Nelson 1990). This conclusion was based on the practice of conducting short-term toxicity tests with fourth instar larvae in water-only exposures, a procedure that may underestimate the sensitivity of midges to toxicants. The first and second instar larvae are more sensitive to contaminants than are the third or fourth instar larvae. For example, first instar Chironomus tentans larvae were 6 to 27 times more sensitive than fourth instar larvae to acute copper exposure (Nebeker et al. 1984; Gauss et al. 1985), and first instar Chironomus riparius larvae were 127 times more sensitive than second instar larvae to acute cadmium exposure (Williams et al. 1986). Endpoints typically measured in sediment toxicity tests with C. riparius and C. tentans include growth and survival.
Mayflies (e.g., Hexagenia bilineata) are an important component of fish and waterfowl diets. They are also important as an indicator of overall ecosystem health and provide a critical ecological link in the conversion process of changing organic detritus into a readily available food source for aquatic microbial communities. Sediment toxicity tests with mayflies are generally conducted for up to 10 days (Bahnick et al. 1980; Nebeker et al. 1984). Survival, growth, or molting frequency are the toxicity endpoints measured in the mayfly tests. Unfortunately, few laboratories have been successful at routinely culturing or maintaining these species, and testing often requires use of field-collected organisms.
Cladocerans represent a major group in many zooplankton communities. There is a large database that exists from chemical-specific, effluent, and water quality testing with the cladocerans Daphnia and Ceriodaphnia. Survival, growth, or reproduction are typically measured in the cladoceran tests. Although cladocerans do not live in continuous contact with sediment, they are frequently in contact with the sediment surface and are exposed to both water-soluble contaminants in the overlying water and particulate-bound contaminants at the sediment surface (ASTM 1993). Cladocerans are also one of the more sensitive groups of organisms used in toxicity testing (Mayer and Ellersieck 1986).
Oligochaetes, like chironomids, are often associated with aquatic systems rich in organic matter. They also play a major role in the processing of organic material and as a food source for benthic feeding fish. Most oligochaetes are relatively tolerant of many classes of chemical contaminants; however, this tolerance may be a positive attribute for assessing bioaccumulation or the toxicity of severely contaminated sites (Phipps et al. 1993). Due to their relative insensitivity to chemical contaminant toxicity, they were not included in the ARCS Program. The most frequently described sediment testing methods for oligochaetes are acute toxicity tests (Keilty et al. 1988a), although Wiederholm et al. (1987) described methods for conducting 500-day oligochaete exposures that measure effects of sediment on growth and reproduction. Recently, Reynoldson et al. (1991) and ASTM (1993) described a 28-day test starting with sexually mature Tubifex tubifex. In this shorter test, effects on growth and reproduction are monitored and the duration of the exposure makes the test more useful for routine assessments of sediment toxicity. Phipps et al. (1993) outlined testing methods for Lumbriculus variegatus to assess lethal and sublethal toxicity and bioaccumulation of sediment contaminants in 10- to 28-day exposures.
In addition to the aforementioned toxicity tests, an investigation of the bioaccumulation potential of sediment-associated contaminants was also conducted under the ARCS Program by exposing the fathead minnow Pimephales promelas to contaminated sediments in the laboratory. Sediment samples were collected from three predetermined stations in the Saginaw River, Michigan in June, 1990 and from three predetermined stations in the Buffalo River, New York in August, 1990. The sediment samples were placed in laboratory aquaria with flow-through water systems. The fathead minnows were exposed in these aquaria for 10 days according to the methods of Mueller et al. (1992). Pre-exposure samples of the minnows were analyzed for PCBs, chlorinated pesticides, and metals. After the 10-day exposure, the exposed minnows were also analyzed for the same contaminants. An assessment of bioaccumulation was attempted by comparing the post-exposure contaminant concentrations in the fish with both the pre-exposure contaminant concentrations in the fish and the contaminant concentrations in fish exposed to a clean reference sediment under similar conditions. The results of these bioaccumulation bioassays were varied. While there were indications of significant bioaccumulation of several metals, the assessment of bioaccumulation of PCBs was confounded by apparent contamination of the test organisms before their arrival in the laboratory. Several pesticides detected in the sediments were also found in low concentrations in the tissue samples. In addition, the test sediments did not exhibit the expected high concentrations of the analytes of interest. Although such bioaccumulation bioassays are considered feasible, further research and development work will be required before they can be recommended for routine application. Therefore, these bioaccumulation bioassays are not discussed further in this document.
Other species of organisms have been suggested for possible use in studies of chemical bioaccumulation from aquatic sediments. Several criteria should be considered before a species is adopted for routine use (Ankley et al. 1992a; Call et al. 1993; USEPA 1994). These criteria include 1) availability of organisms throughout the year, 2) known chemical exposure history, 3) adequate tissue mass for chemical analyses, 4) ease of handling, 5) tolerance of a wide range of sediment physico-chemical characteristics (e.g., particle size), 6) low sensitivity to contaminants associated with sediment (e.g., metals, organics), 7) amenability to long-term exposures without adding food, and 8) ability to accurately reflect concentrations of contaminants in field-exposed organisms (e.g., exposure is realistic). With these criteria in mind, the advantages and disadvantages of several potential freshwater taxa for bioaccumulation testing are discussed below. See USEPA (1994) for additional detail.
Freshwater fingernail clams provide an adequate tissue mass, are easily handled, and can be used in long-term exposures. However, few freshwater clam species are available for testing. Exposure of clams is uncertain because of valve closure. Chironomids can be readily cultured, are easy to handle, and reflect appropriate routes of exposure. However, their rapid life-cycle makes it difficult to perform long-term exposures with hydrophobic compounds that equilibrate slowly between sediment, pore water, and tissue. Further, chironomids are capable of biotransforming PAHs (Leversee et al. 1982). Larval mayflies reflect appropriate routes of exposure, have adequate tissue mass for residue analysis, and can be used in long-term tests. However, mayflies cannot be continuously cultured in the laboratory and consequently are not always available for testing. Furthermore, the background concentrations of contaminants and the health of field-collected individuals may be uncertain. Amphipods (e.g., Hyalella azteca) can be cultured in the laboratory, are easy to handle, and reflect appropriate routes of exposure. However, their size may be insufficient for residue analysis, and H. azteca are sensitive to contaminants in sediment. Fish (e.g., fathead minnows) provide an adequate tissue mass, are readily available, are easy to handle, and can be used in long-term exposures. However, the routes of exposure are not appropriate for evaluating the bioavailability of sediment-associated contaminants to benthic organisms.
Oligochaetes are infaunal benthic organisms that meet many of the test criteria listed above. Certain oligochaete species are easily handled and cultured, provide reasonable biomass for residue analyses, and are tolerant of varying sediment physical and chemical characteristics. Oligochaetes are exposed to contaminants via all appropriate routes of exposure, including pore water and ingestion of sediment particles. Oligochaetes do not need to be fed during long-term bioaccumulation exposures (Phipps et al. 1993). Various oligochaete species have been used in toxicity and bioaccumulation evaluations (Chapman et al. 1982a,b; Wiederholm et al. 1987; Keilty et al. 1988a,b; Mac et al. 1990; Phipps et al. 1993), and field populations have been used as indicators of pollution of aquatic sediments (Brinkhurst 1980; Spencer 1980; Oliver 1984; Lauritsen et al. 1985; Robbins et al. 1989; Ankley et al. 1992b; Brunson et al. 1994).
USEPA (1994) describes methods for 28-day bioaccumulation tests with the oligochaete Lumbriculus variegatus. The use of L. variegatus in laboratory bioaccumulation studies has been field validated with natural populations of oligochaetes. Total PCB concentrations in laboratory-exposed L. variegatus were similar to concentrations measured in field-collected oligochaetes from the same sites (Ankley et al. 1992b). PCB homolog patterns also were similar between laboratory-exposed and field-collected oligochaetes. The more highly chlorinated PCBs tended to have greater bioaccumulation in the field-collected organisms. In contrast, total PCBs in laboratory-exposed (Pimephales promelas) and field-collected (Ictalurus melas) fish revealed poor agreement in bioaccumulation relative to sediment concentrations at the same sites (Ankley et al. 1992b). However, laboratory exposures supply PCBs to organisms from test sediments, while field exposures can potentially supply PCBs from sediments, diet, and water. Brunson et al. (1994) also compared bioaccumulation of laboratory-exposed L. variegatus and field-collected oligochaetes from the same sites. Select PAH and DDT peak concentrations were similar in field-collected oligochaetes and L. variegatus exposed for 28 days in the laboratory.
The toxicity test responses were evaluated and compared by several methods, as described below:
- Sensitivity--Sensitivity was evaluated by comparison of the toxicity test responses to the control response (only applicable for laboratory sediment toxicity tests where a control sediment was also evaluated). Test responses were considered to be indicative of effects if they were 20 percent or more above the control response. Test responses indicative of effects were then grouped into two categories, 1) 20-50 percent difference and 2) greater than 50 percent difference from the control. Tests with responses in the first category were judged to be relatively insensitive; tests with responses in the second category were judged to be more sensitive. The numbers of responses within each category were used to rank the relative sensitivity among tests within each of the four surveys. In general, the most sensitive toxicity test endpoints were considered to be those associated with the highest percentage of the stations exhibiting responses of 20 percent or more above the control response. In cases where more than one toxicity test endpoint exhibited the same percentage of stations with responses of 20 percent or more above the control response, the toxicity test endpoint with a higher percentage of responses in the more sensitive category (i.e., those exhibiting responses more than 50 percent above the control) was considered to be more sensitive.
- Discrimination--Discrimination is the ability of the toxicity test to detect differing degrees of toxicity among samples. It is important when defining the spatial extent of contamination to be able to ascertain whether sediment samples vary in toxicity. A nonparametric statistical test (Kruskal-Wallis) was conducted to determine whether the toxicities of the sediment samples from the stations within an AOC (e.g., within the Buffalo River AOC) differed from one another. The lower the P value was for the statistical comparisons between stations, the more discriminatory the toxicity test was considered to be. The average P value, the range of P values, and the number of AOC surveys (one to four) for which this discrimination analysis was conducted were all considered in the relative ranking of the toxicity tests by their discriminatory power. It is misleading, in some cases, to consider only the average P value if it came from only one AOC survey or if highly significant P values (e.g., P = 0.0001) for some station comparisons were offset by very high P values (e.g., P = 0.9) for other station comparisons.
- Redundancy--The degree of similarity between toxicity test responses was evaluated using correlation analyses (both parametric and nonparametric) and by grouping the test responses into patterns through factor analysis. A high degree of correlation or pattern (grouping) similarity implies that the toxicity tests were responding in a similar manner. These analyses were conducted across all AOC surveys to better meet the study objective of determining which toxicity tests were best (in terms of predictive power) for Great Lakes studies. If a group of toxicity tests are producing similar information, then it is less important that each toxicity test be conducted, unless a weight-of-evidence assessment approach is being used. It is, perhaps, of greater importance that a range of toxicity tests be used that respond differently to varying types of sediment contamination (i.e., that show different response patterns and groups). This approach will increase the likelihood that any detrimental effects on the aquatic ecosystem will be detected.
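The sensitivity categories described in the list above can be sketched in Python as follows. This is a minimal example with hypothetical data; it assumes that the "percent above the control" is the absolute difference between a station response and the control response, expressed as a percentage of the control. The function and endpoint names are illustrative only.

```python
def percent_difference(response, control):
    """Absolute difference from the control, as a percentage of the control."""
    if control == 0:
        raise ValueError("control response must be nonzero")
    return 100.0 * abs(response - control) / control


def classify(response, control):
    """Assign a station response to one of the sensitivity categories."""
    diff = percent_difference(response, control)
    if diff < 20.0:
        return "no effect"
    if diff <= 50.0:
        return "20-50"
    return ">50"


def summarize(responses, control):
    """Percentage of stations showing an effect, and showing a >50 percent effect."""
    labels = [classify(r, control) for r in responses]
    n = len(labels)
    affected = sum(1 for lab in labels if lab != "no effect")
    over_50 = labels.count(">50")
    return 100.0 * affected / n, 100.0 * over_50 / n


# Hypothetical percent survival at five stations for two endpoints
control_survival = 90.0
endpoints = {
    "endpoint_A": [88.0, 70.0, 40.0, 85.0, 30.0],
    "endpoint_B": [88.0, 70.0, 60.0, 85.0, 30.0],
}
summary = {name: summarize(vals, control_survival) for name, vals in endpoints.items()}

# Rank by percentage of stations affected, then by percentage in the >50 category
ranking = sorted(summary, key=lambda name: summary[name], reverse=True)
print(summary)
print("Most to least sensitive:", ranking)
```

Here both hypothetical endpoints show effects at the same percentage of stations, so the tie is broken by the percentage of responses in the greater-than-50-percent category.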
Data analyses included parametric or nonparametric correlation and mean comparison analyses. Correlation analyses, sensitivity analyses, discriminatory analyses, and principal component analysis (PCA) were performed using the Statistical Analysis System (SAS) software package. Because not all of the toxicity tests were conducted on sediments from every station sampled, a weight-of-evidence approach was applied to interpret the results and identify trends in test responses. Conclusions from the results of these AOC surveys may change with testing of additional contaminated sites.
Sediment toxicity test raw data and summary statistics are presented in Burton (1994), Nelson et al. (1993), Hall et al. (1993), and Coyle et al. (1993). The data have also been entered into the USEPA's Ocean Data Evaluation System (ODES) database and have received a quality assurance validation from the USEPA (see Chapter 2).
A total of 11 toxicity tests, comprising 43 endpoints, were ranked for sensitivity (Table 6-4). The remainder of the toxicity tests and endpoints were excluded from this ranking either because there were insufficient data or because the controls were not appropriate for the sensitivity calculation used in the ranking process (e.g., microbial enzymes or artificial substrate colonization).
Several benthic test species were very sensitive to sediment contamination. Preference behavior by Diporeia spp. was the most sensitive endpoint, exhibiting responses of 20 percent or more above the control in 90 percent of the samples. Behavior would be expected to be a responsive sublethal measure, but the ecological significance of behavioral responses is difficult to interpret. Diporeia spp. is a clearwater species and may exhibit behavioral responses in the test exposures as a result of factors other than sediment contaminants. Although Hexagenia bilineata test endpoints were among the most sensitive responses, the small data set for this species precluded use of the results in the final relative ranking. Sediment samples were also stored for prolonged periods (up to 6 months) before the H. bilineata tests were started.
The discriminatory ability of a toxicity test measures how well the response detects varying levels of sediment toxicity. This ability was evaluated using levels of statistical significance, or P values. The smaller the P value, the greater the capacity to detect statistical differences between samples/stations. A total of 53 endpoints were ranked for their discriminatory ability (Table 6-5). Some toxicity test data were not available or could not be analyzed by this procedure, so discriminatory ability was not determined for all endpoints for all four AOC surveys.
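For readers unfamiliar with the Kruskal-Wallis test used to obtain these P values, the following Python sketch computes the H statistic (with a correction for ties) and an approximate P value for replicate responses from several stations. It is an illustration only: the station data are hypothetical, the P value relies on the chi-square approximation, which is rough for very small replicate numbers, and the ARCS analyses themselves were run in a statistical package rather than with code like this.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Return (H, approximate P) for a list of groups of replicate responses."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical percent survival, three replicates at each of three stations
stations = [[90, 95, 100], [60, 70, 65], [20, 35, 30]]
h_stat, p_value = kruskal_wallis(stations)
print(f"H = {h_stat:.2f}, approximate P = {p_value:.4f}")
```

A small P value for an endpoint indicates that the stations differ in their responses, which is the sense in which that endpoint is said to discriminate.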
The photosynthetic and indigenous microbial endpoints would be expected to be good discriminators because they can exhibit both inhibitory and stimulatory responses, giving them a wider range of response than just 0 to 100 percent, as with conventional toxicity test responses. Indeed, the Selenastrum capricornutum growth at 48 h (average P value of 0.0213) and at 96 h (average P value of 0.0150) were among the best discriminatory toxicity tests for the four AOC surveys. However, of the other photosynthetic endpoints, only Lemna minor chlorophyll a production showed significant differences for three AOC surveys. Lemna minor frond number and biomass showed significant differences for only one AOC survey and Hydrilla verticillata endpoints did not detect any significant differences.
The indigenous microbial endpoints were better discriminators than these latter two photosynthetic surrogate endpoints, with significant differences observed for two or three of the AOC surveys. These endpoints ranked from high to low discriminatory ability, in order, as: dehydrogenase, glucosidase, galactosidase, and alkaline phosphatase.
Several of the benthic macroinvertebrate community indices, sampled using the artificial substrates, were good discriminators. The top two listed in Table 6-5 (hydra numbers and macroinvertebrate biomass) cannot be reliably evaluated because they were only analyzed or determined for one AOC survey. The Family Biotic Index, however, was highly discriminatory (P = 0.0291 to 0.0319) for all three AOC surveys where it was evaluated. The second best discriminator in this group of endpoints was percent flatworm composition, showing significant differences for two of the three AOC surveys where it was evaluated. Two other endpoints showing this level of discrimination, but with slightly lower P values, were percent contributing dominant family and oligochaete number.
Among the other toxicity tests evaluated, several benthic species endpoints were good discriminators. Survival of Hyalella azteca, Chironomus riparius, and Diporeia spp. did not rank high in discriminatory ability for any of the four AOC surveys. However, chronic endpoints of length and sexual maturation were highly discriminatory for a minimum of one AOC survey. The C. riparius length (average P value of 0.0116) and H. azteca 28-day length (average P value of 0.0298) were significant for all three AOC surveys where they were evaluated. The most discriminatory nonbenthic invertebrate endpoints were ranked as follows: Brachionus sp. survival, Ceriodaphnia dubia reproduction (elutriate), and Pimephales promelas larval weight, each showing significant differences for all four AOC surveys. Although the Brachionus sp. test showed significant discrimination for all four AOC surveys, the data are questionable for comparison purposes because the sediment was stored for 12 months before testing. Five endpoints had significant P values for three of the four AOC surveys, including Selenastrum capricornutum 14C-uptake, Daphnia magna reproduction, P. promelas embryo-larval terata, C. dubia reproduction (whole sediment), and C. riparius survival. Some other endpoints (e.g., C. dubia survival [whole sediment], P. promelas embryo-larval length, C. dubia reproduction [whole sediment] and survival [elutriate], and D. magna survival [whole sediment]) showed highly significant P values for two of the four AOC surveys, but had high P values for the other AOC surveys.
In summary, there were several toxicity test endpoints that proved to be highly discriminatory of degrees of sediment toxicity. This is a critically important trait for toxicity tests when attempting to define the spatial extent of site contamination. The nonbenthic toxicity tests tended to be more discriminatory than the benthic toxicity tests, and therefore should be included in any test battery.
The rankings developed for sensitivity and discriminatory ability were combined to provide a comprehensive rank over all four AOC surveys (Table 6-6). It is evident in this table that there is a wide range in ranks for each characteristic, ranging from ranks of 1 to 25 for the toxicity tests with the top 10 combined ranks. The Daphnia magna 7-day reproduction test ranked first, while the Pimephales promelas 7-day larval weight test was second. All of the top five combined rank test endpoints were nonbenthic, tending to have more discriminatory ability than the benthic test endpoints. The Daphnia magna (7-day reproduction) test had ranks of 5 for both sensitivity and discriminatory ability. The Microtox® (45 percent, 5 minute and 15 minute) tests were the next most consistent tests between the two characteristics of sensitivity and discriminatory ability, ranking 8 or 10 for each characteristic. The high combined rankings of Microtox® (3 and 4) and the high degree of correlation with other responses (as discussed below) illustrate the usefulness of Microtox® in reconnaissance surveys.
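The text does not state how the two rankings were combined. As a sketch under the assumption that the combined rank is based on the sum of the sensitivity and discriminatory ranks, the following Python example shows how an endpoint with consistent middle ranks can place ahead of one with a single very high rank. Only the pair of ranks (5, 5) for the Daphnia magna reproduction test comes from the text; the other endpoints and ranks are hypothetical.

```python
def combined_ranking(ranks):
    """Order endpoints by the sum of their two ranks (assumed combining rule).

    ranks maps an endpoint name to (sensitivity_rank, discriminatory_rank);
    ties in the sum are broken alphabetically by name.
    """
    return sorted(ranks, key=lambda name: (sum(ranks[name]), name))


endpoint_ranks = {
    "D. magna 7-day reproduction": (5, 5),   # ranks given in the text
    "hypothetical endpoint X": (1, 25),      # top sensitivity, poor discrimination
    "hypothetical endpoint Y": (8, 10),      # moderately good on both
}

for position, name in enumerate(combined_ranking(endpoint_ranks), start=1):
    print(position, name, "rank sum =", sum(endpoint_ranks[name]))
```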
A principal components analysis (PCA) was conducted to determine if there were meaningful groupings of toxicity tests that could be used to further refine a list of recommended tests. In the PCA, the data undergo a transformation to generate factors that remain independent of each other. The results of the analysis are presented as separate factors, each of which explains one aspect of the variability among test responses. In the ARCS Program, these factors were evaluated to determine if they could be interpreted as different response patterns. The percent contribution of each variable (test response) to each factor is listed in Table 6-7. Test responses for similar endpoints (e.g., growth) that contribute similarly to a factor may represent redundant tests. The analysis requires a complete data set: there can be no missing data for any variable; that is, the number of data points must be equal for every variable. Only 20 endpoints (Table 6-7) met these data requirements.
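The steps described above (dropping variables with missing data, transforming the data, and reading off each variable's contribution to a factor) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the ARCS Program: the data are hypothetical, the analysis is run on the correlation matrix, only the first factor is extracted (by power iteration), and the percent contribution of a variable is taken to be its squared loading on the unit-length eigenvector.

```python
import math

def complete_endpoints(data):
    """Keep only endpoints that have a value at every station."""
    return {name: vals for name, vals in data.items() if None not in vals}

def standardize(vals):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(vals)
    mean = sum(vals) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
    return [(v - mean) / sd for v in vals]

def correlation_matrix(columns):
    """Correlation matrix computed from standardized columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]

def first_factor(matrix, iterations=200):
    """Leading eigenvector and eigenvalue, found by power iteration."""
    size = len(matrix)
    vec = [1.0] * size
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    image = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(v * w for v, w in zip(vec, image))
    return vec, eigenvalue

# Hypothetical endpoint responses at five stations; None marks a missing value.
data = {
    "survival": [90, 80, 60, 40, 20],
    "length": [4.0, 3.8, 3.1, 2.5, 2.0],
    "reproduction": [30, 25, 28, 12, 10],
    "maturation": [0.9, None, 0.5, 0.4, 0.2],  # dropped: incomplete
}
kept = complete_endpoints(data)
columns = [standardize(vals) for vals in kept.values()]
loadings, eigenvalue = first_factor(correlation_matrix(columns))
percent_explained = 100 * eigenvalue / len(columns)
contribution = {name: 100 * w * w for name, w in zip(kept, loadings)}
print(sorted(kept), round(percent_explained, 1))
```

In this hypothetical data set the three retained endpoints contribute almost equally to a single factor, which is the pattern the text describes as redundant information. A full analysis would extract all factors with an eigen-decomposition routine rather than power iteration.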
The results of the correlation analysis indicated that a large number of endpoints were significantly related. These similarities are also observed in the results of the factor analysis (Table 6-7), which shows several endpoints contributing to Factors 1-3. These findings suggest that responses within each factor are producing similar and redundant information. If a test battery were to be selected that detected each type of toxicity response pattern (Factors 1-4), one toxicity test consisting of two or more endpoints could provide unique information for multiple groupings. For example, the Hyalella azteca 14-day test consisting of survival, length, antenna segment number, and sexual maturation endpoints is representative of three unique response patterns, while only Hexagenia bilineata describes the fourth pattern. Both the Ceriodaphnia dubia and Chironomus riparius tests can be used to explain Factors 1 and 2. Use of these toxicity tests would enable each unique response pattern to be covered with fewer organism types.
Correlating the endpoint responses (both laboratory toxicity tests and community structure analyses) to detect similar response patterns is another useful method to evaluate data redundancy and provide field validation of toxicity tests. All 93 measured endpoints were correlated with each other (Spearman rank correlation) and the top 10 correlations for each toxicity test were further evaluated based on the resulting r² and P values.
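As a brief illustration of the rank correlation used here, the sketch below computes a Spearman coefficient as the Pearson correlation of average ranks and then squares it. The station responses and endpoint labels are hypothetical, and the significance test (P value) is omitted.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical responses for two endpoints at the same six stations.
light_inhibition = [12, 35, 50, 61, 78, 90]    # percent inhibition
midge_length = [9.8, 8.9, 9.1, 6.2, 5.0, 4.1]  # mm

r = spearman(light_inhibition, midge_length)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```

The coefficient is negative because one endpoint increases with toxicity while the other decreases; squaring it removes the sign, so two endpoints that respond in opposite directions can still show a high r² and carry similar information.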
The numbers of significant correlations between endpoint responses varied with the degree of site contamination. Indiana Harbor was the most contaminated (Nelson et al. 1993) and most toxic of the three AOCs surveyed. Indiana Harbor had the highest number of significant (P <= 0.05) correlations. The Buffalo River samples exhibited less contamination and toxicity compared to the other two AOCs and had the fewest significant correlations. The Saginaw River No. 1 survey had a moderate level of toxicity. The response patterns among toxicity tests were similar for sediments collected from the Indiana Harbor and Saginaw River No. 1 surveys. There were only three samples collected in the Saginaw River No. 1 survey, and therefore correlations would be similar, particularly because one sample (Station No. 6) was very toxic. There was little toxicity observed in the Saginaw River No. 3 sediment samples, and consequently there were fewer significant correlations in that survey.
Seventy-two percent of the endpoints had more than 10 significant correlations and 77 percent had endpoint correlations with r² greater than 0.80. Endpoints with the fewest significant correlations included Hydrilla verticillata root and shoot length (no significant correlations), Lemna minor biomass and benthic taxa richness (2 correlations each), percent flatworms and microbial galactosidase activity (3 correlations), Daphnia magna 7-day survival (4 correlations), and L. minor chlorophyll a (5 correlations).
The endpoints with the highest average correlation (r² value) were (in rank order) Microtox®, Chironomus tentans length, and percent chironomids and percent tolerant species in the artificial substrate samples (Table 6-8). The high number of significant correlations between laboratory toxicity test endpoints and some artificial substrate benthic macroinvertebrate endpoints (e.g., percent chironomids and percent tolerant species) provides a high degree of field validation for the laboratory tests.
When assessing sediment toxicity, it is important to consider effects on both benthic and nonbenthic species, because there may be interactions between the sediment and the overlying water and between benthic and nonbenthic species. Of the nonbenthic species, the Pimephales promelas and cladoceran toxicity tests are the most commonly used in sediment testing. Fish and cladocerans feed on the sediment surface during whole sediment exposures, which increases their exposure. When toxicity response patterns were compared between benthic and nonbenthic species, there were many significant correlations (Table 6-9). The 7-day toxicity tests with Ceriodaphnia dubia, Daphnia magna, and Pimephales promelas larval growth were significantly correlated with 10 to 70 percent of the benthic responses. The various endpoint responses of Hyalella azteca were significantly correlated with up to 80 percent of the nonbenthic endpoint responses. Chironomus tentans and Chironomus riparius endpoint responses were significantly correlated with greater than 60 and 70 percent of the nonbenthic endpoint responses, respectively. The indigenous sediment microbial enzyme activities were significantly correlated with up to 70 percent of the nonbenthic endpoint responses.
Ideally, a sediment toxicity test should be rapid, simple, and inexpensive if the objective of the study is to screen a large number of samples. Acute lethality tests are useful in identifying "hot spots" of sediment contamination, but these tests cannot be used to evaluate moderately contaminated areas where only chronic effects may occur. Concentrations of contaminants in sediments may not be lethal, but may interfere with the ability of an animal to develop, grow, or reproduce. A better understanding of the sublethal effects of chemicals in sediment is needed to identify areas with moderate contamination and evaluate chemicals that do not elicit acutely lethal responses.
Many benthic organisms continuously inhabit sediment. Extrapolations from a 10-day lethality test conducted in the laboratory to a lifetime of exposure in the field may underestimate effects from long-term exposures to benthic organisms. Desorption of contaminants from sediment into interstitial water may be kinetically limited. Therefore, long-term exposures should be used to better evaluate moderate levels of contamination where subtle effects are more difficult to discern.
Estimates of sublethal effects of contaminated sediment are typically based on exposures of 10 days or less with midges, amphipods, or cladocerans (e.g., Burton 1991). These partial life-cycle exposures may not always include the most sensitive life stage(s) of the test species. Testing sensitive life stages in longer-term exposures may provide a more subtle measure of chemical toxicity (Breteler et al. 1989; Ingersoll and Nelson 1990; Kemble et al. 1993; Nelson et al. 1993).
Procedures for conducting whole-sediment toxicity tests for up to 29 days with Hyalella azteca have been recently reported (Borgmann and Munawar 1989; Ingersoll and Nelson 1990; Nelson et al. 1993; Kemble et al. 1993). Endpoints monitored at the end of these exposures include survival, growth, or sexual maturation. Supplemental food is typically added to the chambers during exposures, with daily renewal of water overlying the sediment.
The toxicity to Hyalella azteca of sediment contaminated with PAHs and PCBs was evaluated after exposures of 2, 10, and 29 days in static and water-renewal exposures (Ingersoll and Nelson 1990). Survival of amphipods was not reduced after a 2-day exposure, was reduced by about 50 percent after a 10-day exposure, and was reduced by about 70 to 90 percent after a 29-day exposure. Body length of amphipods was only reduced in the 29-day exposure.
The toxicity to Hyalella azteca of contaminated Great Lakes sediment was evaluated after 7-, 14-, or 28-day exposures (Burton 1994; Nelson et al. 1993). Survival and length endpoints were more discriminatory compared to sexual maturation. Effects after 28 days of exposure were often more severe than effects after 7 or 14 days of exposure. For example, only one station in the first survey of the Saginaw River was toxic to amphipods after 14 days of exposure (reduced survival but not length with exposure to sediment from Station SR-6). After 28 days of exposure, Station SR-6 sediment was still the only sample that reduced survival. Sexual maturation did not identify any additional toxic samples. However, length of amphipods was reduced in all of the exposures to Saginaw River sediments after 28 days.
The toxicity of metal-contaminated sediment to Hyalella azteca was evaluated after 28-day exposures (Kemble et al. 1993). Length was a more sensitive endpoint compared to survival or sexual maturation. Only 7 percent of the samples reduced survival and 23 percent of the samples reduced sexual maturation. However, 62 percent of the samples reduced length of the amphipods after 28 days of exposure. Reduction in length of amphipods was correlated to metal concentration in the whole sediment and in the interstitial water. Amphipod length and benthic community evaluations both provided complementary evidence of metal-induced degradation to aquatic communities at study sites in the Milltown Reservoir and Clark Fork River in Montana (Kemble et al. 1993).
In summary, the duration of the exposure can have a profound influence on the response of organisms in sediment toxicity tests. Extended exposures (i.e., 14-28 days) with Hyalella azteca may exhibit toxicity for sediment samples that do not exhibit toxicity in exposures of 2 to 7 days. In addition, assessment of sublethal endpoints such as length may detect subtle effects for sediment samples that do not reduce survival in 14- or 28-day exposures. Additional method development is needed on culturing and chronic sediment testing procedures for other benthic infaunal species with a variety of feeding habits including suspension and deposit feeders. Potential depletion of contaminants or changes in sediment during exposures may be a problem when conducting long-term tests. Effects of natural physico-chemical characteristics of sediment (e.g., grain size) or indigenous animals (e.g., predators) may also be exacerbated in chronic exposures (Reynoldson et al. 1994). Despite these limitations, sublethal responses of benthic organisms need to be evaluated in sediment assessments. Long-term exposures should be used to provide data on growth and reproduction of organisms inhabiting sediment. Results of these chronic exposures can be used to better evaluate the structure and function of benthic communities in moderately contaminated areas.
Several promising test species for which an adequate database exists for use in sediment toxicity testing are listed in Table 6-1 with a subjective ranking of selection criteria for sediment testing. The primary advantages and disadvantages of each of the test species used in the ARCS Program are discussed in this section.
While the Diporeia spp. preference and avoidance endpoints were the most sensitive overall, this toxicity test is one of the least developed (Gossiaux et al. 1993). The survival endpoint for this organism was relatively insensitive (sensitivity ranks from 7 to 14 in the four AOC surveys) and Diporeia spp. must be collected from the field for testing. The ecological significance of behavioral endpoints, such as avoidance/preference, is difficult to evaluate at this time. However, Diporeia spp. is of critical importance in the Great Lakes. This characteristic alone indicates this toxicity test should be given high priority for additional methods development and testing.
Hexagenia bilineata endpoints exhibited relatively sensitive responses for most of the AOC surveys. However, the Kruskal-Wallis test could not be run with this data set. Previous discriminatory analysis using a different procedure (whereby the geometric mean is divided by the arithmetic mean) indicated that the molting endpoint was relatively discriminatory (rank = 5); however, survival was not discriminatory (rank = 21). Surprisingly, for H. bilineata the elutriate exposures were more sensitive than the whole-sediment toxicity tests. The sensitivity of H. bilineata exhibited in the ARCS Program may have resulted from the prolonged storage of sediment before testing. The validity of the data comparisons with this toxicity test is compromised by the different storage periods. The inability to continuously culture mayflies in the laboratory has limited their routine use in sediment testing. Mayflies may also be sensitive to sediment grain size in whole-sediment exposures (ASTM 1993).
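The alternative discriminatory procedure mentioned above, dividing the geometric mean by the arithmetic mean, can be sketched as follows. The text does not say which values were averaged or how the ratio was converted to a rank, so this example assumes the ratio is taken over the station responses for a single endpoint; the data are hypothetical and all values must be positive.

```python
from statistics import geometric_mean, mean

def gm_am_ratio(responses):
    """Geometric mean divided by arithmetic mean of positive responses."""
    return geometric_mean(responses) / mean(responses)

# Hypothetical responses for one endpoint at four stations.
similar_stations = [80, 82, 79, 81]  # little spread among stations
varied_stations = [95, 60, 20, 5]    # wide spread among stations

ratio_similar = gm_am_ratio(similar_stations)
ratio_varied = gm_am_ratio(varied_stations)
print(round(ratio_similar, 4), round(ratio_varied, 4))
```

The ratio cannot exceed 1 and equals 1 only when all responses are identical, so under this reading a lower ratio indicates a wider spread of responses among stations.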
The rotifer Brachionus sp. survival toxicity test (Snell and Persoone 1989) had to be conducted after prolonged sediment storage (up to 12 months). As with the Hexagenia bilineata toxicity test, comparison of sediment effects on rotifers to the other toxicity tests is tenuous because of potential toxicity artifacts caused by prolonged sediment storage. The rotifer was insensitive, but was discriminatory in elutriate exposures.
Hyalella azteca responses were highly variable, depending on the length of exposure (7 to 28 days) and the endpoint measured, with sensitivity ranks ranging from 1 to 27 for the four AOC surveys. The advantages of conducting sediment toxicity tests with H. azteca are 1) the animals can be cultured in the laboratory, 2) testing and culturing methods have been standardized, 3) effects on survival, growth, or sexual maturation can be monitored in 7- to 28-day exposures, 4) H. azteca are insensitive to grain size of the sediment (Ankley et al. 1994), 5) H. azteca had a combined rank of 4 for sensitivity and discriminatory ability for 14-day survival, and 6) H. azteca endpoints correlated well with other toxicity test endpoints.
As with H. azteca, the midges Chironomus tentans and Chironomus riparius exhibited a wide range of sensitivity and discriminatory ability over the four AOC surveys, but ranked relatively high overall. Control survival for the midges was typically lower than for the other test species. The advantages of conducting sediment tests with midges are 1) the animals can be cultured in the laboratory, 2) testing and culturing methods have been standardized, and 3) effects on survival and growth can be monitored in 10- to 14-day exposures.
Toxicity tests with the aquatic macrophyte Hydrilla verticillata have been conducted by very few laboratories. Some of the measured endpoints used in this test proved to be sensitive (root length, sensitivity ranks of 1-11 for the four AOC surveys), but the endpoints were not discriminatory. H. verticillata represents a unique level of biological organization and should be considered in future assessments if adequate resources are available for testing. The Lemna minor (duckweed) toxicity test also measures a unique biological level of organization that is of importance to ecosystem functioning. By design, this test cannot be highly sensitive to sediment contaminants because the plants float on the surface of the water. Therefore, the only exposure is to contaminants that are water soluble or associated with suspended colloidal particles.
Hall et al. (1993) reported problems conducting elutriate toxicity tests using the 24-hour 14C-assimilation test with Selenastrum capricornutum. Interpretations of toxicity using S. capricornutum were complicated by variable nutrient and inorganic carbon concentrations in the elutriate samples. All of the elutriate samples tested stimulated carbon assimilation by S. capricornutum in one or more of the dilutions. Attempts to modify the algal medium to provide unlimited nutrients were not successful. An algal medium that supports greater growth potential should be developed in order to evaluate the toxicity of environmental samples with high concentrations of algal nutrients.
The Microtox® test response was relatively sensitive (overall sensitivity rank of 8). Its discriminatory ability was moderate (Table 6-6), and its response was well correlated with other toxicity test responses (Table 6-8). Other advantages of the Microtox® test are rapid response, small volume requirements, and standardized testing procedures.
The indigenous tests included the benthic macroinvertebrate indices from artificial substrates and the microbial enzyme activities of the sediment samples. These data could not be analyzed for sensitivity with the above data sets because of the lack of controls for comparisons. Several endpoints for these tests proved to be highly discriminatory (Table 6-5). The percent tolerant species and percent chironomid composition indices were highly correlated with toxicity test responses. Both indices represent unique levels of biological organization. Microbial enzyme and benthic colonization tests evaluate indigenous organisms, not surrogate species, and therefore there is reduced uncertainty in data extrapolations. See Chapter 7 for more complete analyses of the benthic macroinvertebrate data.
A wide range of sediment toxicity tests covering multiple levels of biological organization and trophic levels should be used to effectively assess sediment toxicity. Each toxicity test provides information that is unique to that species and the life process measured (e.g., survival, growth). Use of a battery of toxicity tests allows a "weight-of-evidence" assessment approach and yields stronger conclusions because false negatives or false positives from individual tests can be interpreted in light of results of the entire battery. Nevertheless, combinations of tests that provide redundant information should be avoided to be more cost effective and allow greater spatial coverage of a site (i.e., allowing more samples to be tested).
Selection of the appropriate toxicity test(s) depends on the characteristics of the site, the resources available, and the objectives of the study. Criteria for selecting toxicity tests are listed in Table 6-1. Two critical factors to consider are the relative ability of a test to detect sediment toxicity (i.e., sensitivity) and to measure the level of toxicity (i.e., discrimination). Sediment toxicity appeared to correlate with the relative degree of chemical contamination at the ARCS priority AOCs. Further relationships between biological and chemical variables could be developed using detailed analyses (described in Chapter 9) based on the Apparent Effects Threshold (AET), Sediment Quality Triad, TIE procedures, and sediment spiking studies (Ingersoll et al., in prep.). Nevertheless, the present results are based on the most comprehensive study of its kind (7,600 data points). Toxicity tests that were relatively sensitive or discriminatory for three or four of the AOC surveys in the ARCS Program would probably be sensitive or discriminatory at other sites. The toxicity tests recommended here are similar to those recommended in studies by the IJC (1988), Giesy and Hoke (1990), Giesy et al. (1988a, 1989), Kemble et al. (1993), and Burton et al. (1989).
Ecological significance of the measured endpoints is not directly addressed with laboratory toxicity tests alone. The most sensitive toxicity endpoint in the ARCS Program was the avoidance or preference behavior of Diporeia spp., a common amphipod in the Great Lakes. Behavior is often a sensitive indicator of sublethal responses. What is not known, however, is whether the preference of organisms for one sediment over another would alter the population, community, or ecosystem to any degree that constitutes short- or long-term impairment. These issues are best resolved using a "weight-of-evidence" assessment approach in which other toxicity endpoints and community analyses are considered along with chemical and physical characteristics. As discussed above, there were many significant correlations between laboratory toxicity test responses and benthic community structure patterns in the field.
The process of selecting the optimal toxicity test(s) for use in an ecosystem assessment is not simple or straightforward. The optimal toxicity test can only be selected when the objectives of the study and associated DQOs have been defined (see Chapter 2) and there is a reasonable understanding of the physical, chemical, and biological characteristics of the study site. This information must be combined with an understanding of the strengths and weaknesses of the various sediment toxicity tests that are available (Table 6-1).
No one toxicity test is superior to all others. A number of useful toxicity tests have been evaluated in freshwater and marine studies (Burgess and Scott 1992; Burton 1991; Lamberson et al. 1992; Burton and Scott 1992). To reduce uncertainty and reduce the chance of obtaining false positive or false negative results, it is important to test more than one species. The importance of testing multiple species increases with the importance of protecting the ecosystem and the need to define "significant" contamination in the "grey" (marginally contaminated) zone.
For most applications, a battery consisting of two to three toxicity tests should be evaluated. These recommendations are for waters in the United States and are based on the above characteristics and on comparison studies where multiple species have been used simultaneously in sediment contamination investigations (Burton 1991; Burton et al. 1992b; Burton and Scott 1992; Giesy et al. 1988a; Giesy and Hoke 1990; Hoke et al. 1990; Ingersoll et al. 1993; Kemble et al. 1993; Chapman et al. 1992; Long and Buchman 1989).
The choice of the appropriate endpoint (response) to measure is important to the assessment process. Not all toxicants affect the same metabolic processes or produce the same effects, because they have differing modes of action and target receptors. Some toxicants may interfere with processes essential for reproduction or growth. Relative species sensitivity frequently varies among contaminants. For example, Reish (1988) reported the relative toxicity of six metals (arsenic, cadmium, chromium, copper, mercury, and zinc) to marine crustaceans, polychaetes, pelecypods, and fishes, and concluded that no one species or group of organisms was the most sensitive to all of the metals. Contaminants may also stimulate a process due to interruption of a feedback mechanism, or contaminants may be essential nutrients at low concentrations (e.g., selenium). Stimulation at low concentrations of toxicant exposure (hormesis) is often reported in the literature (Stebbing 1982; Burton and Stemmer 1988; Burton et al. 1989). Some responses are much more sensitive than others (e.g., enzyme inhibition vs. lethality), and should not necessarily be weighted equally in evaluating the importance of effects.
The duration of the exposure can have a profound influence on the response of organisms in sediment toxicity tests. Extended exposures of up to 28 days with Hyalella azteca can be used to identify sublethal responses for sediment samples that are not acutely toxic in exposures of 2 to 7 days. Additional method development is needed on culturing and chronic sediment testing procedures for additional infaunal species with a variety of feeding habits, including suspension and deposit feeders. Results of chronic exposures should be used to better evaluate the structure and function of benthic communities in moderately contaminated areas. The USEPA is currently developing standardized acute toxicity test methods for sediments using Hyalella azteca 10-day survival and Chironomus tentans 10-day survival and growth endpoints (USEPA 1994). These methods should become final in 1994 and should be strongly considered for use in any studies of contaminated sediments.
It appears from the ARCS Program data that several measured endpoints would be useful for routine sediment contamination assessments. Results from the statistical analyses indicate two test species (with 4 measured endpoints) could be used to describe the 3 major toxicity response patterns observed at the ARCS AOCs. The endpoints that could be selected vary in their sensitivity, discrimination of toxicity, relationship to other toxicity test responses and benthic community indices, and other advantages and disadvantages (Tables 6-1, 6-6, 6-7, and 6-8). Selection of the appropriate toxicity test depends on the characteristics of the site, the resources available, and the objectives of the study.
The following recommendations for selection of optimal toxicity tests in future assessments of contaminated Great Lakes sediment are based on sensitivity, discrimination, and similarity analyses, and on the advantages and disadvantages of the selection criteria listed in Table 6-1. It is evident that the optimal toxicity tests vary between sites and this variation cannot be confirmed a priori. Factor analysis provides an approach for selection of toxicity tests to be included in a test battery. Species can be chosen with endpoints representing each of the major response pattern groups identified in Table 6-7, to better ensure that the many varied and potentially adverse species responses are being evaluated. Many of the toxicity tests that appeared best in the factor analysis and in the sensitivity and discriminatory analyses have also been demonstrated to be good indicators of sediment toxicity in previous assessments (Burton 1991). The minimal test battery recommended for Great Lakes sediment toxicity studies should consist of 2 species, 4 measurement endpoints, and represent 3 of the 4 major response pattern groups (Table 6-10a). This enables some flexibility in the choice of the test species, which may be based on other decision criteria, such as resource requirements, laboratory expertise or organism availability, need for sensitivity or discriminatory power, or other characteristics (Table 6-11). Some examples of different study objectives that may be important are shown in Tables 6-11 and 6-12, with recommended test species.
Based on the response patterns (Table 6-7), sensitivity, and discriminatory patterns, the following toxicity test combinations are recommended. However, other decision criteria (as discussed above) should also be considered in the selection process. A number of test battery options are outlined in Table 6-10b. One test battery option could consist of two species. The only toxicity test whose endpoints characterized three of the four response patterns was the Hyalella azteca 14-day test, consisting of survival, length, and sexual maturation endpoints. Unfortunately, accurate measurement of organism length requires digitizing microscope equipment, which is not common in most testing laboratories. It is possible that dry weight could be measured instead of length (USEPA 1994). Furthermore, antenna segment number was a good predictor of organism length (ASTM 1993). In combination with this amphipod, one of the following toxicity tests should be conducted: Ceriodaphnia dubia 7-day survival and reproduction, Chironomus riparius 14-day survival and length, Daphnia magna 7-day survival and reproduction, Pimephales promelas 7-day larval growth, Diporeia spp. 5-day preference, or Hexagenia bilineata 10-day survival and molting test.
Another test battery option could consist of either C. dubia or C. riparius, and either Diporeia spp. or H. bilineata (Table 6-10b).
A third option for a test battery could consist of three species: D. magna, P. promelas, and either Diporeia spp. or H. bilineata (Table 6-10b).
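The three battery options described above can be written out as explicit species combinations. The short sketch below only enumerates the combinations named in the text (the six partner tests listed for the first option, and the alternatives given for the second and third options); the variable names are illustrative, and the sketch makes no judgment about which combination is preferable at a given site.

```python
from itertools import product

# Partner tests named in the text for use with the H. azteca 14-day test.
PARTNERS = [
    "C. dubia 7-day survival and reproduction",
    "C. riparius 14-day survival and length",
    "D. magna 7-day survival and reproduction",
    "P. promelas 7-day larval growth",
    "Diporeia spp. 5-day preference",
    "H. bilineata 10-day survival and molting",
]
FOURTH_PATTERN = ["Diporeia spp.", "H. bilineata"]

# Option 1: H. azteca plus any one partner test.
option_1 = [("H. azteca 14-day", partner) for partner in PARTNERS]

# Option 2: either C. dubia or C. riparius, with either Diporeia spp. or H. bilineata.
option_2 = list(product(["C. dubia", "C. riparius"], FOURTH_PATTERN))

# Option 3: D. magna and P. promelas, with either Diporeia spp. or H. bilineata.
option_3 = [("D. magna", "P. promelas", third) for third in FOURTH_PATTERN]

for label, batteries in (("1", option_1), ("2", option_2), ("3", option_3)):
    print("Option", label, "-", len(batteries), "possible batteries")
```

Listing the combinations this way makes it easier to screen them against other decision criteria, such as organism availability or laboratory expertise, before a battery is selected.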
The Microtox® test is superior to the others tested for use in reconnaissance surveys. The ease of operation, cost, correlation with other toxicity tests, and sensitivity and discriminatory ability of the Microtox® test make it a useful tool for quickly processing large numbers of samples.
There is no perfect toxicity test. Each of these toxicity tests has advantages and disadvantages. Many of the toxicity tests that ranked high in the ARCS Program have been used successfully in other studies of sediment toxicity. Evaluations of sediment using laboratory toxicity tests and benthic community structure indices, combined with physico-chemical characterization of the test site, will allow for an integrated "weight-of-evidence" assessment approach that can be used to provide evidence of contaminant-induced degradation to aquatic communities.