EPA Region III Land Cover Data Set
VERSION 2

INTRODUCTION

The main objective of this project was to generate a generalized and consistent (i.e. seamless) land cover data layer for EPA Region III, which includes the states of Pennsylvania, Maryland, Delaware, Virginia, and West Virginia. This data set was developed by personnel at the EROS Data Center (EDC), Sioux Falls, SD. The project was initiated during the summer of 1995, and a first draft product was completed in February 1996 (Version 1). The write-up that follows pertains to Version 2, in which a number of errors were fixed, and the "Barren" class was subdivided into several separate classes. We anticipate that additional revisions to the data set may be necessary before the land cover data set is completed. Questions about the data set can be directed to Jim Vogelmann (EDC; email vogel@edcserver1.cr.usgs.gov; telephone 605-594-6062).

GENERAL PROCEDURES

Data sources: The primary source of data for this project was leaves-on (summer) Landsat TM data, acquired in 1991, 1992 and 1993. These data sets were referenced to Lambert Azimuthal coordinates. Additionally, leaves-off TM data sets were acquired and referenced. While most of the leaves-off data sets were acquired in spring, a few were from late autumn due to the difficulties in acquiring cloud-free TM data. A wider seasonal range of dates, covering a wider span of years, characterizes the leaves-off data. In total, 48 TM scenes were analyzed. Data sets used are provided in Table 1.

In addition, other intermediate scale spatial data were acquired and utilized. These included 3-arc second Digital Terrain Elevation Dataset (DTED) and derivative DTED products (including slope, aspect and shaded relief), population density data, Defense Meteorological Satellite Program (DMSP) city lights data, LUDA, and National Wetlands Inventory (NWI) data.
It is anticipated that Digital Line Graph (DLG) data and National Biological Service Gap Analysis Program (GAP) data will be incorporated at a future date.

Methods: The general procedure of this project was to (1) mosaic multiple summer TM scenes and classify them using an unsupervised classification algorithm, (2) interpret and label classes into twelve land cover categories using aerial photographs as reference data, (3) resolve confused classes using the appropriate ancillary data source(s), and (4) incorporate land cover information from leaves-off TM data and NWI data to refine and augment the "basic" classification developed above.

The entire region was divided into two halves, which were analyzed separately. This was done in part to keep amounts of analyzed data reasonable, and in part because scenes from the west half of the region were acquired during late summer and early autumn, whereas scenes from the east half of the region were acquired during early summer. It was felt that the mosaicking of early summer and late summer scenes might create difficulties due to phenological differences in the vegetation.

For mosaicking purposes, a base scene was selected, and other scenes were normalized to mimic spectral properties of the base scene following histogram equalization using pixels in regions of spatial overlap. Following mosaicking, mosaicked scenes were clustered into 100 spectrally distinct classes using the Cluster algorithm developed by Los Alamos [1]. Clusters were assigned into Anderson level 1 and 2 land cover classes using National High Altitude Photography program (NHAP) aerial photographs as reference information.

Almost invariably, individual spectral classes were confused between/among two or more "targeted" land cover classes. Separation of spectral classes into meaningful land cover units was accomplished using ancillary data.
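
The scene normalization step described above can be illustrated with a short sketch. The write-up does not give the exact procedure, so this is only one plausible reading: a per-band histogram (CDF) matching in which the lookup is derived solely from pixels in the overlap region and then applied to the whole scene. The function name, argument names, and the use of NumPy are assumptions made for illustration, not part of the original processing chain.

```python
import numpy as np

def match_to_base(scene_band, scene_overlap, base_overlap):
    """Rescale one band of a scene so that its overlap-region histogram
    resembles the base scene's (hypothetical helper, illustration only)."""
    # Empirical CDF of the scene's pixels inside the overlap region.
    src_values, src_counts = np.unique(scene_overlap.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / scene_overlap.size

    # Empirical CDF of the base scene's pixels inside the same region.
    ref_values, ref_counts = np.unique(base_overlap.ravel(), return_counts=True)
    ref_cdf = np.cumsum(ref_counts) / base_overlap.size

    # For each scene value, find the base value sitting at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)

    # Apply that lookup to the full band; values outside the range seen in
    # the overlap region are clamped to the end points by np.interp.
    return np.interp(scene_band, src_values, mapped)

# Tiny synthetic example: the scene reads 5 digital counts higher than the base.
base_overlap = np.array([10, 20, 30, 40])
scene_overlap = np.array([15, 25, 35, 45])
scene_band = np.array([[15, 45], [25, 35]])
normalized = match_to_base(scene_band, scene_overlap, base_overlap)
```

Deriving the lookup only from the overlap region keeps the comparison restricted to ground that both scenes actually imaged, which is the point of using overlap pixels in the first place.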
Briefly, for a given confused spectral class, digital values of the various ancillary data layers were compared to determine: (1) which data layers were the most effective for splitting the confused class into the appropriate land cover units, and (2) the appropriate thresholds for splitting the classes. Models were then developed using one to several data sets to split each confused class into the desired land cover categories.

As an example, a spectral class might be confused between stressed deciduous forest and low density residential areas. In order to split this particular class into more meaningful land cover units, population density and city lights data were assessed to determine if they could be used to split the class into residential and forested categories, and if so, to define the appropriate thresholds to be used in the class splitting model.

Following the above class splitting steps, a "first order" classification product was constructed for each of the two halves of the study region.

Leaves-off data were then clustered with the goal of discerning certain land cover features not easily discriminated using leaves-on TM data. Classes easily defined using leaves-off data included conifer vegetation and hay/pastures. Both are green in early spring and late autumn, and are readily discernible from each other and from almost all other (non-green) land cover categories. This information was then incorporated into the classification product.

Land cover classes that were spatially but not spectrally distinct (barren areas, clearcuts) were digitized off the screen and incorporated; wetlands information was derived from NWI data. Resultant classification products from the east and west halves of the region were then mosaicked together.
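
The class splitting example given above (stressed deciduous forest versus low density residential) might be sketched as follows. The output codes 2 and 9 are the Version 2 class numbers for low intensity developed and deciduous forest. Everything else is hypothetical: the function name, the cluster number, the threshold values, and the choice to combine the two ancillary layers with a logical OR are assumptions for illustration, since the write-up does not state the actual models or thresholds.

```python
import numpy as np

LOW_INTENSITY_DEVELOPED = 2   # class code from the Version 2 legend
DECIDUOUS_FOREST = 9          # class code from the Version 2 legend

def split_confused_class(cluster_map, confused_id, pop_density, city_lights,
                         pop_threshold, lights_threshold):
    """Split one confused spectral cluster into two land cover classes
    using ancillary layers (hypothetical rule, illustration only)."""
    # 0 marks pixels that do not belong to the confused cluster.
    out = np.zeros(cluster_map.shape, dtype=np.uint8)
    in_class = cluster_map == confused_id

    # Assumed rule: a pixel is treated as residential when either ancillary
    # layer meets or exceeds its threshold.
    looks_residential = (pop_density >= pop_threshold) | (city_lights >= lights_threshold)

    out[in_class & looks_residential] = LOW_INTENSITY_DEVELOPED
    out[in_class & ~looks_residential] = DECIDUOUS_FOREST
    return out

# Synthetic 2x2 example; cluster 37 is the (made-up) confused spectral class.
cluster_map = np.array([[37, 37], [37, 5]])
pop_density = np.array([[500, 10], [10, 900]])
city_lights = np.array([[0, 0], [60, 60]])
labels = split_confused_class(cluster_map, 37, pop_density, city_lights,
                              pop_threshold=100, lights_threshold=50)
```

In practice each confused cluster would get its own layers and thresholds, chosen by inspecting the ancillary values as described above.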

Classes: The resulting classification (Version 2) includes:

Class 1: Water
Class 2: Low intensity developed
Class 3: High intensity developed
Class 4: Hay/pasture/grass (areas that "green up" before deciduous tree species leaf out)
Class 5: Row crops
Class 6: Probable row crops
Class 7: Conifer (evergreen) forest
Class 8: Mixed forest
Class 9: Deciduous forest
Class 10: Woody wetlands
Class 11: Emergent wetlands
Class 12: Barren; Quarry areas (excluding spectrally dark coal mines)
Class 13: Barren; Coal mines
Class 14: Barren; Beach areas
Class 15: Barren; Transitional (including clear cut areas)

Current definitions of the classes are as follows; percentages given must be viewed as guidelines.

A. Water (all areas of open water, generally with less than 30% cover of vegetation/land cover). Class 1 of EPA Region III land cover data set.

B. Developed (areas characterized by high percentage (approximately 50% or greater) of construction materials (e.g. asphalt, concrete, buildings, etc)). Classes 2 and 3 of EPA Region III land cover data set.
B1. Low Intensity Developed (approximately 50-80% constructed material; approximately 20-50% vegetation cover; high percentage of residential development typifies this class). Class 2 of EPA Region III land cover data set.
B2. High Intensity Developed (20% or less vegetation, high percentage (80-100%) building materials; typically low percentage of residential development in this class). Class 3 of EPA Region III data set.

C. Cultivated (areas that are typically planted, tilled, or harvested. Includes pastures, row crops, and hay). Classes 4, 5 and 6 in the EPA Region III land cover data set.
C1. Grasslands (areas characterized by high percentages of grasses and other herbaceous vegetation that is regularly mowed for hay and/or grazed by livestock; predominantly hay fields and pastures, but also currently includes golf courses and city parks...). Class 4 of EPA Region III land cover data set.
C2.
Row Crops (areas regularly tilled and planted, often on an annual or biennial basis; corn, cotton, sorghum, vegetable crops...). Class 5 of EPA Region III land cover data set is a row crop class; Class 6 is designated as probable row crops. Class 6 will sometimes be confused with other areas, such as grasslands that were not green during times of spring data acquisitions.

D. Natural Vegetated areas. Classes 7, 8, 9, 10 and 11 of EPA Region III data set.
D1. Upland Forests (trees covering 40% or greater area). Includes Classes 7, 8 and 9 of EPA Region III data set.
D1a. Conifers/Evergreens (of trees present, 70% or higher conifers). Class 7 of EPA Region III data set.
D1b. Mixed Forest (both conifers and deciduous tree species present, with neither particularly dominant). Class 8 of EPA Region III land cover data set.
D1c. Deciduous Forest (of trees present, 70% or higher deciduous tree species). Class 9 of EPA Region III data set.
D2. Wetlands (from NWI).
D2a. Woody Wetlands (wetlands with substantial amount of woody vegetation present, either trees or shrubs). Class 10 of EPA Region III land cover data set.
D2b. Non-Woody Wetlands (wetlands without a substantial amount of woody vegetation present, usually with substantial amounts of herbaceous vegetation). Class 11 of EPA Region III land cover data set.

E. Barren areas (composed of bare rock, sand, gravel, or other earthen material with little (on the order of 20% or less) living vegetation present. Includes quarries (strip mines, sand and gravel operations), beaches, and recent clear cuts). Classes 12, 13, 14 and 15 of EPA Region III land cover data set.
E1. Quarries. Includes all quarry areas (including sand/gravel operations) except some spectrally dark coal areas in northern Pennsylvania. Class 12.
E2. Dark Coal areas (region dominated by spectrally dark coal piles and strip mines, mostly located in northern Pennsylvania). Class 13.
E3. Beaches. Class 14.
E4.
Transitional Barren (includes areas likely to change to other land cover categories, such as clear cuts). Class 15.

CAVEATS AND CONCERNS

While we believe that the approach taken has yielded a very good general land cover classification product for a very large region, it is important to indicate to the user where there might be some potential problems. Some of the problems are relatively easily remedied; these are currently being worked on, with "fixes" being incorporated as of this writing. The biggest concerns are listed below:

1) Quantitative accuracy checks have yet to be conducted. We plan to make comparisons with existing data sets in order to develop a general overview regarding the quality of the land cover data set developed. Feedback from users of the data will be greatly appreciated.

2) Some of the leaves-off data sets used for hay/pasture delineations were sub-optimal. In this project, leaves-off data sets were used for discriminating hay and pasture areas. The success of discriminating these areas using leaves-off data sets hinges on the greenness of the grasses at the time of data acquisition. When hay/pasture areas are non-green, they are not easily distinguishable from other agricultural areas using remotely sensed data. However, there is a temporal window during which hay and pasture areas green up before most other vegetation (excluding conifers, which have different spectral properties); during this window these areas are easily distinguishable from other crop areas.

3) The data sets used cover a range of years, and changes that have taken place across the landscape over the time period may not have been captured. While this is not viewed as a major problem for most classes, it is possible that some land cover features change more rapidly than might be expected (e.g. hay one year, row crop the next).

4) Some clear-cut areas have spectral properties similar to row crops, depending upon the times of data acquisition.
Thus, there could be some confusion in areas where both clear-cuts and row crops occur in close proximity to each other.

5) NWI data were not available for some of the region, most notably western Pennsylvania. Also, the NWI data are relatively old, and wetland changes that have taken place since the NWI data were acquired will not be captured in this EPA Region III land cover data set (unless fixed manually).

6) Throughout this project, we relied heavily on the use of multi-temporal data sets (leaves-on and leaves-off). We did not have both leaves-on and leaves-off data sets for a few relatively small areas (especially along the west portion of the southern edge of Virginia and the Newport News area in southeastern Virginia). Consequently, the quality of the data product in these areas is expected to be somewhat diminished.

UPCOMING TASKS

1) Fix obvious misclassifications on a case by case basis, especially in urban areas.
2) Assess NWI data in conjunction with imagery to determine how many changes have taken place, and fix when appropriate.
3) Conduct comparisons with existing data sets.
4) Incorporate GAP information when it becomes available.

ACKNOWLEDGMENTS

This work was performed by the Hughes STX Corporation under U.S. Geological Survey Contract 1434-92-C-40004.

REFERENCE

[1] Kelly, P.M., and White, J.M., 1993, Preprocessing remotely sensed data for efficient analysis and classification, Applications of Artificial Intelligence 1993: Knowledge-Based Systems in Aerospace and Industry, Proceedings of SPIE, 1993, 24-30.

Table 1. MRLC Landsat thematic mapper (TM) data sets available to develop EPA Region III data set; asterisks represent data sets used to develop land cover data set.

Path/Row  Date      EOSAT-ID
14/31     05/09/93  5014031009312910*
14/32     03/17/91  5014032009107610
14/32     05/20/91  5014032009114010*
14/32     03/25/89  5014032008908610*
14/33     03/15/91  5014033009107610*
14/33     05/04/91  5014033009112410
14/33     06/10/93  5014033009316110*
14/34     03/15/91  5014034009107610*
14/34     05/04/91  5014034009112410*
14/34     08/10/92  5014034009222310
14/35     06/23/92  5014035009217510*
14/35     10/13/92  5014035009228710*
15/31     03/31/91  5015031008809110*
15/31     06/14/92  5015031009216610*
15/31     10/07/93  5015031009330410
15/32     11/14/90  5015032009032010*
15/32     06/17/93  5015032009316810*
15/32     10/20/92  5015032009229410
15/33     03/16/89  5015033008907710*
15/33     05/08/90  5015033009012810*
15/33     09/16/91  5015033009125910
15/34     04/11/92  5015034009210210*
15/34     06/17/93  5015034009316810*
15/34     10/18/91  5015034009129110
15/35     05/16/93  5015035009313610*
15/35     10/20/92  5015035009229410*
16/31     03/29/90  5016031009109010*
16/31     05/20/92  5016031009214110
16/31     06/24/93  5016031009317510*
16/32     03/29/91  5016032009109010*
16/32     06/24/93  5016032009317510*
16/32     08/24/92  5016032009223710
16/33     03/01/92  5016033009206110*
16/33     04/16/91  5016033009110610
16/33     05/20/92  5016033009214110
16/33     09/28/93  5016033009327110*
16/34     04/16/91  5016034009110610*
16/34     05/20/92  5016034009214110
16/34     09/28/93  5016034009327110*
16/35     03/01/92  5016035009206110*
16/35     09/28/93  5016035009327110
17/31     03/29/88  5017031008808910*
17/31     05/11/92  5017031009213210
17/31     10/02/92  5017031009227610*
17/32     11/12/90  5017032009031810
17/32     03/24/86  5017032008608310*
17/32     05/14/93  5017032009313410
17/32     10/02/92  5017032009227610*
17/33     03/24/86  5017033008608310*
17/33     07/17/93  5017033009319810
17/33     10/02/92  5017033009227610*
17/34     03/24/86  5017034008608310*
17/34     10/02/92  5017034009227610*
17/35     05/11/92  5017035009213210
17/35     11/03/92  5017035009230810*
18/31     04/22/94  5018031009411210*
18/31     08/09/93  5018031009322110
18/32     04/22/94  5018032009411210*
18/32     08/06/92  5018032009221910*
18/33     04/19/87  5018033008710910*
18/33     08/06/92  5018033009221910*
18/34     09/29/94  5018034009427210*
18/34     11/29/93  5018034009333310*
18/35     06/06/93  5018035009315710*
18/35     10/25/92  5018035009229910*
19/35     04/23/92  5019035009211410*
19/35     09/30/92  5019035009227410*
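
For users who want to work with Table 1 programmatically, the following is a minimal sketch of reading one table entry of the form "path/row date EOSAT-ID", where a trailing asterisk marks a scene used to develop the land cover data set. The `Scene` record and `parse_scene` function are hypothetical names introduced here. The sketch assumes dates are MM/DD/YY and relies on Python's convention of mapping two-digit years 69-99 to 1969-1999; it does not attempt to interpret the digits of the EOSAT-ID.

```python
from datetime import date, datetime
from typing import NamedTuple

class Scene(NamedTuple):
    path: int        # Landsat path number
    row: int         # Landsat row number
    acquired: date   # acquisition date as listed in Table 1
    eosat_id: str    # EOSAT-ID with any trailing asterisk removed
    used: bool       # True when the entry carries an asterisk

def parse_scene(entry):
    """Parse one 'path/row date EOSAT-ID' entry from Table 1
    (hypothetical helper, illustration only)."""
    path_row, date_text, scene_id = entry.split()
    path, row = (int(part) for part in path_row.split("/"))
    acquired = datetime.strptime(date_text, "%m/%d/%y").date()
    return Scene(path, row, acquired, scene_id.rstrip("*"), scene_id.endswith("*"))

# Two entries copied from Table 1.
first = parse_scene("14/31 05/09/93 5014031009312910*")
second = parse_scene("14/32 03/17/91 5014032009107610")
```

Filtering a list of parsed entries on `used` gives the subset of scenes that contributed to the land cover data set.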