According to the EPA (2009a), model evaluation is defined as:
"The process used to generate information that will determine whether a model and its analytical results are of a sufficient quality to inform a decision."
The process of evaluation is used to address the:
soundness of the underlying science of the model
quality and quantity of available data
degree of correspondence between model output and observed conditions
appropriateness of a model for a given application
Best Modeling Practices for Model Evaluation:
All models (especially regulatory models) should be continually evaluated at all stages of their life-cycle. The Guidance on the Development, Evaluation, and Application of Environmental Models (EPA, 2009a) describes best practices for model evaluation, including peer review, quality assurance project planning, model corroboration, and sensitivity and uncertainty analysis. Without adequate model evaluation, modeling efforts risk:
Extreme and inefficient approaches to problem solving
Delays in decision making which require model support
Unintended consequences from misinformed decisions
Model evaluation provides the model development team (developers, intended users, and decision makers) with a measure of model corroboration, that is, an understanding of how consistent the model is with available data.
Model evaluation can provide answers to questions like:
Does the model reasonably approximate the system?
Is the model supported by the available data?
Is the model founded on principles of sound science?
Does the model perform the specified task?
What level of uncertainty is attributable to the data or the model?
Are better data needed for future model applications?
Evaluation Techniques
There are many techniques and approaches for model evaluation, as made evident by a review conducted by researchers in the EPA Office of Research and Development. Their efforts (Matott et al., 2009) identified 70 model evaluation tools, classified into seven thematic categories:
Data Analysis: to evaluate or summarize input, response or model output data
Identifiability Analysis: to expose inadequacies in the data or suggest improvements in the model structure
Parameter Estimation: to determine parameter values so that model output closely matches observed data
Sensitivity Analysis: to determine which inputs are most significant
Uncertainty Analysis: to quantify output uncertainty by propagating sources of uncertainty through the model
Multi-model Analysis: to evaluate model uncertainty or generate ensemble predictions via consideration of multiple plausible models
Bayesian Networks: to combine prior distributions of uncertainty with general knowledge and site-specific data to yield an updated (posterior) set of distributions
Further Insight:
Matott, L. S., J. E. Babendreier and S. T. Purucker. 2009. Evaluating uncertainty in integrated environmental models: A review of concepts and tools. Water Resour. Res. 45: W06421.
Best Practice: Graded Approach
Model evaluation should be conducted using a graded approach that is adequate and appropriate to the model application or decision at hand (EPA, 2009a).
A graded approach recognizes that model evaluation, as an iterative process, cannot be designed in a 'one-size-fits-all' manner. Further, the NRC (2007) recommends that model evaluation be tailored to the complexity and potential impacts of the model, in addition to considering the model's life-cycle stage and its evaluation history.
For example, a screening model (a type of model designed to provide a "conservative" or risk-averse answer) that is used for risk management should undergo rigorous evaluation to avoid false negatives, while still not imposing unreasonable data-generation burdens (false positives) on the regulated community.
Ideally, decision makers and modeling staff work together at the onset of new projects to identify the appropriate degree of model evaluation.
Intended Use of Model Results
Regulatory compliance
Litigation
Congressional testimony
Regulatory development
State Implementation Plan attainment
Verification of model
Trends monitoring (non-regulatory)
Proof of principle
Basic research
Bench-scale testing
A Graded Approach to Model Evaluation. Model results that have higher-consequence outcomes should be subjected to more rigorous evaluation methods.
A Model Evaluation Plan
An evaluation plan can be a valuable tool for outlining the evaluation exercises that will be carried out during the model life-cycle. The evaluation plan should be determined during the earliest stages of model development (i.e., in the required QA project plan) and should involve all members of the model development team.
Model Development Team:
Comprised of model developers, users (those who generate results and those who use the results), and decision makers; also referred to as the project team.
Recommendations of Elements to Include in an Evaluation Plan (NRC, 2007)
Describe the model and its intended uses
Describe the relationship of the model to data (for both inputs and corroboration)
Describe how model performance will be assessed
Use an outline or diagram that shows how the elements and instances of evaluation relate to the model's life cycle
Describe the factors or events that might trigger the need for major model revisions or the circumstances that might prompt users to seek an alternative model. These can be fairly broad and qualitative.
Identify the responsibilities, accountabilities, and resources needed to ensure implementation of the evaluation plan.
A Model Evaluation Plan (cont.)
The overarching goal of model evaluation is to ensure model quality. The EPA defines quality in the Information Quality Guidelines (IQGs) (EPA, 2002b). The IQGs apply to all information that EPA disseminates, including models, information from models, and input data.
The Atmospheric Modeling and Analysis Division
(AMAD Website)
The Atmospheric Modeling and Analysis Division (AMAD) of the Office of Research and Development leads the development and evaluation of predictive atmospheric models on all spatial and temporal scales for assessing changes in air quality and air pollutant exposures. For evaluation of their models, AMAD has developed a framework to describe different aspects of model evaluation:
A model evaluation framework example for the CMAQ model. Adapted from the AMAD Website.
Operational evaluation: a comparison of model-predicted and routinely measured concentrations of the end-point pollutant(s) of interest in an overall sense. (See the example below.)
Diagnostic evaluation: investigates the atmospheric processes and input drivers that affect model performance to guide model development and improvements needed in emissions and meteorological data.
Dynamic evaluation: assesses a model's air quality response to changes in meteorology or emissions, which is a principal use of an air quality model for air quality management.
Probabilistic evaluation: characterizes uncertainty of model predictions for model applications such as predicted concentration changes in response to emission reductions.
An example of operational evaluation - a fundamental first phase of any model evaluation. A scatter plot of observed versus CMAQ (an air quality model) predicted sulfate (SO4) concentrations for August 2006. (Image modified from AMAD)
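As an illustration of the kind of comparison an operational evaluation involves, the short Python sketch below computes a few common summary statistics for paired observed and model-predicted concentrations. The values and variable names are illustrative placeholders, not actual CMAQ output or monitoring data.

```python
# Minimal sketch of an operational evaluation: compare paired observed and
# model-predicted concentrations using a few summary statistics.
# The arrays below are illustrative placeholders, not actual CMAQ output.
import numpy as np

observed = np.array([1.8, 2.4, 3.1, 4.0, 5.2, 6.7, 7.9])   # ug/m^3, monitor data
predicted = np.array([2.0, 2.1, 3.5, 3.8, 5.6, 6.1, 8.4])  # ug/m^3, model output

residuals = predicted - observed
mean_bias = residuals.mean()                    # average over- or under-prediction
rmse = np.sqrt((residuals ** 2).mean())         # overall deviation
corr = np.corrcoef(observed, predicted)[0, 1]   # linear association

print(f"Mean bias: {mean_bias:.2f} ug/m^3")
print(f"RMSE:      {rmse:.2f} ug/m^3")
print(f"r:         {corr:.2f}")
```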
Quality Assurance Planning
A well-executed quality assurance project plan (QAPP) helps define how model evaluation will be performed and helps ensure that the model performs its specified task. The objectives and specifications of the model set forth in a quality assurance plan can also be subjected to peer review.
Data quality assessments are an integral component of any QA plan that includes modeling activities. Similar to peer review, data quality assessments evaluate and assure that (EPA, 2002a):
the data used by the model are of high quality
data uncertainty is minimized
the model has a foundation of sound scientific principles
The data used to parameterize and corroborate models should be assessed during all relevant stages of a modeling project. These data assessments should be both qualitative and quantitative (i.e., are there enough appropriate data?). These assessments ensure that the data sufficiently represent the system being modeled.
Additional Web Resources
The topic of model documentation (an important component of a complete QA plan) is discussed in other modules as well.
Additional information (including guidance documents) can be found at the Agency's website for the Quality System for Environmental Data and Technology.
Drivers for QA Planning
Congress has directed the Office of Management and Budget (OMB) to issue government-wide guidelines that
"...provide policy and procedural guidance to Federal agencies for ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by Federal agencies..."
EPA is dedicated to the collection, generation, and dissemination of high quality information (EPA, 2002b). The EPA states that its quality systems must include:
"...use of a systematic planning approach to develop acceptance or performance criteria for all work covered" and "assessment of existing data, when used to support Agency decisions or other secondary purposes, to verify that they are of sufficient quantity and adequate quality for their intended use."
Requiring QA plans for data collection and modeling activities is one of the EPA's major means of achieving its quality assurance goals.
The EPA's Quality System for Environmental Data and Technology is in place to manage the quality of its environmental data collection, generation, and use (including models). Guidelines provide information about how to document and conduct quality assurance planning for modeling. Specific recommendations include:
Specifications for developing assessment criteria
Assessments at various stages of the modeling process
Reports to management as feedback for corrective action
The process for acceptance, rejection, or qualification of the output for the intended use
QAPP For Modeling (cont.)
A graded approach is also practical in the development of a quality assurance project plan (QAPP) for modeling (EPA, 2002b). When models are developed or applied, the intended use of the generated results should be known. This information provides guidance for determining the appropriate level of quality assurance.
A Graded Approach: Examples of modeling projects with increasing consequence and the associated level of QA planning.
Data Quality Assessment
In some instances, the modeling project may utilize data coming from both direct and indirect measurements. A QA plan for data quality should identify:
The need for and intended use of each type of data or information to be acquired.
Requirements for how indirect measurements are to be acquired and used.
How the data will be identified or acquired, and the expected sources of these data.
The method of determining the quality of the data.
The criteria established for determining the quantity and quality of data that are acceptable.
A Modeling Caveat
The EPA recommends using the terms 'precision' and 'bias,' rather than 'accuracy,' to convey the information usually associated with accuracy (the closeness of a measured or computed value to its 'true' value).
Acceptance Criteria for Individual Data Values (EPA, 2009a)
Is the estimate of variability sufficiently small to meet the uncertainty objectives of the modeling project?
Data Quality Assessment (cont.)
Acceptance Criteria for Individual Data Values (EPA, 2009a)
Qualifiers:
Have the data met the quality assurance and data quality objectives?
Is the system of qualifying or flagging data adequately documented?
Summarization:
Is the data summarization process clear and sufficiently consistent with the goals of this project?
Processing and transformation equations should be made available so that the underlying assumptions can be evaluated against the objectives of the project.
Peer Review
The peer review process provides the main mechanism for independent evaluation and review of environmental models used by the EPA. Its purpose is two-fold:
To evaluate whether the assumptions, methods, and conclusions derived from environmental models are based on sound scientific principles
To check the scientific appropriateness of a model for informing a specific regulatory decision
Peer review can uncover technical problems, oversights, or unresolved issues in preliminary versions of the model (EPA, 2006b). Peer review can be accomplished through mechanisms such as:
Using an established peer review mechanism (e.g. Science Advisory Board, Committee on Environment and Natural Resources Research)
Holding a technical workshop
The peer review process should also be well documented. In the earliest stages of model development, the team should identify the expected evaluation events and peer review processes. During any peer review process, the review panel should be presented with charge questions to focus the review.
Peer Review Figure
A detailed diagram of the model life-cycle (EPA, 2009a), including peer review at every stage of the model life-cycle.
Best Practices (cont.)
Peer Review Elements
Critical Elements of the Peer Review Process (EPA, 1994; 2009a)
Application Niche: The set of conditions under which the use of a model is scientifically defensible. The identification of application niche is a key step during model development.
Example Charge Questions
Examples of charge questions for a peer review process are provided below. For additional information on these models, please see the citations and links provided with each example.
The Science Advisory Board Review of EPA's Second Generation Model (EPA, 2007a)
The Second Generation Model (SGM) is a computable general equilibrium model designed specifically to analyze issues related to energy, economy, and greenhouse gas emissions. These questions represent general and overarching questions charged to the peer review group. See EPA (2007a) for the full report and more specific questions.
Is the SGM appropriate and useful for answering questions on the economic effects of climate policies?
Are the model's structure and fundamental assumptions reasonable and consistent with economic theory?
The Science Advisory Board Review of the Estimation Programs Interface Suite (EPI Suite™) (EPA, 2007b)
The EPI Suite™ is a suite of physical/chemical property and environmental fate estimation models developed by the EPA's Office of Pollution Prevention and Toxics and the Syracuse Research Corporation (SRC). These questions represent general and overarching questions charged to the peer review group. See EPA (2007b) for the full report and more specific questions.
Are there additional properties that should be included in upgrades to the model for its various specified uses?
Are there places where EPI Suite™'s user guide (and other program documentation) does not clearly explain EPI's design and use? How can these be improved?
Are there aspects of the user interface that need to be corrected, redesigned, or otherwise improved?
Are there other features that could enhance convenience and overall utility for users?
Is adequate information on accuracy/validation conveyed to the user by the program documentation and/or the program itself?
Model corroboration assesses the degree to which a model corresponds to reality, using both quantitative and qualitative methods. The modelers may use a graded approach, as mentioned earlier, to determine the rigor of these assessments which should be appropriately defined for each model application.
Qualitative methods, like expert elicitation, can provide the development team with beliefs about a system's behavior in a data-poor situation. Utilizing the expert knowledge available, qualitative corroboration is achieved through consensus and consistency (EPA, 2009a).
Uncertainty:
Describes a lack of knowledge about models, parameters, constants, data, and beliefs.
Clarification on Model Evaluation, Validation and Verification
Model evaluation is defined as the process used to generate information to determine whether a model and its analytical results are of a quality sufficient to serve as the basis for a decision (EPA, 2009a).
Validated models are those that have been shown to successfully perform a specific task (model application scenario) when evaluated against site-specific field data. Model validation is essentially problem-specific, since there are few 'universal' models (Beck et al., 1994). The Guidance Document (EPA, 2009a) focuses on the processes and techniques for model evaluation rather than model validation or invalidation. Please see the Case Study sub-tab at the end of this section.
Verification is another term commonly applied to the evaluation process. However, model verification typically refers to model code. Verification is an assessment of the algorithms and numerical techniques used in the computer code to confirm that they work correctly and represent the conceptual model accurately - a process typically applied during the Development Stage (EPA, 2009a).
Oreskes, N., K. Shrader-Frechette and K. Belitz. 1994. Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. Science 263(5147): 641-646.
Quantitative Methods
Quantitative measures assess the robustness of a model - the capacity of a model to perform equally well across the full range of environmental conditions for which it was designed (EPA, 2009a). These assessments rely upon statistical measures of fit between modeled results and measured data.
Model performance measures assess how close modeled results are to measured data through deviance measures. Each method (Janssen and Heuberger, 1995) has strengths and weaknesses that should be considered when choosing an assessment measure. For example, modeling efficiency (ME) is a dimensionless statistic that directly relates model predictions to observed data, while root mean square error (RMSE) is sensitive to outliers but can accurately describe the relationship between modeled and noise-free (measured) data.
Deviance Measures Between Modeled (P) and Observed/Measured (O) Values
From Janssen and Heuberger (1995); n = number of observations, $\bar{O}$ = mean of the observed values.

Measure | Equation
Average Error | $AE = \frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)$
Modeling Efficiency | $ME = 1 - \frac{\sum_{i=1}^{n}(P_i - O_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2}$
Mean Square Error | $MSE = \frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)^2$
Mean Absolute Error | $MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert P_i - O_i\rvert$
Root Mean Square Error | $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)^2}$
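These deviance measures can be computed directly from paired arrays of modeled and observed values. The sketch below implements them from the standard forms given above; the sample values are illustrative only.

```python
# Deviance measures between modeled (P) and observed (O) values,
# implemented from the standard forms of the statistics in the table above.
import numpy as np

def average_error(P, O):
    return np.mean(P - O)

def mean_square_error(P, O):
    return np.mean((P - O) ** 2)

def mean_absolute_error(P, O):
    return np.mean(np.abs(P - O))

def root_mean_square_error(P, O):
    return np.sqrt(mean_square_error(P, O))

def modeling_efficiency(P, O):
    # 1 = perfect agreement; values <= 0 indicate the model performs no
    # better than simply predicting the observed mean.
    return 1.0 - np.sum((P - O) ** 2) / np.sum((O - np.mean(O)) ** 2)

P = np.array([2.0, 2.1, 3.5, 3.8, 5.6])  # modeled values (illustrative)
O = np.array([1.8, 2.4, 3.1, 4.0, 5.2])  # observed values (illustrative)
for name, f in [("AE", average_error), ("MSE", mean_square_error),
                ("MAE", mean_absolute_error), ("RMSE", root_mean_square_error),
                ("ME", modeling_efficiency)]:
    print(f"{name}: {f(P, O):.3f}")
```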
Graphical Methods
Simple plots of modeled against measured data can provide qualitative assessments of model performance (e.g. time series, scatter plots, quantile-quantile (Q-Q) plots, spatial concentration plots).
Plotting modeled data against measured data is a simple way to assess model performance, as depicted in the figure below. The 1:1 line represents a model that accurately predicts the measured data.
Comparisons between Measured and Modeled Data along the '1:1 Line.' This simple comparison can be useful in early stages of model evaluation as a qualitative way to assess model performance.
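A minimal sketch of such a comparison, assuming paired measured and modeled values are available as NumPy arrays (the data shown are placeholders):

```python
# Sketch of a simple graphical comparison: modeled vs. measured values with a
# 1:1 reference line. Points falling on the line indicate perfect agreement.
import numpy as np
import matplotlib.pyplot as plt

measured = np.array([1.8, 2.4, 3.1, 4.0, 5.2, 6.7, 7.9])   # illustrative data
modeled = np.array([2.0, 2.1, 3.5, 3.8, 5.6, 6.1, 8.4])

fig, ax = plt.subplots()
ax.scatter(measured, modeled, label="model vs. observation")
lims = [0, max(measured.max(), modeled.max()) * 1.1]
ax.plot(lims, lims, "k--", label="1:1 line")      # perfect-agreement reference
ax.set_xlabel("Measured concentration")
ax.set_ylabel("Modeled concentration")
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.legend()
plt.show()
```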
Model Selection
In many scenarios, there may be a number of models suited for a particular application and the project team uses both quantitative and qualitative methods of model evaluation to select the best model for their modeling application.
Ranking models on the basis of their statistical performance against measured data can aid in the process of model selection. When quantitative measures of model performance do not distinguish one model from another, model selection can shift to a more qualitative basis. Past use, public familiarity, cost or resource requirements, and availability can all be useful criteria to help determine the most suitable model.
During a model selection process, all candidate environmental models are examined to determine the most appropriate model(s) for the application.
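As a simple illustration of quantitative ranking, the sketch below scores hypothetical candidate models against the same observations using RMSE; the model names and outputs are placeholders, not real models.

```python
# Sketch of ranking candidate models against the same observations using a
# quantitative performance measure (RMSE). Candidate outputs are hypothetical.
import numpy as np

observations = np.array([1.8, 2.4, 3.1, 4.0, 5.2])
candidates = {
    "Model A": np.array([2.0, 2.1, 3.5, 3.8, 5.6]),
    "Model B": np.array([1.5, 2.9, 2.8, 4.4, 4.7]),
    "Model C": np.array([1.9, 2.3, 3.2, 4.1, 5.1]),
}

rmse = {name: np.sqrt(np.mean((pred - observations) ** 2))
        for name, pred in candidates.items()}

# Lower RMSE ranks higher; ties or near-ties would shift selection toward
# qualitative criteria (past use, familiarity, cost, availability).
for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```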
The aquatic ecosystem model AQUATOX is one of the few general ecological risk models that represents the combined environmental fate of various pollutants and their effects on the ecosystem, including fish, invertebrates, and aquatic plants (EPA, 2000).
In this example, AQUATOX was validated for use at Lake Onondaga, NY. The validation report (EPA, 2000) highlights multiple levels of validation with calibrated versions of the model.
Dissolved oxygen concentrations in the Lake Onondaga hypolimnion in 1990. AQUATOX predictions indicate anoxic conditions in mid-summer, an episode remarkably close to the observed conditions (EPA, 2000).
Sensitivity Analysis (SA)
The purpose of a sensitivity analysis (SA) can be two-fold. First, SA computes the effect of changes in model inputs on the outputs. Second, SA can be used to study how uncertainty in a model output can be systematically apportioned to different sources of uncertainty in the model input.
Sensitivity analysis is defined as the computation of the effect of changes in input values or assumptions (including boundaries and model functional form) on the outputs. In other terms: how sensitive are the results to changes in the inputs, parameters, or model assumptions?
A non-intensive sensitivity analysis can first be applied to identify the most sensitive inputs. By discovering the 'relative sensitivity' of model parameters, the model development team is then aware of the relative importance of parameters in the model and can select a subset of the inputs for more rigorous sensitivity analyses (EPA, 2009a). This also ensures that a single parameter is not overly influencing the results.
A spider diagram used to compare relative changes in model output to relative changes in the parameter values can reveal sensitivities for each parameter (Addiscott, 1993). In this example, the effects of changing parameters A, B, and C are compared to relative changes in model output. The legs represent the extent and direction of the effects of changing parameter values.
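The sketch below illustrates a simple one-at-a-time screen of the kind that underlies a spider diagram, using a hypothetical model function and parameters A, B, and C; the functional form and baseline values are stand-ins, not taken from any EPA model.

```python
# One-at-a-time (OAT) sensitivity screen: vary each parameter by a set of
# relative changes while holding the others at baseline, and record the
# relative change in model output (the quantity plotted in a spider diagram).
# The model function and parameter values are hypothetical stand-ins.
import numpy as np

def model(A, B, C):
    return A * np.exp(0.1 * B) + C ** 2   # placeholder model

baseline = {"A": 2.0, "B": 5.0, "C": 1.5}
y0 = model(**baseline)
relative_changes = [-0.2, -0.1, 0.1, 0.2]   # +/- 10% and 20% perturbations

for name in baseline:
    for delta in relative_changes:
        perturbed = dict(baseline)
        perturbed[name] = baseline[name] * (1 + delta)
        y = model(**perturbed)
        print(f"{name} {delta:+.0%} -> output change {(y - y0) / y0:+.1%}")
```

Parameters whose legs show the largest output changes would be candidates for more intensive analysis.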
Sensitivity Analysis Methods
There are many methods for sensitivity analysis (SA), a few of which were highlighted in the Guidance on the Development, Evaluation, and Application of Environmental Models (EPA, 2009a). The chosen method is dependent upon assumptions made and the amount of information needed from the analysis. Those methods are categorized into:
Screening Tools
Morris's One-at-a-Time
Differential Analysis Methods
Methods Based on Sampling
Variance Based Methods
For many of the methods it is important to consider the geometry of the response surface and potential interactions among parameters and/or input variables. Depending on the underlying assumptions of the model, it may be best to start SA with simple methods to initially identify the most sensitive inputs and then apply more intensive methods to those inputs.
Physiologically based pharmacokinetic (PBPK) models represent an important class of dosimetry models that are useful for predicting internal dose at target organs for risk assessment applications (EPA 2006a). This figure is an example of the Monte Carlo simulation method for Sensitivity Analysis. The distribution of internal concentration versus time (output) is simulated by repeatedly (often as many as 10,000 iterations) sampling input values based on the distributions of individual parameters (blood flow rate, body weight, metabolic enzymes, partition coefficients, etc.) in a population. Adapted from EPA (2006a).
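The sketch below shows the general Monte Carlo pattern described above: sample uncertain inputs from assumed distributions, evaluate the model for each sample, and summarize the resulting distribution of outputs. The steady-state calculation and the input distributions are illustrative assumptions, not an actual PBPK formulation.

```python
# Sketch of Monte Carlo propagation: repeatedly sample uncertain inputs from
# assumed distributions, evaluate the model for each sample, and summarize the
# resulting distribution of outputs. The calculation and distributions below
# are illustrative assumptions, not an actual PBPK model.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000                                               # Monte Carlo iterations

infusion_rate = 1.0                                      # mg/hr, fixed dosing rate
clearance_per_kg = rng.lognormal(np.log(0.05), 0.3, n)   # L/hr per kg (assumed distribution)
body_weight = rng.normal(70.0, 10.0, n)                  # kg (assumed distribution)

clearance = clearance_per_kg * body_weight               # L/hr for each sampled individual
c_ss = infusion_rate / clearance                         # mg/L, steady-state internal concentration

print(f"Median:          {np.median(c_ss):.3f} mg/L")
print(f"95th percentile: {np.percentile(c_ss, 95):.3f} mg/L")
```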
Further Insight
Sensitivity analysis (SA) is an important component of Model Evaluation. When combined with uncertainty analysis (UA), the contribution of input parameters to total uncertainty can be revealed. In addition, knowing which inputs to focus subsequent analyses on saves the research team valuable time.
There are many methods for SA, each coming with a set of caveats and features that can be used to select the best SA for a specific model application.
Uncertainties (i.e. a lack of knowledge) are present and inherent throughout the modeling process. However, models can continue to be valuable tools for informing decisions through proper quantification and communication of the associated uncertainties (EPA, 2009a).
Uncertainty analysis (UA) investigates the effects of lack of knowledge or potential errors on model output. When UA is conducted in combination with sensitivity analysis, the model user can become more informed about the confidence that can be placed in model results (EPA, 2009a).
Model Uncertainty (EPA, 2009a)
Application niche uncertainty – uncertainty attributed to the appropriateness of a model for use under a specific set of conditions (i.e. a model application scenario).
Structure/framework uncertainty – incomplete knowledge about factors that control the behavior of the system being modeled; limitations in spatial or temporal resolution; and simplifications of the system.
Input/data uncertainty – resulting from data measurement errors; inconsistencies between measured values and those used by the model; also includes parameter value uncertainty.
Uncertainty and Variability
Uncertainty represents lack of knowledge about something that is true. It is a general term that is often applied in a number of contexts. In environmental modeling, it may describe a lack of knowledge about models, parameters, constants, data, or the underlying assumptions.
The nature of uncertainty can be described as (Walker et al., 2003; Pascual 2005; EPA, 2009b):
Stochastic uncertainty – resulting from errors in empirical measurements or from the world's inherent stochasticity
Epistemic uncertainty – uncertainty from imperfect knowledge
Technical uncertainty – uncertainty associated with calculation errors, numerical approximations, and errors in the model algorithms
Variability vs. Uncertainty
Variability is a special instance of uncertainty - often called data uncertainty. Variability of environmental data is a product of the inherent randomness and heterogeneity of the environment.
Variability can be better characterized with further study, but it is difficult to reduce.
Separating variability and uncertainty is necessary to provide greater accountability and transparency (EPA, 1997).
Uncertainty Matrix
Many of the modeling uncertainties identified have limited room for improvement. However, there are methods for resolving model uncertainty; even some of the qualitative uncertainties can be addressed.
Uncertainty analysis begins with characterizing the associated modeling uncertainties, often accomplished using a framework or uncertainty matrix (e.g. Walker et al., 2003; Refsgaard et al., 2007). Walker et al. (2003) describe uncertainty along three dimensions:
Location: where the uncertainty manifests itself within the model complex (application niche, framework, or input uncertainty).
Level: the degree of uncertainty along the spectrum between deterministic knowledge and total ignorance.
Nature: whether the uncertainty stems from epistemic uncertainty or the inherent variability of the phenomena being described.
An example of an uncertainty matrix (Walker et al., 2003). Frameworks like this assist the model development team in identifying the sources of uncertainty and where efforts for resolving uncertainty may be best applied.
Uncertainty matrix headings (Walker et al., 2003): Location; Level (statistical uncertainty, scenario uncertainty, recognized uncertainty); Nature (epistemic uncertainty, stochastic uncertainty).
Context
Natural, technological, economic, social, and political representation
Mapping the model attributes to the problem statement
Confirming the degree of certainty needed from model outputs
Determining the amount of reliable data available or the resources available to collect more
The quality of the scientific foundations of the model
The technical competence of the model development / application team
Model Attributes:
The processes (chemical, biological, physical); variables; scale; and outputs described or contained within the model.
Further Insight
The EPA has produced a number of resources and guidance documents on uncertainty analysis that are specific to a variety of environmental modeling fields.
The purpose of this module is to explore the topic of model evaluation and identify the 'best modeling practices' and strategies for the Evaluation Stage of the model life-cycle. In summary:
Model evaluation is a process of many activities that should include:
Peer review
Quality Assurance (QA) project planning
Model corroboration
Sensitivity analysis
Uncertainty analysis
Model evaluation should be conducted using a graded approach that is adequate and appropriate to the objectives of the modeling exercise.
The peer review process provides the main mechanism for independent evaluation and review of environmental models used by the EPA.
QA project planning promotes model transparency.
There are many techniques and approaches for model corroboration. An appropriate method should be determined at the beginning of the model life-cycle.
When practiced together, sensitivity and uncertainty analyses can be used to study how uncertainty in a model output can be systematically apportioned to different sources of uncertainty in the model input.
Additional Web Resources:
SuperMUSE Website: Ecosystems Research Division's Supercomputer for Model Uncertainty and Sensitivity Evaluation (SuperMUSE) is a key to enhancing quality assurance in environmental models and applications.
Addiscott, T. M. 1993. Simulation modelling and soil behavior. Geoderma 60(1-4): 15-40.
Beck, B., L. Mulkey and T. Barnwell. 1994. Model Validation for Exposure Assessments. DRAFT. Athens, GA: US Environmental Protection Agency.
Cullen, A. C. and H. C. Frey 1999. Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. New York. Plenum Press
Hanna, S. R. 1988. Air quality model evaluation and uncertainty. Journal of the Air Pollution Control Association 38(4): 406-412.
Jakeman, A. J., R. A. Letcher and J. P. Norton 2006. Ten iterative steps in development and evaluation of environmental models. Environmental Modelling & Software 21(5): 602-614.
Janssen, P. H. M. and P. S. C. Heuberger 1995. Calibration of process-oriented models. Ecol. Model. 83(1-2): 55-66.
Matott, L. S., J. E. Babendreier and S. T. Purucker 2009. Evaluating uncertainty in integrated environmental models: A review of concepts and tools. Water Resour. Res. 45: Article Number: W06421.
NRC (National Research Council) 2007. Models in Environmental Regulatory Decision Making. Washington, DC. National Academies Press.
Pascual, P. 2005. Wresting Environmental Decisions From an Uncertain World. Environmental Law Reporter 35: 10539-10549
Refsgaard, J. C., J. P. van der Sluijs, A. L. Højberg and P. A. Vanrolleghem 2007. Uncertainty in the environmental modeling process - A framework and guidance. Environ. Model. Software 22(11): 1543-1556.
Saltelli, A., K. Chan, and M. Scott, eds. 2000. Sensitivity Analysis. New York: John Wiley and Sons.
Sunderland, E. 2008. Addressing Model Uncertainty & Best Practices for Model Evaluation. Presentation at Region 1 Regional Science Council Models Training Course. October 23, 2008.
Walker, W. E., P. Harremoës, J. Rotmans, J. P. van der Sluijs, M. B. A. van Asselt, P. Janssen and M. P. Krayer von Krauss 2003. Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision Support. Integrated Assessment 4(1): 5-17.