The scatter diagram is a graph (Figure 3) where predicted values are plotted versus measured ones. On this diagram the y=x line represents the perfect agreement between predictions and reference values. A value above (below) the y=x line indicates a situation of over-prediction (under-prediction).
Suppose that N pairs (Mi,Pi) are plotted in a scatter diagram. If $N_{(P_i > M_i)}$ is the number of over-predictions, i.e. the number of pairs where Pi > Mi, FOEX is defined as:
$$\mathrm{FOEX} = \left[ \frac{N_{(P_i > M_i)}}{N} - 0.5 \right] \times 100$$
FOEX ranges between -50% and +50%; a FOEX equal to -50% means that all the points are below the y=x line, i.e. all the values are under-predicted; a FOEX equal to +50% means that all the points are above the y=x line, i.e. all the values are over-predicted. The best value is 0%, meaning that there are half under-predictions and half over-predictions.
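A minimal sketch of the FOEX calculation follows, assuming the index is the fraction of over-predictions minus one half, expressed in percent, which is the form consistent with the range of -50% to +50% stated above. The function name `foex` and the sample values are illustrative only. Pairs with Pi = Mi are not counted as over-predictions here, which is an assumption about how ties are treated.

```python
def foex(measured, predicted):
    """Return FOEX in percent for paired measured/predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    # Count strict over-predictions, i.e. pairs with P_i > M_i.
    # Ties (P_i == M_i) are not counted: this is an assumption.
    n_over = sum(1 for m, p in zip(measured, predicted) if p > m)
    return (n_over / n - 0.5) * 100.0


# Illustrative values only, not ATMES II data.
measured = [1.0, 2.0, 3.0, 4.0]
print(foex(measured, [2.0, 3.0, 4.0, 5.0]))  # all over-predicted  -> 50.0
print(foex(measured, [0.5, 1.0, 2.0, 3.0]))  # all under-predicted -> -50.0
print(foex(measured, [2.0, 3.0, 2.0, 3.0]))  # half and half       -> 0.0
```

The three calls reproduce the two extreme cases and the best case described in the text.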
In the scatter diagram, the 'FAα' band is the region between the two lines of equation:
$$y = y_0 + \alpha \, (x - x_0) \quad \text{and} \quad y = y_0 + \frac{1}{\alpha} \, (x - x_0)$$
where x0 and y0 are the co-ordinates of the origin of the axes.
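As a rough sketch, the coefficient associated with this band can be computed as the percentage of pairs falling inside it. The code below assumes that the band is bounded by two straight lines through the origin with slopes α and 1/α, that the origin is at (0,0), and that all values are non-negative; the function name `fa` and the sample values are illustrative. Under these assumptions a pair (0,0) lies inside the band, whereas a pair with only one zero value lies outside it.

```python
def fa(measured, predicted, alpha):
    """Percentage of pairs (M_i, P_i) lying within a factor alpha of each other."""
    if alpha < 1:
        raise ValueError("alpha must be >= 1")
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # A pair is inside the band when M/alpha <= P <= alpha*M, written
    # without division so that zero values need no special handling.
    inside = sum(
        1
        for m, p in zip(measured, predicted)
        if p <= alpha * m and m <= alpha * p
    )
    return 100.0 * inside / len(measured)


# Illustrative values only, not ATMES II data.
measured = [1.0, 1.0, 1.0, 1.0, 0.0]
predicted = [1.5, 3.0, 0.1, 0.4, 0.0]
print(fa(measured, predicted, 2))  # -> 40.0
print(fa(measured, predicted, 5))  # -> 80.0
```

Writing the test as two multiplications rather than a ratio P/M avoids a division by zero for zero measurements.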
The scatter diagram, FOEX and FAα can be applied to any kind of data, for example concentrations, percentiles of the distribution, or integrated concentrations. In ATMES II a global scatter diagram is plotted for each model considering all the concentration pairs (Mi,Pi) at any time all over the domain, and the FA2 and FA5 coefficients are evaluated.
Figure 3. Global scatter diagram of concentration values in logarithmic scale for one of the models participating in ATMES II. The continuous line is the y=x line; the dashed and dotted lines identify the FA2 and FA5 bands, respectively.
It must be pointed out that the FOEX index does not take into account the magnitude of the over-prediction; it evaluates only the number of over-prediction events. However, a quantitative estimate can be obtained by coupling FOEX with several FAα values and by examining the diagram. For example, a 'good' performance will be characterised by a close-to-zero value of FOEX and a high value of FA2; a high value of FA5 and a high positive value of FOEX correspond to a model that slightly over-predicts measurements in many events. Nonetheless, if a more detailed evaluation is required, these indexes can be combined with the information derived from the calculation of bias, geometric mean bias, NMSE and geometric mean variance, and from the comparison of measured and predicted percentiles (see next paragraphs).
The pairs (Mi,Pi) used to generate the scatter diagrams of the ATMES II analysis were restricted to those having a non-zero value of measured concentration Mi, or having a zero value of Mi but occurring, at each station, not earlier than two time intervals (6 hours) before the arrival of the cloud, and not later than two time intervals (6 hours) after the departure of the cloud.
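The selection rule can be sketched for the time series of a single station as follows. This is only an illustration under stated assumptions: the arrival and departure of the cloud are taken to be the first and last non-zero measurements, the series is regularly sampled so that two time intervals correspond to two list positions, and a station that never records a non-zero value contributes no pairs. The function name `select_indices` and the parameter `margin` are hypothetical.

```python
def select_indices(measured, margin=2):
    """Indices of one station's time series kept for the scatter diagram.

    Keeps everything from `margin` intervals before the first non-zero
    measurement to `margin` intervals after the last one.
    """
    nonzero = [i for i, m in enumerate(measured) if m > 0]
    if not nonzero:
        # Assumption: a station never reached by the cloud is dropped.
        return []
    first = max(nonzero[0] - margin, 0)
    last = min(nonzero[-1] + margin, len(measured) - 1)
    return list(range(first, last + 1))


# Illustrative series only, not ATMES II data.
series = [0, 0, 0, 0, 1.2, 0, 0.8, 0, 0, 0, 0]
print(select_indices(series))  # -> [2, 3, 4, 5, 6, 7, 8]
```

The same indices would then be used to pick the corresponding predictions Pi of every model, so that the selection depends on the measurements only.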
This filtering results in 34% of the selected measurements being equal to zero. For a hypothetical model predicting zero concentration everywhere, at any time, the FA2 coefficient would thus be equal to 34%. The zero measurements therefore influence the values of the FAα coefficients, sometimes attributing higher values to models with worse performance, and it is very important to take into account the weight of zero measured values when looking at FAα factors. The evaluation must always rely on the scatter diagram as well, and on an evaluation of all the FAα: if a relatively high FA2 value for a model is mainly due to the pairs (0,0), the FA5 value will not improve significantly.
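The effect of the zero measurements can be illustrated with made-up numbers. The sketch below assumes that a pair counts as inside the band when the two values are within a factor α of each other, so that (0,0) pairs count as agreement; the helper `fa_percent` and both synthetic models are hypothetical and only the 34% share of zeros is taken from the text.

```python
def fa_percent(measured, predicted, alpha):
    """Percentage of pairs within a factor alpha; (0, 0) counts as inside."""
    inside = sum(
        1
        for m, p in zip(measured, predicted)
        if p <= alpha * m and m <= alpha * p
    )
    return 100.0 * inside / len(measured)


# Synthetic data: 34 zero measurements and 66 non-zero ones.
measured = [0.0] * 34 + [1.0] * 66

# Model A predicts zero everywhere and misses the cloud completely.
model_a = [0.0] * 100
# Model B over-predicts every non-zero measurement by a factor of 3
# and gives a small non-zero value where zero was measured.
model_b = [0.1] * 34 + [3.0] * 66

print(fa_percent(measured, model_a, 2), fa_percent(measured, model_a, 5))  # -> 34.0 34.0
print(fa_percent(measured, model_b, 2), fa_percent(measured, model_b, 5))  # -> 0.0 66.0
```

Model A obtains the higher FA2 only through the (0,0) pairs, and its FA5 does not improve, while the FA5 of model B does; this is why the coefficients should be read together and alongside the scatter diagram.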
This filtering is particularly important for a long-range concentration field experiment, where a substance with a low background is emitted by a single source. Many zero concentrations (i.e. not above background) are expected to be measured at the beginning of the release, especially far from the source, as well as close to the source long after the end of the release. Also, some areas might never be reached by the cloud. If global statistics included all the data, zero measurements would artificially improve the performance of a model. In particular, they would favour models which underestimate the extension in space and/or in time of the pollution episode over models which overestimate it. Conversely, using only non-zero measurements would favour over-predicting models.
The data selection criterion employed in the ATMES II global analysis may also favour some models over others. The choice of two zeros before and after (that is, six hours before and after) the non-zero concentration time series at each receptor may be questioned and investigated. However, it seemed to be the only criterion that corresponds to the purpose of comparing a model prediction with the results of a long-range experiment.
Another approach might consist in including all the points where either the measurements or any model give a non-zero concentration. However, this criterion was rejected from the beginning because it depends on the models, while the criterion selected for ATMES II depends only on measurements and can thus be adopted by anyone who wants to evaluate a model on the same basis as was done in this ATMES II exercise.