Modelling geosmin concentrations in three sources of raw water in Quebec, Canada


  • Parinet Julien
  • Rodriguez Manuel J.
  • Serodes Jean-Baptiste

Off-flavour compounds such as geosmin, often found in raw water, significantly reduce the organoleptic quality of distributed water and discourage consumers from using it. To adapt water treatment processes to eliminate these compounds, it must be possible to identify them quickly. Routine analysis could be considered a solution, but it is expensive, and the delays in obtaining analytical results are often long, which is a serious disadvantage. The development of decision-making tools such as predictive models appears to be an economical and feasible way to offset the limitations of analytical methods. Among these tools, multi-linear regression and principal component regression are easy to implement. However, due to certain disadvantages inherent in these methods (related to multicollinearity or to non-linearity of the processes), emerging models based on artificial neural networks, such as the multi-layer perceptron, could prove to be an interesting alternative. In a previous paper (Parinet et al., Water Res 44: 5847-5856, 2010), the possible parameters that affect the variability of taste and odour compounds were investigated using principal component analysis. In the present study, we expand the research by comparing the performance of three tools (multi-linear regression, principal component regression and multi-layer perceptron) under different modelling scenarios to model geosmin in drinking water sources using 38 microbiological and physicochemical parameters. Three sources of water that differ greatly in quality were selected for the study. These sources supply drinking water to the Québec City area (Canada) and its vicinity, and were monitored three times per month over a 1-year period. Seven different models were tested for predicting geosmin in these sources.
The comparison of the seven models showed that simple models based on multi-linear regression provide sufficient predictive capacity, with performance levels comparable to those obtained with artificial neural networks. The multi-linear regression model (R² = 0.657, p < 0.001) used only four variables (phaeophytin, sum of green algae, chlorophyll-a and redox potential), compared with ten variables (potassium, heterotrophic bacteria, organic nitrogen, total nitrogen, phaeophytin, total organic carbon, sum of green algae, redox potential, UV absorbance at 254 nm and atypical bacteria) for the best model obtained with artificial neural networks (R² = 0.843).
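
As a rough illustration of how two of the compared tools relate, the sketch below fits a multi-linear regression and a principal component regression with NumPy and scores both with the coefficient of determination. It is not the authors' code or data: the predictors and response are synthetic, and the sample size, number of predictors, number of retained components and all function names (`fit_mlr`, `fit_pcr`, `r_squared` and so on) are assumptions chosen only for the example. The multi-layer perceptron is left out to keep the sketch short.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against observations y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def fit_pcr(X, y, n_components):
    """Standardise X, project onto the leading principal components, then regress."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardised predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    coef = fit_mlr(Z @ V, y)
    return mean, std, V, coef

def predict_pcr(model, X):
    mean, std, V, coef = model
    return predict_mlr(coef, ((X - mean) / std) @ V)

# Synthetic stand-in data (assumption): 36 samples, 6 hypothetical water-quality
# parameters, two of which are nearly collinear.
rng = np.random.default_rng(0)
n_samples, n_params = 36, 6
X = rng.normal(size=(n_samples, n_params))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n_samples)
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.5 * rng.normal(size=n_samples)

mlr = fit_mlr(X, y)
pcr = fit_pcr(X, y, n_components=3)
r2_mlr = r_squared(y, predict_mlr(mlr, X))
r2_pcr = r_squared(y, predict_pcr(pcr, X))
print(f"MLR R^2 = {r2_mlr:.3f}, PCR (3 components) R^2 = {r2_pcr:.3f}")
```

On the data used for fitting, a principal component regression that keeps only some of the components cannot exceed the R² of the full multi-linear regression; what it offers instead is a more stable fit when predictors are strongly correlated. A fair comparison of predictive capacity would score the models on held-out samples.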
