Document Type: Research Paper
Authors
1 PhD student, Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 Associate Professor, Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
3 Associate Professor, Department of Range and Watershed Management, Urmia University, Urmia, Iran
4 Professor, Department of Water Engineering, Center of Excellence in Hydroinformatics, Faculty of Civil Engineering, University of Tabriz, and Farazab Co. (Consulting Engineers), Research and Writing Capacity Enhancement Affairs, Tabriz, Iran
Abstract
Introduction
Accurate streamflow prediction is essential for water resources management and flood control. Because streamflow behaves in a complex, nonlinear way, traditional models often fail to represent it adequately. Machine learning and deep learning algorithms offer more robust alternatives, but their accuracy can degrade under sudden climatic fluctuations. The literature review shows that, despite the strong capabilities of machine learning models, a research gap remains in handling multi-scale fluctuations in streamflow data, which motivates hybrid approaches for improving prediction accuracy. The novelty of this study is a hybrid framework that models long-term patterns and short-term fluctuations simultaneously by coupling wavelet analysis, used to decompose the streamflow signal, with machine learning (Random Forest) and deep learning (LSTM) models.
Materials and methods
In this study, the streamflow of the Kurkursar River in Nowshahr was predicted using 20 years of daily hydrological data, namely precipitation and river discharge. The input variables were daily precipitation (Pt) and streamflow lagged by one, two, and three days (Qt−1, Qt−2, Qt−3). Before modeling, the data were preprocessed by reconstructing missing values, removing anomalous values (outliers), and normalizing the series, to improve data quality and their reliability for hydrological analysis. The data were divided into three subsets: training (70%), validation (15%), and testing (15%). Four streamflow prediction scenarios were selected on the basis of Pearson correlation analysis, which was used to identify sensitive variables and determine the model inputs.
Streamflow was modeled with two algorithms: Random Forest (RF) and the Long Short-Term Memory (LSTM) deep recurrent neural network. To improve the accuracy and generalizability of the models, several wavelet transforms, namely Daubechies 4 (Db4), Haar, and Mexican Hat, were used to extract multi-scale features, which were combined with the original inputs of the RF and LSTM models. This hybrid approach helped identify complex spatio-temporal patterns in the hydrological time series. After the performance evaluation, the Daubechies 4 (Db4) wavelet, which performed best, was adopted for the final hybrid models and used to optimize their coefficients and structural parameters. The accuracy of the predictions was assessed with the Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Percent Bias (PBIAS), Mean Absolute Percentage Error (MAPE), and Kling-Gupta Efficiency (KGE).
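The input construction, multi-scale feature extraction, chronological split, and normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It uses a causal, undecimated ("à trous") Haar decomposition, the simplest of the three wavelets named, written in plain Python; a Db4 decomposition would normally come from a wavelet library (for example `pywt.wavedec(x, "db4", level=2)` in PyWavelets), and the Mexican Hat, being a continuous wavelet, would be applied through a continuous wavelet transform instead. The function names, the two decomposition levels, and the toy series are hypothetical. Wavelet coefficients for row t are taken at t-1, so no value at or after the target time enters the inputs.

```python
import math

def haar_a_trous(x, levels=2):
    """Causal undecimated ("a trous") Haar decomposition of a series.

    Returns (details, approx): details[j][t] is the level-(j + 1) detail and
    approx[t] the level-`levels` approximation. Each coefficient at time t uses
    only x[t] and earlier values, and approx[t] + sum(d[t] for d in details)
    reconstructs x[t] up to floating-point rounding.
    """
    approx = list(x)
    details = []
    for j in range(levels):
        step = 2 ** j  # the filter dilation doubles at each level
        smooth = [(approx[t] + approx[t - step]) / 2 if t >= step else approx[t]
                  for t in range(len(approx))]
        details.append([a - s for a, s in zip(approx, smooth)])
        approx = smooth
    return details, approx

def build_inputs(precip, flow, max_lag=3, levels=2):
    """Rows (P_t, Q_{t-1}, ..., Q_{t-max_lag}, wavelet coefficients of Q at t-1); target Q_t."""
    details, approx = haar_a_trous(flow, levels)
    X, y = [], []
    for t in range(max_lag, len(flow)):
        row = [precip[t]] + [flow[t - k] for k in range(1, max_lag + 1)]
        row += [d[t - 1] for d in details] + [approx[t - 1]]  # known at forecast time
        X.append(row)
        y.append(flow[t])
    return X, y

def pearson(a, b):
    """Pearson correlation, as used to screen candidate inputs against Q_t."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def chronological_split(rows, train=0.70, val=0.15):
    """70/15/15 split that preserves time order (no shuffling)."""
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def min_max_scaler(train_values):
    """Fit min-max scaling on training values only; returns a function applying it."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return lambda v: (v - lo) / span

# Toy series for illustration only (not data from the study).
precip = [0, 5, 12, 3, 0, 0, 8, 20, 4, 1, 0, 0]
flow = [1.0, 1.2, 2.5, 2.0, 1.6, 1.4, 1.9, 3.8, 3.0, 2.2, 1.8, 1.5]
X, y = build_inputs(precip, flow)
X_train, X_val, X_test = chronological_split(X)
scale_q1 = min_max_scaler([row[1] for row in X_train])  # Q_{t-1} column, training rows only
print(len(X), len(X_train), len(X_val), len(X_test))  # 9 6 1 2
print(round(pearson([row[1] for row in X], y), 3))
```

A causal decomposition is used deliberately: a standard decimated transform applied to the whole record before splitting would let future values leak into the training features.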
Finally, the optimal models were selected by comparing these quantitative criteria. The results were analyzed and presented with scatter plots, time series of observed and predicted values, and error-distribution plots, including error histograms, fitted normal density curves, cumulative distribution functions of errors, and quantile-quantile (Q-Q) plots.
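The six evaluation criteria can be computed as in the sketch below. The text does not give formulas, so several conventions here are assumptions: R² is taken as the squared Pearson correlation (some studies use 1 - SSE/SST instead), PBIAS uses 100 × Σ(obs - sim) / Σ(obs) so that positive values indicate underestimation, MAPE assumes no zero observations, and KGE uses its correlation, variability-ratio, and bias-ratio components. The function name `evaluate` is hypothetical.

```python
import math

def evaluate(obs, sim):
    """Compute R2, MAE, RMSE, PBIAS, MAPE and KGE for paired observed/predicted values."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    err = [s - o for o, s in zip(obs, sim)]
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)  # std of observations
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)  # std of predictions
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * so * ss)
    return {
        "R2": r ** 2,  # assumption: squared Pearson correlation
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "PBIAS": 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),  # + = underestimation
        "MAPE": 100 * sum(abs(e / o) for o, e in zip(obs, err)) / n,  # obs must be non-zero
        # KGE penalises deviations of correlation, variability ratio and bias ratio from 1
        "KGE": 1 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2),
    }

print(evaluate([1, 2, 3, 4], [2, 3, 4, 5]))
```

The usage line shows why several criteria are reported together: a prediction that is uniformly one unit too high still scores R² = 1, while PBIAS (-40%) and KGE (0.6) expose the bias.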
Results and discussion
The results showed that previous time steps (lagged streamflow values) were the most important predictors for all subsequent prediction horizons. Among the modeling scenarios, the first scenario (S1), which used only precipitation, performed worst in all cases, whereas the sixth scenario (S6), which used all available variables (Pt, Qt−1, Qt−2, Qt−3), performed best in both the training and testing phases for both standalone and hybrid models. The hybrid Random Forest-Wavelet (RF-Wavelet) model achieved the best performance in both the training (R² = 0.907, RMSE = 0.0192) and testing (R² = 0.942, RMSE = 0.0106) phases, while the standalone Long Short-Term Memory (LSTM) deep learning model performed worst in both training (R² = 0.499, RMSE = 1.6) and testing (R² = 0.579, RMSE = 1.149). Combining the Daubechies 4 wavelet with the Random Forest model reduced the error of the standalone RF model by approximately 55%, and combining it with the LSTM model increased prediction accuracy by approximately 39%. Among the wavelet-hybrid models, RF-Wavelet reduced the error by approximately 23% relative to LSTM-Wavelet.
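The percentage improvements above are relative changes. The text does not state which error measure they are computed on; assuming a single error measure E (for example RMSE), the relative reduction of a hybrid model over a reference model is:

```latex
\Delta_{E} = \frac{E_{\text{reference}} - E_{\text{hybrid}}}{E_{\text{reference}}} \times 100\%,
\qquad
\Delta_{E} = 55\% \;\Longleftrightarrow\; E_{\text{RF-Wavelet}} = 0.45\, E_{\text{RF}}
```

Under the same assumption, the 23% comparison between hybrids corresponds to E_RF-Wavelet ≈ 0.77 E_LSTM-Wavelet.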
Conclusion
In this research, several wavelet types, namely Daubechies 4, Haar, and Mexican Hat, were integrated with the RF and LSTM algorithms. Quantitative and qualitative analyses showed that the Daubechies 4 wavelet improved streamflow prediction accuracy significantly more than the other wavelet types within both the RF and LSTM frameworks; it was therefore selected as the primary basis for integration with these two prediction models. In the training data, most errors were concentrated near zero, and their distribution was approximately symmetric and closely consistent with a normal distribution, indicating that the model fitted the training data satisfactorily. Overall, this study developed data-driven models to determine the optimal combination of predictor variables for modeling and predicting river streamflow. It demonstrated that integrating the Daubechies 4 wavelet transform with the Random Forest (RF) model was the optimal approach for predicting streamflow in the present case study. This hybrid model reduced prediction error by up to 55% relative to the standalone models and clearly outperformed the more complex deep learning models, including LSTM and its hybrid combinations. This result highlights the importance of extracting multi-scale time-frequency features with the wavelet transform and its central role in improving the accuracy and generalizability of streamflow predictions, even compared with advanced deep temporal architectures.
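The error-distribution findings above (errors concentrated near zero, roughly symmetric, close to normal) correspond to a few simple checks, sketched below with the standard library only. The function name and the `near_zero` band are hypothetical choices; the Q-Q coordinates use plotting positions (i + 0.5)/n, one common convention among several.

```python
from statistics import NormalDist, mean, pstdev

def residual_summary(obs, sim, near_zero=0.05):
    """Summarise residuals e = sim - obs: centre, symmetry, closeness to zero, Q-Q points.

    `near_zero` is a hypothetical band, in the units of the (normalized) data.
    """
    err = [s - o for o, s in zip(obs, sim)]
    n = len(err)
    mu, sd = mean(err), pstdev(err)
    if sd == 0:
        raise ValueError("residuals are constant; skewness and Q-Q points are undefined")
    z = sorted((e - mu) / sd for e in err)  # standardised residuals, ascending
    # Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return {
        "mean": mu,
        "skewness": sum(v ** 3 for v in z) / n,  # near 0 for a symmetric distribution
        "share_near_zero": sum(abs(e) <= near_zero for e in err) / n,
        "qq_points": list(zip(theo, z)),  # points near the 1:1 line suggest normality
    }

summary = residual_summary([0.0] * 5, [-0.2, -0.1, 0.0, 0.1, 0.2])
print(summary["mean"], summary["share_near_zero"])
```

Plotting `qq_points` gives the Q-Q plot mentioned in the methods; the skewness value quantifies the symmetry that the plot shows visually.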
Keywords