Applying wavelet-based machine learning and deep learning algorithms for streamflow prediction of the Kurkursar River

Merufinia, Edris; Sharafati, Ahmad; Abghari, Hirad; Hassanzadeh, Yousef

doi:10.22092/ijwmse.2025.369325.2115

Document Type: Research Paper

Authors

¹ PhD student, Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

² Associate Professor, Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

³ Associate Professor, Department of Range and Watershed Management, Urmia University, Urmia, Iran

⁴ Professor, Department of Water Engineering, Center of Excellence in Hydroinformatics, Faculty of Civil Engineering, University of Tabriz, and Farazab Co. (Consulting Engineers), Research and Writing Capacity Enhancement Affairs, Tabriz, Iran

Abstract

Introduction
Accurate streamflow prediction is essential for water resources management and flood control. Because streamflow behaves in a complex, nonlinear way, traditional models are often inadequate. Machine learning and deep learning algorithms offer more robust solutions, but their accuracy can degrade under sudden climatic fluctuations. The literature review shows that, despite the high capability of machine learning models, a research gap remains in handling multi-scale fluctuations in streamflow data, which motivates the use of hybrid methods to improve prediction accuracy. The innovation of this study is a hybrid framework that models long-term patterns and short-term fluctuations simultaneously by integrating wavelet analysis, used to decompose the streamflow signal, with a powerful deep learning model.

Materials and methods
To predict the streamflow of the Kurkursar River in Nowshahr, daily precipitation and river discharge records covering a 20-year period were used. The input variables were daily precipitation (P_t) and streamflow lagged by one, two, and three days (Q_t−1, Q_t−2, Q_t−3). Before modeling, the data were preprocessed by reconstructing missing values, removing anomalous values (outliers), and normalizing, in order to improve data quality and reliability in the hydrological analyses. The hydrological data from the watershed were then divided into three subsets: training (70%), validation (15%), and testing (15%). Four streamflow prediction scenarios were selected based on Pearson correlation coefficient analysis, which was used to identify sensitive variables and determine the model inputs. Streamflow was modeled with two algorithms: Random Forest (RF) and the deep learning Long Short-Term Memory (LSTM) recurrent neural network. To improve the accuracy and generalizability of the models, several wavelet transforms, namely the Daubechies 4 (Db4), Haar, and Mexican Hat wavelets, were used to extract multi-scale features, which were combined with the input data for the RF and LSTM models. This hybrid approach facilitated the identification of complex spatio-temporal patterns in the hydrological time series. After the final evaluation of the prediction models' performance, the Daubechies 4 (Db4) wavelet transform was employed to optimize their coefficients and structural parameters. The accuracy of the models' predictions was assessed with the Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Percent Bias (PBIAS), Mean Absolute Percentage Error (MAPE), and Kling-Gupta Efficiency (KGE).
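The input construction, data split, and wavelet decomposition described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, the split is assumed to be chronological (the abstract does not say how the subsets were drawn), and the single-level Haar transform is shown only because it is the simplest of the three wavelets named. A Db4 decomposition would normally rely on a wavelet library such as PyWavelets.

```python
import math

def make_lagged_samples(precip, flow, lags=(1, 2, 3)):
    """Build rows [P_t, Q_{t-1}, Q_{t-2}, Q_{t-3}] with target Q_t."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(flow)):
        X.append([precip[t]] + [flow[t - k] for k in lags])
        y.append(flow[t])
    return X, y

def chronological_split(X, y, train=0.70, val=0.15):
    """Split in time order (assumption: no shuffling) into train/val/test."""
    n = len(X)
    i = round(n * train)
    j = round(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last value
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail
```

The approximation coefficients carry the smoother, longer-term component of the series and the detail coefficients carry the short-term fluctuations, which is the multi-scale separation the hybrid models exploit.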
The optimal models were then selected by comparing these quantitative criteria. For data analysis and visual presentation of the results, several plots were used: scatter plots, time series of observed and predicted data, and error-distribution plots such as error histograms, normal density curves, cumulative distribution functions of errors, and quantile-quantile (Q-Q) plots.
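For reference, the evaluation criteria named in this section can be written out as below. This is a sketch under stated assumptions rather than the study's own code: R² is taken as the squared Pearson correlation, PBIAS uses the sign convention in which positive values mean overestimation (some hydrology texts use the opposite sign), MAPE skips zero observations, and KGE follows the standard three-component form based on correlation, variability ratio, and bias ratio.

```python
import math
from statistics import mean, pstdev

def rmse(obs, sim):
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))

def mae(obs, sim):
    return mean(abs(s - o) for o, s in zip(obs, sim))

def mape(obs, sim):
    # Assumption: zero observations are skipped, since the ratio is undefined there.
    return 100.0 * mean(abs((s - o) / o) for o, s in zip(obs, sim) if o != 0)

def pbias(obs, sim):
    # Assumption: positive PBIAS means the model overestimates on average.
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

def pearson_r(obs, sim):
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)

def r2(obs, sim):
    # Assumption: R² defined as the squared Pearson correlation coefficient.
    return pearson_r(obs, sim) ** 2

def kge(obs, sim):
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)   # variability ratio
    beta = mean(sim) / mean(obs)        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A perfect prediction gives RMSE, MAE, MAPE, and PBIAS of 0 and R² and KGE of 1, so models can be ranked by how close each criterion is to its ideal value.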

Results and discussion
The results showed that streamflow at previous time steps (the lagged values) was the most important predictor at all forecast horizons. Among the model scenarios, the first scenario (S1), which used only the precipitation variable, performed worst in all cases, whereas the sixth scenario (S6), which used all available variables (P_t, Q_t−1, Q_t−2, Q_t−3), performed best in both the training and testing phases for standalone and hybrid models alike. The hybrid Random Forest-Wavelet (RF-Wavelet) model had the best performance in both the training (R²=0.907, RMSE=0.0192) and testing (R²=0.942, RMSE=0.0106) phases, while the standalone Long Short-Term Memory (LSTM) deep learning model had the weakest performance in the training (R²=0.499, RMSE=1.6) and testing (R²=0.579, RMSE=1.149) phases. Combining the Daubechies 4 wavelet with the Random Forest model reduced the error of the standalone RF model by approximately 55%, and combining the wavelet with the LSTM model increased prediction accuracy by approximately 39%. A comparison of the wavelet-hybrid models showed that the RF-Wavelet model reduced the error by approximately 23% relative to the hybrid LSTM-Wavelet model.
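The percentage improvements quoted above are presumably relative reductions of an error metric. The abstract does not state which metric or which phase the percentages refer to, so the form below, written for RMSE, is an assumption about how such a figure is usually computed:

```latex
\[
\Delta_{\%} \;=\; 100 \times
\frac{\mathrm{RMSE}_{\text{reference}} - \mathrm{RMSE}_{\text{hybrid}}}
     {\mathrm{RMSE}_{\text{reference}}}
\]
```

Here the reference model is the one being compared against (for example the standalone RF model, or the LSTM-Wavelet model in the last comparison), and a positive value means the hybrid model has the lower error.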

Conclusion
In this research, several wavelet transforms, namely Daubechies 4, Haar, and Mexican Hat, were integrated with the RF and LSTM algorithms. Quantitative and qualitative analyses showed that the Daubechies 4 wavelet transform was clearly superior to the other wavelet types in improving streamflow prediction accuracy within both the RF and LSTM frameworks, so it was selected as the primary basis for integration with these two prediction models. The error distribution for the training data was concentrated near zero, approximately symmetrical, and largely consistent with a normal distribution, which indicates satisfactory accuracy in the training and data-fitting process. The study focused on developing data-driven models and on determining the optimal combination of predictor variables for modeling and predicting river streamflow. It demonstrated that integrating the Daubechies 4 wavelet transform with the Random Forest (RF) model was the best approach for predicting streamflow in the present case study. In addition to reducing prediction error by up to 55% compared to standalone models, this hybrid model showed notable superiority over complex deep learning models, including LSTM and its associated hybrid combinations. This result highlights the importance of extracting multi-scale time-frequency features with the wavelet transform and its pivotal role in improving the accuracy and generalizability of streamflow predictions, even in comparison with advanced deep temporal architectures.
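The error-distribution check mentioned above (errors concentrated near zero, roughly symmetric, close to normal) can be quantified with a few summary statistics. The sketch below is illustrative only: the function name, the `band` parameter, and the choice of moment-based skewness and excess kurtosis are assumptions, not the diagnostics reported in the study.

```python
import math
from statistics import mean

def residual_summary(obs, sim, band=0.1):
    """Summarize prediction errors e = sim - obs."""
    errors = [s - o for o, s in zip(obs, sim)]
    n = len(errors)
    mu = mean(errors)
    sd = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    # Moment-based skewness and excess kurtosis; both are 0 for a normal distribution.
    skew = sum((e - mu) ** 3 for e in errors) / (n * sd ** 3) if sd > 0 else 0.0
    kurt = sum((e - mu) ** 4 for e in errors) / (n * sd ** 4) - 3.0 if sd > 0 else 0.0
    # Fraction of errors whose magnitude is at most `band` (hypothetical threshold).
    near_zero = sum(abs(e) <= band for e in errors) / n
    return {
        "mean": mu,
        "std": sd,
        "skewness": skew,
        "excess_kurtosis": kurt,
        "share_within_band": near_zero,
    }
```

A mean and skewness near zero, together with a high share of errors inside a narrow band, would be consistent with the pattern described for the training data. A Q-Q plot remains the more informative visual check.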

Watershed Engineering and Management

Volume 17, Issue 4
January 2026
Pages 424-447
