This study explores the utility of a widely adopted deep learning technique for predicting daily groundwater levels (GWL) at different horizons. GWL measurements from the state of Maine (ME), in the northeastern US, were used. The partial autocorrelation function (PACF) was first applied to the original dataset to identify the input variables, and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) was then employed to extract the sub-signals of the original time series. A long short-term memory (LSTM) network served as the deep learning algorithm, and predictions were made for two lead times: 1 day ahead and 15 days ahead. The proposed hybrid model was benchmarked against two boosted tree-based algorithms, eXtreme Gradient Boosting (XGBoost) and Adaptive Boosting (Adaboost), each also hybridized with CEEMDAN. The PACF analysis identified the 1-month, 2-month, and 3-month lag times as the input variables for predicting GWL fluctuations. Decomposing the original time series with CEEMDAN yielded 12 sub-signals: 11 intrinsic mode functions (IMFs) and a residual series. Each ML algorithm was therefore applied to every extracted signal, and the component predictions were summed for comparison against the measured GWL in the pre-defined testing set. Overall, the CEEMDAN-LSTM model outperformed its counterparts at both lead times (denoted t+1 and t+15) with respect to several performance indicators: the Nash-Sutcliffe efficiency (NSE) index, the determination coefficient (R2), and the root mean square error (RMSE).
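The decompose, predict, and sum scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a simple moving-average split stands in for CEEMDAN, a least-squares linear model on lagged values stands in for the LSTM, and the function names, window lengths, and synthetic series are all hypothetical.

```python
import numpy as np

def split_additive(series, windows=(5, 25)):
    """Stand-in for CEEMDAN (assumption): peel off moving-average trends so
    that the returned components sum back to the input series."""
    components = []
    remainder = np.asarray(series, dtype=float)
    for w in windows:
        smooth = np.convolve(remainder, np.ones(w) / w, mode="same")
        components.append(remainder - smooth)  # faster oscillation
        remainder = smooth
    components.append(remainder)  # residual (slow trend)
    return components

def lagged_matrix(x, n_lags, horizon):
    """Inputs are the last n_lags values up to t; target is x[t + horizon]."""
    rows = range(n_lags - 1, len(x) - horizon)
    X = np.array([x[t - n_lags + 1 : t + 1] for t in rows])
    y = np.array([x[t + horizon] for t in rows])
    return X, y

def fit_predict_linear(X_train, y_train, X_test):
    """Stand-in for the LSTM (assumption): ordinary least squares with bias."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef

def decompose_predict_sum(series, n_lags=3, horizon=1, train_frac=0.8):
    """Predict each component separately, then sum the component predictions.
    For brevity the whole series is decomposed before the train/test split."""
    total_pred, total_true = 0.0, 0.0
    for comp in split_additive(series):
        X, y = lagged_matrix(comp, n_lags, horizon)
        cut = int(train_frac * len(X))
        total_pred = total_pred + fit_predict_linear(X[:cut], y[:cut], X[cut:])
        total_true = total_true + y[cut:]
    return total_pred, total_true

# Synthetic series for illustration only (not the Maine GWL data).
t = np.arange(400)
series = 100 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 30)
pred_1, true_1 = decompose_predict_sum(series, horizon=1)
pred_15, true_15 = decompose_predict_sum(series, horizon=15)
```

Because the components sum to the original series, the summed component targets equal the targets of the undecomposed series, so the summed predictions can be compared directly against the measured values.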
Based on comprehensive comparisons, the proposed prediction scheme was slightly more accurate than the benchmarks for short-term prediction (1-day lead time) and clearly superior for long-term prediction (15-day lead time). For t+1, the CEEMDAN-LSTM model achieved an NSE of 0.9980, R2 of 0.9987, and RMSE of 1.8395, compared with an NSE of 0.9966, R2 of 0.9972, and RMSE of 2.3923 for CEEMDAN-XGBoost and an NSE of 0.9535, R2 of 0.9701, and RMSE of 8.4665 for CEEMDAN-Adaboost. For t+15, CEEMDAN-LSTM performed satisfactorily, with an NSE of 0.9495, R2 of 0.9470, and RMSE of 9.5668, whereas the benchmarks gave statistically acceptable but lower accuracies: an NSE of 0.8642, R2 of 0.8884, and RMSE of 13.8883 for CEEMDAN-XGBoost and an NSE of 0.6529, R2 of 0.7540, and RMSE of 20.6193 for CEEMDAN-Adaboost. Hence, the deep learning algorithm was shown to outperform the shallow learning algorithms in predicting GWL fluctuations. The key outcomes of this study are expected to assist researchers who focus on incorporating enhanced data-driven techniques into the prediction of hydrological variables. ORCID NO: 0000-0002-7144-2338
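For readers who want to reproduce indicators of this kind, the sketch below computes NSE, R2, and RMSE with NumPy. It assumes that R2 denotes the squared Pearson correlation between observed and predicted values, which is one common convention; the text above does not state which definition was used, and the function names are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(obs, sim):
    """Determination coefficient, taken here (assumption) as squared Pearson r."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return np.corrcoef(obs, sim)[0, 1] ** 2

def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((obs - sim) ** 2))

# Toy values for illustration only: predictions with a constant bias of +1.
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = obs + 1.0
scores = {"NSE": nse(obs, sim), "R2": r2(obs, sim), "RMSE": rmse(obs, sim)}
```

Under this definition a constant bias leaves R2 unchanged while lowering NSE and raising RMSE, which is one reason the indicators are usually reported together.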

Keywords: Deep Learning, Groundwater Level, Hydrology, LSTM, Signal Processing, Tree-Based Machine Learning