This study explores the utility of a widely adopted deep learning technique for predicting daily groundwater levels (GWL) at different forecast horizons. GWL measurements from the state of Maine (ME), located in the northeastern US, were used. The original dataset was first analyzed with the partial autocorrelation function (PACF) to identify the input variables, and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) was then employed to obtain the subsignals of the original time series. A long short-term memory (LSTM) network served as the deep learning algorithm, and predictions were made for two lead times: 1 day ahead and 15 days ahead. The proposed hybrid model was benchmarked against two boosted tree-based algorithms, eXtreme Gradient Boosting (XGBoost) and Adaptive Boosting (Adaboost), each also hybridized with CEEMDAN. The PACF analysis identified the 1-month, 2-month, and 3-month lag times as the input variables for predicting GWL fluctuations. The CEEMDAN decomposition of the original time series produced a total of 12 subsignals: 11 intrinsic mode functions (IMFs) and a residual series. Predictions with each ML algorithm were made for every extracted subsignal, and the predicted subsignals were summed for comparison against the measured GWL in the predefined testing set. The overall results showed that the CEEMDAN-LSTM model outperformed its counterparts at both lead times (denoted t+1 and t+15) with respect to several performance indicators: the Nash-Sutcliffe efficiency (NSE) index, the coefficient of determination (R²), and the root mean square error (RMSE).
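The decompose, predict, and sum scheme described above, together with the three performance indicators, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two hand-built components stand in for the CEEMDAN subsignals, a persistence rule stands in for the LSTM, XGBoost, and Adaboost predictors, the function names are hypothetical, and R² is taken as the squared Pearson correlation, which may differ from the definition used in the study.

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R²)."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

def one_step_forecasts(component, n_train):
    """Stand-in predictor (persistence): the value at t is predicted as the
    value at t-1. A trained model would replace this in a real study."""
    return [component[t - 1] for t in range(n_train, len(component))]

def hybrid_forecast(components, n_train):
    """Predict each subsignal separately, then sum the predictions."""
    per_component = [one_step_forecasts(c, n_train) for c in components]
    return [sum(values) for values in zip(*per_component)]

# Toy subsignals (hypothetical; not produced by CEEMDAN).
n = 60
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(n)]
trend = [0.05 * t for t in range(n)]
components = [seasonal, trend]
series = [a + b for a, b in zip(seasonal, trend)]

n_train = 45                      # first 45 points reserved for training
observed = series[n_train:]       # testing set
predicted = hybrid_forecast(components, n_train)

print("NSE :", nse(observed, predicted))
print("R2  :", r2(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

Because the subsignals are predicted independently, any regressor can be substituted for `one_step_forecasts` without changing the summation step, which is what allows the same decomposition to be paired with different learners.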
Based on comprehensive comparisons, the proposed prediction scheme yielded slightly better accuracy than the benchmarks for short-term predictions (1-day lead time) and clearly superior performance for long-term predictions (15-day lead time). For the t+1 prediction, the CEEMDAN-LSTM model gave an NSE of 0.9980, R² of 0.9987, and RMSE of 1.8395, while the corresponding indicators were an NSE of 0.9966, R² of 0.9972, and RMSE of 2.3923 for CEEMDAN-XGBoost and an NSE of 0.9535, R² of 0.9701, and RMSE of 8.4665 for CEEMDAN-Adaboost. For the t+15 prediction, CEEMDAN-LSTM performed satisfactorily with an NSE of 0.9495, R² of 0.9470, and RMSE of 9.5668, whereas the benchmarks gave statistically acceptable but lower accuracies, with an NSE of 0.8642, R² of 0.8884, and RMSE of 13.8883 for CEEMDAN-XGBoost and an NSE of 0.6529, R² of 0.7540, and RMSE of 20.6193 for CEEMDAN-Adaboost. Hence, the superiority of the deep learning algorithm over the shallow learning algorithms was demonstrated for the prediction of GWL fluctuations. The key outcomes of the current study are expected to assist researchers who focus on incorporating enhanced data-driven techniques into the prediction of hydrological variables.
ORCID NO: 0000-0002-7144-2338
Keywords: Deep Learning, Groundwater Level, Hydrology, LSTM, Signal Processing, Tree-Based Machine Learning
