
Mastering seismic time series response predictions using an attention-Mamba transformer model for bridge bearings and piers across varied testing conditions


Scientific Reports, volume 14, Article number: 29751 (2024)


This research introduces an advanced method for predicting seismic responses and hysteresis curves of instrumented bridge piers and bearings under various loading conditions, relying on a single deep learning architecture and the same hyperparameter tuning throughout. Test specimens are subjected to ground accelerations, including vertical seismic loads and axial forces. To accurately capture peak values, particularly on the negative side of the hysteresis loop (unloading region), the model employs a stacked deep architecture. A key component is the self-attention-Mamba-driven transformer layer, which enhances the model’s ability to capture long-range dependencies in seismic data. This layer works in conjunction with other deep learning techniques to ensure robust and precise predictions. Implemented with Python’s Keras functional API, the model processes inputs such as ground accelerations, actuator loads, effective height, moment of inertia, and superstructure mass. The model is evaluated with a dataset of 95 real-time hybrid simulation (RTHS) tests for lead rubber bearings, 29 RTHS tests for bridge piers, and 17 cyclic tests (10 fast and 7 slow). Extensive hyperparameter tuning demonstrates the model’s proficiency in capturing hysteresis and residual deformations accurately. The model achieves correlations with experimentally measured values ranging from 88.1 to 98.9% and a reasonable dissipated-energy error ratio. By reducing the need for additional tests, the deep learning model offers time and cost savings and provides rapid, accurate insights into bridge behavior, supporting timely and precise bridge design and aiding decision-makers during emergencies.

Hysteresis and backbone curves play a vital role in bridge pier analysis and design, predicting how piers behave under various loads, including earthquakes. They assist in identifying potential failure modes, formulating maintenance strategies, and optimizing designs for durability1,2.

Using isolation and energy-dissipating devices like lead-rubber bearings (LRBs) is an effective engineering strategy for seismic response mitigation. LRBs act as buffers between the bridge superstructure (deck and girders) and substructure (piers), allowing controlled movement during earthquakes to absorb and dissipate energy. This diminishes seismic force transmission to bridge piers, enhancing structural safety. Accurate LRB design requires careful material selection, stiffness control, and quality assurance to ensure effective energy capture and protection3,4.

Advanced deep learning techniques provide significant advantages in managing piers and LRBs in isolated bridges by enabling real-time health monitoring, predictive maintenance, and seismic event detection. They support adaptive control systems, retrofitting, and data-driven design, enhancing decision-making, risk assessment, and overall resilience. This reduces maintenance costs, improves structural integrity, and ensures optimal performance in earthquake-prone regions.

Collecting data from sensors on bridge piers and bearings, such as linear variable differential transformers (LVDTs), accelerometers, and strain gauges, provides real-time insights into their performance. Computational simulations and laboratory experiments offer valuable information for predictive analysis and design optimization. Field data from actual bridge installations further evaluate real-world performance. Testing methods include cyclic tests, real-time hybrid simulations (RTHS), and shaking table tests, capturing bearing and pier responses under various conditions.

Cyclic tests are the most accessible method for approximating the behavior of structures under seismic loads and help study seismic responses by applying controlled displacements using servo-hydraulic actuators5. However, slow cyclic tests fall short of capturing rate-dependent behavior and failure mechanisms under higher-speed dynamic loads. To overcome these limitations in fast cyclic tests with axial loads, Chae et al.6 proposed an innovative method combining a displacement-adaptive time series (D-ATS) compensator with a flexible loading beam (FLB) as a compliance spring. This method allows for the application of a constant axial force to reinforced concrete bridge piers during fast cyclic tests, particularly benefiting axially stiff members like base isolators, piers, and walls.

Ensuring that experimental testing conditions closely mimic real-world scenarios is essential. This involves applying real-time simulated loads to test specimens in the laboratory for a more accurate understanding of seismic responses. Shaking table tests are highly accurate in capturing structural responses of bridge piers or bearings during earthquakes but require building full-scale structures, which is often costly and impractical for large structures. Alternatively, RTHS offers a cost-effective solution for assessing seismic responses of large structures. RTHS physically tests only the critical substructure or component, like a pier or bearing, while simulating the rest of the structure with computational models. The experimental results are integrated with real-time simulations to provide a comprehensive understanding of the seismic behavior of the entire structure7,8.

Some studies have focused on the relationship between the hysteresis curves of bearings and piers and bridge damage using machine learning algorithms9,10,11. The blind spot of such algorithms lies in their inability to maintain long-range dependencies when capturing the entire time series response across linear, low, and high nonlinear regions. Additionally, various methods have been explored for predicting dynamic responses of civil infrastructure to ground vibrations, including multifractal dimensions, symbolic and Bayesian regressions, ARIMA, MLP networks, CNNs, and LSTMs. Each has limitations, but Bidirectional LSTM (Bi-LSTM) networks, particularly with CuDNN acceleration, show improved performance by capturing complex temporal patterns and long-term dependencies, despite higher computational costs12,13.

This study focuses on two-span bridges for real-time hybrid simulation (RTHS) tests, both with and without isolating LRBs. The bridge superstructure is modeled analytically, while the LRBs undergo experimental testing. In 2023, four smaller LRBs were tested; in 2024, a larger LRB was tested (refer to Fig. 1a, b; only the larger LRB setup is included for brevity), all positioned at the top of the pier. In trials without LRBs, the bridge pier itself serves as the experimental specimen (Fig. 2). The bridge abutments are assumed to be equipped with roller supports. Additionally, in the LRB-isolated bridges, it is presumed that the bridge pier is strong enough that its lateral deformation is negligible compared to the LRBs’ deformation under earthquake-induced loads. The bridges are subjected to both horizontal and vertical ground motions, illustrated in Figs. 1 and 2. Vertical ground acceleration causes time-varying dynamic axial forces on the LRBs and pier, calculated from the support reaction force at the bridge pier due to the vertical oscillations of the bridge superstructure. When the bridge superstructure is considered longitudinally inflexible, the governing equation for the bridge’s horizontal motion simplifies to that of a single-degree-of-freedom structure.

Selected LRB-isolated bridge structure for RTHS tests. (a) A two-span LRB-isolated bridge structure designed for RTHS (larger LRB) (2024). (b) Horizontal RTHS model (larger LRB).

Selected bridge structure for RTHS, fast and slow cyclic tests. (a) A bridge structure with two spans designed for RTHS. (b) Analytical modeling for simulating vertical vibrations in the bridge structure. (c) Horizontal RTHS model.

Equation (1) introduces several key parameters: m represents the total mass of the bridge superstructure, with assumed values listed in Table 1; me = 3.8 × 10³ kg; ma = m − me; c stands for the damping coefficient of the system, determined from an inherent damping ratio of 3% based on the primary stiffness of the LRBs, given as K1(s) = 14.032 kN/mm for the smaller LRBs and K1(l) = 81.281 kN/mm for the larger LRB; for the bridge pier, the system damping coefficient c is assigned to be 3% of the critical damping of the entire bridge structure. u, \(\dot{u}\), and \(\ddot{u}\) denote the horizontal displacement, velocity, and acceleration of the bridge, respectively; P represents the axial force applied to the LRBs and pier; R signifies the horizontal restoring force of the LRBs and pier and is a function of u and P.
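Equation (1) itself is not legible in this rendering. Given the definitions above, the governing relation is presumably the standard single-degree-of-freedom equation of motion; the following reconstruction is an assumption based on the surrounding text rather than a verbatim copy of the original:

\[ m\,\ddot{u}(t) + c\,\dot{u}(t) + R(u, P) = -m\,\ddot{u}_g(t) \qquad (1) \]

where \(\ddot{u}_g(t)\) denotes the horizontal ground acceleration.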

This paper employs the innovative real-time force control technique introduced by Cho et al.17 for RTHS tests conducted in 2023 and 2024. This approach provides benefits by utilizing only the standard sensors found in a typical servo-hydraulic actuator, eliminating the requirement for additional sensors. Polynomial regression (PR) is employed to estimate the current state displacement. This simplifies the implementation of axial force, improving practicality and cost-effectiveness overall. Consequently, the adjusted input displacement to the actuator is computed in the revised format detailed in Eq. (2):

Figure 3 depicts the experimental test setup used to assess the seismic performance of a test specimen equipped with four smaller LRBs and a larger LRB (for brevity, only the larger LRB setup is shown). The testing is conducted in the Hybrid Structural Testing Center (Hystec) facility, utilizing equipment to perform 95 RTHS, comprising 53 tests in 2023 and 42 tests in 2024. These tests aim to study variations in factors such as horizontal and vertical ground acceleration, axial load, superstructure mass, height, and moment of inertia of the LRBs, as detailed in Table 1. To induce horizontal displacement in the LRBs, a 1000 kN dynamic actuator is employed, while two 500 kN dynamic actuators apply the desired axial force to the LRBs through the flexible loading beam (FLB), compliance spring, and steel loading block. Actuator control occurs with a time step of Δt = 1/1024 s. Massive steel blocks serve as a mounting platform for the LRBs, providing support and space for accommodating the two vertical actuators beneath the FLBs. The smaller LRBs used in this study possess the following parameters: primary stiffness, K1(s) = 14.032 kN/mm, secondary stiffness, K2(s) = 0.147 kN/mm, characteristic strength, Qd(s) = 2.894 kN, effective stiffness, Keff(s) = 0.291 kN/mm, an equivalent damping constant ratio, heq(s) = 0.313, and shear strain, γ(s) = 100%. These parameters differ for the larger LRB as follows: K1(l) = 81.281 kN/mm, K2(l) = 0.8445 kN/mm, Qd(l) = 50.715 kN, Keff(l) = 1.689 kN/mm, and heq(l) = 0.315; refer to the references for further details on the calculations18,19,20.

Experimental configurations for RTHS established in Hystec at Myongji University, South Korea. (a) Overview. (b) Detailed Close-Up of the LRB setup. (c) LRB section used in this research.

Figure 4 includes the design drawing and the force-displacement history graph for the larger LRB. In general, the effective horizontal period (Teff) of the entire seismic isolation structure was designed to be 0.92 s and 1.38 s for the smaller and larger LRBs, respectively, simulating the typical period of existing civil infrastructures.

Design drawings and force-displacement relationships of the LRB.

The experimental setups for the bridge pier during RTHS, fast, and slow cyclic tests conducted at Hystec are detailed in12,13. The configurations for the datasets from 2017 to 2018 are derived from the work of Chae et al.6,21, including details on pier section characteristics, nominal and effective heights, and test setups. In 2022, 13 new tests were carried out to investigate variations in factors such as vertical ground acceleration, cross-sections, axial loads, superstructure masses, and heights12,13. Additionally, to expand the diversity of the data by incorporating various ground motions, axial forces, cross-sections, and heights, 17 new RTHS tests were conducted in 2024 (refer to the experimental setups shown in Fig. 5). A compliance spring, known as FLB, is installed at the top of the bridge piers to facilitate the application of axial force; more information can be found in21,22. The details of RTHS and cyclic tests are presented in Tables 4 and 5, respectively.

Experimental configurations of bridge piers established at Hystec. (a) Circular section RC piers specimen (unit: mm). (b) Experimental test setup (2024). (c) Square section with vertical load (unit: mm). (d) Experimental test setup (2024).

The accuracy of the RTHS method was validated through a shaking table experiment at Pusan National University, involving a 23,500 kg mass on four LRBs, with Northridge ground acceleration applied in both lateral and vertical directions. Due to the capacity limitations of the shaking table and the test specimen, the input acceleration was set at 30% of actual seismic levels. In the RTHS setup, the acceleration recorded on the shaking table is utilized, and the vertical load is determined using the PR state estimator. Results showed a 0.09% difference in maximum displacement between the RTHS and shaking table experiment under real vertical excitation. However, excluding vertical acceleration increased the error to 21.08%, highlighting the importance of including both horizontal and vertical excitations in the RTHS to accurately replicate real-world conditions (more details in14).

Effective data preprocessing is essential for the success of deep learning models in forecasting time series responses. This step transforms raw data into a format suitable for deep learning algorithms, aiming to enhance data quality, reduce noise, and eliminate inconsistencies or errors. In dealing with the disturbances caused by oil column resonance in actuators, a 6th-order low-pass Butterworth filter with an 8 Hz cutoff frequency is applied to both horizontal and vertical actuator forces in tests conducted in 2023 and 2024. For tests from 2017 to 2022, a 5 Hz cutoff frequency was used. It is crucial to address the initial time step delays in the actuators’ measured response, which need to be synchronized with the duration of the input ground motion data as shown in Fig. 6. Additionally, to ensure that all time series data in the database are of the same length, shorter sequences are padded using linear interpolation. This approach preserves the original data’s trends and characteristics while introducing variations within the sequences. This technique not only ensures uniform data lengths but also enhances the diversity of the training dataset and eliminates potential outliers.
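As a concrete illustration of this filtering step, the sketch below applies a 6th-order low-pass Butterworth filter with an 8 Hz cutoff to a force signal sampled at 1024 Hz using SciPy. The zero-phase filtfilt call and all function and variable names are our assumptions, not the authors’ code.

```python
# Hedged sketch of the low-pass filtering step described above.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_actuator_force(force, fs=1024.0, cutoff=8.0, order=6):
    """6th-order low-pass Butterworth filter (8 Hz cutoff for the 2023-2024
    tests, 5 Hz for 2017-2022), applied as a zero-phase filter."""
    nyquist = 0.5 * fs
    b, a = butter(order, cutoff / nyquist, btype="low")
    return filtfilt(b, a, force)  # filtfilt avoids phase distortion

# Example: smooth a noisy synthetic horizontal actuator force record
t = np.arange(0, 10, 1 / 1024)
raw_force = np.sin(2 * np.pi * 1.5 * t) + 0.2 * np.random.randn(t.size)
smooth_force = lowpass_actuator_force(raw_force)
```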

Data preprocessing. (a) Unprocessed data (Kobe). (b) Processed data (Kobe).

In order to comply with LSTM network requirements, the input variables need to be structured as a 3D array with the shape (samples or mini-batch size, time steps, features). Leaving input variables unscaled in deep learning models like LSTM can lead to unstable learning and significant errors. To mitigate this, data scaling is performed using sklearn.preprocessing.StandardScaler23. Additionally, a stacked scheme proposed by Zhang et al.24 is implemented for the Bi-CuDNNLSTM network, enhancing the speed and accuracy of time history response predictions in RTHS. The optimization of the window size, which indicates the size of each stack in deep learning models, is conducted using the GridSearchCV class from Python’s Keras library25 and babysitting the model. Through grid search, a window size of 10 is identified as optimal for achieving accurate predictions, crucial given the lengthy sequence of time series data comprising 3426 data points per variable. As a result, each input feature has a shape of (None, 342, 10). Initially, the time steps total 71,940, based on a sampling frequency of 1024 Hz. To prevent memory errors, the data undergoes downsampling to reduce the sampling frequency to 48.76 Hz before being fed into the deep model.
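A minimal sketch of this scaling and stacking step is shown below, assuming non-overlapping windows; the authors’ exact reshaping routine is not published, so the windowing scheme and variable names are illustrative.

```python
# Hedged sketch: scale one downsampled feature and stack it into windows of 10.
import numpy as np
from sklearn.preprocessing import StandardScaler

window = 10
series = np.random.randn(3426)            # stand-in for one 3426-point feature

# StandardScaler expects 2D input, so reshape to a single column and back
scaled = StandardScaler().fit_transform(series.reshape(-1, 1)).ravel()

# Stack the sequence into non-overlapping windows: 3426 points -> 342 stacks
n_stacks = len(scaled) // window
stacked = scaled[: n_stacks * window].reshape(1, n_stacks, window)
print(stacked.shape)                      # (1, 342, 10) ~ (samples, steps, features)
```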

To achieve optimal performance on unseen data associated with piers and LRBs, it is often necessary to explore and refine various components of neural network development. This includes selecting the most effective combination of deep learning architecture, loss function, training algorithm, hyperparameter tuning, and data preprocessing techniques tailored to the specific task. Below is a detailed explanation of the algorithms used in the proposed self-attention-Mamba-driven transformer-based stacked CNN-bidirectional CuDNNLSTM network, referred to in Fig. 7.

A plot of the proposed self-attention-Mamba-driven transformer-based stacked CNN-bidirectional CuDNNLSTM model involving multiple inputs. (a) Architecture of the proposed deep learning model. (b) Self-attention-Mamba-driven transformer layer.

To address the sensitivity of the data to negative weight values, a hybrid activation function is introduced. This function combines the Exponential Linear Unit (ELU)26 for negative weights and the Gaussian Error Linear Unit (GELU)27 for positive weights, enhancing nonlinearity and improving model performance. In this study, the ELU’s hyperparameter α is set to -1.5, emphasizing negative weights and preventing the vanishing gradient problem. The GELU, though computationally demanding due to the Φ(x) calculation, is favored because of its success in tasks like computer vision and natural language processing (NLP), often using a faster approximation in practice27, as described in Eq. (3).
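A minimal sketch of such a hybrid activation is given below, combining an ELU branch for negative inputs with the tanh approximation of GELU commonly used in practice (presumably the approximation of Eq. (3)); the exact blending used by the authors is not reproduced in this rendering, so treat this as an assumption.

```python
# Hedged sketch of the hybrid ELU_GELU activation described in the text.
import numpy as np
import tensorflow as tf

def elu_gelu(x, alpha=-1.5):
    """ELU branch for x < 0 (alpha = -1.5 as stated in the text),
    GELU (tanh approximation) for x >= 0."""
    # Clamp the exponent so the unused branch cannot overflow for large x
    elu = alpha * (tf.math.exp(tf.minimum(x, 0.0)) - 1.0)
    gelu = 0.5 * x * (1.0 + tf.math.tanh(
        np.sqrt(2.0 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))
    return tf.where(x < 0.0, elu, gelu)
```

The callable can then be passed directly as the `activation` argument of any Keras layer.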

Causal padding is employed to maintain sequence length and enforce temporal causality28,29, with parameters including a filter size of 100, kernel size of 3, and ELU_GELU activation function. Figure 7 shows the integration of Conv1D with AveragePooling1D30 for temporal data to downsample the input representation, using a pooling size of four and strides of one. SeparableConv1D31, inspired by MobileNet’s depthwise separable convolutions32, is applied to time series data for parameter efficiency, with hyperparameters tuned to 100 filters, a kernel size of 3, ELU_GELU activation function, and ‘SAME’ padding.
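The convolutional front-end described above can be wired roughly as follows. This is a sketch using the quoted hyperparameters; the `padding="same"` choice on the pooling layer (not stated in the text) and the reuse of the `elu_gelu` function from the earlier sketch are our assumptions.

```python
# Hedged sketch of the convolutional front-end (one input head).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(342, 10))           # (time steps, window features)
x = layers.Conv1D(filters=100, kernel_size=3, padding="causal",
                  activation=elu_gelu)(inputs)  # causal padding keeps length and temporal order
x = layers.AveragePooling1D(pool_size=4, strides=1,
                            padding="same")(x)  # gentle temporal smoothing, length preserved
x = layers.SeparableConv1D(filters=100, kernel_size=3, padding="same",
                           activation=elu_gelu)(x)  # depthwise-separable: fewer parameters
```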

The architecture takes advantage of NVIDIA’s optimized Bidirectional LSTM (Bi-CuDNNLSTM)12,33,34 for accelerated GPU training with default parameters (50 units, return_sequences = True).

Skip connections in ResNet architecture35 improve information flow and gradient propagation by creating shortcuts that bypass layers, crucial for deep network training and mitigating vanishing gradients. This paper enhances performance and generalization by using skip connections to link outputs from feature extraction layers and the LayerNormalization of the second Bi-CuDNNLSTM layer to the last Bi-CuDNNLSTM layer through element-wise addition with the Add()36 layer.
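A sketch of this skip-connection pattern is shown below, assuming shape-compatible branches (100 feature channels on both sides of the addition, since each Bidirectional LSTM with 50 units outputs 100 channels); in TensorFlow 2, `layers.LSTM` dispatches to the cuDNN kernel automatically on a GPU when its default settings are used.

```python
# Hedged sketch of the skip connection described above; `x` continues from
# the CNN front-end sketch.
from tensorflow.keras import layers

feat = x                                                        # feature-extraction output
h1 = layers.Bidirectional(layers.LSTM(50, return_sequences=True))(feat)
h2 = layers.Bidirectional(layers.LSTM(50, return_sequences=True))(h1)
h2n = layers.LayerNormalization()(h2)
merged = layers.Add()([feat, h2n])                              # element-wise shortcut
out = layers.Bidirectional(layers.LSTM(50, return_sequences=True))(merged)
```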

A hybrid loss function, combining Mean Squared Error (MSE) and Mean Absolute Error (MAE), is introduced to balance sensitivity to outliers and local fluctuations. A weight parameter (w = 0.5), presented in Eq. (4), controls this trade-off, optimizing model accuracy for time series prediction. Metrics like MSE, MAE, and R2 are reported to evaluate model performance and trend-capturing capabilities.
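Assuming Eq. (4) takes the weighted-sum form implied by the description, a minimal Keras-compatible implementation of the hybrid loss looks like this:

```python
# Hedged sketch of the hybrid MSE/MAE loss, assumed form: w*MSE + (1-w)*MAE.
import tensorflow as tf

def hybrid_mse_mae(w=0.5):
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))  # sensitive to large peaks
        mae = tf.reduce_mean(tf.abs(y_true - y_pred))     # robust to local fluctuations
        return w * mse + (1.0 - w) * mae
    return loss
```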

Key techniques in deep learning to improve generalization and prevent overfitting include BatchNormalization37, LayerNormalization38, and dropout39. BatchNormalization normalizes activations across each mini-batch, while LayerNormalization normalizes across the features of each example independently of the batch. Dropout (set to 0.2) deactivates a random subset of neurons, reducing reliance on specific inputs and fostering more resilient learning. These regularization techniques, when combined, stabilize training and help prevent issues like vanishing or exploding gradients.

The TimeDistributed Dense layer40, essential for learning local temporal patterns, wraps a Dense layer around each sequence element. Specifically, the model includes four TimeDistributed(Dense) layers, with two incorporated within the transformer layer, followed by a Dense output layer. The first three TimeDistributed(Dense) layers are configured with 100 units each, and the subsequent layer with 50 units, all utilizing the ELU_GELU activation function. Additionally, an exponential decay learning rate scheduler41 is employed, starting with an initial rate of 0.0015, decay constant of 0.99, and weight decay of 0.00001 to ensure stable training and improve generalization.
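A sketch of the scheduler and the TimeDistributed wrappers follows. The `decay_steps` granularity is our assumption, since the text reports only the initial rate of 0.0015 and the 0.99 decay constant; `td` continues from the recurrent output `out` of the earlier skip-connection sketch.

```python
# Hedged sketch: exponential learning-rate decay plus TimeDistributed Dense layers.
from tensorflow import keras
from tensorflow.keras import layers

lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.0015,
    decay_steps=1000,        # assumed decay granularity (not stated in the text)
    decay_rate=0.99)

td = layers.TimeDistributed(layers.Dense(100, activation=elu_gelu))(out)
td = layers.TimeDistributed(layers.Dense(50, activation=elu_gelu))(td)
```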

Input features include GA, Fa, Fv1, Fv2, He, I, and M, representing the ground acceleration, the horizontal actuator force, the two vertical actuator forces, the effective height, the moment of inertia, and the superstructure mass within the RTHS setup. It is noteworthy that the influence of ground acceleration in the vertical direction, as discussed in Subsection 2, is taken into account when evaluating the vertical load. Key inputs are processed individually through an attention dropout layer to emphasize significant elements (see Fig. 7 and Appendix A).

The model is trained with a mini-batch size of 64, using the Adamax optimizer42 and a learning rate of 0.0015. Gradient explosions are prevented with a clipnorm of 4, and early stopping43, with a patience setting of 3, ensures training halts when no improvement is seen. Critical preprocessing steps include reshaping, scaling, and stacking inputs before defining layers, and model compilation involves callbacks like weight decay, an exponential learning rate scheduler, and early stopping using the model.fit()44 method to address potential overfitting. Further information about the previously mentioned layers and configuration is available in the following papers13,14.
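Putting the training configuration together, a hedged end-to-end sketch might read as follows; `x_train`, `y_train`, `x_val`, and `y_val` are placeholder arrays, the single-unit output head is assumed, and `restore_best_weights` is our choice rather than a documented setting. The `weight_decay` argument of Adamax exists only in newer Keras releases, so it is omitted here.

```python
# Hedged end-to-end training sketch, continuing from the earlier snippets.
from tensorflow import keras
from tensorflow.keras import layers

outputs = layers.Dense(1)(td)              # assumed displacement prediction head
model = keras.Model(inputs, outputs)

optimizer = keras.optimizers.Adamax(learning_rate=lr_schedule, clipnorm=4.0)
model.compile(optimizer=optimizer, loss=hybrid_mse_mae(0.5),
              metrics=["mse", "mae"])

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    batch_size=64, epochs=1000,
                    callbacks=[early_stop])
```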

The novelty of this paper lies in its effective handling of negative weights in the unloading region of hysteresis curves. By integrating the recently released cutting-edge deep learning algorithm, Mamba15, along with a transformer layer45 and other advanced techniques discussed earlier, the model successfully passes the verification tests. The Self-Attention-Mamba-driven transformer layer is thoroughly detailed in the following subsection. Furthermore, this paper draws on 17 new RTHS tests on piers and 42 new RTHS tests on a larger LRB conducted in 2024, with and without accounting for vertical ground acceleration, offering the possibility of including additional input features (He and I) for the deep learning (DL) model and further assessing the impact of vertical ground acceleration. The model’s capability is further enhanced to predict hysteresis curves and the corresponding dissipated energies of bridge piers under RTHS, including 17 tests conducted in 2024, as well as during slow and fast cyclic tests.

Transformers excel at capturing relationships between distant elements in a sequence, thanks to the multi-head attention mechanism, enabling effective handling of long-range dependencies45. They have set benchmarks across various NLP tasks, demonstrating high accuracy and flexibility in Large Language Models (LLMs), and can be adapted to diverse tasks beyond text, including image and audio analysis.

In this research, we propose a hybrid architecture that leverages the strengths of both the Mamba and Transformer models, referred to as a self-attention-Mamba-driven transformer (see Fig. 7b for additional details), to predict seismic responses of piers and LRBs under diverse testing methods and loading conditions. The Mamba Transformer model thus represents a compelling synthesis, promising superior performance in tasks requiring both intricate domain understanding and efficient handling of extensive sequential data. Specifically, it excels in seismic time series response, setting a new standard for state-of-the-art deep learning architectures.

The key components of the proposed self-attention-Mamba-driven transformer model, including hyperparameter tuning, are discussed in detail as follows:

The Mamba block refined in this study includes:

(1) Input projection layer: the input is passed through a dense layer that projects it into a higher-dimensional space, doubling the internal dimension size, which is set to 50. This layer prepares the input for further processing by splitting it into two parts: the main input and a residual connection.

(2) A self-attention layer implemented in Keras47, with parameters set to “use_scale = False, score_mode = ’dot’, dropout = 0.2”, to further refine the output by allowing the model to focus on specific parts of the input sequence.

(3) A SeparableConv1D layer31 with a kernel size of 3 and 342 filters, using a causal padding scheme to maintain sequence order.

(4) Swish activation function48: the output of the SeparableConv1D layer is passed through the Swish activation function to introduce non-linearity and enhance the representation capability.

(5) Intermediate projection layer within the SSM: a dense layer that projects the processed input from the SeparableConv1D layer into three parts, the delta, B, and C matrices. Delta denotes a trainable parameter called the step size, representing the input’s resolution; matrix B determines how the input influences the state; and matrix C translates the current state to the output (see Eq. (5) and Fig. 8 for details). This layer prepares the input for the selective SSM by projecting it into the appropriate dimensions. The hyperparameters in this dense layer are set to “units = delta_t_rank + 2 × model_state, use_bias = False”, where delta_t_rank = 13 and model_state = 64.

(6) Delta projection layer: delta values are further transformed using a dense layer with hyperparameters set to “units = 50, use_bias = True”. The Softplus activation function is then applied to the output of this dense layer, ensuring that delta values remain positive.

(7) The state transition matrix A, initialized using HiPPO49 for handling long-range dependencies, is used in its logarithmic form for stability. These matrices model the state transitions within the SSM. The rationale behind using the HiPPO matrix is to create a hidden state that preserves its historical information by compressing and reconstructing the signal information16,49.

(8) Matrix D: a trainable variable representing the direct influence of the input on the output, scaling the input directly in the final output computation of the SSM. It starts with all elements set to one; during training, its values are updated as part of the optimization process.

(9) Selective SSM: employs the selective_scan function to compute state updates dynamically over time. It utilizes the delta, B, C, A, and D matrices to incorporate input effects over time, manage long-range dependencies, and generate a refined representation of the input sequence.

(10) Residual connection: the initial input, passed through the input projection, is stored as a residual connection. The ELU_GELU activation function is applied to this residual, and the output is then multiplied by the SSM output normalized using Keras LayerNormalization38.

(11) Output projection: the final processed output is passed through a dense layer with “units = 100 and use_bias = False” to project it back to the original input dimensions. This layer produces the final output of the Mamba block after completing all internal processing.

In the Mamba and transformer blocks shown in Fig. 7, M and N are configured to 6, meaning the blocks are repeated six times. A concrete sketch of the selective-scan recurrence follows this list.
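To make the selective-scan step (items 5 to 9 above) concrete, the NumPy sketch below implements the discretized recurrence from the public Mamba description15; the shapes, names, and zero-order-hold discretization are assumptions, not the authors’ Keras implementation.

```python
# Hedged NumPy sketch of the selective scan at the heart of the Mamba block.
import numpy as np

def selective_scan(x, delta, A, B, C, D):
    """x: (L, d) input; delta: (L, d) positive step sizes; A: (d, n) state
    matrix (negative for stability); B, C: (L, n) input-dependent projections;
    D: (d,) direct input-to-output skip."""
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                        # hidden state, one row per channel
    y = np.zeros((L, d))
    for t in range(L):                          # sequential state update over time
        dA = np.exp(delta[t][:, None] * A)      # discretized transition (zero-order hold)
        dB = delta[t][:, None] * B[t][None, :]  # discretized input matrix
        h = dA * h + dB * x[t][:, None]         # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = (h @ C[t]) + D * x[t]            # y_t = C h_t + D x_t
    return y

# Tiny usage example with model_state n = 64 per channel
L, d, n = 32, 50, 64
y = selective_scan(np.random.randn(L, d),
                   np.abs(np.random.randn(L, d)) * 0.1,   # positive deltas
                   -np.exp(np.random.randn(d, n)),        # stable (negative) A
                   np.random.randn(L, n), np.random.randn(L, n), np.ones(d))
```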

State space model representation, visualizing the two equations in a single architecture.
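For reference, the two equations the figure visualizes are the standard state-space pair (Eq. (5)), reconstructed here from the definitions given in the Mamba block description15,16:

\[ h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) + D\,x(t) \qquad (5) \]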

Interested readers are encouraged to consult the references for more detailed information about the original Mamba block if needed15.

Since Transformers do not inherently capture sequence order, they require explicit positional information. This is where positional encoding comes into play. The goal of positional encoding, as presented in Eq. (6) is to maintain knowledge about the order of objects in a sequence45.
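Equation (6) is not legible in this rendering; the standard sinusoidal encoding from ref. 45, which matches the variable definitions below, is:

\[ P(k,\,2i) = \sin\!\left(\frac{k}{10000^{2i/d_{\text{model}}}}\right), \qquad P(k,\,2i+1) = \cos\!\left(\frac{k}{10000^{2i/d_{\text{model}}}}\right) \qquad (6) \]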

where P, k, and i stand for the positional encoding, the position, and the dimension index, respectively; \(d_{\text{model}}\) refers to the dimensionality of the model’s hidden states.

The multi-head attention layer is pivotal within the transformer architecture, functioning as a module for attention mechanisms. It operates by executing the attention mechanism simultaneously across multiple heads in parallel. These individual attention outputs are then combined through concatenation and linear transformation to achieve the desired dimensionality, thereby enhancing the overall performance of the model. The TensorFlow library, specifically through its Keras API50, provides a MultiHeadAttention layer, simplifying the integration of multi-head attention into neural network designs. The mathematical formulation of this layer, as detailed in Eqs. (7) and (8), offers a clear and concise description.
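Equations (7) and (8) are not legible in this rendering; the standard formulation from ref. 45, consistent with the weight-matrix definitions below, is:

\[ \text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\,W^O \qquad (7) \]

\[ \text{head}_i = \text{Attention}\!\left(Q W_i^Q,\; K W_i^K,\; V W_i^V\right) \qquad (8) \]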

where the projections are weight matrices: \(W_i^Q \in \mathbb{R}^{d_{\text{model}} \times d_k}\), \(W_i^K \in \mathbb{R}^{d_{\text{model}} \times d_k}\), \(W_i^V \in \mathbb{R}^{d_{\text{model}} \times d_v}\), and \(W^O \in \mathbb{R}^{h d_v \times d_{\text{model}}}\). Here, Q, K, and V are the query, key, and value matrices, respectively. Given an input sequence \(X \in \mathbb{R}^{n \times d}\): \(Q = XW^Q\), \(K = XW^K\), \(V = XW^V\). The Scaled Dot-Product Attention function is defined as:
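The standard scaled dot-product attention from ref. 45 reads:

\[ \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \]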

In this research, the values for dk and dv, which represent the dimensionality of the key and value vectors, respectively, are both set to 16. Additionally, the number of attention heads (parallel attention layers), denoted as h, is configured to 12. A dropout rate of 0.1 is applied to the output of the multi-head attention layer, which then connects to the output of the positional encoding layer via element-wise addition using the Add() layer in Keras. Here, ‘Norm’ refers to the application of the LayerNormalization layer to the output of the Add() layer. The resulting output is then passed to the Mamba block. A dropout rate of 0.2 is applied to the output of the Mamba block, followed by normalization using the LayerNormalization layer. Subsequently, two fully connected layers, each with 100 units and the ELU_GELU activation function, are wrapped with the TimeDistributed(Dense) layer in Keras. Notably, the first feed-forward layer is normalized using the LayerNormalization layer, and a dropout rate of 0.1 is applied to the output of the second wrapper. The final element-wise addition is performed between the output of the second normalized wrapper and the output of the multi-head attention layer, after applying the dropout and Add&Norm layers.
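A condensed sketch of this wiring is given below, using the stated hyperparameters (key_dim = 16, num_heads = 12, dropout rates 0.1 and 0.2) and assuming a feature dimension of 100 so the residual additions are shape-compatible; `mamba_block` stands in for the custom block detailed earlier and is not a library function.

```python
# Hedged sketch of the self-attention-Mamba-driven transformer layer wiring.
from tensorflow.keras import layers

def attention_mamba_transformer(x, mamba_block):
    attn = layers.MultiHeadAttention(num_heads=12, key_dim=16, value_dim=16,
                                     dropout=0.1)(x, x)          # self-attention
    attn = layers.LayerNormalization()(layers.Add()([x, attn]))  # Add & Norm

    m = mamba_block(attn)                                        # custom Mamba block
    m = layers.LayerNormalization()(layers.Dropout(0.2)(m))

    ff = layers.TimeDistributed(layers.Dense(100, activation=elu_gelu))(m)
    ff = layers.LayerNormalization()(ff)
    ff = layers.TimeDistributed(layers.Dense(100, activation=elu_gelu))(ff)
    ff = layers.Dropout(0.1)(ff)
    return layers.Add()([ff, attn])                              # final shortcut
```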

The training, validation, and test dataset scenarios for LRBs during the RTHS tests are randomly assigned, as outlined in Table 1. Northridge scenarios are chosen as the unseen datasets for both the smaller and larger LRBs, tested in 2023 and 2024, respectively. These scenarios involve horizontal and vertical ground acceleration intensity, constant axial load, effective height of the LRBs, moment of inertia, and mass values. Among them, 52 scenarios are designated for training, 23 for validation, and 20 for testing, making a total of 95 RTHS tests.

Model training involves epochs, each representing a full pass through the training dataset in smaller subsets called batches (set to 64). Predictions and errors are calculated using a designated hybrid loss function, with individual batch losses averaged to determine epoch-level loss. Monitoring training and validation losses is essential: the former indicates model fit, while the latter assesses generalization. More epochs allow for complex pattern learning, but too many can lead to overfitting. To mitigate this risk, three evaluation metrics, MSE, MAE, and R2, track losses, guiding the optimal number of epochs, set at 1000 in this study. As illustrated in Fig. 9a, both training and validation losses trend downward, approaching zero after 1000 epochs. The following subsections discuss the predicted outcomes of the RTHS tests.

The hybrid loss variations observed during assigned epochs. (a) RTHS tests (LRBs data). (b) RTHS tests (piers data). (c) Fast and slow tests (piers data).

To assess how well the proposed deep model predicts, a detailed analysis of error sensitivity is performed. As outlined in Table 2, five error measures are utilized to examine the predictions for RTHS tests. Equation (9) straightforwardly computes the normalized estimation error (ε), which is defined as follows:
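The expression for Eq. (9) is not legible in this rendering; a common normalized-error form consistent with the definitions below is shown here, and it should be treated as an assumption rather than a verbatim reconstruction of the original:

\[ \varepsilon = \sqrt{\frac{\sum_{i=1}^{N}\left(Y_o^{\,i} - Y_p^{\,i}\right)^2}{\sum_{i=1}^{N}\left(Y_o^{\,i}\right)^2}} \times 100\% \qquad (9) \]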

where Yo, Yp, and i represent the observed (experimental) response, the predicted response, and the time-step index, respectively.

Assessing the estimated displacement time histories during RTHS tests involves comparing them to the experimental displacement time histories from the unseen dataset. This comparison evaluates the accuracy and effectiveness of the model’s predictions. Figure 10 highlights a strong agreement between the experimental and predicted unseen LRB data listed in Table 1, especially in terms of maximum displacement, which is critical for evaluating the performance of bridge bearings under seismic excitations. The proposed deep learning model shows significant robustness in accurately predicting displacement time series, even in low and high nonlinear regions. The hysteresis curves further validate the findings. As illustrated in Fig. 10, the model successfully predicts the trend of the hysteresis curves, including the area under the curve, which represents dissipated energy. Table 2 shows that the maximum dissipated energy error ratio is 8.89%.

Assessment of estimated versus empirical displacement time series and hysteresis curves for LRBs Using the unseen Northridge dataset in RTHS tests. (a) Displacement time histories (scenario 78). (b) Hysteresis curves (scenario 78). (c) Displacement time histories (scenario 79). (d) Hysteresis curves (scenario 79). (e) Displacement time histories (scenario 82). (f) Hysteresis curves (scenario 82). (g) Displacement time histories (scenario 85). (h) Hysteresis curves (scenario 85). (i) Displacement time histories (scenario 90). (j) Hysteresis curves (scenario 90). (k) Displacement time histories (scenario 91). (l) Hysteresis curves (scenario 91). (m) Displacement time histories (scenario 95). (n) Hysteresis curves (scenario 95).

To validate the objectives of this research and demonstrate the novelty of the architectural components, we trained a comprehensive DL model using over 1000 combinations of architectures and layer configurations. This included variations like CNN layers, LayerNormalization, BatchNormalization, GPU acceleration with NVIDIA cuDNN, skip connections, causal padding, hybrid loss, and activation functions, exponential learning rate scheduling, TimeDistributed Dense layers, various attention mechanisms and transformers, stacking methodologies, optimizers, learning rates, decays, scaling techniques, and Mamba algorithm. A summary of the various model configurations and hyperparameter tunings is outlined in Table 3. We evaluated numerous alternative hyperparameter settings based on four error measures to find the optimal combination for superior model performance. The conclusions drawn here provide the most accurate predictions possible with the proposed DL model architecture, demonstrating its reliability and robustness in predicting unseen LRB time series data and the corresponding dissipated energies under different ground motions, particularly accounting for vertical ground accelerations and axial forces in RTHS tests. Notably, the variations listed in Table 3 are applied to the DL architecture depicted in Fig. 7.

The experimental data presented in Tables 1 and 4 encompasses a wide range of ground motion intensities, including the Chi-Chi earthquake with scale factors from 0.5 to 1.25 and the Kobe earthquake with scale factors between 0.1 and 0.4. This comprehensive range indicates the reliability of the data, as it represents both low- and high-magnitude seismic intensities. It is important to emphasize that we do not rely solely on the correlation coefficient (R factor) to evaluate the prediction reliability. Table 3 highlights four error measures, where the proposed DL model outperforms other configurations when all metrics are considered. For instance, removing the Mamba algorithm results in an R factor of 0.9545, but the Dissipated Energy Error Ratio rises to 30.61%, showing that focusing only on the R factor is insufficient for determining the best model. Besides, Table 2 highlights five error measures, including the maximum displacement ratio (prediction/observation), used to evaluate various DL architectures in accurately capturing the seismic behavior of LRBs. The R factor ranges from 88.1 to 98.9% across the 20 unseen datasets in Table 1, demonstrating the proposed model’s ability to predict unseen displacement time histories. However, in some cases, the model struggles to capture the full displacement trend, with the lowest R factor being 88.1% for unseen data. It can be concluded that the proposed DL model faces challenges in accurately capturing the overall displacement time history trend, corresponding hysteresis curve, and achieving a higher R factor, particularly when LRBs experience lower axial forces and vertical ground accelerations. Notably, the lowest R factor is associated with Scenario 86 in Table 1 (see Fig. 11), which aligns with our previous finding14 discussed in Sect. 2 that excluding vertical acceleration increases errors in assessing maximum displacement during RTHS tests compared to the corresponding shaking table tests.

Evaluation of estimated versus empirical displacement time history and hysteresis curves for LRBs Using Unseen Northridge Data in RTHS, excluding vertical ground acceleration. (a) Displacement time histories (scenario 86). (b) Hysteresis curves (scenario 86).

An additional trial is performed to assess the effectiveness of the proposed deep learning model in predicting time series responses. This trial focuses on estimating the seismic responses of bridge piers subjected to RTHS and cyclic tests, incorporating both slow and fast loading rates. Importantly, the same deep learning framework and hyperparameter tuning are used throughout these tests, with the only variation being the number of training epochs. This adjustment is made because the pier data required longer training times, as indicated by monitoring metrics during the training and validation phases. As evident from Fig. 9b, c, the training and validation losses exhibited a downward trend, ultimately approaching nearly zero after 3000 epochs for the RTHS tests and 2000 epochs for the cyclic tests. Details of these tests can be found in Tables 4 and 5. Figure 12 offers further information on the predefined sinusoidal displacement time histories applied to the horizontal actuator, highlighting variations in loading rates during the fast and slow cyclic tests. The slow tests from 2017 are found to be 91 times slower than their fast counterparts, with discrepancies of 89 and 48 times in 2018 and 2022, respectively. Figures 13 and 14 display the hysteresis and dissipated energies linked to the unseen piers data for the RTHS and cyclic tests, respectively.

Displacement time histories for slow and fast tests. (a) Displacement time histories (2017). (b) Displacement time histories (2018). (c) Displacement time histories (2022).

Comparing the performance of the estimated and empirical displacement time series and hysteresis curves of bridge piers under RTHS tests. (a) Displacement time histories (scenario 25). (b) Hysteresis curves (scenario 25). (c) Displacement time histories (scenario 26). (d) Hysteresis curves (scenario 26). (e) Displacement time histories (scenario 28). (f) Hysteresis curves (scenario 28). (g) Displacement time histories (scenario 29). (h) Hysteresis curves (scenario 29).

Comparing the performance of the estimated and empirical displacement time series and hysteresis curves of bridge piers under slow and fast cyclic tests. (a) Displacement time histories (scenario 14). (b) Hysteresis curves (scenario 14). (c) Displacement time histories (scenario 15). (d) Hysteresis curves (scenario 15). (e) Displacement time histories (scenario 16). (f) Hysteresis curves (scenario 16). (g) Displacement time histories (scenario 17). (h) Hysteresis curves (scenario 17).

The results indicate that the proposed model effectively passed the verification tests, demonstrating its ability to predict the seismic responses of bridge piers. In terms of estimating dissipated energies for unseen bridges, the errors in scenarios 25, 26, 28, and 29 are recorded at 1.98%, 1.58%, 19.16%, and 10.78%, respectively. For the unseen slow and fast cyclic tests in scenarios 14, 15, 16, and 17, the error rates are 26.39%, 14.65%, 18.64%, and 1.54%. Although some scenarios exhibited notable discrepancies in energy estimation, the overall trend of the hysteresis curves is accurately captured, resulting in generally satisfactory predictions. It is worth emphasizing that the model hyperparameters, unchanged from those tuned in the RTHS tests of LRBs, may negatively impact bridge pier response predictions. To address this, two approaches are recommended: (1) preprocessing pier data with higher sampling frequency and using a larger window size in the stacking method; (2) maintaining the same architecture but increasing the number of samples for RTHS and cyclic tests. These solutions may reduce the number of epochs and save training and validation time. Further discussion is omitted for brevity.

To evaluate the training and validation runtime, a further investigation is conducted. As outlined in Table 6, the proposed deep model predicts the seismic responses for unseen (inference) data almost instantaneously. Furthermore, the training runtimes for the RTHS tests on LRBs, RTHS tests for piers, and cyclic tests for piers are roughly 29.952 min, 72.306 min, and 34.098 min, respectively. It should be highlighted that a desktop PC running TensorFlow version 2.10.1 is utilized; it successfully detects a single GPU, specifically an NVIDIA GeForce RTX 4060 Ti with a memory limit of 5682 MB. The system’s CPU is an AMD processor with 12 cores and a memory limit of 268 MB, complemented by 63.89 GB of RAM. The Python environment is based on version 3.9.7. Compatibility with TensorFlow 2.10.1 is ensured by employing CUDA version 11.2 and cuDNN version 8.1.0.1.

Figure 15 provides more insights into evaluating the performance of the proposed DL model when the piers are subjected to severe earthquakes and experience significant residual deformation. As illustrated, the pier underwent substantial permanent deformation during high-intensity Kobe (Scenario 15) and El Centro (Scenario 24) earthquakes (refer to Table 4). It is important to note that the RTHS test was halted during the Kobe scenario to ensure the safety of the testing process. Further verification regarding the residual displacement is found in Fig. 13 for scenario 29 (unseen data). Overall, the proposed DL model effectively manages the permanent displacement and accurately captures the displacement time histories, as well as the corresponding hysteresis curves and dissipated energies.

Assessment of the proposed DL model’s performance in handling significant residual displacement during an RTHS test under severe earthquake conditions. (a) Displacement time histories (scenario 15). (b) Hysteresis curves (scenario 15). (c) Displacement time histories (scenario 24). (d) Hysteresis curves (scenario 24).

This study introduces a deep learning framework featuring a self-attention-Mamba-driven transformer block, which demonstrates strong generalization capabilities with unseen data. The novel transformer block incorporates a multi-head attention layer and Mamba block to effectively manage long-term dependencies and nonlinearity in complex time series data, such as seismic responses of piers and LRB-isolated bridges. The Mamba block, enhanced by its S4 capacity, is augmented with a self-attention layer to prioritize the most critical features. To optimize the parameter count, a SeparableConv1D layer with causal padding is employed, making the model more lightweight. Additionally, TimeDistributed Dense and Dropout layers are integrated into the transformer block to improve learning of temporal data patterns and mitigate overfitting. Each input feature, including ground acceleration (GA), horizontal (Fa), and vertical (Fv1 and Fv2) actuator forces, is processed individually through a custom attention layer to emphasize the most significant features by assigning them higher weights. This processing for each head is implemented using the functional API of the Keras Python library. Additional input features, such as effective height (He), moment of inertia (I), and superstructure mass (M), are integrated into the model architecture after concatenating the other heads.

This study proposes several advanced techniques to enhance the performance of the deep learning model designed for seismic response prediction. Among these innovations are a hybrid loss function integrating MSE and MAE, and a novel activation function combining ELU and GELU. These enhancements effectively manage both negative and positive weights, thereby improving training, validation, and prediction accuracy on unseen data. Additionally, the model incorporates a 6th-order low-pass Butterworth filter for data preprocessing, employs data augmentation techniques, and utilizes an exponential learning rate scheduler alongside weight decay regularization. The optimization process is further supported by the Adamax optimizer and includes elements such as LayerNormalization, BatchNormalization, dropout layers, TimeDistributed Dense layers, AveragePooling1D, and causal Conv1D layers. The architecture itself features a stacked CNN-Bi-CuDNNLSTM design with skip connections to bolster performance. The CNN component integrates AveragePooling1D and SeparableConv1D layers to downsample and reduce parameter complexity, facilitating improved generalization and mitigating overfitting. Each aspect of the model design is meticulously crafted to capture long-term dependencies critical for accurate seismic response forecasting. For each input feature head, the skip connections occur between the final Bi-CuDNNLSTM layer and the following layers: the layer normalization of the Conv1D layer, the SeparableConv1D layer and its layer normalization, and the layer normalization of the second Bi-CuDNNLSTM layer. Furthermore, to expedite training, cuDNN, a GPU-accelerated library, is leveraged, optimizing computational efficiency throughout the training process. The findings of this research reveal that layer normalization significantly improves the performance of the proposed DL model.

Utilizing comprehensive experimental data from 95 RTHS tests on LRB-isolated bridges, the deep learning model has been rigorously trained, validated, and tested. Verification against shaking table tests, which consider vertical ground acceleration, reveals a mere 0.09% difference in maximum displacement. Over 1000 epochs, the proposed hybrid loss function exhibits a substantial decrease, approaching zero values in both training and validation datasets. Notably, the predicted time series responses for Northridge scenarios as the unseen dataset achieve an impressive correlation with experimentally measured values, ranging from 88.1 to 98.9%. The model effectively characterizes bearing behavior, encompassing both linear and nonlinear aspects (residual deformations) during RTHS tests. Furthermore, it accurately captures the time history responses, including hysteresis curves and dissipated energies, closely mirroring the experimental data trends.

Further validation of the model involves predicting the seismic responses of bridge piers using extensive data from 29 RTHS tests and 17 cyclic tests conducted at both slow and fast loading rates. The model architecture and hyperparameter tuning remain consistent with those used for the LRB-isolated bridge tests, with the only adjustment being an increase in the number of epochs, as monitoring training and validation losses indicate a need for additional learning. While some predictions do not precisely capture the dissipated energies, the overall trend of the hysteresis curves is effectively represented, making the predictions generally satisfactory.

By providing a quick and accurate assessment of the hysteretic behavior and dissipated energies of LRBs and piers, the model supports engineers in bridge design across various scenarios with improved efficiency. These findings affirm the model’s accuracy, reliability, and robustness in capturing both linear and nonlinear (focusing on low and high nonlinearity) dynamics of bridge components. This underscores its potential to enhance deep learning network performance while reducing labor and experimental costs, thereby minimizing the need for new manufacturing and testing.

From a practical point of view, consider an instrumented bridge where the performance of its LRB or pier needs assessment. Instead of testing the entire bridge on a shaking table, which is both expensive and often impractical, RTHS enables testing only the critical component (LRB or pier), while the rest of the structure is simulated numerically. During an earthquake, the numerical and experimental parts are integrated in real time at each time step. In RTHS tests, a horizontal actuator applies displacements to the LRBs or piers, simulating seismic forces based on numerical models. These displacements induce internal forces due to the stiffness and resistance of the component, which are measured by a load cell. An LVDT records displacements, and both devices feed data into a Data Acquisition System (DAQ) in real time, creating a comprehensive dataset of force and displacement over time. In actual bridge instrumentation, load cells and LVDTs are placed at critical points, such as the top of the pier where it connects to the superstructure, to measure reaction forces and displacements during seismic events. This setup provides accurate data on the dynamic response of LRBs or piers, enabling analysis, validation of deep learning models, and improvements in seismic design without repeated physical tests.

The authors outline several limitations of the proposed deep learning model:

Hyperparameters are tuned based on the current PC system’s capacity, including CPU, GPU, and RAM. A higher downsampling ratio is employed to manage the large database and avoid out-of-memory errors. This approach uses larger time steps for preprocessing, resulting in a lower window size of 10 for data stacking in this study. Access to a supercomputer would allow an increase in the number of units, filters, kernels, and the window size, potentially resulting in more accurate predictions. Of course, trade-offs are necessary to prevent overfitting.

Utilizing a supercomputer could enable the incorporation of a more diverse database, including various shapes of LRBs, such as rectangular and circular configurations, as well as different ages and temperature conditions.

We acknowledge that other severe earthquakes, especially those with pulse-like near-fault characteristics, may compromise the performance of the proposed DL model in accurately capturing large residual displacements. This limitation can be mitigated by conducting additional RTHS tests under extreme ground motions, allowing the model to enhance its learning capabilities in such challenging conditions.

The predictions generated by the model can be utilized to develop fragility curves, enabling the assessment of damage states in bridge structures.

The predictive model can be integrated into a digital twin framework for real bridge structures. This involves comparing predicted and actual behaviors, adjusting model parameters accordingly, and validating the updated model against real-world data. Such a model provides valuable real-time insights into the structural health monitoring of civil structures.

Data collection through multi-axis real-time hybrid simulations that incorporate both horizontal and vertical ground accelerations enhances the realism of structural dynamics, allowing for more accurate simulations of multi-directional loading scenarios.

Regularly retesting or reevaluating the model’s hyperparameters is recommended to adapt to changing data conditions and prevent model degradation.

The data underpinning this study are available from the corresponding author upon request.

Long, X., Zhou, Q., Ma, Y., Gui, S. & Lu, C. Displacement-based seismic design of SMA cable-restrained sliding lead rubber bearing for isolated continuous girder bridges. Eng. Struct. 300, 117179 (2024).

Shen, Y., Freddi, F., Li, Y. & Li, J. Parametric experimental investigation of unbonded post-tensioned reinforced concrete bridge piers under cyclic loading. Earthq. Eng. Struct. Dynamics. 51 (15), 3479–3504 (2022).

Chen, X., Ikago, K., Guan, Z., Li, J. & Wang, X. Lead-rubber-bearing with negative stiffness springs (LRB-NS) for base-isolation seismic design of resilient bridges: a theoretical feasibility study. Eng. Struct. 266, 114601 (2022).

Zhang, Y., Guo, Z., Liu, D. & Sun, W. Seismic response analysis of super-high-rise building structures with three-layer isolation systems. Sci. Rep. 13 (1), 19165 (2023).


Yang, D. et al. Quasi-static testing of UHPC cupped socket piers-footing connection and its seismic fragility analysis under near-fault ground motions. Sci. Rep. 14 (1), 10903 (2024).


Chae, Y., Lee, J., Park, M. & Kim, C. Y. Fast and slow cyclic tests for reinforced concrete columns with an improved axial force control. J. Struct. Eng. 145 (6), 04019044 (2019).

Chae, Y., Rabiee, R., Dursun, A. & Kim, C. Y. Real-time force control for servo‐hydraulic actuator systems using adaptive time series compensator and compliance springs. Earthq. Eng. Struct. Dynamics. 47 (4), 854–871 (2018).

Chae, Y., Kazemibidokhti, K. & Ricles, J. M. Adaptive time series compensator for delay compensation of servo-hydraulic actuator systems for real‐time hybrid simulation. Earthq. Eng. Struct. Dynamics. 42 (11), 1697–1715 (2013).

Zhang, B., Wang, K., Lu, G. & Guo, W. Seismic response analysis and evaluation of laminated rubber bearing supported bridge based on the artificial neural network. Shock Vib. 2021, 5566874 (2021).

Zhang, B., Wang, K., Lu, G., Qiu, W. & Yin, W. Experimental and seismic response study of laminated rubber bearings considering different friction interfaces. Buildings. 12 (10), 1526 (2022).

Guo, W., Wang, K., Yin, W., Zhang, B. & Lu, G. Research on seismic excitation direction of double-deck curved bridges: a probabilistic method based on the random forest algorithm. Struct. 39, 705–719 (2022).

Yazdanpanah, O., Chang, M., Park, M. & Chae, Y. Force-deformation relationship prediction of bridge piers through stacked LSTM network using fast and slow cyclic tests. Struct. Eng. Mech. 85 (4), 469–484 (2023).

Yazdanpanah, O., Chang, M., Park, M. & Kim, C. Y. Seismic response prediction of RC bridge piers through stacked long short-term memory network. Structures 45, 1990–2006 (2022).

Yazdanpanah, O., Chang, M., Park, M. & Mangalathu, S. Smart bridge bearing monitoring: Predicting seismic responses with a multi-head attention-based CNN-LSTM network. Earthq. Eng. Struct. Dyn. 1–25. https://doi.org/10.1002/eqe.4223 (2024).

Gu, A. & Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2024). https://arxiv.org/abs/2312.00752v2; https://github.com/state-spaces/mamba

Grootendorst, M. A visual guide to Mamba and state space models. https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state#%C2%A7the-convolution-representation

Cho, C. B., Chae, Y. & Park, M. Improved real-time force control for applying axial force to axially stiff members. Earthq. Eng. Struct. Dynamics. 53 (1), 331–347 (2024).

Nishi, T., Suzuki, S., Aoki, M., Sawada, T. & Fukuda, S. International investigation of shear displacement capacity of various elastomeric seismic-protection isolators for buildings. J. Rubber Res. 22, 33–41 (2019).

International standard ISO 22762-1:2018(E). Elastomeric seismic-protection isolators — Part 1: Test methods. https://www.iso.org/standard/70215.html

International standard ISO 22762-2:2018(E). Elastomeric seismic-protection isolators —Part 2: Applications for bridges —Specifications. https://www.iso.org/standard/70218.html

Chae, Y., Park, M., Kim, C. Y. & Park, Y. S. Experimental study on the rate-dependency of reinforced concrete structures using slow and real-time hybrid simulations. Eng. Struct. 132, 648–658 (2017).

Chae, Y., Lee, J., Park, M. & Kim, C. Y. Real-time hybrid simulation for an RC bridge pier subjected to both horizontal and vertical ground motions. Earthq. Eng. Struct. Dynamics. 47 (7), 1673–1679 (2018).

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

Zhang, R. et al. Deep long short-term memory networks for nonlinear structural seismic response prediction. Comput. Struct. 220, 55–68 (2019).

https://keras.io/api/keras_tuner/tuners/grid/

Clevert, D. A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015).

Hendrycks, D. & Gimpel, K. Gaussian error linear units (gelus). arXiv preprint arXiv.:1606.08415 (2016).

Mariani, S., Rendu, Q., Urbani, M. & Sbarufatti, C. Causal dilated convolutional neural networks for automatic inspection of ultrasonic signals in non-destructive evaluation and structural health monitoring. Mech. Syst. Signal Process. 157, 107748 (2021).

Karami, R., Yazdanpanah, O., Dolatshahi, K. M. & Chang, M. Hybrid neural network empowered by differencing loss function for structural response history prediction using input excitation and roof acceleration. Eng. Appl. Artif. Intell. 136, 108984 (2024).

https://keras.io/api/layers/pooling_layers/average_pooling1d/

https://keras.io/api/layers/convolution_layers/separable_convolution1d/

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications.arXiv preprint arXiv:1704.04861(2017).

Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45 (11), 2673–2681 (1997).

The NVIDIA CUDA® Deep Neural Network library (cuDNN-11.2). (2021). https://developer.nvidia.com/cudnn

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition 770–778 (2016).

https://keras.io/api/layers/merging_layers/add/

https://keras.io/api/layers/normalization_layers/batch_normalization/

https://keras.io/api/layers/normalization_layers/layer_normalization/

https://keras.io/api/layers/regularization_layers/dropout/

https://keras.io/api/layers/recurrent_layers/time_distributed/

Ng, A. & Deep Learning Specialization Course 2, Week 2, Optimization Methods; Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization. An online non-credit course authorized by DeepLearning.AI and offered through Coursera, Stanford University. (2023). https://www.coursera.org/specializations/deep-learning

https://keras.io/api/optimizers/adamax/

https://keras.io/api/callbacks/early_stopping/

https://keras.io/api/models/model_training_apis/

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. Attention is all you need. Advances in neural information processing systems 30 (2017).

Waleffe, R., Byeon, W., Riach, D., Norick, B., Korthikanti, V., Dao, T., et al. An Empirical Study of Mamba-based Language Models. arXiv preprint arXiv:2406.07887(2024).

https://keras.io/api/layers/attention_layers/attention/

Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. arXiv Preprint arXiv :171005941 (2017).

Gu, A., Dao, T., Ermon, S., Rudra, A. & Ré, C. Hippo: recurrent memory with optimal polynomial projections. Adv. Neural. Inf. Process. Syst. 33, 1474–1487 (2020).

https://keras.io/api/layers/attention_layers/multi_head_attention/

Acknowledgements

The authors gratefully acknowledge the financial support provided by the Creative Challenge Research project funded by the Ministry of Science and ICT through the National Research Foundation of Korea (NRF) (RS-2023-00241807), and by the Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (2022R1A6C103B771). The authors also appreciate the support provided by the engineers and laborers at the Hybrid Structural Testing Center (Hystec), Myongji University, Yongin-si, South Korea, in preparing the experimental setups.

Author information

Affiliations:

Hybrid Structural Testing Center (Core Research Center for Smart Infrastructure), Myongji University, Yongin-si, Republic of Korea: Omid Yazdanpanah & Minseok Park

Department of Civil and Environmental Engineering, Myongji University, Yongin-si, Republic of Korea: Minwoo Chang

Department of Civil and Environmental Engineering, Seoul National University, Seoul, Republic of Korea: Yunbyeong Chae

Author contributions

Omid Yazdanpanah: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration.

Minseok Park: Methodology, Software, Validation, Formal analysis, Resources, Data Curation, Writing - Review & Editing, Visualization.

Minwoo Chang: Methodology, Validation, Formal analysis, Resources, Data Curation, Writing - Review & Editing, Visualization.

Yunbyeong Chae: Methodology, Software, Formal analysis, Resources, Data Curation, Writing - Review & Editing, Visualization.

Competing interests

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material for this article is available in the online version.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Yazdanpanah, O., Park, M., Chang, M. et al. Mastering seismic time series response predictions using an attention-Mamba transformer model for bridge bearings and piers across varied testing conditions. Sci Rep 14, 29751 (2024). https://doi.org/10.1038/s41598-024-79195-4
