Constant force grinding controller for robots based on SAC optimal parameter finding algorithm


Scientific Reports volume 14, Article number: 14127 (2024)

Since conventional PID (Proportional–Integral–Derivative) controllers can hardly keep the robot grinding with a stable constant force under changing environmental conditions, a compensation term must be added to the conventional PID controller. An optimal parameter finding algorithm based on SAC (Soft Actor-Critic) is proposed to solve the problem that the compensation term parameters are difficult to obtain; it includes training state and action design with normalization preprocessing, reward function design, and a targeted deep neural network design. The algorithm finds the optimal controller compensation term parameters, which are applied to the PID controller, and the compensation is completed through the inverse kinematics of the robot to achieve constant force grinding control. To verify the algorithm's feasibility, a simulation model of a grinding robot with perceptible force information is established; the simulation results show that the controller trained with the algorithm can achieve constant force grinding. Finally, a robot constant force grinding experimental system platform is built for testing, which verifies the control effect of the optimal parameter finding algorithm on robot constant force grinding and shows a certain degree of environmental adaptability.

Traditional manual grinding not only yields uneven grinding quality and low efficiency but also generates dust during the grinding process, which is hazardous to the physical and mental health of workers; robotic grinding can replace manual grinding and achieve automation1,2. Maintaining a constant grinding force during robotic grinding not only ensures the accuracy and consistency of the surface quality of the workpiece but also reduces wear on the robot and end-effector3. Therefore, robotic constant force grinding has become a hot research topic.

Robot constant force grinding control falls into two categories: passive compliance control and active compliance control. Passive compliance control passively changes the contact force by installing a passive compliance device at the end of the robot. Wang et al. proposed a new passive compliance control method for overcoming the influence of fluctuating normal grinding force on the precision abrasive belt grinding of nickel-based high-temperature alloy blades and proved through experiments that the method can improve the control accuracy of the normal grinding force4. However, passive compliance control has no force feedback and cannot perform precise grinding force control, so active compliance control using force sensors for force feedback has become the research focus. Active compliance control has mainly gone through three stages: impedance control, adaptive control, and intelligent control. Impedance control adds a mass-damping-spring system at the robot's end, which establishes the relationship between the contact force and the end position. However, in practice, precise constant force control cannot be achieved through impedance control alone. Therefore, Wang et al. proposed an adaptive variable impedance control algorithm with pre-PD adjustment to realize compliant grinding and polishing of complex surfaces, and the stability and convergence of the force control algorithm were proved by comparative experiments5. Li et al., building on traditional impedance control, proposed a position-based force-tracking adaptive impedance control strategy to improve the grinding quality of complex surfaces of aero-engine parts6.

On the other hand, adaptive control can adapt to changes in the environment. Zhao et al. proposed an adaptive constant force controller based on stiffness estimation, which realizes constant force grinding of unknown workpiece contours and ensures grinding consistency7. Zhang et al. proposed a trajectory compensation method based on the co-kriging spatial interpolation method to reduce the grinding trajectory deviation caused by the robot's limited absolute positioning accuracy, together with a trajectory compensation method based on one-dimensional force tracking. Meanwhile, an adaptive iterative constant force control method based on a one-dimensional force sensor was proposed to improve the processing quality and efficiency of robot belt grinding3.

However, the adaptive capacity of adaptive control is tied to its intrinsic parameters, and if the operating conditions exceed the range of those parameters, the system loses balance. With the continuous development of artificial intelligence, more and more intelligent algorithms, such as fuzzy control, neural networks, and reinforcement learning, are applied to robot constant force control8, each with its own advantages. The fuzzy control algorithm has good robustness; when there are disturbances or parameter changes in the system, it still achieves a good control effect. Zhang et al. developed a hybrid force/position anti-disturbance control strategy based on fuzzy PID control to improve the grinding quality of aerospace blades9. Shen et al. proposed a fuzzy-based adaptive impedance control that can grind or polish workpieces of different materials with constant contact force10. Neural network algorithms are robust, fault-tolerant, and can approximate any complex nonlinear relationship. Conventional PID controllers struggle with complex parameters and nonlinear characteristics; Zhu et al. developed a PID controller based on a fuzzy neural network algorithm for simultaneous trajectory and constant contact force tracking and proved its effectiveness through simulation results11. Hamedani et al. proposed an intelligent impedance control strategy based on wavelet neural networks to adapt the impedance parameters to a variable environment, and the experimental results proved that the method has better force tracking performance than general impedance control with constant parameters12. Reinforcement learning has clear advantages in complex system control, solving nonlinear problems, and finding optimal parameters, so it is widely used in fields such as neurobiology13, autonomous driving14, and robotics15. Zhang et al. proposed an online optimization method for constant force grinding controller parameters based on the deep reinforcement learning algorithm Rainbow; the technique can optimize the control parameters online while converging stably, solving the constant force control problem in the grinding process16. Zhang et al. also proposed a reinforcement learning-based robot force control algorithm to solve the problem that the contact force of the robot end-effector is difficult to keep constant when the robot tracks an unknown curved workpiece17.

Since the conventional PID controller cannot keep the robot grinding with a stable constant force under environmental changes, a compensation term needs to be added to the conventional PID controller. To solve the problem that the compensation term parameters are difficult to obtain, an optimal parameter finding algorithm based on SAC (Soft Actor-Critic) is proposed, including training state and action design with normalization preprocessing, reward function design, and a targeted deep neural network design. The algorithm is used to find the optimal compensation term parameters, which are applied to the PID controller, and the compensation is accomplished through the robot's inverse kinematics for constant force grinding control. The innovations of this paper are as follows.

① An optimal parameter finding algorithm based on SAC is proposed to obtain the optimal controller compensation term parameters; a deep neural network is used to fit the relationship between the grinding force during training and the end-effector compensation displacement, and the compensation is accomplished through the robot's inverse kinematics. The trained deep neural network is then used as the robot constant force grinding controller.

② A robot grinding simulation platform with perceptible force information was built to verify the feasibility and adaptability of the robot constant force controller.

③ The robot constant force grinding experimental system platform is built, and bench tests are conducted to verify the control effect of the SAC-based optimal parameter finding algorithm on robot constant force grinding.

The main contents of this paper are organized as follows. In the first part, the mechanical model of the end-effector is designed. In the second part, the SAC-based algorithm for finding the optimal compensation term parameters is analyzed in detail. In the third part, the robotic grinding simulation platform with force sensing is designed and simulation experiments are carried out. The bench test on the robotic grinding system platform is carried out in the fourth part, and the conclusion is given in the fifth part.

Figure 1 shows the force analysis of the end-effector when the robot grinds flat and curved surfaces, respectively. The positional relationship between the sensor coordinate system {S} and the workpiece coordinate system {C} is established according to the model during grinding. The origin of the sensor coordinate system is the geometric center of the sensor, its Z-axis is the axial direction of the sensor, its X-axis is the radial direction of the sensor, and its Y-axis is determined according to the right-hand rule. The origin of the workpiece coordinate system is the geometric center of the workpiece, its X-axis is the radial direction of the workpiece, its Z-axis is the axial direction of the workpiece, and its Y-axis is determined according to the right-hand rule. The X-axes of the two coordinate systems are kept parallel during grinding.

Assuming that \(F_{t}\) and \(F_{n}\) are the tangential and normal grinding forces in the workpiece coordinate system, respectively, and \(F_{t^{\prime}}\) and \(F_{n^{\prime}}\) denote these forces transferred to the sensor coordinate system, then \(F_{t^{\prime}} = F_{t}\) and \(F_{{{\text{n}}^{\prime}}} = F_{n}\). When a grinding operation is performed, the normal force \(F_{n}\) is the main quantity that affects the grinding effect.

Since the force sensor is mounted between the end of the robot and the grinding tool, its measured value \(F_{s}\) includes not only the normal grinding force \(F_{n}\) at the grinding end but also its own gravity \(G\) and inertia force \(F_{l}\) , as shown in Eq. (1), and therefore needs to be gravity compensated.
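From this description, the sensor reading decomposes as the sum of the three contributions; written in the sensor frame, this is the content of Eq. (1):

$$F_{s} = F_{n} + G + F_{l}$$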

Since the grinding process is at a constant speed and the acceleration is zero, the inertial force \(F_{l}\) is very small and can be ignored. Here, the primary consideration is compensating the sensor measurement for gravity. The grinding robot end-effector is manually adjusted to a vertical downward position without contact with the grinding workpiece, at which point \(F_{n}\) is 0. As the direction of gravity \(G\) is always vertically downward, opposite to the z-axis of the base coordinate system, it can be expressed as \({}^{b}F_{G} = [0,0, - G_{T} ]^{T}\) in base coordinates. When the robot arm changes posture, the value in base coordinates can be converted to the value in sensor coordinates using the rotation matrix \({}_{b}^{s} R\), i.e., \({}^{s}F_{G} = {}_{b}^{s} R \times {}^{b}F_{G}\), where \({}^{s}F_{G}\) is the contribution of gravity in the sensor coordinate system. The grinding force in the sensor coordinate system \({}^{s}F_{n}\) is therefore the measured value minus \({}^{s}F_{G}\), i.e., \({}^{s}F_{n} = F_{s} - {}^{s}F_{G}\).
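As a concrete illustration of this gravity compensation step, the minimal sketch below (an assumption-laden example, not the authors' code; the variable names R_b_to_s, G_T, and F_s are illustrative) computes \({}^{s}F_{G}\) from a known rotation matrix and subtracts it from the sensor reading:

```python
import numpy as np

def compensate_gravity(F_s, R_b_to_s, G_T):
    """Remove the end-effector gravity from a force-sensor reading.

    F_s      : (3,) force measured in the sensor frame {S}
    R_b_to_s : (3, 3) rotation matrix from the base frame {B} to {S}
    G_T      : weight of the tool/actuator hanging below the sensor [N]
    """
    bF_G = np.array([0.0, 0.0, -G_T])   # gravity expressed in the base frame
    sF_G = R_b_to_s @ bF_G               # gravity expressed in the sensor frame
    sF_n = F_s - sF_G                    # grinding force after compensation
    return sF_n

# Example: tool weighing 20 N, sensor frame aligned with the base frame
print(compensate_gravity(np.array([0.5, 0.0, -21.0]), np.eye(3), 20.0))
```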

To solve the problem that the compensation term parameters are difficult to obtain for the traditional PID controller, a deep reinforcement learning algorithm is introduced to find the optimal compensation term parameters. Deep reinforcement learning does not need the controller parameters to be adjusted manually; it obtains the optimal compensation term parameters through the algorithm's own self-learning.

The target force \(F_{d}\) is compared with the current normal force \(F_{n}\) , and the difference between the two is converted into a compensating displacement of the end-effector by the force controller, which is adjusted by the robot inverse kinematics, thus indirectly controlling the contact force between the robot and the environment. The PID controller is chosen to model the relationship between the robot end force error and the robot control parameters due to its simple structure and ease of tuning the parameters.

As the grinding process progresses, the external environment may change, and to better accommodate such changes, a compensation amount is added to the conventional force controller with the equation shown in (2).

where \(ef(t)\) is the error between the normal force and the target normal force at time t; \(ecf(t)\) is the rate of change of the force error at time t; \(k_{p}\) and \(k_{d}\) are the fixed scale parameters, which keep the robot from overloading while it is in contact with the workpiece; \(\delta_{k}\) and \(\zeta_{k}\) are the compensation amount parameters; \(sgn\) is the sign function, taking the value − 1 or 1. Once the fixed scale parameters are set, the compensation term parameters are obtained by reinforcement learning.
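One plausible reading of Eq. (2), consistent with the terms listed above, is that the compensation parameters are added to the fixed gains before the PD law acts on the force error. The sketch below encodes only that assumed form for illustration; the exact placement of the sgn term in the authors' equation may differ.

```python
def compensation_displacement(ef, ecf, k_p, k_d, delta_k, zeta_k):
    """Assumed form of the compensated force controller (Eq. 2).

    ef, ecf         : normal-force error and its rate of change
    k_p, k_d        : fixed proportional/derivative gains
    delta_k, zeta_k : compensation parameters found by the SAC agent
    Returns the end-effector displacement correction along the surface normal.
    """
    sgn = 1.0 if ef >= 0 else -1.0  # sign of the force error
    return (k_p + delta_k * sgn) * ef + (k_d + zeta_k * sgn) * ecf
```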

Manually adjusting the compensation amount requires a lot of time and effort; reinforcement learning, however, acquires the final control parameters through its self-learning capability without the need to adjust the compensation parameters manually. Reinforcement learning is therefore used to learn the control parameters during the grinding process and complete the constant force grinding work. Reinforcement learning aims to maximize the expectation of the cumulative reward. In robot constant force grinding control, the SAC algorithm continuously iterates on the policy to improve the selection of control parameters, and the control parameters are considered optimal when the expectation of the cumulative reward is maximized. The grinding control schematic is shown in Fig. 2. First, the normal force \(F_{n}\) is read by the force sensor and compared with the target force \(F_{d}\); the force error, the rate of change of the force error, and the controller compensation parameters found by the SAC algorithm are input into the force controller; the force controller outputs the compensation displacement, and the robot's inverse kinematics completes the compensation.

Schematic diagram of robot constant force grinding control based on SAC's optimal parameter finding algorithm.

The SAC algorithm is built on the Actor-Critic network architecture and consists of an Actor network, two Critic networks, and two Critic target networks. The Critic networks evaluate the action chosen by the Actor network in the current state, while the Actor network improves its action selection based on the Critic's evaluation, saving \(\{ s_{t} ,a_{t} ,s_{t + 1} ,r\}\) in the experience pool, which is used to update the network parameters. The SAC algorithm structure is shown in Fig. 3.

Structure of the SAC algorithm.

As shown in the figure above, the Critic networks are updated first. When the environment state \(s_{t}\) and the action \(a_{t}\) selected in that state enter the Critic networks, the two networks output different Q values; these Q values represent the value of taking the action \(a_{t}\) in the state \(s_{t}\). The Critic networks are updated backward using the action entropy generated by the Actor network, the target value \(Q_{s}\) generated by the Critic target networks, and the reward \(r\); \(Q_{s}\) allows the Critic networks to consider both the action entropy and the reward \(r\). The Actor network is then updated using the KL divergence between the actions selected by the Actor network and the Q values generated by the Critic networks.

Usually, the goal of reinforcement learning is to maximize cumulative reward, as shown in Eq. (3).

In contrast, the objective of the SAC algorithm is to maximize the cumulative reward with entropy, as shown in Eq. (4).

where \(\alpha\) is the temperature coefficient that determines the importance of entropy relative to the reward, and thus the degree of randomness of the control strategy; H is the entropy function; \(s_{t}\) and \(a_{t}\) are the state at time t and the action selected in that state, respectively; and r is the reward obtained.
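For reference, the standard forms of these two objectives, to which Eqs. (3) and (4) presumably correspond, are:

$$J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} r(s_{t}, a_{t})\Big], \qquad J_{\mathrm{SAC}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} r(s_{t}, a_{t}) + \alpha H\big(\pi(\cdot \mid s_{t})\big)\Big]$$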

The entropy function is calculated as shown in Eq. (5).

where \(s_{t + 1}\) is the state at the next time step; \(a_{t + 1}\) is the action selected in the state at the next time step; and \(\pi (a_{t + 1} |s_{t + 1} )\) is the probability of taking the action \(a_{t + 1}\) in the state \(s_{t + 1}\).
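Written in its standard form, consistent with these definitions, the entropy is the expected negative log-probability of the policy:

$$H\big(\pi(\cdot \mid s_{t+1})\big) = \mathbb{E}_{a_{t+1} \sim \pi}\big[-\log \pi(a_{t+1} \mid s_{t+1})\big]$$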

The action value function \(Q_{s}\) for the Critic target network shown in Fig. 3 is calculated from the action entropy and the action taken by the system at the next time step, as shown in Eq. (6).

where \(V(s_{t + 1} )\) is the state value function under the state \(s_{t + 1}\) , \(\phi_{tj}\) is the parameter for the two Critic target networks, \(\min Q_{{\phi_{tj} }} (s_{t + 1} ,a_{t + 1} )\) is the minimum value at the output of the two Critic target networks, which prevents overestimation of the estimated value.
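In the usual SAC formulation, which the description above matches, the target value combines the reward with the entropy-regularized minimum target-network estimate (with \(\gamma\) the discount factor):

$$Q_{s} = r + \gamma\Big(\min_{j=1,2} Q_{\phi_{tj}}(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1})\Big)$$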

The SAC algorithm contains two Critic networks with the same structure as the two Critic target networks. In the algorithm, the two Critic networks each evaluate the selected action, and the difference from \(Q_{s}\) is calculated; this difference is the loss value, calculated as shown in Eq. (7).

where \((s_{t} ,r,s_{t + 1} ,a_{t} )\sim D\) are the variables sampled from the experience pool D; \(\phi_{i}\) is the weight of the ith Critic network; and \(Q_{{\phi_{i} }} (s_{t} ,a_{t} )\) is the ith Critic network's evaluation, using the weights \(\phi_{i}\), of taking the action \(a_{t}\) in the environment state \(s_{t}\).
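In standard SAC this loss is the mean-squared error between each Critic's estimate and the target value \(Q_{s}\):

$$L(\phi_{i}) = \mathbb{E}_{(s_{t}, a_{t}, r, s_{t+1}) \sim D}\Big[\big(Q_{\phi_{i}}(s_{t}, a_{t}) - Q_{s}\big)^{2}\Big]$$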

As the Critic networks are continuously updated, their ability to evaluate the actions selected by the Actor network improves, and their estimates of \(Q_{s}\) approach the true value. The Actor network is then updated using the KL divergence, see Eq. (8).

where \(\theta\) is the weight of the Actor network and \(\mathop {\min }\limits_{{}} Q_{{\phi_{j} }} (s_{t} ,a^{\prime}_{t} )\) is the minimum of the outputs of the two Critic networks, which prevents overestimation.
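The corresponding standard Actor objective, obtained from the KL divergence between the policy and the soft Q-function, reduces to:

$$L(\theta) = \mathbb{E}_{s_{t} \sim D,\; a^{\prime}_{t} \sim \pi_{\theta}}\Big[\alpha \log \pi_{\theta}(a^{\prime}_{t} \mid s_{t}) - \min_{j=1,2} Q_{\phi_{j}}(s_{t}, a^{\prime}_{t})\Big]$$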

The Critic target networks are updated softly with the scaling parameter \(\rho\), as shown in Eq. (9).
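This soft (Polyak) update is conventionally written as:

$$\phi_{tj} \leftarrow \rho\,\phi_{j} + (1 - \rho)\,\phi_{tj}, \qquad j = 1, 2$$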

The SAC algorithm pseudo-code is shown in Table 1.

In the SAC algorithm environment, the Agent is the robot, and the environment is the robot grinding environment where force information can be sensed.

In this SAC-based constant force grinding control algorithm, the state s is designed as the normal force error \(ef\) and the rate of change of the normal force error \(ecf\) .

The output actions are the force controller compensation parameters \(\delta_{k}\) and \(\zeta_{k}\) .

The goal of training is to make the current normal force reach the target normal force; the smaller the difference between the current normal force and the target normal force, the higher the reward obtained. The reward function used in designing the deep reinforcement learning SAC algorithm for constant force grinding control of the robot can therefore be described as:
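A minimal sketch of a reward consistent with this description, assuming it is simply the negative absolute force error (the exact expression used by the authors may differ), is:

```python
def reward(F_n, F_d):
    """Assumed reward: the closer the normal force is to the target, the higher the reward."""
    return -abs(F_n - F_d)
```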

In order to accelerate the convergence of the SAC model, the data input to the model are normalized. The input states (force error and force error rate) and the controller compensation parameters of the deep neural network are each divided by their corresponding upper limits so that each element has a value domain of [− 1, 1] before entering the algorithm training; the normalized input state is denoted s_norm and the normalized output action is denoted a_norm.
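The sketch below illustrates this preprocessing; the upper limits used here are placeholders, not the values used by the authors.

```python
import numpy as np

# Assumed upper limits for the force error, error rate, and compensation parameters
EF_MAX, ECF_MAX = 10.0, 50.0
DELTA_MAX, ZETA_MAX = 1.0, 1.0

def normalize_state(ef, ecf):
    """Scale the state into [-1, 1] before it enters the SAC networks (s_norm)."""
    s = np.array([ef / EF_MAX, ecf / ECF_MAX])
    return np.clip(s, -1.0, 1.0)

def denormalize_action(a_norm):
    """Map the network output a_norm in [-1, 1] back to controller parameters."""
    delta_k = a_norm[0] * DELTA_MAX
    zeta_k = a_norm[1] * ZETA_MAX
    return delta_k, zeta_k
```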

According to the needs of the algorithm, the Actor and Critic networks in the SAC neural network are designed separately, and the Critic network consists of two target networks and two prediction networks.

The Actor network has four layers; the numbers of nodes in its hidden layers are 256, 128, and 128, respectively; its output is the probability distribution of the action and the action entropy; and the activation function used between the hidden layers is ReLU. The Critic network likewise has hidden layers with 256, 128, and 128 nodes; its output is the Q-value, and the activation function used between the hidden layers is ReLU. The structure of the SAC neural network is shown in Fig. 4.

Structure of SAC neural network.
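As an illustrative reconstruction of the layer sizes just described (not the authors' implementation; the Gaussian output parameterization and log-std bounds are assumptions), the two networks could be written in PyTorch as:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy: state (ef, ecf) -> mean and log-std of the 2-D action."""
    def __init__(self, state_dim=2, action_dim=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.mean = nn.Linear(128, action_dim)
        self.log_std = nn.Linear(128, action_dim)

    def forward(self, state):
        h = self.body(state)
        mean = self.mean(h)
        log_std = self.log_std(h).clamp(-20, 2)  # assumed bounds for numerical stability
        return mean, log_std

class Critic(nn.Module):
    """Q-network: (state, action) -> scalar value, same 256/128/128 hidden sizes."""
    def __init__(self, state_dim=2, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```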

The Critic network evaluates the value of the action under the current normal force error, and the compensation term parameters selected by the Actor network are corrected according to the Critic's evaluation. The SAC algorithm requires only the Actor network output at inference time; subsequent updates to the networks use the data \((s_{t} ,a_{t} ,r,s_{t + 1} )\) stored in the memory pool D.

CoppeliaSim is selected as the simulation software platform. CoppeliaSim contains many encapsulated robots for easy use and supports Bullet and other physics engines. The physics engines give the established simulation model mechanical properties and allow it to simulate the real environment more accurately. At the same time, CoppeliaSim includes a remote API that allows communication with languages such as Python, solving the problem that the Lua language built into CoppeliaSim cannot build neural networks and enabling co-simulation. In addition, CoppeliaSim provides an inverse kinematics module based on numerical iteration, which allows the grinding force control algorithm to directly output and execute control actions on the end tool pose.

The force-sensing grinding robot uses a Franka robot encapsulated in the simulation platform. The six-dimensional force sensor is mounted at the end flange of the robot and is used to obtain the grinding forces and grinding torques along the x, y, and z axes of the sensor. The mechanical characteristics are then set, a rotary joint is configured to drive the end grinding tool for grinding operations, and the configured end grinding actuator is installed below the six-dimensional force sensor at the robot end to become part of the robot grinding simulation platform. It should be noted that the robot end flange, the six-dimensional force sensor, and the end grinding actuator need to share the same Z coordinate. The force-sensing robot grinding simulation model is shown in Fig. 5.

Force-aware robot grinding simulation model.
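For illustration, the snippet below sketches how such a six-dimensional force sensor could be read from Python over CoppeliaSim's legacy remote API; the module import path, port, and object name 'Force_sensor' are assumptions tied to a particular installation and scene, not details given in the paper.

```python
import time
import sim  # CoppeliaSim legacy remote API Python bindings

# Connect to a CoppeliaSim instance with the remote API server listening on port 19999
client_id = sim.simxStart('127.0.0.1', 19999, True, True, 5000, 5)
assert client_id != -1, "could not connect to CoppeliaSim"

# 'Force_sensor' is an assumed object name in the simulation scene
_, sensor = sim.simxGetObjectHandle(client_id, 'Force_sensor', sim.simx_opmode_blocking)

# First call starts streaming; later calls read the latest value from the buffer
sim.simxReadForceSensor(client_id, sensor, sim.simx_opmode_streaming)
time.sleep(0.1)
_, state, force, torque = sim.simxReadForceSensor(client_id, sensor, sim.simx_opmode_buffer)
print('force (x, y, z):', force, 'torque:', torque)

sim.simxFinish(client_id)
```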

The curved-surface workpiece is imported into CoppeliaSim and its properties are set, and the initial grinding trajectory is set so that the robot end moves along it. The plan view of the workpiece and the initial trajectory are shown in Fig. 6.

Surface workpiece and grinding initial trajectory.

This simulation only investigates the grinding process. Before the experiment starts, the robot end-effector is moved above the initial grinding point so that it is in contact with the grinding workpiece and the six-dimensional sensor acquires the initial force information, after which the training experiment can be carried out. The flow chart of the constant force grinding control based on the SAC algorithm is shown in Fig. 7.

Flowchart of constant force grinding control based on SAC optimal parameter finding algorithm.

The target normal force is first set, followed by initialization of the robot position; the robot then interacts with the environment to generate the grinding force, and the data are recorded. After force information processing, the force error \(ef\) and force error rate \(ecf\) are input into the neural network of the SAC algorithm, whose output is the force controller compensation parameters; these pass through the force controller to obtain the compensation displacement of the robot end-effector, and the robot inverse kinematics completes the compensation. The cycle continues until the end of the training, and the neural network is saved after training.
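A compressed sketch of this per-episode loop is given below, reusing the helper sketches above; the agent object and the functions read_normal_force and apply_displacement are hypothetical stand-ins for the SAC implementation and the simulation interface, and k_p, k_d are the fixed gains of Eq. (2).

```python
# Sketch of the control/training loop from Fig. 7; interface names are hypothetical.
F_d = 1.0          # target normal force used in simulation [N]
k_p, k_d = 0.01, 0.001  # illustrative fixed gains
prev_ef = 0.0

for step in range(200):                        # 200 steps per training episode
    F_n = read_normal_force()                  # processed sensor value
    ef = F_d - F_n
    ecf = ef - prev_ef
    prev_ef = ef

    s_norm = normalize_state(ef, ecf)
    a_norm = agent.select_action(s_norm)       # SAC Actor output in [-1, 1]
    delta_k, zeta_k = denormalize_action(a_norm)

    dz = compensation_displacement(ef, ecf, k_p, k_d, delta_k, zeta_k)
    apply_displacement(dz)                     # inverse kinematics inside the simulator

    r = reward(F_n, F_d)
    agent.store(s_norm, a_norm, r)             # experience pool, then agent.update()
```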

The SAC neural network is trained for robot constant force grinding control. The total number of training episodes is set to 100, and each episode consists of 200 training steps, during which the robot and the workpiece interact and the data obtained from the interaction are stored in the experience pool as a time series. Training is considered complete when the deep network converges to a steady state.

The trained neural network is used for finding the optimal parameters of the robot force controller; the inputs are the normalized force error \(ef\) and force error rate \(ecf\), and the outputs are the compensation parameters \(\delta_{k}\) and \(\zeta_{k}\) of the force controller. The compensation is achieved by controlling the robot's joint angles obtained through the inverse kinematics of the robot. The specific parameters of the SAC algorithm are shown in Table 2.

The SAC algorithm-based constant force grinding control for robots is used for robotic grinding of surfaces. Firstly, for comparison purposes, the force controller compensation parameters were not obtained using the SAC algorithm but rather using fixed compensation parameters. Figure 8 shows the variation of the normal force using the fixed compensation parameters \(\delta_{k}\)  = 0 and \(\zeta_{k}\)  = 0.2.

Variation of normal force using fixed parameters.

In the SAC algorithm-based robot constant force grinding, the system state is the normal force error and normal force error rate, so only the normal force change needs to be observed. The graph shows that when fixed compensation parameters are used, the normal force is unstable, fluctuates considerably, and cannot converge to the target normal force; this cannot meet the grinding requirements, so force control is needed.

The training for the constant force grinding control was executed according to Fig. 7. After the deep neural network converged and was saved, the total reward graph of the robot surface grinding control obtained at the end of training is shown in Fig. 9. From the graph, it can be seen that the total reward kept fluctuating before training episode 40, while the SAC algorithm was still exploring through trial and error; after episode 40, the total reward kept increasing as the number of training episodes grew and finally stabilized at around − 10. The results show that the gap between the current normal force and the target normal force shrank and gradually approached the target grinding force, eventually fluctuating around the target normal force, indicating that the trained deep neural network model has converged to a stable state.

Total reward diagram of the SAC algorithm for surface grinding control.

The reward and normal force diagrams with 20, 40, 60, and 80 training sessions were selected for comparison, and the comparison is shown in Fig. 10.

Iteration diagram of the SAC algorithm normal force and reward function.

The green, yellow, blue, and red curves in the graph represent the change in normal force and reward value for the 20th, 40th, 60th, and 80th training episodes, respectively. In the 60th and 80th episodes, the normal force quickly reached the target normal force and kept fluctuating around it; even when there were sudden changes in the normal force, it could quickly recover to around the target value. This indicates that as the number of iterations increased, the normal force error tended to decrease and stabilize, and the total reward increased with the number of iterations.

Taking a set of experimental results after convergence, the normal force change and reward function change for the converged experimental results are shown in Fig. 11. Comparing Fig. 11 with Fig. 8, it can be concluded that the controller compensation parameters found using the SAC algorithm can keep the normal force at the target normal force and can quickly adjust the normal force back to the target normal force when there is a sudden change in force.

Plot of normal force variation and reward function variation for the SAC algorithm.

To validate the robot constant force grinding intelligence algorithm, a robot constant force grinding experimental system platform is built. The platform consists of a UR5 robot, a robot control cabinet, force sensors, end-effectors, motor controllers, fixtures and grinding tools, and a PC, which performs the grinding task and the reading and processing of force information. The robot constant force grinding experimental system platform is shown in Fig. 12. The constant force grinding control scheme based on UR5 robot is shown in Fig. 13.

The robot constant force grinding experimental system platform.

Diagram of a constant force solution for grinding control based on the UR5 robot.

After fixing the workpiece with the fixture, the initial grinding trajectory was planned using the teach pendant, and the UR5 robot was controlled to move the end grinding device to the initial grinding point. After communication was established, the UR5 robot was controlled by the host computer running the algorithm, and the speed of the grinding head was kept constant during the experiment. The grinding process is shown in Fig. 14.

In the experiment, the "adj zero" function in the six-dimensional force sensor acquisition software iDAS R&D is used to eliminate the influence of gravity and gravitational moment. Then the force data are synchronized into Python for coordinate conversion and Kalman filtering, and then into the algorithm after processing.

(a) Variation of grinding normal force using fixed compensation parameters. (b) Variation of normal force using the SAC algorithm.

The comparison between simulation and experimental data is shown in Table 4. Because of the small weight of the end-effector in simulation, the target normal force during simulation was 1 N. From Table 4, it can be seen that both the simulated and experimental normal forces achieved the target normal force, but the average value of the simulated normal force was smaller than the actual average value. This is because the experiment could not fully reproduce the ideal simulation environment.

To verify the algorithm's ability to adapt to the environment, protrusions and grooves are added to the workpiece in the grinding experiment, and the resulting normal force change graph is shown in Fig. 16. As can be seen in the figure, after hitting a disturbance during grinding, the normal force undergoes a sudden change, followed by a rapid adjustment that converges back to the target normal force; finally, it fluctuates around the target normal force. This proves that the constant force grinding controller with the SAC algorithm has a strong ability to adapt to the environment.

Plot of the change in normal force with the addition of protrusions and grooves.

This paper proposes an optimal parameter finding algorithm based on SAC to solve the problem that traditional PID controllers cannot control the robot to perform stable constant force grinding under environmental changes because the compensation term parameters are difficult to obtain. CoppeliaSim is used to build a robot grinding simulation model with force sensing, and simulation experiments of grinding curved surfaces are carried out, which show that the constant force controller with this algorithm can maintain a constant force when the robot grinds surfaces and is therefore feasible. Finally, experimental verification is carried out on the robot constant force grinding experimental system platform. The results show that the optimal parameter finding algorithm based on SAC can help the PID controller obtain the optimal compensation term. The PID controller with the optimal compensation term controls the robot's constant force grinding well and has a strong ability to adapt to the environment. This article only compares the PID controller with fixed parameters against the SAC algorithm; in future work, other reinforcement learning algorithms will be used in grinding experiments to verify their advantages (Supplementary Information file 1).

The datasets used during the current study are available from the corresponding author on reasonable request.

Guo, W., Zhu, Y. & He, X. A robotic grinding motion planning methodology for a novel automatic seam bead grinding robot manipulator. IEEE Access 8, 75288–75302. https://doi.org/10.1109/ACCESS.2020.2987807 (2020).

Brinksmeier, E. et al. Advances in modeling and simulation of grinding processes. CIRP Ann. 55(2), 667–696 (2006).

Zhang, T. et al. Robot grinding system trajectory compensation based on co-kriging method and constant-force control based on adaptive iterative algorithm. Int. J. Precis. Eng. Manuf. 21, 1637–1651. https://doi.org/10.1007/s12541-020-00367-z (2020).

Wang, Z. et al. Study on passive compliance control in robotic belt grinding of nickel-based superalloy blade. J. Manufact. Process. 68, 168–179 (2021).

Wang, G. et al. PD-adaptive variable impedance constant force control of macro-mini robot for compliant grinding and polishing. Int. J. Adv. Manuf. Technol. 124, 2149–2170. https://doi.org/10.1007/s00170-022-10405-x (2023).

Li, L., Wang, Z., Zhu, G. & Zhao, J. Position-based force tracking adaptive impedance control strategy for robot grinding complex surfaces system. J. Field Robot. 40, 1–18 (2023).

Zhao, W., Xiao, J. & Liu, S. Robotic direct grinding for unknown workpiece contour based on adaptive constant force control and human–robot collaboration. Industrial Robot https://doi.org/10.1108/IR-01-2022-0021 (2022).

Li, Y. et al. Wind power forecasting considering data privacy protection: A federated deep reinforcement learning approach. Appl. Energy 329, 120291 (2023).

Zhang, H. et al. The hybrid force/position anti-disturbance control strategy for robot abrasive belt grinding of aviation blade base on fuzzy PID control. Int. J. Adv. Manuf. Technol. 114, 3645–3656. https://doi.org/10.1007/s00170-021-07122-2 (2021).

Shen, Y., Lu, Y. & Zhuang, C. A fuzzy-based impedance control for force tracking in unknown environment. J. Mech. Sci. Technol. 36, 5231–5242. https://doi.org/10.1007/s12206-022-0936-6 (2022).

Zhu, D., Du, B., Zhu, P. & Chen, S. Constant force PID control for robotic manipulator based on fuzzy neural network algorithm. Complexity 2020, 3491845. https://doi.org/10.1155/2020/3491845 (2020).

Hamedani, M. H., Sadeghian, H., Zekri, M., Sheikholeslam, F. & Keshmiri, M. Intelligent impedance control using wavelet neural network for dynamic contact force tracking in unknown varying environments. Control Eng. Pract. 113, 104840. https://doi.org/10.1016/j.conengprac.2021.104840 (2021).

Gershman, S. J. & Ölveczky, B. P. The neurobiology of deep reinforcement learning. Curr. Biol. 30(11), R629–R632. https://doi.org/10.1016/j.cub.2020.04.021 (2020).

Fu, Y., Li, C., Yu, F. R., Luan, T. H. & Zhang, Y. A selective federated reinforcement learning strategy for autonomous driving. IEEE Trans. Intell. Transp. Syst. 24(2), 1655–1668. https://doi.org/10.1109/TITS.2022.3219644 (2023).

Singh, B., Kumar, R. & Singh, V. P. Reinforcement learning in robotic applications: A comprehensive survey. Artif. Intell. Rev. 55, 945–990. https://doi.org/10.1007/s10462-021-09997-9 (2022).

Zhang, T., Yuan, C. & Zou, Y. Online optimization method of controller parameters for robot constant force grinding based on deep reinforcement learning rainbow. J. Intell. Robot. Syst. 105, 85. https://doi.org/10.1007/s10846-022-01688-z (2022).

Zhang, T. et al. Robotic curved surface tracking with a neural network for angle identification and constant force control based on reinforcement learning. Int. J. Precis. Eng. Manuf. 21, 869–882. https://doi.org/10.1007/s12541-020-00315-x (2020).

Nobot Intelligent Equipment (Shandong) Co., Ltd, Liaocheng, Shandong, China

Chosei Rei, Xinhua Yan, Peng Zhang & Liwei Fu

School of Mechanical and Automotive Engineering, Liaocheng University, Liaocheng, Shandong, China

Qichao Wang, Linlin Chen & Chong Wang

Department of Materials Physics, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences (SIMTS), Chennai, Tamilnadu, India

Science and Technology on Aerospace Chemical Power Laboratory, Hubei Institute of Aerospace Chemotechnology, Xiangyang, 441003, China


C.R. and Q.W. wrote the main manuscript text; X.Y., P.Z., L.F., and C.W. gave constructive suggestions; L.C. and X.L. supervise the projects.

Correspondence to Linlin Chen or Xinghui Liu.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Rei, C., Wang, Q., Chen, L. et al. Constant force grinding controller for robots based on SAC optimal parameter finding algorithm. Sci Rep 14, 14127 (2024). https://doi.org/10.1038/s41598-024-63384-2
