Scientific Reports volume 14, Article number: 21137 (2024)
This study introduces a novel approach that addresses the limitations of existing methods by integrating 2D image processing with 3D point cloud analysis, enhanced by interpretable neural networks. Unlike traditional methods that rely on either 2D or 3D data alone, our approach leverages the complementary strengths of both data types to improve detection accuracy in environments adversely affected by welding spatter and smoke. Our system employs an improved Faster R-CNN model with a ResNet50 backbone for 2D image analysis, coupled with an innovative orthogonal plane intersection line extraction algorithm for 3D point cloud processing. By incorporating explainable components such as visualizable feature maps and a transparent region proposal network, we address the “black box” issue common in deep learning models. This architecture enables a more transparent decision-making process, providing technicians with the insights necessary to understand and trust the system’s outputs. The Faster R-CNN structure is designed to break down the object detection process into distinct, understandable steps, from initial feature extraction to final bounding box refinement. This fusion of 2D–3D data analysis and interpretability not only improves detection performance but also sets a new standard for transparency and reliability in automated welding systems, facilitating wider adoption in industrial applications.
In modern industrial welding, the quality of welds is crucial for successful engineering projects1. Traditionally reliant on manual labor, welding quality has been largely dependent on operator skills, posing both consistency and health risks. The advent of robotic technology has significantly transformed industrial welding, enhancing efficiency and quality2. However, these advancements bring new challenges, particularly in maintaining precise control over the welding process, such as the critical distance between the welding gun and the weld seam.
To address these challenges, various sensing technologies have been developed, including arc sensors and vision sensors3,4,5,6. While arc sensors analyze the relationship between arc length and welding voltage, they are prone to interference. Vision sensors, both passive and active, offer more stable data but face their own limitations7. Traditional methods of weld detection rely on clear images with uniform brightness of laser stripes8, conditions often unmet in real-world welding environments characterized by spatter and smoke9. These factors significantly disrupt image quality, affecting seam detection accuracy and exposing the limitations of traditional methods in dealing with image distortion and noise.
The emergence of deep learning technology has opened new avenues in image processing and feature recognition for welding applications. Deep learning-based seam detection methods can learn complex image features, improving accuracy in challenging environments10,11. These approaches show remarkable adaptability to varying visual representations of seams and robustness against welding process disturbances. However, the “black box” nature of deep learning models presents a significant drawback, particularly in industrial settings where safety and accuracy are paramount12. The lack of transparency and interpretability in the model’s decision-making process hinders trust and widespread adoption of these technologies.
To address this, there is a pressing need for research into interpretable deep learning algorithms for seam detection13. By incorporating interpretability mechanisms, such as feature visualization techniques, it becomes possible to reveal key visual features used in seam recognition14. This approach not only enhances user understanding and trust but also provides valuable diagnostic information for continuous model optimization.
This paper presents a novel solution that integrates 2D image processing with 3D point cloud analysis, enhanced by interpretable neural networks. We employ an improved Faster R-CNN model with a ResNet50 backbone for 2D analysis, complemented by an innovative orthogonal plane intersection line extraction algorithm for 3D processing. To tackle the interpretability challenge, we incorporate explainable components such as visualizable feature maps and a transparent region proposal network. This architecture breaks down the detection process into understandable steps, from feature extraction to bounding box refinement. By fusing 2D–3D analysis with interpretability, our approach not only enhances detection performance in challenging welding environments but also sets new standards for transparency and reliability in automated welding systems. This fusion of deep learning efficiency and interpretability promises to significantly advance the modernization and automation of welding technology, facilitating wider industrial adoption and improving overall welding quality and safety.
The field of weld seam detection has witnessed significant advancements in recent years, driven by the convergence of machine vision, 3D point cloud processing, and deep learning techniques. These developments have paved the way for more accurate, efficient, and robust detection methods in industrial welding processes.
At the forefront of these advancements, the authors of15 introduced a groundbreaking approach that combines convolutional neural networks (CNNs) with structured light imaging. This innovative method achieved remarkable accuracy in weld seam recognition and feature point identification, setting a new standard in the field. Building upon this foundation, subsequent research16 further refined the application of deep learning, particularly CNNs, in industrial settings, not only enhancing the precision of weld seam detection but also significantly improving processing speed, addressing crucial requirements in real-world manufacturing environments.
The integration of machine vision and neural networks for real-time optimization of robotic arc welding, as pioneered in17, laid the groundwork for more sophisticated techniques. This early work demonstrated the potential of combining visual data with neural network processing for welding applications. Expanding on this concept, the authors of18 proposed an innovative method that utilizes 3D point cloud data in conjunction with deep learning for automatic welding path planning. Their approach emphasized accurate point cloud segmentation and optimized path generation, marking a significant step towards fully automated welding processes.
The application of advanced neural network architectures to 3D point cloud data has shown promising results beyond welding, indicating the versatility and potential of these techniques. The study in19 demonstrated the effectiveness of dynamic graph convolutional neural networks in detecting surface defects using 3D point cloud data, achieving significant recall rates. This research aligns with the work in20, which explored the use of machine learning and binocular vision in robotic systems to enhance accuracy and response speed in tasks like weld seam detection. These studies underscore the potential of adapting similar techniques to enhance weld seam detection accuracy and reliability.
The importance of precise spatial data processing in weld seam detection is further emphasized by recent advancements in related fields. The authors of21 benchmarked deep CNNs for 3D point cloud classification, aiming to fully leverage deep learning for complex 3D labeling tasks. Complementing this approach, the work in22 focused on applying neural networks to LiDAR data for efficient classification while preserving 3D structures, a crucial aspect in maintaining spatial accuracy. These studies highlight the growing trend towards multi-modal data integration for improved performance in industrial applications.
Recent research has also demonstrated the adaptability of deep learning techniques across various domains related to object detection and pose estimation. The study in23 showcased the effectiveness of ResNet for grasping pose detection, achieving high success rates in tasks involving multiple objects. This work illustrates the potential for adapting similar architectures to the specific challenges of weld seam detection. Additionally, the authors of24 applied deep learning and 3D point cloud data to detect structural damage, providing valuable insights that can be translated to weld seam inspection and quality control.
While these studies have made significant contributions to the field, a critical gap remains in the integration of 2D image processing with 3D point cloud analysis using interpretable neural networks specifically for weld seam detection. Our research addresses this gap by proposing a novel fusion method that not only enhances detection accuracy but also provides transparency in the decision-making process, a crucial factor for widespread adoption in industrial applications.
By synthesizing the strengths of CNNs in image processing, the spatial accuracy of 3D point cloud data, and the interpretability of neural networks, our approach aims to push the boundaries of weld seam detection technology. This integrated method offers a more comprehensive, accurate, and explainable solution for industrial welding processes, potentially revolutionizing quality control and automation in manufacturing.
In machine vision applications, establishing a camera’s geometric imaging model is crucial to determining the three-dimensional position of points on an object and their image mappings. This model, encompassing the camera’s parameters, is essential for tasks like coarse image-based weld recognition and precise point cloud extraction, requiring a dual system for image and point cloud collection. Camera parameters are typically determined through calibration, a process aimed at deriving intrinsic and extrinsic parameters and distortion coefficients. This might extend to calibrating structured light systems, tool center point (TCP), and hand–eye coordination. In this context, a structured light system, comprising a structured light module and an industrial camera, projects coded light patterns onto the object. The camera captures these patterns across multiple images, using variations to generate three-dimensional point cloud data.
In various machine vision applications, calibrating the camera parameters is crucial, as the accuracy of these parameters directly affects the outcome of the camera’s performance. Thus, accurate calibration of the camera is fundamental to subsequent operations. As mentioned, the purpose of camera calibration is to solve for the camera’s intrinsic and extrinsic parameters and distortion coefficients.
There are four coordinate systems associated with the camera, defined as follows:
Pixel coordinate system \((u,v)\) : the origin is at the top left of the image, with the u-axis running horizontally to the right along the image plane, and the v-axis running vertically downwards, measured in pixels.
Physical coordinate system \((x,y)\) : the origin is at the intersection of the camera’s optical axis with the image plane (the principal point), with the x and y axes aligned with the directions of the u and v axes, respectively, measured in millimeters to physically represent pixel positions.
Camera coordinate system \(({X}_{c},{Y}_{c},{Z}_{c})\) : the origin is at the camera’s optical center, with the \({Z}_{c}\) axis coinciding with the optical axis (perpendicular to the imaging plane), pointing from the optical center towards the imaging plane. \({X}_{c}\) and \({Y}_{c}\) are parallel and directionally identical to the \(x\) and \(y\) axes of the image coordinate system, measured in millimeters.
World coordinate system \(({X}_{w},{Y}_{w},{Z}_{w})\) : the global coordinate system in a three-dimensional world is represented by \(({X}_{w},{Y}_{w},{Z}_{w})\) , following the right-hand rule. These four coordinate systems are depicted in Fig. 1.
The relationship between the four coordinate systems.
The point \({O}_{0}\) serves as the origin for the pixel coordinate system, and \({O}_{1}\) as the origin for the physical coordinate system. In imaging, this typically involves setting the top-left corner as the origin. This setup method is arbitrary; for instance, in OpenCV, the top-left corner is defined as the origin of the pixel coordinate system, but in some other image formats, such as those captured by Kinect, the bottom-left corner is set as the origin. The coordinates of the physical coordinate system’s origin, \({O}_{1}\) , in the pixel coordinate system are \(\left({u}_{0},{v}_{0}\right)\) . The transformation relationship between the two systems is as follows:
After transforming into matrix form, the relationship is represented as follows:
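(The published equations are rendered as images; a standard form of this pixel-to-physical conversion, assuming pixel dimensions \(d_{x}\) and \(d_{y}\) in millimetres per pixel, is sketched here.)

$$u=\frac{x}{d_{x}}+u_{0},\qquad v=\frac{y}{d_{y}}+v_{0},\qquad \begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}\frac{1}{d_{x}}&0&u_{0}\\ 0&\frac{1}{d_{y}}&v_{0}\\ 0&0&1\end{bmatrix}\begin{bmatrix}x\\ y\\ 1\end{bmatrix}$$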
Assuming a point \(P\) in the camera coordinate system has coordinates \(({x}_{c},{y}_{c},{z}_{c})\) , based on the similarity of triangles, it can be deduced:
Here, \(f\) represents the focal length of the camera. For ease of computation, Eq. (4) can be transformed into its homogeneous matrix form as follows:
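(The published equations are likewise shown as images; the standard pinhole relation and its homogeneous form, consistent with the definitions above, are sketched here.)

$$x=\frac{f{x}_{c}}{{z}_{c}},\qquad y=\frac{f{y}_{c}}{{z}_{c}},\qquad {z}_{c}\begin{bmatrix}x\\ y\\ 1\end{bmatrix}=\begin{bmatrix}f&0&0&0\\ 0&f&0&0\\ 0&0&1&0\end{bmatrix}\begin{bmatrix}{x}_{c}\\ {y}_{c}\\ {z}_{c}\\ 1\end{bmatrix}$$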
The transformation between the camera coordinate system and the world coordinate system is achieved through a rigid body transformation, which involves rotational and translational movements in three-dimensional space. Rigid body transformations are transformations that preserve inner products and metrics. The two most critical elements of a rigid body transformation are rotation and translation, represented by the matrices \(R\) and \(T\) , respectively. \(R\) is a 3 × 3 matrix representing rotation, and \(T\) is a 3 × 1 matrix representing translation. The expression for this transformation is as follows:
Thus, by combining Eqs. (2)–(6), the transformation formula that maps pixel points in the image to three-dimensional spatial coordinates can be derived:
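(As the derived formula is shown only as an image in the published version, the standard composition of the intrinsic and rigid-body transformations, using the notation above, is sketched here; \({M}_{1}\) and \({M}_{2}\) denote the intrinsic and extrinsic parameter matrices used in the calibration below.)

$${z}_{c}\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}\frac{f}{d_{x}}&0&{u}_{0}\\ 0&\frac{f}{d_{y}}&{v}_{0}\\ 0&0&1\end{bmatrix}\begin{bmatrix}R&T\end{bmatrix}\begin{bmatrix}{X}_{w}\\ {Y}_{w}\\ {Z}_{w}\\ 1\end{bmatrix}={M}_{1}{M}_{2}\begin{bmatrix}{X}_{w}\\ {Y}_{w}\\ {Z}_{w}\\ 1\end{bmatrix}$$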
During camera manufacturing and installation, deviations from the ideal model can cause lens misalignment, leading to image distortions such as radial and tangential distortions. Radial distortion stems from the inherent properties of convex lenses, bending light rays more as they move away from the center, thus distorting the image either outward or inward. Tangential distortion occurs when the lens is not perfectly parallel to the camera’s imaging plane, skewing the image and altering the true appearance of objects. These distortions can be corrected through calculations performed in the camera coordinate system, with all necessary parameters determined during camera calibration.
The correction formula for radial distortion is as follows:
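(The published formula is rendered as an image; the standard radial term of the Brown distortion model, with coefficients \(k_{1},k_{2},k_{3}\) and \(r^{2}=x^{2}+y^{2}\), is sketched here.)

$$x_{\text{corr}}=x\left(1+k_{1}r^{2}+k_{2}r^{4}+k_{3}r^{6}\right),\qquad y_{\text{corr}}=y\left(1+k_{1}r^{2}+k_{2}r^{4}+k_{3}r^{6}\right)$$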
The correction formula for tangential distortion is as follows:
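(Likewise, the standard tangential term with coefficients \(p_{1},p_{2}\) is sketched below; the paper’s own equation is not reproduced in this text.)

$$x_{\text{corr}}=x+2p_{1}xy+p_{2}\left(r^{2}+2x^{2}\right),\qquad y_{\text{corr}}=y+p_{1}\left(r^{2}+2y^{2}\right)+2p_{2}xy$$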
Before calibration, it’s essential to prepare a chessboard grid and attach it neatly to a horizontal surface to form a calibration board. Using a camera, capture multiple images of the chessboard, ensuring it is fully visible in the images while rotating and translating it. By detecting the corners of the chessboard, the camera’s intrinsic and extrinsic parameters can be calculated. From the coordinate transformation model described above, it is known that during calibration, \({Z}_{w}=0\) as shown in Eq. (10).
In this context, \({M}_{1}\) represents the intrinsic parameters of the camera, and \({M}_{2}\) represents the extrinsic parameters. The extrinsic parameter matrix \({M}_{2}\) can be expressed using three column vectors, \({M}_{2}=\left[\begin{array}{ccc}{r}_{1}& {r}_{2}& t\end{array}\right]\), where \({r}_{1}\) and \({r}_{2}\) are rotation column vectors and \(t\) is the translation vector. This setup allows the establishment of the equation \({sM}_{1}\left[\begin{array}{ccc}{r}_{1}& {r}_{2}& t\end{array}\right]=\left[\begin{array}{ccc}{m}_{1}& {m}_{2}& {m}_{3}\end{array}\right]\) , where s is a scaling factor.
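To make this procedure concrete, the following minimal Python/OpenCV sketch detects the chessboard corners and solves for the intrinsic matrix \({M}_{1}\), the distortion coefficients, and the per-view extrinsics \({M}_{2}\). The 9 × 6 corner pattern, 25 mm square size, and image path are illustrative assumptions; they are not specified in the paper.

```python
import glob
import cv2
import numpy as np

# Object points on the board plane, with Z_w = 0 as noted above (assumed 9x6 corners, 25 mm squares).
pattern_size = (9, 6)
square_size = 25.0  # mm
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib/*.png"):  # assumed location of the captured board images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue
    # Refine corner locations to sub-pixel accuracy.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)
    image_size = gray.shape[::-1]

# Solve for intrinsics (M1), distortion coefficients, and per-view extrinsics (R, T of M2).
rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("reprojection RMS:", rms)
```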
Figure 2 illustrates the structured light calibration model. The relationship between the projector coordinate system and the world coordinate system can be obtained based on it, as shown in Eq. (13):
A point in the world coordinate system can be represented simultaneously by its coordinates \(\left(\begin{array}{ccc}{x}_{wc}& {y}_{wc}& {z}_{wc}\end{array}\right)\) in the camera coordinate system and \(\left(\begin{array}{ccc}{x}_{pc}& {y}_{pc}& {z}_{pc}\end{array}\right)\) in the projector coordinate system. From such point pairs, the intrinsic parameters of the projector and the extrinsic parameters between the camera coordinate system and the projector coordinate system are obtained according to the following formula:
The extrinsic parameters of the projector can be solved by combining these relations with the world coordinate system, using the following formula:
From the above formula, the extrinsic parameters of the projector can be obtained; the results are as follows:
This completes the calibration of the structured light system. Because the lens structure of a projector is similar to that of a camera, distortion inevitably occurs as well, and the distortion parameters can be obtained from the same distortion model. The optimization function is given below: when the summation reaches its minimum, the distortion coefficients are obtained, and the distortion-corrected intrinsic and extrinsic parameters of the projector follow in the same way.
Based on the structured light system calibrated above, the coding scheme combining Gray code and phase shifting is decoded, and the three-dimensional point cloud coordinates in the camera coordinate system are obtained25. The decoding formula is shown in Eq. (20):
Here, \({u}_{c},{v}_{c}\) are the coordinates of any point in the coding stripe; \(s\) is the scale factor; \({f}_{x}^{c},{f}_{y}^{c}\) are the camera’s scale factors along the \({u}_{c}\) and \({v}_{c}\) axes; and \({f}_{x}^{p},{f}_{y}^{p}\) are the projector’s scale factors along the \({u}_{p}\) and \({v}_{p}\) axes.
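(Equation (20) itself is rendered as an image in the published article; the underlying camera–projector triangulation it decodes can be sketched, under the calibration notation above and with assumed principal points \({u}_{0}^{c},{v}_{0}^{c}\) and \({u}_{0}^{p},{v}_{0}^{p}\), as the pair of projection equations below.)

$$s\begin{bmatrix}{u}_{c}\\ {v}_{c}\\ 1\end{bmatrix}=\begin{bmatrix}{f}_{x}^{c}&0&{u}_{0}^{c}\\ 0&{f}_{y}^{c}&{v}_{0}^{c}\\ 0&0&1\end{bmatrix}\begin{bmatrix}I&\mathbf{0}\end{bmatrix}\begin{bmatrix}X\\ 1\end{bmatrix},\qquad {s}^{\prime}\begin{bmatrix}{u}_{p}\\ {v}_{p}\\ 1\end{bmatrix}=\begin{bmatrix}{f}_{x}^{p}&0&{u}_{0}^{p}\\ 0&{f}_{y}^{p}&{v}_{0}^{p}\\ 0&0&1\end{bmatrix}\begin{bmatrix}R&T\end{bmatrix}\begin{bmatrix}X\\ 1\end{bmatrix}$$

Once the decoded Gray-code/phase-shift value supplies the projector correspondence for each camera pixel \(({u}_{c},{v}_{c})\), this system is solved linearly for the 3D point \(X\) in the camera coordinate system.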
Our proposed system for weld seam detection integrates 2D image processing with 3D point cloud analysis, enhanced by interpretable neural networks. Figure 3 illustrates the overall architecture of our system.
Weld seam detection system architecture.
The proposed weld seam detection system integrates six key components in a seamless workflow, designed to leverage both 2D and 3D data for accurate and interpretable results. The process begins with the data acquisition module, which simultaneously captures 2D images and 3D point cloud data of the welding area, providing complementary visual and spatial information. These raw inputs then undergo preprocessing, where 2D images are enhanced and denoised, while point cloud data is filtered and downsampled to optimize processing efficiency. The refined 2D images are then analyzed by an interpretable neural network, based on an improved Faster R-CNN model26, which not only processes the images but also offers crucial insights into its decision-making process, a feature particularly valuable in industrial applications. Concurrently, the point cloud processing module examines the 3D data, extracting planes, identifying orthogonal relationships, and detecting intersection lines that may indicate weld seams. The system then employs data fusion and localization techniques to combine the outputs from both the 2D and 3D analyses, achieving precise weld seam localization. Finally, the system generates output results, providing both weld seam type identification and exact position coordinates. This integrated approach ensures a comprehensive, accurate, and transparent weld seam detection process, crucial for maintaining high standards in industrial welding applications.
Building on the monocular camera calibration above, which established the camera’s intrinsic and extrinsic parameters, this section focuses on calibrating the projector to determine its intrinsic parameters and its relationship with the camera. The calibration employs images projected onto a calibration board, captured alternately with and without projections to ensure accurate synchronization between the projector and the camera.
As shown in Fig. 4, the Faster-RCNN model, chosen for its robust object detection capabilities, is utilized for recognizing and localizing weld seams. This model is particularly suited for industrial applications due to its structured, hierarchical design that enhances explainability through its various components. The explainability of the Faster-RCNN components is as follows:
The proposed explainable RCNN architecture.
The model starts with input images of different dimensions (P × Q and R × S), which are processed by a backbone network, in this case, utilizing the ResNet50 architecture. This feature extraction network processes input images into hierarchical feature maps, capturing essential features like edges and textures. Each layer’s visualization assists in understanding the diverse features the network captures—from basic textures to intricate object structures—essential for detecting weld seams. These visual insights clarify how different features impact the network’s predictive decisions.
The extracted feature map is then passed to the region proposal network (RPN), which enhances the model’s explainability by calculating and visualizing potential bounding boxes and their objectness scores. The RPN uses convolutional layers to predict objectness scores and bounding box coordinates for each proposal, showing how the network initially assesses and selects possible object locations. This provides a transparent view of how it refines these initial proposals.
The proposed regions undergo region of interest (ROI) pooling to extract fixed-size feature maps, ensuring uniform size regardless of the original proposal dimensions. These fixed-size feature maps are further processed by fully connected layers, producing two sets of predictions: classification and bounding box regression. The classification layer, using a softmax function, predicts the class of the object in the proposed region by assigning a probability to each class. The bounding box regression layer predicts the precise coordinates of the bounding box surrounding the object, refining the proposals to better fit the objects. Visualizing this phase illustrates how the model refines rough proposals into precise predictions, underscoring the network’s ability to fine-tune its focus from broad regions to specific targets, and revealing the sophisticated decision-making that leads to the final detections.
The final outputs of the model are the predicted class labels and the refined bounding box coordinates for each proposed region. These predictions can be visualized and interpreted to understand which objects are detected and their locations in the input images. Integrating these explanations into the process of identifying and positioning weld seams makes the model’s workflow—from image input to final output—clear and traceable. This clarity is crucial for ensuring the reliability and building trust in the model’s effectiveness, especially in critical industrial settings where precise and interpretable object detection is essential. This explainable RCNN architecture aims to provide clear and interpretable results by breaking down the object detection process into distinct and understandable steps, achieving high accuracy and interpretability in object detection tasks.
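As a sketch of the feature-map visualisation described here, the snippet below instantiates a Keras ResNet50 backbone and exposes the output of each residual stage as a per-stage heat map. The layer names follow tf.keras.applications.ResNet50, and the random 512 × 512 input is a placeholder; this illustrates the visualisation idea rather than reproducing the authors' exact code.

```python
import numpy as np
import tensorflow as tf

# ResNet50 backbone without the classification head.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet")

# Expose the activations at the end of each residual stage.
layer_names = ["conv2_block3_out", "conv3_block4_out",
               "conv4_block6_out", "conv5_block3_out"]
viz_model = tf.keras.Model(
    inputs=backbone.input,
    outputs=[backbone.get_layer(name).output for name in layer_names])

# Placeholder input standing in for a captured weld image.
image = tf.random.uniform((1, 512, 512, 3))
feature_maps = viz_model(tf.keras.applications.resnet50.preprocess_input(image * 255.0))

for name, fmap in zip(layer_names, feature_maps):
    # Average over channels and normalise to [0, 1] to obtain an inspectable heat map.
    heatmap = tf.reduce_mean(fmap, axis=-1)[0].numpy()
    heatmap = (heatmap - heatmap.min()) / (heatmap.ptp() + 1e-8)
    print(name, "feature map", fmap.shape, "-> heat map", heatmap.shape)
```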
The weld point cloud data for the carriage panel is generally represented as the intersection lines of two orthogonal planes. The welding areas of the carriage’s bottom, front, and side panels are formed by one or several sets of orthogonal planes, with their intersections marking the areas to be welded. Incorrect or inaccurate point cloud extraction can lead to issues like miswelds, false welds, and missed welds, severely compromising weld strength and endangering both train safety and public safety. Therefore, to mitigate safety risks associated with welding errors, it is crucial to precisely extract and detect orthogonal planes and their intersections. Additionally, identifying perpendicular relationships in 3D is an essential task in point cloud processing, given the prevalence of orthogonal planes in everyday environments.
This section introduces a method based on improved point pair feature (PPF) estimation rather than traditional plane fitting and segmentation, together with a local parameterization of orthogonal planes. The algorithm has two stages: orthogonality detection and point cloud refinement. We first describe the detection stage in detail, defining plane orthogonality and explaining how pairs of orthogonal planes are detected from point cloud data. We then describe the refinement stage, which introduces a new voting scheme to discover orthogonal planes and detect their intersection lines.
The simplest plane extraction approach is RANSAC28, but RANSAC is slow, its distance threshold is inconvenient to set, and its iterative search is very time-consuming for large point sets. This paper therefore exploits the mathematical relationships between point pair features and adopts a plane extraction algorithm based on the PPF descriptor29. The PPF descriptor \(F\left({p}_{1},{p}_{2}\right)\) of any two oriented points is computed to determine whether the pair is coplanar, and planes are then extracted from the point cloud; curved surfaces and noise in the environment are filtered out at the same time.
The schematic diagram of the algorithm is shown in Fig. 5. First, the point cloud is sampled: sampling points are selected from the input point cloud \(A=\{{a}_{0},{a}_{1},\dots ,{a}_{n}\}\) by a distance criterion. Because the weld groove occupies a relatively small area of the field of view compared with the background, the sampling interval should not be set too large; here it is set to 25 mm. The sampling points form the set \(B=\{{b}_{0},{b}_{1},\dots ,{b}_{m}\}\), and the remaining points form the set \(C=\{{c}_{0},{c}_{1},\dots ,{c}_{n-m}\}\). Using the coplanar PPF property, the candidate plane corresponding to each point in \(B\) is computed, and for computational efficiency the resulting plane set \(\Pi =\{{P}_{0},{P}_{1},\dots ,{P}_{m}\}\) is ordered by the number of points each plane contains, from fewest to most. The distance from every point in \(C\) to each plane \(P\) in \(\Pi\) is then computed; if the distance to some plane \(P\) is below the set threshold, the point is assigned to that plane. Once all points in \(C\) have been processed, the extracted plane set \(\Pi\) of the original input point cloud \(A\) is obtained.
Plane extraction algorithm based on point pair features.
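A minimal Python sketch of this sampling-and-assignment logic is given below. It is a deliberate simplification (brute-force distance sampling and first-match assignment); only the 25 mm sampling interval comes from the text, and the 1 mm distance threshold is an assumed value.

```python
import numpy as np

def extract_planes(points, normals, sample_step=25.0, dist_thresh=1.0):
    """Simplified PPF-style plane growing.
    points: (N, 3) array in mm; normals: (N, 3) unit normals."""
    # 1. Greedy spatial sampling: keep points at least `sample_step` apart (set B).
    seeds = []
    for i in range(len(points)):
        if all(np.linalg.norm(points[i] - points[j]) > sample_step for j in seeds):
            seeds.append(i)

    # 2. Each seed (b_k, n_k) defines a candidate plane n_k . (x - b_k) = 0.
    planes = [(points[i], normals[i]) for i in seeds]

    # 3. Assign every remaining point (set C) to the first plane within the distance threshold.
    labels = np.full(len(points), -1, dtype=int)
    for idx in range(len(points)):
        for k, (b, n) in enumerate(planes):
            if abs(np.dot(points[idx] - b, n)) < dist_thresh:
                labels[idx] = k
                break
    return planes, labels
```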
The geometric relationship of a single oriented point pair can be obtained from the second component of the PPF descriptor alone. For a point set, however, the geometric characteristics of the set as a whole are more important, so an overall judgement is required. To better distinguish plane point sets that are truly orthogonal to each other, a local Hough voting method is therefore introduced to detect plane pairs. In the literature this local voting method has been used to detect single primitives such as spheres and cylinders, but it can equally be applied to the geometric relationship between orthogonal planes. Figure 6 shows the schematic diagram of the points and normal vectors of an orthogonal plane pair.
Schematic diagram of points and normal vectors of an orthogonal plane.
Once an oriented point pair is found to form an orthogonal plane pair (OPP), \(\{{x}_{1},{n}_{1}\}\) can be taken as the reference plane and, exploiting rotation invariance, the rotation matrix \({R}_{z}\) is used to align \({n}_{1}=({n}_{x},{n}_{y},{n}_{z})\) with the z-axis, where \({R}_{z}\) is expressed as:
Here, \(\varphi =\text{arctan}\left({n}_{x}/{n}_{z}\right)\) and \(\omega =\text{arctan}({r}_{y}/{r}_{z})\). The pair \(\{{x}_{2},{n}_{2}\}\) is transformed with the same rotation matrix \({R}_{z}\). This completes the reduction from three dimensions to two: only two degrees of freedom remain, which can be represented in a two-dimensional polar coordinate system and solved with mature 2D algorithms. Hough voting is then performed in the 2D space \(\left(\theta ,\rho \right)\), whose variables are defined as follows:
θ represents the normal direction (parallel to the datum plane, only one degree of freedom), and ρ represents the orthogonal distance from the intersection line to the datum point.
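A minimal sketch of such a two-dimensional accumulator is shown below, assuming the candidate pairs have already been reduced to \((\theta ,\rho )\) values; the bin counts and \(\rho\) range are illustrative choices, not the authors' settings.

```python
import numpy as np

def hough_vote(theta_rho_pairs, theta_bins=36, rho_bins=50, rho_max=500.0):
    """Accumulate OPP candidate votes in the reduced 2D (theta, rho) space."""
    acc = np.zeros((theta_bins, rho_bins), dtype=int)
    for theta, rho in theta_rho_pairs:
        t = int((theta % np.pi) / np.pi * theta_bins) % theta_bins
        r = int(min(rho, rho_max - 1e-9) / rho_max * rho_bins)
        acc[t, r] += 1
    # The winning cell gives the normal direction and the orthogonal distance
    # of the intersection line relative to the reference plane.
    t_best, r_best = np.unravel_index(np.argmax(acc), acc.shape)
    return (t_best + 0.5) * np.pi / theta_bins, (r_best + 0.5) * rho_max / rho_bins
```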
This structure provides a comprehensive overview of the hybrid 2D–3D weld seam detection system, with the system architecture diagram serving as a visual guide to the overall process. The description of the system architecture ties together the subsequent detailed explanations of the image-based and point cloud-based processing techniques, giving a cohesive and complete presentation of the proposed method.
The experimental environment for the target detection in this project was built using Python 3.8. The deep learning framework employed is TensorFlow-gpu 2.4.0, with all necessary toolkits installed. The experiments were conducted on a laboratory computer equipped with a GTX 1660TI graphics card and 6 GB of video memory. TensorFlow, an open-source software library developed by Google, was used due to its flexibility in constructing data flow graphs for mathematical calculations and its suitability for building end-to-end deep learning model training platforms.
The dataset consists of over 2500 images, divided into a training set (90%, 2250 images), a test set (10%, 250 images), and a validation set (randomly sampled 10% from the training set, 225 images). Each weld seam sample is represented by an average of six images captured under different angles and lighting conditions. The weld seam types included in the dataset are T-shaped welds (40%), V-shaped welds (35%), and butt welds (25%).
For 2D image acquisition, we used an FLIR Blackfly S BFS-U3-51S5C-C industrial camera with a resolution of 2448 × 2048 pixels and a frame rate of 75 FPS. The lighting was provided by an LED ring light with adjustable brightness between 500 and 1500 lux, and images were captured at multiple angles within a ± 30° range relative to the weld seam plane. The 3D point cloud data were collected using a Photoneo PhoXi 3D Scanner M, with a resolution of 1.3 million points, a scanning accuracy of 50 μm, and a scanning range of 520 × 390 × 380 mm.
The neural network model is based on the Faster R-CNN architecture, using ResNet50 as the backbone. We utilized the Adam optimizer with an initial learning rate of 0.001 and employed a learning rate decay strategy. The batch size was set to 32, and the model was trained for 100 epochs. Data augmentation techniques such as random flipping, rotation (± 15°), and brightness adjustment (± 20%) were applied. The loss function combined classification loss (cross-entropy) and regression loss (smooth L1). Hyperparameters were tuned using Keras Tuner, optimizing parameters such as learning rate, dropout rate, and the number of convolutional layers.
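The snippet below sketches this training configuration in TensorFlow/Keras: Adam with a decaying learning rate, a cross-entropy classification loss combined with a Huber (smooth-L1-style) regression loss, and flip/brightness augmentation. The decay schedule, Huber delta, and loss weighting are assumptions, and the ± 15° rotation augmentation is omitted here (it would need, e.g., tensorflow-addons).

```python
import tensorflow as tf

# Adam with initial learning rate 1e-3 and a decay schedule (schedule parameters assumed).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

cls_loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
box_loss_fn = tf.keras.losses.Huber(delta=1.0)  # smooth-L1-style regression loss

def detection_loss(cls_true, cls_pred, box_true, box_pred, box_weight=1.0):
    """Combined classification + bounding-box regression loss."""
    return cls_loss_fn(cls_true, cls_pred) + box_weight * box_loss_fn(box_true, box_pred)

def augment(image):
    """Flip and brightness jitter corresponding to the reported augmentation settings."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    return image
```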
The entire project was implemented in C++, and for the target detection component, Python functions were called using the PyObject_GetAttrString function, which greatly facilitated the integration and implementation of the project. The C++ implementation relied on OpenCV 4.5.0 for image processing and PCL 1.11.1 for point cloud processing. A key algorithm implemented in C++ was the orthogonal plane intersection line extraction, which involved extracting planes from the point cloud data using RANSAC, detecting orthogonal planes, and then refining the feature points using DBSCAN clustering and outlier removal. The pseudo-code of the orthogonal plane intersection line extraction is shown in Algorithm 1:
Orthogonal plane intersection line extraction
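Since Algorithm 1 is published as a figure, the sketch below reproduces its main steps in Python, substituting Open3D's RANSAC plane segmentation for the authors' C++/PCL implementation and omitting the DBSCAN refinement; all thresholds are illustrative.

```python
import numpy as np
import open3d as o3d

def extract_seam_lines(pcd, n_planes=4, dist_thresh=0.5, ortho_tol_deg=5.0):
    """Extract intersection lines of (approximately) orthogonal plane pairs."""
    planes, rest = [], pcd
    for _ in range(n_planes):
        if len(rest.points) < 100:
            break
        model, inliers = rest.segment_plane(distance_threshold=dist_thresh,
                                            ransac_n=3, num_iterations=1000)
        planes.append(np.asarray(model))          # plane model: ax + by + cz + d = 0
        rest = rest.select_by_index(inliers, invert=True)

    lines = []
    for i in range(len(planes)):
        for j in range(i + 1, len(planes)):
            n1, d1 = planes[i][:3], planes[i][3]
            n2, d2 = planes[j][:3], planes[j][3]
            cosang = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
            if abs(np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0))) - 90.0) > ortho_tol_deg:
                continue                           # keep only (near-)orthogonal pairs
            direction = np.cross(n1, n2)
            direction /= np.linalg.norm(direction)
            # A point on the line: intersect both planes with the plane direction . x = 0.
            A = np.vstack([n1, n2, direction])
            point = np.linalg.solve(A, np.array([-d1, -d2, 0.0]))
            lines.append((point, direction))
    return lines
```

Here pcd would be an open3d.geometry.PointCloud built from the scanner output, and the refinement described above (DBSCAN clustering and outlier removal) would then operate on the points closest to each recovered line.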
Figure 7 shows the loss curves during training. The various loss values decline gradually over the course of training and eventually stabilize. In testing, the overall weld recognition rate reached 92%, with high detection accuracy. The output is the pixel coordinates of the two diagonally opposite corner points of the predicted bounding box in the image.
To address the difficulty of corner point extraction and the low extraction accuracy observed in the curvature-based extraction experiment, the orthogonal plane intersection line extraction algorithm introduced above is applied. First, planes with different orthogonal relationships are grouped into pairs, and the point pair features across the two planes of each pair are analyzed using the distance component of their point pair feature descriptors. Point pairs whose distance component is below the threshold are taken as feature points; in this experiment the threshold is set to 0.1 mm, and it is verified that the orthogonal weld features can be accurately extracted, as shown in Fig. 8.
Comparing Figs. 8 and 9, the feature area extracted by the improved orthogonal plane intersection line algorithm is more accurate and refined than that obtained with the curvature extraction method.
Improved orthogonal plane intersection line extraction algorithm results.
In addition, to verify the accuracy of the extracted intersection-line corner points, the two most distant points in the feature point cloud set are selected as a pair of corner points, and the distance between them is compared with the actual size of the workpiece. The 3D model of the workpiece measures 50 × 50 × 100 mm, and a rectangular feature area containing the 100 mm edge is roughly extracted from the 2D image to simulate the prediction of the target detection network. The schematic diagram of corner distance detection is shown in Fig. 10. The results of T-shaped weld corner distance detection are shown in Table 1, and those of the V-shaped weld in Table 2. The average error for T-shaped welds is 2.17%, with a maximum error of 3.84 mm and a minimum error of 0.21 mm; for V-shaped welds the average error is 2.4%, with a maximum of 6.82 mm and a minimum of 0.28 mm. The algorithm described in this paper therefore offers high-precision weld extraction capability and meets the needs of industrial application scenarios.
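A minimal sketch of this corner-distance check follows, assuming the extracted feature points are available as an N × 3 array in millimetres; the 100 mm reference comes from the workpiece dimension stated above.

```python
import numpy as np
from scipy.spatial.distance import pdist

def corner_distance_error(feature_points, nominal_mm=100.0):
    """Distance between the two most separated feature points vs. the nominal edge length."""
    measured = pdist(np.asarray(feature_points, dtype=float)).max()  # max pairwise distance
    return measured, abs(measured - nominal_mm) / nominal_mm * 100.0  # (mm, % error)
```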
In conclusion, our study successfully demonstrates that the integration of interpretable neural networks with the fusion of 2D and 3D imaging significantly enhances the accuracy and reliability of weld seam detection in industrial environments. The added layer of interpretability not only boosts the system’s operational transparency but also builds trust among technicians, which is vital for broader acceptance and implementation. This research paves the way for future advancements in automated welding technology, emphasizing the importance of clarity and accountability in automated decision-making processes in critical manufacturing operations.
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Shah, H. N. M., Sulaiman, M., Shukor, A. Z., Jamaluddin, M. H. & Ab Rashid, M. Z. A review paper on vision based identification, detection and tracking of weld seams path in welding robot environment. Mod. Appl. Sci. 10(2), 83–89 (2016).
Dinham, M. & Fang, G. Autonomous weld seam identification and localisation using eye-in-hand stereo vision for robotic arc welding. Robot. Comput. Integr. Manuf. 29, 288–301 (2013).
Pires, J. N., Loureiro, A. & Bolmsjo, G. Welding Robots: Technology, System Issues and Application (Springer Science & Business Media, 2006).
Fridenfalk, M. Development of Intelligent Robot Systems Based on Sensor Control (Univ., 2003).
Liu, J., Fan, Z., Olsen, S. I., Christensen, K. H. & Kristensen, J. K. Boosting active contours for weld pool visual tracking in automatic arc welding. IEEE Trans. Autom. Sci. Eng. 14(2), 1096–1108 (2015).
Xue, K. et al. Robotic seam tracking system based on vision sensing and human–machine interaction for multi-pass mag welding. J. Manuf. Process. 63, 48–59 (2021).
Xu, Y., Fang, G., Chen, S., Zou, J. J. & Ye, Z. Real-time image processing for vision-based weld seam tracking in robotic GMAW. Int. J. Adv. Manuf. Technol. 73(9–12), 1413–1425 (2014).
Guo, B., Shi, Y., Yu, G., Liang, B. & Wang, K. Weld deviation detection based on wide dynamic range vision sensor in MAG welding process. Int. J. Adv. Manuf. Technol. 87(9–12), 3397–3410 (2016).
Li, Y., Li, Y. F., Wang, Q. L., Xu, D. & Tan, M. Measurement and defect detection of the weld bead based on online vision inspection. IEEE Trans. Instrum. Meas. 59(7), 1841–1849 (2010).
Ma, Y. et al. An efficient and robust complex weld seam feature point extraction method for seam tracking and posture adjustment. IEEE Trans. Ind. Inform. 19(11), 10704–10715 (2023).
Lin, Z., Shi, Y., Wang, Z., Li, B. & Chen, Y. Intelligent seam tracking of an ultranarrow gap during K-TIG welding: A hybrid CNN and adaptive ROI operation algorithm. IEEE Trans. Instrum. Meas. 72, 1–14 (2023).
Zhang, Y., Tiňo, P., Leonardis, A. & Tang, K. A survey on neural network interpretability. IEEE Trans. Emerg. Topics Comput. Intell. 5(5), 726–742 (2021).
Fan, F.-L., Xiong, J., Li, M. & Wang, G. On interpretability of artificial neural networks: A survey. IEEE Trans. Radiat. Plasma Med. Sci. 5(6), 741–760 (2021).
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. & Müller, K.-R. Explaining deep neural networks and beyond: A review of methods and applications. Proc. IEEE 109(3), 247–278 (2021).
Song, Z., Chung, R. & Zhang, X.-T. An accurate and robust strip-edge-based structured light means for shiny surface micromeasurement in 3-D. IEEE Trans. Ind. Electron. 60(3), 1023–1032 (2013).
Li, T. & Zheng, J. Multi-layer and multi-channel dynamic routing planning and initial point positioning of weld seam based on machine vision. IEEE Access https://doi.org/10.1109/ACCESS.2023.3319076 (2023).
Wang, Y., Liu, H. & Zhang, J. Deep learning-based weld seam detection using convolutional neural networks. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 15, 3225–3235. https://doi.org/10.1109/JSTARS.2022.3111110 (2022).
Peng, J., Chen, Q., Lu, J., Jin, J. & Luttervelt, C. A. Real-time optimization of robotic arc welding based on machine vision and neural networks. In IECON ’98. 24th Annual Conference of the IEEE Industrial Electronics Society 1279–1283 (IEEE, 1998).
Xu, C., Wang, J., Zhang, J. & Lu, C. A new welding path planning method based on point cloud and deep learning. In 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE) 786–791 (IEEE, 2020).
Bahreini, F. & Hammad, A. Point cloud semantic segmentation of concrete surface defects using dynamic graph CNN. In Proceedings of the 2021 International Symposium on Automation and Robotics in Construction (ISARC) 53–60. https://doi.org/10.22260/isarc2021/0053 (2021).
Ren, Y. & Yu, H. The pose adjustment system of robotic arm adopts binocular vision and machine learning. In 2019 International Conference on Computer Network, Communication and Information Systems (CNCI) (Atlantic Press, 2019).
Hackel, T. et al. Semantic3D.net: A new large-scale point cloud classification benchmark. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. https://doi.org/10.5194/ISPRS-ANNALS-IV-1-W1-91-2017 (2017).
Guiotte, F., Bin, M., Lefevre, S., Tang, P. & Corpetti, T. Relation network for full-waveform LiDAR classification. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. https://doi.org/10.5194/isprs-archives-xliii-b3-2020-515-2020 (2020).
Yun, J. et al. Grasping pose detection for loose stacked objects based on convolutional neural network with multiple self-powered sensors information. IEEE Sens. J. 23(18), 18391–18398. https://doi.org/10.1109/JSEN.2022.3190560 (2023).
Kerle, N., Nex, F., Duarte, D. & Vetrivel, A. UAV-based structural damage mapping—Results from 6 years of research in two European projects. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. https://doi.org/10.5194/isprs-archives-xlii-3-w8-187-2019 (2019).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017).
Kim, J. et al. Multiple weld seam extraction from RGB-depth images for automatic robotic welding via point cloud registration. Multimed. Tools Appl. 80(13), 1–17 (2021).
Fischler, M. A. & Bolles, R. C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981).
Zhuang, C. et al. Semantic part segmentation method based 3D object pose estimation with RGB-D images for bin-picking. Robot. Comput. Integr. Manuf. 68, 102086 (2016).
College of Sino-German Science and Technology, Qingdao University of Science and Technology, Qingdao, China
College of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao, China
Zengxu Li, Guodong Chen & Yaobin Yue
S.W. and Y.Y.: conceptualization, methodology. S.W.: reviewing and editing. Z.L. and G.C.: data curation, software. S.W.: writing—original draft preparation. All authors have read and agreed to the published version of the manuscript.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Wang, S., Li, Z., Chen, G. et al. Weld seam object detection system based on the fusion of 2D images and 3D point clouds using interpretable neural networks. Sci Rep 14, 21137 (2024). https://doi.org/10.1038/s41598-024-71989-w