Charmve/Surface-Defect-Detection
Surface Defect Detection: Datasets & Research Papers
If you are interested in more details, please visit our website.
We are continuously summarizing and compiling critical papers and open-source datasets related to surface defect detection, which hold significant value in this field. Key research papers from previous years have been gathered and can be accessed in the Papers folder.
Dataset access: Google Drive | (password: o7p5)
Overview
Machine vision-based surface defect detection systems have largely replaced manual inspection in sectors such as 3C products, automotive, household appliances, machinery manufacturing, semiconductors and electronics, chemicals, pharmaceuticals, aerospace, and light industry. Conventional surface defect detection typically relies on traditional image processing techniques, or on hand-crafted features combined with a classifier. The imaging scheme is designed around the specific properties of the surface and the nature of the defects being inspected: a well-designed imaging setup provides uniform illumination and renders the surface defects clearly. In recent years, deep learning-based defect detection methods have been adopted in a growing number of industrial settings.
Unlike the well-defined classification, detection, and segmentation tasks of computer vision, defect detection has rather loosely defined requirements. These can be divided into three levels: what the defect is (classification), where the defect is (localization), and how much defect there is, i.e., its number and extent (segmentation).
Content Breakdown
1. Critical Aspects of Surface Defect Detection
1.1 The Small Sample Challenge
Deep learning methodologies are extensively utilized in various computer vision applications, with surface defect detection considered a specialized industrial application. Historically, deep learning's direct application in this area has been hindered by the scarcity of defect samples in actual industrial settings.
Whereas ImageNet contains more than 14 million images, surface defect detection faces a small sample problem: in many real-world scenarios, only a handful to a few dozen defective images are available. Four main strategies are currently used to address this problem:
- Data Augmentation and Generation
Data augmentation methods typically involve using various image processing techniques—such as mirroring, rotating, translating, distorting, filtering, and adjusting contrast—to generate additional samples from existing defect images. Data synthesis is also a common practice, where defects are overlaid onto normal samples to create defective representations.
- Pre-training and Transfer Learning
Training deep learning networks on small datasets can easily lead to overfitting. Hence, approaches involving pre-trained networks or transfer learning have become widely adopted strategies.
- Thoughtful Network Structure Design
A carefully designed network architecture can significantly reduce the number of training samples required. For example, compressing the samples into features and then classifying those features with a CNN requires far fewer samples than training the CNN directly on raw images. Defect detection based on Siamese (twin) networks can also be viewed as a special architecture that eases the sample requirement.
- Unsupervised or Semi-supervised Approaches
In an unsupervised model, only normal samples are utilized, eliminating the need for defective samples altogether. On the other hand, semi-supervised methods leverage unlabeled samples to navigate the challenges posed by small datasets during training.
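The first strategy, data augmentation and synthesis, can be sketched in a few lines of NumPy. The example below assumes 2-D grayscale uint8 images; the shift range and the contrast factor range (0.8 to 1.2) are arbitrary illustrative choices, and `augment` and `paste_defect` are hypothetical helper names, not part of any dataset's tooling.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> list:
    """Return simple geometric and photometric variants of a 2-D grayscale uint8 image."""
    variants = [
        np.fliplr(image),                    # horizontal mirror
        np.flipud(image),                    # vertical mirror
        np.rot90(image),                     # 90-degree rotation
        np.roll(image, int(rng.integers(-5, 6)), axis=1),  # small horizontal shift (wraps around)
    ]
    # Contrast stretch around the mean intensity, clipped back to the 8-bit range.
    alpha = rng.uniform(0.8, 1.2)
    mean = image.mean()
    stretched = (image.astype(np.float32) - mean) * alpha + mean
    variants.append(np.clip(stretched, 0, 255).astype(np.uint8))
    return variants


def paste_defect(normal, defect_patch, defect_mask, top, left):
    """Overlay the masked pixels of a defect patch onto a normal image.

    Returns the synthetic defective image and a boolean ground-truth mask.
    """
    image = normal.copy()
    mask = np.zeros(normal.shape, dtype=bool)
    h, w = defect_patch.shape
    region = image[top:top + h, left:left + w]       # view into `image`
    region[defect_mask] = defect_patch[defect_mask]  # copy only defect pixels
    mask[top:top + h, left:left + w] = defect_mask
    return image, mask
```

Plain pixel copying keeps the synthesis step easy to follow; a practical pipeline would usually also blend the edges of the pasted region so the network cannot learn the seam instead of the defect.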
1.2 Real-time Detection Challenges
In industrial deployment, deep learning-based defect detection involves three main stages: data labeling, model training, and model inference. Real-time performance, particularly at the inference stage, is essential in practice. Most current defect detection methods, however, focus on classification accuracy and pay little attention to inference efficiency. Common acceleration techniques include model lightweighting and network pruning. In addition, although GPUs are the most widely used hardware for deep learning computation, FPGAs may emerge as a compelling alternative as the technology matures.
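Pruning is one of the acceleration techniques named above. As a minimal sketch, unstructured magnitude pruning zeroes the smallest-magnitude weights of a layer. This NumPy version only illustrates the idea on a plain weight array; it is not any particular framework's pruning API, and `magnitude_prune` is a hypothetical name.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)              # number of weights to remove
    if k == 0:
        return weights.copy()
    magnitudes = np.abs(weights).ravel()
    threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    keep = np.abs(weights) > threshold            # ties at the threshold are also pruned
    return weights * keep
```

Note that zeroed weights only translate into faster inference when the runtime exploits sparsity (sparse kernels) or when whole channels or filters are removed (structured pruning).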
2. Prominent Datasets for Surface Defect Inspection
1. Steel Surface: NEU-CLS
The NEU-CLS dataset can be used for both classification and localization tasks.
Latest access: see issue #16.
Northeastern University (NEU) released a surface defect database that collects six typical surface defects of hot-rolled steel strip: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In), and scratches (Sc). The database contains 1,800 grayscale images, 300 samples for each defect type. For defect detection tasks, the dataset provides annotations indicating the category and location of the defect in each image. Each defect is marked with a yellow box indicating its location and a green label showing its class score.
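For orientation, the six classes and their abbreviations can be kept in a lookup table, and the per-class counts checked against the stated 300 images per class. The sketch below assumes, hypothetically, that each file name begins with the class abbreviation followed by an underscore (for example Cr_12.bmp); check the naming in your copy of the data before relying on it.

```python
from pathlib import Path

# The six NEU defect classes, keyed by their abbreviations.
NEU_CLASSES = {
    "RS": "rolled-in scale",
    "Pa": "patches",
    "Cr": "crazing",
    "PS": "pitted surface",
    "In": "inclusion",
    "Sc": "scratches",
}

def label_from_filename(path: str) -> str:
    """Return the class abbreviation, assuming names like 'Cr_12.bmp' (hypothetical convention)."""
    prefix = Path(path).stem.split("_")[0]
    if prefix not in NEU_CLASSES:
        raise ValueError(f"unknown class prefix: {prefix!r}")
    return prefix

def class_counts(paths) -> dict:
    """Count images per class so the expected 300-per-class balance can be verified."""
    counts = {abbr: 0 for abbr in NEU_CLASSES}
    for p in paths:
        counts[label_from_filename(p)] += 1
    return counts
```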
Kaggle - Severstal: Steel Defect Detection
Severstal is at the forefront of efficient steel mining and manufacturing. The company's ethos emphasizes the necessity for advancements across economic, ecological, and social dimensions of the industry, underscoring a commitment to corporate responsibility. Recently, Severstal established the largest industrial data lake in the country, accumulating vast amounts of previously discarded data. Now, the organization is turning to machine learning to enhance automation and uphold production quality.
https://www.kaggle.com/c/severstal-steel-defect-detection
2. Solar Panels: elpv-dataset
This dataset contains images of functional and defective solar cells extracted from electroluminescence (EL) images of solar modules.
The dataset contains 2,624 samples (300x300 pixels, 8-bit grayscale) of functional and defective solar cells with varying degrees of degradation, extracted from 44 different solar modules. The annotated defects are of either intrinsic or extrinsic origin and are known to reduce the power efficiency of solar modules.
All images are normalized with respect to size and perspective. In addition, lens distortion introduced by the camera during EL image capture was corrected before the solar cells were extracted.
3. Metal Surface: KolektorSDD
The dataset was built from images of defective electrical commutators provided and annotated by Kolektor Group. Microscopic fractures or cracks were observed on the surface of the plastic embedding in the commutators. The surface of each commutator was captured in eight non-overlapping images under controlled conditions.
The dataset comprises:
- 50 physical items (defective electrical commutators)
- 8 surface images per item
- 399 images in total:
-- 52 images with visible defects
-- 347 images without visible defects
- Original image size:
-- width: 500 px
-- height: variable
- For training and evaluation, images are resized to 512 x px.
Each item has a defect visible in at least one image, and two items have defects visible in two images, giving 52 images with visible defects in total. The remaining 347 images are negative examples showing non-defective surfaces.
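The image counts above fit together as follows; the last line gives the ratio of negative to positive images, which shows how imbalanced the set is for a classifier. The variable names are only illustrative.

```python
items = 50
images_per_item = 8
total_images = 399  # stated total; 50 * 8 would give 400

# Every item shows a defect in at least one image, and two items show it in a second image.
defective = items + 2                      # 52
non_defective = total_images - defective  # 347

print(defective, non_defective)             # 52 347
print(round(non_defective / defective, 2))  # 6.67 negatives per positive
```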
4. PCB Inspection: DeepPCB
Figure 1. PCB Inspection Dataset (panels: an example tested image and the corresponding template image).
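Because each tested image is paired with a template image, the simplest baseline one might try is pixel-wise differencing. This is a sketch under stated assumptions, not the method of the dataset's authors: it assumes the two images are already aligned, grayscale, and equal in size, and the threshold of 50 is an arbitrary illustrative value rather than a DeepPCB setting.

```python
import numpy as np

def template_difference(tested: np.ndarray, template: np.ndarray, threshold: int = 50) -> np.ndarray:
    """Boolean mask of pixels where an aligned tested image differs from its template."""
    if tested.shape != template.shape:
        raise ValueError("tested and template images must have the same shape")
    # Widen to int16 so the subtraction cannot wrap around in uint8.
    diff = np.abs(tested.astype(np.int16) - template.astype(np.int16))
    return diff > threshold

def bounding_box(mask: np.ndarray):
    """(x_min, y_min, x_max, y_max) enclosing all flagged pixels, or None if nothing is flagged."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A single box around all differing pixels is only a starting point; separating multiple defects would need connected-component labeling on the mask.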
5. Fabric Defects Dataset: AITEX
- Download Link: https://pan.baidu.com/s/1cfC4Ll5QlnwN5RTuSZ6b7w (password: b9uy)
This dataset contains 245 images, each 256 pixels high, of seven different fabric structures: 140 defect-free images (20 per fabric structure) and 105 images containing 12 types of fabric defects common in the textile industry. The large image size allows users to crop windows of different sizes, which increases the number of samples. The online dataset also provides segmentation masks for all defective images, in which white pixels mark defective areas and black pixels mark defect-free areas.
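Cropping windows from the wide images, as described above, can be sketched as a sliding window that labels each patch by whether its region of the mask contains any white (defect) pixel. The window size and stride are hypothetical parameters, and the synthetic 256 x 1024 array used in the test below is only for illustration.

```python
import numpy as np

def sliding_windows(image: np.ndarray, mask: np.ndarray, window: int, stride: int):
    """Yield (patch, is_defective) pairs from a 2-D image and its segmentation mask.

    Non-zero (white) mask pixels mark defects; zero (black) pixels are defect-free.
    """
    h, w = image.shape
    if window > h or window > w:
        raise ValueError("window is larger than the image")
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            patch = image[top:top + window, left:left + window]
            patch_mask = mask[top:top + window, left:left + window]
            yield patch, bool(patch_mask.any())
```

Smaller windows or overlapping strides produce more patches per image, which is how the variable window size increases the sample count.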
6. Fabric Defect Dataset (Tianchi)
- Download Link: https://pan.baidu.com/s/1LMbujxvr5iB3SwjFGYHspA (password: gat2)
During the fabric production process, various factors may lead to defects such as stains, holes, and lint. Ensuring product quality necessitates defect inspections.
Fabric defect inspection is an essential part of quality management in the textile industry. Manual inspection is easily affected by subjective factors and lacks consistency, and inspectors working under strong light for long periods suffer from visual fatigue. Because fabric defects are numerous in type, vary in shape, and can be hard to observe, intelligent fabric defect detection has long been a technical bottleneck for the industry.
This dataset covers the major fabric defects found in the textile industry; each image may contain one or more defects. It includes roughly 6,000 samples of plain fabric and nearly 12,000 samples of patterned fabric.
7. Aluminium Profile Surface Defect Dataset (Tianchi)
During production, various factors can cause defects on the surface of aluminum profiles, such as cracks, peeling, and scratches, which seriously affect product quality. To ensure quality, manual inspection is often required. However, the natural texture of the aluminum surface can be hard to distinguish from defects.
Traditional manual visual inspection is labor-intensive and cannot identify surface defects accurately and in a timely way, which makes quality control inefficient. In recent years, deep learning has advanced rapidly in image recognition, and aluminum profile manufacturers want to apply these AI techniques to their existing quality control processes to automate inspection, reduce missed detections, and raise product quality. Deep learning-based AI allows managers to monitor surface quality comprehensively.
The dataset includes 10,000 monitoring images showcasing defects in aluminum profiles produced under real conditions; each image features one or more defects, clearly identifying the type of imperfections present.
8. Weakly Supervised Learning for Industrial Optical Inspection (DAGM)
Dataset description:
- Targets various defects on textured backgrounds.
- Training data is weakly labeled.
- Includes ten datasets: the first six are for training and the last four are for testing.
- Every dataset contains "non-defective" images and 150 "defective" images, saved in 8-bit grayscale PNG format. Each dataset is generated from a different texture model and defect model.
- "Non-defective" images show the texture without any defect, while "defective" images show defects on the textured background.
- Each dataset is split evenly into a training set and a test set.
- Weak labels are given as ellipses that roughly outline the defect area.
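To train a segmentation model on the elliptical weak labels, each ellipse can be rasterized into a binary mask. The parameterization below (centre, two semi-axes, rotation angle in radians) is an assumption for illustration; check the dataset's own label files for the actual field order and units before converting them.

```python
import numpy as np

def ellipse_mask(shape, cx, cy, a, b, angle_rad):
    """Rasterize a rotated ellipse into a boolean mask of the given (height, width).

    (cx, cy): centre in pixel coordinates (x = column, y = row)
    a, b:     semi-axes along the ellipse's own first and second axes
    angle_rad: angle of the first axis relative to the image x axis
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    cos_t, sin_t = np.cos(angle_rad), np.sin(angle_rad)
    # Rotate pixel offsets into the ellipse's own coordinate frame.
    u = dx * cos_t + dy * sin_t
    v = -dx * sin_t + dy * cos_t
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0
```

Because the ellipses are only approximate, a mask built this way contains some defect-free pixels, which is exactly the weak supervision the dataset is designed to test.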
9. Cracks on Construction Surfaces
The CrackForest Dataset comprises annotated images of road cracks that accurately depict urban roadway conditions.
- Github Link: https://github.com/cuilimeng/CrackForest-dataset
- Download link: https://pan.baidu.com/s/j5QbDr7T3XQvDxAzVpg (password: jajn)
Figure 2. Cracks on a bridge (left) and cracks on a road surface (right).
- Cracks in bridges. The dataset includes images of bridge cracks but no pixel-level ground truth annotations. The files are available at https://github.com/Charmve/Surface-Defect-Detection/tree/master/Bridge_Crack_Image.
- Cracks on road surfaces. The original dataset, by Shi Yong, Cui Limeng, Qi Zhiquan, Meng Fan, and Chen Zhensong, is available at https://github.com/Charmve/Surface-Defect-Detection/tree/master/CrackForest. We have extracted the image files that come with pixel-level ground truth.
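With pixel-level ground truth available, a predicted crack mask can be scored pixel by pixel. The sketch below computes strict precision, recall, and F1 on binary masks; some crack-detection evaluations relax the match with a small distance tolerance, and that variant is not shown here. The function name is hypothetical.

```python
import numpy as np

def pixel_scores(pred: np.ndarray, gt: np.ndarray):
    """Pixel-wise (precision, recall, F1) for binary masks, with no distance tolerance."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # crack pixels correctly predicted
    fp = np.logical_and(pred, ~gt).sum()   # background predicted as crack
    fn = np.logical_and(~pred, gt).sum()   # crack pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)
```

Cracks are thin, so a prediction that is off by a single pixel can score poorly under this strict definition; that is the usual motivation for the tolerant variant.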
10. Magnetic Tile Dataset
The magnetic tile dataset prepared by user abin24, used in the paper "Surface defect saliency of magnetic tile", is available at https://github.com/Charmve/Surface-Defect-Detection/tree/master/Magnetic-Tile-Defect.
Figure 3. Overview of the dataset.
This dataset encompasses images depicting six prevalent magnetic tile defects, complete with pixel-level ground-truth annotations.
11. RSDDs: Rail Surface Defect Datasets
The RSDDs dataset has two types: Type-I was captured from express (high-speed) rails and contains 67 challenging images, while Type-II was captured from common or heavy-haul rails and contains 128 challenging images.
Every image contains at least one defect, and the backgrounds are complex and noisy.
The defects in RSDDs were annotated by professional human observers in the field of track surface inspection.
- Official Link: http://icn.bjtu.edu.cn/Visint/resources/RSDDs.aspx
- Download Link: https://pan.baidu.com/share/init?surl=svsnqL0r1kasVDNjppkEwg (password: nanr)
12. Kylberg Texture Dataset v.1.0
Figure 4. Sample patches from each of the 28 texture classes.
Brief overview:
- 28 distinct texture classes, as depicted in Figure 4.
- Each class consists of 160 unique texture patches (an alternative version of the dataset contains 12 rotations of each original patch, i.e., 160 x 12 = 1,920 patches per class).
- Texture patch dimensions: 576x576 pixels.
- File format: Lossless compressed 8-bit PNG.
- All patches are normalized to have a mean value of 127 and a standard deviation of 40.
- Each texture class has its dedicated directory.
- Files follow the naming convention blanket1-d-p011-r180.png, where blanket1 is the class, d is the original image sample number (a, b, c, or d), p011 is the patch number, and r180 denotes a patch rotated by 180 degrees.
Official Link: http://www.cb.uu.se/~gustaf/texture/
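The naming convention and the normalization described above can both be expressed in a few lines. This is a sketch: the regular expression is written from the single example name blanket1-d-p011-r180.png, and treating the rotation suffix as optional (for unrotated patches) is an assumption, as are the helper names.

```python
import re
import numpy as np

# Matches names such as "blanket1-d-p011-r180.png": class, sample letter, patch number, rotation.
# The "-rNNN" part is treated as optional here (assumption for unrotated patches).
NAME_RE = re.compile(r"^(?P<cls>[a-z]+\d*)-(?P<sample>[a-d])-p(?P<patch>\d+)(?:-r(?P<rot>\d+))?\.png$")

def parse_name(filename: str) -> dict:
    """Split a Kylberg-style file name into its components."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "class": m["cls"],
        "sample": m["sample"],
        "patch": int(m["patch"]),
        "rotation_deg": int(m["rot"]) if m["rot"] else 0,
    }

def normalize(patch: np.ndarray, mean: float = 127.0, std: float = 40.0) -> np.ndarray:
    """Rescale a patch to the stated mean (127) and standard deviation (40), as float64."""
    x = patch.astype(np.float64)
    s = x.std()
    if s == 0:
        raise ValueError("constant patch cannot be normalized")
    return (x - x.mean()) / s * std + mean
```

The normalized values are returned as floats; rounding and clipping them back to 8-bit would shift the mean and standard deviation slightly.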
13. KTH-TIPS Database
A dataset of repetitive background textures.
14. Escalator Step Defect Dataset
Official Link: https://aistudio.baidu.com/aistudio/datasetdetail/
15. Transmission Line Insulator Dataset
In this dataset, Normal_Insulators contains 600 insulator images captured by drones, and Defective_Insulators contains 248 images of defective insulators; both images and labels are provided.
Official Link: https://github.com/InsulatorData/InsulatorDataSet
16. MVTec ITODD
The MVTec Industrial 3D Object Detection Dataset (MVTec ITODD) serves as a public resource for 3D object detection and pose estimation with a concentrated focus on industrial applications.
The dataset encompasses:
- 28 objects and labeled scenes containing instances of these objects
- Five sensors (two 3D sensors and three grayscale cameras) observing each scene
For further insights, refer to the accompanying PDF file.
Download link: https://www.mvtec.com/company/research/datasets/mvtec-itodd
17. BSData - Instance Segmentation and Industrial Wear Forecasting Dataset
The dataset contains 3-channel images, including 394 image annotations of the surface damage type "pitting". Annotations made with the labelme tool are provided in JSON format and can be converted to VOC and COCO formats. All images come from two BSD (ball screw drive) types.
One of the BSD types appears in 325 images, at two different image sizes. Because these images were captured continuously, they show how the degree of soiling evolves.
The dataset also incorporates 27 sequences detailing pitting development across 69 images each.
Figure 5. Example images (left) and the associated PNG annotations (right).
Official link: https://github.com/2Obe/BSData
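Since the annotations are labelme JSON files, one small step of a COCO conversion is turning each polygon into a bounding box. The sketch below reads only the "shapes" list and each shape's "label" and "points" keys from labelme's format; a full COCO export (image ids, areas, segmentation lists, category table) is deliberately left out, and the function name is hypothetical.

```python
import json

def labelme_to_coco_boxes(labelme_json: str) -> list:
    """Convert labelme polygon shapes to COCO-style [x, y, width, height] boxes."""
    data = json.loads(labelme_json)
    boxes = []
    for shape in data.get("shapes", []):
        xs = [pt[0] for pt in shape["points"]]
        ys = [pt[1] for pt in shape["points"]]
        x_min, y_min = min(xs), min(ys)
        boxes.append({
            "label": shape["label"],
            # COCO boxes use the top-left corner plus width and height.
            "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        })
    return boxes
```

For VOC, the same extremes would be written as [xmin, ymin, xmax, ymax] instead of corner plus size.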
Sincere thanks to @Beñat Gartzia for recommending this dataset!
18. The Gear Inspection Dataset
The Gear Inspection Dataset (GID) comes from the "National Artificial Intelligence Innovation Application Competition" organized by Baidu (China) Co. It contains 2,000 grayscale images from real-world scenarios, annotated with three defect types. The annotations for each image are stored in a separate JSON file containing the image name, label category, bounding box, and segmentation polygon. However, the label categories are given only as numeric IDs without defect-type names, which makes it hard to cross-reference them with related datasets.
Figure 6. Examples of validation test images and associated labels.
Official link: http://www.aiinnovation.com.cn/#/dataDetail?id=34
- Download Links:
-- Gear Detection Training Dataset: https://pan.baidu.com/s/17HoFfBUQGeX7G0ibkPExrw (password: hm7k)
-- Gear Detection Evaluation List A: https://pan.baidu.com/s/157Zf7hcTM78GhXtXI5ySFQ (password: 2R6K)
-- Gear Detection Evaluation List B: https://pan.baidu.com/s/1OjOZotqlRSvsYLA_qH2nXA (password: hypd)
- Mirrors:
-- Gear Detection Training Dataset: https://drive.google.com/file/d/1CZo-Ab5BXkTjV-b1-NIFzYMjfJQMl4nG/view?usp=share_link
-- Gear Detection Evaluation List A: https://drive.google.com/file/d/1-0sSrmhElBseeZWICu77lzTxoOiRD8yG/view?usp=share_link
-- Gear Detection Evaluation List B: N/A
Note: The contest dataset is designated solely for research purposes.
19. AeBAD Aircraft Engine Blade Anomaly Detection
Download link: http://suo.nz/2IU48P
The real-world aero-engine blade anomaly detection (AeBAD) dataset consists of two sub-datasets: the single blade dataset (AeBAD-S) and the blade video anomaly detection dataset (AeBAD-V). AeBAD distinguishes itself with two specific characteristics: (1) Target samples may not be aligned or are presented at different scales. (2) The distributions of normal samples shift between the training and testing sets, primarily due to variations in lighting and viewing angles.
20. BeanTech Anomaly Detection (BTAD)
Download Link: http://suo.nz/2JEGEi
The BTAD (BeanTech Anomaly Detection) dataset contains real-world industrial anomaly images of three different industrial products.
3. Further Dataset Resources
I have collected a number of surface defect detection datasets, but many others are not yet included. For datasets not found in this repository, we recommend searching the following platforms. Contributions of new datasets from the community are also very welcome.
- Kaggle: https://www.kaggle.com/datasets
- Papers with Code: https://paperswithcode.com/sota
- Registry of Open Data on AWS: https://registry.opendata.aws
- Microsoft Research Open Data: https://msropendata.com
- Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
4. Research Papers on Surface Defect Detection
I have compiled a number of articles on surface defect detection, mainly concerning defects on metals, LCD screens, buildings, and power lines. The methods fall into classification, detection, reconstruction, and generation approaches. The PDF versions of these papers are stored in date-named folders inside the 'Papers' directory.
Access the papers via: [Papers].
Acknowledgements
We would like to thank those who originally released the datasets collected in this repository; their contributions have been invaluable to our research. The idea of gathering these datasets came from an article on surface defect detection by SFXiang, published on "AI算法" (AI Algorithms), which prompted the creation of this repository. The initial paper collection is credited to a CSDN user. Contributions up to November 19 have been included, and the repository will continue to be improved. Contributions are always welcome.
Thank you once again to all the open-source contributors of these datasets.
Download
- To download the ZIP, click here or run
git clone https://github.com/Charmve/Surface-Defect-Detection.git
in your terminal.
- For Chinese Mainland: Download here (Password: i20n)
Notification
This project has been made possible through the collaborative efforts of numerous individuals who have contributed their research or industry applications. This dataset is for research purposes only.
For any inquiries or ideas, please feel free to reach out.
Community Engagement
- Engage in GitHub discussions or issues.
- Join our QQ Group (password required).
- Connect on WeChat using ID: Yida_Zhang2.
- Email: yidazhang1@gmail.com.
Support Options
Support this initiative by becoming a sponsor. Your name and/or logo will be displayed on our homepage with a link to your website.
Citation Information
You can reference this repository using the following BibTeX entry:
@misc{SurfaceDefectDetection,
title={Surface Defect Detection: Dataset and Papers},
author={Charmve},
year={.09},
publisher={Github},
journal={GitHub repository},
howpublished={\url{https://github.com/Charmve/Surface-Defect-Detection}},
}
Created by Charmve & maiwei.ai Community | Deployed on Kaggle
* Updated on Sep 17 by @Charmve