# Dataset Description

In this section, a sample refers to a single image of a weld.

Each dataset in this challenge is described by a Parquet file that stores the metadata of all its samples as a dataframe, with one row per sample. The following fields are available for each sample:

| Field | Description |
|---|---|
| sample_id | Unique identifier for the sample, following the template "data_X". |
| class | Real state of the weld shown in the image; this is the ground truth. Two possible values: OK or KO. |
| timestamp | Date and time when the photo was taken; this field is not expected to be useful. |
| welding-seams | Name of the welding seam the weld belongs to. Welding seams are named "c_X". |
| labelling_type | Type of person who annotated the sample. Two possible values: "expert" or "operator". |
| resolution | Image resolution, as a list [width, height]. |
| path | Internal path of the image in the challenge storage. |
| sha256 | SHA-256 hash of the image data as a hexadecimal string, used to detect alteration or corruption in the storage. |
| storage_type | Type of sample storage: "s3" or "filesystem". |
| data-origin | Origin of the data. Two possible values: "real" or "synthetic". The provided datasets contain only real samples. |
| blur_level | Level of blur in the image, measured numerically using OpenCV. The lower this value, the blurrier the image. |
| blur_class | Blur class derived from the "blur_level" field. Two possible values: "blur" or "clean". The value is "blur" when the blur level is below 950. |
| luminosity_level | Luminosity of the image, measured numerically and expressed as a percentage. |
| external_path | URL of the image. Challengers can use it to download the sample directly from storage. |
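
Two of the rules in this table lend themselves to automatic checks: the 950 threshold linking `blur_level` to `blur_class`, and the `sha256` field guarding each file fetched from `external_path`. The sketch below is a minimal, hypothetical helper set (the function names are illustrative and not part of any challenge tooling). It assumes the hash is computed over the raw bytes of the image file, which the field description implies but does not state. The metadata itself can be loaded with `pandas.read_parquet` (which needs the pyarrow or fastparquet engine) and converted to plain records with `DataFrame.to_dict("records")`.

```python
import hashlib
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Threshold taken from the blur_class description: below 950 means "blur".
BLUR_THRESHOLD = 950


def expected_blur_class(blur_level: float) -> str:
    """Re-derive blur_class from blur_level using the documented rule."""
    return "blur" if blur_level < BLUR_THRESHOLD else "clean"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_sample(row: dict, dest_dir: Path) -> Path:
    """Download one sample via external_path and verify it against sha256.

    `row` is one metadata record. Assumption: sha256 is the digest of the
    raw file bytes. The file name is taken from the URL path, falling back
    to sample_id when the path has no final component.
    """
    url = row["external_path"]
    name = Path(urllib.parse.urlparse(url).path).name or row["sample_id"]
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / name
    # Stream the response to disk rather than loading it all in memory.
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    actual = sha256_of_file(target)
    if actual != row["sha256"].lower():
        target.unlink()  # do not keep a corrupted or altered file
        raise ValueError(f"{row['sample_id']}: sha256 mismatch (got {actual})")
    return target
```

Comparing `expected_blur_class(row["blur_level"])` with the stored `blur_class` for every row is a cheap consistency check before relying on either field.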

# Dataset Examples

# Example Mini Dataset

A reduced version of the dataset, named "example_mini_dataset", is provided to give an overview of the final dataset for this challenge. It contains 2,857 welding images spread across three welding seams: c102, c20, and c33.

The metadata file for this dataset can be found here: Example Mini Dataset Metadata

Below is an example of the first nine rows from the metadata file:

Figure: first nine rows of the Example Mini Dataset metadata file (meta_examples.png).

The dataset can be downloaded directly as a ZIP file: Download Example Mini Dataset
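
Both datasets are distributed as ZIP archives. The sketch below downloads an archive and extracts it with the standard library. The `zip_url` parameter stands in for the download link above and is hypothetical, and because the archive layout is not documented here, images are located by file extension anywhere in the extracted tree (the extension list is an assumption).

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Assumed image extensions; adjust to what the archive actually contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def fetch_and_extract(zip_url: str, dest_dir: Path) -> list[Path]:
    """Download a dataset ZIP, extract it, and return the image paths found.

    `zip_url` is a placeholder for the dataset download link.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "dataset.zip"
    # Stream to disk instead of holding the whole archive in memory.
    with urllib.request.urlopen(zip_url) as resp, open(archive, "wb") as out:
        shutil.copyfileobj(resp, out)
    extracted = dest_dir / "extracted"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extracted)
    # Walk the extracted tree and keep only files with an image extension.
    return sorted(
        p for p in extracted.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For the mini dataset, the number of returned paths can be compared with the 2,857 rows of its metadata file to confirm the download is complete.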

# Welding Detection Challenge Dataset

The complete dataset provided for this challenge is named "welding-detection-challenge-dataset". It contains 22,753 images of welding, covering three different welding seams: c20, c102, and c33.

The metadata file for this dataset can be found here: Welding Detection Challenge Dataset Metadata

Please note that this complete dataset is the one required for the challenge.

The full dataset can be downloaded as a ZIP file: Download Welding Detection Challenge Dataset
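
With 22,753 labelled images spread over three seams, a useful first check is how OK and KO labels, and blurry images, are distributed per seam. The sketch below assumes the metadata has already been turned into a list of records (for example with `pandas.read_parquet(path).to_dict("records")`) and uses only the documented field names; the function name is illustrative.

```python
from collections import Counter, defaultdict


def summarize_by_seam(rows):
    """Count OK/KO labels and blurry images per welding seam.

    `rows` is an iterable of metadata records using the field names
    "welding-seams", "class" and "blur_class".
    """
    labels = defaultdict(Counter)  # seam -> Counter of class values
    blurry = Counter()             # seam -> number of "blur" samples
    for row in rows:
        seam = row["welding-seams"]
        labels[seam][row["class"]] += 1
        if row["blur_class"] == "blur":
            blurry[seam] += 1
    return {
        seam: {
            "OK": counts["OK"],
            "KO": counts["KO"],
            "blur": blurry[seam],
            "total": sum(counts.values()),
        }
        for seam, counts in sorted(labels.items())
    }
```

With pandas, a comparable label breakdown is `df.groupby(["welding-seams", "class"]).size()`; the plain-Python version avoids any dependency beyond loading the file.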

DebiAI is an open-source bias detection and contextual evaluation tool for AI projects.
We used it to explore the Welding Detection Challenge Dataset Metadata Parquet file:

Figure: DebiAI dashboard showing the dataset analysis.

Analyze the challenge dataset on our public DebiAI instance

DebiAI was designed to assist data scientists in exploring datasets like the one described in this challenge.

How to create your own Challenge Dataset DebiAI project

Last Updated: 4/17/2025, 3:56:30 PM