International Contest on Pedestrian Attribute Recognition

21st International Conference on Computer Analysis of Images and Patterns - CAIP 2025

Contest

Following the success of the previous edition held at CAIP 2023, the Pedestrian Attribute Recognition (PAR) 2025 Contest is an international competition aimed at assessing methods for recognizing pedestrian attributes from images. We provide the participants with the MIVIA PAR KD Dataset 2025, featuring newly annotated images with labels such as clothing color, gender and the presence or absence of a bag or hat. After the contest, the dataset will be made publicly available to the scientific community, with the goal of providing one of the largest PAR datasets with the considered set of annotations. Competing methods will be evaluated based on accuracy on a private test set, separate from the training data.

Recently, a wide variety of methods have been proposed to tackle the challenge of PAR in both effective and efficient ways. In the 2023 edition, the winning method, which leveraged Visual Question Answering (VQA), achieved remarkable success by integrating Large Language Models. This approach reached an impressive 92% accuracy on the contest’s private test set, highlighting the immense potential of Vision-Language Models (VLMs) in addressing complex PAR challenges. Considering the rapid advancements in VLMs over the past two years, we expect many of the proposed methods to take advantage of these cutting-edge technologies. However, the competition is not limited to a specific approach and every innovative solution is not only welcomed but highly valued, contributing to the ongoing progression of this field.

Dataset

For this contest, we provide the participants with the MIVIA PAR KD Dataset 2025, which includes annotations for five pedestrian attributes. The dataset contains over 105,000 images, each labeled with the following attributes:

  • COLOR OF THE UPPER BODY CLOTHING: the available colors are black, blue, brown, gray, green, orange, pink, purple, red, white and yellow, represented by the labels [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
  • COLOR OF THE LOWER BODY CLOTHING: the same eleven colors, represented by the labels [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
  • GENDER: classified as either male or female, represented by [0, 1].
  • BAG: indicates whether a bag is present or not, represented by [0, 1].
  • HAT: indicates whether a hat is present or not, represented by [0, 1].

The dataset has been compiled from multiple sources, including existing datasets (e.g., PETA, RAP and Colorful, among others) and private images in which pedestrians were manually extracted and labeled. Due to varying collection conditions, the dataset is heterogeneous in terms of image size, lighting, pose and distance from the camera. Each image contains a single pre-cropped person. Participants will receive a folder with the images and a CSV file containing the labels for the training samples.
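As a purely illustrative sketch, the training annotations could be loaded as shown below. The file name, the absence of a header row and the attribute order (upper clothing color, lower clothing color, gender, bag, hat) are assumptions to be checked against the distributed CSV; the color mapping follows the list above.

    import csv

    # Hypothetical file name; the actual name is provided with the dataset.
    ANNOTATIONS_CSV = "training_annotations.csv"

    # Color labels as listed above (1-11).
    COLOR_NAMES = {1: "black", 2: "blue", 3: "brown", 4: "gray", 5: "green", 6: "orange",
                   7: "pink", 8: "purple", 9: "red", 10: "white", 11: "yellow"}
    # Assumed binary encodings, following the order of the attribute list above.
    GENDER_NAMES = {0: "male", 1: "female"}
    PRESENCE_NAMES = {0: "absent", 1: "present"}

    samples = []
    with open(ANNOTATIONS_CSV, newline="") as f:
        for row in csv.reader(f):  # assumes no header row
            filename = row[0]
            # Assumed attribute order: upper color, lower color, gender, bag, hat.
            upper, lower, gender, bag, hat = (int(v) for v in row[1:6])
            samples.append((filename, upper, lower, gender, bag, hat))

    print(f"{len(samples)} training samples loaded")
    filename, upper, lower, gender, bag, hat = samples[0]
    print(filename, COLOR_NAMES[upper], COLOR_NAMES[lower], GENDER_NAMES[gender],
          "bag:", PRESENCE_NAMES[bag], "hat:", PRESENCE_NAMES[hat])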

Unlike the previous edition, all training samples now have complete labels: the missing annotations have been filled in through knowledge distillation, using the winning method of the PAR 2023 contest to generate them. In addition, we have added new samples to the training set to further support participants in creating robust models.

The effectiveness of the proposed methods will be evaluated on a challenging private test set, not made available to the participants. In this way, the evaluation is entirely fair and we ensure that there is no overlap between the training and the test samples.

Evaluation protocol

The proposed methods will be evaluated based on accuracy across all tasks.

Accuracy $A$ is defined as the ratio of correct predictions (the prediction $p_i$ matches the ground truth $g_i$) to the total number of samples $K$, where $\mathbb{1}(\cdot)$ denotes the indicator function:

$ A = \frac{1}{K}\sum_{i=1}^{K} \mathbb{1}(p_i = g_i) $
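As an illustrative sketch, this definition corresponds to a single NumPy expression; the example labels below are hypothetical.

    import numpy as np

    def accuracy(predictions, ground_truth):
        # Fraction of samples whose predicted label matches the ground truth.
        return np.mean(np.asarray(predictions) == np.asarray(ground_truth))

    # Hypothetical labels: 4 correct predictions out of 5 samples -> 0.8
    print(accuracy([3, 2, 1, 1, 0], [3, 2, 1, 1, 1]))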

We calculate the accuracy for each of the five attributes as follows:

  • $A_u$: accuracy in recognizing the color of the upper body clothing;
  • $A_l$: accuracy in recognizing the color of the lower body clothing;
  • $A_g$: accuracy in recognizing gender;
  • $A_b$: accuracy in recognizing the presence of a bag;
  • $A_h$: accuracy in recognizing the presence of a hat.

The higher the accuracy achieved by a method, the more effective it is at recognizing that specific pedestrian attribute.

The contest ranking is determined by the Mean Accuracy (mA), which is the mean of the accuracies listed above:

$ mA= \frac{A_u+A_l+A_g+A_b+A_h}{5} $
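For instance, with purely hypothetical per-attribute accuracies, the ranking score would be computed as follows.

    # Hypothetical per-attribute accuracies (A_u, A_l, A_g, A_b, A_h).
    per_attribute = {"upper_color": 0.91, "lower_color": 0.89, "gender": 0.95,
                     "bag": 0.88, "hat": 0.93}
    mean_accuracy = sum(per_attribute.values()) / len(per_attribute)
    print(f"mA = {mean_accuracy:.3f}")  # mA = 0.912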

The method with the highest Mean Accuracy will be declared the winner of the PAR 2025 Contest, as it will demonstrate the best overall performance across all the tasks.

Rules

  • The deadline for method submission is May 31, 2025. Submissions must be made via email; participants must share (either directly or via external links) the trained model, the code and a technical report describing the method.
  • The participants can obtain the training set, validation set and their annotations by sending an email, specifying their team name.
  • Each participant must train a neural network to predict all the required pedestrian attributes for each sample. Teams are free to design novel neural network architectures, define new training procedures or propose innovative loss functions.
  • The participants must submit their trained model and code by carefully following the detailed instructions provided here.
  • Participants are highly encouraged to submit their contest papers via email by the deadline of June 15, 2025. The top three papers will be featured in the proceedings of the CAIP 2025 main conference. When submitting a paper, participants are requested to cite the official contest paper, which can be downloaded as a BibTeX file or cited as follows:
    • Greco A., Vento B., "An extended dataset and a baseline for pedestrian attribute recognition with advanced neural networks", 21st International Conference on Computer Analysis of Images and Patterns, CAIP 2025

Instructions

The methods proposed by participants will be executed on a private test set. To ensure full flexibility in choosing software libraries and to enable an accurate reproduction of their processing pipeline, the evaluation will be conducted on Google Colab (follow this tutorial).

Therefore, participants are required to submit an archive (download an example) containing the following elements:

  • A Python script test.py, which takes as input a CSV file formatted the same as the training annotations (--data) and the folder of test images (--images), and produces as output a CSV file with the predicted attributes for each image (--results). The script should be executable with the following command:
    python test.py --data foo_test.csv --images foo_test/ --results foo_results.csv
  • A Google Colab Notebook test.ipynb, which includes the commands to install all necessary software dependencies and executes the test.py script.
  • All the necessary files for running the test, such as the trained model, additional scripts and so on.

The provided sample test.py script also includes reading the CSV file with annotations and writing the results file. Each row of the CSV file consists of the image filename (e.g. 000000.jpg) and the estimated attributes (e.g. 3,2,1,1,1), separated by commas (as per the CSV standard). For example, a row might look like: 000000.jpg,3,2,1,1,1. The results file should be formatted exactly in the same way.
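For orientation only, a minimal skeleton of test.py satisfying this interface might look as follows; predict_attributes is a placeholder to be replaced by the team's own inference pipeline, and the exact use of the --data CSV (here only the filename column is read) should be adapted to the provided sample script.

    import argparse
    import csv
    import os

    def predict_attributes(image_path):
        # Placeholder: return the five predicted labels
        # (upper color, lower color, gender, bag, hat) for one image.
        return [1, 1, 0, 0, 0]

    def main():
        parser = argparse.ArgumentParser(description="PAR 2025 test script sketch")
        parser.add_argument("--data", required=True, help="CSV file listing the test images")
        parser.add_argument("--images", required=True, help="folder containing the test images")
        parser.add_argument("--results", required=True, help="output CSV with the predicted attributes")
        args = parser.parse_args()

        with open(args.data, newline="") as f_in, open(args.results, "w", newline="") as f_out:
            writer = csv.writer(f_out)
            for row in csv.reader(f_in):
                filename = row[0]  # only the filename column is used here
                labels = predict_attributes(os.path.join(args.images, filename))
                # One row per image: filename followed by the five labels, e.g. 000000.jpg,3,2,1,1,1
                writer.writerow([filename, *labels])

    if __name__ == "__main__":
        main()

Executed as python test.py --data foo_test.csv --images foo_test/ --results foo_results.csv, this skeleton writes one row per test image in the required format.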

For any questions or concerns, feel free to contact us at this email.

Organizers

Antonio Greco

Tenure-Track Assistant Professor
Dept. of Information and Electrical Engineering and Applied Mathematics (DIEM)
University of Salerno, Italy

Bruno Vento

PhD Student
Dept. of Electrical Engineering and Information Technology (DIETI)
University of Napoli, Italy

Contact

par2025@unisa.it

+39 089 963006
