ICVGIP 2021 Contests (EXTENDED)

Welcome to the ICVGIP 2021 contests. This year ICVGIP is hosting three competitions, with the aim of encouraging students to tackle large-scale applied learning problems.

The contest is open to full-time students registered with an institute anywhere in the world. Students can participate either individually or in teams of up to 5 members.

Note: Non-student researchers can participate in this contest but will not be eligible for the prizes, and will therefore be evaluated separately.


Prizes (Each Track)

  • First Prize: INR 7500
  • Second Prize: INR 3500
  • Third Prize: INR 2000
Sponsored by the Infosys Center for Artificial Intelligence, IIIT-Delhi, and TensorTour.

NEW: Teams active on the eval.ai leaderboard will be eligible for reimbursement of a Colab Pro subscription for up to 2 months.
NEW: All active participants will receive participation certificates.

Note: The organizers reserve the right to call off the contest if there are not enough participating teams.

Important dates

  • 04 Oct., 2021: Registration opens
  • 04 Oct., 2021: Training and validation data released
  • 22 Nov., 2021: Seminar on testing models on eval.ai
  • 25 Nov., 2021: Q&A seminar on the problem statements and GPU support
  • 10 Dec., 2021: Test phase starts
  • 16 Dec., 2021: Final submissions close
  • 19 Dec., 2021: Report submission deadline
  • 19 Dec., 2021: Contest workshop; top contenders give a talk
  • 15 Jan., 2022: Extended test phase starts (NEW)
  • 21 Jan., 2022: Final submissions close (NEW)
  • 23 Jan., 2022: Report submission deadline (NEW)

Participate


Note: Prizes will be awarded only to winners who submit a short report describing their approach and analysis before 23 Jan., 2022.
The winners of each track will co-author a joint report with the organizers, to be posted on arXiv. The report submission deadline is therefore firm; please plan accordingly.

Task 1 - Audio-Visual Retrieval

Description

The task is to learn a method that takes an audio (video) clip as a query and returns the relevant videos (audios) from a large gallery set.

Given a query example in one modality (audio/video), the task is to retrieve relevant examples in the other modality (video/audio). For every data point, both audio and video data are available, along with class-level annotations. The class name can also be considered a third modality, i.e. text. Retrieved examples are considered correct if they are semantically similar to the query, i.e. they share the same class label as the query. A hypothetical retrieval sketch follows.
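
Cross-modal retrieval of this kind is typically done by embedding both modalities into a shared space and ranking gallery items by similarity to the query. The sketch below is a minimal illustration under that assumption: the embeddings are random placeholders, and cosine similarity is just one common ranking choice, not a method prescribed by the contest.

```python
# Minimal cross-modal retrieval sketch. Assumes audio and video features have
# already been mapped into a shared embedding space by learned encoders; the
# embeddings below are random placeholders, and cosine similarity is one
# common ranking choice, not mandated by the contest.
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=10):
    """Rank gallery items by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery item
    return np.argsort(-sims)[:top_k]   # indices of the top-k matches

# Example: an audio query against a video gallery (shapes are illustrative).
audio_query = np.random.randn(256)
video_gallery = np.random.randn(1000, 256)
top_matches = retrieve(audio_query, video_gallery)
```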

Dataset

The AudiosetZSL dataset will be used for this task. The dataset was originally proposed for zero-shot classification and retrieval of videos, and was curated from the large-scale Audioset dataset.

For this challenge, only the seen classes from the dataset will be considered. It contains a total of 79,795 training examples and 26,587 validation examples. Out of the total 26,593 testing examples, a subset will be used for the final evaluation. We have provided features for both the audio and video modalities, extracted using pre-trained networks. For a fair comparison, it is mandatory for everyone to use the provided features. More details about the dataset and task can be found in the papers below.

  1. Coordinated Joint Multimodal Embeddings
  2. Discriminative Semantic Transitive Consistency

Evaluation metric

ClassAverage mAP will be used as the evaluation metric. Each retrieval query produces an average precision (AP) score. Averaging the AP over all queries from a particular class gives the mAP for that class; ClassAverage mAP is then obtained by averaging the mAP over all classes. ClassAverage mAP can be calculated for both audio-to-video and video-to-audio retrieval. The final score will be the average of the two:

Final mAP = 0.5*(audio2video) + 0.5*(video2audio)
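
The sketch below spells out this computation as we read it; the function names and data layout are hypothetical, and the official evaluation code on eval.ai is authoritative.

```python
# ClassAverage mAP sketch (our reading of the metric described above).
import numpy as np
from collections import defaultdict

def average_precision(ranked_labels, query_label):
    """AP for one query, given gallery labels sorted by predicted relevance."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def class_average_map(ap_scores, query_labels):
    """Average AP within each class, then average the per-class mAPs."""
    per_class = defaultdict(list)
    for ap, label in zip(ap_scores, query_labels):
        per_class[label].append(ap)
    return float(np.mean([np.mean(aps) for aps in per_class.values()]))

# Final score: mean of the two retrieval directions, as in the formula above.
# final_map = 0.5 * class_average_map(a2v_aps, audio_query_labels) \
#           + 0.5 * class_average_map(v2a_aps, video_query_labels)
```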

Code to get started

A GitHub repository is available here to help you get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Kranti Kumar Parida for any track-specific queries.

Good luck!


Task 2 - Network Quantization

Description

Deploying state-of-the-art DNNs on resource-constrained devices is challenging due to their large size and high latency. The task for this challenge is to take pretrained DNNs and quantize them to reduce their size while minimizing the drop in performance, in a data-free setting, i.e. when the original training data is no longer available. The original training data, in full or in part, may be unavailable for some tasks, such as medical imaging, where privacy is a priority. The data-free setting is therefore both challenging and practically relevant.
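
As a minimal (and deliberately simple) starting point, the sketch below applies PyTorch's dynamic quantization, which converts weights to int8 without needing any training data. It is only an illustration of the data-free idea, not the expected winning approach: it quantizes just the Linear layers, so convolution-heavy models will see little compression from it.

```python
# Data-free weight quantization sketch using PyTorch dynamic quantization.
# This needs no training data at all; note it only quantizes Linear layers,
# so conv-heavy models gain little, and stronger data-free methods
# (e.g. synthetic-data calibration) are expected to do much better.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```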

Training and validation data
The ImageNet ILSVRC 2012 validation set will be used for the task, and can be downloaded from here.
Evaluation metric
Submissions will be judged on achieving high compression with a minimal drop in performance. Methods must achieve a compression ratio of more than 25%.
compression_ratio = 100 * (original_model_size - new_model_size) / original_model_size
Entries with a compression ratio above 25% will be ranked by accuracy, with ties broken by compression ratio; i.e., for a given accuracy, the method with the higher compression ratio wins. Accuracy will be compared at a precision of 0.1%, i.e. accuracies of 80.13% and 80.14% will be considered the same, while 80.15% (which rounds to 80.2%) will be considered higher.
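
A short sketch of this ranking rule as we read it (the entry tuples are hypothetical, and Decimal is used so the 0.1% rounding is exact; the official eval.ai evaluation is authoritative):

```python
# Compression ratio and ranking rule sketch (hypothetical entries).
from decimal import Decimal, ROUND_HALF_UP

def compression_ratio(original_model_size, new_model_size):
    return 100.0 * (original_model_size - new_model_size) / original_model_size

def rounded_accuracy(acc):
    # Compare accuracies at 0.1% precision (half-up rounding assumed).
    return Decimal(str(acc)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

def rank_key(entry):
    accuracy, ratio = entry  # (top-1 accuracy %, compression ratio %)
    return (rounded_accuracy(accuracy), ratio)

entries = [(80.13, 30.0), (80.14, 26.0), (80.15, 25.5)]
print(max(entries, key=rank_key))  # (80.15, 25.5): it alone rounds to 80.2
```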
Code to get started

Please use this GitHub repository to get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Prasen K. Sharma for any track-specific queries.

Good luck!


Task 3 - Wildlife Species Detection

Description
Wildlife conservation organizations and governments across the globe have dedicated resources and developed policies to ensure continued biodiversity on our planet. Population monitoring is critical to wildlife conservation. Advances in computer vision over the last decade have shown promise in assisting these conservation efforts, as camera traps now make it possible to collect large datasets. The aim of this task is to detect species in camera-trap images and to develop robust systems that generalize well to different species and across different geographical locations.
Training and validation data
The dataset consists of 20 species, with a total of 11,141 images for training and 1,586 images for validation. The dataset for the challenge is available here. We have provided baseline results on a held-out dataset from the same distribution. The baseline results are reported using the YOLOv5 and Faster R-CNN methods. Details can be found in the respective papers listed below.
  1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  2. YOLOv5

Evaluation metric
ClassAverage mAP will be used as the evaluation metric. Each detection example produces an average precision (AP) score. Averaging the AP over all instances of a particular class gives the mAP for that class; ClassAverage mAP is then obtained by averaging the mAP over all classes.
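
For reference, a detection is usually matched to a ground-truth box of the same class via intersection-over-union (IoU); the 0.5 threshold mentioned below is the common convention, assumed here since the task description does not state one.

```python
# Minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) format.
# A prediction typically counts as a true positive when its IoU with a
# same-class ground-truth box exceeds a threshold (0.5 assumed here).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```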
Code to get started

Please use this GitHub repository to get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Sharat Agarwal for any track-specific queries.

Good luck!


Have questions? Connect with us


Organizers

ICVGIP 2021 Contest chairs
Committee members
Volunteers