ICVGIP 2021 Contests (EXTENDED)

Welcome to the ICVGIP 2021 contests. This year ICVGIP is hosting three competitions, with the aim of encouraging students to tackle large-scale applied learning problems.

The contest is open to full-time students registered with an institute anywhere in the world. Students can participate either individually or in teams of up to 5 members.

Note: Non-student researchers can participate in this contest but will not be eligible for the prizes, and will therefore be evaluated separately.


Prizes (Each Track)

  • First Prize: INR 7500
  • Second Prize: INR 3500
  • Third Prize: INR 2000
Sponsored by the Infosys Center for Artificial Intelligence, IIIT-Delhi, and TensorTour.

NEW: Teams active on the eval.ai leaderboard will be eligible for reimbursement of a Colab Pro subscription for up to 2 months.
NEW: All active participants will receive participation certificates.

Note: The organizers reserve the right to call off the contest if there are not enough participating teams.

Important dates

  • 04 Oct., 2021: Registration opens
  • 04 Oct., 2021: Training and validation data released
  • 22 Nov., 2021: Seminar on testing models on eval.ai
  • 25 Nov., 2021: Q&A seminar on the problem statements and GPU support
  • 10 Dec., 2021: Test phase starts
  • 16 Dec., 2021: Final submissions close
  • 19 Dec., 2021: Report submission deadline
  • 19 Dec., 2021: Contest workshop; top contenders give a talk
  • 15 Jan., 2022: Extended test phase starts (NEW)
  • 21 Jan., 2022: Final submissions close (NEW)
  • 23 Jan., 2022: Report submission deadline (NEW)

Participate


Note: Prizes will be awarded only to winners who submit a short report describing their approach and analysis before 23 Jan., 2022.
The winners of each track will co-author a joint report with the organizers, to be posted on arXiv. The report submission deadline is therefore firm; please plan accordingly.

Task 1 - Audio-Visual Retrieval

Description

The task is to learn a method that takes an audio (video) clip as a query and returns the relevant videos (audios) from a large gallery set.

Given a query example in one modality (audio/video), the task is to retrieve relevant examples in the other modality (video/audio). For every data point, both audio and video data are available, along with class-level annotations. The class name can also be considered a third modality, i.e. text. Retrieved examples are considered correct if they are semantically similar to the query, i.e. they share the same class label as the query. A hypothetical retrieval sketch follows.
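
Cross-modal retrieval of this kind is typically done by embedding both modalities into a shared space and ranking gallery items by similarity to the query. The sketch below is a minimal illustration under that assumption: the embeddings are random placeholders, and cosine similarity is just one common ranking choice, not a method prescribed by the contest.

```python
# Minimal cross-modal retrieval sketch. Assumes audio and video features have
# already been mapped into a shared embedding space by learned encoders; the
# embeddings below are random placeholders, and cosine similarity is one
# common ranking choice, not mandated by the contest.
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=10):
    """Rank gallery items by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery item
    return np.argsort(-sims)[:top_k]   # indices of the top-k matches

# Example: an audio query against a video gallery (shapes are illustrative).
audio_query = np.random.randn(256)
video_gallery = np.random.randn(1000, 256)
top_matches = retrieve(audio_query, video_gallery)
```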

Dataset

The AudiosetZSL dataset will be used for this task. The dataset was originally proposed for zero-shot classification and retrieval of videos, and was curated from the large-scale Audioset dataset.

For this challenge, only the seen classes from the dataset will be considered. It contains a total of 79,795 training examples and 26,587 validation examples. Out of the total 26,593 testing examples, a subset will be used for the final evaluation. We have provided features for both the audio and video modalities, extracted using pre-trained networks. For a fair comparison, it is mandatory for everyone to use the provided features. More details about the dataset and task can be found in the papers below.

  1. Coordinated Joint Multimodal Embeddings
  2. Discriminative Semantic Transitive Consistency

Evaluation metric

ClassAverage mAP will be used as the evaluation metric. Each retrieval query produces an average precision (AP) score. Averaging the AP over all queries from a particular class gives the mAP for that class; ClassAverage mAP is then obtained by averaging the mAP over all classes. ClassAverage mAP can be calculated for both audio-to-video and video-to-audio retrieval. The final score will be the average of the two:

Final mAP = 0.5*(audio2video) + 0.5*(video2audio)
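
The sketch below spells out this computation as we read it; the function names and data layout are hypothetical, and the official evaluation code on eval.ai is authoritative.

```python
# ClassAverage mAP sketch (our reading of the metric described above).
import numpy as np
from collections import defaultdict

def average_precision(ranked_labels, query_label):
    """AP for one query, given gallery labels sorted by predicted relevance."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def class_average_map(ap_scores, query_labels):
    """Average AP within each class, then average the per-class mAPs."""
    per_class = defaultdict(list)
    for ap, label in zip(ap_scores, query_labels):
        per_class[label].append(ap)
    return float(np.mean([np.mean(aps) for aps in per_class.values()]))

# Final score: mean of the two retrieval directions, as in the formula above.
# final_map = 0.5 * class_average_map(a2v_aps, audio_query_labels) \
#           + 0.5 * class_average_map(v2a_aps, video_query_labels)
```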

Code to get started

A GitHub repository is available here to help you get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Kranti Kumar Parida for any track-specific queries.

Good luck!


Task 2 - Network Quantization

Description

Deploying state-of-the-art DNNs on resource-constrained devices is challenging due to their large size and high latency. The task for this challenge is to take pretrained DNNs and quantize them to reduce their size while minimizing the drop in performance, in a data-free setting, i.e. when the original training data is no longer available. The original training data, in full or in part, may be unavailable for some tasks, such as medical imaging, where privacy is a priority. The data-free setting is therefore both challenging and practically relevant.
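
As a minimal (and deliberately simple) starting point, the sketch below applies PyTorch's dynamic quantization, which converts weights to int8 without needing any training data. It is only an illustration of the data-free idea, not the expected winning approach: it quantizes just the Linear layers, so convolution-heavy models will see little compression from it.

```python
# Data-free weight quantization sketch using PyTorch dynamic quantization.
# This needs no training data at all; note it only quantizes Linear layers,
# so conv-heavy models gain little, and stronger data-free methods
# (e.g. synthetic-data calibration) are expected to do much better.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```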

Training and validation data
The ImageNet ILSVRC 2012 validation set will be used for the task, and can be downloaded from here.
Evaluation metric
Submissions will be judged on achieving high compression with a minimal drop in performance. Methods must achieve a compression ratio of more than 25%.
compression_ratio = 100 * (original_model_size - new_model_size) / original_model_size
Entries with a compression ratio above 25% will be ranked by accuracy, with ties broken by compression ratio; i.e., for a given accuracy, the method with the higher compression ratio wins. Accuracy will be compared at a precision of 0.1%, i.e. accuracies of 80.13% and 80.14% will be considered the same, while 80.15% (which rounds to 80.2%) will be considered higher.
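
A short sketch of this ranking rule as we read it (the entry tuples are hypothetical, and Decimal is used so the 0.1% rounding is exact; the official eval.ai evaluation is authoritative):

```python
# Compression ratio and ranking rule sketch (hypothetical entries).
from decimal import Decimal, ROUND_HALF_UP

def compression_ratio(original_model_size, new_model_size):
    return 100.0 * (original_model_size - new_model_size) / original_model_size

def rounded_accuracy(acc):
    # Compare accuracies at 0.1% precision (half-up rounding assumed).
    return Decimal(str(acc)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

def rank_key(entry):
    accuracy, ratio = entry  # (top-1 accuracy %, compression ratio %)
    return (rounded_accuracy(accuracy), ratio)

entries = [(80.13, 30.0), (80.14, 26.0), (80.15, 25.5)]
print(max(entries, key=rank_key))  # (80.15, 25.5): it alone rounds to 80.2
```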
Code to get started

Please use this GitHub repository to get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Prasen K. Sharma for any track-specific queries.

Good luck!


Task 3 - Wildlife Species Detection

Description
Wildlife conservation organizations and governments across the globe have dedicated resources and developed policies to ensure continued biodiversity on our planet. Population monitoring is critical to wildlife conservation. Advances in computer vision over the last decade have shown promise in assisting these conservation efforts, as camera traps now make it possible to collect large datasets. The aim of this task is to detect species in camera-trap images and to develop robust systems that generalize well to different species and across different geographical locations.
Training and validation data
The dataset consists of 20 species, with a total of 11,141 images for training and 1,586 images for validation. The dataset for the challenge is available here. We have provided baseline results on a held-out dataset from the same distribution. The baseline results are reported using the YOLOv5 and Faster R-CNN methods. Details can be found in the respective papers listed below.
  1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  2. YOLOv5

Evaluation metric
ClassAverage mAP will be used as the evaluation metric. Each detection example produces an average precision (AP) score. Averaging the AP over all instances of a particular class gives the mAP for that class; ClassAverage mAP is then obtained by averaging the mAP over all classes.
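
For reference, a detection is usually matched to a ground-truth box of the same class via intersection-over-union (IoU); the 0.5 threshold mentioned below is the common convention, assumed here since the task description does not state one.

```python
# Minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) format.
# A prediction typically counts as a true positive when its IoU with a
# same-class ground-truth box exceeds a threshold (0.5 assumed here).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```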
Code to get started

Please use this GitHub repository to get started with the contest.

Questions
Please use the GitHub repository's issues section to ask questions about the contest. You can also get in touch with Sharat Agarwal for any track-specific queries.

Good luck!


Have questions? Connect with us


Organizers

ICVGIP 2021 Contest chairs
Committee members
Volunteers