Videos – Icelandic Vision Lab

The original version of the following list of visual stimulus sets was compiled by Johanna Margret Sigurdardottir and will be updated as needed. We neither host nor provide copies of the stimuli. Researchers who wish to use a particular stimulus set should seek further information, including on possible licences, e.g. by following the provided web links, reading the referenced papers, and/or emailing the listed contact person/persons for that stimulus set. If you notice an error, know of a stimulus set that should be included, or have any other questions or comments, please contact Heida Maria Sigurdardottir (heidasi(Replace this parenthesis with the @ sign)hi.is). The list is provided as is without any warranty whatsoever.

Face and Emotion Exeter Database (FAMED)

Description: This database contains 4 versions of the same 2303 video clips. The first version contains the unprocessed videos, the second is in black and white, the third is blurred, and the fourth is pixelated. The videos show 32 male actors, filmed from 2 viewpoints, while they displayed 6 emotions, told 3 jokes, and had a short conversation. Each action was performed 3 times: once with nothing on the actor's head, once while wearing a swimming cap, and once while wearing a wig.

License: See website.

Link: http://www.chrislongmore.co.uk/famed/index.html

Reference: Longmore, C. A., & Tree, J. J. (2013). Motion as a cue to face recognition: Evidence from congenital prosopagnosia. Neuropsychologia, 51, 864-875.

Face Video Database of the Max Planck Institute for Biological Cybernetics

Description: This set contains 246 short video sequences recorded around the heads of individuals. To create the database, cameras were arranged around each subject at a distance of 1.3 m, with 18 degrees between adjacent cameras. The videos are in MPEG-1 format.

License: http://vdb.kyb.tuebingen.mpg.de/license.php

Link: http://vdb.kyb.tuebingen.mpg.de/index.php

References: Image copyright Martin Breidt, MPI for Biological Cybernetics

“The face video database was provided by the Max Planck Institute for Biological Cybernetics in Tuebingen, Germany”.

MOBIO-Mobile Biometry Face and Speech Database

Description: This data set contains audio and video recordings of 152 individuals, of whom 100 are men and 52 are women. The recordings were made with a NOKIA N93i mobile phone or a 2008 MacBook laptop; the laptop was used only in part of the first session. The data were collected from August 2008 to July 2010 in 5 countries and include both native and non-native speakers of English. In the videos, participants are asked various personal questions, to which they all give fake answers.

License: Not known.

Link: https://www.idiap.ch/dataset/mobio

Reference: Chris McCool, Sébastien Marcel, Abdenour Hadid, Matti Pietikäinen, Pavel Matějka, Jan Černocký, Norman Poh, Josef Kittler, Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre, Phil Tresadern, and Timothy Cootes, “Bi-Modal Person Recognition on a Mobile Phone: using mobile phone data”, in IEEE ICME Workshop on Hot Topics in Mobile Multimedia, 2012.

PA-HMDB51

Description: “PA-HMDB51 is the very first human action video dataset with both privacy attributes and action labels provided. The dataset contains 592 videos selected from HMDB51 [1], each provided with frame-level annotation of five privacy attributes. We evaluated the visual privacy algorithms proposed in [3] on PA-HMDB51.” (see link).

License: https://github.com/htwang14/PA-HMDB51/blob/master/LICENSE

Link: https://github.com/htwang14/PA-HMDB51

Reference: https://arxiv.org/abs/1906.05675

VidTIMIT Audio-Video dataset

Description: This database includes video and audio recordings of 43 individuals speaking short, predetermined sentences in an office environment. The audio is stored as mono, 16-bit, 32 kHz WAV files. The video is stored as sequences of JPEG images with a resolution of 512×384 pixels. Each person spoke 10 sentences while being recorded; the sentences were chosen from the test section of the TIMIT corpus. Individuals were also recorded while moving their head to the left, to the right, back to the center, up, then down, and back to the center.

License: See website.

Link: http://conradsanderson.id.au/vidtimit/

Reference: C. Sanderson and B.C. Lovell. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference. Lecture Notes in Computer Science (LNCS), 5558, 199-208, 2009.

YouTube Video Text

Description: This database contains 30 videos obtained from YouTube. Each video is 15 seconds long, recorded at 30 frames per second in HD 720p quality. The database is split into two parts: one contains overlay text such as captions, song titles, and logos, while the other contains scene text such as street signs and words on shirts.

License: Not known.

Link: http://vision.ucsd.edu/content/youtube-video-text

Reference: Not known.

YouTube Face DB

Description: This dataset consists of 3,425 videos of 1,595 different individuals, an average of 2.15 videos per individual. There is only one video of 591 individuals, 2 videos each of 471 individuals, 3 of 307, 4 of 167, 5 of 51, and 6 videos each of 8 individuals. Every video frame is encoded using face-image descriptors. The videos are labeled with the identities of the people appearing in them.

License: See website.

Link: http://www.cs.tau.ac.il/~wolf/ytfaces/

Reference: Lior Wolf, Tal Hassner, and Itay Maoz. Face Recognition in Unconstrained Videos with Matched Background Similarity. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011.

VB100 Bird Dataset

Description: This database contains 1,416 videos of 100 bird species, recorded by bird experts; the median video length is 32 seconds. Birds were recorded in their natural habitat from a distance, with varying scales, poses, backgrounds, and camera movement. 798 videos were taken with a moving camera, while the rest were taken with a still camera.

License: Not known.

Link: http://arma.sourceforge.net/vb100/

Reference: ZongYuan Ge, Chris McCool, Conrad Sanderson, Peng Wang, Lingqiao Liu, Ian Reid, Peter Corke. Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification. arXiv preprint 1608.00486, 2016.