Research

EXPLORATORY VIDEO ANALYTICS (2022)

EVA is a visual data management system (think MySQL for videos). It supports a declarative language similar to SQL and a wide range of commonly used computer vision models. The key idea behind EVA is that simple to moderately complex analysis of videos should be as easy as writing SQL queries.
See here to learn more.

USING BILINEAR CNNs FOR VEHICLE MAKE AND MODEL PREDICTION (2022)

In this project, we tackled the fine-grained classification task of predicting a vehicle’s make and model from an input image, using several neural networks. We used VMMRdb as the main dataset source.
We compared the performance of three methods: transfer learning with various backbone models (ResNet18, ResNet50, MobileNetV2), Bilinear CNNs, and Vision Transformers. As the number of labels increased, we found that Bilinear CNNs outperformed the other networks in terms of accuracy, as they were able to learn the fine details better.
This project was done as part of CS 7643 (Deep Learning). Please find the code here and the research report here.
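For readers unfamiliar with Bilinear CNNs, here is a minimal NumPy sketch of the bilinear pooling step that lets such a network capture fine details. It is an illustration only, not the project's code: the function name `bilinear_pool`, the feature-map shapes, and the random inputs are all assumptions, and in a real network the two feature maps would come from CNN backbones.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Bilinear pooling of two feature maps with the same spatial size.

    feat_a: array of shape (C1, H, W); feat_b: array of shape (C2, H, W).
    Returns a normalized descriptor of length C1 * C2.
    """
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)
    # Outer product of the two channel vectors at every spatial location,
    # averaged over all locations.
    outer = a @ b.T / (h * w)
    vec = outer.reshape(-1)
    # Signed square root followed by L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


# Hypothetical feature maps standing in for two CNN backbone outputs.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((8, 7, 7))
feat_b = rng.standard_normal((4, 7, 7))
descriptor = bilinear_pool(feat_a, feat_b)
print(descriptor.shape)
```

The resulting descriptor encodes pairwise interactions between channels, and a linear classifier over the make and model labels would typically be trained on top of it.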

EMOJI CATEGORY AND POSITION PREDICTION IN TEXT PASSAGES (2022)

Curated a new dataset of about 350K tweets by scraping and cleaning emoji information, along with character- and word-level indices.
Implemented a Bi-LSTM network with pre-trained GloVe embeddings for predicting the type and position of an emoji given a text passage. Achieved 62% accuracy in emoji prediction (modeled as a top-10 clustering problem) and 78% accuracy in position prediction.
This project was done as part of CS 7650 (Natural Language). Please find the repo here and the research report here.

ANOMALOUS CONTENT FROM SURVEILLANCE VIDEOS (2019)

One of the main motivations behind this project was the high number of false positives typically associated with naive monitoring systems. For example, surveillance cameras in smart homes send an alert to the user every time they detect motion. We wanted to see whether such false alerts could be reduced.
We used Facebook’s C3D to extract spatiotemporal features from videos taken from the UCF-Crimes dataset and fed them to a multi-input CNN. Modeling it as a multi-class classification problem didn’t give great results because of the very limited training set. However, the model was able to associate highly anomalous segments of a video with high regression scores.
This work was accepted and presented at ICinPro-2019. As my first research project, it taught me a lot of interesting things. It also made me realize the immense complexity and scope of video understanding and instilled in me a desire to learn more.