ChimeraBreak

Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation

Sahid Hossain Mustakim*, S M Jishanul Islam*, et al.

ICCV 2025, SVU Workshop (Long Paper)

We introduce SVMA, an adversarial dataset for content moderation in short-form videos, and ChimeraBreak, a coordinated tri-modal attack strategy that simultaneously challenges visual, auditory, and semantic reasoning pathways in multimodal large language models (MLLMs).

Read Paper
MIMIC

MIMIC: Multimodal Islamophobic Meme Identification and Classification

S M Jishanul Islam*, Sahid Hossain Mustakim*, et al.

NeurIPS 2024, 3rd Muslims in ML Workshop

We present a novel dataset and propose a classifier based on the Vision-and-Language Transformer (ViLT) specifically tailored to identify anti-Muslim hate within memes by integrating both visual and textual representations.

Read Paper
Emotion Recognition

An audio video-based multi-modal fusion approach for speech emotion recognition

S M Jishanul Islam, Sahid Hossain Mustakim, et al.

Knowledge-Based Systems, Elsevier (IF: 7.6)

We present an approach that classifies human emotions by fusing audio and visual inputs. It achieves state-of-the-art results while keeping the architecture simple. We also introduce a frame-filtering strategy that reduces spatiotemporal redundancy in the video input.

Under 2nd Revision
IPBlocks

IPBlocks: A Blockchain Ecosystem for Secure IP Registration and Decentralized Marketplace

Sadia Ahmmed, Sahid Hossain Mustakim, et al.

TENCON 2025 (2025 IEEE Region 10 Conference)

IPBlocks is a blockchain-based solution that streamlines intellectual property (IP) applications, trading, and royalty transfers through a decentralized marketplace. We develop algorithms that let users apply for, publish, auction, and transfer IP with enhanced security and transparency.

View Code