Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation
We introduce SVMA, an adversarial dataset for content moderation in short-form videos, and ChimeraBreak, a coordinated tri-modal attack strategy that simultaneously challenges visual, auditory, and semantic reasoning pathways in multimodal large language models (MLLMs).