MULTIMODAL DETECTION AND PREEMPTION SYSTEM FOR EMERGENCY VEHICLES USING COMPUTER VISION AND ACOUSTIC ANALYSIS

Abstract

Vision-only emergency vehicle detection in intelligent traffic systems often suffers from high false-positive rates. To address this limitation, a multimodal framework combining computer vision and acoustic analysis is proposed. The system pairs a YOLOv5-based vision module deployed on a Jetson Nano with an ESP32-S3 acoustic verifier that extracts Mel-Filterbank Energy (MFE) features. A detection is confirmed only when both modalities agree, using an AND-gate fusion strategy. Over 11 days, 1,380 timestamped traffic events were recorded under operational conditions. The vision module alone produced 1,321 detections, of which only 421 were true positives, giving a preemption accuracy of 31.8%. With multimodal fusion, 61 of 70 true emergency events were detected, raising accuracy to 87.1% while eliminating all 900 false positives. No ghost detections were observed, and the acoustic verifier missed no emergency event with an active siren, supporting the robustness of the fusion logic. These findings show that multimodal integration achieved 100% precision on the recorded events, substantially improves reliability, and enables practical edge deployment in intelligent traffic management systems.
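The AND-gate fusion rule and the precision/recall trade-off it produces can be illustrated with a minimal sketch. The `Event` structure, its field names, and the toy events below are hypothetical and are not the study's data or implementation; the sketch assumes each event already carries a boolean decision from the vision module and from the acoustic verifier, plus a ground-truth label.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One timestamped traffic event (hypothetical structure for illustration)."""
    vision_flag: bool    # vision module reported an emergency vehicle
    acoustic_flag: bool  # MFE-based acoustic verifier reported a siren
    is_emergency: bool   # ground-truth label


def fuse(event: Event) -> bool:
    """AND-gate fusion: trigger preemption only when both modalities agree."""
    return event.vision_flag and event.acoustic_flag


def precision_recall(events: list[Event]) -> tuple[float, float]:
    """Precision and recall of the fused decision over labelled events."""
    tp = sum(1 for e in events if fuse(e) and e.is_emergency)
    fp = sum(1 for e in events if fuse(e) and not e.is_emergency)
    fn = sum(1 for e in events if not fuse(e) and e.is_emergency)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy events only, chosen to show each fusion outcome
    events = [
        Event(True, True, True),    # both agree on a real emergency: preempt
        Event(True, False, False),  # visual false alarm, no siren: suppressed
        Event(True, False, True),   # emergency without siren: missed
        Event(False, True, False),  # siren heard, no vehicle seen: suppressed
    ]
    p, r = precision_recall(events)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.50

    # Arithmetic on the counts reported in the abstract
    print(1321 - 421)                 # 900 vision-only false positives
    print(round(100 * 61 / 70, 1))    # 87.1 (% of true events detected)
```

Because a false positive survives fusion only if both modalities err on the same event, the AND gate trades some recall (an emergency vehicle without an active siren is never confirmed) for near-zero false alarms.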
