School of Professional Studies

DV Emotion Net: A Sub-Study on Emotion Detection Model Performance Across P100, T4, and TPU VM V3-8

Document Type

Conference Proceeding

Abstract

This study evaluates the performance of different hardware configurations (GPU P100, GPU T4, and TPU VM v3-8) for emotion detection using the DV EmotionNet framework. Building on prior research that integrates audio and video modalities for emotion recognition, the analysis examines how each hardware setup influences model efficiency and accuracy. Audio features were extracted using energy, zero-crossing rate, and Mel-Frequency Cepstral Coefficients (MFCC), while video features were obtained by applying spatio-temporal Gaussian kernels and Gaussian-weighted functions to the second-moment matrix. The Multimodal Feature Aggregation (MFA) method was used to fuse the audio and video features into a combined dataset. The evaluation used the Fusion of Emotion Recognition Convolutional Neural Network (FERCNN) model, focusing on how the choice of accelerator affects performance metrics. Recent approaches often face challenges such as high computational cost, limited scalability, and sensitivity to noisy data; this study addresses them by systematically evaluating the trade-offs between computational efficiency and accuracy across hardware accelerators. Results on the RAVDESS and CREMA-D datasets revealed notable differences in accuracy: the P100 performed best on simpler tasks, while the TPU VM v3-8 excelled in more complex scenarios. These findings highlight the importance of hardware choice in optimizing multimodal emotion recognition systems and the role of adequate computational resources in applications such as human-computer interaction, healthcare, and entertainment. © 2025 IEEE.
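Two of the audio features named in the abstract, short-time energy and zero-crossing rate, are simple frame-level statistics. The sketch below illustrates them with NumPy only; it is not the authors' implementation. The function names (`frame_signal`, `short_time_energy`, `zero_crossing_rate`) and the 25 ms frame / 10 ms hop at 16 kHz are illustrative assumptions, not settings reported in the paper, and MFCC extraction is omitted because it needs a mel filterbank and DCT (for example from an audio library).

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len).

    Any incomplete tail shorter than frame_len is dropped.
    """
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds sample indices [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames: np.ndarray) -> np.ndarray:
    """Mean squared amplitude of each frame."""
    return np.mean(frames.astype(np.float64) ** 2, axis=1)


def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Fraction of adjacent sample pairs in each frame whose sign differs."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


if __name__ == "__main__":
    sr = 16000                              # assumed sample rate (Hz)
    t = np.arange(sr) / sr                  # one second of audio
    tone = np.sin(2 * np.pi * 100 * t)      # 100 Hz sine: few sign changes
    frames = frame_signal(tone, frame_len=400, hop=160)  # 25 ms / 10 ms
    print(frames.shape)                     # (98, 400)
    print(short_time_energy(frames)[:3])    # about 0.5 for a unit sine
    print(zero_crossing_rate(frames)[:3])   # low for a low-frequency tone
```

In a multimodal pipeline such per-frame vectors would typically be stacked with MFCCs before fusion; a noisy or voiceless segment shows a much higher zero-crossing rate than a low-pitched voiced one, which is why the two features are often used together.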

Publication Title

4th International Conference on Sentiment Analysis and Deep Learning, ICSADL 2025 - Proceedings

Publication Date

2-2025

First Page

826

Last Page

832

ISBN

9798331523923

DOI

10.1109/ICSADL65848.2025.10933197

Keywords

emotion recognition, emotional context understanding, fusion techniques, intelligent systems, multimodal system

Cross Post Location

Student Publications
