
JCTAM OPEN ACCESS
Journal of Computer Technology and Applied Mathematics
ISSN:3007-4126 (print) | ISSN:3007-4134 (online) | Publication Frequency: Bimonthly
Enhancing Video Conferencing Experience through Speech Activity Detection and Lip Synchronization with Deep Learning Models
* Corresponding Author: Weikun Lin, E-mail: welton.lin2233@gmail.com
Publication
Accepted 2025 March 3 ; Published 2025 March 1
Journal of Computer Technology and Applied Mathematics, 2025, 2(2), 16-23.
Abstract
As video conferencing becomes increasingly integral to modern communication, high-quality synchronization between speech and visual elements is paramount. Speech activity detection, commonly implemented as Voice Activity Detection (VAD), and lip synchronization technologies play crucial roles in ensuring accurate, real-time communication by distinguishing speech signals from noise and aligning lip movements with audio. This paper proposes a novel multimodal fusion approach based on deep learning models that significantly improves the accuracy of speech activity detection and the real-time performance of lip synchronization. Using open datasets such as AVSpeech and LRW, this study demonstrates the effectiveness of the proposed models in real-world scenarios, including multi-party conferences, noisy environments, and cross-lingual settings. Experimental results show that the LSTM-based VAD model achieves an accuracy of 92%, outperforming traditional methods, while the lip synchronization module ensures seamless audio-visual alignment with minimal delay.
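The abstract's core detection component, a recurrent model that labels each audio frame as speech or non-speech, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `TinyLSTMVAD`, the feature dimension, and the random (untrained) weights are all assumptions made for illustration; in the paper's setting the weights would be learned from labeled audio-visual data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMVAD:
    """Minimal single-layer LSTM mapping per-frame audio features
    to a speech/non-speech probability (illustrative sketch only)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four LSTM gates (i, f, g, o).
        self.W = rng.standard_normal((4 * n_hidden, n_features + n_hidden)) * 0.1
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.standard_normal(n_hidden) * 0.1
        self.n_hidden = n_hidden

    def forward(self, frames):
        h = np.zeros(self.n_hidden)  # hidden state
        c = np.zeros(self.n_hidden)  # cell state
        probs = []
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            probs.append(sigmoid(self.w_out @ h))  # P(speech) for this frame
        return np.array(probs)

# 50 frames of 13-dimensional features (e.g. MFCC-like vectors).
frames = np.random.default_rng(1).standard_normal((50, 13))
vad = TinyLSTMVAD(n_features=13, n_hidden=8)
p = vad.forward(frames)
speech_mask = p > 0.5  # frame-level speech decision
```

Because the LSTM carries state across frames, its decision at each frame depends on the recent acoustic context, which is what lets such models outperform frame-independent, energy-threshold VAD in noisy conference audio.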
Keywords
Speech Activity Detection, Lip Synchronization, Deep Learning, Video Conferencing, Multimodal Fusion, Dynamic Time Warping, User Experience, Real-Time Communication
Metadata
Pages: 16-23
References: 27
Disciplines: Artificial Intelligence
Subjects: Speech Recognition
Cite This Article
APA Style
Lin, W. (2025). Enhancing video conferencing experience through speech activity detection and lip synchronization with deep learning models. Journal of Computer Technology and Applied Mathematics, 2(2), 16-23. https://doi.org/10.70393/6a6374616d.323637
Acknowledgments
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
FUNDING
Not applicable.
INSTITUTIONAL REVIEW BOARD STATEMENT
Not applicable.
DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
INFORMED CONSENT STATEMENT
Not applicable.
CONFLICT OF INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
AUTHOR CONTRIBUTIONS
Not applicable.
PUBLISHER'S NOTE
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.