Computer Science
Document Type
Conference Paper
Abstract
Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to obtain LLM Annotations in structured form, and explanation prompts to generate LLM Explanations that make the LLM's reasoning more transparent and easier to understand. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that the LLM had higher accuracy in object and activity Annotations than in emotion and genre Annotations. Moreover, we identified the potential and limitations of the LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.
Publication Title
Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW
Publication Date
11-13-2024
First Page
190
Last Page
196
ISBN
9798400711145
DOI
10.1145/3678884.3681850
Keywords
images, large language model, large language-and-vision assistant (llava), mental health, multimodal information, user generated content, visual content
Repository Citation
Liu, Jiaying (Lizzy); Wang, Yunlong; Lyu, Yao; Su, Yiheng; Niu, Shuo; Xu, Orson Xuhai; and Zhang, Yan, "Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression" (2024). Computer Science. 239.
https://commons.clarku.edu/faculty_computer_sciences/239
APA Citation
Liu, J., Wang, Y., Lyu, Y., Su, Y., Niu, S., Xu, X. O., & Zhang, Y. (2024, November). Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression. In Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing (pp. 190-196).
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.