Conferences
https://doi.org/10.1145/3708557.3716343
Nonverbal cues are essential for natural communication, yet voice assistants often lack such features, limiting their effectiveness in social contexts like emotion journaling. This study investigates the impact of different visual backchannel feedback patterns on conversational flow and user perception with an icon-based voice assistant. Using a within-subject design, participants experienced three feedback conditions: regular, synchronized, and randomized. Quantitative results showed that minimal differences in visual patterns significantly influenced user perceptions, with synchronized and randomized feedback generally outperforming regular feedback. Qualitative findings emphasized individual differences, highlighting the need for customizable feedback to address diverse user needs. This study contributes to the design of inclusive, user-centered voice assistant interfaces for social applications.
https://doi.org/10.1145/3711112
In this paper, we propose PracticeDAPR, an AI-based education-support system for beginners practicing DAPR assessment. Because professional identity is a pivotal goal of art therapy education, it is important to keep beginners from struggling with professional identity development. We therefore designed the system around three factors closely associated with professional identity formation for beginners in art therapy: (i) performance improvement, (ii) anxiety reduction, and (iii) self-efficacy enhancement. To this end, we adopt online peer-to-peer learning as the foundational learning approach and, by introducing AI as a mentor, let users experience AI assistance in addition to interacting with their peers. A user study with graduate students in art therapy was conducted using both quantitative and qualitative methods. Overall, users reported positive experiences with PracticeDAPR. Structural equation model analysis showed that perceived usefulness is an important contributor to all three factors, highlighting the effectiveness of online peer-to-peer learning with the AI mentor. The finding that performance improvement promotes intention to use further indicates that PracticeDAPR can provide sustained support for developing professional identity, which is not easily established in a short period of time. We discuss implications for using AI and online peer-to-peer learning to support current art therapy education.
https://doi.org/10.1145/3708359.3712081
Psychological counseling, especially for children, heavily relies on capturing both verbal and non-verbal cues to understand and support each child’s emotional and developmental needs. Therefore, creating a detailed and accurate transcription for a child’s counseling session is crucial but often labor-intensive and time-consuming, which makes it challenging to maintain the consistency of counseling quality. Despite advancements in AI, current session analysis practices rely primarily on manual clinical assessments and struggle to accurately capture children’s verbal and non-verbal expressions. To address these challenges, we propose an AI-based expert support system designed to enhance child counseling analysis. The system comprises two key components: (i) a transcription generation model and (ii) an editable dashboard. The transcription generation model extracts verbal expressions from both children and counselors, verifies speakers’ identities, and objectively captures non-verbal cues using a Multimodal Large Language Model. The editable dashboard facilitates Counselor & AI collaboration, where AI reduces human bias by providing objectivity, and counselors mitigate the risk of over-reliance on AI while maintaining oversight. This collaboration ultimately enhances workflow efficiency and leads to accurate counseling analyses. An evaluation with 48 child counselors demonstrates the system’s superior effectiveness and usability compared to existing services, with a majority expressing a strong intent to continue using our system. The system not only improves transcription accuracy but also supports more precise analysis of counseling sessions, enabling counselors to focus more on therapeutic engagements. These findings highlight the system’s potential to reduce the workload of child counselors, improve the quality of counseling services, and provide valuable resources for both individual counseling and counselor training. To the best of our knowledge, our study is the first to propose an AI-based expert support system optimized for generating transcriptions for child counseling analysis.
This paper presents a study that combines image data from the tuyeres and tapholes of a steelworks with sensor data to analyze and model molten iron temperature prediction. Whereas previous related work relied on sensor data alone, this study uses sensor data together with tuyere and taphole image data and analyzes their effect on molten iron temperature prediction. Experimental results show that incorporating the tuyere and taphole images improves prediction performance, suggesting that the approach can contribute to quality control and efficiency in the steelmaking process.
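As a rough illustration of the sensor-plus-image setup described above, the sketch below fuses a small CNN over tuyere/taphole images with an MLP over sensor readings to regress hot metal temperature; the architecture, input sizes, and dimensions are our assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class FusionTempRegressor(nn.Module):
    """Hypothetical sensor + image fusion model for molten iron temperature."""
    def __init__(self, n_sensors: int = 16):
        super().__init__()
        # Small CNN encoder for tuyere/taphole images (3x128x128 assumed).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B, 32)
        )
        # MLP encoder for tabular sensor readings.
        self.sensor_encoder = nn.Sequential(nn.Linear(n_sensors, 32), nn.ReLU())
        # Fused features -> scalar temperature prediction.
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, image, sensors):
        fused = torch.cat([self.img_encoder(image), self.sensor_encoder(sensors)], dim=1)
        return self.head(fused).squeeze(-1)

model = FusionTempRegressor()
temp = model(torch.randn(4, 3, 128, 128), torch.randn(4, 16))  # (4,) predicted temperatures
```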
10.1109/WACV61041.2025.00867
Precise retina Optical Coherence Tomography (OCT) image classification and segmentation are important for diagnosing various retinal diseases and identifying specific regions. Alongside comprehensive lesion identification, reducing the predictive uncertainty of models is crucial for improving reliability in clinical retinal practice. However, existing methods have primarily focused on a limited set of regions identified in OCT images and have often faced challenges due to aleatoric and epistemic uncertainty. To address these issues, we propose CAMEL (Confidence-Aware Multi-task Ensemble Learning), a novel framework designed to reduce task-specific uncertainty in multi-task learning. CAMEL achieves this by estimating model confidence at both pixel and image levels and leveraging confidence-aware ensemble learning to minimize the uncertainty inherent in single-model predictions. CAMEL demonstrates state-of-the-art performance on a comprehensive retinal OCT image dataset containing annotations for nine distinct retinal regions and nine retinal diseases. Furthermore, extensive experiments highlight the clinical utility of CAMEL, especially in scenarios with minimal regions, significant class imbalances, and diverse regions and diseases.
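As a toy illustration of confidence-aware ensembling in the spirit of CAMEL (not the authors' exact formulation), each member model's class probabilities can be weighted by a per-sample confidence score derived from predictive entropy:

```python
import torch

def confidence_weighted_ensemble(prob_list):
    """Combine per-model class probabilities (each of shape (B, C)) by
    weighting samples with each model's confidence (1 - normalized entropy)."""
    probs = torch.stack(prob_list)                        # (M, B, C)
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(-1)  # (M, B) predictive entropy
    conf = 1.0 - ent / torch.log(torch.tensor(float(probs.shape[-1])))
    weights = conf / conf.sum(0, keepdim=True)            # normalize over the M models
    return (weights.unsqueeze(-1) * probs).sum(0)         # (B, C) fused probabilities

fused = confidence_weighted_ensemble([torch.softmax(torch.randn(8, 9), -1) for _ in range(3)])
```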
Considering the limited hardware capacity of mobile environments, this study presents a method for extending existing still-image-based object detection and tracking techniques to real-time video processing. To this end, we lightweight YOLOv5 and integrate the DeepSORT algorithm, enabling real-time multi-object tracking on mobile GPU platforms such as the Jetson Nano. We further propose optimizing the model architecture and applying Knowledge Distillation to improve inference speed while minimizing performance degradation. To address the shrinking object sizes and ID-switching problems that arise in drone footage, we introduce a new tracking strategy based on low-confidence detections. Across various experimental settings, the proposed algorithm maintains stable real-time multi-object tracking performance compared with existing YOLO-based models. These results demonstrate that the approach can contribute to improving UAV video processing and mobile real-time vision systems.
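The Knowledge Distillation component mentioned above is commonly implemented as a temperature-scaled KL term between teacher and student logits blended with the task loss; the sketch below shows that standard form, with an illustrative temperature and weighting rather than the study's values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Blend the hard-label task loss with a soft-label KL term from the teacher."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients match the unscaled case
    return alpha * hard + (1 - alpha) * soft

loss = distillation_loss(torch.randn(8, 80), torch.randn(8, 80), torch.randint(0, 80, (8,)))
```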
To mitigate the motion blur and ghosting artifacts that arise in real-time Free Viewpoint Video reconstruction systems based on 3D Gaussian Splatting, this study proposes SplitStream, a foreground-background separation scheme with a dual NTC training structure. In the initial frame, the foreground and background are separated in 3D space using 2D masking and Space Carving, and an independent NTC (Neural Transformation Cache) is then trained for each region. The foreground primarily learns transformations so that it can respond flexibly to fast motion, while the background, reflecting its static nature, combines additional Gaussian learning with pruning. This dual training structure reduces unnecessary background deformation and motion blur and improves computational efficiency. In experiments on the N3DV dataset, the proposed model outperformed the baseline in PSNR and visual quality, and the boundary between foreground and background remained clearly preserved, particularly in dynamic scenes.
This paper uses a Domain-Adversarial Neural Network (DANN) to address the performance drop that FER models trained on predominantly Western datasets exhibit on East Asian faces. An emotion classifier and a domain classifier are trained jointly, optimized to maximize emotion classification accuracy while minimizing domain classification accuracy, so that race-independent feature representations are learned. The effectiveness of the proposed method was validated in experiments combining the Tsinghua and JAFFE East Asian datasets, which confirmed a significant improvement in cross-cultural emotion recognition accuracy over single-domain models, with particularly large gains on East Asian faces. This work contributes to overcoming data bias in FER and to developing more generalizable models.
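The DANN objective above (maximize emotion accuracy, minimize domain accuracy) is typically realized with a gradient reversal layer between the shared feature extractor and the domain classifier; a minimal PyTorch sketch with placeholder layer sizes:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips and scales gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: features flow directly to the emotion head, and through the reversal
# layer to the domain head, so domain accuracy is minimized w.r.t. the features.
features = torch.randn(8, 256, requires_grad=True)
emotion_head = torch.nn.Linear(256, 7)   # 7 emotion classes (placeholder)
domain_head = torch.nn.Linear(256, 2)    # source vs. target domain
emotion_logits = emotion_head(features)
domain_logits = domain_head(grad_reverse(features, lambd=0.5))
```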
10.1109/ICIP55913.2025.11084465
In the United States, as of 2023, pet ownership has reached 66% of households and continues to rise annually. This trend underscores the critical need for effective pet identification and monitoring methods, particularly as nearly 10 million cats and dogs are reported stolen or lost each year. However, traditional methods for finding lost animals, such as GPS tags or ID photos, have limitations: they can be removed, face signal issues, and depend on someone finding and reporting the pet. To address these limitations, we introduce PawPrint and PawPrint+, the first publicly available datasets focused on individual-level footprint identification for dogs and cats. Through comprehensive benchmarking of both modern deep neural networks (e.g., CNN, Transformers) and classical local features, we observe varying advantages and drawbacks depending on substrate complexity and data availability. These insights suggest future directions for combining learned global representations with local descriptors to enhance reliability across diverse, real-world conditions. As this approach provides a non-invasive alternative to traditional ID tags, we anticipate promising applications in ethical pet management and wildlife conservation efforts.
10.1109/ICIP55913.2025.11084407
This paper addresses the problem of anticipating traffic accidents, which aims to forecast potential accidents before they happen. Real-time anticipation is crucial for safe autonomous driving, yet most methods rely on computationally heavy modules like optical flow and intermediate feature extractors, making real-world deployment challenging. In this paper, we thus introduce RARE (Real-time Accident anticipation with Reused Embeddings), a lightweight framework that capitalizes on intermediate features from a single pre-trained object detector. By eliminating additional feature-extraction pipelines, RARE significantly reduces latency. Furthermore, we introduce a novel Attention Score Ranking Loss, which prioritizes higher attention on accident-related objects over non-relevant ones. This loss enhances both accuracy and interpretability. RARE demonstrates a 4-8× speedup over existing approaches on the DAD and CCD benchmarks, achieving a latency of 13.6ms per frame (73.3FPS) on an RTX 6000. Moreover, despite its reduced complexity, it attains state-of-the-art Average Precision and reliably anticipates imminent collisions in real time. These results highlight RARE’s potential for safety-critical applications where timely and explainable anticipation is essential.
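One plausible reading of the Attention Score Ranking Loss is a pairwise margin objective that pushes attention on accident-related objects above attention on irrelevant ones; the sketch below reflects that interpretation and is not the paper's exact loss.

```python
import torch

def attention_ranking_loss(attn, is_accident, margin=0.2):
    """attn: (N,) attention scores for the objects in a frame.
    is_accident: (N,) bool mask marking accident-related objects."""
    pos, neg = attn[is_accident], attn[~is_accident]
    if pos.numel() == 0 or neg.numel() == 0:
        return attn.new_zeros(())
    # All positive/negative pairs: penalize negatives within `margin` of a positive.
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)          # (P, Q) pairwise score gaps
    return torch.clamp(margin - diff, min=0).mean()

loss = attention_ranking_loss(torch.rand(6), torch.tensor([1, 0, 0, 1, 0, 0], dtype=torch.bool))
```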
10.1109/WACV61041.2025.00280
Recent advancements in computer vision have led to a renewed interest in developing assistive technologies for individuals with visual impairments. Although extensive research has been conducted in the field of computer vision-based assistive technologies, most of the focus has been on understanding contexts in images, rather than addressing their physical safety and security concerns. To address this challenge, we propose the first step towards detecting anomalous situations for visually impaired people by observing their entire surroundings using an egocentric 360-degree camera. We first introduce a novel egocentric 360-degree video dataset called VIEW360 (Visually Impaired Equipped with Wearable 360-degree camera), which contains abnormal activities that visually impaired individuals may encounter, such as shoulder surfing and pickpocketing. Furthermore, we propose a new architecture called the FDPN (Frame and Direction Prediction Network), which facilitates frame-level prediction of abnormal events and identification of their directions. Finally, we evaluate our approach on our VIEW360 dataset and the publicly available UCF-Crime and Shanghaitech datasets, demonstrating state-of-the-art performance.
To address inter-task interference in model merging, this study introduces L1-regularization-based sparsity into the learning of merging coefficients and proposes a technique that optimizes the coefficients per task.
The proposed technique outperformed existing approaches on various datasets, demonstrating that sparsity plays an important role in reducing interference and improving the model's generalization performance.
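A minimal sketch of learnable per-task merging coefficients with an L1 sparsity penalty, in the spirit of the technique summarized above; the coefficient parameterization, task-vector representation, and penalty weight are assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

class SparseTaskMerger(nn.Module):
    """Merge a base checkpoint with per-task deltas using learnable coefficients."""
    def __init__(self, n_tasks: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.ones(n_tasks) / n_tasks)  # one weight per task

    def forward(self, base_flat, task_deltas):
        # base_flat: (P,) flattened base weights; task_deltas: (T, P) task vectors.
        return base_flat + (self.coeff.unsqueeze(1) * task_deltas).sum(0)

    def l1_penalty(self, lam=1e-3):
        return lam * self.coeff.abs().sum()   # encourages sparse, low-interference merges

merger = SparseTaskMerger(n_tasks=3)
merged = merger(torch.zeros(10), torch.randn(3, 10))
# Conceptual training step: total_loss = task_loss(merged) + merger.l1_penalty()
```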
https://doi.org/10.48550/arXiv.2506.03171
EdgeVidSum is a lightweight method that generates personalized, fast-forward summaries of long-form videos directly on edge devices. The proposed approach enables real-time video summarization while safeguarding user privacy through local data processing using innovative thumbnail-based techniques and efficient neural architectures. Unlike conventional methods that process entire videos frame by frame, the proposed method uses thumbnail containers to significantly reduce computational complexity without sacrificing semantic relevance. The framework employs a hierarchical analysis approach, where a lightweight 2D CNN model identifies user-preferred content from thumbnails and generates timestamps to create fast-forward summaries. Our interactive demo highlights the system's ability to create tailored video summaries for long-form videos, such as movies, sports events, and TV shows, based on individual user preferences. The entire computation occurs seamlessly on resource-constrained devices like Jetson Nano, demonstrating how EdgeVidSum addresses the critical challenges of computational efficiency, personalization, and privacy in modern video consumption environments.
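A simplified sketch of the thumbnail-scoring idea: a small 2D CNN scores each thumbnail for relevance, and contiguous high-scoring thumbnails become fast-forward segments. The scorer, threshold, and one-thumbnail-per-interval assumption are illustrative, not EdgeVidSum's actual design.

```python
import torch
import torch.nn as nn

scorer = nn.Sequential(                      # tiny 2D CNN relevance scorer (placeholder)
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid(),
)

def select_segments(thumbnails, seconds_per_thumb=1.0, threshold=0.6):
    """thumbnails: (N, 3, H, W) tensor, one thumbnail per sampled interval.
    Returns (start_sec, end_sec) ranges whose thumbnails score above threshold."""
    with torch.no_grad():
        scores = scorer(thumbnails).squeeze(-1)          # (N,) relevance scores
    keep = (scores > threshold).tolist()
    segments, start = [], None
    for i, k in enumerate(keep + [False]):               # sentinel closes a trailing run
        if k and start is None:
            start = i
        elif not k and start is not None:
            segments.append((start * seconds_per_thumb, i * seconds_per_thumb))
            start = None
    return segments

print(select_segments(torch.rand(20, 3, 64, 64)))
```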
10.48340/ecscw2025_ep01
The increasing use of online study rooms raises critical questions about the dynamics and challenges of virtual learning environments. This paper explores students' experiences in remote study settings through semi-structured interviews with 13 university students. We explore the factors influencing their participation, the obstacles they face before, during, and after study sessions, and the role of social interaction in sustaining motivation. Our findings highlight the significance of a perceived sense of responsibility, enhanced by camera usage, in maintaining concentration and engagement. Moreover, feelings of intimacy and belonging, particularly when studying with close peers, play a significant role in motivation and focus. Students also report challenges such as coordinating schedules and managing distractions in group study sessions. We propose design implications for enhancing online study environments based on these insights. We emphasize fostering a stronger sense of community, minimizing distractions, and facilitating effective collaboration. Our contributions inform the design of more inclusive and engaging virtual study platforms with broader implications for learning communities and online collaboration tools.
10.18653/v1/2024.emnlp-main.994
As the explainability of mental disorder detection models has become important, symptom-based methods that predict disorders from identified symptoms have been widely utilized. However, because these approaches focus on the presence of symptoms, the context of those symptoms is often ignored, and important contextual information for detecting mental disorders is lost. Furthermore, the detection result can be vulnerable to errors that occur when identifying symptoms. To address these issues, we propose a novel framework that detects mental disorders by leveraging symptoms and their context while mitigating potential errors in symptom identification. Specifically, we use large language models to effectively extract contextual information and introduce an uncertainty-aware decision fusion network that combines the predictions of multiple models based on quantified uncertainty values. To evaluate the proposed method, we constructed a new expert-annotated Korean mental health dataset named KoMOS. Experimental results demonstrate that the proposed model accurately detects mental disorders even when symptom information is incomplete.
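A toy version of uncertainty-aware decision fusion, weighting each model's prediction by the inverse of its quantified uncertainty; the uncertainty source (e.g., MC-dropout variance) and the weighting rule are illustrative assumptions, not the paper's fusion network.

```python
import torch

def uncertainty_fusion(probs, uncertainties, eps=1e-6):
    """probs: (M, C) class probabilities from M models for one input.
    uncertainties: (M,) quantified uncertainty per model (e.g., MC-dropout variance)."""
    weights = 1.0 / (uncertainties + eps)     # less uncertain models get larger weights
    weights = weights / weights.sum()
    return (weights.unsqueeze(-1) * probs).sum(0)        # (C,) fused prediction

fused = uncertainty_fusion(torch.softmax(torch.randn(3, 4), -1), torch.tensor([0.1, 0.4, 0.05]))
```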
10.18653/v1/2024.emnlp-industry.49
This study introduces a Multidisciplinary chILDhood cancer survivor question-answering (MILD) bot designed to support childhood cancer survivors facing diverse challenges in their survivorship journey. In South Korea, a shortage of experts equipped to address these unique concerns comprehensively leaves survivors with limited access to reliable information. To bridge this gap, our MILD bot employs a dual-component model featuring an intent classifier and a semantic textual similarity model. The intent classifier first analyzes the user’s query to identify the underlying intent and match it with the most suitable expert who can provide advice. Then, the semantic textual similarity model identifies questions in a predefined dataset that closely align with the user’s query, ensuring the delivery of relevant responses. This proposed framework shows significant promise in offering timely, accurate, and high-quality information, effectively addressing a critical need for support among childhood cancer survivors.
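As a very rough stand-in for the retrieval step described above (the actual system uses a learned semantic textual similarity model, and the questions below are invented), matching a user query to its closest predefined question could look like:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq = [  # hypothetical predefined survivor questions, each linked to an expert answer
    "What late effects should I watch for after chemotherapy?",
    "How do I transfer my care from pediatric to adult oncology?",
    "Are there support programs for returning to school?",
]
query = ["Which side effects can appear years after my treatment ended?"]

vec = TfidfVectorizer().fit(faq + query)
sims = cosine_similarity(vec.transform(query), vec.transform(faq))[0]
best = sims.argmax()
print(faq[best], sims[best])   # deliver the expert response tied to the best match
```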
With the advancement of deepfake technology, societal harms, in particular the spread of deepfake pornographic content, have emerged as a serious issue. Existing research on predicting content diffusion has focused mainly on text-based fake news and is therefore limited in predicting the spread of deepfake content, which is audiovisual in nature. To analyze and predict how deepfake content propagates, this study collected deepfake pornographic content and modeled propagation trees from it. For the network analysis, the Wiener index was used to quantify the diffusion structure of deepfake content, and a CNN-based prediction model was used to identify content with high spread potential. Experimental results show that the model effectively predicts deepfake content falling in the top 20% by Wiener index, providing an important criterion for early response to the spread of deepfake content.
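The Wiener index used above to quantify diffusion structure is the sum of shortest-path distances over all node pairs of the propagation tree; a small example with networkx on a made-up reshare tree:

```python
import networkx as nx

# Hypothetical propagation tree: node 0 is the original post, edges are reshares.
tree = nx.Graph([(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)])

# Wiener index = sum of shortest-path lengths over all unordered node pairs.
print(nx.wiener_index(tree))  # larger values indicate a more spread-out cascade
```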
To minimize casualties from fires, this study develops a firefighting training simulation incorporating AI technology and presents new firefighting training guidelines that put human safety first.
In particular, by maximizing the effectiveness of firefighting training and strengthening practical on-site response capabilities through AI-based simulation, the study demonstrates the potential for AI technology to be used reliably and practically in the public safety domain.
This study examines nonverbal factors that enhance trust in interactions with AI agents and explores how AI technology can be used effectively in mental health and social interaction contexts.
It also proposes methods for designing human-AI interaction cues and, through user experience evaluation and analysis, derives practical and reliable interface guidelines for AI-based service design, thereby addressing both the practicality and the ethical use of AI technology.
This study proposes an effective feedback approach that uses AI agents to reduce learners' anxiety and cognitive load in language learning, exploring the potential of AI-based educational technology to support learners' psychological well-being.
In particular, it presents design approaches for conversational agents that create a comfortable, supportive learning environment and empirically confirms how combinations of feedback style and feedback source affect English learning satisfaction, thereby offering ways to enhance learning outcomes and use AI technology reliably.