DOI
10.5703/1288284318544
Description
Many existing software tutorials lack clear interaction cues (clicks, drags, shortcuts), causing learners to frequently rewind and lose flow. To address this, we propose a system that infers user actions by analyzing frame-to-frame changes with a multimodal LLM, then overlays predefined visual indicators, such as click ripples, drag trails, and shortcut labels, at precise cursor positions. To ensure reliable task classification and avoid hallucinations, we integrate retrieval-augmented generation (RAG), grounding the model in official software documentation. This approach aims to enhance older tutorials with accurate, actionable interaction feedback, improving clarity and learning efficiency.
Visual Augmented Tutorial Agent for Software Tutorial