Architecture

The architecture centers on integration with the Explorer user interface (UI) and on managing media workflows at the document level.

Component Overview

Component | Location | Role
Agent manifest | multimedia/manifest.json | Defines the runtime image (node:24.15.0-bullseye) and the install script executed at startup.
Dependency bootstrap | multimedia/scripts/install.sh | Installs git + ffmpeg in containers or validates their presence in host sandbox mode.
IDE plugin bundles | multimedia/IDE-plugins/ | Primary functional surface: upload, metadata editing, preview, and compilation actions.
Media utilities | multimedia/IDE-plugins/utils/ | Shared helpers for blob upload, metadata extraction, and context resolution.
FFmpeg skill | multimedia/skills/ffmpegImageToVideo/ | Builds MP4 from images (optionally audio), then uploads the result to blob storage.
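
The dependency bootstrap behaves differently per environment: it installs tools in containers and only validates them in host sandbox mode. The TypeScript sketch below illustrates that decision; it is not a port of install.sh. The names planBootstrap, BootstrapMode, and BootstrapPlan are hypothetical, and the choice to skip tools that are already present in a container is an assumption.

```typescript
// Hypothetical sketch of the bootstrap decision; the real logic lives in
// multimedia/scripts/install.sh and may differ.
type BootstrapMode = "container" | "host-sandbox";

interface BootstrapPlan {
  install: string[]; // tools to install (container mode only)
  missing: string[]; // tools that are absent and must be reported (host sandbox mode)
}

// The two tools named in the component overview.
const REQUIRED_TOOLS = ["git", "ffmpeg"] as const;

// isAvailable is injected so the decision can be exercised without touching the system.
function planBootstrap(
  mode: BootstrapMode,
  isAvailable: (tool: string) => boolean,
): BootstrapPlan {
  const absent = REQUIRED_TOOLS.filter((tool) => !isAvailable(tool));
  return mode === "container"
    ? { install: [...absent], missing: [] } // assumption: present tools are skipped
    : { install: [], missing: [...absent] }; // host sandbox: validate only, never install
}
```

In host sandbox mode a non-empty missing list would presumably cause startup to fail with a message naming the absent tools.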

Execution Positioning

The standard flow is UI-first: a plugin is mounted in a document, paragraph, or chapter host context, reads the current context data, and then calls the document, workspace, or llm modules to perform its operations.
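
As a rough illustration of the context-reading step, the sketch below resolves which host context a plugin was mounted in. The field names (documentId, chapterId, paragraphId), the RawContext shape, and the resolveContext function are hypothetical and not taken from the actual utilities. The rule that a paragraph always belongs to a chapter is also an assumption.

```typescript
// Hypothetical shapes: the real context objects supplied by the host may differ.
interface RawContext {
  documentId?: string;
  chapterId?: string;
  paragraphId?: string;
}

type HostContext =
  | { kind: "document"; documentId: string }
  | { kind: "chapter"; documentId: string; chapterId: string }
  | { kind: "paragraph"; documentId: string; chapterId: string; paragraphId: string };

// Picks the most specific context that the raw data supports.
function resolveContext(raw: RawContext): HostContext {
  if (!raw.documentId) {
    throw new Error("documentId is required in every host context");
  }
  if (raw.paragraphId) {
    // Assumption: a paragraph is always nested inside a chapter.
    if (!raw.chapterId) {
      throw new Error("a paragraph context requires chapterId");
    }
    return {
      kind: "paragraph",
      documentId: raw.documentId,
      chapterId: raw.chapterId,
      paragraphId: raw.paragraphId,
    };
  }
  if (raw.chapterId) {
    return { kind: "chapter", documentId: raw.documentId, chapterId: raw.chapterId };
  }
  return { kind: "document", documentId: raw.documentId };
}
```

A plugin could then switch on the kind field to decide which module call applies to its mount point.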

The FFmpeg skill is kept separate from the UI plugins and runs as a media-processing block, so automated flows that need standardized video output can reuse it.
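
To make the image-to-video step concrete, here is a minimal sketch of how the argument list for an ffmpeg invocation might be assembled. It covers a single still image only, and the ImageToVideoOptions type, the buildFfmpegArgs function, and the 5-second default duration are assumptions for illustration. The skill's real options and its blob upload step are not shown.

```typescript
// Hypothetical options; the actual skill's inputs are not documented here.
interface ImageToVideoOptions {
  imagePath: string;
  outputPath: string;
  audioPath?: string; // optional soundtrack
  durationSeconds?: number; // used only when there is no audio (assumed default: 5)
}

// Builds the ffmpeg argument list for turning one still image into an MP4.
function buildFfmpegArgs(opts: ImageToVideoOptions): string[] {
  // -y overwrites the output; -loop 1 repeats the still image as a video stream.
  const args = ["-y", "-loop", "1", "-i", opts.imagePath];
  if (opts.audioPath) {
    // With audio: encode it as AAC and stop when the shorter stream (the audio) ends.
    args.push("-i", opts.audioPath, "-c:a", "aac", "-shortest");
  } else {
    // Without audio: the looped image never ends, so cap the duration explicitly.
    args.push("-t", String(opts.durationSeconds ?? 5));
  }
  // H.264 with yuv420p is a widely compatible combination for MP4 playback.
  args.push("-c:v", "libx264", "-pix_fmt", "yuv420p", opts.outputPath);
  return args;
}
```

Returning an array rather than a command string lets the caller pass it directly to a process launcher such as Node's child_process.spawn("ffmpeg", args), which avoids shell quoting problems with file paths.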

This agent is not focused on complex LLM orchestration; its value lies in specialized media extensions for the IDE.