Architecture
The architecture centers on integration with the Explorer user interface (UI) and on managing media workflows at the document level.
Component Overview
| Component | Location | Role |
|---|---|---|
| Agent manifest | multimedia/manifest.json | Defines the runtime image (node:24.15.0-bullseye) and the install script executed at startup. |
| Dependency bootstrap | multimedia/scripts/install.sh | Installs git + ffmpeg in containers or validates their presence in host sandbox mode. |
| IDE plugin bundles | multimedia/IDE-plugins/ | Primary functional surface: upload, metadata editing, preview, and compilation actions. |
| Media utilities | multimedia/IDE-plugins/utils/ | Shared helpers for blob upload, metadata extraction, and context resolution. |
| FFmpeg skill | multimedia/skills/ffmpegImageToVideo/ | Builds MP4 from images (optionally audio), then uploads the result to blob storage. |
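
The dependency bootstrap row describes two behaviors: installing tools in containers and validating them in host sandbox mode. The sketch below illustrates that decision as a pure function. It is not taken from install.sh; the names `Mode`, `BootstrapPlan`, and `planBootstrap` are hypothetical, and installing only the tools that are absent is an assumption.

```typescript
// Hypothetical sketch: none of these names come from the repository.
type Mode = "container" | "host-sandbox";

interface BootstrapPlan {
  install: string[]; // tools to install (container mode only)
  missing: string[]; // tools that should already exist but do not (host sandbox mode)
}

// The two tools named in the component table.
const REQUIRED_TOOLS = ["git", "ffmpeg"] as const;

// Decide what the bootstrap step would do, given the mode and a probe that
// reports whether a tool is already available on the PATH.
function planBootstrap(
  mode: Mode,
  isAvailable: (tool: string) => boolean
): BootstrapPlan {
  const absent = REQUIRED_TOOLS.filter((tool) => !isAvailable(tool));
  return mode === "container"
    ? { install: [...absent], missing: [] } // assumption: only absent tools are installed
    : { install: [], missing: [...absent] }; // host sandbox: report, never install
}
```

Passing the availability probe in as a parameter keeps the decision logic testable without touching the real environment; a caller could supply a probe that shells out to each tool's version command.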
Execution Positioning
The standard flow is UI-first: a plugin is mounted in a document, paragraph, or chapter host context, reads the current context data, and then uses the document, workspace, or llm modules to perform its operations.
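
One way to picture the context-reading step is a tagged union over the three host contexts. This is a minimal sketch under stated assumptions: the `HostContext` type, its field names, and `describeTarget` are hypothetical and are not the plugin API of this agent.

```typescript
// Hypothetical shapes for the three host contexts named above.
type HostContext =
  | { kind: "document"; documentId: string }
  | { kind: "chapter"; documentId: string; chapterId: string }
  | {
      kind: "paragraph";
      documentId: string;
      chapterId: string;
      paragraphId: string;
    };

// Resolve the context a plugin was mounted in to a single target path that a
// later operation (upload, metadata edit, preview) could act on.
function describeTarget(ctx: HostContext): string {
  switch (ctx.kind) {
    case "document":
      return `document:${ctx.documentId}`;
    case "chapter":
      return `document:${ctx.documentId}/chapter:${ctx.chapterId}`;
    case "paragraph":
      return `document:${ctx.documentId}/chapter:${ctx.chapterId}/paragraph:${ctx.paragraphId}`;
  }
}
```

With a discriminated union, the compiler checks that every host context is handled, so adding a fourth context later would surface each place that needs updating.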
The FFmpeg skill is separated from user interface plugins and runs as a media processing block, allowing reuse in automated flows that require standardized video output.
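
To make the image-to-video step concrete, the sketch below assembles an ffmpeg argument list for a numbered image sequence with an optional audio track. It is an illustration, not the skill's actual implementation: `SlideshowOptions` and `buildFfmpegArgs` are hypothetical names, the codec and frame-rate choices are assumptions, and the blob upload step is omitted.

```typescript
// Hypothetical options; the real skill's inputs may differ.
interface SlideshowOptions {
  imagePattern: string; // numbered sequence, e.g. "frames/img%03d.png"
  secondsPerImage: number; // how long each image stays on screen
  outputPath: string; // e.g. "out.mp4"
  audioPath?: string; // optional soundtrack
}

// Build the argument list for an `ffmpeg` invocation (the binary name itself
// is not included). The result can be passed to any process-spawning API.
function buildFfmpegArgs(opts: SlideshowOptions): string[] {
  if (opts.secondsPerImage <= 0) {
    throw new Error("secondsPerImage must be positive");
  }
  const args = [
    "-y", // overwrite the output file if it exists
    "-framerate", `1/${opts.secondsPerImage}`, // input rate: one image every N seconds
    "-i", opts.imagePattern,
  ];
  if (opts.audioPath) {
    args.push("-i", opts.audioPath); // second input: audio
  }
  // Assumed encoding settings: H.264 video, 30 fps output, widely playable pixel format.
  args.push("-c:v", "libx264", "-r", "30", "-pix_fmt", "yuv420p");
  if (opts.audioPath) {
    args.push("-c:a", "aac", "-shortest"); // stop at the shorter of video and audio
  }
  args.push(opts.outputPath);
  return args;
}
```

Keeping argument construction separate from process execution mirrors the separation described above: the same builder can serve a UI-triggered action or an automated flow.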
This agent is not focused on complex LLM orchestration; its value is in specialized IDE media extensions.