AI Agents | Text | Advanced

Local-First Voice I/O Architecture Design

Design a fully on-device voice input/output architecture supporting multiple TTS engines, zero-shot voice cloning, global dictation, agent voice output, and post-processing, ensuring voice data never leaves the device unless explicitly authorized by the user.
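
The "never leaves the device unless explicitly authorized" requirement can be read as a default-deny rule for uploads. The following is a minimal sketch of that reading, not part of the prompt itself; the class name `PrivacyPolicy`, the method names, and the purpose string `"cloud_tts"` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyPolicy:
    """Hypothetical default-deny policy: nothing is uploaded without an opt-in."""

    # Purposes the user has explicitly opted into. Empty by default,
    # so a fresh install keeps all voice data on the device.
    cloud_opt_ins: set[str] = field(default_factory=set)

    def opt_in(self, purpose: str) -> None:
        """Record an explicit user authorization for one purpose."""
        self.cloud_opt_ins.add(purpose)

    def allow_upload(self, purpose: str) -> bool:
        """Return True only if the user opted into this exact purpose."""
        return purpose in self.cloud_opt_ins


policy = PrivacyPolicy()
blocked = policy.allow_upload("cloud_tts")   # no opt-in recorded yet
policy.opt_in("cloud_tts")
allowed = policy.allow_upload("cloud_tts")   # explicit opt-in recorded
```

Scoping each opt-in to a named purpose, rather than using one global switch, keeps an authorization for one feature from silently covering another.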

Prompt Content

Copy and paste directly into your model or internal evaluation tool.

You are a Local-First Voice I/O Architect. Your job is to design a complete, on-device voice input/output infrastructure that gives AI agents and applications the ability to speak, listen, clone voices, and edit audio — without ever sending voice data to the cloud unless the user explicitly opts in. You treat voice as a first-class I/O modality, not as a bolt-on feature. The system must support real-time conversational agents, long-form narration, global dictation into any text field, multi-character audio productions, and expressive speech with paralinguistic control — all running locally on consumer hardware.

Design Philosophy (non-negotiable):

  1. Local-first, cloud-optional.
  2. Engine diversity over engine monopoly.
  3. Voice is identity.
  4. Dictation is a global utility.
  5. Post-processing is part of the pipeline.
  6. Multi-track for narrative complexity.

Core responsibilities include: defining the engine matrix, designing the voice profile system, designing the generation pipeline, designing the dictation/STT layer, designing the agent voice output interface, designing the effects and post-processing pipeline, designing the stories/multi-track editor, specifying hardware and platform strategy, planning privacy and security, and defining benchmark and quality gates.

The output must contain exactly these 12 sections: Use-Case Profile, Engine Matrix & Routing Policy, Voice Profile Schema, Generation Pipeline Spec, Dictation / STT Spec, Agent Integration, Effects & Post-Processing, Multi-Track Stories Editor, Platform & Hardware Matrix, Privacy & Governance, Benchmark & Quality Gates, Main Risk.
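
The following illustration is not part of the prompt text. It sketches the kind of concrete parameters a Generation Pipeline Spec section might pin down for long-form narration: splitting text at sentence boundaries and joining the synthesized chunks with a linear crossfade. The function names, the character budget, and the overlap length are assumptions made for this example, and audio is represented as plain lists of float samples to keep the sketch dependency-free.

```python
import re


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def crossfade_concat(chunks: list[list[float]], overlap: int) -> list[float]:
    """Join sample lists, blending `overlap` samples at each seam linearly."""
    if not chunks:
        return []
    out = list(chunks[0])
    for nxt in chunks[1:]:
        n = min(overlap, len(out), len(nxt))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming chunk
            out[start + i] = out[start + i] * (1 - w) + nxt[i] * w
        out.extend(nxt[n:])
    return out


pieces = chunk_text("One. Two. Three.", max_chars=9)
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
```

Each seam shortens the output by `overlap` samples, so a spec that fixes the overlap also fixes how total duration relates to the number of chunks.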

Use Cases

AI agent developers building voice-enabled applications
Content creators producing multi-character audio stories
Accessibility users enabling system-wide speech input
Podcast producers performing local audio editing and mixing
Gamers using personalized voice avatars

Reference Output

Return a structured 12-section design document covering the full architecture specification from use-case profile to main risk, with each section containing concrete technical parameters, data models, and decision logic.
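
When the prompt is run in an evaluation tool, the 12-section requirement can be checked mechanically. The sketch below assumes each section title appears on a line of its own, optionally preceded by `#` marks or a number and followed by a colon; that formatting, along with the function name `missing_sections`, is an assumption of this example rather than something the prompt specifies.

```python
import re

REQUIRED_SECTIONS = [
    "Use-Case Profile",
    "Engine Matrix & Routing Policy",
    "Voice Profile Schema",
    "Generation Pipeline Spec",
    "Dictation / STT Spec",
    "Agent Integration",
    "Effects & Post-Processing",
    "Multi-Track Stories Editor",
    "Platform & Hardware Matrix",
    "Privacy & Governance",
    "Benchmark & Quality Gates",
    "Main Risk",
]


def missing_sections(document: str) -> list[str]:
    """Return the required titles that never appear as a heading line."""
    headings = set()
    for line in document.splitlines():
        # Drop leading '#', digits, dots and spaces, then a trailing colon.
        cleaned = re.sub(r"^[#\d.\s]+", "", line).strip().rstrip(":").strip()
        headings.add(cleaned)
    return [title for title in REQUIRED_SECTIONS if title not in headings]


complete = "\n".join(f"{i}. {t}\nbody" for i, t in enumerate(REQUIRED_SECTIONS, 1))
partial = complete.replace("12. Main Risk", "12. Closing Notes")
```

Calling `missing_sections(partial)` reports which titles are absent. The check covers headings only, so it does not judge whether a section contains the concrete parameters the reference output asks for.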

Scoring Rubric

Evaluation criteria include: whether the engine matrix clearly distinguishes each engine's use case and hardware requirements; whether the routing policy is expressible as a decision table; whether voice profiles support import/export and versioning; whether dictation integrates with OS accessibility APIs; whether agent voice output is achievable via a single tool call; whether post-processing is non-destructive; whether long-form generation defines chunking and crossfade parameters; whether privacy defaults are local-first.
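
To make the decision-table criterion concrete, here is a minimal sketch of a routing policy written as an ordered rule table in which the first matching row wins. The engine identifiers, the use-case labels, and the three input columns are hypothetical placeholders; a real design would substitute the engines and conditions from its own engine matrix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One row of the decision table; None in a column means 'any value'."""

    use_case: Optional[str]
    needs_cloning: Optional[bool]
    has_gpu: Optional[bool]
    engine: str


# Hypothetical rows, evaluated top to bottom. The last row is the fallback.
RULES = [
    Rule("realtime_agent", None, True, "engine_fast_gpu"),
    Rule("realtime_agent", None, False, "engine_fast_cpu"),
    Rule(None, True, None, "engine_cloning"),
    Rule("narration", False, None, "engine_quality"),
    Rule(None, None, None, "engine_default"),
]


def _matches(expected, actual) -> bool:
    """A column matches when it is a wildcard or equals the actual value."""
    return expected is None or expected == actual


def route(use_case: str, needs_cloning: bool, has_gpu: bool) -> str:
    """Return the engine named by the first rule whose columns all match."""
    for rule in RULES:
        if (
            _matches(rule.use_case, use_case)
            and _matches(rule.needs_cloning, needs_cloning)
            and _matches(rule.has_gpu, has_gpu)
        ):
            return rule.engine
    raise LookupError("no routing rule matched")


choice = route("narration", needs_cloning=True, has_gpu=False)
```

Keeping the policy as data rather than nested conditionals lets a reviewer audit it row by row, which is what the rubric's decision-table criterion tests for.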
