Research

Intention-Based Interface

Our research question: can menu structures and keyboard shortcuts be replaced by pure intent — semantically captured, AI-mediated, locally executed — and to what extent is this substitution defensible?

Open the engineering logbook

Insights into the ongoing PapoNox research process: methodological notes, discovery findings, crash analyses.

Three research pillars

Research matrix

Chat input, AI vision and voice input — three modalities we investigate in parallel and at different levels of maturity.

Under research

Chat Mode

The Brain

Text-based dialogue between user and DAW. Research question: how reliably can musical intent be extracted from free-form German language — and what systematic limits does this approach face when confronted with domain-specific terminology?

Designed for screen reader and Braille display users

Under research

AI Vision

The Eyes

Screenshot analysis, OCR and AI-assisted image understanding as a methodological approach to address UI elements beyond official interfaces. Does the three-tier grounding cascade — accessibility API → OCR bounding boxes → vision-language model — work reliably enough for productive use? An open question.

Open question: how precisely can the vision layer reproducibly identify UI elements?
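
The cascade can be read as an ordered fallback: consult the most structured source first and descend only when it returns nothing. The following Python sketch illustrates that control flow under stated assumptions. The `Target` type, the `ground` function and the three stub locators are hypothetical; they stand in for the real accessibility, OCR and vision-language components, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]
Locator = Callable[[str], Optional[Point]]


@dataclass
class Target:
    label: str
    point: Point
    source: str  # name of the tier that produced the hit


def ground(label: str, tiers: Sequence[Tuple[str, Locator]]) -> Optional[Target]:
    """Try each tier in order and return the first hit, or None."""
    for name, locate in tiers:
        try:
            point = locate(label)
        except Exception:
            point = None  # a failing tier must not abort the cascade
        if point is not None:
            return Target(label, point, name)
    return None


# Hypothetical stand-ins for the three tiers (assumptions, not real APIs).
def ax_lookup(label: str) -> Optional[Point]:
    return {"Play": (40.0, 12.0)}.get(label)


def ocr_lookup(label: str) -> Optional[Point]:
    return {"Bounce": (310.0, 88.0)}.get(label)


def vlm_lookup(label: str) -> Optional[Point]:
    return None  # placeholder: no vision model is called in this sketch


TIERS = [("accessibility", ax_lookup), ("ocr", ocr_lookup), ("vision", vlm_lookup)]
```

Recording which tier answered (`source`) is one way to make the reproducibility question measurable: the share of hits per tier can be logged across repeated runs.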

Under research

Long-term question

Voice Control

The Voice

Speech-to-text control as a possible extension of the input modality. Whether and when this modality can be meaningfully investigated depends on funding outcomes and on findings from the chat-mode research.

Future perspective

Not part of the current funding phase. Continued funding and methodological groundwork required.

paponox_demo.sh — prototype

Proof of Concept

$ paponox --status

What the prototype currently demonstrates in a documented form (Phase 0)

# Navigation

→ "where am I" — Position + track state (initial feasibility)

→ "go to bar 5" — Direct positional addressing (initial feasibility)

# Transport

→ play / stop / record

→ Define cycle range — e.g. bars 5–15 (initial feasibility)

# Mixing

→ mute / solo / volume / pan — with auditory feedback (initial feasibility)

# Vision

→ "what do you see" / "click on X"

# Project & modal dialogs

→ save / save as — with dialog automation (initial feasibility)

→ import audio / bounce — multi-stage dialog sequence (initial feasibility)

$ phase-0: initial proof-of-concept findings documented

Anything beyond this falls within the open validation research
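
To make the listed commands concrete, here is a minimal rule-based intent parser in Python with a hook for a language-model fallback. It is a sketch under assumptions, not the prototype's implementation: the `Intent` type, the rule table and the action names are hypothetical, and the phrases are the English renderings used on this page, whereas the prototype accepts German input.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Intent:
    action: str
    params: dict = field(default_factory=dict)


# Hypothetical rule table: each pattern maps a normalized phrase to an Intent.
RULES = [
    (re.compile(r"^where am i\??$"),
     lambda m: Intent("report_position")),
    (re.compile(r"^go to bar (\d+)$"),
     lambda m: Intent("goto_bar", {"bar": int(m.group(1))})),
    (re.compile(r"^(play|stop|record)$"),
     lambda m: Intent(m.group(1))),
    (re.compile(r"^cycle bars? (\d+)\s*(?:-|–|to)\s*(\d+)$"),
     lambda m: Intent("set_cycle", {"start": int(m.group(1)), "end": int(m.group(2))})),
]


def parse_intent(text: str,
                 fallback: Optional[Callable[[str], Optional[Intent]]] = None
                 ) -> Optional[Intent]:
    """Match deterministic rules first; defer to the fallback only on a miss."""
    normalized = " ".join(text.lower().split())
    for pattern, build in RULES:
        match = pattern.match(normalized)
        if match:
            return build(match)
    return fallback(normalized) if fallback else None
```

Keeping the deterministic rules ahead of the fallback means common commands never depend on model output, while free-form phrasing can still be routed to an on-device model through the `fallback` parameter.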

Architecture

Methodological principles

How we approach the research question operationally: four principles already implemented in the prototype.

On-device inference

Local & free

We use Apple's FoundationModels as an on-device language model — executed entirely locally. Research premise: no cloud connectivity, no data sharing, no recurring inference costs.

Process injection

Direct access investigated

Direct integration into the DAW process via Apple's documented DYLD_INSERT_LIBRARIES mechanism and the public Objective-C runtime — a methodological approach to investigate to what extent internal data models can be opened up without compromises in usability.
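
As an illustration of the mechanism, the Python sketch below only prepares the environment for a child process; it does not launch anything. `DYLD_INSERT_LIBRARIES` takes a colon-separated list of library paths. The helper name `injection_env` and the library path are hypothetical, and macOS may ignore the variable for binaries protected by System Integrity Protection or the hardened runtime.

```python
import os
from typing import Mapping, Optional


def injection_env(dylib_path: str,
                  base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with dylib_path appended to
    DYLD_INSERT_LIBRARIES (a colon-separated list of library paths)."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("DYLD_INSERT_LIBRARIES", "")
    paths = [p for p in existing.split(":") if p]
    if dylib_path not in paths:
        paths.append(dylib_path)
    env["DYLD_INSERT_LIBRARIES"] = ":".join(paths)
    return env


# Hypothetical library path, for illustration only.
example_env = injection_env("/tmp/libpaponox_bridge.dylib", {"PATH": "/usr/bin"})
# The result would be passed to the launcher, e.g. subprocess.Popen(cmd, env=example_env).
```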

Real-time IPC

Low latency in focus

Bidirectional inter-process communication via Unix Domain Socket using JSON Lines. We investigate how low end-to-end latency can remain when all components operate locally and synchronously.
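
JSON Lines framing means one JSON object per newline-terminated line, so each side can read a complete message with a single `readline`. The Python sketch below shows a request/reply round trip over a connected Unix socket pair. The message fields (`id`, `command`, `ok`, `echo`) and the function names are assumptions made for illustration and do not describe the actual PapoNox protocol.

```python
import json
import socket
import threading
from typing import List, Optional


def send_message(wfile, message: dict) -> None:
    """Write one JSON object as a single newline-terminated line."""
    wfile.write(json.dumps(message, ensure_ascii=False) + "\n")
    wfile.flush()


def read_message(rfile) -> Optional[dict]:
    """Read one line and decode it; return None when the peer has closed."""
    line = rfile.readline()
    return json.loads(line) if line else None


def serve(conn: socket.socket) -> None:
    """Hypothetical DAW-side loop: answer every request with one reply line."""
    with conn, conn.makefile("r", encoding="utf-8") as rfile, \
            conn.makefile("w", encoding="utf-8") as wfile:
        while (request := read_message(rfile)) is not None:
            send_message(wfile, {"id": request["id"], "ok": True,
                                 "echo": request["command"]})


def round_trip(commands: List[str]) -> List[dict]:
    """Send each command and collect the matching reply, synchronously."""
    client, server = socket.socketpair()  # AF_UNIX pair on Unix-like systems
    worker = threading.Thread(target=serve, args=(server,))
    worker.start()
    replies = []
    with client, client.makefile("r", encoding="utf-8") as rfile, \
            client.makefile("w", encoding="utf-8") as wfile:
        for i, command in enumerate(commands):
            send_message(wfile, {"id": i, "command": command})
            replies.append(read_message(rfile))
    worker.join()  # the server loop ends once the client side is closed
    return replies
```

A real bridge would bind a named socket path instead of using `socketpair`, but the framing and the read/write loop stay the same.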

Privacy by design

No data sharing

Research-ethical principle: all data remain on the user's device. No account, no tracking, no upload. The vision module, the language model and speech output operate exclusively locally.

The Stack

From chat to DAW

Four layers. The prototype establishes that the approach works in principle; its detailed operationalisation is the subject of ongoing research.

Input

Chat CLI

Text-based commands in German

Processing

PapoNox Core

Intent engine + on-device LLM fallback

Bridge

IPC bridge

Unix Domain Socket + JSON Lines

Output

Logic Pro

Objective-C runtime + CGEvent synthesis

Target latency: < 200 ms
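
One way to check a latency target of this kind is to time each layer separately as well as the whole chain. The Python sketch below does this with `time.perf_counter`. The four stage functions are trivial placeholders named after the layers above, not the real components, and `run_timed` is a hypothetical helper.

```python
import time
from typing import Callable, Dict, List, Tuple

TARGET_LATENCY_S = 0.200  # the < 200 ms target, expressed in seconds


def run_timed(text: str,
              stages: List[Tuple[str, Callable]]) -> Tuple[object, Dict[str, float], float]:
    """Pass a value through all stages; return result, per-stage and total time."""
    timings: Dict[str, float] = {}
    value: object = text
    started = time.perf_counter()
    for name, stage in stages:
        t0 = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - t0
    total = time.perf_counter() - started
    return value, timings, total


# Placeholder stages (assumptions): each stands in for one layer of the stack.
STAGES = [
    ("input", str.strip),
    ("processing", lambda text: {"action": text.lower()}),
    ("bridge", lambda intent: dict(intent, sent=True)),
    ("output", lambda message: message["sent"]),
]

result, timings, total = run_timed("  Play ", STAGES)
within_budget = total < TARGET_LATENCY_S
```

Per-stage timings show where the budget is spent, which matters more for diagnosis than the end-to-end figure alone.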

Research stages

Milestones

What has been achieved so far — and what still needs to be measured methodologically.

Phase 0

Completed

Architectural feasibility

Research question precisely formulated
Prototype as documented proof of concept
On-device LLM and public runtime interfaces empirically validated
Open-source research licensing (MIT for code, CC-BY-4.0 for documentation)

Phase 1

Phase 1 — open

DAW control — validation phase

Precision measurement of continuous values across the full value range
Empirical determination of the methodological limits
Validation of the three-tier grounding cascade (AX → OCR → vision)
User studies, external accessibility audit, conference submission — funding in preparation
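
A precision measurement across a value range could be organised as a sweep: request evenly spaced values, read back what was actually set, and record the deviation. The Python sketch below shows the idea with a simulated control. The `simulated_fader` function and its 0.5-unit step size are invented purely for illustration and say nothing about how any real DAW parameter behaves.

```python
from typing import Callable, Tuple


def sweep_error(set_and_read: Callable[[float], float],
                low: float, high: float, steps: int) -> Tuple[float, float]:
    """Request `steps` evenly spaced values in [low, high] and return the
    (maximum, mean) absolute deviation between requested and read-back values."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    errors = []
    for i in range(steps):
        requested = low + (high - low) * i / (steps - 1)
        errors.append(abs(set_and_read(requested) - requested))
    return max(errors), sum(errors) / len(errors)


def simulated_fader(value: float) -> float:
    """Invented stand-in: a control that snaps to multiples of 0.5."""
    return round(value * 2) / 2


# Sweep from -60 to 6 in steps of 0.25, so every second request falls between grid points.
max_error, mean_error = sweep_error(simulated_fader, -60.0, 6.0, 265)
```

In a real measurement, `set_and_read` would set the parameter through the bridge and read the resulting value back from the application.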

Phase 2

Long-term research perspective

Universal accessibility

Transferability of the methodology to other closed applications
Classification of addressable application domains
Context sensitivity as an open methodological research question
No commitment, but a documented perspective

Questions about the research project?

We conduct our research openly and with methodological transparency. Write to us. We welcome feedback, technical exchange and interest in collaboration.

Contact
