Intention-Based Interface
Our research question: can menu structures and keyboard shortcuts be replaced by pure intent — semantically captured, AI-mediated, locally executed — and to what extent is this substitution defensible?
Research matrix
Chat input, AI vision and voice input — three modalities we investigate in parallel and at different levels of maturity.
Chat Mode
The Brain
Text-based dialogue between user and DAW. Research question: how reliably can musical intent be extracted from free-form German language — and what systematic limits does this approach face when confronted with domain-specific terminology?
AI Vision
The Eyes
Screenshot analysis, OCR and AI-assisted image understanding as a methodological approach to address UI elements beyond official interfaces. Does the three-tier grounding cascade — accessibility API → OCR bounding boxes → vision-language model — work reliably enough for productive use? An open question.
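The cascade described above can be read as an ordered fallback: each tier is asked for the element, and the first tier that returns a location wins. The following is a minimal sketch of that control flow only. The names `Target`, `ground` and the three lookup functions are hypothetical stand-ins, not the prototype's actual code, and the lookups return hard-coded coordinates instead of calling an accessibility API, an OCR engine or a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Target:
    """Screen location of a UI element plus the tier that found it."""
    x: int
    y: int
    tier: str


# A locator takes a label and returns (x, y), or None if it finds nothing.
Locator = Callable[[str], Optional[Tuple[int, int]]]


def ground(label: str, tiers: Sequence[Tuple[str, Locator]]) -> Optional[Target]:
    """Try each tier in order and return the first hit, or None."""
    for name, locate in tiers:
        point = locate(label)
        if point is not None:
            return Target(point[0], point[1], name)
    return None


# Hypothetical stand-ins for the three tiers (hard-coded for illustration).
def ax_lookup(label: str) -> Optional[Tuple[int, int]]:
    return {"Play": (100, 20)}.get(label)


def ocr_lookup(label: str) -> Optional[Tuple[int, int]]:
    return {"Bounce": (340, 410)}.get(label)


def vlm_lookup(label: str) -> Optional[Tuple[int, int]]:
    return None  # a real tier would query a vision-language model here


CASCADE = [("accessibility", ax_lookup), ("ocr", ocr_lookup), ("vision", vlm_lookup)]
```

Recording which tier produced the hit makes it possible to count, per command, how often the cheaper tiers suffice, which is one way the reliability question could be quantified.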
Voice Control
The Voice
Speech-to-text control as a possible extension of the input modality. Whether and when this modality can be meaningfully investigated depends on funding outcomes and on findings from the chat-mode research.
Not part of the current funding phase. Continued funding and methodological groundwork required.
$ paponox --status
What the prototype currently demonstrates in documented form (Phase 0)
# Navigation
→ "where am I" — Position + track state (initial feasibility)
→ "go to bar 5" — Direct positional addressing (initial feasibility)
# Transport
→ play / stop / record
→ Define cycle range — e.g. bars 5–15 (initial feasibility)
# Mixing
→ mute / solo / volume / pan — with auditory feedback (initial feasibility)
→ "what do you see" / "click on X"
# Project & modal dialogs
→ save / save as — with dialog automation (initial feasibility)
→ import audio / bounce — multi-stage dialog sequence (initial feasibility)
Methodological principles
How we approach the research question operationally: four principles already implemented in the prototype.
On-device inference
Local & free
We use Apple's FoundationModels as an on-device language model, executed entirely locally. Research premise: no cloud connectivity, no data sharing, no recurring inference costs.
Process injection
Direct access investigated
Direct integration into the DAW process via Apple's documented DYLD_INSERT_LIBRARIES mechanism and the public Objective-C runtime. This is a methodological approach for investigating how far internal data models can be opened up without compromising usability.
Real-time IPC
Low latency in focus
Bidirectional inter-process communication via Unix Domain Socket using JSON Lines. We investigate how low end-to-end latency can remain when all components operate locally and synchronously.
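To illustrate the framing, here is a minimal sketch of a request and reply exchanged as JSON Lines over a Unix domain socket: one JSON object per line, terminated by a newline. The message fields (`id`, `command`, `bar`, `ok`) are assumptions made for this example and do not describe the prototype's actual protocol. For self-containment the sketch uses `socket.socketpair()`, where a real bridge would bind to a socket path.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Serialise one message as a single JSON line and send it."""
    line = json.dumps(message, ensure_ascii=False) + "\n"
    sock.sendall(line.encode("utf-8"))


def recv_message(reader) -> dict:
    """Read exactly one newline-terminated JSON message."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


def demo() -> dict:
    # Two connected Unix domain sockets inside one process.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with client, server:
        with client.makefile("r", encoding="utf-8") as client_in, \
             server.makefile("r", encoding="utf-8") as server_in:
            # Client asks for a position change (hypothetical message shape).
            send_message(client, {"id": 1, "command": "goto_bar", "bar": 5})
            request = recv_message(server_in)
            # Server acknowledges, echoing the request id.
            send_message(server, {"id": request["id"], "ok": True})
            return recv_message(client_in)
```

Echoing an `id` in the reply lets the sender match responses to requests, and timestamping both ends of that round trip is one simple way to measure end-to-end latency.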
Privacy by design
No data sharing
Research-ethical principle: all data remain on the user's device. No account, no tracking, no upload. The vision module, the language model and speech output operate exclusively locally.
From chat to DAW
Four layers. The prototype establishes that, in principle, the procedure holds. The detailed operationalisation is the subject of ongoing research.
Chat CLI
Text-based commands in German
PapoNox Core
Intent engine + on-device LLM fallback
IPC bridge
Unix Domain Socket + JSON Lines
Logic Pro
Objective-C runtime + CGEvent synthesis
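The second layer, an intent engine with a language-model fallback, can be sketched as a rule table that is tried first and a fallback that is only consulted when no rule matches. Everything in this sketch is an assumption made for illustration: the patterns, the intent names and the `llm_fallback` callable are hypothetical, and the example phrases are the English renderings used on this page, whereas the prototype accepts German input.

```python
import re
from typing import Callable, Optional

# Illustrative rule table: compiled pattern -> function building an intent dict.
RULES = [
    (re.compile(r"^go to bar (\d+)$"),
     lambda m: {"intent": "goto_bar", "bar": int(m.group(1))}),
    (re.compile(r"^(play|stop|record)$"),
     lambda m: {"intent": "transport", "action": m.group(1)}),
    (re.compile(r"^cycle bars (\d+)\s*(?:-|–|to)\s*(\d+)$"),
     lambda m: {"intent": "set_cycle",
                "start": int(m.group(1)), "end": int(m.group(2))}),
]


def parse_intent(
    text: str,
    llm_fallback: Optional[Callable[[str], Optional[dict]]] = None,
) -> dict:
    """Match deterministic rules first, then ask the optional fallback."""
    # Lower-case and collapse whitespace so rules stay simple.
    normalised = " ".join(text.lower().split())
    for pattern, build in RULES:
        match = pattern.match(normalised)
        if match:
            return build(match)
    if llm_fallback is not None:
        guess = llm_fallback(normalised)
        if guess is not None:
            return guess
    return {"intent": "unknown", "text": normalised}
```

Keeping the deterministic rules ahead of the model means that frequent commands resolve without inference, while the fallback handles free-form phrasing. Returning an explicit `unknown` intent gives the layers below a safe no-op instead of a guess.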
Milestones
What has been achieved so far — and what still needs to be measured methodologically.
Architectural feasibility
- Research question precisely formulated
- Prototype as documented proof of concept
- On-device LLM and public runtime interfaces empirically validated
- Open-source research licensing (MIT for code, CC-BY-4.0 for documentation)
DAW control — validation phase
- Precision measurement of continuous values across the full value range
- Empirical determination of the methodological limits
- Validation of the three-tier grounding cascade (AX → OCR → vision)
- User studies, external accessibility audit, conference submission — funding in preparation
Universal accessibility
- Transferability of the methodology to other closed applications
- Classification of addressable application domains
- Context sensitivity as an open methodological research question
- No commitment, but a documented perspective
Questions about the research project?
We conduct our research openly and with methodological transparency. Write to us: we welcome feedback, technical exchange and interest in collaboration.
Contact