Research

Intention-Based Interface

Our research question: can menu structures and keyboard shortcuts be replaced by pure intent — semantically captured, AI-mediated, locally executed — and to what extent is this substitution defensible?

Three research pillars

Research matrix

Chat input, AI vision and voice input — three modalities we investigate in parallel and at different levels of maturity.

Under research

Chat Mode

The Brain

Text-based dialogue between user and DAW. Research question: how reliably can musical intent be extracted from free-form German language — and what systematic limits does this approach face when confronted with domain-specific terminology?

Designed for screen reader and Braille display users
Under research
Under research
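To make "extracting musical intent from free-form German" concrete, here is a minimal rule-based sketch in Python. The intent names, the dictionary shape and the German phrasings it covers are illustrative assumptions, not PapoNox's actual grammar. Any input that no rule covers returns None, which is where the systematic limits around domain-specific terminology begin to show.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rule table: each entry pairs a regex (matched against the
# lower-cased, whitespace-normalised input) with a builder for the intent dict.
RULES: List[Tuple[re.Pattern, Callable[[re.Match], dict]]] = [
    (re.compile(r"wo bin ich\??"),
     lambda m: {"intent": "where_am_i"}),
    (re.compile(r"(?:gehe?|springe?) zu(?:m)? takt (\d+)"),
     lambda m: {"intent": "goto_bar", "bar": int(m.group(1))}),
    (re.compile(r"cycle (?:von )?takt (\d+) bis (?:takt )?(\d+)"),
     lambda m: {"intent": "set_cycle",
                "start": int(m.group(1)), "end": int(m.group(2))}),
    (re.compile(r"spur (\d+) stumm(?:schalten)?"),
     lambda m: {"intent": "mute", "track": int(m.group(1))}),
    (re.compile(r"(?:play|abspielen)"),
     lambda m: {"intent": "transport", "action": "play"}),
    (re.compile(r"(?:stop|stopp)"),
     lambda m: {"intent": "transport", "action": "stop"}),
]


def parse_intent(text: str) -> Optional[dict]:
    """Return the first matching intent, or None if no rule covers the input."""
    normalised = " ".join(text.lower().split())
    for pattern, build in RULES:
        match = pattern.fullmatch(normalised)
        if match:
            return build(match)
    return None
```

For example, parse_intent("Gehe zu Takt 5") yields a goto_bar intent for bar 5, while a phrase such as "Mach den Bass etwas wärmer" matches nothing: a purely rule-based approach has to keep growing its rule set or hand such input to a language model.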

AI Vision

The Eyes

Screenshot analysis, OCR and AI-assisted image understanding serve as a methodological approach for addressing UI elements that official interfaces do not expose. Does the three-tier grounding cascade (accessibility API → OCR bounding boxes → vision-language model) work reliably enough for productive use? That remains an open question.

Open question: how precisely can the vision layer reproducibly identify UI elements?
Under research
Long-term question
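The cascade logic itself can be sketched independently of any real recogniser. In this Python sketch only the tier order (accessibility API, then OCR, then vision-language model) comes from the text; the confidence threshold, the bounding-box format and the three stub resolvers are hypothetical stand-ins for the macOS Accessibility API, an OCR engine and a vision model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Grounding:
    tier: str                         # which tier produced this result
    bbox: Tuple[int, int, int, int]   # x, y, width, height in screen points
    confidence: float                 # tier-specific score in [0, 1]


# A resolver tries to locate a UI element by its label; None means "not found".
Resolver = Callable[[str], Optional[Grounding]]


def ground(label: str, tiers: Sequence[Resolver],
           min_confidence: float = 0.8) -> Optional[Grounding]:
    """Query tiers in order and return the first result whose confidence
    meets the (assumed) threshold; None if every tier fails."""
    for resolve in tiers:
        result = resolve(label)
        if result is not None and result.confidence >= min_confidence:
            return result
    return None


# Hypothetical stand-ins for the three tiers.
def ax_tier(label: str) -> Optional[Grounding]:
    return None  # e.g. a custom-drawn control not exposed to the AX API


def ocr_tier(label: str) -> Optional[Grounding]:
    if label == "Bounce":
        return Grounding("ocr", (412, 88, 64, 22), 0.93)
    return None


def vlm_tier(label: str) -> Optional[Grounding]:
    return Grounding("vlm", (0, 0, 10, 10), 0.55)  # low-confidence guess


CASCADE = [ax_tier, ocr_tier, vlm_tier]
```

With these stubs, "Bounce" is resolved by the OCR tier, while an unknown label falls through to the vision tier and is rejected because its confidence stays below the threshold. How reproducibly real tiers clear such a threshold is exactly the open question above.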

Voice Control

The Voice

Speech-to-text control is a possible extension of the input modalities. Whether and when this modality can be meaningfully investigated depends on funding outcomes and on findings from the chat-mode research.

Future perspective

Not part of the current funding phase. Continued funding and methodological groundwork are required.

paponox_demo.sh — prototype
Proof of Concept

$ paponox --status

What the prototype currently demonstrates in documented form (Phase 0)

# Navigation

"where am I" Position + track state (initial feasibility)

"go to bar 5" Direct positional addressing (initial feasibility)

# Transport

play / stop / record

Define a cycle range, e.g. bars 5–15 (initial feasibility)

# Mixing

mute / solo / volume / pan with auditory feedback (initial feasibility)

"what do you see" / "click on X"

# Project & modal dialogs

save / save as with dialog automation (initial feasibility)

import audio / bounce via multi-stage dialog sequences (initial feasibility)

$ phase-0: initial proof-of-concept findings documented
Anything beyond this falls within the scope of the open validation research.
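Commands such as "go to bar 5" and "where am I" translate between musical time (bars, beats) and clock time. Under the simplifying assumption of a constant tempo and a fixed meter, bar b starts at t = (b - 1) × beats_per_bar × 60 / bpm; at 120 bpm in 4/4, bar 5 starts at 4 × 4 × 0.5 s = 8 s. The Python sketch below is hypothetical and ignores tempo changes and sub-beat resolution that a real DAW position carries.

```python
from typing import Tuple


def bar_to_seconds(bar: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Start time of a 1-based bar, assuming constant tempo and fixed meter."""
    if bar < 1:
        raise ValueError("bar numbers start at 1")
    return (bar - 1) * beats_per_bar * 60.0 / bpm


def seconds_to_position(seconds: float, bpm: float,
                        beats_per_bar: int = 4) -> Tuple[int, int]:
    """Map a playhead time to a 1-based (bar, beat) pair, same assumptions."""
    whole_beats = int(seconds * bpm / 60.0)  # beats fully elapsed
    return whole_beats // beats_per_bar + 1, whole_beats % beats_per_bar + 1


print(bar_to_seconds(5, bpm=120))         # 8.0
print(seconds_to_position(8.0, bpm=120))  # (5, 1)
```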
Architecture

Methodological principles

How we approach the research question operationally: four principles already put into practice in the prototype.

On-device inference

Local & free

We use Apple's FoundationModels framework as an on-device language model, executed entirely locally. Research premise: no cloud connectivity, no data sharing, no recurring inference costs.

Process injection

Direct access investigated

Direct integration into the DAW process via Apple's documented DYLD_INSERT_LIBRARIES mechanism and the public Objective-C runtime. Methodologically, this lets us investigate to what extent internal data models can be opened up without compromising usability.

Real-time IPC

Low latency in focus

Bidirectional inter-process communication via Unix Domain Socket using JSON Lines. We investigate how low end-to-end latency can remain when all components operate locally and synchronously.
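The JSON Lines framing over a Unix domain socket can be illustrated with the Python standard library alone. This is a sketch under assumptions: the prototype's socket path and message schema are not described here, so the fields id, intent and ok are hypothetical, and socket.socketpair stands in for a socket bound to a filesystem path so the example stays self-contained. The measured times cover only an in-process round trip to a trivial peer, not DAW-side work.

```python
import json
import socket
import threading
import time
from typing import Any, Dict, List, TextIO, Tuple


def send_line(stream: TextIO, message: Dict[str, Any]) -> None:
    # JSON Lines framing: one compact JSON object per line, "\n" as delimiter.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def recv_line(stream: TextIO) -> Dict[str, Any]:
    line = stream.readline()
    if not line:
        raise ConnectionError("peer closed the socket")
    return json.loads(line)


def fake_bridge(sock: socket.socket) -> None:
    """Hypothetical DAW-side endpoint that acknowledges every command."""
    with sock, sock.makefile("r", encoding="utf-8") as rx, \
            sock.makefile("w", encoding="utf-8") as tx:
        for line in rx:  # ends when the client closes its side
            command = json.loads(line)
            send_line(tx, {"id": command["id"], "ok": True})


def round_trip(commands: List[Dict[str, Any]]) -> Tuple[list, List[float]]:
    # socketpair() returns two connected AF_UNIX stream sockets; a real bridge
    # would bind()/connect() to a filesystem path instead.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    peer = threading.Thread(target=fake_bridge, args=(server,), daemon=True)
    peer.start()
    replies, latencies_ms = [], []
    with client, client.makefile("r", encoding="utf-8") as rx, \
            client.makefile("w", encoding="utf-8") as tx:
        for command in commands:
            start = time.perf_counter()
            send_line(tx, command)
            replies.append(recv_line(rx))
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    peer.join(timeout=1.0)
    return replies, latencies_ms
```

Newline-delimited JSON keeps framing trivial on both sides: each side reads one line per message, so no length prefixes or custom parsers are needed, as long as serialised messages never contain raw newlines (json.dumps escapes them inside strings).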

Privacy by design

No data sharing

Research-ethical principle: all data remain on the user's device. No account, no tracking, no upload. The vision module, the language model and speech output operate exclusively locally.

The Stack

From chat to DAW

Four layers. The prototype shows that the approach holds in principle; its detailed operationalisation is the subject of ongoing research.

Input

Chat CLI

Text-based commands in German

Processing

PapoNox Core

Intent engine + on-device LLM fallback

Bridge

IPC bridge

Unix Domain Socket + JSON Lines

Output

Logic Pro

Objective-C runtime + CGEvent synthesis

Target latency: < 200 ms
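The processing layer is described as an intent engine with an on-device LLM fallback. A minimal Python sketch of that control flow, timed against the 200 ms target, might look as follows. The single rule, the stub model and the result fields are hypothetical; the real model runs through Apple's FoundationModels framework, which this sketch does not call, and the budget check here covers only the parsing layer, not IPC or event synthesis.

```python
import re
import time
from typing import Callable, Dict, Optional

Intent = Dict[str, object]
BUDGET_MS = 200.0  # target from the stack description, applied to one layer only


def rule_engine(text: str) -> Optional[Intent]:
    """Deterministic first stage: a single illustrative rule."""
    match = re.fullmatch(r"gehe zu takt (\d+)", " ".join(text.lower().split()))
    return {"intent": "goto_bar", "bar": int(match.group(1))} if match else None


def resolve(text: str,
            rules: Callable[[str], Optional[Intent]],
            llm_fallback: Callable[[str], Optional[Intent]],
            budget_ms: float = BUDGET_MS) -> Dict[str, object]:
    """Try the rules first; call the (slower) language model only on a miss."""
    start = time.perf_counter()
    intent, source = rules(text), "rules"
    if intent is None:
        intent = llm_fallback(text)
        source = "llm" if intent is not None else "unresolved"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"intent": intent, "source": source,
            "elapsed_ms": elapsed_ms, "within_budget": elapsed_ms < budget_ms}


# Hypothetical stand-in for the on-device model.
def stub_llm(text: str) -> Optional[Intent]:
    if "schleife" in text.lower():
        return {"intent": "set_cycle", "start": 5, "end": 15}
    return None
```

Ordering the stages this way means the common, well-covered commands never pay the model's inference cost; only unmatched input risks exceeding the budget.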
Research stages

Milestones

What has been achieved so far — and what still needs to be measured methodologically.

Phase 0 (completed)

Architectural feasibility

  • Research question precisely formulated
  • Prototype as documented proof of concept
  • On-device LLM and public runtime interfaces empirically validated
  • Open-source research licensing (MIT for code, CC-BY-4.0 for documentation)
Phase 1 (open)

DAW control — validation phase

  • Precision measurement of continuous values across the full value range
  • Empirical determination of the methodological limits
  • Validation of the three-tier grounding cascade (AX → OCR → vision)
  • User studies, external accessibility audit and conference submission (funding in preparation)
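One possible way to operationalise "precision measurement of continuous values across the full value range" is a sweep: request evenly spaced target values, read each back and record the deviation. In this Python sketch the setter/getter pair, the dB range and the 0.5 dB quantisation of the fake control are all hypothetical; a real measurement would read values back from the DAW through the bridge.

```python
from statistics import mean
from typing import Callable, Dict, List


def sweep(set_value: Callable[[float], None], get_value: Callable[[], float],
          lo: float, hi: float, steps: int) -> Dict[str, float]:
    """Set `steps` evenly spaced targets in [lo, hi], read each back and
    report the absolute deviation between requested and actual value."""
    if steps < 2:
        raise ValueError("need at least two steps to cover a range")
    errors: List[float] = []
    for i in range(steps):
        target = lo + (hi - lo) * i / (steps - 1)
        set_value(target)
        errors.append(abs(get_value() - target))
    return {"max_abs_error": max(errors), "mean_abs_error": mean(errors)}


class QuantisedFader:
    """Hypothetical control that snaps to 0.5 dB steps, standing in for a
    real DAW parameter read back through the bridge."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, db: float) -> None:
        self.value = round(db * 2) / 2

    def get(self) -> float:
        return self.value
```

For this fake fader, the maximum error over an arbitrary sweep is bounded by half a step (0.25 dB); a measured error well above the control's known resolution would point to a problem in the intent-to-value mapping rather than in the control itself.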
Phase 2
Long-term research perspective

Universal accessibility

  • Transferability of the methodology to other closed applications
  • Classification of addressable application domains
  • Context sensitivity as an open methodological research question
  • No commitment, but a documented perspective

Questions about the research project?

We conduct our research openly and with methodological transparency. Write to us: we welcome feedback, technical exchange and expressions of interest in collaboration.

Contact