Projects

Product design & research

HITL practice

The same story as the home page product section, with paths into prototypes, the Kit, and how validation with real users shaped the work.

Or jump to the full demo list.

What I build

Interfaces for complex AI systems that people actually trust: approval flows, citation verification, design systems, and the copy conventions that make agentic tools feel legible. Claude and Claude Code feed my pipeline into Cursor; when I sketch by hand, I often run the sketch through v0 or Claude first, then finish in Cursor.

The public face of that systems work is the HITL Kit: a perspective paper, eleven primitives, and a shadcn registry. Before and alongside the shipped site at hitlkit.dev, I iterated the same ideas in this repo: the HITL-AI widget showcase and the component sheet are the earlier, in-site reference implementations.

How I validate

Mixed-methods research: structured interviews, behavioral observation, and session replays, synthesized into prioritized UX decisions rather than slide-deck summaries. I anchor every cycle in trust, evidence attribution, and meaningful human control. Findings feed interaction specs, flow changes, system prompts, and microcopy in the same shipping rhythm as the product.

A running example of that loop in the open is the team test log: feedback and observation turned into concrete improvements, documented where partners and future-us can read the arc—not only the conclusions.

Prototyping in the browser

I prototype in code, at scale, not in a separate design handoff. Cursor and Claude Code are how I turn intent into high-quality working surfaces, web and desktop, fast enough to stay aligned with the people using the product. The same weeks usually include interviews, session replays, and light data analysis, so the next build is shaped by what we are seeing in the field, not by a static milestone.

Research OS is a deliberately simple, browser-based slice of that idea: a multi-panel surface with agentic search, chat, and human-in-the-loop approval so I can stress-test legibility, density, and control in isolation. It is not the whole story—just a visible checkpoint. Related slices are in Music Analysis Chat and the rest of the demo index.

How the HITL Kit was born

The Kit did not start as a component drop. It started from a measurement argument: most enterprise AI pilots are judged on autonomous completion while real deployment is collaborative. I wrote that case into a single narrative—paper, installable library, and registry—so the critique and the UI stay coupled. The HITL Kit project page breaks down what actually shipped; the in-repo HITL-AI pages are a fork in the timeline worth comparing.

A measurement problem

Ninety-five percent of enterprise AI initiatives deliver zero measurable return, not from technology failure but from a measurement crisis. Current benchmarks saturate within months and optimize for autonomous task completion; the longer piece traces that gap across cognitive science, scaffolding research, and enterprise data, and argues for the assist-not-complete paradigm: AI designed to augment human agency rather than replace it.

Read the paper

Current work

Co-founded Ubik Studio, a desktop-native AI research platform. That work includes production-ready UI in Next.js and Electron; user research (interviews, observation, session synthesis); approval flows and citation UI; copy and terminology standards; and datasets plus prompts for multi-hop research agents. The team test log is one window into how that work gets validated with real users.

Demos on this site are not decoration; they are where I rehearse the same HITL patterns that show up in product and in the Kit: Research OS for end-to-end flows, HITL-AI for primitive comparison, and the Kit and Kraa write-ups for the argument and the field notes.

Mirror: Product section on the home page.