Problem and Constraints
Singers needed immediate, understandable feedback while practicing, not delayed analysis after recordings.
- Hackathon timeline forced fast technical decisions and a stable demo path.
- Feedback had to be immediate enough to use during practice, not after a session.
- Model output had to be translated into plain coaching cues for non-technical users.
My Approach
Considered: I considered batching analysis after recorded clips to simplify model execution.
Chose: I chose a near real-time inference loop so users could adjust technique in-session.
Rejected: I rejected post-record analysis because delayed feedback weakens practical training value.
Considered: I considered exposing raw model scores directly in the UI.
Chose: I chose interpretable coaching cues tied to pitch, timing, and vocal control behaviors.
Rejected: I rejected raw score output because users could not act on it quickly without translation.
Considered: I considered spending most time on feature depth.
Chose: I chose architecture stability and demo reliability first, then layered user-facing refinements.
Rejected: I rejected high feature churn because unstable flow would fail under presentation pressure.
System Design
Loading diagram…
Audio input is captured, transformed for inference, and mapped into practical coaching signals before it reaches the user interface. This keeps latency low while making technical output understandable in live practice.
Implementation Highlights
- Low-latency analysis flow optimized for near real-time response.
- Feedback loop that translates model output into actionable singing cues.
- Hackathon-ready deployment path to demo stability under pressure.
- Clear UX framing so technical output stays useful to non-technical users.
Tech Stack
Outcomes
- Won 'Most Ambitious VAPI Project' at UC Berkeley AI Hackathon 2025.
- Demonstrated responsive feedback loop with practical UX direction.
- Validated rapid prototyping approach for real-time AI products.
Retrospective
Problem: Fast model output alone was not enough if users could not act on it quickly.
What I tried: I prioritized low-latency response and clear feedback phrasing so users can immediately adjust performance.
What I'd do differently: I would instrument richer per-session analytics earlier to measure learning improvement over time.