Smart Sensing for Humans (SmaSH) Lab - Carnegie Mellon University
Sept 2024 - Present
Objective: To develop adaptive voice systems that support natural, confident use, even for people unfamiliar with technology.
TOOLS
OpenAI API
Whisper
Figma
SHI Bot: Adaptive Decision-Making for Multimodal Voice Interfaces is designed to support natural, confident interaction, particularly for users who are unfamiliar with or hesitant toward digital tools. A classification engine combines NLP and semantic analysis to evaluate the contextual relevance of each spoken input, so the system responds only when appropriate.
This reduces false activations and builds user trust. Designed for hands-free, context-aware use, the interface is intuitive, non-intrusive, and empowers users to feel in control, not overwhelmed.
Overview:
Speech-to-text → NLP + Semantic Analysis
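The speech-to-text → relevance-check flow can be illustrated with a minimal sketch. It assumes the transcript has already been produced by a speech-to-text step, and it uses a plain bag-of-words cosine similarity as a stand-in for the semantic analysis; the command names, example phrasings, and threshold are hypothetical and are not taken from SHI Bot itself.

```python
import math
import re
from collections import Counter

# Hypothetical command set with example phrasings (illustrative only).
COMMANDS = {
    "navigate_home": "navigate home take me home directions to my house",
    "stop_timer": "stop the timer cancel the timer",
}


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(transcript, threshold=0.5):
    """Return the best-matching command name, or None if nothing is relevant enough.

    The threshold is an assumed value; a real system would tune it on data.
    """
    vec = bag_of_words(transcript)
    best_cmd, best_score = None, 0.0
    for name, examples in COMMANDS.items():
        score = cosine(vec, bag_of_words(examples))
        if score > best_score:
            best_cmd, best_score = name, score
    return best_cmd if best_score >= threshold else None
```

Under these assumptions, `classify("take me home")` selects the navigation command, while a longer conversational sentence that merely contains "go home" falls below the threshold and produces no response. A lexical score like this is only a placeholder: it cannot tell a bare "go home" command from the same words spoken in passing, which is where richer semantic and contextual signals are needed.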
Voice Interfaces as a Bridge
Accessible: Speak instead of tap or type
Approachable: No need for manuals or instructions
Natural: Feels like talking to a person, not a machine
Non-intrusive: Hands-free, eyes-free, blends into daily life
System Design
Use Cases
Accidental Command Trigger in Passive Conversation
A nearby user casually says “go home” during an unrelated conversation.
▶︎ The voice interface incorrectly interprets the phrase as a navigation command and initiates route guidance.
False Activation from Distant, Irrelevant Speech
A smart speaker overhears someone shout “no!” from another room.
▶︎ The smart speaker misinterprets the emotional outburst as a cancellation command and prematurely stops an active task (e.g., timer or music).
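One way to reason about the two failure cases above is as a context gate that runs before any command is executed. The sketch below is an illustration under stated assumptions, not the project's method: the `Utterance` fields, the loudness proxy for distance, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    loudness_db: float                # assumed input level at the device microphone
    seconds_since_last_speech: float  # gap since anyone last spoke nearby


# Hypothetical thresholds; real values would be tuned per device and room.
MIN_LOUDNESS_DB = -30.0
MIN_GAP_S = 2.0


def probably_addressed_to_device(u):
    """Heuristic gate: reject speech that is distant or part of an ongoing conversation."""
    if u.loudness_db < MIN_LOUDNESS_DB:
        return False  # too quiet: likely spoken far away or in another room
    if u.seconds_since_last_speech < MIN_GAP_S:
        return False  # follows other speech closely: likely passive conversation
    return True
```

With these assumed values, a mid-conversation "go home" (`Utterance("go home", -18.0, 0.6)`) and a distant "no!" (`Utterance("no!", -42.0, 10.0)`) are both rejected, while a clear, standalone "go home" is passed on to the command classifier.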
Next Steps
Integration with Other Modalities
Coordinate voice interaction with gesture recognition or gaze tracking to add layers of intent detection (e.g., respond only when the user is both looking at the device and speaking).
Incorporating Emotional and Tonal Cues
Enhance context awareness by recognizing user emotions and vocal tone, then adjust how readily and how empathetically the system responds.
Personalized Interaction Models
Adapt system behavior based on individual user preferences, speech patterns, and interaction history.
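As a rough illustration of the gaze-plus-speech idea listed under Integration with Other Modalities, the sketch below treats intent as present only when both signals overlap for a minimum duration. The `Frame` structure, the per-frame boolean signals, and the 0.5-second window are assumptions made for the example; they do not describe an implemented SHI Bot component.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: float                  # timestamp in seconds
    looking_at_device: bool   # assumed output of a gaze tracker
    speaking: bool            # assumed output of a voice activity detector


def intent_detected(frames, min_overlap_s=0.5):
    """Return True if gaze and speech overlap continuously for at least min_overlap_s."""
    start = None
    for f in frames:
        if f.looking_at_device and f.speaking:
            if start is None:
                start = f.t  # overlap begins on this frame
            if f.t - start >= min_overlap_s:
                return True
        else:
            start = None  # overlap broken; reset the window
    return False
```

Requiring a sustained overlap, rather than a single matching frame, is one simple way to avoid reacting to a brief glance or a stray word; the right window length would need to be determined through user testing.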