My BosonAI's hackathon project ranked 2nd place in benchmarking stream - Benchmarking the scalable Audio Higgs Model and Qwen3.2

My BosonAI’s hackathon project ranked 2nd place in benchmarking stream: Benchmarking the scalable Audio Higgs Model + Qwen3.2

Project Overview

Problem

When customers call support, they often get forced into a rigid menu (“press 1/2/3…”), even though:

they don’t know which category their issue belongs to,
routing takes time,
scaling human agents is expensive,
requests can span multiple domains.

Our idea

Use audio-first routing:

synthesize/ingest caller request (TTS caller),
understand request → decide department + intent,
dispatch to the right expert model,
respond as an agent via TTS.

System Diagram

System diagram of HiggAudioUnderstand routing to expert models and TTS agent

System pipeline: audio request → understanding/router → expert(s) → TTS agent response.

Audio Examples (Caller vs Agent)

Below are paired audio clips (caller request → agent response).
Tip: keep filenames short + lowercase.

Example 1 — Simple QA

Caller: “What is two plus two?”

Agent: short phrase answer

Example 7 — Time question

Caller: “How many minutes are in an hour?”

Agent: short phrase answer

Case Study: Airline Customer Calling Service

Example 0

“Please help me make a new reservation for Toronto.”

Example 35

“Online check-in isn’t working for me.”

Example 56

“Set up UM service for my child.”

Routing taxonomy (skeleton)

We route into:

Department (BOOKING / CHANGES / REFUNDS / …)
Intention (NEW_BOOKING / SEAT_SELECTION / …)

Click to expand routing labels

- **BOOKING**: NEW_BOOKING, SEAT_SELECTION, … - **CHANGES**: FLIGHT_CHANGE, NAME_CORRECTION, … - **REFUNDS**: CANCEL_ITINERARY, VOUCHER_QUESTION, … - **BAGGAGE**: ADD_EXTRA_BAGS, LOST_BAGS, … - **FLIGHT_STATUS**: STATUS_REQUEST, DELAY_REASON, … - **CHECK_IN**: ONLINE_CHECKIN, PASSPORT_ISSUE, … - **LOYALTY**: MISSING_MILES, MERGE_ACCOUNTS, … - **PAYMENT**: PAYMENT_FAILED, REQUEST_RECEIPT, … - **SPECIAL_ASSISTANCE**: WHEELCHAIR, SPECIAL_MEAL, … - **GENERAL**: SPEAK_TO_AGENT, OTHER, …

Benchmarking: Concurrency

Latency p50/p95 by stage vs concurrency

Latency p50/p95 by stage vs concurrency"

Key metrics (fill in)

Setting	p50 Total Latency (ms)	p95 Total Latency (ms)	Notes
C=1	TBD	TBD	baseline
C=2	TBD	TBD
C=4	TBD	TBD
C=8	TBD	TBD

Benchmarking: Small dataset

Department and intent confusion matrices

Confusion matrices on SMALL DATASET (department / intent).

Summary numbers (example)

Dataset: 64 examples
Department accuracy: 87.5%
Intent accuracy: 87.5%

Confusion matrices on LARGE DATASET (department / intent).

Reproducibility

Environment

TTS caller: HiggAudio v2
Router: HiggAudioUnderstand + Qwen3.2
TTS agent: HiggAudio v2

How to run (skeleton)

# 1) install
pip install -r requirements.txt

# 2) run demo
python demo.py --config configs/demo.yaml