My BosonAI hackathon project placed 2nd in the benchmarking stream: Benchmarking the scalable Audio Higgs Model and Qwen3.2


Project Overview

Problem

When customers call support, they often get forced into a rigid menu (“press 1/2/3…”), even though:

  • they don’t know which category their issue belongs to,
  • routing takes time,
  • scaling human agents is expensive,
  • requests can span multiple domains.

Our idea

Use audio-first routing:

  1. synthesize/ingest caller request (TTS caller),
  2. understand request → decide department + intent,
  3. dispatch to the right expert model,
  4. respond as an agent via TTS.
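
The four steps above can be sketched as a small dispatch loop. This is only an illustrative sketch, not the project's code: `understand`, `synthesize`, the expert functions, and the `EXPERTS` table are hypothetical stand-ins, and "audio" is faked as UTF-8 text bytes so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Route:
    department: str
    intent: str


def synthesize(text: str) -> bytes:
    """Hypothetical TTS stub: fake 'audio' is just the UTF-8 bytes of the text."""
    return text.encode("utf-8")


def understand(audio: bytes) -> Route:
    """Hypothetical router stub: a real system would call the audio model here."""
    text = audio.decode("utf-8").lower()
    if "reservation" in text:
        return Route("BOOKING", "NEW_BOOKING")
    return Route("GENERAL", "OTHER")


def booking_expert(route: Route, audio: bytes) -> str:
    return f"Handling {route.intent} in {route.department}."


def general_expert(route: Route, audio: bytes) -> str:
    return "Connecting you to a general agent."


# Department name -> expert handler (hypothetical registry).
EXPERTS: Dict[str, Callable[[Route, bytes], str]] = {
    "BOOKING": booking_expert,
    "GENERAL": general_expert,
}


def handle_call(audio: bytes) -> bytes:
    route = understand(audio)                                 # step 2: department + intent
    expert = EXPERTS.get(route.department, general_expert)   # step 3: dispatch
    reply = expert(route, audio)
    return synthesize(reply)                                  # step 4: respond via TTS


# Step 1: synthesize the caller request, then run it through the pipeline.
caller_audio = synthesize("Please help me make a new reservation for Toronto.")
agent_audio = handle_call(caller_audio)
```

Falling back to `general_expert` for unknown departments keeps the sketch from failing on a label the registry does not cover.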

System Diagram

Figure: HiggAudioUnderstand routes each request to expert models and a TTS agent. Pipeline: audio request → understanding/router → expert(s) → TTS agent response.


Audio Examples (Caller vs Agent)

Below are paired audio clips (caller request → agent response).

Example 1 — Simple QA

Caller: “What is two plus two?”

 

Agent: short phrase answer


Example 7 — Time question

Caller: “How many minutes are in an hour?”

 

Agent: short phrase answer


Case Study: Airline Customer Calling Service

Example 0

“Please help me make a new reservation for Toronto.”

Example 35

“Online check-in isn’t working for me.”

Example 56

“Set up UM service for my child.”


Routing taxonomy (skeleton)

We route into:

  • Department (BOOKING / CHANGES / REFUNDS / …)
  • Intention (NEW_BOOKING / SEAT_SELECTION / …)

Routing labels:

  • BOOKING: NEW_BOOKING, SEAT_SELECTION, …
  • CHANGES: FLIGHT_CHANGE, NAME_CORRECTION, …
  • REFUNDS: CANCEL_ITINERARY, VOUCHER_QUESTION, …
  • BAGGAGE: ADD_EXTRA_BAGS, LOST_BAGS, …
  • FLIGHT_STATUS: STATUS_REQUEST, DELAY_REASON, …
  • CHECK_IN: ONLINE_CHECKIN, PASSPORT_ISSUE, …
  • LOYALTY: MISSING_MILES, MERGE_ACCOUNTS, …
  • PAYMENT: PAYMENT_FAILED, REQUEST_RECEIPT, …
  • SPECIAL_ASSISTANCE: WHEELCHAIR, SPECIAL_MEAL, …
  • GENERAL: SPEAK_TO_AGENT, OTHER, …
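
One way to hold this two-level taxonomy in code is a mapping from department to its intents, which also makes it easy to reject a router output that pairs an intent with the wrong department. The sketch below is an assumption about representation, not the project's actual data structure. It contains only the labels named in this post; the lists in the post are truncated, so the real label set is larger. The helper names `is_valid_route` and `department_of` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Only the labels named in the post; each list is truncated in the source.
ROUTING_LABELS: Dict[str, Tuple[str, ...]] = {
    "BOOKING": ("NEW_BOOKING", "SEAT_SELECTION"),
    "CHANGES": ("FLIGHT_CHANGE", "NAME_CORRECTION"),
    "REFUNDS": ("CANCEL_ITINERARY", "VOUCHER_QUESTION"),
    "BAGGAGE": ("ADD_EXTRA_BAGS", "LOST_BAGS"),
    "FLIGHT_STATUS": ("STATUS_REQUEST", "DELAY_REASON"),
    "CHECK_IN": ("ONLINE_CHECKIN", "PASSPORT_ISSUE"),
    "LOYALTY": ("MISSING_MILES", "MERGE_ACCOUNTS"),
    "PAYMENT": ("PAYMENT_FAILED", "REQUEST_RECEIPT"),
    "SPECIAL_ASSISTANCE": ("WHEELCHAIR", "SPECIAL_MEAL"),
    "GENERAL": ("SPEAK_TO_AGENT", "OTHER"),
}


def is_valid_route(department: str, intent: str) -> bool:
    """True if the intent is listed under the given department."""
    return intent in ROUTING_LABELS.get(department, ())


def department_of(intent: str) -> Optional[str]:
    """Return the department that lists this intent, or None if unknown."""
    for department, intents in ROUTING_LABELS.items():
        if intent in intents:
            return department
    return None
```

For example, `is_valid_route("CHECK_IN", "ONLINE_CHECKIN")` holds, while `is_valid_route("BOOKING", "LOST_BAGS")` does not.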

Benchmarking: Concurrency

Figure: latency p50/p95 by stage vs. concurrency.

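Key metrics (fill in)

| Setting | p50 Total Latency (ms) | p95 Total Latency (ms) | Notes    |
|---------|------------------------|------------------------|----------|
| C=1     | TBD                    | TBD                    | baseline |
| C=2     | TBD                    | TBD                    |          |
| C=4     | TBD                    | TBD                    |          |
| C=8     | TBD                    | TBD                    |          |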
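
The p50/p95 columns could be filled in with a harness along these lines. It is a minimal sketch under assumptions, not the benchmark actually used: `fake_pipeline` is a hypothetical stand-in that only sleeps, concurrency is modeled with a thread pool, and percentiles use the nearest-rank definition.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct in (0, 100])."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def timed_call(fn: Callable[[str], None], request: str) -> float:
    """Run one request and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    fn(request)
    return (time.perf_counter() - start) * 1000.0


def benchmark(fn: Callable[[str], None], requests: List[str],
              concurrency: int) -> Dict[str, float]:
    """Send all requests with at most `concurrency` in flight; report p50/p95."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda r: timed_call(fn, r), requests))
    return {"p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95)}


def fake_pipeline(request: str) -> None:
    """Hypothetical stand-in for the full audio pipeline: just sleeps 10 ms."""
    time.sleep(0.01)


if __name__ == "__main__":
    for c in (1, 2, 4, 8):  # the concurrency settings from the table
        stats = benchmark(fake_pipeline, ["hello"] * 16, c)
        print(f"C={c}: p50={stats['p50_ms']:.1f} ms, p95={stats['p95_ms']:.1f} ms")
```

Timing each stage separately (understanding, expert, TTS) would need one `time.perf_counter()` pair per stage instead of one around the whole call.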

Benchmarking: Small dataset

Figure: department and intent confusion matrices on the small dataset.

Summary numbers (example)

  • Dataset: 64 examples
  • Department accuracy: 87.5%
  • Intent accuracy: 87.5%
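
If accuracy here is the plain fraction of exact label matches, 87.5% on 64 examples corresponds to 56 correct predictions (56 / 64 = 0.875). The sketch below shows how such an accuracy and the confusion-matrix counts could be computed. The label lists are made-up toy data, not the project's results, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true: List[str], y_pred: List[str]) -> Dict[Tuple[str, str], int]:
    """Count each (true label, predicted label) pair, i.e. confusion-matrix cells."""
    return dict(Counter(zip(y_true, y_pred)))


# Toy data (made up for illustration): 8 examples, 1 misrouted.
toy_true = ["BOOKING"] * 4 + ["REFUNDS"] * 4
toy_pred = ["BOOKING"] * 4 + ["REFUNDS"] * 3 + ["CHANGES"]

toy_accuracy = accuracy(toy_true, toy_pred)        # 7 of 8 correct
toy_cells = confusion_counts(toy_true, toy_pred)   # off-diagonal: REFUNDS -> CHANGES
```

The same two functions apply to department labels and intent labels alike; each yields its own matrix.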

Benchmarking: Large dataset

Figure: department and intent confusion matrices on the large dataset.


Reproducibility

Environment

  • TTS caller: HiggAudio v2
  • Router: HiggAudioUnderstand + Qwen3.2
  • TTS agent: HiggAudio v2

How to run (skeleton)

# 1) install
pip install -r requirements.txt

# 2) run demo
python demo.py --config configs/demo.yaml