Deliveries

What's currently in the corpus, what's missing, and what changed in the latest batch. One row per data delivery in the log at the bottom.

Current delivery: 003

provisional · 2026-05-12 #5 · slug: multi-project-multi-voice · supersedes delivery-002.


| Field | Value |
|---|---|
| Clips | 20 |
| Total duration | ~41.6 min |
| Projects | `she_proves` (12) + `elephant_in_the_room` (8) |
| Tiers | A (12 clean) + B (8 room-augmented) |
| TTS backends | Azure (18) + Google Chirp 3 HD (2) |
| Unique speaker personas | 6 (4 in She-Proves, 2 in Elephant) |
| Validation failures | 0 / 20 |
| Pipeline | SynthBanshee `0.1.0` @ `1ea48f3` |

- All clips are `split: train`. With only 6 speaker personas and 4 voice families across 20 clips, speaker-disjoint partitioning isn't feasible at this scale.
- One room type for Elephant. All 8 Tier-B clips use `clinic_office`; `welfare_office` and `open_office` are in the pipeline but not exercised yet.
- One device profile for She-Proves. No device augmentation (e.g. `phone_in_pocket`) has been applied yet; Tier-A clips are clean, not phone-captured.
- Voice diversity is low: 2 voice families per gender, below the QA threshold of ≥3 for "diverse".
- Toy-batch scale. 20 clips is enough to wire up consumer plumbing, but not enough to train a production model.
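If you want to recompute the voice-diversity rule on your side (≥3 voice families per gender), a minimal sketch follows. It assumes you can list one `(gender, voice_family)` pair per speaker; that input shape and the placeholder family names in the usage line are hypothetical, not SynthBanshee's API. Only the flag names and the threshold come from this page.

```python
from collections import defaultdict

# Threshold from this page: a gender counts as "diverse" at >= 3 voice families.
DIVERSITY_THRESHOLD = 3


def voice_diversity_flags(speakers):
    """Return sorted low_voice_diversity_<gender> flags.

    `speakers` is an iterable of (gender, voice_family) pairs. This input
    shape is a hypothetical stand-in for however your loader exposes voices.
    """
    families = defaultdict(set)
    for gender, family in speakers:
        families[gender].add(family)  # duplicates collapse: we count families, not speakers
    return sorted(
        f"low_voice_diversity_{gender}"
        for gender, fams in families.items()
        if len(fams) < DIVERSITY_THRESHOLD
    )


# Placeholder family names; two per gender reproduces this delivery's situation.
print(voice_diversity_flags([("male", "m1"), ("male", "m2"), ("female", "f1"), ("female", "f2")]))
```

With two families per gender, both `low_voice_diversity_female` and `low_voice_diversity_male` fire, matching the flags listed below.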

| Flag | Detail | What to do about it |
|---|---|---|
| `low_voice_diversity_male` | 2 male voice families across the corpus (threshold ≥3) | Track per-voice eval separately; expect features to overfit to AvriNeural until more voices land |
| `low_voice_diversity_female` | Same, for female voices | Same |
| `vic_f0_high` (per-clip × 2) | `sp_sv_a_0003_00`, `sp_it_a_0003_00`: Google Chirp 3 HD female F0 sits above the Azure baseline | Nothing. Don't exclude the clips. If you compute F0 features, calibrate them per backend. See Audio Format. |
| `quality_flagged_clips: 15` | Mostly `emotion_downgrade`, from prosody-cap activations at I3+ | Don't reflexively filter these out; they pass validation. See Common mistakes #7. |
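One way to calibrate F0 per backend is to z-normalize F0 within each backend, so the Google Chirp voices are compared against their own mean rather than the Azure baseline. The sketch below does that; z-scoring is a technique chosen here for illustration, not necessarily what Audio Format prescribes, and the row keys `backend` and `f0_hz` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_f0_per_backend(rows):
    """Z-normalize F0 within each TTS backend.

    `rows` is a list of dicts with hypothetical keys "backend" and "f0_hz".
    Returns new dicts with an added "f0_z" key; the input is not mutated.
    """
    by_backend = defaultdict(list)
    for row in rows:
        by_backend[row["backend"]].append(row["f0_hz"])

    stats = {}
    for backend, values in by_backend.items():
        sd = pstdev(values)
        # Guard constant or single-clip groups against division by zero.
        stats[backend] = (mean(values), sd if sd > 0 else 1.0)

    out = []
    for row in rows:
        mu, sd = stats[row["backend"]]
        out.append({**row, "f0_z": (row["f0_hz"] - mu) / sd})
    return out
```

With only 2 Google clips in this delivery, the Google mean and spread are estimated from very little data, so treat per-backend statistics as provisional until more Google voices land.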

| Typology | Tier A (She-Proves) | Tier B (Elephant) | Total |
|---|---|---|---|
| `SV` | 3 | 2 | 5 |
| `IT` | 3 | 2 | 5 |
| `NEG` | 3 | 2 | 5 |
| `NEU` | 3 | 2 | 5 |
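To cross-check the typology table against the clips you actually loaded, you can tally typology × tier. The sketch below parses both out of clip IDs using a pattern inferred from the three She-Proves IDs on this page (e.g. `sp_sv_a_0003_00`). The Elephant project prefix and the Tier-B letter are assumptions, so if your loader exposes typology and tier as metadata fields, prefer those.

```python
import re
from collections import Counter

# Pattern inferred from IDs such as sp_sv_a_0003_00:
# <project prefix>_<typology>_<tier>_<4-digit index>_<2-digit suffix>.
# The Elephant prefix and Tier-B letter are assumed, not confirmed.
CLIP_ID = re.compile(r"^([a-z]+)_(sv|it|neg|neu)_([a-z])_(\d{4})_(\d{2})$")


def typology_by_tier(clip_ids):
    """Count clips per (TYPOLOGY, TIER), e.g. ("SV", "A")."""
    counts = Counter()
    for clip_id in clip_ids:
        m = CLIP_ID.match(clip_id)
        if m is None:
            # Also rejects non-lowercase IDs, per the casing rules.
            raise ValueError(f"unexpected clip ID: {clip_id!r}")
        _, typology, tier, _, _ = m.groups()
        counts[(typology.upper(), tier.upper())] += 1
    return counts
```

On the full delivery, each Tier-A cell should come out at 3 and each Tier-B cell at 2.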

`max_intensity` across the 20 clips: I5 = 10 clips · I3 = 4 clips · I2 = 6 clips.

This delivery was designed to exercise the schema features below; use it to check your consumer code against each of them:

- Full `ClipMetadata` schema, including the `generation_metadata` block and (for Tier B) a populated `acoustic_scene`.
- Per-surface casing rules: UPPERCASE `speaker_id`; lowercase paths and clip IDs.
- `has_violence` derived from events: NEG clips are correctly `false` even at `max_intensity` ≥ 3.
- Multi-project layout under a single `data/he/` root.
- Multi-backend provenance: `generation_metadata.tts_backend` differs per speaker.
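A consumer-side check of the casing rules might look like the sketch below. The record keys `clip_id`, `speaker_ids` and `paths` are hypothetical flattenings, not the real `ClipMetadata` field names, and the example path layout is illustrative only.

```python
def casing_violations(clip):
    """Check the per-surface casing rules on one clip record.

    Assumes a dict with hypothetical keys "clip_id" (str),
    "speaker_ids" (list of str) and "paths" (list of str).
    Returns a list of human-readable problems; empty means compliant.
    """
    problems = []
    if clip["clip_id"] != clip["clip_id"].lower():
        problems.append(f"clip_id not lowercase: {clip['clip_id']}")
    for sid in clip["speaker_ids"]:
        if sid != sid.upper():
            problems.append(f"speaker_id not UPPERCASE: {sid}")
    for path in clip["paths"]:
        if path != path.lower():
            problems.append(f"path not lowercase: {path}")
    return problems


# Hypothetical record; "SPK_A" and the path layout are placeholders.
print(casing_violations({
    "clip_id": "sp_sv_a_0003_00",
    "speaker_ids": ["SPK_A"],
    "paths": ["data/he/she_proves/sp_sv_a_0003_00.wav"],
}))
```

Collecting problems instead of raising on the first one makes it easier to report every bad surface in a delivery at once.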

Closed QA findings (vs. delivery-002)

| Finding | Delivery-002 | Delivery-003 |
|---|---|---|
| `agg_no_escalation` | 3 clips | 0 (AGG RMS now escalates with intensity) |
| `warn_no_overlap` | 4 clips | 0 (turn overlap now fires on I4+ clips) |
| `warn_emotion_downgrade` | 4 clips | 0 |
| `generation_metadata` absent | 8 of 8 clips | 0 (all 20 carry the full block) |
| `dirty_file_path` null | 7 of 8 clips | 0 (all 20 retain dirty files) |
| `normalized_dbfs` hardcoded to `-1.0` | All 8 clips | 0 (field records the measured peak) |
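Because `normalized_dbfs` now records the measured peak, you can spot-check it against the audio. The sketch computes sample-peak dBFS for float samples scaled to [-1.0, 1.0], i.e. 20·log10(max |x|). Whether SynthBanshee measures sample peak or true peak isn't stated here, so compare with a tolerance rather than exact equality.

```python
import math


def peak_dbfs(samples):
    """Sample-peak level in dBFS for float samples in [-1.0, 1.0].

    Returns -inf for digital silence (or an empty buffer).
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return -math.inf
    return 20.0 * math.log10(peak)


print(round(peak_dbfs([0.0, 0.5, -0.25]), 2))  # a 0.5 peak is about -6.02 dBFS
```

If you load 16-bit PCM, divide by 32768 first so the samples land in the range this function assumes.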

Closed by the 2026-05-12 schema-shift regen

Three SynthBanshee PRs landed alongside the regen (#110 / #111 / #112):

| Finding | Resolution |
|---|---|
| `single_backend` false positive | `qa.py` derives backend diversity from `generation_metadata.tts_backend.values()` and reports `clips_by_tts_backend: {azure: 18, google: 2}` |
| Absolute paths in clip JSON | `dirty_file_path` and `transcript_path` are now repo-relative POSIX paths |
| Leaked pytest `tmp_path` on `sp_neu_a_0001_00` | Regen overwrote it with the canonical path; an autouse env-var-stripping fixture prevents future leaks |
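Consumers can apply the same two ideas defensively. The sketch below collects distinct backends from the per-speaker `generation_metadata.tts_backend` mapping (the speaker keys in the test data are hypothetical) and rejects paths that are not repo-relative POSIX. Rejecting `..` components is an extra strictness added here, not something this page specifies.

```python
from pathlib import PurePosixPath, PureWindowsPath


def backend_set(clips):
    """Distinct TTS backends across clips, read from each clip's per-speaker mapping."""
    return {
        backend
        for clip in clips
        for backend in clip["generation_metadata"]["tts_backend"].values()
    }


def is_repo_relative_posix(path):
    """True if `path` is relative, forward-slashed, drive-free, and has no '..' parts."""
    if "\\" in path:
        return False  # Windows separators
    if PureWindowsPath(path).drive:
        return False  # e.g. "C:/..." or "C:foo"
    posix = PurePosixPath(path)
    return not posix.is_absolute() and ".." not in posix.parts
```

A leaked pytest path such as `/tmp/pytest-of-user/...` fails the `is_absolute()` check, which is exactly the class of bug fixed on `sp_neu_a_0001_00`.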

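Delivery log

| # | Date | Slug | Project | Tier | Clips | Duration | Status |
|---|---|---|---|---|---|---|---|
| 003 | 2026-05-12 | multi-project-multi-voice | `she_proves` + `elephant_in_the_room` | A + B | 20 | ~42m | provisional |
| 002 | 2026-04-15 | m2a-wettest | `she_proves` | A | 8 | ~17m | superseded |
| 001 | 2026-04-15 | debug-run-1 | `she_proves` | A | 1 | 2m 36s | superseded |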

| Status | Meaning |
|---|---|
| `provisional` | Preview batch; consumer integration only, not approved for training |
| `approved` | QA passed; cleared for training use |
| `superseded` | Replaced by a later delivery covering the same scenes at higher quality |
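A training pipeline can enforce the status table with a small gate. This is a minimal sketch: the enum and function names are hypothetical, and treating `superseded` as not trainable is a consumer-side policy assumption (the table only says it was replaced). Unknown status strings raise instead of silently passing.

```python
from enum import Enum


class DeliveryStatus(Enum):
    PROVISIONAL = "provisional"
    APPROVED = "approved"
    SUPERSEDED = "superseded"


def cleared_for_training(status: str) -> bool:
    """Only `approved` deliveries feed training; unknown strings raise ValueError."""
    return DeliveryStatus(status) is DeliveryStatus.APPROVED


print(cleared_for_training("provisional"))  # delivery-003 today: False
```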