Glossary

Abbreviations and jargon that show up across the corpus and on this site, in one place.

Speaker roles

The role of each speaker is encoded in the `speaker_id` prefix and in `speakers[].role`.

Code	Stands for	Used in
`AGG`	Aggressor — the perpetrator in a domestic-violence scene	She-Proves clips (`AGG_M_30-45_*`)
`VIC`	Victim — the target of violence in a domestic-violence scene	She-Proves clips (`VIC_F_25-40_*`)
`BEN`	Beneficiary / client — a service-user in a welfare or clinic setting (the threatening party in Elephant scenes)	Elephant clips (`BEN_M_40-55_*`)
`SW`	Social Worker — the threatened professional in Elephant scenes	Elephant clips (`SW_F_30-45_*`)

The role determines the prosody profile, scene position, and which `tier1_category` events the speaker can produce.
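As a rough illustration of how the role can be read back from a `speaker_id`, here is a minimal sketch. It assumes the layout `ROLE_GENDER_MIN-MAX_suffix`, inferred only from the example patterns in the table; the function name `parse_speaker_id`, the `SpeakerInfo` type, and the `001` suffix in the usage line are hypothetical and not part of SynthBanshee.

```python
import re
from typing import NamedTuple

# Role codes taken from the table above.
ROLES = {
    "AGG": "Aggressor",
    "VIC": "Victim",
    "BEN": "Beneficiary / client",
    "SW": "Social Worker",
}

# Assumed layout (inferred from the examples): ROLE_GENDER_MIN-MAX_suffix
_SPEAKER_ID = re.compile(
    r"^(?P<role>[A-Z]+)_(?P<gender>[MF])_(?P<lo>\d+)-(?P<hi>\d+)_(?P<rest>.+)$"
)


class SpeakerInfo(NamedTuple):
    role: str
    gender: str
    age_min: int
    age_max: int


def parse_speaker_id(speaker_id: str) -> SpeakerInfo:
    """Split a speaker ID into role code, gender, and age range."""
    m = _SPEAKER_ID.match(speaker_id)
    if m is None or m["role"] not in ROLES:
        raise ValueError(f"unrecognised speaker_id: {speaker_id!r}")
    return SpeakerInfo(m["role"], m["gender"], int(m["lo"]), int(m["hi"]))


# Hypothetical ID; the real suffix format is not documented here.
info = parse_speaker_id("AGG_M_30-45_001")
```

Rejecting unknown role codes up front keeps a typo in the prefix from silently producing a speaker with no defined role.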

Project codes

Code	Project	Clip ID prefix
`she_proves`	She-Proves smartphone app	`sp_*`
`elephant_in_the_room`	Elephant in the Room (clinic/welfare device)	`el_*`
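The prefix-to-project mapping in this table can be sketched as a small lookup. The helper name `project_for_clip` and the clip ID `sp_0001` are hypothetical; only the two prefixes and project codes come from the table.

```python
# Clip ID prefixes and project codes from the table above.
PROJECT_BY_PREFIX = {
    "sp_": "she_proves",
    "el_": "elephant_in_the_room",
}


def project_for_clip(clip_id: str) -> str:
    """Return the project code implied by a clip ID's prefix."""
    for prefix, project in PROJECT_BY_PREFIX.items():
        if clip_id.startswith(prefix):
            return project
    raise ValueError(f"unknown clip ID prefix: {clip_id!r}")


project = project_for_clip("sp_0001")  # hypothetical clip ID
```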

Violence typology

The clip-level `violence_typology` field. The codes are categorical, not an ordered scale. See Label Taxonomy for details.

Code	Stands for
`SV`	Severe Violence
`IT`	Intimate Terrorism
`NEG`	Negative confusor (sounds intense, no violence)
`NEU`	Neutral

Tier 1 event category

The event-level `tier1_category` field on each `EventLabel`.

Code	Stands for
`VERB`	Verbal violence (shouting, threats, insults)
`DIST`	Distress vocalisations (screaming, crying under duress)
`PHYS`	Physical violence cues (impact sounds, struggle)
`EMOT`	Emotional manipulation (gaslighting, guilt-tripping)
`ACOU`	Acoustic non-vocal events (slams, falls)
`NONE`	Ambient / neutral / no violence cue

Tier codes

Code	Meaning
`A`	Clean audio — no room IR, no device profile, no background noise
`B`	Room IR + device profile + background noise injection

Audio jargon

Term	Meaning
F0	Fundamental frequency — the lowest frequency of a periodic signal; for voice, the pitch. Reported per speaker in some QA outputs.
dBFS	Decibels relative to full scale — 0 dBFS is the maximum amplitude representable by the format; –2 dBFS is ~80% of full amplitude.
Peak normalization	Applying a single gain to the whole signal so its absolute maximum matches a target level.
RMS	Root-mean-square — a measure of average signal energy. SynthBanshee uses per-turn RMS gain to enforce the loudness gradient between calm and escalated turns.
SNR	Signal-to-noise ratio — speech level minus background-noise level, in dB. Recorded in `acoustic_scene.snr_db_actual` for Tier B clips.
IR	Impulse response — a recording of how a room (or microphone, or speaker) responds to an idealised pulse. Convolving clean speech with a room IR makes it sound like it was recorded in that room.
ISM	Image-source method — an algorithm for synthetically generating room IRs by reflecting virtual sound sources off room walls. Implemented by `pyroomacoustics`.
SSML	Speech Synthesis Markup Language — an XML dialect that controls TTS output (pitch, rate, emphasis, breaks, voice). Azure and Google both accept SSML.
TTS	Text-to-speech — the generation of audio from a text prompt.
Prosody	The patterns of stress, intonation, pitch, and rate that make speech expressive (vs. flat).
Prosody cap	A safety clamp applied by SynthBanshee to LLM-suggested prosody values to prevent unnatural extremes (pitch ≤ +2 semitones, rate ∈ [0.85, 1.20]).
Whisper	OpenAI's open-weight ASR model, used internally as a sanity check that synthesised audio is still transcribable.
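Several of the terms above (dBFS, peak normalization, RMS, SNR, prosody cap) reduce to short formulas. The sketch below works through them in plain Python on lists of float samples where 1.0 is full scale. All function names are hypothetical and this is not SynthBanshee's implementation; only the cap limits (pitch ≤ +2, rate in [0.85, 1.20]) are taken from the table.

```python
import math


def dbfs_to_amplitude(dbfs: float) -> float:
    """Linear amplitude (1.0 = full scale) for a level given in dBFS."""
    return 10 ** (dbfs / 20)


def peak_normalize(samples, target_dbfs: float):
    """Apply one gain so the absolute maximum lands on target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = dbfs_to_amplitude(target_dbfs) / peak
    return [s * gain for s in samples]


def rms(samples) -> float:
    """Root-mean-square level of a non-empty sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise) -> float:
    """Speech RMS level minus noise RMS level, in dB."""
    return 20 * math.log10(rms(speech) / rms(noise))


def clamp_prosody(pitch_st: float, rate: float):
    """Clamp to the limits quoted in the table: pitch <= +2 st, rate in [0.85, 1.20]."""
    return min(pitch_st, 2.0), min(max(rate, 0.85), 1.20)


amp = dbfs_to_amplitude(-2.0)  # ≈ 0.794, i.e. roughly 80% of full amplitude
normalized = peak_normalize([0.1, -0.5, 0.25], target_dbfs=-2.0)
capped = clamp_prosody(pitch_st=5.0, rate=1.5)  # (2.0, 1.2)
```

Peak normalization uses a single gain for the whole signal, so relative levels between turns are preserved; per-turn RMS gain, by contrast, changes them deliberately.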

Pipeline / corpus jargon

Term	Meaning
Dirty file	The pre-preprocessing WAV (raw TTS-mixer output, before normalization and padding). Retained under `assets/speech/dirty/{clip_id}_dirty.wav`.
Generation metadata	The `generation_metadata` field — pipeline provenance: which TTS backend was used, which voice family, what mix mode, etc.
Manifest	The flat CSV summary at `data/he/manifest.csv` — one row per clip, columns for filtering.
Strong labels	Event-level labels in `.jsonl` files — one `EventLabel` object per labelled event, with onset/offset/category.
Weak labels	Clip-level summary labels in `.json` — `has_violence`, `max_intensity`, `violence_typology`, `violence_categories`.
Quality flag	A soft warning in `quality_flags` (e.g. `emotion_downgrade`). Doesn't fail validation; flags audio worth a second look.
Delivery	A merged data batch under `deliveries/{slug}/`. Each delivery records its SynthBanshee commit, metadata, and per-batch QA notes.
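To make the strong-label format more concrete, here is a minimal sketch that reads JSONL event labels and totals the labelled duration per `tier1_category`. The key names `onset` and `offset` are assumptions based on the description above (the actual `EventLabel` schema may differ), and `duration_by_category` and the sample records are purely illustrative.

```python
import json
from collections import defaultdict


def duration_by_category(lines):
    """Sum labelled event duration (seconds) per tier1_category.

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        # "onset" / "offset" key names are assumed, not confirmed by the schema.
        totals[event["tier1_category"]] += event["offset"] - event["onset"]
    return dict(totals)


# Made-up records for illustration only.
sample = [
    '{"onset": 0.0, "offset": 1.5, "tier1_category": "VERB"}',
    '{"onset": 2.0, "offset": 2.5, "tier1_category": "ACOU"}',
    '{"onset": 3.0, "offset": 4.0, "tier1_category": "VERB"}',
]
totals = duration_by_category(sample)
```

Because the function accepts any iterable of lines, the same code works on an open `.jsonl` file without loading it fully into memory.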

Hebrew TTS voice IDs

The four voices used in delivery-003:

Voice ID	Gender	Backend
`he-IL-AvriNeural`	M	Azure
`he-IL-HilaNeural`	F	Azure
`he-IL-Chirp3-HD-Achird`	M	Google Chirp 3 HD
`he-IL-Chirp3-HD-Achernar`	F	Google Chirp 3 HD