Project Overview

Gentle is a flexible, Kaldi-based forced-aligner that synchronizes spoken audio with its transcript at the word level. It eliminates manual timestamping in subtitle creation, linguistic research, podcast editing and any workflow that needs precise speech-text alignment. Gentle runs entirely locally, supports large audio files, and exposes multiple interfaces for integration.

Key Features

  • Word-level timestamps with per-word alignment status and phoneme timings
  • No external services—runs offline on Mac, Linux or Windows (via Docker)
  • Modular design for custom Kaldi models and pipelines

Usage Interfaces

1. Web UI

A user-friendly browser interface for one-off alignments.

  • Defaults to http://localhost:8765 after installation
  • Drag-and-drop audio and transcript files
  • Visualize alignments with waveform and word highlights

2. REST API

Automate batch processing or integrate into services.

Endpoint
POST http://localhost:8765/transcriptions?async=[true|false]

Required multipart/form-data fields:

  • audio: MP3, WAV, etc.
  • transcript: Plain-text transcript

Example (synchronous)

curl -F "audio=@lecture.mp3" \
     -F "transcript=@script.txt" \
     "http://localhost:8765/transcriptions?async=false"

Response JSON includes an array of words with start/end timestamps and a per-word alignment status (case).
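
The same request can be sketched in Python with the requests library (a minimal example, assuming a Gentle server is listening on localhost:8765):

import requests

# Sketch: synchronous alignment via the REST API
# (assumes a Gentle server is running on localhost:8765)
with open("lecture.mp3", "rb") as audio, open("script.txt", "rb") as transcript:
    resp = requests.post(
        "http://localhost:8765/transcriptions",
        params={"async": "false"},
        files={"audio": audio, "transcript": transcript},
    )
resp.raise_for_status()
for w in resp.json()["words"]:
    print(w["word"], w.get("start"), w.get("end"))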

3. Command-Line Interface

Quick alignment from your shell or in scripts.

Basic usage

python3 align.py path/to/audio.wav path/to/transcript.txt

Common options

  • -o, --output <file>  write JSON to a file instead of stdout
  • --conservative  use stricter alignment (fewer spurious matches)
  • --disfluency  include filler words (uh, um)

Run python3 align.py --help for all flags.

4. Python Integration

Embed Gentle in custom Python workflows.

Install from source (python3 setup.py develop; see Getting Started), then:

from gentle import Resources, ForcedAligner
from gentle.resample import resampled

# Initialize models and metadata (create once, reuse across alignments)
resources = Resources()

# Read the transcript text
with open('podcast.txt') as f:
    transcript = f.read()

# Align audio and transcript (resampled converts MP3 to the WAV Kaldi expects)
aligner = ForcedAligner(resources, transcript)
with resampled('podcast.mp3') as wav:
    alignment = aligner.transcribe(wav)

# Iterate aligned words
for w in alignment.words:
    if w.case == 'success':
        print(f"{w.alignedWord}: {w.start:.2f}s-{w.end:.2f}s")

Use this API to build batch jobs, real-time analysis tools or custom GUIs.

Getting Started

This guide shows how to launch a Gentle instance in under 10 minutes—either via Docker or a native install—download the models, and perform your first alignment.

1. Docker Quickstart

  1. Clone the repo and build the image

    git clone https://github.com/lowerquality/gentle.git
    cd gentle
    docker build -t gentle:latest .
    
  2. Run the container (CPU)

    docker run -d \
      --name gentle \
      -p 8765:8765 \
      -v $(pwd)/webdata:/gentle/webdata \
      gentle:latest
    
  3. Verify the service

    • Open http://localhost:8765 in your browser
    • Or test the REST API:
      curl -F "audio=@path/to/audio.wav" \
           -F "transcript=@path/to/transcript.txt" \
           "http://localhost:8765/transcriptions?async=false"
      

2. Native Install (Linux/macOS)

Prerequisites

  • Python 3
  • ffmpeg, git, build tools (zlib1g-dev, automake, autoconf, libtool, subversion, wget, unzip)
  • Optionally: pipenv

Steps

  1. Clone and enter repo

    git clone https://github.com/lowerquality/gentle.git
    cd gentle
    
  2. Install OS dependencies and link Python package

    ./install_deps.sh   # detects Debian/Ubuntu (apt) vs macOS (Homebrew)
    
  3. Set up the project (submodules, Kaldi build, Python dev install)

    ./install.sh
    
  4. Verify installation

    python3 -c "import gentle"    # should import without error
    python3 serve.py              # then open http://localhost:8765
    

3. Downloading Models

Gentle requires Kaldi models (version 0.04) for alignment. From the project root:

./install_models.sh

This unpacks models into exp/ (native) or /gentle/models (Docker image). To customize the path:

export GENTLE_DATA=/path/to/exp

4. First Alignment

Command-Line

Align speech.wav to transcript.txt:

python3 align.py \
  --nthreads 4 \
  --disfluency \
  -o aligned.json \
  speech.wav \
  transcript.txt

REST API

Submit an HTTP POST:

curl -X POST \
  -F 'audio=@speech.wav' \
  -F 'transcript=@transcript.txt' \
  'http://localhost:8765/transcriptions?async=false' \
  -o result.json

Inspecting Results

Open the JSON to see word-level timings:

{
  "words": [
    {
      "word": "hello",
      "start": 0.52,
      "end": 0.90,
      "case": "success"
    },
    …
  ]
}
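
A minimal Python sketch for inspecting the saved JSON (field names as shown above):

import json

with open("result.json") as f:   # or aligned.json from the CLI example
    result = json.load(f)

for w in result["words"]:
    if w["case"] == "success":
        print(f'{w["word"]}: {w["start"]:.2f}s to {w["end"]:.2f}s')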

You now have a running Gentle server and a basic alignment workflow. Proceed to the API Reference or Advanced Usage for fine-tuning.

Usage Guide

This guide covers all supported interfaces for Gentle: command-line, REST API (synchronous and asynchronous), web UI, interactive viewer, and Python API.

align.py Command-Line Interface

Force-align a transcript to audio and produce JSON timing data.

Essential Options

  • audiofile (positional): input audio (any FFmpeg-supported format)
  • txtfile (positional): UTF-8 plain-text transcript
  • -o, --output : write JSON to <file> (default: stdout)
  • --nthreads : number of alignment threads (default: CPU count)
  • --conservative: skip low-confidence matches
  • --disfluency: include filler words (uh, um)
  • --log : logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Examples

Run basic alignment and save to aligned.json

python align.py speech.wav transcript.txt -o aligned.json

Use 4 threads with debug logging

python align.py speech.wav transcript.txt --nthreads 4 --log DEBUG

Enable conservative matching and disfluencies

python align.py speech.wav transcript.txt -o out.json --conservative --disfluency

JSON Output

{
  "words": [
    {"word":"Hello","start":0.12,"end":0.45,"case":"success"},
    …
  ],
  "transcript":"Hello world…"
}
  • start, end: timestamps in seconds
  • case: success, not-found-in-audio, etc.
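
As an example, a short script to flag the words the aligner could not place, based on the case field:

import json

with open("aligned.json") as f:
    data = json.load(f)

missed = [w["word"] for w in data["words"] if w["case"] != "success"]
print(f"{len(missed)} word(s) not aligned:", missed)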

Tips

  • Increase --nthreads for long audio.
  • Use --conservative with high-quality transcripts.
  • Use --disfluency when filler timings matter.

Synchronous Transcription via cURL

Invoke /transcriptions in blocking mode to get alignment JSON immediately.

Request

curl -X POST http://localhost:8765/transcriptions \
  -F 'audio=@/path/to/file.wav' \
  -F 'transcript=</path/to/transcript.txt' \
  -F 'async=false' \
  -F 'disfluency=true' \
  -F 'conservative=true'

Response

  • HTTP 200 with full alignment JSON
  • Errors appear in JSON error field

Tips

  • For long files, use async=true and poll status.
  • Download CSV, HTML viewer or ZIP via /transcriptions/<uid>/align.csv, /transcriptions/<uid>/index.html, /zip/<uid>.zip.

Asynchronous Transcription & Polling REST API

Submit a job, poll its status, then fetch results.

1. Submit Job

Leaving the transcript field empty requests full, unconstrained transcription:

curl -X POST http://localhost:8765/transcriptions \
  -F 'audio=@/path/to/file.mp3' \
  -F 'transcript=' \
  -F 'async=true' \
  -o submit.json

Response (submit.json):

{"uid":"abcd1234","status":"Queued"}

2. Poll Status

while true; do
  status=$(curl -s http://localhost:8765/transcriptions/abcd1234/status.json | jq -r .status)
  echo "$status"
  [ "$status" = "OK" ] && break
  sleep 5
done

Status becomes "OK" when done.
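
The same loop sketched in Python with the requests library (endpoints as above):

import time
import requests

BASE = "http://localhost:8765"
uid = "abcd1234"  # from submit.json

while True:
    status = requests.get(f"{BASE}/transcriptions/{uid}/status.json").json()
    print(status)
    if status.get("status") == "OK":
        break
    time.sleep(5)

result = requests.get(f"{BASE}/transcriptions/{uid}/align.json").json()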

3. Download Results

curl -O http://localhost:8765/transcriptions/abcd1234/align.json
curl -O http://localhost:8765/transcriptions/abcd1234/align.csv
curl -O http://localhost:8765/zip/abcd1234.zip

Web Interface Upload and Alignment

Use the browser UI at http://localhost:8765/ for quick alignment.

  1. Open http://localhost:8765/ in your browser.
  2. Select audio and transcript files.
  3. Toggle Conservative and Disfluency as needed.
  4. Click Submit.
  5. Monitor progress; click the returned link to open the interactive viewer or download ZIP.

Interactive Alignment Viewer

Open view_alignment.html (or /transcriptions/<uid>/index.html) to play audio and see live word/phoneme highlighting.

Key JavaScript Snippets

Render transcript spans with timing:

function render(words, transcript) {
  const $trans = document.getElementById("transcript");
  let offset = 0;
  words.forEach(w => {
    // add text between words
    $trans.appendChild(document.createTextNode(
      transcript.slice(offset, w.startOffset)
    ));
    // word span
    const span = document.createElement("span");
    span.textContent = transcript.slice(w.startOffset, w.endOffset);
    span.dataset.start = w.start;
    span.onclick = () => {
      audio.currentTime = w.start;
      audio.play();
    };
    w.$span = span;
    $trans.appendChild(span);
    offset = w.endOffset;
  });
}

Highlight active word and phoneme:

let current = null;  // currently highlighted word

function syncPlayback() {
  const t = audio.currentTime;
  let active;
  words.forEach(w => {
    if (t >= w.start && t < w.end) active = w;
  });
  if (active && current !== active) {
    document.querySelectorAll(".active").forEach(el => el.classList.remove("active"));
    active.$span.classList.add("active");
    renderPhonemes(active);
    current = active;
  }
  highlightPhoneme(t);
  requestAnimationFrame(syncPlayback);
}
requestAnimationFrame(syncPlayback);

Tips

  • Serve align.json and view_alignment.html from the same directory.
  • Customize .active, .phone, .phactive in CSS for your theme.

Python API: ForcedAligner.transcribe

Programmatically align transcripts in Python.

1. Setup Resources

from gentle import Resources
resources = Resources(lang_dir="/path/to/langdir")

2. Instantiate ForcedAligner

from gentle import ForcedAligner
transcript = "the quick brown fox jumps over the lazy dog"
aligner = ForcedAligner(
    resources,
    transcript,
    nthreads=4,
    acoustic_scale=0.1,
    beam=10.0,
    retry_beam=40.0
)

3. Run Alignment

import logging
logger = logging.getLogger("gentle")

def progress_cb(state):
    print("Progress:", state)

transcription = aligner.transcribe(
    wavfile="audio.wav",
    progress_cb=progress_cb,
    logging=logger
)

Result Attributes

  • transcription.transcript: normalized transcript string
  • transcription.words: list of Word objects with:
    • word, start, end, case
    • phones: phone-level timings
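
A short sketch walking a result down to phone level (attribute names as listed above; phone entries are assumed to carry phone and duration keys, as in the JSON output):

for w in transcription.words:
    print(w.word, w.case, w.start, w.end)
    for p in (w.phones or []):
        print("   ", p.get("phone"), p.get("duration"))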

Advanced Configuration

aligner = ForcedAligner(
  resources, transcript,
  insertion_penalty=1.5,
  deletion_penalty=1.0,
  substitution_penalty=2.0
)

Tune penalties or beam widths to balance speed and accuracy.

Core Concepts & Architecture

Gentle separates speech recognition into layered components, from native Kaldi binaries to high-level Python orchestration. This section outlines the main modules, their interactions, performance considerations and extension points.

1. Native Decoding & Graph Construction (ext/)

  • k3.cc
    • Performs online decoding: feature extraction, neural‐network scoring, beam search, endpoint detection.
    • Exposes commands: push-chunk, reset, get-final.
    • Real-time focus: low-latency chunk processing and partial results.
  • m3.cc
    • Builds the HCLG decoding graph via FST composition (context, lexicon, LM), determinization, minimization and self-loops.
    • Reads Kaldi models and grammar FST; writes optimized FST.
  • Makefile
    • Centralizes Kaldi include/lib paths, CXXFLAGS, CUDA options.
    • Auto-generates build rules for k3, m3 and additional .cc tools.
    • Extension point: drop new <tool>.cc, add name to BINFILES.

2. Resource Management (gentle/resources.py)

  • Resources
    • Initializes paths to acoustic models, HCLG graph, lexicon, vocabulary.
    • Loads symbol tables (words.txt, phones.txt).
    • Use once per process and share across threads.

Example:

from gentle.resources import Resources
res = Resources(model_root="models/kaldi")
print(res.hclg_path, res.word_table.get("hello"))

3. RPC & Streaming Interface (gentle/rpc.py & standard_kaldi.py)

  • RPCProtocol
    • Frames mixed text/binary requests over pipes to k3.
    • do(method, *args, body=bytes) returns (body, status) or raises RPCError.
  • StandardKaldi
    • Wraps RPCProtocol with high-level methods (push_chunk(), finalize()) and parses the JSON-like word/phone streams returned by k3.

Example:

from subprocess import Popen, PIPE
from gentle.rpc import RPCProtocol
from gentle.util.paths import get_binary

exe = get_binary("ext/k3")
proc = Popen([exe, res.nnet_dir, res.hclg_path], stdin=PIPE, stdout=PIPE)
rpc  = RPCProtocol(proc.stdin, proc.stdout)

# Stream raw audio
with open("audio.raw","rb") as f:
    rpc.do("push-chunk", "utt1", body=f.read())
body, _ = rpc.do("get-final")

4. Concurrency & Model Pooling (gentle/kaldi_queue.py)

  • KaldiQueue
    • Maintains a thread-safe pool of standard_kaldi instances.
    • Pre-loads multiple ASR pipelines for parallel decoding.
  • Usage
    from gentle.kaldi_queue import KaldiQueue
    queue = KaldiQueue(resources=res, size=4)
    worker = queue.get()
    result = worker.transcribe_chunk(audio_bytes)
    queue.put(worker)
    

5. High-Level Transcription API (gentle/full_transcriber.py)

  • FullTranscriber
    • Coordinates KaldiQueue, handles WAV I/O, chunking, finalization.
    • Returns a Transcription object with word timings and phonemes.
  • Sample
    from gentle.full_transcriber import FullTranscriber
    ft = FullTranscriber(resources=res, n_threads=4)
    transcription = ft.transcribe("interview.wav")
    print(transcription.to_json())
    

6. Alignment Pipeline

  1. MetaSentence (gentle/metasentence.py)
    • Normalizes and tokenizes transcript, maintains mapping to original offsets.
  2. Diff‐Based Alignment (gentle/diff_align.py)
    • Aligns Kaldi’s raw words to transcript tokens via difflib.
  3. ForcedAligner (gentle/forced_aligner.py)
    • Runs two‐pass alignment: coarse transcription + word‐level forced align.
    • Applies AdjacencyOptimizer to fix boundary mismatches.
  4. Multipass Realignment (gentle/multipass.py)
    • Detects low-confidence segments, re-aligns in parallel for accuracy.
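
A speculative sketch of how the first two stages fit together (module names come from the list above; exact signatures vary between versions):

from gentle import Resources, metasentence, diff_align

resources = Resources()
transcript = "the quick brown fox"

# Stage 1: normalize and tokenize, keeping offsets into the original text
ms = metasentence.MetaSentence(transcript, resources.vocab)
tokens = ms.get_kaldi_sequence()

# Stage 2: pair raw decoder output with transcript tokens via difflib;
# `raw_words` would come from the coarse transcription pass (stage 3)
# aligned_words = diff_align.align(raw_words, ms)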

7. Language Model Utilities (gentle/language_model.py)

  • make_bigram_lm_fst
    • Builds OpenFST text-format bigram LM from token sequences.
    • Supports conservative OOV arcs and optional disfluency tokens.
  • make_bigram_language_model
    • Wraps Kaldi’s mkgraph to produce binary G.fst.
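
A hedged sketch of building a transcript-specific graph (the proto_langdir attribute and the exact signature are assumptions based on the description above):

from gentle import Resources, metasentence, language_model

resources = Resources()
ms = metasentence.MetaSentence("hello world", resources.vocab)

# Compile a bigram decoding graph biased toward this transcript
hclg_path = language_model.make_bigram_language_model(
    ms.get_kaldi_sequence(),
    resources.proto_langdir,  # assumed attribute holding the proto language dir
)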

8. Text & Data Structures (gentle/transcription.py)

  • Word
    • Encapsulates word, start, duration, case, phones, text offsets.
  • Transcription
    • Collections of Word objects.
    • Serializes to JSON/CSV and supports trimming or concatenation.

Performance & Extension Points

  • Tweak queue size in KaldiQueue for CPU vs GPU throughput.
  • Add new alignment strategies by subclassing ForcedAligner or injecting passes into multipass.prepare_multipass.
  • Customize normalization by extending MetaSentence.kaldi_normalize.
  • Replace bigram LM with higher-order models by modifying language_model.
  • Optimize native tools via ext/Makefile flags (e.g. turn on CUDA or change -O levels).
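
For instance, a minimal illustrative subclass that pads each word's boundaries after alignment; only transcribe() from the documented API is assumed:

from gentle import ForcedAligner

class PaddedAligner(ForcedAligner):
    """Adds a small margin around each successfully aligned word."""
    def transcribe(self, wavfile, progress_cb=None, logging=None):
        tx = super().transcribe(wavfile, progress_cb=progress_cb, logging=logging)
        for w in tx.words:
            if w.case == "success":
                w.start = max(0.0, w.start - 0.05)
                w.end += 0.05
        return tx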

Deployment & Configuration

This section covers containerized deployment, continuous integration, pretrained model setup, path resolution utilities, and server configuration. Follow these steps to run Gentle reliably in production or research pipelines and to customise performance and behaviour.

Containerized Deployment with Docker

Build and run Gentle as a Docker container to ensure consistency across environments.

Building the Docker Image

From your project root:

docker build -t lowerquality/gentle:latest .

Running the Container

docker run -d \
  --name gentle \
  -p 8765:8765 \
  -v /path/to/transcriptions:/gentle/webdata/transcriptions \
  -e GENTLE_DATA=/gentle/kaldi-models-0.04 \
  lowerquality/gentle:latest

Options:

  • -p 8765:8765 exposes the web server.
  • -v …:/gentle/webdata/transcriptions persists job data.
  • -e GENTLE_DATA points to model files inside the container.

Customising Resources and Ports

To adjust CPU/GPU usage or memory limits, append Docker flags:

docker run -d \
  --cpus="4" \
  --memory="8g" \
  -p 8765:8765 \
  lowerquality/gentle:latest

Continuous Integration with Travis CI

Automate builds, tests, and Docker image publishing via .travis.yml.

sudo: required
language: generic

services:
  - docker

install:
  - docker build -t lowerquality/gentle .

script:
  - docker run --rm lowerquality/gentle \
      sh -c 'cd /gentle && python3 setup.py test'

after_success:
  - if [ "$TRAVIS_BRANCH" == "master" ]; then
      docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD";
      docker push lowerquality/gentle:latest;
    fi

Steps:

  1. Build the image in install.
  2. Test inside a fresh container in script.
  3. Push on master in after_success (requires DOCKER_USERNAME and DOCKER_PASSWORD in Travis settings).

For version tags:

  - if [ "$TRAVIS_TAG" != "" ]; then
      docker tag lowerquality/gentle:latest lowerquality/gentle:$TRAVIS_TAG;
      docker push lowerquality/gentle:$TRAVIS_TAG;
    fi

Pretrained Model Setup

Automate model download and unpacking with install_models.sh.

# At project root
bash install_models.sh

This script:

  • Downloads kaldi-models-0.04.zip
  • Unzips into ./kaldi-models-0.04/
  • Removes the ZIP archive

Set the environment variable so Gentle can locate models:

export GENTLE_DATA=/path/to/gentle/kaldi-models-0.04

Add this line to ~/.bashrc or ~/.zshrc for persistence.

Path Resolution Utilities

Use gentle/util/paths.py to locate binaries, resources, and data directories consistently.

from gentle.util.paths import get_binary, get_resource, get_datadir

# Locate FFmpeg binary (bundled or system)
ffmpeg_path = get_binary("ffmpeg")

# Locate an HTML template shipped with Gentle
template_path = get_resource("web/index.html")

# Locate the root data directory (models, graphs)
data_dir = get_datadir()

Functions:

  • get_binary(name): Returns full path to a command.
  • get_resource(relative_path): Returns path inside gentle/resources/.
  • get_datadir(): Returns model root directory (controlled by GENTLE_DATA).

Server Configuration and Performance Tuning

Run the Gentle web server directly or via Docker. You can tweak behaviour via environment variables and flags.

Running Locally

# Ensure GENTLE_DATA is set
python3 serve.py \
  --port 8765 \
  --threads 4 \
  --timeout 300

Available options:

  • --port: Web server port (default: 8765)
  • --threads: Number of worker threads for parallel transcription
  • --timeout: Max processing time (seconds) before aborting a job

Run in the background with logging:

nohup python3 serve.py > gentle.log 2>&1 &

Custom Flags for Behaviour

  • --disfluency: Keep filler words (uh, um) in alignments.
  • --conservative: Enforce stricter alignment (fewer insertions).
  • --cleanup: Remove intermediate files upon completion.

Monitoring and Persistence

  • Transcription jobs and logs reside under webdata/transcriptions/<job-id>/.
  • Use a mounted volume (-v /host/path:/gentle/webdata) to preserve data across restarts.
  • Monitor CPU and memory with container metrics or system tools to tune --threads and Docker resource limits.

With these steps, you can deploy Gentle in a reproducible, scalable manner and customise its behaviour to fit production or research workflows.

Development Guide

This guide walks you through setting up your development environment, building the C++ Kaldi extension, running tests, and applying coding standards.

1. Environment Setup

Initialize the repository, install dependencies, and build external components with a single command:

# Clone and initialize submodules
git clone https://github.com/lowerquality/gentle.git
cd gentle
git submodule update --init --recursive

# Bootstrap everything: system deps, Kaldi, Python package
bash install.sh

install.sh performs:

  • install_deps.sh for system packages and python3 setup.py develop
  • ext/install_kaldi.sh to build Kaldi (tools + src) with static linking
  • make in ext and core components

2. Dependency Installation (install_deps.sh)

Installs OS-specific packages and sets up the Python package in “develop” mode.

sudo ./install_deps.sh

Key behaviors:

  • Aborts on error (set -e)
  • On Debian/Ubuntu: installs zlib, automake, libtool, Subversion, ATLAS, Python dev tools, wget, unzip, git, ffmpeg
  • On macOS: uses Homebrew to install ffmpeg, libtool, automake, autoconf, wget, python3
  • Registers the local package for editable installs (python3 setup.py develop)

Practical tip: Ensure universe is enabled on Ubuntu for ffmpeg, and that python3 points to your desired interpreter.

3. Building the Kaldi Extension (ext/Makefile + ext/install_kaldi.sh)

a. Prepare Kaldi

# Under ext/kaldi
cd ext/kaldi/tools
make clean && make
cd ../src
./configure --static --static-math=yes --static-fst=yes --use-cuda=no
make depend && make -j$(nproc)

Or automate:

bash ext/install_kaldi.sh

b. Configure ext/Makefile

  • KALDI_BASE: path to Kaldi root
    KALDI_BASE = ext/kaldi/src/
    
  • CXXFLAGS / EXTRA_CXXFLAGS: include paths, optimizations
  • ADDLIBS: list of static Kaldi libraries in link order
  • CUDA toggle: pass CUDA=true to inject GPU flags

c. Build Binaries

cd ext
make all          # builds k3, m3 statically
make CUDA=true all  # include CUDA if supported

To add a new tool:

  1. Append its name to BINFILES in ext/Makefile
  2. Create ext/<tool>.cc; default rules will compile and link it.

4. Running Tests

gentle uses Python’s unittest framework. Run all tests after setup:

# From repo root
python3 -m unittest discover -v tests

Or run specific suites:

python3 -m unittest tests.base
python3 -m unittest tests.transcriber

  • tests/base.py checks core imports and resource paths.
  • tests/transcriber.py verifies forced alignment using embedded models.

Practical tip: ensure get_binary(...) from gentle.util.paths resolves correctly before running transcriber tests.

5. Coding Standards (pylintrc)

Enforce consistent linting with the provided pylintrc:

pip install pylint
pylint --rcfile=pylintrc gentle serve.py align.py

Key setting:

[MESSAGES CONTROL]
disable=locally-disabled

– suppresses the informational message pylint emits for in-line # pylint: disable comments, keeping lint reports focused.

CI Integration (GitHub Actions example):

- name: Lint with Pylint
  run: |
    pip install pylint
    pylint --rcfile=pylintrc gentle serve.py align.py

Practical tip: for one-off rules, update pylintrc centrally instead of scattering disable comments.

Python API Reference

This section documents Gentle’s public Python classes and functions for programmatic integration, covering resource management, alignment, transcription, data models, and audio resampling.

Resources: Loading ASR Models and Lexicons

Provide acoustic models, language models and decoding graphs for Kaldi-based processing.

class gentle.Resources

Constructor

Resources(
    lang_dir: str = None,
    acoustic_model: str = None,
    graph: str = None,
    lexicon: str = None
)

Parameters

  • lang_dir (str, optional): Path to model/ directory containing conf/, data/, etc.
  • acoustic_model (str, optional): Path to neural network model (final.mdl).
  • graph (str, optional): Path to decoding graph (HCLG.fst).
  • lexicon (str, optional): Path to pronunciation lexicon (lexicon.txt).

Attributes

  • self.lang_dir, self.acoustic_model, self.graph, self.lexicon
  • self.word_symbols (dict): maps word strings to symbol IDs.
  • self.vocab (list): vocabulary from lexicon.

Example

from gentle import Resources

# Load default bundled models (downloads on first use)
resources = Resources()

# Or point to custom model paths
resources = Resources(
    lang_dir="/models/english/lang",
    acoustic_model="/models/english/final.mdl",
    graph="/models/english/HCLG.fst",
    lexicon="/models/english/lexicon.txt"
)

ForcedAligner: Word-Level Alignment

Align a known transcript to audio with multi-pass refinement and ambiguity handling.

class gentle.ForcedAligner

Constructor

ForcedAligner(
    resources: Resources,
    transcript: str,
    nthreads: int = 4,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • transcript: Full transcript text to align.
  • nthreads: Number of Kaldi threads.
  • **kwargs: Passed to language model and diff alignment (e.g., acoustic_scale, beam).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None,
    logging: ModuleType = None
) -> gentle.transcription.Transcription

Example

from gentle import Resources, ForcedAligner
import logging

resources = Resources()
text = "This is a sample transcript to align."
aligner = ForcedAligner(resources, text, nthreads=2, acoustic_scale=1.0)

def progress_cb(update):
    print(update)  # {'percent': float, 'message': str}

logging.getLogger().setLevel(logging.INFO)

transcription = aligner.transcribe(
    "input.wav",
    progress_cb=progress_cb,
    logging=logging
)
print(transcription.to_json())

FullTranscriber: End-to-End Speech-to-Text

Perform full unconstrained transcription on WAV files using Kaldi models.

class gentle.FullTranscriber

Constructor

FullTranscriber(
    resources: Resources,
    nthreads: int = 4,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • nthreads: Number of parallel decoding threads.
  • **kwargs: Options for ASR decoding (e.g., beam, max_active).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None
) -> gentle.transcription.Transcription

Example

from gentle import Resources, FullTranscriber

resources = Resources()
transcriber = FullTranscriber(resources, nthreads=4, beam=15.0)

# Process full transcription
transcription = transcriber.transcribe(
    "long_recording.wav",
    progress_cb=lambda u: print(f"{u['percent']*100:.1f}%")
)
print(transcription.to_csv())

Transcriber: Chunked Multi-Threaded Processing

Efficiently transcribe long audio by splitting into overlapping chunks.

class gentle.Transcriber

Constructor

Transcriber(
    resources: Resources,
    nthreads: int = None,
    chunk_len: float = 30.0,
    overlap: float = 1.0,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • nthreads: Number of worker threads (defaults to CPU count).
  • chunk_len: Duration (s) of each segment.
  • overlap: Overlap (s) between segments.
  • **kwargs: Decoding options (e.g., max_active, beam).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None
) -> List[gentle.transcription.Word]

Example

from gentle import Resources, Transcriber

resources = Resources()
tx = Transcriber(resources, nthreads=8, chunk_len=20.0, overlap=0.5)

words = tx.transcribe("meeting.wav")
for w in words:
    print(f"{w.start:.2f}-{w.end:.2f}: {w.word}")

Transcription Data Models

Represent and serialize transcription results.

class gentle.transcription.Word

Fields

  • word (str): Original token.
  • alignedWord (str): Normalized form.
  • start, end (float): Timing in seconds.
  • startOffset, endOffset (int): Character offsets.
  • case (str): 'success' or 'not-found-in-audio'.
  • phones (List[dict]): Phone-level alignments.

class gentle.transcription.Transcription

Constructor

Transcription(
    transcript: str,
    words: List[Word]
)

Methods

  • to_json() -> str
  • to_csv() -> str
  • adjust(offset: float): Shift all timings by offset seconds.

Example

from gentle.transcription import Transcription, Word

# Build from aligned words (fields as documented above)
words = [
    Word(word="Hello", alignedWord="hello", start=0.0, end=0.5,
         startOffset=0, endOffset=5, case="success", phones=[]),
]
tx = Transcription("Hello world", words)

print(tx.to_json())
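
adjust is useful when the audio was clipped before alignment, e.g. a segment extracted 30 s into the original recording:

# Shift all word timings so they refer to the full, unclipped recording
tx.adjust(30.0)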

Audio Resampling Utility

Convert media to single-channel, 16-bit, 8 kHz WAV for ASR.

resample(infile, outfile, offset=None, duration=None)

Converts infile to a new outfile (the input is not modified). Returns exit code (0 on success).
Raises IOError if infile is missing, RuntimeError on conversion failure.

Parameters

  • infile (str)
  • outfile (str)
  • offset (float, optional)
  • duration (float, optional)

Example

from gentle.resample import resample

# Full conversion
resample("episode.mp3", "episode_8k.wav")

# Clip 10s from 30s mark
resample("episode.mp3", "clip.wav", offset=30.0, duration=10.0)

resampled(infile, offset=None, duration=None)

Context manager yielding a temp WAV path; auto-deletes on exit.
Raises RuntimeError on failure.

Example

from gentle.resample import resampled
from gentle import ForcedAligner, Resources

resources = Resources()
with resampled("video.mov", duration=15.0) as wav:
    aligner = ForcedAligner(resources, "Transcript text")
    tx = aligner.transcribe(wav)
    print(tx.to_csv())