Project Overview

Gentle is a flexible, Kaldi-based forced-aligner that synchronizes spoken audio with its transcript at the word level. It eliminates manual timestamping in subtitle creation, linguistic research, podcast editing and any workflow that needs precise speech-text alignment. Gentle runs entirely locally, supports large audio files, and exposes multiple interfaces for integration.

Key Features

  • Word-level timestamps with per-word alignment status and phoneme timings
  • No external services—runs offline on Mac, Linux or Windows (via Docker)
  • Modular design for custom Kaldi models and pipelines

Usage Interfaces

1. Web UI

A user-friendly browser interface for one-off alignments.

  • Defaults to http://localhost:8765 after installation
  • Drag-and-drop audio and transcript files
  • Visualize alignments with waveform and word highlights

2. REST API

Automate batch processing or integrate into services.

Endpoint
POST http://localhost:8765/transcriptions?async=[true|false]

Required multipart/form-data fields:

  • audio: MP3, WAV, etc.
  • transcript: Plain-text transcript

Example (synchronous)

curl -F "audio=@lecture.mp3" \
     -F "transcript=@script.txt" \
     "http://localhost:8765/transcriptions?async=false"

Response JSON includes an array of words with start/end timestamps and a per-word alignment status (case).
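
The same request can be sketched in Python with the requests library (a minimal example, assuming a Gentle server is listening on localhost:8765):

import requests

# Sketch: synchronous alignment via the REST API
# (assumes a Gentle server is running on localhost:8765)
with open("lecture.mp3", "rb") as audio, open("script.txt", "rb") as transcript:
    resp = requests.post(
        "http://localhost:8765/transcriptions",
        params={"async": "false"},
        files={"audio": audio, "transcript": transcript},
    )
resp.raise_for_status()
for w in resp.json()["words"]:
    print(w["word"], w.get("start"), w.get("end"))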

3. Command-Line Interface

Quick alignment from your shell or in scripts.

Basic usage

python3 align.py path/to/audio.wav path/to/transcript.txt

Common options

  • -o, --output <file>  write JSON to a file instead of stdout
  • --conservative  use stricter alignment (fewer spurious matches)
  • --disfluency  include filler words (uh, um)

Run python3 align.py --help for all flags.

4. Python Integration

Embed Gentle in custom Python workflows.

Install from source (python3 setup.py develop; see Getting Started), then:

from gentle import Resources, ForcedAligner
from gentle.resample import resampled

# Initialize models and metadata (create once, reuse across alignments)
resources = Resources()

# Read the transcript text
with open('podcast.txt') as f:
    transcript = f.read()

# Align audio and transcript (resampled converts MP3 to the WAV Kaldi expects)
aligner = ForcedAligner(resources, transcript)
with resampled('podcast.mp3') as wav:
    alignment = aligner.transcribe(wav)

# Iterate aligned words
for w in alignment.words:
    if w.case == 'success':
        print(f"{w.alignedWord}: {w.start:.2f}s-{w.end:.2f}s")

Use this API to build batch jobs, real-time analysis tools or custom GUIs.

Getting Started

This guide shows how to launch a Gentle instance in under 10 minutes—either via Docker or a native install—download the models, and perform your first alignment.

1. Docker Quickstart

  1. Clone the repo and build the image

    git clone https://github.com/lowerquality/gentle.git
    cd gentle
    docker build -t gentle:latest .
    
  2. Run the container (CPU)

    docker run -d \
      --name gentle \
      -p 8765:8765 \
      -v $(pwd)/webdata:/gentle/webdata \
      gentle:latest
    
  3. Verify the service

    • Open http://localhost:8765 in your browser
    • Or test the REST API:
      curl -F "audio=@path/to/audio.wav" \
           -F "transcript=@path/to/transcript.txt" \
           "http://localhost:8765/transcriptions?async=false"
      

2. Native Install (Linux/macOS)

Prerequisites

  • Python 3
  • ffmpeg, git, build tools (zlib1g-dev, automake, autoconf, libtool, subversion, wget, unzip)
  • Optionally: pipenv

Steps

  1. Clone and enter repo

    git clone https://github.com/lowerquality/gentle.git
    cd gentle
    
  2. Install OS dependencies and link Python package

    ./install_deps.sh   # detects Debian/Ubuntu (apt) vs macOS (Homebrew)
    
  3. Set up the project (submodules, Kaldi build, Python dev install)

    ./install.sh
    
  4. Verify installation

    python3 -c "import gentle"    # should import without error
    python3 serve.py              # then open http://localhost:8765
    

3. Downloading Models

Gentle requires Kaldi models (version 0.04) for alignment. From the project root:

./install_models.sh

This unpacks models into exp/ (native) or /gentle/models (Docker image). To customize the path:

export GENTLE_DATA=/path/to/exp

4. First Alignment

Command-Line

Align speech.wav to transcript.txt:

python3 align.py \
  --nthreads 4 \
  --disfluency \
  -o aligned.json \
  speech.wav \
  transcript.txt

REST API

Submit an HTTP POST:

curl -X POST \
  -F 'audio=@speech.wav' \
  -F 'transcript=@transcript.txt' \
  'http://localhost:8765/transcriptions?async=false' \
  -o result.json

Inspecting Results

Open the JSON to see word-level timings:

{
  "words": [
    {
      "word": "hello",
      "start": 0.52,
      "end": 0.90,
      "case": "success"
    },
    …
  ]
}
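
A minimal Python sketch for inspecting the saved JSON (field names as shown above):

import json

with open("result.json") as f:   # or aligned.json from the CLI example
    result = json.load(f)

for w in result["words"]:
    if w["case"] == "success":
        print(f'{w["word"]}: {w["start"]:.2f}s to {w["end"]:.2f}s')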

You now have a running Gentle server and a basic alignment workflow. Proceed to the API Reference or Advanced Usage for fine-tuning.

Usage Guide

This guide covers all supported interfaces for Gentle: command-line, REST API (synchronous and asynchronous), web UI, interactive viewer, and Python API.

align.py Command-Line Interface

Force-align a transcript to audio and produce JSON timing data.

Essential Options

  • audiofile (positional): input audio (any FFmpeg-supported format)
  • txtfile (positional): UTF-8 plain-text transcript
  • -o, --output : write JSON to <file> (default: stdout)
  • --nthreads : number of alignment threads (default: CPU count)
  • --conservative: skip low-confidence matches
  • --disfluency: include filler words (uh, um)
  • --log : logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Examples

Run basic alignment and save to aligned.json

python align.py speech.wav transcript.txt -o aligned.json

Use 4 threads with debug logging

python align.py speech.wav transcript.txt --nthreads 4 --log DEBUG

Enable conservative matching and disfluencies

python align.py speech.wav transcript.txt -o out.json --conservative --disfluency

JSON Output

{
  "words": [
    {"word":"Hello","start":0.12,"end":0.45,"case":"success"},
    …
  ],
  "transcript":"Hello world…"
}
  • start, end: timestamps in seconds
  • case: success, not-found-in-audio, etc.
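
As an example, a short script to flag the words the aligner could not place, based on the case field:

import json

with open("aligned.json") as f:
    data = json.load(f)

missed = [w["word"] for w in data["words"] if w["case"] != "success"]
print(f"{len(missed)} word(s) not aligned:", missed)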

Tips

  • Increase --nthreads for long audio.
  • Use --conservative with high-quality transcripts.
  • Use --disfluency when filler timings matter.

Synchronous Transcription via cURL

Invoke /transcriptions in blocking mode to get alignment JSON immediately.

Request

curl -X POST http://localhost:8765/transcriptions \
  -F 'audio=@/path/to/file.wav' \
  -F 'transcript=</path/to/transcript.txt' \
  -F 'async=false' \
  -F 'disfluency=true' \
  -F 'conservative=true'

Response

  • HTTP 200 with full alignment JSON
  • Errors appear in JSON error field

Tips

  • For long files, use async=true and poll status.
  • Download CSV, HTML viewer or ZIP via /transcriptions/<uid>/align.csv, /transcriptions/<uid>/index.html, /zip/<uid>.zip.

Asynchronous Transcription & Polling REST API

Submit a job, poll its status, then fetch results.

1. Submit Job

Leaving the transcript field empty requests full, unconstrained transcription:

curl -X POST http://localhost:8765/transcriptions \
  -F 'audio=@/path/to/file.mp3' \
  -F 'transcript=' \
  -F 'async=true' \
  -o submit.json

Response (submit.json):

{"uid":"abcd1234","status":"Queued"}

2. Poll Status

while true; do
  status=$(curl -s http://localhost:8765/transcriptions/abcd1234/status.json | jq -r .status)
  echo "$status"
  [ "$status" = "OK" ] && break
  sleep 5
done

Status becomes "OK" when done.
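
The same loop sketched in Python with the requests library (endpoints as above):

import time
import requests

BASE = "http://localhost:8765"
uid = "abcd1234"  # from submit.json

while True:
    status = requests.get(f"{BASE}/transcriptions/{uid}/status.json").json()
    print(status)
    if status.get("status") == "OK":
        break
    time.sleep(5)

result = requests.get(f"{BASE}/transcriptions/{uid}/align.json").json()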

3. Download Results

curl -O http://localhost:8765/transcriptions/abcd1234/align.json
curl -O http://localhost:8765/transcriptions/abcd1234/align.csv
curl -O http://localhost:8765/zip/abcd1234.zip

Web Interface Upload and Alignment

Use the browser UI at http://localhost:8765/ for quick alignment.

  1. Open http://localhost:8765/ in your browser.
  2. Select audio and transcript files.
  3. Toggle Conservative and Disfluency as needed.
  4. Click Submit.
  5. Monitor progress; click the returned link to open the interactive viewer or download ZIP.

Interactive Alignment Viewer

Open view_alignment.html (or /transcriptions/<uid>/index.html) to play audio and see live word/phoneme highlighting.

Key JavaScript Snippets

Render transcript spans with timing:

function render(words, transcript) {
  const $trans = document.getElementById("transcript");
  let offset = 0;
  words.forEach(w => {
    // add text between words
    $trans.appendChild(document.createTextNode(
      transcript.slice(offset, w.startOffset)
    ));
    // word span
    const span = document.createElement("span");
    span.textContent = transcript.slice(w.startOffset, w.endOffset);
    span.dataset.start = w.start;
    span.onclick = () => {
      audio.currentTime = w.start;
      audio.play();
    };
    w.$span = span;
    $trans.appendChild(span);
    offset = w.endOffset;
  });
}

Highlight active word and phoneme:

let current = null;  // currently highlighted word

function syncPlayback() {
  const t = audio.currentTime;
  let active;
  words.forEach(w => {
    if (t >= w.start && t < w.end) active = w;
  });
  if (active && current !== active) {
    document.querySelectorAll(".active").forEach(el => el.classList.remove("active"));
    active.$span.classList.add("active");
    renderPhonemes(active);
    current = active;
  }
  highlightPhoneme(t);
  requestAnimationFrame(syncPlayback);
}
requestAnimationFrame(syncPlayback);

Tips

  • Serve align.json and view_alignment.html from the same directory.
  • Customize .active, .phone, .phactive in CSS for your theme.

Python API: ForcedAligner.transcribe

Programmatically align transcripts in Python.

1. Setup Resources

from gentle import Resources
resources = Resources(lang_dir="/path/to/langdir")

2. Instantiate ForcedAligner

from gentle import ForcedAligner
transcript = "the quick brown fox jumps over the lazy dog"
aligner = ForcedAligner(
    resources,
    transcript,
    nthreads=4,
    acoustic_scale=0.1,
    beam=10.0,
    retry_beam=40.0
)

3. Run Alignment

import logging
logger = logging.getLogger("gentle")

def progress_cb(state):
    print("Progress:", state)

transcription = aligner.transcribe(
    wavfile="audio.wav",
    progress_cb=progress_cb,
    logging=logger
)

Result Attributes

  • transcription.transcript: normalized transcript string
  • transcription.words: list of Word objects with:
    • word, start, end, case
    • phones: phone-level timings
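
A short sketch walking a result down to phone level (attribute names as listed above; phone entries are assumed to carry phone and duration keys, as in the JSON output):

for w in transcription.words:
    print(w.word, w.case, w.start, w.end)
    for p in (w.phones or []):
        print("   ", p.get("phone"), p.get("duration"))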

Advanced Configuration

aligner = ForcedAligner(
  resources, transcript,
  insertion_penalty=1.5,
  deletion_penalty=1.0,
  substitution_penalty=2.0
)

Tune penalties or beam widths to balance speed and accuracy.

Core Concepts & Architecture

Gentle separates speech recognition into layered components, from native Kaldi binaries to high-level Python orchestration. This section outlines the main modules, their interactions, performance considerations and extension points.

1. Native Decoding & Graph Construction (ext/)

  • k3.cc
    • Performs online decoding: feature extraction, neural‐network scoring, beam search, endpoint detection.
    • Exposes commands: push-chunk, reset, get-final.
    • Real-time focus: low-latency chunk processing and partial results.
  • m3.cc
    • Builds the HCLG decoding graph via FST composition (context, lexicon, LM), determinization, minimization and self-loops.
    • Reads Kaldi models and grammar FST; writes optimized FST.
  • Makefile
    • Centralizes Kaldi include/lib paths, CXXFLAGS, CUDA options.
    • Auto-generates build rules for k3, m3 and additional .cc tools.
    • Extension point: drop new <tool>.cc, add name to BINFILES.

2. Resource Management (gentle/resources.py)

  • Resources
    • Initializes paths to acoustic models, HCLG graph, lexicon, vocabulary.
    • Loads symbol tables (words.txt, phones.txt).
    • Use once per process and share across threads.

Example:

from gentle.resources import Resources
res = Resources(model_root="models/kaldi")
print(res.hclg_path, res.word_table.get("hello"))

3. RPC & Streaming Interface (gentle/rpc.py & standard_kaldi.py)

  • RPCProtocol
    • Frames mixed text/binary requests over pipes to k3.
    • do(method, *args, body=bytes) returns (body, status) or raises RPCError.
  • StandardKaldi
    • Wraps RPCProtocol with high-level methods (push_chunk(), finalize()) and parses the JSON-like word/phone streams returned by k3.

Example:

from subprocess import Popen, PIPE
from gentle.rpc import RPCProtocol
from gentle.util.paths import get_binary

exe = get_binary("ext/k3")
proc = Popen([exe, res.nnet_dir, res.hclg_path], stdin=PIPE, stdout=PIPE)
rpc  = RPCProtocol(proc.stdin, proc.stdout)

# Stream raw audio
with open("audio.raw","rb") as f:
    rpc.do("push-chunk", "utt1", body=f.read())
body, _ = rpc.do("get-final")

4. Concurrency & Model Pooling (gentle/kaldi_queue.py)

  • KaldiQueue
    • Maintains a thread-safe pool of standard_kaldi instances.
    • Pre-loads multiple ASR pipelines for parallel decoding.
  • Usage
    from gentle.kaldi_queue import KaldiQueue
    queue = KaldiQueue(resources=res, size=4)
    worker = queue.get()
    result = worker.transcribe_chunk(audio_bytes)
    queue.put(worker)
    

5. High-Level Transcription API (gentle/full_transcriber.py)

  • FullTranscriber
    • Coordinates KaldiQueue, handles WAV I/O, chunking, finalization.
    • Returns a Transcription object with word timings and phonemes.
  • Sample
    from gentle.full_transcriber import FullTranscriber
    ft = FullTranscriber(resources=res, n_threads=4)
    transcription = ft.transcribe("interview.wav")
    print(transcription.to_json())
    

6. Alignment Pipeline

  1. MetaSentence (gentle/metasentence.py)
    • Normalizes and tokenizes transcript, maintains mapping to original offsets.
  2. Diff‐Based Alignment (gentle/diff_align.py)
    • Aligns Kaldi’s raw words to transcript tokens via difflib.
  3. ForcedAligner (gentle/forced_aligner.py)
    • Runs two‐pass alignment: coarse transcription + word‐level forced align.
    • Applies AdjacencyOptimizer to fix boundary mismatches.
  4. Multipass Realignment (gentle/multipass.py)
    • Detects low-confidence segments, re-aligns in parallel for accuracy.
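
A speculative sketch of how the first two stages fit together (module names come from the list above; exact signatures vary between versions):

from gentle import Resources, metasentence, diff_align

resources = Resources()
transcript = "the quick brown fox"

# Stage 1: normalize and tokenize, keeping offsets into the original text
ms = metasentence.MetaSentence(transcript, resources.vocab)
tokens = ms.get_kaldi_sequence()

# Stage 2: pair raw decoder output with transcript tokens via difflib;
# `raw_words` would come from the coarse transcription pass (stage 3)
# aligned_words = diff_align.align(raw_words, ms)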

7. Language Model Utilities (gentle/language_model.py)

  • make_bigram_lm_fst
    • Builds OpenFST text-format bigram LM from token sequences.
    • Supports conservative OOV arcs and optional disfluency tokens.
  • make_bigram_language_model
    • Wraps Kaldi’s mkgraph to produce binary G.fst.
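
A hedged sketch of building a transcript-specific graph (the proto_langdir attribute and the exact signature are assumptions based on the description above):

from gentle import Resources, metasentence, language_model

resources = Resources()
ms = metasentence.MetaSentence("hello world", resources.vocab)

# Compile a bigram decoding graph biased toward this transcript
hclg_path = language_model.make_bigram_language_model(
    ms.get_kaldi_sequence(),
    resources.proto_langdir,  # assumed attribute holding the proto language dir
)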

8. Text & Data Structures (gentle/transcription.py)

  • Word
    • Encapsulates word, start, duration, case, phones, text offsets.
  • Transcription
    • Collections of Word objects.
    • Serializes to JSON/CSV and supports trimming or concatenation.

Performance & Extension Points

  • Tweak queue size in KaldiQueue for CPU vs GPU throughput.
  • Add new alignment strategies by subclassing ForcedAligner or injecting passes into multipass.prepare_multipass.
  • Customize normalization by extending MetaSentence.kaldi_normalize.
  • Replace bigram LM with higher-order models by modifying language_model.
  • Optimize native tools via ext/Makefile flags (e.g. turn on CUDA or change -O levels).
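
For instance, a minimal illustrative subclass that pads each word's boundaries after alignment; only transcribe() from the documented API is assumed:

from gentle import ForcedAligner

class PaddedAligner(ForcedAligner):
    """Adds a small margin around each successfully aligned word."""
    def transcribe(self, wavfile, progress_cb=None, logging=None):
        tx = super().transcribe(wavfile, progress_cb=progress_cb, logging=logging)
        for w in tx.words:
            if w.case == "success":
                w.start = max(0.0, w.start - 0.05)
                w.end += 0.05
        return tx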

Deployment & Configuration

This section covers containerized deployment, continuous integration, pretrained model setup, path resolution utilities, and server configuration. Follow these steps to run Gentle reliably in production or research pipelines and to customise performance and behaviour.

Containerized Deployment with Docker

Build and run Gentle as a Docker container to ensure consistency across environments.

Building the Docker Image

From your project root:

docker build -t lowerquality/gentle:latest .

Running the Container

docker run -d \
  --name gentle \
  -p 8765:8765 \
  -v /path/to/transcriptions:/gentle/webdata/transcriptions \
  -e GENTLE_DATA=/gentle/kaldi-models-0.04 \
  lowerquality/gentle:latest

Options:

  • -p 8765:8765 exposes the web server.
  • -v …:/gentle/webdata/transcriptions persists job data.
  • -e GENTLE_DATA points to model files inside the container.

Customising Resources and Ports

To adjust CPU/GPU usage or memory limits, append Docker flags:

docker run -d \
  --cpus="4" \
  --memory="8g" \
  -p 8765:8765 \
  lowerquality/gentle:latest

Continuous Integration with Travis CI

Automate builds, tests, and Docker image publishing via .travis.yml.

sudo: required
language: generic

services:
  - docker

install:
  - docker build -t lowerquality/gentle .

script:
  - docker run --rm lowerquality/gentle \
      sh -c 'cd /gentle && python3 setup.py test'

after_success:
  - if [ "$TRAVIS_BRANCH" == "master" ]; then
      docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD";
      docker push lowerquality/gentle:latest;
    fi

Steps:

  1. Build the image in install.
  2. Test inside a fresh container in script.
  3. Push on master in after_success (requires DOCKER_USERNAME and DOCKER_PASSWORD in Travis settings).

For version tags:

  - if [ "$TRAVIS_TAG" != "" ]; then
      docker tag lowerquality/gentle:latest lowerquality/gentle:$TRAVIS_TAG;
      docker push lowerquality/gentle:$TRAVIS_TAG;
    fi

Pretrained Model Setup

Automate model download and unpacking with install_models.sh.

# At project root
bash install_models.sh

This script:

  • Downloads kaldi-models-0.04.zip
  • Unzips into ./kaldi-models-0.04/
  • Removes the ZIP archive

Set the environment variable so Gentle can locate models:

export GENTLE_DATA=/path/to/gentle/kaldi-models-0.04

Add this line to ~/.bashrc or ~/.zshrc for persistence.

Path Resolution Utilities

Use gentle/util/paths.py to locate binaries, resources, and data directories consistently.

from gentle.util.paths import get_binary, get_resource, get_datadir

# Locate FFmpeg binary (bundled or system)
ffmpeg_path = get_binary("ffmpeg")

# Locate an HTML template shipped with Gentle
template_path = get_resource("web/index.html")

# Locate the root data directory (models, graphs)
data_dir = get_datadir()

Functions:

  • get_binary(name): Returns full path to a command.
  • get_resource(relative_path): Returns path inside gentle/resources/.
  • get_datadir(): Returns model root directory (controlled by GENTLE_DATA).

Server Configuration and Performance Tuning

Run the Gentle web server directly or via Docker. You can tweak behaviour via environment variables and flags.

Running Locally

# Ensure GENTLE_DATA is set
python3 serve.py \
  --port 8765 \
  --threads 4 \
  --timeout 300

Available options:

  • --port: Web server port (default: 8765)
  • --threads: Number of worker threads for parallel transcription
  • --timeout: Max processing time (seconds) before aborting a job

Run in the background with logging:

nohup python3 serve.py > gentle.log 2>&1 &

Custom Flags for Behaviour

  • --disfluency: Keep filler words (uh, um) in alignments.
  • --conservative: Enforce stricter alignment (fewer insertions).
  • --cleanup: Remove intermediate files upon completion.

Monitoring and Persistence

  • Transcription jobs and logs reside under webdata/transcriptions/<job-id>/.
  • Use a mounted volume (-v /host/path:/gentle/webdata) to preserve data across restarts.
  • Monitor CPU and memory with container metrics or system tools to tune --threads and Docker resource limits.

With these steps, you can deploy Gentle in a reproducible, scalable manner and customise its behaviour to fit production or research workflows.

Development Guide

This guide walks you through setting up your development environment, building the C++ Kaldi extension, running tests, and applying coding standards.

1. Environment Setup

Initialize the repository, install dependencies, and build external components with a single command:

# Clone and initialize submodules
git clone https://github.com/lowerquality/gentle.git
cd gentle
git submodule update --init --recursive

# Bootstrap everything: system deps, Kaldi, Python package
bash install.sh

install.sh performs:

  • install_deps.sh for system packages and python3 setup.py develop
  • ext/install_kaldi.sh to build Kaldi (tools + src) with static linking
  • make in ext and core components

2. Dependency Installation (install_deps.sh)

Installs OS-specific packages and sets up the Python package in “develop” mode.

sudo ./install_deps.sh

Key behaviors:

  • Aborts on error (set -e)
  • On Debian/Ubuntu: installs zlib, automake, libtool, Subversion, ATLAS, Python dev tools, wget, unzip, git, ffmpeg
  • On macOS: uses Homebrew to install ffmpeg, libtool, automake, autoconf, wget, python3
  • Registers the local package for editable installs (python3 setup.py develop)

Practical tip: Ensure universe is enabled on Ubuntu for ffmpeg, and that python3 points to your desired interpreter.

3. Building the Kaldi Extension (ext/Makefile + ext/install_kaldi.sh)

a. Prepare Kaldi

# Under ext/kaldi
cd ext/kaldi/tools
make clean && make
cd ../src
./configure --static --static-math=yes --static-fst=yes --use-cuda=no
make depend && make -j$(nproc)

Or automate:

bash ext/install_kaldi.sh

b. Configure ext/Makefile

  • KALDI_BASE: path to Kaldi root
    KALDI_BASE = ext/kaldi/src/
    
  • CXXFLAGS / EXTRA_CXXFLAGS: include paths, optimizations
  • ADDLIBS: list of static Kaldi libraries in link order
  • CUDA toggle: pass CUDA=true to inject GPU flags

c. Build Binaries

cd ext
make all          # builds k3, m3 statically
make CUDA=true all  # include CUDA if supported

To add a new tool:

  1. Append its name to BINFILES in ext/Makefile
  2. Create ext/<tool>.cc; default rules will compile and link it.

4. Running Tests

gentle uses Python’s unittest framework. Run all tests after setup:

# From repo root
python3 -m unittest discover -v tests

Or run specific suites:

python3 -m unittest tests.base
python3 -m unittest tests.transcriber

  • tests/base.py checks core imports and resource paths.
  • tests/transcriber.py verifies forced alignment using embedded models.

Practical tip: ensure get_binary(...) from gentle.util.paths resolves correctly before running transcriber tests.

5. Coding Standards (pylintrc)

Enforce consistent linting with the provided pylintrc:

pip install pylint
pylint --rcfile=pylintrc gentle serve.py align.py

Key setting:

[MESSAGES CONTROL]
disable=locally-disabled

– suppresses the informational message pylint emits for in-line # pylint: disable comments, keeping lint reports focused.

CI Integration (GitHub Actions example):

- name: Lint with Pylint
  run: |
    pip install pylint
    pylint --rcfile=pylintrc gentle serve.py align.py

Practical tip: for one-off rules, update pylintrc centrally instead of scattering disable comments.

Python API Reference

This section documents Gentle’s public Python classes and functions for programmatic integration, covering resource management, alignment, transcription, data models, and audio resampling.

Resources: Loading ASR Models and Lexicons

Provide acoustic models, language models and decoding graphs for Kaldi-based processing.

class gentle.Resources

Constructor

Resources(
    lang_dir: str = None,
    acoustic_model: str = None,
    graph: str = None,
    lexicon: str = None
)

Parameters

  • lang_dir (str, optional): Path to model/ directory containing conf/, data/, etc.
  • acoustic_model (str, optional): Path to neural network model (final.mdl).
  • graph (str, optional): Path to decoding graph (HCLG.fst).
  • lexicon (str, optional): Path to pronunciation lexicon (lexicon.txt).

Attributes

  • self.lang_dir, self.acoustic_model, self.graph, self.lexicon
  • self.word_symbols (dict): maps word strings to symbol IDs.
  • self.vocab (list): vocabulary from lexicon.

Example

from gentle import Resources

# Load default bundled models (downloads on first use)
resources = Resources()

# Or point to custom model paths
resources = Resources(
    lang_dir="/models/english/lang",
    acoustic_model="/models/english/final.mdl",
    graph="/models/english/HCLG.fst",
    lexicon="/models/english/lexicon.txt"
)

ForcedAligner: Word-Level Alignment

Align a known transcript to audio with multi-pass refinement and ambiguity handling.

class gentle.ForcedAligner

Constructor

ForcedAligner(
    resources: Resources,
    transcript: str,
    nthreads: int = 4,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • transcript: Full transcript text to align.
  • nthreads: Number of Kaldi threads.
  • **kwargs: Passed to language model and diff alignment (e.g., acoustic_scale, beam).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None,
    logging: ModuleType = None
) -> gentle.transcription.Transcription

Example

from gentle import Resources, ForcedAligner
import logging

resources = Resources()
text = "This is a sample transcript to align."
aligner = ForcedAligner(resources, text, nthreads=2, acoustic_scale=1.0)

def progress_cb(update):
    print(update)  # {'percent': float, 'message': str}

logging.getLogger().setLevel(logging.INFO)

transcription = aligner.transcribe(
    "input.wav",
    progress_cb=progress_cb,
    logging=logging
)
print(transcription.to_json())

FullTranscriber: End-to-End Speech-to-Text

Perform full unconstrained transcription on WAV files using Kaldi models.

class gentle.FullTranscriber

Constructor

FullTranscriber(
    resources: Resources,
    nthreads: int = 4,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • nthreads: Number of parallel decoding threads.
  • **kwargs: Options for ASR decoding (e.g., beam, max_active).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None
) -> gentle.transcription.Transcription

Example

from gentle import Resources, FullTranscriber

resources = Resources()
transcriber = FullTranscriber(resources, nthreads=4, beam=15.0)

# Process full transcription
transcription = transcriber.transcribe(
    "long_recording.wav",
    progress_cb=lambda u: print(f"{u['percent']*100:.1f}%")
)
print(transcription.to_csv())

Transcriber: Chunked Multi-Threaded Processing

Efficiently transcribe long audio by splitting into overlapping chunks.

class gentle.Transcriber

Constructor

Transcriber(
    resources: Resources,
    nthreads: int = None,
    chunk_len: float = 30.0,
    overlap: float = 1.0,
    **kwargs
)

Parameters

  • resources: gentle.Resources instance.
  • nthreads: Number of worker threads (defaults to CPU count).
  • chunk_len: Duration (s) of each segment.
  • overlap: Overlap (s) between segments.
  • **kwargs: Decoding options (e.g., max_active, beam).

Method

transcribe(
    wavfile: str,
    progress_cb: Callable = None
) -> List[gentle.transcription.Word]

Example

from gentle import Resources, Transcriber

resources = Resources()
tx = Transcriber(resources, nthreads=8, chunk_len=20.0, overlap=0.5)

words = tx.transcribe("meeting.wav")
for w in words:
    print(f"{w.start:.2f}-{w.end:.2f}: {w.word}")

Transcription Data Models

Represent and serialize transcription results.

class gentle.transcription.Word

Fields

  • word (str): Original token.
  • alignedWord (str): Normalized form.
  • start, end (float): Timing in seconds.
  • startOffset, endOffset (int): Character offsets.
  • case (str): 'success' or 'not-found-in-audio'.
  • phones (List[dict]): Phone-level alignments.

class gentle.transcription.Transcription

Constructor

Transcription(
    transcript: str,
    words: List[Word]
)

Methods

  • to_json() -> str
  • to_csv() -> str
  • adjust(offset: float): Shift all timings by offset seconds.

Example

from gentle.transcription import Transcription, Word

# Build from aligned words (fields as documented above)
words = [
    Word(word="Hello", alignedWord="hello", start=0.0, end=0.5,
         startOffset=0, endOffset=5, case="success", phones=[]),
]
tx = Transcription("Hello world", words)

print(tx.to_json())
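
adjust is useful when the audio was clipped before alignment, e.g. a segment extracted 30 s into the original recording:

# Shift all word timings so they refer to the full, unclipped recording
tx.adjust(30.0)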

Audio Resampling Utility

Convert media to single-channel, 16-bit, 8 kHz WAV for ASR.

resample(infile, outfile, offset=None, duration=None)

Converts infile to a new outfile (the input is not modified). Returns exit code (0 on success).
Raises IOError if infile is missing, RuntimeError on conversion failure.

Parameters

  • infile (str)
  • outfile (str)
  • offset (float, optional)
  • duration (float, optional)

Example

from gentle.resample import resample

# Full conversion
resample("episode.mp3", "episode_8k.wav")

# Clip 10s from 30s mark
resample("episode.mp3", "clip.wav", offset=30.0, duration=10.0)

resampled(infile, offset=None, duration=None)

Context manager yielding a temp WAV path; auto-deletes on exit.
Raises RuntimeError on failure.

Example

from gentle.resample import resampled
from gentle import ForcedAligner, Resources

resources = Resources()
with resampled("video.mov", duration=15.0) as wav:
    aligner = ForcedAligner(resources, "Transcript text")
    tx = aligner.transcribe(wav)
    print(tx.to_csv())