Project Overview
Gentle is a flexible, Kaldi-based forced-aligner that synchronizes spoken audio with its transcript at the word level. It eliminates manual timestamping in subtitle creation, linguistic research, podcast editing and any workflow that needs precise speech-text alignment. Gentle runs entirely locally, supports large audio files, and exposes multiple interfaces for integration.
Key Features
- Word-level timestamps with confidence scores
- No external services—runs offline on Mac, Linux or Windows (via Docker)
- Modular design for custom Kaldi models and pipelines
Usage Interfaces
1. Web UI
Align files through a user-friendly browser interface.
- Defaults to http://localhost:8765 after installation
- Drag-and-drop audio and transcript files
- Visualize alignments with waveform and word highlights
2. REST API
Automate batch processing or integrate into services.
Endpoint
POST http://localhost:8765/transcriptions?async=[true|false]
Required multipart/form-data fields:
- audio: MP3, WAV, etc.
- transcript: Plain-text transcript
Example (synchronous)
curl -F "audio=@lecture.mp3" \
-F "transcript=@script.txt" \
"http://localhost:8765/transcriptions?async=false"
Response JSON includes an array of words with start/end timestamps and confidence.
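The same synchronous call can be scripted from Python. Below is a minimal sketch using the third-party `requests` package against a locally running Gentle server; `align_sync` and `word_timings` are hypothetical convenience helpers, not part of Gentle's API:

```python
def word_timings(alignment):
    """Extract (word, start, end) tuples from a Gentle alignment dict,
    skipping words that were not found in the audio."""
    return [(w["word"], w["start"], w["end"])
            for w in alignment.get("words", [])
            if w.get("case") == "success"]

def align_sync(audio_path, transcript_path, host="http://localhost:8765"):
    """POST audio and transcript to a running Gentle server in blocking
    mode and return the parsed alignment JSON."""
    import requests  # third-party; deferred so word_timings works without it
    with open(audio_path, "rb") as audio, open(transcript_path, "rb") as txt:
        resp = requests.post(
            host + "/transcriptions",
            params={"async": "false"},
            files={"audio": audio, "transcript": txt},
        )
    resp.raise_for_status()
    return resp.json()
```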
3. Command-Line Interface
Quick alignment from your shell or in scripts.
Basic usage
python3 align.py path/to/audio.wav path/to/transcript.txt
Common options
- --json: output full JSON to stdout
- --conservative: stricter alignment (skips low-confidence matches)
- --quiet: suppress progress logs
Run python3 align.py --help for all flags.
4. Python Integration
Embed Gentle in custom Python workflows.
Install from source (or with pip install gentle), then:
from gentle import Resources, ForcedAligner
# Initialize models and metadata (shared across alignments)
resources = Resources()
# Align audio and transcript
with open('podcast.txt') as f:
    transcript = f.read()
aligner = ForcedAligner(resources, transcript)
# transcribe() expects a WAV file; see the resample utility for other formats
alignment = aligner.transcribe('podcast.wav')
# Iterate aligned words
for w in alignment.words:
    print(f"{w.alignedWord}: {w.start:.2f}s–{w.end:.2f}s ({w.case})")
Use this API to build batch jobs, real-time analysis tools or custom GUIs.
Getting Started
This guide shows how to launch a Gentle instance in under 10 minutes—either via Docker or a native install—download the models, and perform your first alignment.
1. Docker Quickstart
Clone the repo and build the image
git clone https://github.com/lowerquality/gentle.git
cd gentle
docker build -t gentle:latest .
Run the container (CPU)
docker run -d \
  --name gentle \
  -p 8765:8765 \
  -v $(pwd)/webdata:/gentle/webdata \
  gentle:latest
Verify the service
- Open http://localhost:8765 in your browser
- Or test the REST API:
curl -F "audio=@path/to/audio.wav" \
     -F "transcript=@path/to/transcript.txt" \
     "http://localhost:8765/transcriptions?async=false"
2. Native Install (Linux/macOS)
Prerequisites
- Python 3.10
- ffmpeg, git, build tools (zlib1g-dev, automake, autoconf, libtool, subversion, wget, unzip)
- Optionally: pipenv
Steps
Clone and enter repo
git clone https://github.com/lowerquality/gentle.git
cd gentle
Install OS dependencies and link Python package
./install_deps.sh   # same script on Linux and macOS
Set up the project (submodules, Kaldi build, Python dev install)
./install.sh
Verify installation
which serve.py   # should point to your checkout
python3 -c "import gentle; print(gentle.__version__)"
3. Downloading Models
Gentle requires Kaldi models (version 0.04) for alignment. From the project root:
./install_models.sh
This unpacks models into exp/ (native) or /gentle/models (Docker image). To customize the path:
export GENTLE_RESOURCES=/path/to/exp
4. First Alignment
Command-Line
Align speech.wav to transcript.txt:
python3 align.py \
--nthreads 4 \
--disfluency \
-o aligned.json \
speech.wav \
transcript.txt
REST API
Submit an HTTP POST:
curl -X POST \
-F 'audio=@speech.wav' \
-F 'transcript=@transcript.txt' \
'http://localhost:8765/transcriptions?async=false' \
-o result.json
Inspecting Results
Open the JSON to see word-level timings:
{
"words": [
{
"word": "hello",
"start": 0.52,
"end": 0.90,
"case": "success"
},
…
]
}
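As a sketch of post-processing, the JSON can be turned into SRT-style subtitle lines. The field names follow the example above; the helpers `to_srt_time` and `srt_entries` are illustrative conveniences, not Gentle APIs:

```python
import json

def to_srt_time(seconds):
    """Format seconds as an SRT timestamp, e.g. 0.52 -> '00:00:00,520'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_entries(alignment):
    """Yield (index, start, end, word) for every successfully aligned word."""
    idx = 0
    for w in alignment.get("words", []):
        if w.get("case") != "success":
            continue  # skip words Gentle could not locate in the audio
        idx += 1
        yield idx, to_srt_time(w["start"]), to_srt_time(w["end"]), w["word"]

if __name__ == "__main__":
    with open("aligned.json") as f:
        for idx, start, end, word in srt_entries(json.load(f)):
            print(f"{idx}\n{start} --> {end}\n{word}\n")
```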
You now have a running Gentle server and a basic alignment workflow. Proceed to the API Reference or Advanced Usage for fine-tuning.
Usage Guide
This guide covers all supported interfaces for Gentle: command-line, REST API (synchronous and asynchronous), web UI, interactive viewer, and Python API.
align.py Command-Line Interface
Force-align a transcript to audio and produce JSON timing data.
Essential Options
- audiofile (positional): input audio (any FFmpeg-supported format)
- txtfile (positional): UTF-8 plain-text transcript
- -o, --output <file>: write JSON to <file> (default: stdout)
- --nthreads <n>: number of alignment threads (default: CPU count)
- --conservative: skip low-confidence matches
- --disfluency: include filler words (uh, um)
- --log <level>: logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
Examples
Run basic alignment and save to aligned.json
python align.py speech.wav transcript.txt -o aligned.json
Use 4 threads with debug logging
python align.py speech.wav transcript.txt --nthreads 4 --log DEBUG
Enable conservative matching and disfluencies
python align.py speech.wav transcript.txt -o out.json --conservative --disfluency
JSON Output
{
"words": [
{"word":"Hello","start":0.12,"end":0.45,"case":"success"},
…
],
"transcript":"Hello world…"
}
- start, end: timestamps in seconds
- case: success, not-found-in-audio, etc.
Tips
- Increase --nthreads for long audio.
- Use --conservative with high-quality transcripts.
- Use --disfluency when filler timings matter.
Synchronous Transcription via cURL
Invoke /transcriptions in blocking mode to get alignment JSON immediately.
Request
curl -X POST http://localhost:8765/transcriptions \
-F 'audio=@/path/to/file.wav' \
  -F 'transcript=</path/to/transcript.txt' \
  -F 'async=false' \
  -F 'disfluency=true' \
  -F 'conservative=true'
Response
- HTTP 200 with full alignment JSON
- Errors appear in the JSON error field
Tips
- For long files, use async=true and poll status.
- Download CSV, the HTML viewer, or a ZIP via /transcriptions/<uid>/align.csv, /transcriptions/<uid>/index.html, /zip/<uid>.zip.
Asynchronous Transcription & Polling REST API
Submit a job, poll its status, then fetch results.
1. Submit Job
curl -X POST http://localhost:8765/transcriptions \
-F 'audio=@/path/to/file.mp3' \
-F 'transcript=' \
-F 'async=true' \
-o submit.json
Response (submit.json):
{"uid":"abcd1234","status":"Queued"}
2. Poll Status
while true; do
  status=$(curl -s http://localhost:8765/transcriptions/abcd1234/status.json | jq -r .status)
  echo "$status"
  [ "$status" = "OK" ] && break
  sleep 5
done
The status becomes "OK" when the job is done.
3. Download Results
curl -O http://localhost:8765/transcriptions/abcd1234/align.json
curl -O http://localhost:8765/transcriptions/abcd1234/align.csv
curl -O http://localhost:8765/zip/abcd1234.zip
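The submit/poll/download cycle is easy to script. A standard-library sketch follows; `wait_for_job` and `fetch_status_http` are hypothetical helpers, and the "OK" status value follows the polling step above:

```python
import json
import time
import urllib.request

def wait_for_job(fetch_status, interval=5.0, max_polls=120):
    """Poll a status-fetching callable until the job reports 'OK'.
    `fetch_status` returns a dict like {"status": "Queued"}; raises
    TimeoutError if the job does not finish within max_polls polls."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") == "OK":
            return status
        time.sleep(interval)
    raise TimeoutError("Gentle job did not finish in time")

def fetch_status_http(uid, host="http://localhost:8765"):
    """GET the status.json for one transcription job."""
    url = f"{host}/transcriptions/{uid}/status.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    uid = "abcd1234"  # from the submit step above
    wait_for_job(lambda: fetch_status_http(uid))
    urllib.request.urlretrieve(
        f"http://localhost:8765/transcriptions/{uid}/align.json",
        "align.json")
```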
Web Interface Upload and Alignment
Use the browser UI at http://localhost:8765/ for quick alignment.
- Open index.html.
- Select audio and transcript files.
- Toggle Conservative and Disfluency as needed.
- Click Submit.
- Monitor progress; click the returned link to open the interactive viewer or download a ZIP.
Interactive Alignment Viewer
Open view_alignment.html (or /transcriptions/<uid>/index.html) to play audio and see live word/phoneme highlighting.
Key JavaScript Snippets
Render transcript spans with timing:
function render(words, transcript) {
const $trans = document.getElementById("transcript");
let offset = 0;
words.forEach(w => {
// add text between words
$trans.appendChild(document.createTextNode(
transcript.slice(offset, w.startOffset)
));
// word span
const span = document.createElement("span");
span.textContent = transcript.slice(w.startOffset, w.endOffset);
span.dataset.start = w.start;
span.onclick = () => {
audio.currentTime = w.start;
audio.play();
};
w.$span = span;
$trans.appendChild(span);
offset = w.endOffset;
});
}
Highlight active word and phoneme:
function syncPlayback() {
const t = audio.currentTime;
let active;
words.forEach(w => {
if (t >= w.start && t < w.end) active = w;
});
if (active && current !== active) {
document.querySelectorAll(".active").forEach(el => el.classList.remove("active"));
active.$span.classList.add("active");
renderPhonemes(active);
current = active;
}
highlightPhoneme(t);
requestAnimationFrame(syncPlayback);
}
requestAnimationFrame(syncPlayback);
Tips
- Serve align.json and view_alignment.html from the same directory.
- Customize .active, .phone, and .phactive in CSS for your theme.
Python API: ForcedAligner.transcribe
Programmatically align transcripts in Python.
1. Setup Resources
from gentle import Resources
resources = Resources(lang_dir="/path/to/langdir")
2. Instantiate ForcedAligner
from gentle import ForcedAligner
transcript = "the quick brown fox jumps over the lazy dog"
aligner = ForcedAligner(
resources,
transcript,
nthreads=4,
acoustic_scale=0.1,
beam=10.0,
retry_beam=40.0
)
3. Run Alignment
import logging
logger = logging.getLogger("gentle")
def progress_cb(state):
print("Progress:", state)
transcription = aligner.transcribe(
wavfile="audio.wav",
progress_cb=progress_cb,
logging=logger
)
Result Attributes
- transcription.transcript: normalized transcript string
- transcription.words: list of Word objects with word, start, end, case
- word.phones: phone-level timings
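To make the phone-level timings concrete, here is a sketch that flattens a transcription serialized to JSON into per-phone rows. The phone dict keys `phone` and `duration` are assumptions to verify against your own output; `phone_rows` is a hypothetical helper:

```python
def phone_rows(alignment):
    """Flatten word and phone timings from an alignment dict into rows of
    (word, phone, phone_start, phone_end). Phone entries carry durations,
    so absolute times are accumulated from each word's start time."""
    rows = []
    for w in alignment.get("words", []):
        if w.get("case") != "success":
            continue  # no timing information for unaligned words
        t = w["start"]
        for p in w.get("phones", []):
            rows.append((w["word"], p["phone"], t, t + p["duration"]))
            t += p["duration"]
    return rows
```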
Advanced Configuration
aligner = ForcedAligner(
resources, transcript,
insertion_penalty=1.5,
deletion_penalty=1.0,
substitution_penalty=2.0
)
Tune penalties or beam widths to balance speed and accuracy.
Core Concepts & Architecture
Gentle separates speech recognition into layered components, from native Kaldi binaries to high-level Python orchestration. This section outlines the main modules, their interactions, performance considerations, and extension points.
1. Native Decoding & Graph Construction (ext/)
- k3.cc
  • Performs online decoding: feature extraction, neural-network scoring, beam search, endpoint detection.
  • Exposes commands: push-chunk, reset, get-final.
  • Real-time focus: low-latency chunk processing and partial results.
- m3.cc
  • Builds the HCLG decoding graph via FST composition (context, lexicon, LM), determinization, minimization and self-loops.
  • Reads Kaldi models and a grammar FST; writes the optimized FST.
- Makefile
  • Centralizes Kaldi include/lib paths, CXXFLAGS, CUDA options.
  • Auto-generates build rules for k3, m3 and additional .cc tools.
  • Extension point: drop in a new <tool>.cc and add its name to BINFILES.
2. Resource Management (gentle/resources.py)
- Resources
  • Initializes paths to acoustic models, HCLG graph, lexicon, vocabulary.
  • Loads symbol tables (words.txt, phones.txt).
  • Use once per process and share across threads.
Example:
from gentle.resources import Resources
res = Resources(model_root="models/kaldi")
print(res.hclg_path, res.word_table.get("hello"))
3. RPC & Streaming Interface (gentle/rpc.py & standard_kaldi.py)
- RPCProtocol
  • Frames mixed text/binary requests over pipes to k3.
  • do(method, *args, body=bytes) → (body, status), or raises RPCError.
- StandardKaldi
  • Wraps RPCProtocol with high-level methods: push_chunk(), finalize(), and parsing of JSON-like word/phone streams.
Example:
from subprocess import Popen, PIPE
from gentle.rpc import RPCProtocol
from gentle.util.paths import get_binary
exe = get_binary("ext/k3")
proc = Popen([exe, res.nnet_dir, res.hclg_path], stdin=PIPE, stdout=PIPE)
rpc = RPCProtocol(proc.stdin, proc.stdout)
# Stream raw audio
with open("audio.raw","rb") as f:
rpc.do("push-chunk", "utt1", body=f.read())
body, _ = rpc.do("get-final")
4. Concurrency & Model Pooling (gentle/kaldi_queue.py)
- KaldiQueue
  • Maintains a thread-safe pool of standard_kaldi instances.
  • Pre-loads multiple ASR pipelines for parallel decoding.
- Usage
  from gentle.kaldi_queue import KaldiQueue
  queue = KaldiQueue(resources=res, size=4)
  worker = queue.get()
  result = worker.transcribe_chunk(audio_bytes)
  queue.put(worker)
5. High-Level Transcription API (gentle/full_transcriber.py)
- FullTranscriber
  • Coordinates KaldiQueue; handles WAV I/O, chunking, finalization.
  • Returns a Transcription object with word timings and phonemes.
- Sample
  from gentle.full_transcriber import FullTranscriber
  ft = FullTranscriber(resources=res, n_threads=4)
  transcription = ft.transcribe("interview.wav")
  print(transcription.to_json())
6. Alignment Pipeline
- MetaSentence (gentle/metasentence.py)
  • Normalizes and tokenizes the transcript, maintaining a mapping to original character offsets.
- Diff-Based Alignment (gentle/diff_align.py)
  • Aligns Kaldi's raw words to transcript tokens via difflib.
- ForcedAligner (gentle/forced_aligner.py)
  • Runs two-pass alignment: coarse transcription + word-level forced alignment.
  • Applies AdjacencyOptimizer to fix boundary mismatches.
- Multipass Realignment (gentle/multipass.py)
  • Detects low-confidence segments and re-aligns them in parallel for accuracy.
7. Language Model Utilities (gentle/language_model.py)
- make_bigram_lm_fst
  • Builds an OpenFST text-format bigram LM from token sequences.
  • Supports conservative OOV arcs and optional disfluency tokens.
- make_bigram_language_model
  • Wraps Kaldi's mkgraph to produce a binary G.fst.
8. Text & Data Structures (gentle/transcription.py)
- Word
  • Encapsulates word, start, duration, case, phones, and text offsets.
- Transcription
  • A collection of Word objects.
  • Serializes to JSON/CSV and supports trimming or concatenation.
Performance & Extension Points
- Tweak the queue size in KaldiQueue for CPU vs GPU throughput.
- Add new alignment strategies by subclassing ForcedAligner or injecting passes into multipass.prepare_multipass.
- Customize normalization by extending MetaSentence.kaldi_normalize.
- Replace the bigram LM with higher-order models by modifying language_model.
- Optimize native tools via ext/Makefile flags (e.g. enable CUDA or change -O levels).
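To illustrate the normalization extension point, here is a standalone sketch of the kind of token cleanup MetaSentence.kaldi_normalize performs (lowercasing, stripping characters absent from the lexicon, mapping out-of-vocabulary words to a placeholder). The exact rules and the OOV symbol in Gentle may differ; treat this as a model, not the implementation:

```python
import re

def kaldi_normalize_token(token, vocab=None):
    """Lowercase a token and strip characters a Kaldi lexicon will not
    contain; map out-of-vocabulary words to a placeholder symbol."""
    norm = re.sub(r"[^a-z0-9']", "", token.lower())
    if not norm:
        return None                     # token was pure punctuation
    if vocab is not None and norm not in vocab:
        return "[oov]"                  # placeholder; Gentle's symbol may differ
    return norm

def normalize(text, vocab=None):
    """Normalize a transcript into the token sequence handed to the aligner."""
    out = []
    for tok in text.split():
        norm = kaldi_normalize_token(tok, vocab)
        if norm is not None:
            out.append(norm)
    return out
```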
Deployment & Configuration
This section covers containerized deployment, continuous integration, pretrained model setup, path resolution utilities, and server configuration. Follow these steps to run Gentle reliably in production or research pipelines and to customise performance and behaviour.
Containerized Deployment with Docker
Build and run Gentle as a Docker container to ensure consistency across environments.
Building the Docker Image
From your project root:
docker build -t lowerquality/gentle:latest .
Running the Container
docker run -d \
--name gentle \
-p 8765:8765 \
-v /path/to/transcriptions:/gentle/webdata/transcriptions \
-e GENTLE_DATA=/gentle/kaldi-models-0.04 \
lowerquality/gentle:latest
Options:
- -p 8765:8765 exposes the web server.
- -v …:/gentle/webdata/transcriptions persists job data.
- -e GENTLE_DATA points to model files inside the container.
Customising Resources and Ports
To adjust CPU/GPU usage or memory limits, append Docker flags:
docker run -d \
--cpus="4" \
--memory="8g" \
-p 8765:8765 \
lowerquality/gentle:latest
Continuous Integration with Travis CI
Automate builds, tests, and Docker image publishing via .travis.yml.
sudo: required
language: generic
services:
- docker
install:
- docker build -t lowerquality/gentle .
script:
- docker run --rm lowerquality/gentle \
sh -c 'cd /gentle && python3 setup.py test'
after_success:
- if [ "$TRAVIS_BRANCH" == "master" ]; then
docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD";
docker push lowerquality/gentle:latest;
fi
Steps:
- Build the image in install.
- Test inside a fresh container in script.
- Push on master in after_success (requires DOCKER_USERNAME and DOCKER_PASSWORD in Travis settings).
For version tags:
- if [ "$TRAVIS_TAG" != "" ]; then
docker tag lowerquality/gentle:latest lowerquality/gentle:$TRAVIS_TAG;
docker push lowerquality/gentle:$TRAVIS_TAG;
fi
Pretrained Model Setup
Automate model download and unpacking with install_models.sh.
# At project root
bash install_models.sh
This script:
- Downloads kaldi-models-0.04.zip
- Unzips it into ./kaldi-models-0.04/
- Removes the ZIP archive
Set the environment variable so Gentle can locate models:
export GENTLE_DATA=/path/to/gentle/kaldi-models-0.04
Add this line to ~/.bashrc or ~/.zshrc for persistence.
Path Resolution Utilities
Use gentle/util/paths.py to locate binaries, resources, and data directories consistently.
from gentle.util.paths import get_binary, get_resource, get_datadir
# Locate FFmpeg binary (bundled or system)
ffmpeg_path = get_binary("ffmpeg")
# Locate an HTML template shipped with Gentle
template_path = get_resource("web/index.html")
# Locate the root data directory (models, graphs)
data_dir = get_datadir()
Functions:
- get_binary(name): returns the full path to a command.
- get_resource(relative_path): returns a path inside gentle/resources/.
- get_datadir(): returns the model root directory (controlled by GENTLE_DATA).
Server Configuration and Performance Tuning
Run the Gentle web server directly or via Docker. You can tweak behaviour via environment variables and flags.
Running Locally
# Ensure GENTLE_DATA is set
python3 serve.py \
--port 8765 \
--threads 4 \
--timeout 300
Available options:
- --port: web server port (default: 8765)
- --threads: number of worker threads for parallel transcription
- --timeout: max processing time (seconds) before aborting a job
Run in the background with logging:
nohup python3 serve.py > gentle.log 2>&1 &
Custom Flags for Behaviour
- --disfluency: keep filler words (uh, um) in alignments.
- --conservative: enforce stricter alignment (fewer insertions).
- --cleanup: remove intermediate files upon completion.
Monitoring and Persistence
- Transcription jobs and logs reside under webdata/transcriptions/<job-id>/.
- Use a mounted volume (-v /host/path:/gentle/webdata) to preserve data across restarts.
- Monitor CPU and memory with container metrics or system tools to tune --threads and Docker resource limits.
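For a quick health check, job state can be read straight from the filesystem. This sketch only assumes the directory layout described above, plus a status.json in each job directory (verify both against your deployment); `job_statuses` is a hypothetical helper:

```python
import json
import os

def job_statuses(root="webdata/transcriptions"):
    """Map each job id under `root` to the parsed contents of its
    status.json, or None if the file is absent or unreadable."""
    statuses = {}
    if not os.path.isdir(root):
        return statuses
    for uid in sorted(os.listdir(root)):
        status_path = os.path.join(root, uid, "status.json")
        try:
            with open(status_path) as f:
                statuses[uid] = json.load(f)
        except (OSError, ValueError):
            statuses[uid] = None  # job still queued, running, or corrupted
    return statuses
```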
With these steps, you can deploy Gentle in a reproducible, scalable manner and customise its behaviour to fit production or research workflows.
Development Guide
This guide walks you through setting up your development environment, building the C++ Kaldi extension, running tests, and applying coding standards.
1. Environment Setup
Initialize the repository, install dependencies, and build external components with a single command:
# Clone and initialize submodules
git clone https://github.com/lowerquality/gentle.git
cd gentle
git submodule update --init --recursive
# Bootstrap everything: system deps, Kaldi, Python package
bash install.sh
install.sh performs:
- install_deps.sh for system packages and python3 setup.py develop
- ext/install_kaldi.sh to build Kaldi (tools + src) with static linking
- make in ext and core components
2. Dependency Installation (install_deps.sh)
Installs OS-specific packages and sets up the Python package in “develop” mode.
sudo ./install_deps.sh
Key behaviors:
- Aborts on error (set -e)
- On Debian/Ubuntu: installs zlib, automake, libtool, Subversion, ATLAS, Python dev tools, wget, unzip, git, ffmpeg
- On macOS: uses Homebrew to install ffmpeg, libtool, automake, autoconf, wget, python3
- Registers the local package for editable installs (python3 setup.py develop)
Practical tip: Ensure the universe repository is enabled on Ubuntu for ffmpeg, and that python3 points to your desired interpreter.
3. Building the Kaldi Extension (ext/Makefile + ext/install_kaldi.sh)
a. Prepare Kaldi
# Under ext/kaldi
cd ext/kaldi/tools
make clean && make
cd ../src
./configure --static --static-math=yes --static-fst=yes --use-cuda=no
make depend && make -j$(nproc)
Or automate:
bash ext/install_kaldi.sh
b. Configure ext/Makefile
- KALDI_BASE: path to the Kaldi root
  KALDI_BASE = ext/kaldi/src/
- CXXFLAGS / EXTRA_CXXFLAGS: include paths, optimizations
- ADDLIBS: list of static Kaldi libraries in link order
- CUDA toggle: pass CUDA=true to inject GPU flags
c. Build Binaries
cd ext
make all # builds k3, m3 statically
make CUDA=true all # include CUDA if supported
To add a new tool:
- Append its name to BINFILES in ext/Makefile
- Create ext/<tool>.cc; the default rules will compile and link it.
4. Running Tests
Gentle uses Python's unittest framework. Run all tests after setup:
# From repo root
python3 -m unittest discover -v tests
Or run specific suites:
python3 -m unittest tests.base
python3 -m unittest tests.transcriber
- tests/base.py checks core imports and resource paths.
- tests/transcriber.py verifies forced alignment using embedded models.
Practical tip: ensure gentle.util.paths.get_binary(...) resolves correctly before running transcriber tests.
5. Coding Standards (pylintrc)
Enforce consistent linting with the provided pylintrc:
pip install pylint
pylint --rcfile=pylintrc gentle serve.py align.py
Key setting:
[MESSAGES CONTROL]
disable=locally-disabled
This suppresses pylint's informational reports about ad-hoc # pylint: disable comments; prefer central pylintrc updates over scattering such comments.
CI Integration (GitHub Actions example):
- name: Lint with Pylint
run: |
pip install pylint
pylint --rcfile=pylintrc gentle serve.py align.py
Practical tip: for one-off rules, update pylintrc centrally instead of scattering disable comments.
Python API Reference
This section documents Gentle’s public Python classes and functions for programmatic integration, covering resource management, alignment, transcription, data models, and audio resampling.
Resources: Loading ASR Models and Lexicons
Provide acoustic models, language models and decoding graphs for Kaldi-based processing.
class gentle.Resources
Constructor
Resources(
lang_dir: str = None,
acoustic_model: str = None,
graph: str = None,
lexicon: str = None
)
Parameters
- lang_dir (str, optional): path to the model/ directory containing conf/, data/, etc.
- acoustic_model (str, optional): path to the neural network model (final.mdl).
- graph (str, optional): path to the decoding graph (HCLG.fst).
- lexicon (str, optional): path to the pronunciation lexicon (lexicon.txt).
Attributes
- self.lang_dir, self.acoustic_model, self.graph, self.lexicon
- self.word_symbols (dict): maps word strings to symbol IDs.
- self.vocab (list): vocabulary from the lexicon.
Example
from gentle import Resources
# Load default bundled models (downloads on first use)
resources = Resources()
# Or point to custom model paths
resources = Resources(
lang_dir="/models/english/lang",
acoustic_model="/models/english/final.mdl",
graph="/models/english/HCLG.fst",
lexicon="/models/english/lexicon.txt"
)
ForcedAligner: Word-Level Alignment
Align a known transcript to audio with multi-pass refinement and ambiguity handling.
class gentle.ForcedAligner
Constructor
ForcedAligner(
resources: Resources,
transcript: str,
nthreads: int = 4,
**kwargs
)
Parameters
- resources: gentle.Resources instance.
- transcript: full transcript text to align.
- nthreads: number of Kaldi threads.
- **kwargs: passed to the language model and diff alignment (e.g., acoustic_scale, beam).
Method
transcribe(
wav_path: str,
progress_cb: Callable = None,
logging: ModuleType = None
) -> gentle.transcription.Transcription
Example
from gentle import Resources, ForcedAligner
import logging
resources = Resources()
text = "This is a sample transcript to align."
aligner = ForcedAligner(resources, text, nthreads=2, acoustic_scale=1.0)
def progress_cb(update):
print(update) # {'percent': float, 'message': str}
logging.getLogger().setLevel(logging.INFO)
transcription = aligner.transcribe(
"input.wav",
progress_cb=progress_cb,
logging=logging
)
print(transcription.to_json())
FullTranscriber: End-to-End Speech-to-Text
Perform full unconstrained transcription on WAV files using Kaldi models.
class gentle.FullTranscriber
Constructor
FullTranscriber(
resources: Resources,
nthreads: int = 4,
**kwargs
)
Parameters
- resources: gentle.Resources instance.
- nthreads: number of parallel decoding threads.
- **kwargs: options for ASR decoding (e.g., beam, max_active).
Method
transcribe(
wav_path: str,
progress_cb: Callable = None
) -> gentle.transcription.Transcription
Example
from gentle import Resources, FullTranscriber
resources = Resources()
transcriber = FullTranscriber(resources, nthreads=4, beam=15.0)
# Process full transcription
transcription = transcriber.transcribe(
"long_recording.wav",
progress_cb=lambda u: print(f"{u['percent']*100:.1f}%")
)
print(transcription.to_csv())
Transcriber: Chunked Multi-Threaded Processing
Efficiently transcribe long audio by splitting into overlapping chunks.
class gentle.Transcriber
Constructor
Transcriber(
resources: Resources,
nthreads: int = None,
chunk_len: float = 30.0,
overlap: float = 1.0,
**kwargs
)
Parameters
- resources: gentle.Resources instance.
- nthreads: number of worker threads (defaults to CPU count).
- chunk_len: duration (s) of each segment.
- overlap: overlap (s) between segments.
- **kwargs: decoding options (e.g., max_active, beam).
Method
transcribe(
wav_path: str,
progress_cb: Callable = None
) -> List[gentle.transcription.Word]
Example
from gentle import Resources, Transcriber
resources = Resources()
tx = Transcriber(resources, nthreads=8, chunk_len=20.0, overlap=0.5)
words = tx.transcribe("meeting.wav")
for w in words:
print(f"{w.start:.2f}-{w.end:.2f}: {w.word}")
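To make the chunking scheme concrete, here is a sketch of how chunk boundaries could be laid out for given chunk_len/overlap values. It mirrors the description above; Gentle's internal chunker may differ in detail, and `chunk_offsets` is a hypothetical helper:

```python
def chunk_offsets(duration, chunk_len=30.0, overlap=1.0):
    """Return (start, end) pairs covering `duration` seconds, where each
    chunk after the first begins `overlap` seconds before the previous end."""
    if duration <= chunk_len:
        return [(0.0, duration)]
    step = chunk_len - overlap
    offsets = []
    start = 0.0
    while start + chunk_len < duration:
        offsets.append((start, start + chunk_len))
        start += step
    offsets.append((start, duration))  # final, possibly shorter chunk
    return offsets
```

Words decoded in the overlap region appear in two chunks, which is why overlapping results must be merged and deduplicated downstream.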
Transcription Data Models
Represent and serialize transcription results.
class gentle.transcription.Word
Fields
- word (str): original token.
- alignedWord (str): normalized form.
- start, end (float): timing in seconds.
- startOffset, endOffset (int): character offsets.
- case (str): 'success' or 'not-found-in-audio'.
- phones (List[dict]): phone-level alignments.
class gentle.transcription.Transcription
Constructor
Transcription(
transcript: str,
words: List[Word]
)
Methods
- to_json() -> str
- to_csv() -> str
- adjust(offset: float): shift all timings by offset seconds.
Example
from gentle.transcription import Transcription, Word
# Build from aligned words
words = [Word(word="Hello", start=0.0, end=0.5, ...), ...]
tx = Transcription("Hello world", words)
print(tx.to_json())
Audio Resampling Utility
Convert media to single-channel, 16-bit, 8 kHz WAV for ASR.
resample(infile, outfile, offset=None, duration=None)
Converts infile to outfile on disk; returns the exit code (0 on success).
Raises IOError if infile is missing and RuntimeError on conversion failure.
Parameters
- infile (str)
- outfile (str)
- offset (float, optional)
- duration (float, optional)
Example
from gentle.resample import resample
# Full conversion
resample("episode.mp3", "episode_8k.wav")
# Clip 10s from 30s mark
resample("episode.mp3", "clip.wav", offset=30.0, duration=10.0)
resampled(infile, offset=None, duration=None)
Context manager yielding a temporary WAV path; the file is deleted automatically on exit. Raises RuntimeError on failure.
Example
from gentle.resample import resampled
from gentle import ForcedAligner, Resources
resources = Resources()
with resampled("video.mov", duration=15.0) as wav:
aligner = ForcedAligner(resources, "Transcript text")
tx = aligner.transcribe(wav)
print(tx.to_csv())