Project Overview
Facenet-PyTorch delivers an end-to-end face detection and recognition pipeline built on PyTorch. It combines an efficient MTCNN detector with pretrained Inception Resnet V1 models to simplify building applications that need accurate face embeddings, classification, and fine-tuning.
Purpose
Provide developers with a ready-to-use toolkit for:
- Detecting faces in images and video streams
- Generating 512-dimensional face embeddings
- Classifying or clustering identities
- Fine-tuning models on custom datasets
- Deploying production-ready pipelines via Docker and example notebooks
Core Features
- MTCNN Face Detector
  Multi-stage convolutional network for fast, accurate face and landmark detection.
- Inception Resnet V1 Embedding
  Pretrained on VGGFace2 or CASIA-Webface; outputs 512-d embeddings for recognition.
- Recognition Pipelines
  Tools for nearest-neighbor classification, clustering, and threshold-based identification.
- Fine-tuning Support
  Easy integration into training loops to adapt models to new identities.
- Video & Batch Processing
  Real-time face tracking and batch inference utilities for streams and folders.
- Docker & Notebooks
  Example Jupyter notebooks and Dockerfiles accelerate deployment and reproducibility.
Typical Use-Cases
- Access Control & Security
  Real-time identification at checkpoints or entry systems.
- Attendance Systems
  Automated logging of participants in classrooms, events, or workplaces.
- Photo Management
  Face clustering and tagging in large image collections.
- Custom Authentication
  Integration with mobile or web apps for biometric login.
- Research & Prototyping
  Benchmarking of new architectures or datasets using a standardized pipeline.
Quick Start Example
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1
# Initialize detector and embedding model
mtcnn = MTCNN(keep_all=False, device='cuda')
resnet = InceptionResnetV1(pretrained='vggface2').eval().to('cuda')
# Load image and detect face
img = Image.open('input.jpg')
face = mtcnn(img)
# Generate embedding
if face is not None:
    embedding = resnet(face.unsqueeze(0).to('cuda'))
    print('Face embedding shape:', embedding.shape)
This minimal snippet detects a single face and prints its 512-dim embedding. For advanced pipelines and deployment options, refer to the example notebooks and Docker configurations.
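To compare two faces, the same pattern extends naturally. A minimal sketch (assuming img1.jpg and img2.jpg each contain exactly one detectable face):
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1
mtcnn = MTCNN(device='cuda')
resnet = InceptionResnetV1(pretrained='vggface2').eval().to('cuda')
# Embed both faces and compare with cosine similarity
e1 = resnet(mtcnn(Image.open('img1.jpg')).unsqueeze(0).to('cuda'))
e2 = resnet(mtcnn(Image.open('img2.jpg')).unsqueeze(0).to('cuda'))
print('Cosine similarity:', torch.nn.functional.cosine_similarity(e1, e2).item())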
Getting Started
Start detecting and recognizing faces with facenet-pytorch in minutes. This guide covers environment requirements, installation methods (pip, source, Docker), and a short “detect & recognise” example.
Requirements
• Python 3.6+
• PyTorch ≥1.0, torchvision
• NumPy, Pillow
• (Optional GPU) CUDA 10.1+ compatible driver
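To confirm the environment meets these requirements, a quick sanity check (not part of the library):
import torch
import torchvision
import numpy
import PIL
print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('numpy:', numpy.__version__)
print('Pillow:', PIL.__version__)
print('CUDA available:', torch.cuda.is_available())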
Installation
1. Via pip
pip install facenet-pytorch
2. From source
git clone https://github.com/timesler/facenet-pytorch.git
cd facenet-pytorch
pip install -e .
3. With Docker
# Build image
docker build -t facenet-pytorch .
# Run container with GPU support
docker run -i --gpus all -v $(pwd)/data:/data facenet-pytorch \
python - << 'PYCODE'
from facenet_pytorch import MTCNN, InceptionResnetV1
print("Ready for inference")
PYCODE
Quickstart: Detect & Recognise
This snippet detects faces in input.jpg, computes embeddings, and compares them to a stored reference embedding.
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from PIL import Image
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mtcnn = MTCNN(keep_all=True, device=device)
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)
img = Image.open('input.jpg')
faces, probs = mtcnn(img, return_prob=True) # Detect & crop; faces is None if nothing found
embeddings = resnet(faces.to(device)) # Compute embeddings
known = torch.load('known_embedding.pt').to(device) # Reference embedding
distances = (embeddings - known).norm(dim=1) # L2 distances
best_idx = distances.argmin().item()
print(f"Best match face #{best_idx} (distance: {distances[best_idx]:.2f})")
Next steps: fine-tune on custom datasets, build classification pipelines, or integrate into video streams.
Core Concepts & API Reference
MTCNN.detect: Retrieving Bounding Boxes and Facial Landmarks
Provide a low-level interface to run MTCNN’s P-, R- and O-nets on one or more images, returning face bounding boxes, detection probabilities, and optional 5-point landmarks.
Signature
boxes, probs[, points] = mtcnn.detect(imgs, landmarks=False)
Parameters
• imgs
– PIL.Image, numpy.ndarray (H×W×3 uint8), torch.Tensor (H×W×3 or B×H×W×3), or list of equal-sized PIL.Images
• landmarks (bool)
– If True, returns facial landmarks as an N×5×2 array
Returns
• boxes
– numpy.ndarray of shape N×4 for single image, or list of such arrays for batch. Each row is (x1, y1, x2, y2)
• probs
– 1-D array of length N with face confidence scores
• points (optional)
– N×5×2 array of floating-point coordinates for left eye, right eye, nose, left mouth, right mouth
Details
• No faces found → boxes is an empty array and probs is an empty list
• Batch input → returns arrays of length B with object dtype (each element holds detections for that image)
• To return all faces unsorted, set keep_all=True when constructing MTCNN
Examples
Detect on a single PIL image, annotate and save faces:
from PIL import Image, ImageDraw
from facenet_pytorch import MTCNN, extract_face
img = Image.open('group_photo.jpg')
mtcnn = MTCNN(keep_all=True)
boxes, probs, points = mtcnn.detect(img, landmarks=True)
draw = ImageDraw.Draw(img)
for i, (box, prob, landmark) in enumerate(zip(boxes, probs, points)):
    if prob < 0.90:
        continue
    draw.rectangle(box.tolist(), outline=(0, 255, 0), width=3)
    for (x, y) in landmark:
        draw.ellipse((x - 3, y - 3, x + 3, y + 3), fill=(255, 0, 0))
    extract_face(img, box, image_size=160, margin=20,
                 save_path=f'face_{i}.png')
img.save('annotated.jpg')
Batch detection on numpy arrays:
import numpy as np
from PIL import Image
from facenet_pytorch import MTCNN
paths = ['img1.jpg', 'img2.jpg', 'img3.jpg']
imgs_np = np.stack([np.asarray(Image.open(p)) for p in paths]) # images must share the same dimensions
mtcnn = MTCNN(keep_all=False)
boxes_batch, probs_batch = mtcnn.detect(imgs_np)
for boxes, probs in zip(boxes_batch, probs_batch):
    if boxes is None or len(boxes) == 0:
        print("No face")
    else:
        print("Top face box:", boxes[0], "Score:", probs[0])
Tips
• Post-process or normalize face tensors with fixed_image_standardization or prewhiten.
• Tune thresholds, factor, and min_face_size at MTCNN initialization to trade speed against recall (see the sketch below).
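A minimal sketch of such tuning (the parameter values are illustrative, not recommendations):
from PIL import Image
from facenet_pytorch import MTCNN, fixed_image_standardization
# Stricter thresholds and a larger minimum face size favor speed over recall
mtcnn = MTCNN(
    min_face_size=40,           # skip small faces -> fewer pyramid levels
    thresholds=[0.7, 0.8, 0.8], # P-, R-, O-net acceptance thresholds
    factor=0.6,                 # coarser image pyramid -> faster
    post_process=False,         # return raw [0, 255] face tensors
)
face = mtcnn(Image.open('input.jpg'))
if face is not None:
    face = fixed_image_standardization(face) # normalize before embedding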
Loading Pretrained Weights and Caching
Describe how InceptionResnetV1 downloads, caches, and loads pretrained weights for VGGFace2 and CASIA-Webface.
Instantiation
from facenet_pytorch import InceptionResnetV1
# Downloads and caches VGGFace2 weights under TORCH_HOME/checkpoints
model = InceptionResnetV1(pretrained='vggface2')
Under the hood, load_weights(model, 'vggface2'):
- Chooses the download URL for the dataset
- Determines the cache dir:
  torch_home = os.getenv('TORCH_HOME') or \
      os.path.join(os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')
  model_dir = os.path.join(os.path.expanduser(torch_home), 'checkpoints')
- Creates model_dir if missing
- Downloads the file via download_url_to_file if not present
- Loads the state dict with model.load_state_dict
Invalid names raise:
ValueError: Pretrained models only exist for "vggface2" and "casia-webface"
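A quick way to see this in practice (catching broadly, since the exact failure point may vary by version):
from facenet_pytorch import InceptionResnetV1
try:
    InceptionResnetV1(pretrained='imagenet') # unsupported name
except Exception as err:
    print(err)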
Overriding Classification Head
By default, classify=False yields L2-normalized 512-D embeddings. To get logits:
model = InceptionResnetV1(
    pretrained='casia-webface',
    classify=True,
    num_classes=150 # custom number of classes
)
If num_classes differs from the original (10575), the final layer initializes randomly.
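A minimal fine-tuning sketch under that setup, freezing the backbone and training only the new head (the optimizer and learning rate are illustrative):
import torch
from facenet_pytorch import InceptionResnetV1
model = InceptionResnetV1(pretrained='casia-webface', classify=True, num_classes=150)
for p in model.parameters():
    p.requires_grad = False   # freeze the pretrained backbone
for p in model.logits.parameters():
    p.requires_grad = True    # train only the new classification head
optimizer = torch.optim.Adam(model.logits.parameters(), lr=1e-3)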
Custom Cache Location
Set TORCH_HOME or XDG_CACHE_HOME before running:
export TORCH_HOME=/mnt/data/torch_cache
export XDG_CACHE_HOME=/mnt/data/cache
python your_script.py
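Setting the variable from Python also works, provided it happens before the model is constructed:
import os
os.environ['TORCH_HOME'] = '/mnt/data/torch_cache' # must precede instantiation
from facenet_pytorch import InceptionResnetV1
model = InceptionResnetV1(pretrained='vggface2')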
Practical Tips
• Re-instantiating with the same pretrained name uses the cache; no network call is made.
• To force a redownload, delete the cached .pt file in <TORCH_HOME>/checkpoints/.
• Move model to device immediately:
import torch
model = InceptionResnetV1(
    pretrained='vggface2',
    device=torch.device('cuda')
)
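To see which checkpoints are currently cached, mirror the directory logic described above:
import os
torch_home = os.getenv('TORCH_HOME') or \
    os.path.join(os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')
ckpt_dir = os.path.join(os.path.expanduser(torch_home), 'checkpoints')
if os.path.isdir(ckpt_dir):
    print(os.listdir(ckpt_dir)) # cached .pt files live here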
download_url_to_file: Robust HTTP Download with Progress and Integrity Check
Provide a reliable way to download a file over HTTP(S) into a local path, with optional SHA-256 prefix check and a progress bar. Ensures atomic writes and avoids corrupted files.
Signature
def download_url_to_file(
    url: str,
    dst: str,
    hash_prefix: Optional[str] = None,
    progress: bool = True
) -> None
Parameters
• url
– HTTP(S) URL of the file
• dst
– Full destination path (parent directories must exist)
• hash_prefix (optional)
– Hex string; downloaded file’s SHA-256 must start with this prefix
• progress (optional)
– Show tqdm progress bar if True
Key Features
• Atomic write via a temporary file in the target directory
• On-the-fly SHA-256 integrity check
• Streams in 8 KB chunks for low memory usage
• Uses the Content-Length header for the progress bar if available
Examples
- Download without integrity check, with progress bar
from facenet_pytorch.models.utils.download import download_url_to_file
url = 'https://example.com/models/resnet18.pth'
dst = '/home/user/checkpoints/resnet18.pth'
download_url_to_file(url, dst)
- Download with SHA-256 prefix verification
from facenet_pytorch.models.utils.download import download_url_to_file
hash_pref = '5c106cde'
download_url_to_file(
    'https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth',
    '/tmp/resnet18.pth',
    hash_prefix=hash_pref
)
- Suppress progress bar (e.g., in CI logs)
download_url_to_file(
    url='https://example.com/large.bin',
    dst='/data/large.bin',
    progress=False
)
Practical Tips
• Always use hash_prefix for critical assets to catch silent corruption.
• Ensure the directory for dst exists; the function won't create parent directories (see the sketch below).
• On interruption, the temporary file is cleaned up automatically.
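A minimal sketch of creating the parent directory first (the URL and path are placeholders):
import os
from facenet_pytorch.models.utils.download import download_url_to_file
dst = '/data/checkpoints/model.pt' # hypothetical destination
os.makedirs(os.path.dirname(dst), exist_ok=True) # create parents up front
download_url_to_file('https://example.com/model.pt', dst)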
Pretrained Models & Performance
This section describes available MTCNN checkpoint files, how the library downloads them automatically, and benchmark results to help you choose the right model and hardware.
Available Checkpoints
facenet-pytorch ships with three serialized PyTorch state dictionaries under data/:
- data/pnet.pt – Weights for the P-Net (Proposal Network)
- data/rnet.pt – Weights for the R-Net (Refine Network)
- data/onet.pt – Weights for the O-Net (Output Network)
Loading a Checkpoint Manually
import torch
from facenet_pytorch.models.mtcnn import PNet, RNet
# Initialize networks without their bundled weights (loaded manually below)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
pnet = PNet(pretrained=False)
rnet = RNet(pretrained=False)
# Load state dicts
pnet.load_state_dict(torch.load('data/pnet.pt', map_location=device))
rnet.load_state_dict(torch.load('data/rnet.pt', map_location=device))
# Move to device
pnet.to(device).eval()
rnet.to(device).eval()
# Example inference on a batch of images
# imgs: torch.Tensor of shape (N, 3, H, W), values in [0, 1]
with torch.no_grad():
    # P-Net runs fully convolutionally, returning per-location box
    # regressions and face probabilities
    box_regs, face_probs = pnet(imgs)
    # R-Net expects 24x24 face crops built from P-Net proposals;
    # the cropping and NMS steps are omitted here for brevity
    # refined_regs, refined_probs = rnet(crops_24x24)
Automatic Download Behavior
When you instantiate the high-level MTCNN detector, facenet-pytorch downloads missing checkpoints automatically into your cache directory (~/.cache/torch/hub/checkpoints/).
Example
from facenet_pytorch import MTCNN
import torch
from PIL import Image
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# On first run, this downloads pnet.pt, rnet.pt, onet.pt
mtcnn = MTCNN(keep_all=True, device=device)
img = Image.open('path/to/image.jpg')
boxes, probs = mtcnn.detect(img)
print(boxes, probs)
Performance Benchmarks
Benchmarks in tests/perf_test.py measure throughput (images/sec) for various batch sizes and devices, comparing facenet-pytorch (batched vs. non-batched) against dlib and a native MTCNN implementation.
Running the Performance Test
# Evaluate on CPU
python tests/perf_test.py --batch-size 16 --device cpu
# Evaluate on GPU
python tests/perf_test.py --batch-size 32 --device cuda:0
Key Snippet from tests/perf_test.py
import time
from torch.utils.data import DataLoader
from facenet_pytorch import MTCNN
from torchvision.datasets import ImageFolder
from torchvision import transforms
# 'args' (batch_size, device) is parsed via argparse elsewhere in the script
# Prepare dataset
dataset = ImageFolder('tests/data/images', transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=args.batch_size, num_workers=4)
# Initialize detector
mtcnn = MTCNN(keep_all=True, device=args.device)
# Measure throughput
start = time.perf_counter()
for imgs, _ in loader:
_ = mtcnn(imgs.to(args.device))
elapsed = time.perf_counter() - start
print(f'Processed {len(dataset)} images in {elapsed:.2f}s ({len(dataset)/elapsed:.2f} img/s)')
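When timing GPU runs yourself, remember that CUDA kernels execute asynchronously; a standard PyTorch practice (independent of perf_test.py) is to synchronize before reading the clock:
import time
import torch
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    torch.cuda.synchronize() # drain pending kernels before timing
start = time.perf_counter()
# ... run batched inference here ...
if device.type == 'cuda':
    torch.cuda.synchronize() # wait for queued kernels to finish
print(f'{time.perf_counter() - start:.3f}s')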
Use these benchmarks to plan hardware: on a recent GPU, expect >100 img/s at batch sizes ≥32; CPU throughput typically falls in the 5–20 img/s range depending on batch size.
Development & Contribution Guide
This guide explains the repository layout, development environment setup, testing, CI configuration, and pull request workflow to help you get started quickly.
Repository Layout
facenet-pytorch/
  facenet_pytorch/               – Core modules (MTCNN, InceptionResnetV1, utilities)
  tests/
    actions_requirements.txt     – Dev/test dependencies
    actions_test.py              – CI test suite for face detection & embeddings
    perf_test.py                 – Performance benchmark for MTCNN
  .github/workflows/python.yml   – GitHub Actions CI
  LICENSE.md                     – MIT License and redistribution terms
  README.md                      – Overview and basic usage
  examples/                      – Sample scripts for common workflows
Setup Development Environment
- Clone the repo and enter the directory
  git clone https://github.com/timesler/facenet-pytorch.git
  cd facenet-pytorch
- Create and activate a virtual environment
  python3 -m venv .venv
  source .venv/bin/activate
- Install core and test dependencies
  pip install --upgrade pip
  pip install torch torchvision  # or the desired CUDA build
  pip install -r tests/actions_requirements.txt
- Verify installation
  python -c "import facenet_pytorch; print(facenet_pytorch.__version__)"
Running the Test Suite
Run all tests with pytest:
pytest --maxfail=1 --disable-warnings -q
Run only CI validation tests (face detection & embeddings):
pytest tests/actions_test.py
Measure MTCNN performance on sample images:
python tests/perf_test.py --data-dir path/to/images --batch-size 32
Generate a coverage report locally:
coverage run -m pytest
coverage report -m
Continuous Integration
The repository uses GitHub Actions (.github/workflows/python.yml) to:
- Test on Python 3.6, 3.7, 3.8, and 3.9
- Install dependencies from actions_requirements.txt
- Run pytest and coverage
- Upload coverage results to Codecov
CI triggers on push, pull requests, or manual workflow dispatch.
Submitting Pull Requests
- Fork the repo and create a feature branch
  git checkout -b feature/my-improvement
- Make changes and add tests. Follow existing style and include docstrings.
- Run the full test suite and ensure coverage
  pytest
  coverage run -m pytest
- Commit your changes with a clear message
  git add .
  git commit -m "Add X feature: brief description"
- Push your branch and open a PR against master
  git push origin feature/my-improvement
- In your PR description:
  - Reference any related issues (e.g. “Fixes #123”)
  - Describe your change and its impact
  - Confirm tests pass locally
All contributions must comply with the MIT License in LICENSE.md. Ensure you have signed any required contributor license agreements, if applicable.