Project Overview

Slippi-AI (Phillip II) trains Super Smash Bros. Melee agents using behavioral cloning on Slippi replays and reinforcement learning in a Dolphin-emulated environment. It provides end-to-end tools for dataset creation, model training, evaluation, and interactive play both locally and online.

What is Slippi-AI?

Slippi-AI is a framework that:

  • Automates Dolphin emulator control for Melee matches
  • Extracts game states and frame-perfect inputs from Slippi replay files
  • Trains agents via imitation learning (BC) and policy optimization (RL)
  • Evaluates AI performance and integrates with netplay for online matches

Problems It Solves

  • Provides a reproducible, high-throughput Melee simulation environment
  • Bridges raw Slippi replay data to ML-ready datasets
  • Manages emulator crashes and state resets in batch training
  • Simplifies evaluation and headless matches without manual setup

Key Features

  • Emulator Control
    • Headless Dolphin startup, configurable ports, state snapshots
    • AI/Human controller mapping and Slippi replay injection
  • Environment API
    • Single (Environment), fault-tolerant (SafeEnvironment), and batched (BatchedEnvironment) wrappers
    • Multiprocessing support for scalable RL training
  • Training Scripts
    • scripts/create_dataset.py for BC data
    • scripts/train_bc.py and scripts/train_rl.py for model optimization
  • Evaluation & Play
    • scripts/evaluate.py for benchmark matches
    • scripts/play.py for interactive or netplay deployment

When to Use Slippi-AI

  • Researching game-playing agent performance in Melee
  • Rapid prototyping of imitation or reinforcement learning setups
  • Automating large-scale match simulations and benchmarks
  • Integrating AI opponents into local or online Melee sessions

Quick Start

Install dependencies and run a basic behavioral cloning training:

git clone https://github.com/vladfi1/slippi-ai.git
cd slippi-ai
pip install -r requirements.txt

# Create a dataset from Slippi replays
python scripts/create_dataset.py \
  --input-replays data/slippi/*.slp \
  --output-dataset data/bc_dataset

# Train a behavioral cloning agent
python scripts/train_bc.py \
  --dataset data/bc_dataset \
  --output-model models/bc_agent

# Evaluate the trained agent
python scripts/evaluate.py \
  --model models/bc_agent \
  --matches 100

Getting Started

Follow these steps to spin up a working Slippi-AI environment, run a demo training, and verify your setup.

1. Clone the Repository

git clone https://github.com/vladfi1/slippi-ai.git
cd slippi-ai

2. Quickstart with Docker

Build the container image:

docker build \
  -f docker/Dockerfile \
  -t slippi-ai:latest \
  .

Run an interactive shell inside the container, mounting your code for iterative development:

docker run --rm -it \
  -v "$(pwd)":/app \
  -w /app \
  slippi-ai:latest \
  bash

Inside the container, install the package in editable mode and run the CI test:

pip install -e .
bash tests/train_rl.sh

3. Local Setup (Python 3.9)

Create a virtual environment and install dependencies:

python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .

4. Run a Demo Training

Use the built-in RL test script to train for a few hundred steps and confirm metrics log to console:

bash tests/train_rl.sh

You should see training loss, reward, and step count printed periodically.

5. (Optional) Imitation-Learning Example

If you have a folder of Slippi replays (.slp), launch the transformer imitation script on “Fox” matches with an 18-frame delay:

bash scripts/imitation_example.sh \
  --data-dir=/path/to/replays \
  --delay=18 \
  --batch-size=64 \
  --eval-interval=500

This runs a 3-layer transformer on GPU (if available) and logs evaluation metrics.

Next Steps

  • Generate a full dataset: use your own Slippi replays and the data-processing scripts.
  • Train from scratch or continue from checkpoints in checkpoints/.
  • Launch headless evaluation via scripts/run_dolphin.py (requires a GameCube/Wii ROM).

Data Pipeline & Dataset Creation

This section outlines the end-to-end flow that transforms raw Slippi replay archives (.zip, .7z) into a filtered, model-ready dataset serialized as Parquet files.

1. Parsing Raw Replay Archives

Use slippi_db.parse_local.run_parsing() to extract and parse all .slp files from archive folders into per-replay pickles.

Key Function

run_parsing(
    input_dir: str,           # Directory containing .zip/.7z archives or loose .slp files
    output_dir: str,          # Directory to write parsed subfolders per replay
    num_workers: int = 1,     # Parallel extraction workers
    in_memory: bool = False,  # Extract archives in memory instead of to disk
    overwrite: bool = False   # Re-parse even if output exists
)

Example

from slippi_db.parse_local import run_parsing

# Parse all archives under ./raw_replays into ./parsed_replays using 4 workers
run_parsing(
    input_dir="data/raw_replays",
    output_dir="data/parsed_replays",
    num_workers=4,
    in_memory=True,
    overwrite=False
)

After running, each replay folder (<replay_hash>) contains:

  • metadata.pickle (dict from preprocessing)
  • game_frames.pickle (PyArrow StructArray of frames)
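
To spot-check a single parsed replay, you can load these pickles directly. A minimal sketch (substitute a real <replay_hash>):

import pickle

replay_dir = "data/parsed_replays/<replay_hash>"
with open(f"{replay_dir}/metadata.pickle", "rb") as f:
    metadata = pickle.load(f)
with open(f"{replay_dir}/game_frames.pickle", "rb") as f:
    frames = pickle.load(f)  # PyArrow StructArray of per-frame data

print(metadata)
print(len(frames), "frames")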

2. Filtering & Metadata Extraction

Filter parsed replays to 1v1 human matches that meet training criteria.

Metadata Functions

from slippi_db.preprocessing import (
    get_metadata_safe,       # Returns metadata dict or tags failure
    is_training_replay,      # (bool, reason) on filtered criteria
)

Example

from slippi_db.preprocessing import get_metadata_safe, is_training_replay
import os

def filter_parsed_replays(parsed_dir):
    valid = []
    for replay_id in os.listdir(parsed_dir):
        meta = get_metadata_safe(f"{parsed_dir}/{replay_id}/metadata.pickle")
        ok, reason = is_training_replay(meta)
        if ok:
            valid.append(replay_id)
        else:
            print(f"Skipped {replay_id}: {reason}")
    return valid

valid_ids = filter_parsed_replays("data/parsed_replays")
print(f"{len(valid_ids)} replays passed filters")

3. Building & Packaging the Local Dataset

Use the make_local_dataset.py script or API to collect filtered replays into a dataset directory or tar archive.

Command-Line Usage

python -m slippi_db.scripts.make_local_dataset \
  --parsed_dir data/parsed_replays \
  --output_dir data/training_dataset \
  --min_damage 0.1 \
  --max_damage 20.0 \
  --tarfile data/slippi_dataset.tar

Python API

from slippi_db.scripts.make_local_dataset import make_dataset

make_dataset(
    parsed_dir="data/parsed_replays",
    output_dir="data/training_dataset",
    tarfile="data/slippi_dataset.tar",  # optional .tar output
    min_damage=0.1,
    max_damage=20.0,
    require_winner=True
)

Output layout:

  • training_dataset/
    • <replay_id>/metadata.pickle
    • <replay_id>/game_frames.pickle
  • slippi_dataset.tar (if requested)
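
If a tar archive was requested, a quick standard-library check of its contents:

import tarfile

with tarfile.open("data/slippi_dataset.tar") as tar:
    names = tar.getnames()
print(f"{len(names)} entries, e.g. {names[:3]}")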

4. Parquet Serialization

Convert each replay’s StructArray into a Parquet file for fast batch reads.

Utility: convert_game

from slippi_db.parsing_utils import convert_game, CompressionType
import pickle, os

def serialize_to_parquet(dataset_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for replay_id in os.listdir(dataset_dir):
        # Load parsed frames
        with open(f"{dataset_dir}/{replay_id}/game_frames.pickle","rb") as f:
            game_array = pickle.load(f)
        # Serialize to Parquet bytes
        pq_bytes = convert_game(
            game=game_array,
            pq_version="2.4",
            compression=CompressionType.SNAPPY
        )
        # Write file
        with open(f"{out_dir}/{replay_id}.parquet","wb") as out:
            out.write(pq_bytes)

Reading Back

import pyarrow.parquet as pq

table = pq.read_table("data/parquet/abc123.parquet")
# 'root' is a Struct column of frames
frames = table.column("root").flatten()
print(frames.state_action.state.hp.shape)  # (num_frames, )
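
If the nested field names in your build of the pipeline differ from the example above, inspect the schema first instead of guessing the path:

import pyarrow.parquet as pq

table = pq.read_table("data/parquet/abc123.parquet")
print(table.schema)    # shows the nested struct layout of the frame column
print(table.num_rows)  # number of frames stored in this replay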

5. End-to-End Example Script

from slippi_db.parse_local import run_parsing
from slippi_db.preprocessing import get_metadata_safe, is_training_replay
from slippi_db.scripts.make_local_dataset import make_dataset
from slippi_db.parsing_utils import convert_game, CompressionType
import pickle, os

# 1. Parse archives
run_parsing("data/raw_replays", "data/parsed_replays", num_workers=4, in_memory=True)

# 2. Build filtered dataset
make_dataset("data/parsed_replays", "data/training_dataset", tarfile=None)

# 3. Serialize each replay to Parquet
os.makedirs("data/parquet", exist_ok=True)
for rid in os.listdir("data/training_dataset"):
    # Load frames
    with open(f"data/training_dataset/{rid}/game_frames.pickle","rb") as f:
        game = pickle.load(f)
    pq_bytes = convert_game(game, pq_version="2.4", compression=CompressionType.ZSTD, compression_level=5)
    with open(f"data/parquet/{rid}.parquet","wb") as out:
        out.write(pq_bytes)

This pipeline yields a ready-to-consume Parquet dataset for downstream model training.

Training Workflows

Launch and monitor all supported learning regimes: Imitation Learning, Q-Learning, and PPO Reinforcement Learning. Each workflow initializes Weights & Biases logging, configurable via --config.* and --wandb.* flags.


Imitation Learning (scripts/train.py)

Train a policy by mimicking replay data, with checkpointing and evaluation built in.

Quickstart

# Start training with default imitation settings, WandB offline
python scripts/train.py \
  --wandb.mode=offline

Key Flags

  • --config.data.replays: paths or glob for replay files
  • --config.batch_size: frames per minibatch (default 8)
  • --config.learning_rate: Adam LR for the policy (default 1e-4)
  • --config.max_steps: total training steps
  • --config.checkpoint_dir: directory to save/restore model checkpoints

  • --wandb.project: WandB project name
  • --wandb.name: run name (defaults to config.tag)
  • --wandb.mode: online/offline/disabled

Example: Custom Dataset & Logging

python scripts/train.py \
  --config.data.replays="data/train/*.slp" \
  --config.batch_size=16 \
  --config.learning_rate=5e-5 \
  --config.max_steps=500000 \
  --config.checkpoint_dir="checkpoints/imit" \
  --config.tag="fox_imitation_v2" \
  --wandb.project="slippi-il" \
  --wandb.mode=online

Logs, metrics, and checkpoints appear under checkpoints/imit and your WandB dashboard.


Q-Learning (scripts/train_q.py & slippi_ai/train_q_lib)

Run a tabular or function-approx Q-learning agent on Slippi environments.

Quickstart

# Disabled WandB, local logging
python -m scripts.train_q \
  --wandb.mode=disabled

Key Flags

  • --config.learning_rate: LR for Q-network updates
  • --config.reward_halflife: controls the discount γ ≈ 0.5^(1/(halflife×60)) (see the sketch after this list)
  • --config.batch_size: minibatch size for experience replay
  • --config.buffer_size: replay buffer capacity
  • --config.target_update_interval: steps between target-network sync

  • --wandb.project
  • --wandb.group
  • --wandb.name
  • --wandb.mode
  • --wandb.dir: local log directory
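
As a quick illustration of the reward_halflife formula above (plain Python, hypothetical value):

reward_halflife = 8                            # seconds
gamma = 0.5 ** (1 / (reward_halflife * 60))    # 60 frames per second
print(gamma)                                   # ≈ 0.99856 per-frame discount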

Examples

# Online logging to WandB, custom experiment group
python -m scripts.train_q \
  --config.learning_rate=1e-4 \
  --config.reward_halflife=8 \
  --wandb.mode=online \
  --wandb.project="slippi-q" \
  --wandb.group="run_alpha" \
  --wandb.name="alpha-001"
# Offline logging to disk
python -m scripts.train_q \
  --config.batch_size=64 \
  --config.buffer_size=100000 \
  --wandb.mode=offline \
  --wandb.dir="/mnt/logs/q_learning"

All flags under --config.* map directly to train_q_lib.Config fields.


PPO Reinforcement Learning (scripts/rl_example.sh & slippi_ai/rl/run.py)

Launch parallel PPO training with Dolphin-EMU environments, actor-learner architecture, and optional self-play.

Quickstart via Shell Wrapper

# Ensure environment variables
export DOLPHIN_PATH=/path/to/dolphin-emu
export ISO_PATH=/path/to/SSBM.iso

# Run default fox training
bash scripts/rl_example.sh

Override inline:

bash scripts/rl_example.sh \
  --config.actor.num_envs=128 \
  --config.actor.rollout_length=256 \
  --config.learner.learning_rate=2e-5 \
  --wandb.mode=online \
  --wandb.project="slippi-ppo" \
  --wandb.name="fox_ppo_run"

Direct Python Invocation

python slippi_ai/rl/run.py \
  --config.actor.num_envs=64 \
  --config.actor.rollout_length=200 \
  --config.learner.ppo.epsilon=0.2 \
  --config.opponent.type=SELF \
  --config.opponent.update_interval=50 \
  --config.runtime.max_step=2e7 \
  --config.runtime.tag="marth_selfplay" \
  --wandb.mode=online \
  --wandb.project="slippi-ppo" \
  --wandb.name="marth_sp_001"

Key Flag Categories

  • Actor: --config.actor.num_envs, rollout_length, inner_batch_size, gpu_inference
  • Learner & PPO: --config.learner.learning_rate, value_cost, policy_gradient_weight;
    --config.learner.ppo.epsilon, beta, num_epochs, num_batches
  • Opponent & Teacher: --config.opponent.type (CPU/SELF/OTHER);
    --config.opponent.update_interval, train (self-play schedules);
    --config.teacher (path to a pretrained imitation model)
  • Runtime & Logging: --config.runtime.max_step, log_interval, checkpoint_interval;
    --wandb.* flags for project, mode, tags

Monitoring & Tuning Tips

  1. Increase num_envs until CPU/RAM saturates, then bump rollout_length to fill GPU.
  2. Match inner_batch_size to CPU threads: num_envs / inner_batch_size ≃ # threads (see the sketch after this list).
  3. Watch actor_kl in WandB; reduce ppo.epsilon or LR if mean KL exceeds threshold.
  4. Adjust reward_halflife to bias short- vs. long-term rewards.
  5. Use --wandb.mode=offline for unattended runs, then wandb sync later.
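
To make tip 2 concrete, a small sizing sketch (hypothetical numbers; os.cpu_count() stands in for your actual thread count):

import os

num_envs = 128
threads = os.cpu_count() or 1
# choose inner_batch_size so that num_envs / inner_batch_size ≈ number of threads
inner_batch_size = max(1, num_envs // threads)
print(inner_batch_size)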

These workflows deliver end-to-end training, logging, checkpointing, and evaluation across all supported regimes.

Evaluation & Gameplay

Run trained Slippi AI agents in Dolphin or headless loops. Support single- or multi-agent rollouts, agent-vs-agent or human matches, online play tests, and performance benchmarking.

Running Policy Evaluations with run_evaluator.py

Provide a one-stop CLI for rolling out trained agents (single or multi-agent), measuring rewards and FPS, and optionally parallelizing via Ray.

Key Flags

  • --dolphin: JSON overrides for DolphinConfig (e.g. --dolphin.headless=True,infinite_time=False,online_delay=2)
  • --rollout_length: frames per rollout (default 3600)
  • --num_envs: environments per worker
  • --fake_envs: replayed matches instead of live Dolphin
  • --async_envs: non-blocking environment stepping
  • --num_env_steps + --inner_batch_size: batch size for async envs
  • --use_gpu: run inference on GPU
  • --num_agent_steps: batch agent inferences
  • --num_workers: Ray workers count
  • --agent: agent-specific flags (model_dir, jit_compile, etc.)
  • --self_play: agent plays both ports
  • --opponent: opponent agent flags

How It Works

  1. Load agent parameters via eval_lib.load_state.
  2. Build dolphin_kwargs from --dolphin and player assignments.
  3. Instantiate evaluators.Evaluator (single‐process) or evaluators.RayEvaluator.
  4. Enter the with evaluator.run(): context to start environments and agents.
  5. Perform burn-in, then main rollout.
  6. Log KO-diff/min, component timings (env_pop, agent_step), FPS, SPS.

Examples

  1. Single-env, CPU only

    python scripts/run_evaluator.py \
      --agent.model_dir=/path/to/checkpoint \
      --rollout_length=600 \
      --num_envs=1
    
  2. Four async envs, GPU inference

    python scripts/run_evaluator.py \
      --agent.model_dir=/path/to/checkpoint \
      --num_envs=4 \
      --async_envs \
      --num_env_steps=64 \
      --inner_batch_size=16 \
      --use_gpu
    
  3. Self-play

    python scripts/run_evaluator.py \
      --agent.model_dir=/path/to/checkpoint \
      --self_play \
      --num_envs=2
    
  4. Distributed with 4 Ray workers

    python scripts/run_evaluator.py \
      --agent.model_dir=/path/to/checkpoint \
      --num_envs=8 \
      --num_workers=4
    

eval_two.py: Automated Matches Between Agents or Against a Human

Quickly pit two AI agents—or AI vs. human—in Dolphin, collect game stats, and profile inference time.

Essential CLI Usage

  • Agent vs. Agent

    python scripts/eval_two.py \
      --dolphin.path=/path/to/slippi-dolphin \
      --dolphin.iso=/path/to/SSBM.iso \
      --p1.ai.path=/path/to/agent1 \
      --p2.ai.path=/path/to/agent2 \
      --num_games=50
    
  • Human vs. Agent (human on port 1)

    python scripts/eval_two.py \
      --dolphin.path=/path/to/slippi-dolphin \
      --dolphin.iso=/path/to/SSBM.iso \
      --p1.type=human \
      --p2.ai.path=/path/to/agent \
      --num_games=10
    

Key Flags

  • dolphin.path, dolphin.iso: Dolphin executable and ISO (or use DOLPHIN_PATH/ISO_PATH)
  • p<n>.type: ai or human; for ai, supply p<n>.ai.path and optional p<n>.ai.async_inference (defaults True)
  • num_games: total matches (runs indefinitely if omitted)
  • --dolphin.headless, --dolphin.online_delay, --dolphin.save_replays, --dolphin.replay_dir

Workflow

  1. Parse flags into DolphinConfig and player specs via eval_lib.get_player().
  2. Build and .start() each eval_lib.Agent.
  3. Launch Dolphin, auto-navigate the menus, then loop over dolphin.iter_gamestates():
    • agent.step(state) produces controller inputs each frame.
  4. Log avg. inference time every 15 seconds.

Tips

  • If Dolphin fails to connect, verify paths.
  • Simulate lag with --dolphin.online_delay.
  • Use --p<n>.ai.async_inference=False for synchronous profiling.
  • For CI: --dolphin.headless=True --dolphin.save_replays=True --dolphin.replay_dir=/tmp/replays.
  • eval_two.py disables GPUs by default; ensure your model loader respects eval_lib.disable_gpus().

netplay.py: Online Play Testing and Controller Verification

Test a trained agent in a live or headless Dolphin session, optionally verify controller inputs over time.

Common Flags

  • --dolphin.path, --dolphin.iso
  • --ai.model_dir: checkpoint directory
  • --ai.async_inference: enable non-blocking inference (default True)
  • --runtime: seconds to run (default 300)
  • --verify_inputs: assert controller state changes each frame
  • --headless: run without GUI
  • --save_replays, --replay_dir: store Slippi replays

Example: 5-Minute Headless Test with Input Verification

python scripts/netplay.py \
  --dolphin.path=/usr/bin/slippi-dolphin \
  --dolphin.iso=/games/SSBM.iso \
  --ai.model_dir=/models/checkpoint \
  --runtime=300 \
  --verify_inputs \
  --headless \
  --save_replays \
  --replay_dir=/tmp/replays

What It Does

  1. Configure Dolphin via slippi_ai.dolphin.DolphinConfig.
  2. Build and start agent with eval_lib.build_agent().
  3. Step Dolphin frames, call agent.step() each frame.
  4. If --verify_inputs, compare consecutive controller reports and error on no change.
  5. On exit, optionally save replay.

run_dolphin.py: Dolphin Emulator Benchmarking

Launch multiple Dolphin instances with AI or CPU players to measure emulator performance in frames per second.

Key Flags

  • --dolphin.path, --dolphin.iso
  • --num_instances: parallel Dolphin processes (default 1)
  • --num_frames: frames per instance (default 10000)
  • --render: enable graphics (default False/headless)
  • --p<n>.type: ai, cpu, or human
  • --p<n>.ai.path: model dir for AI players
  • --headless, --save_replays, --replay_dir

Example: Benchmark 4 Headless Instances, 20k Frames Each

python scripts/run_dolphin.py \
  --dolphin.path=/usr/bin/slippi-dolphin \
  --dolphin.iso=/games/SSBM.iso \
  --num_instances=4 \
  --num_frames=20000 \
  --p1.type=ai --p1.ai.path=/models/a1 \
  --p2.type=cpu \
  --headless

Output

  • Per-instance FPS
  • Aggregate average FPS

Use these scripts to automate evaluation loops, human-involved matches, online play tests, and performance benchmarks for Slippi AI agents.

Architecture & Extensibility

This section explores the core abstractions in slippi-ai and shows how to extend or replace modules safely. Each subsystem defines clear interfaces—use subclassing and composition to inject custom logic.


Controller Heads

Controller heads encapsulate how discrete controller actions are embedded, sampled, and compared.

Core Interfaces

  • ControllerHeadBase (abstract):

    • size property
    • embed(controller: Controller) → np.ndarray
    • sample(logits: tf.Tensor, **kwargs) → Controller
    • distance(logits1, logits2) → tf.Tensor
  • Provided implementations:

    • IndependentControllerHead
    • AutoregressiveControllerHead

Extending with a Custom Head

from slippi_ai.controller_heads import ControllerHeadBase, IndependentControllerHead
import tensorflow as tf
import numpy as np

class MyControllerHead(ControllerHeadBase):
    def __init__(self, name, button_dim, stick_dim):
        super().__init__(name)
        self.button_dim = button_dim
        self.stick_dim = stick_dim

    @property
    def size(self):
        return self.button_dim + self.stick_dim

    def embed(self, controller):
        # pack discrete buttons and continuous sticks
        btns = controller.buttons.astype(np.float32)
        stick = controller.stick / 1.0
        return np.concatenate([btns, stick], axis=-1)

    def sample(self, logits, temperature=1.0):
        # temperature-scaled sampling over the button logits; sticks are taken as-is
        probs = tf.nn.softmax(logits[..., :self.button_dim] / temperature)
        btn_sample = tf.random.categorical(tf.math.log(probs), 1)
        stick_pred = logits[..., self.button_dim:]
        return self._reconstruct_controller(btn_sample, stick_pred)

    def distance(self, a, b):
        # L2 on stick + cross‐entropy on buttons
        stick_a, stick_b = a[..., self.button_dim:], b[..., self.button_dim:]
        btn_a, btn_b = a[..., :self.button_dim], b[..., :self.button_dim]
        stick_dist = tf.reduce_sum((stick_a - stick_b) ** 2, axis=-1)
        btn_ce = tf.reduce_sum(-btn_a * tf.math.log(btn_b + 1e-6), axis=-1)
        return stick_dist + btn_ce

    def _reconstruct_controller(self, btn_idx, stick_vec):
        # convert back to Controller namedtuple
        from slippi_ai.types import Controller
        btn_onehot = tf.one_hot(tf.squeeze(btn_idx, -1), self.button_dim)
        return Controller(buttons=btn_onehot.numpy(), stick=stick_vec.numpy())

# Registration: swap in your head when building a Policy
from slippi_ai.policies import Policy
policy = Policy(controller_head=MyControllerHead("my_head", 10, 2), ...)

Embeddings

Embeddings map structured game state to flat tensors and back.

Base Class

  • Embedding (abstract):
    • size property
    • __call__(np_array) → tf.Tensor
    • from_state(python_struct) → np.ndarray
    • sample(logits, **kwargs)
    • distance(a, b)

Composing Structs

Use StructEmbedding (or helpers ordered_struct_embedding/dict_embedding) to combine field‐level embeddings:

from slippi_ai.embed import FloatEmbedding, BoolEmbedding, ordered_struct_embedding
from slippi_ai.types import Player

pct = FloatEmbedding("percent", scale=0.01)
facing = BoolEmbedding("facing", off=-1.)
player_emb = ordered_struct_embedding(
   name="player",
   embedding=[
     ("percent", pct),
     ("facing", facing),
     # …
   ],
   nt_type=Player
)

# Embed a Player state:
np_player = player_emb.from_state(parsed_namedtuple)
tf_tensor = player_emb(np_player)

Custom Embedding

from slippi_ai.embed import Embedding
import tensorflow as tf
import numpy as np

class LogScaleEmbedding(Embedding):
    def __init__(self, name, base=10.):
        super().__init__(name)
        self.base = base

    @property
    def size(self):
        return 1

    def from_state(self, val: float):
        return np.array([np.log(val + 1) / np.log(self.base)], dtype=np.float32)

    def __call__(self, arr: np.ndarray):
        return tf.convert_to_tensor(arr)

    def distance(self, a, b):
        return tf.abs(a - b)
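
A quick usage check of the custom embedding above (continuing from the same imports, and assuming the Embedding base accepts just a name as in this sketch):

emb = LogScaleEmbedding("damage", base=10.)
arr = emb.from_state(99.0)   # log10(99 + 1) = 2.0
print(emb.size, arr)         # 1 [2.]
print(emb.distance(emb(arr), emb(np.array([1.0], dtype=np.float32))))  # |2 - 1| = 1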

Environment Backends

The environment API supports single, batched, safe and multiprocessing variants.

Core Interface

  • Environment (abstract):
    • reset() → Dict[int, Game]
    • step(controllers: Dict[int, Controller]) → (gamestates, needs_reset)

Extending or Replacing

from typing import Dict

from slippi_ai.envs import Environment, EnvOutput
from slippi_ai.types import Controller, Game

class DummyEnv(Environment):
    def __init__(self, length=100):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return {0: Game.zero()}

    def step(self, controllers: Dict[int, Controller]):
        self.t += 1
        state = Game.random()
        done = self.t >= self.length
        return {0: state}, done

# Plug into AsyncEnvMP
from slippi_ai.envs import AsyncEnvMP
env = AsyncEnvMP(env_class=DummyEnv, num_envs=0)
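
Driving the environment directly through the reset/step interface (DummyEnv ignores controller inputs, so an empty dict suffices; this assumes the placeholder Game constructors above exist):

env = DummyEnv(length=10)
gamestates = env.reset()
needs_reset = False
while not needs_reset:
    # real environments expect a Dict[int, Controller] keyed by port
    gamestates, needs_reset = env.step({})
print("episode finished after", env.t, "frames")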

Network Modules

Networks implement recurrent and feedforward architectures with a unified step/unroll API.

Abstract Base

  • Network (abstract):
    • initial_state(batch_size) → RecurrentState
    • step(inputs, prev_state) → (outputs, next_state)

Adding a Custom Network

import tensorflow as tf
from slippi_ai.networks import Network

class MyCNN(Network):
    def __init__(self, name="my_cnn"):
        super().__init__(name=name)
        self.conv = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.flatten = tf.keras.layers.Flatten()
        self.fc = tf.keras.layers.Dense(64)

    def initial_state(self, batch_size):
        # stateless CNN
        return ()

    def step(self, inputs, prev_state):
        # inputs: [batch, H, W, C]
        x = self.conv(inputs)
        x = self.flatten(x)
        return self.fc(x), prev_state

# Integrate into a Policy’s network stack
from slippi_ai.policies import Policy
policy = Policy(network=MyCNN(), ...)

Observation Filters

Filters preprocess raw game states before embedding or training.

Base Class

  • ObservationFilter:
    • apply(step_index, gamestate, controller) → filtered gamestate
    • mask(actions) → masked action set

Custom Filter

from slippi_ai.observations import ObservationFilter

class NoShieldsFilter(ObservationFilter):
    def apply(self, step, gamestate, controller):
        # zero out shield strength
        gamestate[0].shield_strength = 0.0
        return gamestate, controller

# Build a pipeline (AnimationFilter is assumed to be another available filter)
from slippi_ai.observations import build_observation_filter
filter_chain = build_observation_filter(
    filters=[NoShieldsFilter(), AnimationFilter(mask_duration=5)]
)

Policy Modules

Policy ties networks, embeddings, and controller heads into training and sampling routines.

Swapping Components

from slippi_ai.policies import Policy
from slippi_ai.networks import MyCNN
from slippi_ai.controller_heads import MyControllerHead

policy = Policy(
    network=MyCNN(),
    controller_head=MyControllerHead("custom", 8, 2),
    state_embedding=my_state_embedding,  # any Embedding instance
    value_head=my_value_head             # optional custom head
)

Extending Loss Functions

Subclass Policy and override:

class MyPolicy(Policy):
    def imitation_loss(self, logits, target):
        # custom loss on controller logits
        return tf.reduce_mean((logits - target) ** 2)

    def value_loss(self, values, returns):
        # Huber instead of MSE
        return tf.keras.losses.Huber()(returns, values)

By adhering to these interfaces and using subclassing/composition, you can inject new behavior into slippi-ai’s controller sampling, embeddings, environments, networks, filters, and policies without modifying core logic.