## Project Overview
Slippi-AI (Phillip II) trains Super Smash Bros. Melee agents using behavioral cloning on Slippi replays and reinforcement learning in a Dolphin-emulated environment. It provides end-to-end tools for dataset creation, model training, evaluation, and interactive play both locally and online.
### What is Slippi-AI?
Slippi-AI is a framework that:
- Automates Dolphin emulator control for Melee matches
- Extracts game states and frame-perfect inputs from Slippi replay files
- Trains agents via imitation learning (BC) and policy optimization (RL)
- Evaluates AI performance and integrates with netplay for online matches
### Problems It Solves
- Provides a reproducible, high-throughput Melee simulation environment
- Bridges raw Slippi replay data to ML-ready datasets
- Manages emulator crashes and state resets in batch training
- Simplifies evaluation and headless matches without manual setup
### Key Features

- **Emulator Control**
  - Headless Dolphin startup, configurable ports, state snapshots
  - AI/Human controller mapping and Slippi replay injection
- **Environment API**
  - Single (`Environment`), fault-tolerant (`SafeEnvironment`), and batched (`BatchedEnvironment`) wrappers
  - Multiprocessing support for scalable RL training
- **Training Scripts**
  - `scripts/create_dataset.py` for BC data
  - `scripts/train_bc.py` and `scripts/train_rl.py` for model optimization
- **Evaluation & Play**
  - `scripts/evaluate.py` for benchmark matches
  - `scripts/play.py` for interactive or netplay deployment
### When to Use Slippi-AI
- Researching game-playing agent performance in Melee
- Rapid prototyping of imitation or reinforcement learning setups
- Automating large-scale match simulations and benchmarks
- Integrating AI opponents into local or online Melee sessions
### Quick Start

Install dependencies and run a basic behavioral cloning training:

```bash
git clone https://github.com/vladfi1/slippi-ai.git
cd slippi-ai
pip install -r requirements.txt

# Create a dataset from Slippi replays
python scripts/create_dataset.py \
  --input-replays data/slippi/*.slp \
  --output-dataset data/bc_dataset

# Train a behavioral cloning agent
python scripts/train_bc.py \
  --dataset data/bc_dataset \
  --output-model models/bc_agent

# Evaluate the trained agent
python scripts/evaluate.py \
  --model models/bc_agent \
  --matches 100
```
## Getting Started
Follow these steps to spin up a working Slippi-AI environment, run a demo training, and verify your setup.
### 1. Clone the Repository
```bash
git clone https://github.com/vladfi1/slippi-ai.git
cd slippi-ai
```
### 2. Quickstart with Docker

Build the container image:

```bash
docker build \
  -f docker/Dockerfile \
  -t slippi-ai:latest \
  .
```

Run an interactive shell inside the container, mounting your code for iterative development:

```bash
docker run --rm -it \
  -v "$(pwd)":/app \
  -w /app \
  slippi-ai:latest \
  bash
```

Inside the container, install the package in editable mode and run the CI test:

```bash
pip install -e .
bash tests/train_rl.sh
```
### 3. Local Setup (Python 3.9)

Create a virtual environment and install dependencies:

```bash
python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
```
### 4. Run a Demo Training

Use the built-in RL test script to train for a few hundred steps and confirm metrics log to console:

```bash
bash tests/train_rl.sh
```
You should see training loss, reward, and step count printed periodically.
### 5. (Optional) Imitation-Learning Example

If you have a folder of Slippi replays (`.slp`), launch the transformer imitation script on Fox matches with an 18-frame delay:

```bash
bash scripts/imitation_example.sh \
  --data-dir=/path/to/replays \
  --delay=18 \
  --batch-size=64 \
  --eval-interval=500
```
This runs a 3-layer transformer on GPU (if available) and logs evaluation metrics.
### Next Steps

- Generate a full dataset: use your own Slippi replays and the data-processing scripts.
- Train from scratch or continue from checkpoints in `checkpoints/`.
- Launch headless evaluation via `scripts/run_dolphin.py` (requires a GameCube/Wii ROM).
## Data Pipeline & Dataset Creation

This section outlines the end-to-end flow that transforms raw Slippi replay archives (`.zip`, `.7z`) into a filtered, model-ready dataset serialized as Parquet files.
### 1. Parsing Raw Replay Archives

Use `slippi_db.parse_local.run_parsing()` to extract and parse all `.slp` files from archive folders into per-replay pickles.

#### Key Function

```python
run_parsing(
    input_dir: str,           # Directory containing .zip/.7z archives or loose .slp files
    output_dir: str,          # Directory to write parsed subfolders per replay
    num_workers: int = 1,     # Parallel extraction workers
    in_memory: bool = False,  # Extract archives in memory instead of to disk
    overwrite: bool = False,  # Re-parse even if output exists
)
```
#### Example

```python
from slippi_db.parse_local import run_parsing

# Parse all archives under ./raw_replays into ./parsed_replays using 4 workers
run_parsing(
    input_dir="data/raw_replays",
    output_dir="data/parsed_replays",
    num_workers=4,
    in_memory=True,
    overwrite=False,
)
```
After running, each replay folder (`<replay_hash>`) contains:

- `metadata.pickle` (dict from preprocessing)
- `game_frames.pickle` (PyArrow StructArray of frames)
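A quick way to sanity-check a parsed replay is to load both pickles directly. The sketch below assumes only the layout described above (a pickled metadata dict and a PyArrow StructArray of frames); the helper name and printed fields are illustrative.

```python
import pickle
from pathlib import Path

def inspect_parsed_replay(replay_dir: str) -> None:
    """Load and summarize one parsed replay folder (layout as described above)."""
    replay_path = Path(replay_dir)

    # metadata.pickle is a plain dict produced by preprocessing
    with open(replay_path / "metadata.pickle", "rb") as f:
        metadata = pickle.load(f)

    # game_frames.pickle is a PyArrow StructArray of per-frame data
    with open(replay_path / "game_frames.pickle", "rb") as f:
        frames = pickle.load(f)

    print("metadata keys:", sorted(metadata.keys()))
    print("num frames:", len(frames))
    print("frame fields:", [frames.type.field(i).name for i in range(frames.type.num_fields)])

# e.g. inspect_parsed_replay("data/parsed_replays/<replay_hash>")
```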
### 2. Filtering & Metadata Extraction

Filter parsed replays down to 1v1 human matches that meet the training criteria.

#### Metadata Functions

```python
from slippi_db.preprocessing import (
    get_metadata_safe,   # Returns a metadata dict, or tags the failure
    is_training_replay,  # Returns (bool, reason) based on the filtering criteria
)
```
#### Example

```python
import os

from slippi_db.preprocessing import get_metadata_safe, is_training_replay

def filter_parsed_replays(parsed_dir):
    valid = []
    for replay_id in os.listdir(parsed_dir):
        meta = get_metadata_safe(f"{parsed_dir}/{replay_id}/metadata.pickle")
        ok, reason = is_training_replay(meta)
        if ok:
            valid.append(replay_id)
        else:
            print(f"Skipped {replay_id}: {reason}")
    return valid

valid_ids = filter_parsed_replays("data/parsed_replays")
print(f"{len(valid_ids)} replays passed filters")
```
### 3. Building & Packaging the Local Dataset

Use the `make_local_dataset.py` script or its Python API to collect filtered replays into a dataset directory or tar archive.

#### Command-Line Usage

```bash
python -m slippi_db.scripts.make_local_dataset \
  --parsed_dir data/parsed_replays \
  --output_dir data/training_dataset \
  --min_damage 0.1 \
  --max_damage 20.0 \
  --tarfile data/slippi_dataset.tar
```
#### Python API

```python
from slippi_db.scripts.make_local_dataset import make_dataset

make_dataset(
    parsed_dir="data/parsed_replays",
    output_dir="data/training_dataset",
    tarfile="data/slippi_dataset.tar",  # optional .tar output
    min_damage=0.1,
    max_damage=20.0,
    require_winner=True,
)
```
Output layout:

```
training_dataset/
  <replay_id>/metadata.pickle
  <replay_id>/game_frames.pickle
slippi_dataset.tar  (if requested)
```
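If you want to verify this layout programmatically before serialization, a small check like the following works. It assumes only the directory structure shown above; the function name is illustrative.

```python
import os

def check_dataset_layout(dataset_dir: str) -> list:
    """Return replay IDs whose folders contain both expected pickle files."""
    complete = []
    for replay_id in os.listdir(dataset_dir):
        replay_dir = os.path.join(dataset_dir, replay_id)
        if not os.path.isdir(replay_dir):
            continue  # skip stray files such as the optional tarball
        expected = ("metadata.pickle", "game_frames.pickle")
        if all(os.path.exists(os.path.join(replay_dir, name)) for name in expected):
            complete.append(replay_id)
        else:
            print(f"Incomplete replay folder: {replay_id}")
    return complete

print(f"{len(check_dataset_layout('data/training_dataset'))} complete replay folders")
```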
### 4. Parquet Serialization

Convert each replay's `StructArray` into a Parquet file for fast batch reads.

#### Utility: `convert_game`
```python
import os
import pickle

from slippi_db.parsing_utils import convert_game, CompressionType

def serialize_to_parquet(dataset_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for replay_id in os.listdir(dataset_dir):
        # Load parsed frames
        with open(f"{dataset_dir}/{replay_id}/game_frames.pickle", "rb") as f:
            game_array = pickle.load(f)
        # Serialize to Parquet bytes
        pq_bytes = convert_game(
            game=game_array,
            pq_version="2.4",
            compression=CompressionType.SNAPPY,
        )
        # Write the file
        with open(f"{out_dir}/{replay_id}.parquet", "wb") as out:
            out.write(pq_bytes)
```
#### Reading Back

```python
import pyarrow.parquet as pq

table = pq.read_table("data/parquet/abc123.parquet")
# 'root' is a Struct column of frames
frames = table.column("root").flatten()
print(frames.state_action.state.hp.shape)  # (num_frames,)
```
### 5. End-to-End Example Script

```python
import os
import pickle

from slippi_db.parse_local import run_parsing
from slippi_db.preprocessing import get_metadata_safe, is_training_replay
from slippi_db.scripts.make_local_dataset import make_dataset
from slippi_db.parsing_utils import convert_game, CompressionType

# 1. Parse archives
run_parsing("data/raw_replays", "data/parsed_replays", num_workers=4, in_memory=True)

# 2. Build the filtered dataset
make_dataset("data/parsed_replays", "data/training_dataset", tarfile=None)

# 3. Serialize each replay to Parquet
os.makedirs("data/parquet", exist_ok=True)
for rid in os.listdir("data/training_dataset"):
    # Load frames
    with open(f"data/training_dataset/{rid}/game_frames.pickle", "rb") as f:
        game = pickle.load(f)
    pq_bytes = convert_game(game, pq_version="2.4", compression=CompressionType.ZSTD, compression_level=5)
    with open(f"data/parquet/{rid}.parquet", "wb") as out:
        out.write(pq_bytes)
```
This pipeline yields a ready-to-consume Parquet dataset for downstream model training.
## Training Workflows

Launch and monitor all supported learning regimes: Imitation Learning, Q-Learning, and PPO Reinforcement Learning. Each workflow initializes Weights & Biases logging, configurable via `--config.*` and `--wandb.*` flags.
### Imitation Learning (scripts/train.py)

Train a policy by mimicking replay data, with checkpointing and evaluation built in.

#### Quickstart

```bash
# Start training with default imitation settings, WandB offline
python scripts/train.py \
  --wandb.mode=offline
```
#### Key Flags

- `--config.data.replays`: paths or glob for replay files
- `--config.batch_size`: frames per minibatch (default 8)
- `--config.learning_rate`: Adam learning rate for the policy (default 1e-4)
- `--config.max_steps`: total training steps
- `--config.checkpoint_dir`: directory to save/restore model checkpoints
- `--wandb.project`: WandB project name
- `--wandb.name`: run name (defaults to `config.tag`)
- `--wandb.mode`: `online` / `offline` / `disabled`
#### Example: Custom Dataset & Logging

```bash
python scripts/train.py \
  --config.data.replays="data/train/*.slp" \
  --config.batch_size=16 \
  --config.learning_rate=5e-5 \
  --config.max_steps=500000 \
  --config.checkpoint_dir="checkpoints/imit" \
  --config.tag="fox_imitation_v2" \
  --wandb.project="slippi-il" \
  --wandb.mode=online
```
Logs, metrics, and checkpoints appear under `checkpoints/imit` and in your WandB dashboard.
### Q-Learning (scripts/train_q.py & slippi_ai/train_q_lib)

Run a tabular or function-approximation Q-learning agent on Slippi environments.

#### Quickstart

```bash
# WandB disabled, local logging only
python -m scripts.train_q \
  --wandb.mode=disabled
```
#### Key Flags

- `--config.learning_rate`: learning rate for Q-network updates
- `--config.reward_halflife`: controls the discount factor, γ ≈ 0.5^(1/(halflife×60)) (see the sketch after this list)
- `--config.batch_size`: minibatch size for experience replay
- `--config.buffer_size`: replay buffer capacity
- `--config.target_update_interval`: steps between target-network syncs
- `--wandb.project`, `--wandb.group`, `--wandb.name`, `--wandb.mode`
- `--wandb.dir`: local log directory
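The relationship between `reward_halflife` (in seconds, at Melee's 60 frames per second) and the per-frame discount factor in the formula above can be sanity-checked with a couple of lines of Python; the helper name below is purely illustrative.

```python
def discount_from_halflife(reward_halflife_seconds: float, fps: int = 60) -> float:
    """Per-frame discount gamma such that reward weight halves after reward_halflife_seconds."""
    return 0.5 ** (1.0 / (reward_halflife_seconds * fps))

# With reward_halflife=8, as in the example below:
print(discount_from_halflife(8))  # ~0.99856, i.e. the discount halves every 480 frames
```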
#### Examples

```bash
# Online logging to WandB with a custom experiment group
python -m scripts.train_q \
  --config.learning_rate=1e-4 \
  --config.reward_halflife=8 \
  --wandb.mode=online \
  --wandb.project="slippi-q" \
  --wandb.group="run_alpha" \
  --wandb.name="alpha-001"

# Offline logging to disk
python -m scripts.train_q \
  --config.batch_size=64 \
  --config.buffer_size=100000 \
  --wandb.mode=offline \
  --wandb.dir="/mnt/logs/q_learning"
```
All flags under `--config.*` map directly to `train_q_lib.Config` fields.
### PPO Reinforcement Learning (scripts/rl_example.sh & slippi_ai/rl/run.py)

Launch parallel PPO training with Dolphin emulator environments, an actor-learner architecture, and optional self-play.

#### Quickstart via Shell Wrapper

```bash
# Ensure the required environment variables are set
export DOLPHIN_PATH=/path/to/dolphin-emu
export ISO_PATH=/path/to/SSBM.iso

# Run the default Fox training
bash scripts/rl_example.sh
```
Override settings inline:

```bash
bash scripts/rl_example.sh \
  --config.actor.num_envs=128 \
  --config.actor.rollout_length=256 \
  --config.learner.learning_rate=2e-5 \
  --wandb.mode=online \
  --wandb.project="slippi-ppo" \
  --wandb.name="fox_ppo_run"
```
#### Direct Python Invocation

```bash
python slippi_ai/rl/run.py \
  --config.actor.num_envs=64 \
  --config.actor.rollout_length=200 \
  --config.learner.ppo.epsilon=0.2 \
  --config.opponent.type=SELF \
  --config.opponent.update_interval=50 \
  --config.runtime.max_step=2e7 \
  --config.runtime.tag="marth_selfplay" \
  --wandb.mode=online \
  --wandb.project="slippi-ppo" \
  --wandb.name="marth_sp_001"
```
#### Key Flag Categories

**Actor**

- `--config.actor.num_envs`, `rollout_length`, `inner_batch_size`, `gpu_inference`

**Learner & PPO**

- `--config.learner.learning_rate`, `value_cost`, `policy_gradient_weight`
- `--config.learner.ppo.epsilon`, `beta`, `num_epochs`, `num_batches`

**Opponent & Teacher**

- `--config.opponent.type` (`CPU` / `SELF` / `OTHER`)
- `--config.opponent.update_interval`, `train` (self-play schedules)
- `--config.teacher` (path to a pretrained imitation model)

**Runtime & Logging**

- `--config.runtime.max_step`, `log_interval`, `checkpoint_interval`
- `--wandb.*` flags for project, mode, and tags
#### Monitoring & Tuning Tips

- Increase `num_envs` until CPU/RAM saturates, then bump `rollout_length` to fill the GPU.
- Match `inner_batch_size` to CPU threads: `num_envs / inner_batch_size ≃ #threads` (see the sketch after this list).
- Watch `actor_kl` in WandB; reduce `ppo.epsilon` or the learning rate if the mean KL exceeds your threshold.
- Adjust `reward_halflife` to bias toward short- vs. long-term rewards.
- Use `--wandb.mode=offline` for unattended runs, then `wandb sync` later.
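The thread-matching rule above is easy to check before launching a long run. The snippet below is illustrative only; it just evaluates the ratio against the local CPU count, and the example values are not defaults.

```python
import os

def batches_per_thread(num_envs: int, inner_batch_size: int) -> float:
    """Ratio from the tuning tip above: num_envs / inner_batch_size, ideally ~= CPU threads."""
    return num_envs / inner_batch_size

num_envs, inner_batch_size = 128, 8  # example values
threads = os.cpu_count() or 1
ratio = batches_per_thread(num_envs, inner_batch_size)
print(f"num_envs / inner_batch_size = {ratio:.1f}, CPU threads = {threads}")
if ratio > threads:
    print("Consider increasing inner_batch_size (or reducing num_envs).")
```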
These workflows deliver end-to-end training, logging, checkpointing, and evaluation across all supported regimes.
## Evaluation & Gameplay

Run trained Slippi-AI agents in Dolphin or in headless loops. These scripts support single- or multi-agent rollouts, agent-vs-agent or human matches, online play tests, and performance benchmarking.
### Running Policy Evaluations with run_evaluator.py

`run_evaluator.py` provides a one-stop CLI for rolling out trained agents (single or multi-agent), measuring rewards and FPS, and optionally parallelizing via Ray.
#### Key Flags

- `--dolphin`: JSON overrides for `DolphinConfig` (e.g. `--dolphin.headless=True,infinite_time=False,online_delay=2`)
- `--rollout_length`: frames per rollout (default 3600)
- `--num_envs`: environments per worker
- `--fake_envs`: replayed matches instead of live Dolphin
- `--async_envs`: non-blocking environment stepping
- `--num_env_steps` + `--inner_batch_size`: batch size for async envs
- `--use_gpu`: run inference on GPU
- `--num_agent_steps`: batch agent inferences
- `--num_workers`: number of Ray workers
- `--agent`: agent-specific flags (model_dir, jit_compile, etc.)
- `--self_play`: agent plays both ports
- `--opponent`: opponent agent flags
#### How It Works

1. Load agent parameters via `eval_lib.load_state`.
2. Build `dolphin_kwargs` from `--dolphin` and player assignments.
3. Instantiate `evaluators.Evaluator` (single-process) or `evaluators.RayEvaluator`.
4. Enter `with evaluator.run():` to start environments and agents.
5. Perform burn-in, then the main rollout.
6. Log KO-diff/min, component timings (`env_pop`, `agent_step`), FPS, and SPS.
#### Examples

Single env, CPU only:

```bash
python scripts/run_evaluator.py \
  --agent.model_dir=/path/to/checkpoint \
  --rollout_length=600 \
  --num_envs=1
```

Four async envs, GPU inference:

```bash
python scripts/run_evaluator.py \
  --agent.model_dir=/path/to/checkpoint \
  --num_envs=4 \
  --async_envs \
  --num_env_steps=64 \
  --inner_batch_size=16 \
  --use_gpu
```

Self-play:

```bash
python scripts/run_evaluator.py \
  --agent.model_dir=/path/to/checkpoint \
  --self_play \
  --num_envs=2
```

Distributed with 4 Ray workers:

```bash
python scripts/run_evaluator.py \
  --agent.model_dir=/path/to/checkpoint \
  --num_envs=8 \
  --num_workers=4
```
### eval_two.py: Automated Matches Between Agents or Against a Human

Quickly pit two AI agents, or an AI against a human, in Dolphin, collect game stats, and profile inference time.

#### Essential CLI Usage

Agent vs. agent:

```bash
python scripts/eval_two.py \
  --dolphin.path=/path/to/slippi-dolphin \
  --dolphin.iso=/path/to/SSBM.iso \
  --p1.ai.path=/path/to/agent1 \
  --p2.ai.path=/path/to/agent2 \
  --num_games=50
```

Human vs. agent (human on port 1):

```bash
python scripts/eval_two.py \
  --dolphin.path=/path/to/slippi-dolphin \
  --dolphin.iso=/path/to/SSBM.iso \
  --p1.type=human \
  --p2.ai.path=/path/to/agent \
  --num_games=10
```
#### Key Flags

- `dolphin.path`, `dolphin.iso`: Dolphin executable and ISO (or use `DOLPHIN_PATH` / `ISO_PATH`)
- `p<n>.type`: `ai` or `human`; for `ai`, supply `p<n>.ai.path` and optionally `p<n>.ai.async_inference` (defaults to `True`)
- `num_games`: total matches (runs indefinitely if omitted)
- `--dolphin.headless`, `--dolphin.online_delay`, `--dolphin.save_replays`, `--dolphin.replay_dir`
#### Workflow

1. Parse flags into a `DolphinConfig` and player specs via `eval_lib.get_player()`.
2. Build and `.start()` each `eval_lib.Agent`.
3. Launch Dolphin, auto-navigate the menus, then loop `for state in dolphin.iter_gamestates():`, where `agent.step(state)` produces controller inputs (sketched below).
4. Log the average inference time every 15 seconds.
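The main loop in step 3 can be expressed generically as follows. This is a hedged sketch, not the repository's implementation: it assumes `dolphin` and the agents were already constructed and started as in steps 1-2, and the function name and `report_interval` parameter are placeholders.

```python
import time

def run_match_loop(dolphin, agents, report_interval=15.0):
    """Step every agent on each game state and periodically report average inference time."""
    last_report = time.time()
    step_times = []
    for state in dolphin.iter_gamestates():
        for agent in agents:
            start = time.time()
            agent.step(state)  # produces controller inputs for this frame
            step_times.append(time.time() - start)
        if time.time() - last_report > report_interval:
            print(f"avg inference time: {sum(step_times) / len(step_times):.4f}s")
            step_times.clear()
            last_report = time.time()
```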
#### Tips

- If Dolphin fails to connect, verify the paths.
- Simulate lag with `--dolphin.online_delay`.
- Use `--p<n>.ai.async_inference=False` for synchronous profiling.
- For CI: `--dolphin.headless=True --dolphin.save_replays=True --dolphin.replay_dir=/tmp/replays`.
- `eval_two.py` disables GPUs by default; ensure your model loader respects `eval_lib.disable_gpus()`.
### netplay.py: Online Play Testing and Controller Verification

Test a trained agent in a live or headless Dolphin session, and optionally verify controller inputs over time.

#### Common Flags

- `--dolphin.path`, `--dolphin.iso`
- `--ai.model_dir`: checkpoint directory
- `--ai.async_inference`: enable non-blocking inference (default `True`)
- `--runtime`: seconds to run (default 300)
- `--verify_inputs`: assert that the controller state changes each frame
- `--headless`: run without a GUI
- `--save_replays`, `--replay_dir`: store Slippi replays
#### Example: 5-Minute Headless Test with Input Verification

```bash
python scripts/netplay.py \
  --dolphin.path=/usr/bin/slippi-dolphin \
  --dolphin.iso=/games/SSBM.iso \
  --ai.model_dir=/models/checkpoint \
  --runtime=300 \
  --verify_inputs \
  --headless \
  --save_replays \
  --replay_dir=/tmp/replays
```
#### What It Does

1. Configures Dolphin via `slippi_ai.dolphin.DolphinConfig`.
2. Builds and starts the agent with `eval_lib.build_agent()`.
3. Steps Dolphin frames, calling `agent.step()` each frame.
4. If `--verify_inputs` is set, compares consecutive controller reports and errors if nothing changes (see the sketch below).
5. On exit, optionally saves the replay.
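The input-verification idea in step 4 is simple to express generically: keep the previous frame's controller state and fail if it never changes. The sketch below is not the repository's implementation; the `get_controller_state` callable and the `window` size are placeholders.

```python
from typing import Any, Callable

def verify_inputs(get_controller_state: Callable[[], Any], num_frames: int, window: int = 60) -> None:
    """Raise if the observed controller state stays identical for `window` consecutive frames."""
    prev = get_controller_state()
    unchanged = 0
    for _ in range(num_frames):
        current = get_controller_state()  # called once per emulated frame
        if current == prev:
            unchanged += 1
            if unchanged >= window:
                raise RuntimeError(f"Controller state unchanged for {window} frames")
        else:
            unchanged = 0
        prev = current
```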
### run_dolphin.py: Dolphin Emulator Benchmarking

Launch multiple Dolphin instances with AI or CPU players to measure emulator performance in frames per second.

#### Key Flags

- `--dolphin.path`, `--dolphin.iso`
- `--num_instances`: parallel Dolphin processes (default 1)
- `--num_frames`: frames per instance (default 10000)
- `--render`: enable graphics (default False, i.e. headless)
- `--p<n>.type`: `ai`, `cpu`, or `human`
- `--p<n>.ai.path`: model directory for AI players
- `--headless`, `--save_replays`, `--replay_dir`
#### Example: Benchmark 4 Headless Instances, 20k Frames Each

```bash
python scripts/run_dolphin.py \
  --dolphin.path=/usr/bin/slippi-dolphin \
  --dolphin.iso=/games/SSBM.iso \
  --num_instances=4 \
  --num_frames=20000 \
  --p1.type=ai --p1.ai.path=/models/a1 \
  --p2.type=cpu \
  --headless
```
#### Output

- Per-instance FPS
- Aggregate average FPS
Use these scripts to automate evaluation loops, human-involved matches, online play tests, and performance benchmarks for Slippi AI agents.
## Architecture & Extensibility
This section explores the core abstractions in slippi-ai and shows how to extend or replace modules safely. Each subsystem defines clear interfaces—use subclassing and composition to inject custom logic.
### Controller Heads

Controller heads encapsulate how discrete controller actions are embedded, sampled, and compared.

#### Core Interfaces

`ControllerHeadBase` (abstract):

- `size` property
- `embed(controller: Controller) → np.ndarray`
- `sample(logits: tf.Tensor, **kwargs) → Controller`
- `distance(logits1, logits2) → tf.Tensor`

Provided implementations: `IndependentControllerHead`, `AutoregressiveControllerHead`
#### Extending with a Custom Head

```python
import numpy as np
import tensorflow as tf

from slippi_ai.controller_heads import ControllerHeadBase

class MyControllerHead(ControllerHeadBase):
    def __init__(self, name, button_dim, stick_dim):
        super().__init__(name)
        self.button_dim = button_dim
        self.stick_dim = stick_dim

    @property
    def size(self):
        return self.button_dim + self.stick_dim

    def embed(self, controller):
        # Pack discrete buttons and continuous sticks into one flat vector.
        btns = controller.buttons.astype(np.float32)
        stick = controller.stick / 1.0
        return np.concatenate([btns, stick], axis=-1)

    def sample(self, logits, temperature=1.0):
        # Custom temperature scaling on the button logits.
        probs = tf.nn.softmax(logits[..., :self.button_dim] / temperature)
        btn_sample = tf.random.categorical(tf.math.log(probs), 1)
        stick_pred = logits[..., self.button_dim:]
        return self._reconstruct_controller(btn_sample, stick_pred)

    def distance(self, a, b):
        # L2 on sticks + cross-entropy on buttons.
        stick_a, stick_b = a[..., self.button_dim:], b[..., self.button_dim:]
        btn_a, btn_b = a[..., :self.button_dim], b[..., :self.button_dim]
        stick_dist = tf.reduce_sum((stick_a - stick_b) ** 2, axis=-1)
        btn_ce = tf.reduce_sum(-btn_a * tf.math.log(btn_b + 1e-6), axis=-1)
        return stick_dist + btn_ce

    def _reconstruct_controller(self, btn_idx, stick_vec):
        # Convert model outputs back into a Controller namedtuple.
        from slippi_ai.types import Controller
        btn_onehot = tf.one_hot(tf.squeeze(btn_idx, -1), self.button_dim)
        return Controller(buttons=btn_onehot.numpy(), stick=stick_vec.numpy())

# Registration: swap in your head when building a Policy
from slippi_ai.policies import Policy

policy = Policy(controller_head=MyControllerHead("my_head", 10, 2), ...)
```
### Embeddings

Embeddings map structured game state to flat tensors and back.

#### Base Class

`Embedding` (abstract):

- `size` property
- `__call__(np_array) → tf.Tensor`
- `from_state(python_struct) → np.ndarray`
- `sample(logits, **kwargs)`
- `distance(a, b)`
#### Composing Structs

Use `StructEmbedding` (or the helpers `ordered_struct_embedding` / `dict_embedding`) to combine field-level embeddings:

```python
from slippi_ai.embed import FloatEmbedding, BoolEmbedding, ordered_struct_embedding
from slippi_ai.types import Player

pct = FloatEmbedding("percent", scale=0.01)
facing = BoolEmbedding("facing", off=-1.)

player_emb = ordered_struct_embedding(
    name="player",
    embedding=[
        ("percent", pct),
        ("facing", facing),
        # …
    ],
    nt_type=Player,
)

# Embed a Player state:
np_player = player_emb.from_state(parsed_namedtuple)
tf_tensor = player_emb(np_player)
```
#### Custom Embedding

```python
import numpy as np
import tensorflow as tf

from slippi_ai.embed import Embedding

class LogScaleEmbedding(Embedding):
    def __init__(self, name, base=10.):
        super().__init__(name)
        self.base = base

    @property
    def size(self):
        return 1

    def from_state(self, val: float):
        return np.array([np.log(val + 1) / np.log(self.base)], dtype=np.float32)

    def __call__(self, arr: np.ndarray):
        return tf.convert_to_tensor(arr)

    def distance(self, a, b):
        return tf.abs(a - b)
```
### Environment Backends

The environment API supports single, batched, safe, and multiprocessing variants.

#### Core Interface

`Environment` (abstract):

- `reset() → Dict[int, Game]`
- `step(controllers: Dict[int, Controller]) → (gamestates, needs_reset)`
#### Extending or Replacing

```python
from typing import Dict

from slippi_ai.envs import Environment
from slippi_ai.types import Controller, Game

class DummyEnv(Environment):
    def __init__(self, length=100):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return {0: Game.zero()}

    def step(self, controllers: Dict[int, Controller]):
        self.t += 1
        state = Game.random()
        done = self.t >= self.length
        return {0: state}, done

# Plug into AsyncEnvMP
from slippi_ai.envs import AsyncEnvMP

env = AsyncEnvMP(env_class=DummyEnv, num_envs=1)
```
### Network Modules

Networks implement recurrent and feedforward architectures with a unified `step` / `unroll` API.

#### Abstract Base

`Network` (abstract):

- `initial_state(batch_size) → RecurrentState`
- `step(inputs, prev_state) → (outputs, next_state)`
#### Adding a Custom Network

```python
import tensorflow as tf

from slippi_ai.networks import Network

class MyCNN(Network):
    def __init__(self, name="my_cnn"):
        super().__init__(name=name)
        self.conv = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.flatten = tf.keras.layers.Flatten()
        self.fc = tf.keras.layers.Dense(64)

    def initial_state(self, batch_size):
        # Stateless CNN: no recurrent state to carry.
        return ()

    def step(self, inputs, prev_state):
        # inputs: [batch, H, W, C]
        x = self.conv(inputs)
        x = self.flatten(x)
        return self.fc(x), prev_state

# Integrate into a Policy's network stack
from slippi_ai.policies import Policy

policy = Policy(network=MyCNN(), ...)
```
### Observation Filters

Filters preprocess raw game states before embedding or training.

#### Base Class

`ObservationFilter`:

- `apply(step_index, gamestate, controller)` → filtered gamestate
- `mask(actions)` → masked action set
#### Custom Filter

```python
from slippi_ai.observations import ObservationFilter

class NoShieldsFilter(ObservationFilter):
    def apply(self, step, gamestate, controller):
        # Zero out shield strength before the state is embedded.
        gamestate[0].shield_strength = 0.0
        return gamestate, controller

# Build a pipeline
from slippi_ai.observations import build_observation_filter, AnimationFilter

filter_chain = build_observation_filter(
    filters=[NoShieldsFilter(), AnimationFilter(mask_duration=5)]
)
```
### Policy Modules

`Policy` ties networks, embeddings, and controller heads into training and sampling routines.

#### Swapping Components

```python
from slippi_ai.policies import Policy

# MyCNN and MyControllerHead are the custom classes defined above.
policy = Policy(
    network=MyCNN(),
    controller_head=MyControllerHead("custom", 8, 2),
    state_embedding=my_state_embedding,  # any Embedding instance
    value_head=my_value_head,            # optional custom value head
)
```
#### Extending Loss Functions

Subclass `Policy` and override the loss methods:

```python
import tensorflow as tf

class MyPolicy(Policy):
    def imitation_loss(self, logits, target):
        # Custom loss on the controller logits.
        return tf.reduce_mean((logits - target) ** 2)

    def value_loss(self, values, returns):
        # Huber loss instead of MSE.
        return tf.keras.losses.Huber()(returns, values)
```
By adhering to these interfaces and using subclassing and composition, you can inject new behavior into slippi-ai's controller sampling, embeddings, environments, networks, filters, and policies without modifying core logic.