PodMotion is alpha software (v0.1.0-alpha). APIs and behaviors may change without notice. Not recommended for production workloads.

Getting Started

PodMotion migrates live Kubernetes pods between nodes within a single cluster using CRIU checkpoint/restore. This guide takes you from a fresh cluster to a verified migration driven entirely by the PodMigration custom resource and kubectl apply.

WARNING

PodMotion is alpha software (v0.1.0-alpha). It is not production-ready. The proof corpus is reproduced on arm64 (Ubuntu 24.04 / kernel 6.8 / flannel VXLAN); amd64 is buildable but the amd64/kind path is not yet proven.

NOTE

The only shipped user interface is the PodMigration CRD applied with kubectl apply. There is no kubectl podmotion plugin — a CLI is described in docs/kubectl-migrate-proposal.md (Status: Draft) but is not shipped. See the CRD / Interface Reference for the proposal status.

Prerequisites

Before you begin, confirm every node and the cluster meet the baselines from the project README:

  • Kubernetes 1.25+ and a matching kubectl
  • Go 1.24.6+ and Docker 17.03+ (only to build images)
  • Linux kernel 5.15+ on every node (Ubuntu 22.04 HWE minimum; 6.8+ preferred)
  • CRIU 4.x installed on every node (4.2 used for iterative pre-copy)
  • A bpfman DaemonSet running on the cluster
  • PostCopy mode additionally requires CONFIG_USERFAULTFD=y on the destination kernel

A flannel VXLAN overlay is the proven, documented CNI environment. TCP source-IP preservation (spec.sourcePodIP) is Calico/Cilium-only.
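The node-level baselines above can be spot-checked with a short script run on each node. This is a minimal sketch, not a tool shipped with PodMotion: the function names are hypothetical, and it assumes GNU sort -V is available, that criu --version prints a line of the form "Version: X.Y", and that the kernel config is readable at /boot/config-$(uname -r) or /proc/config.gz.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of PodMotion.

# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_ok [RELEASE]: the kernel release (e.g. 6.8.0-45-generic) must be 5.15+.
kernel_ok() {
  release="${1:-$(uname -r)}"
  version_ge "${release%%-*}" "5.15"
}

# criu_ok: CRIU must be installed and report major version 4.
# Assumes "criu --version" prints a line like "Version: 4.0".
criu_ok() {
  command -v criu >/dev/null 2>&1 || return 1
  ver="$(criu --version 2>/dev/null | awk '/^Version:/ {print $2}')"
  [ "${ver%%.*}" = "4" ]
}

# userfaultfd_ok: only matters on the destination node for PostCopy mode.
userfaultfd_ok() {
  cfg="/boot/config-$(uname -r)"
  if [ -r "$cfg" ]; then
    grep -q '^CONFIG_USERFAULTFD=y' "$cfg"
  elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep -q '^CONFIG_USERFAULTFD=y'
  else
    return 1
  fi
}

# preflight: prints one line per check; call it manually on a node.
preflight() {
  if kernel_ok; then echo "kernel: ok"; else echo "kernel: older than 5.15"; fi
  if criu_ok; then echo "criu: ok"; else echo "criu: missing or not 4.x"; fi
  if userfaultfd_ok; then echo "userfaultfd: ok"; else echo "userfaultfd: not confirmed (PostCopy only)"; fi
}
```

Source the script and run preflight on each node. The Kubernetes version and the bpfman DaemonSet are cluster-level checks and are not covered here.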

Install

Install the CRDs and deploy the manager and node-agent DaemonSet:

# Install the 5 CRDs (podmigrations, migrationpolicies,
# nodemigrationcapacities, migrationcheckpoints, migrationwebhookconfigs)
make install

# Deploy the controller-manager
make deploy IMG=<your-registry>/podmotion:<tag>

Alternatively, build and apply the release bundle:

make build-installer IMG=<your-registry>/podmotion:<tag>
kubectl apply -f dist/install.yaml

NOTE

The repository includes a Helm chart at charts/podmotion (v0.1.0). It is present in the source tree but not yet published to a public chart repository. To use it, clone the repo and install from the local path:

helm install podmotion ./charts/podmotion -n podmotion-system --create-namespace

A successful install gives you the controller-manager Deployment, the node-agent DaemonSet, the admission webhooks, and the five migration.podmotion.io CRDs.
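One way to confirm those pieces landed is to count the CRDs in the migration.podmotion.io group and list the workloads. The sketch below is illustrative only: count_podmotion_crds is a hypothetical helper, and the podmotion-system namespace is taken from the Helm command in this guide, so an install via make deploy may use a different one.

```shell
# Hypothetical helper: counts CRD names on stdin that belong to the
# migration.podmotion.io API group (one name per line).
count_podmotion_crds() {
  grep -c '\.migration\.podmotion\.io$'
}

# Usage against a live cluster (the count should be 5):
#   kubectl get crd -o name | count_podmotion_crds
#   kubectl get deployment,daemonset -n podmotion-system   # namespace is an assumption
```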

Your First Migration

A minimal PodMigration requires podName and podNamespace:

apiVersion: migration.podmotion.io/v1alpha1
kind: PodMigration
metadata:
  name: migrate-my-app
  namespace: default
spec:
  podName: my-app-pod          # required
  podNamespace: default        # required
  targetNodeName: worker-2     # optional; scheduler picks if empty
  mode: PreCopy                # PreCopy (default) | PostCopy | Cold
  # tcpPreservationMode defaults to None (no TCP continuity).
  # Opt in with Strict or BestEffort — see below.

Apply it:

kubectl apply -f migrate-my-app.yaml

The CRD has shortName pm and category podmotion, so you can list migrations with kubectl get pm or kubectl get podmotion.

TCP connection continuity is opt-in

WARNING

spec.tcpPreservationMode defaults to None, which skips TCP verification entirely — in-flight TCP connections are not preserved. This is the conservative default (ADR-0041). Zero-connection-loss behavior is opt-in.

To preserve TCP connections, set the mode explicitly:

  Mode            Behavior
  None (default)  No TCP continuity; TCPVerifying phase skipped. Stateless workloads.
  Strict          Hard rollback if TCPSequenceContinuityVerified=False (ADR-0019 Amendment A).
  BestEffort      Warning condition only; no rollback.

The full TCP-first pipeline runs only for Strict and BestEffort.
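As an illustration, the minimal manifest from earlier can be extended with an explicit tcpPreservationMode. The sketch below writes such a manifest to a file; the resource name and file name are placeholders, and the spec fields are the ones described in this guide.

```shell
# Write a PodMigration that opts in to strict TCP verification.
# Names are placeholders; adjust podName, podNamespace, and namespace.
cat > migrate-my-app-strict.yaml <<'EOF'
apiVersion: migration.podmotion.io/v1alpha1
kind: PodMigration
metadata:
  name: migrate-my-app-strict
  namespace: default
spec:
  podName: my-app-pod
  podNamespace: default
  mode: PreCopy
  tcpPreservationMode: Strict   # roll back if TCP sequence continuity is not verified
EOF
```

Apply it with kubectl apply -f migrate-my-app-strict.yaml. Switching the value to BestEffort keeps the TCP pipeline but reports a warning condition instead of rolling back.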

Verify It Worked

Watch the migration status:

kubectl get podmigration migrate-my-app -o yaml

Track status.phase. The full MigrationPhase enum has 22 values — the list below is a partial selection of the most observable phases. See api/v1alpha1/podmigration_types.go for the complete enum.

Common phases: Pending, Validating, Checkpointing, PreCopyMemory, SocketInventory, ZeroWindowArm, OverlayHandoff, Transferring, Restoring, RestoreSocketReattach, DisengageHold, TCPVerifying, ServiceVerifying, CutoverComplete, Complete, DryRunComplete, Failed, RollingBack.
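For scripted runs, polling status.phase until it settles can replace reading the YAML by hand. The following is a rough sketch, not PodMotion tooling: both function names are hypothetical, and it assumes Complete, DryRunComplete, and Failed are the terminal phases, which should be checked against the enum in api/v1alpha1/podmigration_types.go.

```shell
# is_terminal_phase PHASE: succeeds for phases this sketch treats as final.
# Assumption: Complete, DryRunComplete, and Failed are terminal.
is_terminal_phase() {
  case "$1" in
    Complete|DryRunComplete|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_migration NAME [NAMESPACE] [INTERVAL_SECONDS]
# Polls status.phase until it is terminal; fails if kubectl errors
# or the migration ends in Failed.
wait_for_migration() {
  name="$1"; ns="${2:-default}"; interval="${3:-2}"
  while :; do
    phase="$(kubectl get podmigration "$name" -n "$ns" -o jsonpath='{.status.phase}')" || return 1
    echo "phase: ${phase:-<unset>}"
    if is_terminal_phase "$phase"; then
      break
    fi
    sleep "$interval"
  done
  [ "$phase" != "Failed" ]
}
```

For example, wait_for_migration migrate-my-app default 5 prints each observed phase every five seconds and returns non-zero if the migration fails.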

Inspect status.conditions. The relevant exported condition types are:

  • TCPVerified — aggregate TCP gate (only populated for Strict/BestEffort)
  • TCPSequenceContinuityVerified — seq_delta=0 (zero dropped TCP sequence numbers)
  • ServiceVerified — Service-level reachability confirmed
  • SocketsLive / ProcessResumedAfterSocketsLive — SIGCONT-after-sockets-live ordering invariant
  • ConnectionCountParity, OverlayReady, ZeroWindowArmed, ZeroWindowDisarmed
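The conditions listed above can be pulled out with a kubectl JSONPath query and checked in a script. This sketch assumes status.conditions uses the usual Kubernetes condition shape with type and status fields; the two helper names are hypothetical.

```shell
# print_conditions NAME [NAMESPACE]: emits one "Type=Status" line per condition.
# Assumes each condition carries .type and .status fields.
print_conditions() {
  kubectl get podmigration "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# condition_is_true TYPE: reads "Type=Status" lines on stdin and succeeds
# only when the named condition is reported as True.
condition_is_true() {
  grep -qx "$1=True"
}

# Usage against a live cluster:
#   print_conditions migrate-my-app | condition_is_true TCPSequenceContinuityVerified
```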

For a dry run that produces a DryRunEstimate without moving the pod, set spec.dryRun: true; the migration ends in phase DryRunComplete.
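A dry-run request might look like the sketch below, which reuses the minimal manifest and adds spec.dryRun. The resource and file names are placeholders.

```shell
# Write a dry-run PodMigration; no pod is moved when dryRun is true.
cat > migrate-my-app-dryrun.yaml <<'EOF'
apiVersion: migration.podmotion.io/v1alpha1
kind: PodMigration
metadata:
  name: migrate-my-app-dryrun
  namespace: default
spec:
  podName: my-app-pod
  podNamespace: default
  dryRun: true   # expected to end in phase DryRunComplete
EOF
```

After kubectl apply -f migrate-my-app-dryrun.yaml, inspect the resource with kubectl get podmigration migrate-my-app-dryrun -o yaml to read the resulting estimate.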

Security

PodMotion is a privileged operator. Read this before installing on any cluster you care about.

  • Privileged node-agent DaemonSet (config/agent/daemonset.yaml): runs with hostPID: true, seccompProfile Unconfined (not RuntimeDefault), and tolerates all taints (operator: Exists), so it runs on every node including control-plane nodes. It drops ALL capabilities, then adds: SYS_PTRACE, SYS_ADMIN, NET_ADMIN, DAC_OVERRIDE, CHOWN, SETUID, SETGID, CHECKPOINT_RESTORE. There is no privileged: true flag, but this posture is effectively highly privileged. It hostPath-mounts the host CRIU binary (/usr/sbin/criu, read-only) and /var/lib/podmotion/checkpoints.
  • Checkpoint images contain process memory. CRIU checkpoints are full memory dumps and may contain secrets, tokens, and in-flight data. Protect the checkpoint store and any image registry that holds them accordingly.
  • Transport security is opt-in. The agent gRPC endpoint (:9090) can fall back to plaintext; mTLS is opt-in, not the default. Enable it before using PodMotion across untrusted networks.
  • The controller-manager itself is hardened by contrast (runAsNonRoot, seccomp RuntimeDefault, read-only root filesystem, all capabilities dropped).
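Before trusting a deployed build, it can be worth comparing the capabilities the node-agent actually adds against the list documented above. The helper below is a hypothetical sketch: the capability list is copied from this section, and the podmotion-system namespace in the usage comment is an assumption taken from the Helm example.

```shell
# Capabilities this guide documents for the node-agent DaemonSet.
DOCUMENTED_CAPS="SYS_PTRACE SYS_ADMIN NET_ADMIN DAC_OVERRIDE CHOWN SETUID SETGID CHECKPOINT_RESTORE"

# unexpected_caps: reads one capability per line on stdin and prints
# any that are not in the documented list. Empty output means no surprises.
unexpected_caps() {
  # "|| [ -n "$cap" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r cap || [ -n "$cap" ]; do
    [ -z "$cap" ] && continue
    case " $DOCUMENTED_CAPS " in
      *" $cap "*) ;;                  # documented, ignore
      *) printf '%s\n' "$cap" ;;      # not documented, report
    esac
  done
}

# Usage against a live cluster (namespace is an assumption):
#   kubectl get daemonset -n podmotion-system \
#     -o jsonpath='{.items[*].spec.template.spec.containers[*].securityContext.capabilities.add[*]}' \
#     | tr ' ' '\n' | unexpected_caps
```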

See SECURITY.md in the repository for the full security policy and disclosure process.