Virtualization Layers¶

The testbed is built as a stack of five abstraction layers, each providing a different level of isolation and programmability. Understanding this stack is essential for debugging, scaling, and reasoning about what happens when something fails.

graph BT
    L1["Layer 1 · Physical Host
    VT-x/AMD-V · Linux/macOS/Windows"]
    L2["Layer 2 · Virtual Machines
    Vagrant + VirtualBox · 4 VMs · 192.168.56.0/24"]
    L3["Layer 3 · Kubernetes
    K3s + KubeEdge + Flannel · pod scheduling · services"]
    L4["Layer 4 · 5G Overlay Network
    OVS bridges + VXLAN + Multus + Whereabouts IPAM"]
    L5["Layer 5 · 5G Network Functions
    Open5GS pods · UERANSIM / Physical RAN"]

    L1 --> L2 --> L3 --> L4 --> L5

Read each section top-down to understand what each layer provides, how it is implemented, and where its limits are.

Layer 1: Physical Host¶

What it provides¶

Hardware compute, storage, and the hypervisor entry point. All four VMs share this host's CPU, RAM, and NIC.

Requirements¶

Resource	Minimum	Recommended
CPU	4 cores (with VT-x/AMD-V)	8+ cores
RAM	16 GB	24 GB
Disk	40 GB free	60 GB SSD
OS	Linux, macOS, Windows	Linux (fewest surprises)
Virtualisation	VT-x or AMD-V enabled in BIOS	—

The physical NIC is only involved when using Physical RAN mode: the host NIC that is connected to the femtocell is bridged into the worker VM via VirtualBox bridged networking (PHYSICAL_RAN_BRIDGE=<nic>).

Scaling limits¶

A single host is the fundamental constraint. All four VMs compete for the same physical CPU and RAM. The worker node carries the heaviest load (8 vCPU, 8 GB RAM) because it runs all 5G Core NFs. On a host with fewer than 16 GB RAM total, the worker may OOM under load.

Production path¶

Replace VirtualBox with bare metal or a cloud hypervisor (KVM, ESXi, AWS/GCP). The Ansible automation is hypervisor-agnostic, so only the Vagrantfile needs to be replaced with a different provisioner.

Layer 2: Virtual Machines¶

What it provides¶

Network isolation, resource limits, and OS-level separation between nodes. Each VM has its own kernel, network stack, and file system. The 192.168.56.0/24 management network connects all four VMs.

Technology¶

Vagrant: declarative VM lifecycle (create, provision, destroy)
VirtualBox: hypervisor providing CPU/RAM/NIC virtualisation
VM networking: two types of interfaces per VM:
eth0 (NAT): internet access for package downloads
eth1 (host-only, 192.168.56.0/24): inter-VM communication and cluster traffic

The 5G overlay traffic (N2, N3, N4 VXLAN) rides on top of eth1. This means all 5G-plane traffic shares the same management NIC, acceptable in a testbed, not ideal for production.

Key configuration¶

The Vagrantfile controls: - VM sizes (vCPU, RAM) - Network assignment (host-only IP) - Physical RAN bridging (public_network on worker when PHYSICAL_RAN_BRIDGE is set) - Ansible provisioner invocation (triggers 00-main-playbook.yml)

Scaling limits¶

Single-host topology: All VMs on the same physical machine, so inter-VM "network" is actually shared memory via VirtualBox virtual switch. Latency is sub-millisecond; throughput is limited by memory bandwidth, not NIC speed.
VirtualBox kernel module: Can conflict with other hypervisors (VMware, Hyper-V). On Linux, KVM is a better long-term choice.
Static IP assignment: All node IPs are hardcoded in ansible/group_vars/all.yml. Adding a fourth compute node requires updating both the Vagrantfile and group_vars.

Production path¶

Replace Vagrant + VirtualBox with: - Bare metal: remove the VM layer entirely; run K3s and KubeEdge directly on physical servers - Cloud VMs: use Terraform + cloud-init instead of Vagrant; the Ansible plays remain the same - KVM: use vagrant-libvirt instead of VirtualBox for a more Linux-native experience

Layer 3: Kubernetes¶

What it provides¶

Container orchestration: pod scheduling, service discovery (for cloud nodes), secrets/configmap distribution, rolling updates, and health-based restarts. This layer gives the 5G NFs their lifecycle management.

Technology¶

graph LR
    subgraph Cloud["Cloud side (master + worker)"]
        API["K3s API Server
        (master)"]
        K3SA["K3s Agent
        (worker)"]
        CC["CloudCore
        (worker, kubeedge ns)"]
    end
    subgraph Edge["Edge side"]
        EC["EdgeCore
        (systemd service)"]
        CRID["containerd
        (standalone)"]
    end
    API <--> K3SA
    CC <-->|"WebSocket TCP/10000"| EC
    EC --> CRID

Component	Node	Role
K3s Server	master	API server, etcd, scheduler, controller manager, CoreDNS
K3s Agent	worker	Standard kubelet, kube-proxy
CloudCore	worker (pod)	KubeEdge cloud-side controller
EdgeCore	edge (systemd)	Lightweight kubelet replacement
containerd	edge (standalone)	CRI for EdgeCore — not the k3s containerd

Why no k3s-agent on edge: k3s-agent and EdgeCore both implement the kubelet interface. Running both causes dual registration (same hostname registered twice), competing CRI access, and double pod management. The edge runs EdgeCore-only with a standalone containerd installed separately from k3s.

Edge node limitations (KubeEdge specifics)¶

These are not bugs: they are known constraints of the KubeEdge architecture. Workarounds are documented in known-issues/:

Limitation	Cause	Workaround
No CoreDNS	EdgeCore does not proxy DNS to the cluster	Pods use static IPs; init containers query K8s API directly
ConfigMap sync unreliable	EdgeCore's metadata sync has edge cases	Inject values as env vars at deploy time from Ansible
ServiceAccount token bugs	Token projection fails on edge pods	Set `automountServiceAccountToken: false`
Multus env injection	KubeEdge injects empty K8s env vars; last write wins, breaking Multus auto-config	Use static Multus conflist on edge

Scaling limits¶

Single control plane: One K3s master is a single point of failure. For HA, K3s supports embedded etcd with multiple servers.
Edge node count: KubeEdge scales to thousands of edge nodes in theory; in this testbed, there is one.
No horizontal pod autoscaling on edge: KubeEdge does not fully support HPA for edge pods (metrics-server is not accessible).

Production path¶

Replace VirtualBox VMs with bare-metal or cloud VMs
Use K3s HA (3-node embedded etcd) or upstream Kubernetes for the cloud side
For many edge nodes: KubeEdge scales well; add edge nodes by running keadm join with the CloudCore token

Layer 4: 5G Overlay Network¶

This is the most complex layer. It solves a fundamental problem: Kubernetes gives every pod one network interface, but 5G NFs need multiple isolated interfaces, one per N-reference-point.

What it provides¶

Per-interface isolation: each 5G interface (N1, N2, N3, N4, N6) is a separate L2 domain, just like in a real 5G deployment
Cross-node connectivity: worker and edge pods on different VMs can communicate on the same N-interface subnet via VXLAN tunnels
Dynamic IP management: pods get stable, predictable IPs within each interface's subnet via Whereabouts IPAM

Technology stack¶

graph TB
    APP["5G NF pod
    (e.g. AMF)"]
    MULTUS["Multus (meta-CNI)
    reads k8s.v1.cni.cncf.io/networks annotation"]
    FLANNEL["Flannel
    (primary CNI → eth0)"]
    OVS_CNI["OVS CNI plugin
    (secondary CNI → n1, n2, ...)"]
    WA["Whereabouts
    (cluster-wide IPAM)"]
    OVS["Open vSwitch bridge
    (br-n2, br-n3, ...)"]
    VXLAN["VXLAN tunnel
    UDP 4789"]

    APP -->|"create pod"| MULTUS
    MULTUS --> FLANNEL
    MULTUS --> OVS_CNI
    OVS_CNI --> WA
    OVS_CNI --> OVS
    OVS --> VXLAN

Technology	Role
Multus	Meta-CNI: calls primary CNI first, then secondary CNIs for each extra interface
OVS CNI	Connects pod netns to a named OVS bridge via a veth pair
Whereabouts	Cluster-wide IPAM — allocates IPs across nodes without conflicts
Open vSwitch	Programmable software switch; one bridge per 5G interface
VXLAN	L2-over-UDP tunnel connecting worker and edge OVS bridges

OVS bridge topology¶

graph LR
    subgraph Worker["Worker  192.168.56.11"]
        BN1W["br-n1
        VNI 101
        10.201.0.0/24"]

        BN2W["br-n2
        VNI 102
        10.202.0.0/24"]

        BN3W["br-n3
        VNI 103
        10.203.0.0/24"]

        BN4W["br-n4
        VNI 104
        10.204.0.0/24"]

        BN6EW["br-n6e
        VNI 106
        10.206.0.0/24"]

        BN6CW["br-n6c
        VNI 107
        10.207.0.0/24"]
    end

    subgraph Edge["Edge  192.168.56.12"]

        BN1E["br-n1
        VNI 101"]

        BN2E["br-n2
        VNI 102"]

        BN3E["br-n3
        VNI 103"]

        BN4E["br-n4
        VNI 104"]

        BN6EE["br-n6e
        VNI 106"]

        BN6CE["br-n6c
        VNI 107"]
    end

    BN1W <-->|"VXLAN UDP 4789"| BN1E
    BN2W <-->|"VXLAN UDP 4789"| BN2E
    BN3W <-->|"VXLAN UDP 4789"| BN3E
    BN4W <-->|"VXLAN UDP 4789"| BN4E
    BN6EW <-->|"VXLAN UDP 4789"| BN6EE
    BN6CW <-->|"VXLAN UDP 4789"| BN6CE

Why two Multus DaemonSets (and two OVS DaemonSets)¶

K3s and standalone containerd use different filesystem paths for CNI configuration:

Node	CNI config dir	CNI binary dir
worker (K3s)	`/var/lib/rancher/k3s/agent/etc/cni/net.d`	`/opt/cni/bin`
edge (containerd)	`/etc/cni/net.d`	`/opt/cni/bin`

A single DaemonSet cannot mount different host paths on different nodes. Two DaemonSets, one targeting kubernetes.io/hostname: worker, one targeting kubernetes.io/hostname: edge, each mount the correct paths for their node.

Additionally, the Multus configuration differs: - Worker: auto-mode (--multus-conf-file=auto), Multus reads the primary CNI config and wraps it - Edge: static conflist + external kubeconfig, because KubeEdge injects empty KUBERNETES_SERVICE_HOST/PORT env vars that break Multus auto-config (see known-issues/kubeedge-multus-env-injection.md)

Per-cell network scaling¶

For multi-gNB deployments, additional per-cell bridges are created:

Bridge	VNI	Subnet	Purpose
br-n2-cell-1	1021	10.202.1.0/24	Cell 1 dedicated N2
br-n3-cell-1	1031	10.203.1.0/24	Cell 1 dedicated N3
br-n2-cell-N	102N	10.202.N.0/24	Cell N dedicated N2
br-n3-cell-N	103N	10.203.N.0/24	Cell N dedicated N3

This is driven by vars/topology.yml in Phase 6.

Scaling limits and production evaluation¶

Design choice	Production-grade?	Notes
OVS bridge per N-interface	✅ Yes	Mirrors real 5G transport design
VXLAN over management NIC	⚠️ Testbed	In prod, N2/N3 run on dedicated NICs (no MTU overhead, no shared bandwidth)
Whereabouts IPAM	✅ Yes	Used in production Multus deployments
Static IP reservations for NFs	✅ Yes	Required where DNS is unavailable
VNI key space	✅ Sufficient	12 bridges used, VNI space is 24-bit (16M keys)
Single VXLAN endpoint	⚠️ Testbed	Production would use ECMP or bonded NICs
No QoS/traffic shaping	⚠️ Testbed	OVS supports OpenFlow QoS; not configured here
MTU not adjusted	⚠️ Testbed	VXLAN adds 50-byte overhead; default MTU 1500 may cause fragmentation on N3 GTP-U

Layer 5: 5G Network Functions¶

What it provides¶

The complete 5G SA (Standalone) core as defined by 3GPP Release 16, plus a simulated or physical RAN.

Technology¶

Open5GS: open-source 5G SA core (AMF, SMF, UPF, NRF, UDM, UDR, AUSF, PCF, BSF, NSSF)
MongoDB: persistent store for UDR (subscriber data, session contexts)
UERANSIM: 5G NR RAN + UE simulator
Physical RAN: femtocell (e.g. nCELL-F2240) connected via OVS bridging

How NFs find each other¶

sequenceDiagram
    participant NF as Any NF (AMF/SMF/...)
    participant NRF as NRF (10.x.x.x:7777)
    participant Target as Target NF

    NF->>NRF: Register (SBI HTTP/2)
    NF->>NRF: Discover target NF
    NRF-->>NF: Endpoint list
    NF->>Target: Direct SBI call (HTTP/2)

NFs register themselves with NRF on startup and discover peers via NRF. This is the standard 3GPP SBI (Service-Based Interface) model. Because edge pods have no CoreDNS, AMF IPs are pinned as static Kubernetes pod IPs, and gNB and UE pods query the K8s API at startup to resolve them.

5G session flow¶

sequenceDiagram
    participant UE
    participant gNB
    participant AMF
    participant SMF
    participant UPF
    participant DN as Data Network

    UE->>gNB: RRC Connection (radio)
    gNB->>AMF: NGAP Initial UE Message (N2)
    AMF->>UE: NAS Registration (N1)
    UE->>AMF: PDU Session Request (N1)
    AMF->>SMF: PDU Session Create (SBI/N11)
    SMF->>UPF: PFCP Session Establish (N4)
    SMF->>AMF: Session Response
    AMF->>gNB: PDU Session Resource Setup (N2)
    gNB->>UE: Session Accept
    UE->>gNB: Data (radio)
    gNB->>UPF: GTP-U encapsulated data (N3)
    UPF->>DN: Decapsulated IP traffic (N6)

Scaling limits and production evaluation¶

Design choice	Production-grade?	Notes
3GPP-compliant NF architecture	✅ Yes	Open5GS passes interoperability tests
SBI via NRF discovery	✅ Yes	Standard production pattern
Single UPF-Cloud, one UPF-Edge	⚠️ Testbed	Production deploys multiple UPF instances per slice
UPF-Edge disabled	⚠️ Open issue	CNI route conflict on edge node (see known-issues/)
No network slicing	⚠️ Testbed	NSSF is deployed but slicing is not exercised
Static subscriber import	⚠️ Testbed	Production uses dynamic subscriber provisioning
Single MongoDB instance	⚠️ Testbed	Production: MongoDB replica set for HA

Summary: Testbed vs Production¶

graph LR
    subgraph Testbed["This testbed"]

        T1["Vagrant + VirtualBox
        (1 laptop)"]

        T2["K3s + KubeEdge
        (lightweight)"]

        T3["OVS + VXLAN
        on management NIC"]

        T4["Open5GS
        (single instance per NF)"]

        T5["UERANSIM or
        femtocell"]
    end

    subgraph Prod["Production equivalent"]
        P1["Bare metal / cloud VMs
        (multi-rack)"]

        P2["Kubernetes HA +
        KubeEdge (many edge nodes)"]

        P3["Dedicated NICs per plane
        no VXLAN overhead"]

        P4["Multi-instance NFs
        with HA and scaling"]

        P5["Real gNBs
        (3GPP-certified)"]
    end

    T1 -.->|"upgrade path"| P1
    T2 -.->|"upgrade path"| P2
    T3 -.->|"upgrade path"| P3
    T4 -.->|"upgrade path"| P4
    T5 -.->|"upgrade path"| P5

The design choices that are already production-grade (OVS per-interface isolation, Whereabouts IPAM, SBI/NRF architecture, VXLAN tunnelling principle) provide a solid base. The testbed-specific shortcuts (single-host, shared management NIC, single NF instances) are all in the infrastructure layers, so they can be replaced independently without rearchitecting the 5G layer.

Architecture Overview: component placement and deployment flow
Network Topology: OVS, VXLAN, and Multus in depth
5G Interfaces: N-reference-point subnets and protocols
Known Issues: layer-specific limitations with workarounds

Virtualization Layers¶

Layer 1: Physical Host¶

What it provides¶

Requirements¶

Scaling limits¶

Production path¶

Layer 2: Virtual Machines¶

What it provides¶

Technology¶

Key configuration¶

Scaling limits¶

Production path¶

Layer 3: Kubernetes¶

What it provides¶

Technology¶

Edge node limitations (KubeEdge specifics)¶

Scaling limits¶

Production path¶

Layer 4: 5G Overlay Network¶

What it provides¶

Technology stack¶

OVS bridge topology¶

Why two Multus DaemonSets (and two OVS DaemonSets)¶

Per-cell network scaling¶

Scaling limits and production evaluation¶

Layer 5: 5G Network Functions¶

What it provides¶

Technology¶

How NFs find each other¶

5G session flow¶

Scaling limits and production evaluation¶

Summary: Testbed vs Production¶

Related Documentation¶