Local Persistent Volumes GA
The Local Persistent Volumes feature has been promoted to GA in Kubernetes 1.14. It was first introduced as alpha in Kubernetes 1.7, and then beta in Kubernetes 1.10. The GA milestone indicates that Kubernetes users may depend on the feature and its API for production use. GA features are protected by the Kubernetes deprecation policy.
What is a Local Persistent Volume?
A local persistent volume represents a local disk directly-attached to a single Kubernetes Node.
Kubernetes provides a powerful volume plugin system that enables Kubernetes workloads to use a wide variety of block and file storage to persist data. Most of these plugins enable remote storage – these remote storage systems persist data independently of the Kubernetes node where the data originated. Remote storage usually cannot offer the consistent high-performance guarantees of local directly-attached storage. With the Local Persistent Volume plugin, Kubernetes workloads can now consume high-performance local storage using the same volume APIs that app developers have become accustomed to.
How is it different from a HostPath Volume?
To better understand the benefits of a Local Persistent Volume, it is useful to compare it to a HostPath volume. HostPath volumes mount a file or directory from the host node’s filesystem into a Pod. Similarly, a Local Persistent Volume mounts a local disk or partition into a Pod.
The biggest difference is that the Kubernetes scheduler understands which node a Local Persistent Volume belongs to. With HostPath volumes, a pod referencing a HostPath volume may be moved by the scheduler to a different node, resulting in data loss. With Local Persistent Volumes, the Kubernetes scheduler ensures that a pod using a Local Persistent Volume is always scheduled to the same node.
While HostPath volumes may be referenced via a Persistent Volume Claim (PVC) or directly inline in a pod definition, Local Persistent Volumes can only be referenced via a PVC. This provides additional security benefits since Persistent Volume objects are managed by the administrator, preventing Pods from being able to access any path on the host.
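For comparison, this is roughly what an inline HostPath volume looks like in a Pod definition (names and paths here are illustrative, not part of this tutorial); note that nothing in it ties the Pod to the node that actually holds the data:
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example        # illustrative name
spec:
  containers:
  - name: app
    image: k8s.gcr.io/busybox
    command: ["sh", "-c", "sleep 100000"]
    volumeMounts:
    - name: host-data
      mountPath: /data
  volumes:
  - name: host-data
    hostPath:
      path: /mnt/data           # any path on whichever node the Pod lands on
      type: Directory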
Additional benefits include support for formatting of block devices during mount, and volume ownership using fsGroup.
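For instance, volume ownership is requested through the standard Pod securityContext; a minimal sketch (the claim name is illustrative and assumed to be bound to a local PV) could look like:
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example              # illustrative name
spec:
  securityContext:
    fsGroup: 2000                    # the kubelet applies this group to the volume's files at mount time
  containers:
  - name: app
    image: k8s.gcr.io/busybox
    command: ["sh", "-c", "sleep 100000"]
    volumeMounts:
    - name: local-vol
      mountPath: /data
  volumes:
  - name: local-vol
    persistentVolumeClaim:
      claimName: example-local-claim # illustrative claim bound to a local PV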
What’s New With GA?
Since 1.10, we have mainly focused on improving stability and scalability of the feature so that it is production ready.
The only major feature addition is the ability to specify a raw block device and have Kubernetes automatically format and mount the filesystem. This removes the previous burden of having to format and mount devices before handing them to Kubernetes.
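For example, a local PersistentVolume can now point straight at an unformatted device. A sketch, assuming a device at /dev/nvme0n1 and an ext4 target filesystem (both assumptions, not taken from this tutorial):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-block-pv             # illustrative name
spec:
  capacity:
    storage: 375Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dev/nvme0n1               # raw block device (assumed device name)
    fsType: ext4                     # Kubernetes formats the device with this filesystem on first mount
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node-that-owns-the-device>   # replace with the actual node name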
Limitations of GA
At GA, Local Persistent Volumes do not support dynamic volume provisioning. However, there is an external controller available to help manage the local PersistentVolume lifecycle for individual disks on your nodes. This includes creating the PersistentVolume objects, and cleaning up and reusing disks once they have been released by the application.
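One such controller is the local volume static provisioner (the sig-storage-local-static-provisioner project). As a rough sketch of how it is configured, it is pointed at a directory of pre-mounted disks and creates a PersistentVolume per mount; the exact keys may differ between releases, so treat this as illustrative:
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config     # illustrative name
  namespace: kube-system
data:
  storageClassMap: |
    local-storage:
      hostDir: /mnt/disks            # every filesystem mounted under this directory becomes a PV
      mountDir: /mnt/disks           # the same path as seen from inside the provisioner pod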
Creating a cluster with local SSDs
gcloud container clusters create local-ssd-cluster-1 \
--num-nodes 1 --local-ssd-count 1
WARNING: In June 2019, node auto-upgrade will be enabled by default for newly created clusters and node pools. To disable it, use the `--no-enable-autoupgrade` flag.
WARNING: Starting in 1.12, new clusters will have basic authentication disabled by default. Basic authentication can be enabled (or disabled) manually using the `--[no-]enable-basic-auth` flag.
WARNING: Starting in 1.12, new clusters will not have a client certificate issued. You can manually enable (or disable) the issuance of the client certificate using the `--[no-]issue-client-certificate` flag.
WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
WARNING: Starting in 1.12, default node pools in new clusters will have their legacy Compute Engine instance metadata endpoints disabled by default. To create a cluster with legacy instance metadata endpoints disabled in the default node pool, run `clusters create` with the flag `--metadata disable-legacy-endpoints=true`.
WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
Creating cluster local-ssd-cluster-1 in us-central1-a... Cluster is being health-checked (master is healthy)...done.
Created [https://container.googleapis.com/v1/projects/espblufi-android/zones/us-central1-a/clusters/local-ssd-cluster-1].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1-a/local-ssd-cluster-1?project=espblufi-android
kubeconfig entry generated for local-ssd-cluster-1.
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
local-ssd-cluster-1 us-central1-a 1.13.7-gke.8 35.194.60.49 n1-standard-1 1.13.7-gke.8 1 RUNNING
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-local-ssd-cluster-1-default-pool-dfd00064-dqrq Ready <none> 74s v1.13.7-gke.8 10.128.0.2 35.206.106.166 Container-Optimized OS from Google 4.14.127+ docker://18.9.3
View Local SSD
ssh adithya.j@35.206.106.166
lsblk -O
NAME KNAME MAJ:MIN FSTYPE MOUNTPOINT LABEL UUID PARTTYPE PARTLABEL PARTUUID PARTFLAGS RA RO RM HOTPLUG MODEL SERIAL SIZE STATE OWNER GROUP MODE ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE TYPE DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO WSAME WWN RAND PKNAME HCTL TRAN SUBSYSTEMS REV VENDOR ZONED
sda sda 8:0 128 0 0 0 PersistentDisk 100G running root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 disk 0 4K 4G 0 4G 1 0:0:1:0 block:scsi:virtio:pci 1 Google none
|-sda1 sda1 8:1 /mnt/stateful_partition 0 0 95.9G root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda2 sda2 8:2 128 0 0 0 16M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda3 sda3 8:3 128 0 0 0 2G root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda4 sda4 8:4 128 0 0 0 16M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda5 sda5 8:5 128 0 0 0 2G root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda6 sda6 8:6 0 0 512B root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda7 sda7 8:7 128 0 0 0 512B root disk brw-rw---- 3584 4096 0 4096 512 1 mq-deadline 256 part 3584 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda8 sda8 8:8 /usr/share/oem 0 0 16M root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda9 sda9 8:9 128 0 0 0 512B root disk brw-rw---- 3072 4096 0 4096 512 1 mq-deadline 256 part 3072 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda10 sda10 8:10 128 0 0 0 512B root disk brw-rw---- 2560 4096 0 4096 512 1 mq-deadline 256 part 2560 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda11 sda11 8:11 0 0 8M root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
`-sda12 sda12 8:12 128 0 0 0 32M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
sdb sdb 8:16 /mnt/disks/ssd0 128 0 0 0 EphemeralDisk 375G running root disk brw-rw---- 0 4096 0 4096 4096 0 mq-deadline 512 disk 0 4K 32G 0 32G 0 1:0:1:0 block:scsi:virtio:pci 1 Google none
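The EphemeralDisk entry (sdb) is the 375G local SSD, which GKE has already formatted and mounted at /mnt/disks/ssd0. A quicker way to confirm the mount point and filesystem type, without wading through lsblk -O (output not captured here):
mount | grep /mnt/disks/ssd0
df -h /mnt/disks/ssd0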
Creating a StorageClass
Workloads can request a local persistent volume using the same PersistentVolumeClaim interface as remote storage backends. This makes it easy to swap out the storage backend across clusters, clouds, and on-prem environments.
First, a StorageClass should be created that sets volumeBindingMode: WaitForFirstConsumer to enable volume topology-aware scheduling. This mode instructs Kubernetes to wait to bind a PVC until a Pod using it is scheduled.
nano local-storage.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
This setting tells the PersistentVolume controller to not immediately bind a PersistentVolumeClaim. Instead, the system waits until a Pod that needs to use a volume is scheduled. The scheduler then chooses an appropriate local PersistentVolume to bind to, taking into account the Pod’s other scheduling constraints and policies. This ensures that the initial volume binding is compatible with any Pod resource requirements, selectors, affinity and anti-affinity policies, and more.
Note that dynamic provisioning is still not supported at GA. All local PersistentVolumes must be statically created.
kubectl create -f local-storage.yml
storageclass.storage.k8s.io/local-storage created
kubectl get sc
NAME PROVISIONER AGE
local-storage kubernetes.io/no-provisioner 4s
standard (default) kubernetes.io/gce-pd 10m
Creating a local persistent volume
Because dynamic provisioning is not available, local disks must first be pre-partitioned, formatted, and mounted on the local node by an administrator. Directories on a shared file system are also supported, but they too must be created before use.
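On GKE nothing further is needed, since the local SSD is already formatted and mounted at /mnt/disks/ssd0. On a self-managed node, preparing a disk would look roughly like this (assuming the device is /dev/sdb and ext4 is the desired filesystem):
sudo mkfs.ext4 /dev/sdb              # format the raw device
sudo mkdir -p /mnt/disks/ssd0        # create the mount point
sudo mount /dev/sdb /mnt/disks/ssd0  # mount it; add an /etc/fstab entry (ideally by UUID) so it survives reboots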
Once you set up the local volume, you can create a PersistentVolume for it. In this example, the local volume is mounted at /mnt/disks/ssd0 on node gke-local-ssd-cluster-1-default-pool-dfd00064-dqrq:
nano example-local-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "example-local-pv"
spec:
  capacity:
    storage: 375Gi
  accessModes:
  - "ReadWriteOnce"
  persistentVolumeReclaimPolicy: "Retain"
  storageClassName: "local-storage"
  local:
    path: "/mnt/disks/ssd0"
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "kubernetes.io/hostname"
          operator: "In"
          values:
          - gke-local-ssd-cluster-1-default-pool-dfd00064-dqrq
kubectl create -f example-local-pv.yml
persistentvolume/example-local-pv created
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
example-local-pv 375Gi RWO Retain Available local-storage 5s
How to Use a Local Persistent Volume?
Workloads can start using the PVs by creating a PVC and Pod or a StatefulSet with volumeClaimTemplates.
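This tutorial takes the StatefulSet route, shown next. For reference, the standalone PVC-plus-Pod route would look roughly like this (object names are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-local-claim          # illustrative name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: local-test-pod               # illustrative name
spec:
  containers:
  - name: test-container
    image: k8s.gcr.io/busybox
    command: ["sh", "-c", "sleep 100000"]
    volumeMounts:
    - name: local-vol
      mountPath: /usr/test-pod
  volumes:
  - name: local-vol
    persistentVolumeClaim:
      claimName: example-local-claim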
nano local-test-ss.yml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: local-test
spec:
  serviceName: "local-service"
  replicas: 1
  selector:
    matchLabels:
      app: local-test
  template:
    metadata:
      labels:
        app: local-test
    spec:
      containers:
      - name: test-container
        image: k8s.gcr.io/busybox
        command:
        - "/bin/sh"
        args:
        - "-c"
        - "sleep 100000"
        volumeMounts:
        - name: local-vol
          mountPath: /usr/test-pod
  volumeClaimTemplates:
  - metadata:
      name: local-vol
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 5Gi
kubectl create -f local-test-ss.yml
statefulset.apps/local-test created
kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-0 1/1 Running 0 2m13s
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
local-vol-local-test-0 Bound example-local-pv 375Gi RWO local-storage 58s
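Because the PV's nodeAffinity pins it to gke-local-ssd-cluster-1-default-pool-dfd00064-dqrq, the scheduler must have placed local-test-0 on that same node. This can be double-checked (output not captured above) with:
kubectl get pods -o wide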
Verify
Access local SSD from node
ssh adithya.j@35.206.106.166
lsblk -O
NAME KNAME MAJ:MIN FSTYPE MOUNTPOINT LABEL UUID PARTTYPE PARTLABEL PARTUUID PARTFLAGS RA RO RM HOTPLUG MODEL SERIAL SIZE STATE OWNER GROUP MODE ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE TYPE DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO WSAME WWN RAND PKNAME HCTL TRAN SUBSYSTEMS REV VENDOR ZONED
sda sda 8:0 128 0 0 0 PersistentDisk 100G running root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 disk 0 4K 4G 0 4G 1 0:0:1:0 block:scsi:virtio:pci 1 Google none
|-sda1 sda1 8:1 /mnt/stateful_partition 0 0 95.9G root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda2 sda2 8:2 128 0 0 0 16M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda3 sda3 8:3 128 0 0 0 2G root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda4 sda4 8:4 128 0 0 0 16M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda5 sda5 8:5 128 0 0 0 2G root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda6 sda6 8:6 0 0 512B root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda7 sda7 8:7 128 0 0 0 512B root disk brw-rw---- 3584 4096 0 4096 512 1 mq-deadline 256 part 3584 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda8 sda8 8:8 /usr/share/oem 0 0 16M root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
|-sda9 sda9 8:9 128 0 0 0 512B root disk brw-rw---- 3072 4096 0 4096 512 1 mq-deadline 256 part 3072 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda10 sda10 8:10 128 0 0 0 512B root disk brw-rw---- 2560 4096 0 4096 512 1 mq-deadline 256 part 2560 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
|-sda11 sda11 8:11 0 0 8M root disk brw-rw---- 0 part 0 0 0 sda block:scsi:virtio:pci
`-sda12 sda12 8:12 128 0 0 0 32M root disk brw-rw---- 0 4096 0 4096 512 1 mq-deadline 256 part 0 4K 4G 0 4G 1 sda block:scsi:virtio:pci none
sdb sdb 8:16 /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/f60013f3-e0f4-11e9-a0cb-42010a800218/volumes/kubernetes.io~local-volume/example-local-pv 128 0 0 0 EphemeralDisk 375G running root disk brw-rw---- 0 4096 0 4096 4096 0 mq-deadline 512 disk 0 4K 32G 0 32G 0 1:0:1:0 block:scsi:virtio:pci 1 Google none
cd /mnt/disks/ssd0
ls
lost+found
touch hello_world_from_node
Access local SSD from pod
kubectl exec -it local-test-0 sh
/ #
ls /usr/test-pod/
hello_world_from_node lost+found
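As a final check of node stickiness, the Pod can be deleted and recreated by the StatefulSet; thanks to the PV's node affinity it should land back on the same node with the file still in place (commands only, output not captured here):
kubectl delete pod local-test-0
kubectl get pods -o wide
kubectl exec -it local-test-0 -- ls /usr/test-pod/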