Mount a filesystem in a K8s pod without privileged mode

I have a K8s Deployment that attaches a volume as a raw block device (through volumeDevices):

apiVersion: apps/v1
kind: Deployment
...
spec:
  replicas: 1
...
  containers:
  - name: somepod
    image: ubuntu:latest
    command: ["sh","-c", "--"]
    args: ["mount /dev/block /data && while true; do sleep 1000000; done"]
    securityContext:
      privileged: true
    volumeDevices:
    - devicePath: /dev/block
      name: vol
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: vol

This works as expected.
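
A quick check that the privileged variant actually mounted the device (assuming the Deployment itself is also named somepod, since its name is elided above):

# Verify the device and the mount from outside the pod
kubectl exec deploy/somepod -- ls -l /dev/block
kubectl exec deploy/somepod -- df -h /data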

What I want to achieve:

I want to mount /dev/block in the container without granting it privileged access (i.e., without running as root).

I have complete control over the base image, whose default user is 1001 and belongs to a nonroot group.

When Kubernetes adds /dev/block to the container, from what I can tell it assigns it a seemingly random group, e.g. 993:

brw-rw----. 1 root  993 259,  16 Dec 15 09:00 block

From my understanding this is out of my control (i.e. I cannot tell Kubernetes to expose the device under a known group).
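
A quick sketch of how to confirm this from inside the running container (the pod name is a placeholder):

# Check the device node's numeric owner/group and the container user's IDs
kubectl exec -it <pod-name> -- ls -ln /dev/block   # owned by 0:<gid>, e.g. 0 993
kubectl exec -it <pod-name> -- id                  # uid=1001, not a member of that gid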

Things I tried:

  • Formatting the filesystem as ext4 and adding an /etc/fstab line: /dev/block /data ext4 user,uid=1001,auto 0 0
  • Adding securityContext: fsGroup: 1001 (see the sketch after this list)
  • Formatting the filesystem as NTFS and adding an /etc/fstab line: /dev/block /data ntfs user,uid=1001,auto 0 0
  • Installing and using pmount in the container. This fails because my user is not part of the group that owns /dev/block.
  • Using a postStart hook (useless, since it runs with the same permissions as the main container).
  • Using a privileged initContainer to mount the volume from /dev/block onto an emptyDir at /data. From my understanding the initContainer and the main container share the emptyDir, but the mount only exists in the initContainer's mount namespace, so the main container just sees an empty directory.
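
For completeness, a minimal sketch of how I'd express the fsGroup attempt from the list above (pod-level securityContext, no privileged flag; whether fsGroup is applied to raw block devices at all is exactly what I could not confirm):

spec:
  securityContext:
    fsGroup: 1001                  # hoping the device node gets group 1001
  containers:
  - name: somepod
    image: ubuntu:latest
    volumeDevices:
    - devicePath: /dev/block
      name: vol
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: vol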

Things I’ve yet to try:

  • the guestmount suggested here and here

A possible point of failure is my /etc/fstab configuration, which may be incorrect: whenever I try to mount as a non-root user I still get permission errors on /dev/block regardless.

Why I’m using a Block volume:

I’m running EKS and I want the data shared ReadWriteMany across several pods in the same Availability Zone. I’ve looked into using an EFS volume instead of an EBS io2 volume, but I have price/latency concerns.
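
For context, the EFS alternative I considered would look roughly like this (StorageClass parameters follow the aws-efs-csi-driver dynamic-provisioning example; the file system ID is a placeholder):

# ReadWriteMany through the EFS CSI driver instead of a raw block EBS volume
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vol-rwx
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 40Gi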

Asked By: Liquid


What ended up being a solution for me is an AWS EC2 feature (exposed by Karpenter) that lets me provision a node with a volume created from a given snapshot.

My Karpenter EC2NodeClass looks something like:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: test
spec:
  amiFamily: AL2
  ...
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      volumeSize: 100Gi
      volumeType: gp2
  - deviceName: /dev/xvdc
    ebs:
      deleteOnTermination: true
      snapshotID: snap-ID-NUMBER
      volumeSize: 40Gi
      volumeType: gp2
  ...
  userData: |-
    #!/bin/bash
    set -x
    ...
    mkdir -p /home/ec2-user/data/
    mount -o defaults /dev/nvme2n1 /home/ec2-user/data/
    ...

There was a bit of trial and error here, but the main takeaways are:

  • AWS provisions a disk restored from the snapshotID I provide;
  • it gets attached as /dev/xvdc; in my AMI's case this corresponds to /dev/nvme2n1, although that mapping is not guaranteed across AMIs/architectures (see the sketch after this list);
  • I mount the filesystem in the EC2 instance's userData.
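
If the /dev/xvdc to /dev/nvme2n1 mapping proves unreliable, the device could instead be located by its EBS volume ID, since EBS exposes the volume ID (minus the dash) as the NVMe serial number. A sketch, with a placeholder volume ID:

#!/bin/bash
# Locate the NVMe device backing a given EBS volume by matching its serial number
VOLUME_ID="vol-0123456789abcdef0"   # placeholder
SERIAL="${VOLUME_ID/-/}"            # EBS reports e.g. vol0123456789abcdef0 as the serial
DEVICE="/dev/$(lsblk -ndo NAME,SERIAL | awk -v s="$SERIAL" '$2 == s {print $1}')"
mkdir -p /home/ec2-user/data/
mount -o defaults "$DEVICE" /home/ec2-user/data/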

Additionally, to ensure the data is up to date, I run an aws s3 sync as part of the userData.
The same behavior can be replicated without Karpenter by using the AWS EC2 API directly.
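
One way to then consume the node-local path from an unprivileged pod is a plain hostPath volume; a sketch, assuming the userData also chowns the files to uid 1001 so the nonroot user can read them:

# Consuming the node-local mount via hostPath (no privileged flag needed)
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  containers:
  - name: somepod
    image: ubuntu:latest
    volumeMounts:
    - mountPath: /data
      name: node-data
  volumes:
  - name: node-data
    hostPath:
      path: /home/ec2-user/data   # mounted in the EC2 userData above
      type: Directory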

Answered By: Liquid