Docker and Data Storage

Posted in container on December 23, 2016 by Adrian Wyssmann ‐ 12 min read

Not every application can live without persistent storage, so let's see how Docker provides and handles it.

In an earlier blog entry we have seen that once a Docker container is deleted, the data written to the container is deleted as well. There are good reasons to keep certain data created in a container, either to reuse it later or to share it among containers. For this purpose Docker provides data volumes:

A data volume is a directory or file in the Docker host’s filesystem that is mounted directly into a container. Data volumes are not controlled by the storage driver. Reads and writes to data volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes

Data volumes are designed to persist data, independent of the container’s life cycle…

Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.

Create Data Volumes

Data volumes can be added with the -v option when running either docker create or docker run. So let’s start a container based on the ubuntu-1504-test image we have created and map a data volume to /testdatavol:

[me@dockerhost ~]$ docker run -v /testdatavol -t -i ubuntu-1504-test /bin/bash
root@82b442fb1bc1:/# ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  testdatavol  tmp  usr  var

The first command creates the data volume and mounts it at /testdatavol, as the output of the ls command inside the Docker container shows. Now let’s create some data:

root@82b442fb1bc1:/testdatavol# echo "This file was created by user root within the container 82b442fb1bc1" > 82b442fb1bc1_root.txt
root@82b442fb1bc1:/testdatavol# ls
82b442fb1bc1_root.txt

But where is the data actually stored? We can easily find out by inspecting our container:

[me@dockerhost ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              31 minutes ago      Up 31 minutes                                           adoring_brattain
a2cd4d7b28ab        jenkins             "/bin/tini -- /usr/lo"   42 hours ago        Up 42 hours         0.0.0.0:8080->8080/tcp, 50000/tcp   jenkins
[me@dockerhost ~]$ docker inspect adoring_brattain
...
        "Mounts": [
            {
                "Name": "b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33",
                "Source": "/var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data",
                "Destination": "/testdatavol",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
...

The "Source" entry shows the location of the data volume on the host, so we can examine it:

[me@dockerhost ~]$ ls /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data
82b442fb1bc1_root.txt
[me@dockerhost ~]$ cat /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data/82b442fb1bc1_root.txt
This file was created by user root within the container 82b442fb1bc1
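
Instead of scanning the full docker inspect output, the host path can also be extracted directly with a Go template passed to the --format option. The following is only a minimal sketch: the function name volume_sources and the DOCKER variable (which allows a dry run with DOCKER=echo) are my own hypothetical helpers, not part of Docker.

```shell
#!/bin/sh
# Sketch: print the host directory behind every mount of a container.
# DOCKER is a hypothetical override; set DOCKER=echo to only print the command.
DOCKER="${DOCKER:-docker}"

volume_sources() {
    # $1: container name or ID, e.g. adoring_brattain
    # The template walks the "Mounts" array and prints each "Source" field on its own line.
    "$DOCKER" inspect --format '{{ range .Mounts }}{{ println .Source }}{{ end }}' "$1"
}
```

Calling volume_sources adoring_brattain should then print only the /var/lib/docker/volumes/.../_data path from the "Source" entry.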

Map to existing directory

I am wondering what happens when we create a data volume for a directory that already exists in the image:

[me@dockerhost ~]$ docker run -v /bin -d -t -i ubuntu-1504-test /bin/bash
[me@dockerhost ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
5a5e0df11b55        ubuntu-1504-test    "/bin/bash"              42 seconds ago      Up 42 seconds                                           jolly_panini
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              About an hour ago   Up About an hour                                        adoring_brattain
a2cd4d7b28ab        jenkins             "/bin/tini -- /usr/lo"   42 hours ago        Up 42 hours         0.0.0.0:8080->8080/tcp, 50000/tcp   jenkins
[me@dockerhost ~]$ docker inspect jolly_panini
....
       "Mounts": [
            {
                "Name": "e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e",
                "Source": "/var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data",
                "Destination": "/bin",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
...

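So we can see that the volume destination within the Docker container is “/bin”. When a new volume is mounted over a directory that already exists in the image, Docker copies the existing content into the volume, so on the host system we can examine the directory and actually see all files from /bin: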

[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash   dash   dnsdomainname  findmnt   ....

So let’s kill our recently created container and remove it. As expected, the data from the volume is kept persistent and therefore still exists on the host system:

[me@dockerhost ~]$ docker kill jolly_panini
jolly_panini
[me@dockerhost ~]$ docker rm jolly_panini
[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash   dash   dnsdomainname  findmnt   ....

Mount host directories as data volumes

So far each data volume belongs to a single container, so the data is not really shared among containers. Besides simply creating a data volume, you may also map an existing directory from the host into the container with the -v parameter, as in docker run -v {host path}:/{container path}. Let’s create a sample directory for this purpose and add a file to it:

[me@dockerhost ~]$ mkdir datacontainer
[me@dockerhost ~]$ cd datacontainer/
[me@dockerhost ~/datacontainer]$ echo "This file was created by user 'me' on host 'dockerhost'" > dockerhost_me.txt

Then we mount the host directory to /testdatavol/ within the container and examine the content:

[me@dockerhost ~/datacontainer]$ docker run -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
root@13d4433ee187:/# ls /testdatavol/
dockerhost_me.txt
root@13d4433ee187:/# cat /testdatavol/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'

As you can see, the file we have created on the host is accessible within the container. We can modify it or create a new file within the container and then see what happens on the host:

root@13d4433ee187:/# echo "This file was created by user root within the container 13d4433ee187" > /testdatavol/13d4433ee187_root.txt
root@13d4433ee187:/# exit
exit
[me@dockerhost ~/datacontainer]$ ls
13d4433ee187_root.txt dockerhost_me.txt
[me@dockerhost ~/datacontainer]$ cat 13d4433ee187_root.txt
This file was created by user root within the container 13d4433ee187

Something to bear in mind when mapping host directories:

Note: The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile, the VOLUME instruction does not support passing a host-dir, because built images should be portable. A host directory wouldn’t be available on all potential hosts.

By the way, the -v option can also be used to mount a single file from the host machine:

[me@dockerhost ~/datacontainer]$ docker run -i -t -v ~/datacontainer/dockerhost_me.txt:/root/dockerhost_me.txt ubuntu-1504-test /bin/bash
[email protected]:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'
[email protected]:/# vi /root/dockerhost_me.txt
... add a new line with text "I create a new line"
[email protected]:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'
I create a new line

File ownership

This is an interesting aspect to consider. When we check the file ownership on the host system, we notice that the file “dockerhost_me.txt” in my datacontainer directory belongs to user “me” and group “me”. I also made it readable and writable only by user “me”. Additionally I created a user “test” and a file “dockerhost_test.txt” that belongs only to that user:

[me@dockerhost ~/datacontainer]$ ll
total 16
drwxrwxr-x 2 me   me   4096 Dec 21 10:34 ./
drwxrwxr-x 3 me   me   4096 Dec 21 10:31 ../
-rw-r--r-- 1 root root   69 Dec 21 10:34 13d4433ee187_root.txt 
-rw------- 1 me   me     55 Dec 21 10:32 dockerhost_me.txt
-rw------- 1 test test    0 Dec 21 14:27 dockerhost_test.txt

Within the Docker container there is currently no user “me” or “test”, therefore the files cannot be mapped to a user name and only show the numeric user ID and group ID as owner information:

[email protected]:/testdatavol# ll
total 20
drwxrwxr-x  2 1000 1000 4096 Dec 21 14:28 ./
drwxr-xr-x 40 root root 4096 Dec 21 10:43 ../
-rw-r--r--  1 root root   45 Dec 21 10:34 13d4433ee187_root.txt 
-rw-------  1 1000 1000   26 Dec 21 14:25 dockerhost_me.txt
-rw-------  1 1001 1001    5 Dec 21 14:28 dockerhost_test.txt
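
The reason is that a filesystem stores ownership as numeric user and group IDs, while the names shown by ls are looked up in the local user database, which differs between host and container. The following small sketch needs no Docker at all; it only uses a scratch file created with mktemp (my own choice for illustration) to show the named and the numeric view of the same file:

```shell
#!/bin/sh
# Create a scratch file and show its owner both by name and by number.
tmpfile=$(mktemp)
ls -l "$tmpfile"    # owner shown by name, if the local user database knows the ID
ls -n "$tmpfile"    # owner shown as numeric UID/GID, which is what the filesystem stores

# The third column of "ls -n" is the numeric UID of the owner;
# for a file we just created it matches the UID of the current user.
owner_uid=$(ls -n "$tmpfile" | awk '{print $3}')
echo "file owner UID: $owner_uid, current UID: $(id -u)"

rm -f "$tmpfile"
```

Inside a container the same numeric IDs are simply resolved against the container's own /etc/passwd, which is why 1000 and 1001 appear without names there.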

By default Docker starts the process inside a container as user “root”, which has all privileges and can therefore modify any file regardless of its ownership. So a process inside a container may modify or destroy files within the data volume. This can be avoided by starting the container as a user other than root. Usually no additional users are available unless the creator of the image added them; such users can be referenced by name, otherwise you may pass any numeric ID. In my case I use ID 1001 (the ID that user “test” has on my host system):

[me@dockerhost ~]$ docker run -u 1001 -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
I have no [email protected]:/$ cd /testdatavol/
I have no [email protected]:/testdatavol$ ls -l
total 12
-rw-r--r--  1 root root   45 Dec 21 10:34 13d4433ee187_root.txt 
-rw-------  1 1000 1000   26 Dec 21 14:25 dockerhost_me.txt
-rw-------  1 1001 1001    5 Dec 21 14:28 dockerhost_test.txt
I have no [email protected]:/testdatavol$ cat dockerhost_me.txt
cat: dockerhost_me.txt: Permission denied
I have no [email protected]:/testdatavol$ cat dockerhost_test.txt
This file was created by user 'test' on host 'dockerhost'

When running the container as another user, you also have to make sure that this user has permission to write to the mapped directory on the host system.
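
Two common ways to limit what a container can do to a mounted host directory are to run it under the numeric IDs of the calling host user, and to mount the directory read-only with the :ro suffix. The following is a minimal sketch under the assumption that the image from above is used; the function names and the DOCKER dry-run variable are hypothetical helpers of mine:

```shell
#!/bin/sh
# DOCKER is a hypothetical override; set DOCKER=echo to only print the commands.
DOCKER="${DOCKER:-docker}"

run_as_host_user() {
    # $1: host directory, $2: mount point inside the container, $3: image
    # -u takes "uid:gid"; here the IDs of the user calling this function are used.
    "$DOCKER" run --rm -t -i -u "$(id -u):$(id -g)" -v "$1:$2" "$3" /bin/bash
}

run_read_only() {
    # Same arguments as above; the ":ro" suffix makes the mount read-only in the container.
    "$DOCKER" run --rm -t -i -v "$1:$2:ro" "$3" /bin/bash
}
```

For example, run_as_host_user ~/datacontainer /testdatavol ubuntu-1504-test starts a shell that can only touch the files the host user “me” may touch.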

Mount a shared-storage volume

Docker also supports many additional volume plugins to mount shared storage such as NFS or Azure file storage. This may come in handy, as shared volumes are host-independent and can be used from any container as long as the volume is shared accordingly and the necessary driver is installed. To install a volume driver, you need to follow the instructions in the plugin’s documentation, as the procedure differs slightly from plugin to plugin.

You can use shared volumes either by passing the appropriate parameters directly to the docker run command, or by first creating the volume with the docker volume create command and then using it in a container.
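
The second approach, creating a volume first and then using it, could look like the sketch below for a plain local volume. The function names and the DOCKER dry-run variable are hypothetical helpers, and any volume name or mount point you pass in is up to you. As far as I know, Docker releases from around the time of writing expected the name via docker volume create --name, while newer releases take it as a positional argument; a plugin-backed volume would additionally need the --driver option as described in the plugin’s documentation.

```shell
#!/bin/sh
# DOCKER is a hypothetical override; set DOCKER=echo to only print the commands.
DOCKER="${DOCKER:-docker}"

create_volume() {
    # $1: volume name (positional form; older releases used --name instead)
    "$DOCKER" volume create "$1"
}

run_with_volume() {
    # $1: volume name, $2: mount point inside the container, $3: image
    "$DOCKER" run --rm -t -i -v "$1:$2" "$3" /bin/bash
}
```

A call such as create_volume shareddata followed by run_with_volume shareddata /shared ubuntu-1504-test mounts the named volume at /shared; both names are made-up examples.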

Creating and mounting a data volume container

Docker also offers the possibility to create Data Volume Containers to share persistent data between containers. Whereas containers usually do not persist data, Data Volume Containers do. You can mount more than one volume from different containers if you like, but I play with only one for now:

[me@dockerhost ~]$ docker create -v /datavol --name datavolumecontainer ubuntu-1504-test /bin/true
80ee0ef23d475857ae435f447fd8c772452c97ca04f76259289de9f94f454674
[me@dockerhost ~]$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                            PORTS                               NAMES
80ee0ef23d47        ubuntu-1504-test    "/bin/true"              25 seconds ago      Created                                                               datavolumecontainer

It’s recommended to reuse an existing image, as it shares the common layers and therefore saves space, so I used my Ubuntu 15.04 test image. Once the data volume container is created, it can be attached to other containers with the --volumes-from parameter. The data volume is automatically mounted at /datavol, as specified when we created it:

[me@dockerhost ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu1 ubuntu-1504-test
root@f7c55e9e1254:/# cd /datavol
root@f7c55e9e1254:/datavol# ll
total 8
drwxr-xr-x  2 root root 4096 Dec 22 19:45 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:01 ../
root@f7c55e9e1254:/datavol# echo "This file was created by root in container f7c55e9e1254" > f7c55e9e1254_root.txt
root@f7c55e9e1254:/datavol# ll
total 12
drwxr-xr-x  2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:03 ../
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt

From another terminal let’s start another docker container and examine it.

[me@dockerhost ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu2 ubuntu-1504-test
root@cd67b342aa9d:/# cd datavol/
root@cd67b342aa9d:/datavol# ll
total 12
drwxr-xr-x  2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt
root@cd67b342aa9d:/datavol# echo "This file was created by root in docker cd67b342aa9d" > cd67b342aa9d_root.txt
root@cd67b342aa9d:/datavol# ll
total 16
drwxr-xr-x  2 root root 4096 Dec 22 20:09 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r--  1 root root   53 Dec 22 20:09 cd67b342aa9d_root.txt
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt
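
The same --volumes-from mechanism can also be used by a short-lived helper container, for example to archive the content of the shared volume into the current host directory. The sketch below follows the backup pattern described in the Docker documentation; the function name, the /backup mount point, the archive name backup.tar and the DOCKER dry-run variable are assumptions made for illustration.

```shell
#!/bin/sh
# DOCKER is a hypothetical override; set DOCKER=echo to only print the command.
DOCKER="${DOCKER:-docker}"

backup_volume() {
    # $1: data volume container, $2: volume path inside it, $3: image that provides tar
    # The helper container sees the volume via --volumes-from and the current
    # host directory at /backup, and is removed again afterwards (--rm).
    "$DOCKER" run --rm --volumes-from "$1" -v "$(pwd):/backup" "$3" \
        tar cf /backup/backup.tar "$2"
}
```

With the containers from above, backup_volume datavolumecontainer /datavol ubuntu-1504-test would write backup.tar into the directory the function is called from.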

If the image already contains a directory with the same path as the one specified in the data volume container, the mounted volume hides the files from the image and only shows the files from the data volume container. Let’s demonstrate that with an example. First, let’s see what my test image has under /var/log:

[email protected]:/# ls /var/log
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp

OK, so let’s create a data volume for /var/log, mount it into a container and check what we see:

[me@dockerhost ~]$ docker create -v /var/log --name logvolume ubuntu-1504-test /bin/true
[me@dockerhost ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
[email protected]:/# ll /var/log/
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp

We still see all the files from the image, because Docker copied them into the volume when it was created. But what happens when we modify one of them, for example by changing its content?

[email protected]:/# echo "empty file" > /var/log/bootstrap.log
[email protected]:/# cat /var/log/bootstrap.log
empty file
[email protected]:/# exit
[me@dockerhost ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
[email protected]:/# cat /var/log/bootstrap.log
empty file
[email protected]:/# exit

Without the data volume mounted, the file from the image is not hidden, so we see the original content as expected:

[me@dockerhost ~]$ docker run -i -t ubuntu-1504-test /bin/bash
[email protected]:/# cat /var/log/bootstrap.log
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using RSA key ID C0B21F32
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key (2012) <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5

Removing data volumes

On the Docker website I have read something interesting

To delete the volume from disk, you must explicitly call docker rm -v against the last container with a reference to the volume. This allows you to upgrade, or effectively migrate data volumes between containers.

If you remove containers without using the -v option, you may end up with “dangling” volumes; volumes that are no longer referenced by a container

You remember the data volume container “logvolume”. All containers that used it have been stopped and removed, and so has the container “logvolume” itself. Unfortunately I did not use the -v option, so I ended up with a “dangling” volume:

[me@dockerhost ~]$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
80ee0ef23d47        ubuntu-1504-test    "/bin/true"              11 hours ago        Created                                                 datavolumecontainer
51e408c0232d        ubuntu-1504-test    "/bin/bash"              40 hours ago        Created                                                 angry_bartik
3e0c3abf14ec        ubuntu-1504-test    "/bin/bash"              40 hours ago        Created                                                 elated_hodgkin
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              45 hours ago        Up 45 hours                                             adoring_brattain
[me@dockerhost ~]$ docker volume ls -f dangling=true
DRIVER              VOLUME NAME
local               6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36
[me@dockerhost ~]$ ls /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/
alternatives.log  apt  bootstrap.log  btmp  dmesg  dpkg.log  faillog  fsck  lastlog  wtmp
[me@dockerhost ~]$ cat /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/bootstrap.log
empty file

Such a dangling volume can only be removed with docker volume rm. Now let’s do it correctly with “datavolumecontainer”, which was used by the containers “ubuntu1” and “ubuntu2”. Both containers have been stopped and removed, leaving only “datavolumecontainer”, and the files can still be found on the host’s disk:

[me@dockerhost ~]$ docker inspect datavolumecontainer
...
        "Mounts": [
            {
                "Name": "c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544",
                "Source": "/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data",
...
[me@dockerhost ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
cd67b342aa9d_root.txt  f7c55e9e1254_root.txt

Now we remove the data volume container with the -v option, and we can see that the files on disk are gone as well:

[me@dockerhost ~]$ docker rm -v datavolumecontainer
datavolumecontainer
[me@dockerhost ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
ls: cannot access '/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/': No such file or directory
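
The dangling volume found earlier can be cleaned up by feeding the output of docker volume ls -q -f dangling=true into docker volume rm. The following is a minimal sketch; the function names and the DOCKER dry-run variable are hypothetical helpers, and since the removal cannot be undone it is worth checking the list first.

```shell
#!/bin/sh
# DOCKER is a hypothetical override, useful for trying the functions without a daemon.
DOCKER="${DOCKER:-docker}"

list_dangling_volumes() {
    # -q prints only the volume names, -f dangling=true keeps unreferenced volumes
    "$DOCKER" volume ls -q -f dangling=true
}

remove_dangling_volumes() {
    volumes=$(list_dangling_volumes)
    if [ -z "$volumes" ]; then
        echo "no dangling volumes found"
        return 0
    fi
    # Unquoted on purpose: each volume name becomes its own argument.
    "$DOCKER" volume rm $volumes
}
```

Running list_dangling_volumes first shows what remove_dangling_volumes would delete.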

Conclusion

Having persistent files is obviously very important, especially if you keep in mind that data in containers is per se not persistent. Being able to mount not only host directories but also other kinds of data storage gives you a lot of flexibility, but depends on a particular driver being installed. Data volume containers, on the other hand, do not rely on an additional driver and give you some abstraction: the data is logically stored in the container and you do not need to know where it is physically stored (it is still on the host system, somewhere in /var/lib/docker/). Which solution to use depends on your scenario and the problem you want to solve.