Docker and Data Storage

Posted in container on December 23, 2016 by Adrian Wyssmann ‐ 12 min read

Not everything can live without persistent storage, so let's see how Docker provides and handles it.

In an earlier blog entry we have seen that once a Docker container is deleted, the data written to the container is deleted as well. Obviously there are good reasons why you might want to keep certain data you have created in your container, either to reuse it later or to share it among containers. For this purpose Docker offers data volumes:

A data volume is a directory or file in the Docker host’s filesystem that is mounted directly into a container. Data volumes are not controlled by the storage driver. Reads and writes to data volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.

Data volumes are designed to persist data, independent of the container’s life cycle…

Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.

Create Data Volumes

Data volumes can be added by using the option -v when running either docker create or docker run. So let’s start a container based on the ubuntu-1504-test image we have created and map a data volume to /testdatavol:

[me@dockerhost ~]$ docker run -v /testdatavol -t -i ubuntu-1504-test /bin/bash
root@82b442fb1bc1:/# ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  testdatavol  tmp  usr  var

So the first command creates the data volume and maps it to /testdatavol, as we can see when running ls inside the Docker container. Now let’s create some data:

root@82b442fb1bc1:/testdatavol# echo "This file was created by user root within the container 82b442fb1bc1" > 82b442fb1bc1_root.txt
root@82b442fb1bc1:/testdatavol# ls
82b442fb1bc1_root.txt

But where is the data actually stored? We can easily find out by inspecting our container:

[me@dockerhost ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              31 minutes ago      Up 31 minutes                                           adoring_brattain
a2cd4d7b28ab        jenkins             "/bin/tini -- /usr/lo"   42 hours ago        Up 42 hours         0.0.0.0:8080->8080/tcp, 50000/tcp   jenkins
[me@dockerhost ~]$ docker inspect adoring_brattain
...
        "Mounts": [
            {
                "Name": "b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33",
                "Source": "/var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data",
                "Destination": "/testdatavol",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
...

The “Source” entry shows the location of the data volume on the host, so we can examine it:

[me@dockerhost ~]$ ls /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data
82b442fb1bc1_root.txt
[me@dockerhost ~]$ cat /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data/82b442fb1bc1_root.txt
This file was created by user root within the container 82b442fb1bc1
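Scrolling through the full docker inspect output is not really necessary. As a minimal sketch, assuming a Docker version whose inspect output contains the "Mounts" section shown above, the --format option can print just the host path and the mount point of every volume. The helper name volume_paths and the DOCKER variable are my own additions for illustration, not something Docker ships:

```shell
#!/bin/sh
# Hypothetical helper: print "<host path> -> <container path>" for every
# mount of a container. DOCKER can be overridden, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"

volume_paths() {
    # $1: container name or ID
    "$DOCKER" inspect \
        --format '{{ range .Mounts }}{{ printf "%s -> %s\n" .Source .Destination }}{{ end }}' \
        "$1"
}

# Usage (requires a running Docker daemon):
#   volume_paths adoring_brattain
```

The function only wraps a single docker inspect call, so it works for any container name or ID that docker ps -a lists.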

Map to existing directory

I am wondering what happens when we create a data volume for a directory that already exists in the image:

[me@dockerhost ~]$ docker run -v /bin -d -t -i ubuntu-1504-test /bin/bash
[me@dockerhost ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
5a5e0df11b55        ubuntu-1504-test    "/bin/bash"              42 seconds ago      Up 42 seconds                                           jolly_panini
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              About an hour ago   Up About an hour                                        adoring_brattain
a2cd4d7b28ab        jenkins             "/bin/tini -- /usr/lo"   42 hours ago        Up 42 hours         0.0.0.0:8080->8080/tcp, 50000/tcp   jenkins
[me@dockerhost ~]$ docker inspect jolly_panini
....
       "Mounts": [
            {
                "Name": "e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e",
                "Source": "/var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data",
                "Destination": "/bin",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
...

So we can see that the volume destination within the Docker container is “/bin”. On the host system we can then examine the directory and actually see all the files from /bin:

[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash   dash   dnsdomainname  findmnt   ....

So let’s kill our recently created container and remove it. As expected, the data from the volume is kept persistent and therefore still exists on the host system:

[me@dockerhost ~]$ docker kill jolly_panini
jolly_panini
[me@dockerhost ~]$ docker rm jolly_panini
[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash   dash   dnsdomainname  findmnt   ....

Mount host directories as data volumes

So far the data volumes have been per container, so there has not been much sharing of data among containers. Besides simply creating a data volume, you can also map an existing directory from the host into the container by using the -v parameter as docker run -v {host path}:/{container path}. Let’s create a sample directory for this purpose and add a file to it:

[me@dockerhost ~]$ mkdir datacontainer
[me@dockerhost ~]$ cd datacontainer/
[me@dockerhost ~/datacontainer]$ echo "This file was created by user 'me' on host 'dockerhost'" > dockerhost_me.txt

Then we mount the host directory to /testdatavol/ within the container and examine the content:

[me@dockerhost ~/datacontainer]$ docker run -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
root@13d4433ee187:/# ls /testdatavol/
dockerhost_me.txt
root@13d4433ee187:/# cat /testdatavol/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'

As you can see, the file we have created on the host is accessible within the container. We can modify it or create a new file within the container and then see what happens on the host:

root@13d4433ee187:/# echo "This file was created by user root within the container 13d4433ee187" >  /testdatavol/13d4433ee187_root.txt
root@13d4433ee187:/# exit
exit
[me@dockerhost ~/datacontainer]$ ls
13d4433ee187_root.txt dockerhost_me.txt
[me@dockerhost ~/datacontainer]$ cat 13d4433ee187_root.txt
This file was created by user root within the container 13d4433ee187

Something to bear in mind when mapping host directories:

Note: The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile, the VOLUME instruction does not support passing a host-dir, because built images should be portable. A host directory wouldn’t be available on all potential hosts.

By the way, the -v option can also be used to mount a single file from the host machine:

[me@dockerhost ~/datacontainer]$ docker run -i -t -v ~/datacontainer/dockerhost_me.txt:/root/dockerhost_me.txt ubuntu-1504-test /bin/bash
root@17434fe5672c:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'
root@17434fe5672c:/# vi /root/dockerhost_me.txt
... add a new line with text "I create a new line"
root@17434fe5672c:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'
I create a new line

File ownership

This is an interesting aspect to consider. When we check the file ownership on the host system, we notice that the file “dockerhost_me.txt” in my datacontainer directory belongs to user “me” and group “me”. I also made it readable and writable only by user “me”. Additionally I created a user “test” and a file “dockerhost_test.txt” that belongs only to that user:

[me@dockerhost ~/datacontainer]$ ll
total 16
drwxrwxr-x 2 me   me   4096 Dec 21 10:34 ./
drwxrwxr-x 3 me   me   4096 Dec 21 10:31 ../
-rw-r--r-- 1 root root   69 Dec 21 10:34 13d4433ee187_root.txt 
-rw------- 1 me   me     55 Dec 21 10:32 dockerhost_me.txt
-rw------- 1 test test    0 Dec 21 14:27 dockerhost_test.txt

Within the Docker container there is currently no user “me” or “test”, so the files cannot be matched to a user name and only show the numeric user ID and group ID as owner information:

root@89f97383c0f8:/testdatavol# ll
total 20
drwxrwxr-x  2 1000 1000 4096 Dec 21 14:28 ./
drwxr-xr-x 40 root root 4096 Dec 21 10:43 ../
-rw-r--r--  1 root root   45 Dec 21 10:34 13d4433ee187_root.txt 
-rw-------  1 1000 1000   26 Dec 21 14:25 dockerhost_me.txt
-rw-------  1 1001 1001    5 Dec 21 14:28 dockerhost_test.txt

Docker starts the process inside the container as user “root”, which has all privileges and can therefore modify all files regardless of their ownership. So a process inside the container may modify or destroy files within the data volume. This can be avoided by starting the container with a user other than root. Usually there are no additional users available unless the image author added them. Such users can be referenced by name; otherwise you may pass any numeric ID. In my case I use ID 1001 (the ID which user “test” has on my host system):

[me@dockerhost: ~]$ docker run -u 1001 -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
I have no name!@a3d664ca7ca8:/$ cd /testdatavol/
I have no name!@a3d664ca7ca8:/testdatavol$ ls -l
total 12
-rw-r--r--  1 root root   45 Dec 21 10:34 13d4433ee187_root.txt 
-rw-------  1 1000 1000   26 Dec 21 14:25 dockerhost_me.txt
-rw-------  1 1001 1001    5 Dec 21 14:28 dockerhost_test.txt
I have no name!@a3d664ca7ca8:/testdatavol$ cat dockerhost_me.txt
cat: dockerhost_me.txt: Permission denied
I have no name!@a3d664ca7ca8:/testdatavol$ cat dockerhost_test.txt
This file was created by user 'test' on host 'dockerhost'

When running the container as another user, you also have to make sure that this user has permission to write to the mapped directory on the host system.
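One way to satisfy this without hard-coding an ID is to let the shell look up your own user and group ID on the host and pass them to -u in the form uid:gid. This is only a sketch: the helper name host_user_flag is made up for this example, and the image and directory are the ones used earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: build the -u argument from the UID and GID of the
# user who invokes the script on the host.
host_user_flag() {
    printf '%s %s:%s' -u "$(id -u)" "$(id -g)"
}

# Usage (requires Docker; word splitting of the flag is intended here):
#   docker run $(host_user_flag) -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
```

Files created inside the container then carry the same numeric owner as your host user, so they stay readable and writable from both sides.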

Mount a shared-storage volume

Docker also supports a number of volume plugins to mount shared storage like NFS or Azure file storage. This may come in handy, as shared volumes are host-independent and can be used from any container as long as the volume is shared accordingly and the necessary driver is installed. To install a volume driver, you need to follow the instructions in the plugin’s documentation, as the procedure differs slightly from plugin to plugin.

You can use shared volumes either by passing the appropriate parameters directly to the docker run command, or by creating the volume with the docker volume create command before using it in a container.
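As a rough sketch of the second variant with the default local driver: the volume is created first and then referenced by name instead of by path in -v. The volume name shared-data, the helper names and the DOCKER variable are assumptions for illustration. The name is passed as a positional argument here; as far as I know, older Docker releases expected docker volume create --name instead. A plugin would be selected with the -d option, together with whatever options its documentation asks for.

```shell
#!/bin/sh
# DOCKER can be overridden, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"

# Create a named volume ($1: volume name) with the default driver.
create_named_volume() {
    "$DOCKER" volume create "$1"
}

# Start an interactive container with the named volume mounted.
# $1: volume name, $2: absolute path inside the container
run_with_volume() {
    "$DOCKER" run -v "$1:$2" -t -i ubuntu-1504-test /bin/bash
}

# Usage (requires a running Docker daemon):
#   create_named_volume shared-data
#   run_with_volume shared-data /testdatavol
```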

Creating and mounting a data volume container

Docker also offers the possibility to create Data Volume Containers to share persistent data between containers. Whereas containers usually do not persist data, Data Volume Containers do. You can mount more than one volume from different containers if you like, but I will play with only one for now:

[me@dockerhost: ~]$ docker create -v /datavol --name datavolumecontainer ubuntu-1504-test /bin/true
80ee0ef23d475857ae435f447fd8c772452c97ca04f76259289de9f94f454674
[me@dockerhost: ~]$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                            PORTS                               NAMES
80ee0ef23d47        ubuntu-1504-test    "/bin/true"              25 seconds ago      Created                                                               datavolumecontainer

It’s recommended to reuse an existing image so you can save space, as it shares the same common layers. That is why I used my Ubuntu 15.04 test image. Once the data volume container is created, it can be attached to other containers by using the --volumes-from parameter. The data volume is automatically mapped to the location /datavol, as specified when we created it:

[me@dockerhost: ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu1 ubuntu-1504-test
root@f7c55e9e1254:/# cd /datavol
root@f7c55e9e1254:/datavol/# ll
total 8
drwxr-xr-x  2 root root 4096 Dec 22 19:45 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:01 ../
root@f7c55e9e1254:/datavol/# echo "This file was created by root in container f7c55e9e1254" > f7c55e9e1254_root.txt
root@f7c55e9e1254:/datavol/# ll
total 12
drwxr-xr-x  2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:03 ../
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt

From another terminal let’s start another docker container and examine it.

[me@dockerhost: ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu2 ubuntu-1504-test
root@cd67b342aa9d:/# cd datavol/
root@cd67b342aa9d:/datavol# ll
total 12
drwxr-xr-x  2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt
root@cd67b342aa9d:/datavol# echo "This file was created by root in docker cd67b342aa9d" > cd67b342aa9d_root.txt
root@cd67b342aa9d:/datavol# ll
total 16
drwxr-xr-x  2 root root 4096 Dec 22 20:09 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r--  1 root root   53 Dec 22 20:09 cd67b342aa9d_root.txt
-rw-r--r--  1 root root   56 Dec 22 20:02 f7c55e9e1254_root.txt

If the image already contains a directory with the same path as the one specified in the data volume container, the mounted volume hides the files from the image and only the files from the data volume container are shown. Let’s try this with an example and first look at what my test image has under /var/log:

root@cd67b342aa9d:/# ls /var/log
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp

OK, so let’s create a data volume for /var/log, mount it into a container and check what we see:

[me@dockerhost: ~]$ docker create -v /var/log --name logvolume ubuntu-1504-test /bin/true
[me@dockerhost: ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
root@8ded61ac29b9:/# ll /var/log/
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp

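Well, we see all the files from the image. This is because Docker copies the existing content of the directory into a newly created volume. But what happens when we modify one of these files, for example by changing its content: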

root@8ded61ac29b9:/# echo "empty file" > /var/log/bootstrap.log
root@8ded61ac29b9:/# cat /var/log/bootstrap.log
empty file
root@8ded61ac29b9:/# exit
[me@dockerhost: ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
root@5c026159ecb1:/# cat /var/log/bootstrap.log
empty file
root@5c026159ecb1:/# exit

When the data volume is not mapped, the actual file from the image is not hidden, so a new container shows the original content as expected:

[me@dockerhost: ~]$ docker run -i -t ubuntu-1504-test /bin/bash
root@4b00e4dd417b:/# cat /var/log/bootstrap.log
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using RSA key ID C0B21F32
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key (2012) <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5

Removing data volumes

On the Docker website I have read something interesting:

To delete the volume from disk, you must explicitly call docker rm -v against the last container with a reference to the volume. This allows you to upgrade, or effectively migrate data volumes between containers.

If you remove containers without using the -v option, you may end up with “dangling” volumes; volumes that are no longer referenced by a container

You remember the data volume container “logvolume”? All containers which used it have been stopped and removed, and so has the container “logvolume” itself. Unfortunately I did not use the -v option, so I ended up with a “dangling” volume:

[me@dockerhost: ~]$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
80ee0ef23d47        ubuntu-1504-test    "/bin/true"              11 hours ago        Created                                                 datavolumecontainer
51e408c0232d        ubuntu-1504-test    "/bin/bash"              40 hours ago        Created                                                 angry_bartik
3e0c3abf14ec        ubuntu-1504-test    "/bin/bash"              40 hours ago        Created                                                 elated_hodgkin
82b442fb1bc1        ubuntu-1504-test    "/bin/bash"              45 hours ago        Up 45 hours                                             adoring_brattain
[me@dockerhost: ~]$ docker volume ls -f dangling=true
DRIVER              VOLUME NAME
local               6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36
[me@dockerhost: ~]$ ls /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/
alternatives.log  apt  bootstrap.log  btmp  dmesg  dpkg.log  faillog  fsck  lastlog  wtmp
[me@dockerhost: ~]$ cat /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/bootstrap.log
empty file

Only docker volume rm actually removes such a volume. Now let’s do it correctly with “datavolumecontainer”, which was used by the containers “ubuntu1” and “ubuntu2”. Both containers have been stopped and removed, leaving only “datavolumecontainer”, and the files can still be found on the host’s disk:

[me@dockerhost: ~]$ docker inspect datavolumecontainer
...
        "Mounts": [
            {
                "Name": "c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544",
                "Source": "/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data",
...
[me@dockerhost: ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
cd67b342aa9d_root.txt  f7c55e9e1254_root.txt

So now we remove the data volume container with the -v option and we can see that the files on the disk are gone as well:

[me@dockerhost: ~]$ docker rm -v datavolumecontainer
datavolumecontainer
[me@dockerhost: ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
ls: cannot access '/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/': No such file or directory
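Coming back to the dangling volume found earlier: it can be cleaned up by feeding the output of docker volume ls into docker volume rm. The following is a minimal sketch and not an official recipe. The function name remove_dangling_volumes and the DOCKER variable are my own, and the removal is irreversible, so check the list first. Newer Docker versions also offer docker volume prune for the same purpose.

```shell
#!/bin/sh
# DOCKER can be overridden, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: remove all volumes no container references anymore.
remove_dangling_volumes() {
    # -q prints only the volume names, one per line
    dangling=$("$DOCKER" volume ls -q -f dangling=true)
    if [ -z "$dangling" ]; then
        echo "no dangling volumes found"
        return 0
    fi
    # Word splitting is intended: one argument per volume name
    # shellcheck disable=SC2086
    "$DOCKER" volume rm $dangling
}

# Usage (requires a running Docker daemon, deletes data for good):
#   remove_dangling_volumes
```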

Conclusion

Having persistent files is obviously very important, especially if you keep in mind that data in containers is per se not persistent. Being able to mount not only host directories but also other kinds of data storage gives you a lot of flexibility, but depends on a particular driver being installed. Data volume containers, on the other hand, do not rely on an additional driver and give you some abstraction - the data is logically stored in the container and you do not need to know where it is physically stored (it is still on the host system, somewhere in /var/lib/docker/). Which solution to use depends on your scenario and the problem you want to solve.