Docker and Data Storage
Posted in container on December 23, 2016 by Adrian Wyssmann ‐ 12 min read
Not everything can live without persistent storage, so let's see how Docker provides and handles it.
In an earlier blog entry we saw that once a Docker container is deleted, the data written to the container is deleted with it. There are good reasons to keep certain data created in a container, either to reuse it later or to share it among containers. This is what data volumes are for:
A data volume is a directory or file in the Docker host’s filesystem that is mounted directly into a container. Data volumes are not controlled by the storage driver. Reads and writes to data volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.
Data volumes are designed to persist data, independent of the container’s life cycle…
Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.
Create Data Volumes
Data volumes can be added with the -v option of either docker create or docker run. So let’s start a container based on the ubuntu-1504-test image we created earlier and map a data volume to /testdatavol:
[me@dockerhost ~]$ docker run -v /testdatavol -t -i ubuntu-1504-test /bin/bash
root@82b442fb1bc1:/# ls /
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys testdatavol tmp usr var
The docker run command creates the data volume and mounts it at /testdatavol, as the ls output inside the container shows. Now let’s create some data:
root@82b442fb1bc1:/testdatavol# echo "This file was created by user root within the container 82b442fb1bc1" > 82b442fb1bc1_root.txt
root@82b442fb1bc1:/testdatavol# ls
82b442fb1bc1_root.txt
But where is the data actually stored? We can easily find out by inspecting the container:
[me@dockerhost ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
82b442fb1bc1 ubuntu-1504-test "/bin/bash" 31 minutes ago Up 31 minutes adoring_brattain
a2cd4d7b28ab jenkins "/bin/tini -- /usr/lo" 42 hours ago Up 42 hours 0.0.0.0:8080->8080/tcp, 50000/tcp jenkins
[me@dockerhost ~]$ docker inspect adoring_brattain
...
"Mounts": [
{
"Name": "b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33",
"Source": "/var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data",
"Destination": "/testdatavol",
"Driver": "local",
"Mode": "",
"RW": true,
"Propagation": ""
}
],
...
The “Source” entry shows where the data volume lives on the host, so we can examine it there:
[me@dockerhost ~]$ ls /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data
82b442fb1bc1_root.txt
[me@dockerhost ~]$ cat /var/lib/docker/volumes/b5bb0cdb16c2be36951bf7e58b9b7065dae8bc023d471605a8f00f9587db8c33/_data/82b442fb1bc1_root.txt
This file was created by user root within the container 82b442fb1bc1
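Copying the long volume path out of the full inspect output gets tedious. As a small sketch: docker inspect accepts a Go template via --format, so the host path can be extracted directly. The helper name volume_source is my own invention, not part of Docker:

```shell
# Hypothetical helper (not a Docker command): print the host-side "Source"
# path of the volume mounted at a given destination inside a container.
# Usage: volume_source <container> <destination>
volume_source() {
  local container="$1" destination="$2"
  # The Go template walks .Mounts and prints the Source of the entry
  # whose Destination matches the requested path.
  docker inspect \
    --format "{{ range .Mounts }}{{ if eq .Destination \"${destination}\" }}{{ .Source }}{{ end }}{{ end }}" \
    "$container"
}
```

For example, ls "$(volume_source adoring_brattain /testdatavol)" should list 82b442fb1bc1_root.txt. Reading below /var/lib/docker usually requires root, and for a volume you know by name, docker volume inspect --format '{{ .Mountpoint }}' gives the same path.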
Map to existing directory
I was also wondering what happens when we create a data volume for a directory that already exists in the image:
[me@dockerhost ~]$ docker run -v /bin -d -t -i ubuntu-1504-test /bin/bash
[me@dockerhost ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5a5e0df11b55 ubuntu-1504-test "/bin/bash" 42 seconds ago Up 42 seconds jolly_panini
82b442fb1bc1 ubuntu-1504-test "/bin/bash" About an hour ago Up About an hour adoring_brattain
a2cd4d7b28ab jenkins "/bin/tini -- /usr/lo" 42 hours ago Up 42 hours 0.0.0.0:8080->8080/tcp, 50000/tcp jenkins
[me@dockerhost ~]$ docker inspect jolly_panini
....
"Mounts": [
{
"Name": "e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e",
"Source": "/var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data",
"Destination": "/bin",
"Driver": "local",
"Mode": "",
"RW": true,
"Propagation": ""
}
],
...
So the volume’s destination within the container is “/bin”. When we examine the volume’s directory on the host, we see all the files from /bin: because the new volume was empty, Docker copied the image’s content at /bin into it.
[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash dash dnsdomainname findmnt ....
So let’s kill the container we just created and remove it. As expected, the data in the volume persists and therefore still exists on the host system:
[me@dockerhost ~]$ docker kill jolly_panini
jolly_panini
[me@dockerhost ~]$ docker rm jolly_panini
[me@dockerhost ~]$ ls /var/lib/docker/volumes/e30d0a1f8594f31fb2473b45532e267a6b98965e372a3c7fd680667e8ba5366e/_data/
bash dash dnsdomainname findmnt ....
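Had jolly_panini been started with --rm, the volume would not have survived: Docker documents that --rm also removes the anonymous volumes associated with the container when it is removed, similar to docker rm -v, while named volumes are kept. A minimal sketch, using a hypothetical helper name:

```shell
# Hypothetical helper: run a throwaway container with an anonymous volume.
# Because of --rm, Docker removes the container when it exits and, with it,
# the anonymous volume created by -v <path> (named volumes would be kept).
# Usage: run_throwaway <image> <container-path> [command...]
run_throwaway() {
  local image="$1" path="$2"
  shift 2
  docker run --rm -v "$path" "$image" "$@"
}
```

For example, run_throwaway ubuntu-1504-test /bin ls /bin lists the copied files and leaves no volume behind in docker volume ls.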
Mount host directories as data volumes
So far each data volume belongs to a single container, so there is not much sharing of data among containers. Instead of just creating a data volume, you can also map an existing directory from the host into the container with the -v parameter: docker run -v {host path}:{container path}, where both paths are absolute. Let’s create a sample directory for this purpose and add a file to it:
[me@dockerhost ~]$ mkdir datacontainer
[me@dockerhost ~]$ cd datacontainer/
[me@dockerhost ~/datacontainer]$ echo "This file was created by user 'me' on host 'dockerhost'" > dockerhost_me.txt
Then we mount the host directory at /testdatavol within the container and examine the content:
[me@dockerhost ~/datacontainer]$ docker run -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
root@13d4433ee187:/# ls /testdatavol/
dockerhost_me.txt
root@13d4433ee187:/# cat /testdatavol/dockerhost_me.txt
This file was created by user 'me' on host 'dockerhost'
As you can see, the file we created on the host is accessible within the container. We can modify it, or create a new file within the container, and then see what happens on the host:
root@13d4433ee187:/# echo "This file was created by user root within the container 13d4433ee187" > /testdatavol/13d4433ee187_root.txt
root@13d4433ee187:/# exit
exit
[me@dockerhost ~/datacontainer]$ ls
13d4433ee187_root.txt dockerhost_me.txt
[me@dockerhost ~/datacontainer]$ cat 13d4433ee187_root.txt
This file was created by user root within the container 13d4433ee187
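Here is a sketch of the same bind mount wrapped in a hypothetical helper, run_with_host_dir. It first resolves the host directory to an absolute path, since a plain name such as datacontainer would be taken as the name of a new named volume instead. It also appends the :ro or :rw mode that -v accepts; with :ro the container can read the host files but not modify them:

```shell
# Hypothetical helper: bind-mount a host directory into an interactive container.
# Usage: run_with_host_dir <host-dir> <container-dir> <image> <ro|rw> [command...]
run_with_host_dir() {
  local host_dir container_dir="$2" image="$3" mode="$4"
  # Resolve to an absolute path; -v treats a bare name as a named volume.
  host_dir="$(cd "$1" && pwd)" || return 1
  shift 4
  docker run -it -v "${host_dir}:${container_dir}:${mode}" "$image" "$@"
}
```

For example, run_with_host_dir ~/datacontainer /testdatavol ubuntu-1504-test ro /bin/bash should show the same files, while the echo into /testdatavol would now fail with a read-only file system error.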
Something to bear in mind when mapping host directories:
Note: The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile, the VOLUME instruction does not support passing a host-dir, because built images should be portable. A host directory wouldn’t be available on all potential hosts.
By the way, -v can also be used to mount a single file from the host machine:
[me@dockerhost ~/datacontainer]$ docker run -i -t -v ~/datacontainer/dockerhost_me.txt:/root/dockerhost_me.txt ubuntu-1504-test /bin/bash
root@<container-id>:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'archlinux'
root@<container-id>:/# vi /root/dockerhost_me.txt
(in vi, add a new line with the text "I create a new line")
root@<container-id>:/# cat /root/dockerhost_me.txt
This file was created by user 'me' on host 'archlinux'
I create a new line
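One pitfall with single files: if the host path given to -v does not exist, Docker does not fail but creates it as a directory. The following hypothetical helper, run_with_host_file, checks for the file first and mounts it read-only with :ro. Drop the :ro if, as above, the container should be able to edit the file:

```shell
# Hypothetical helper: bind-mount a single host file read-only.
# Usage: run_with_host_file <host-file> <container-file> <image> [command...]
run_with_host_file() {
  local host_file="$1" container_file="$2" image="$3" abs_file
  shift 3
  # With -v, a missing host path would be created as a *directory*,
  # so refuse to continue unless the file really exists.
  if [[ ! -f "$host_file" ]]; then
    printf 'no such file: %s\n' "$host_file" >&2
    return 1
  fi
  abs_file="$(cd "$(dirname "$host_file")" && pwd)/$(basename "$host_file")"
  docker run -it -v "${abs_file}:${container_file}:ro" "$image" "$@"
}
```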
File ownership
This is an interesting aspect to consider. When we check the file ownership on the host system, we notice that the file “dockerhost_me.txt” in my datacontainer directory belongs to user “me” and group “me”. I also made it readable and writable only by user “me”. Additionally, I created a user “test” and a file “dockerhost_test.txt” that belongs only to him:
[me@dockerhost ~/datacontainer]$ ll
total 16
drwxrwxr-x 2 me me 4096 Dec 21 10:34 ./
drwxrwxr-x 3 me me 4096 Dec 21 10:31 ../
-rw-r--r-- 1 root root 69 Dec 21 10:34 13d4433ee187_root.txt
-rw------- 1 me me 55 Dec 21 10:32 dockerhost_me.txt
-rw------- 1 test test 0 Dec 21 14:27 dockerhost_test.txt
Within the Docker container there is currently no user “me” or “test”, so the files cannot be mapped to a user name and only the numeric user ID and group ID are shown as owner information:
root@<container-id>:/testdatavol# ll
total 20
drwxrwxr-x 2 1000 1000 4096 Dec 21 14:28 ./
drwxr-xr-x 40 root root 4096 Dec 21 10:43 ../
-rw-r--r-- 1 root root 45 Dec 21 10:34 13d4433ee187_root.txt
-rw------- 1 1000 1000 26 Dec 21 14:25 dockerhost_me.txt
-rw------- 1 1001 1001 5 Dec 21 14:28 dockerhost_test.txt
Docker starts the process inside a container as the “root” user, which has all privileges and can therefore modify any file regardless of its ownership. So a process inside the container may modify or destroy files within the data volume. This can be avoided by starting the container as a user other than root. Usually an image contains no additional users unless its creator added them; such users can be referenced by name, otherwise you can pass any numeric ID. In my case I use ID 1001 (the ID that user “test” has on my host system):
[me@dockerhost ~]$ docker run -u 1001 -v ~/datacontainer:/testdatavol -t -i ubuntu-1504-test /bin/bash
I have no name!@<container-id>:/$ cd /testdatavol/
I have no name!@<container-id>:/testdatavol$ ls -l
total 12
-rw-r--r-- 1 root root 45 Dec 21 10:34 13d4433ee187_root.txt
-rw------- 1 1000 1000 26 Dec 21 14:25 dockerhost_me.txt
-rw------- 1 1001 1001 5 Dec 21 14:28 dockerhost_test.txt
I have no name!@<container-id>:/testdatavol$ cat dockerhost_me.txt
cat: dockerhost_me.txt: Permission denied
I have no name!@<container-id>:/testdatavol$ cat dockerhost_test.txt
This file was created by user 'test' on host 'dockerhost'
When running the container as another user, you must also make sure that this user has write permission on the mapped directory on the host system.
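A common way to avoid hard-coding an ID such as 1001 is to pass the caller’s own user and group IDs, so that files the container creates in the mapped directory belong to you on the host. A sketch, where the helper name run_as_host_user is hypothetical:

```shell
# Hypothetical helper: run a container as the calling host user and group,
# so files written to the bind-mounted directory are owned by that user.
# Usage: run_as_host_user <host-dir> <container-dir> <image> [command...]
run_as_host_user() {
  local host_dir="$1" container_dir="$2" image="$3"
  shift 3
  docker run -it -u "$(id -u):$(id -g)" -v "${host_dir}:${container_dir}" "$image" "$@"
}
```

As in the transcript above, the shell will likely greet you with "I have no name!" because the image has no passwd entry for that ID; file access works regardless.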
Mount a shared-storage volume
Docker can also use volume plugins to mount shared storage such as NFS or Azure File Storage. This can come in handy, as such shared volumes are host-independent and can be used from any container, as long as the storage is shared accordingly and the necessary driver is installed. To install a volume driver, follow the instructions in the plugin’s documentation, as the procedure differs from plugin to plugin.
You can use shared volumes either by passing the appropriate parameters directly to the docker run command, or by creating the volume first with the docker volume create command and then using it in a container.
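As a sketch of the second approach: the built-in local driver can mount an NFS export when given mount options, without installing a plugin. The server address, export path and volume name below are placeholders. The syntax assumes Docker 1.13 or later, where the volume name is a positional argument (Docker 1.12 used --name):

```shell
# Hypothetical helper: create a named volume backed by an NFS export,
# using the "local" driver's type/o/device mount options.
# Usage: create_nfs_volume <volume-name> <nfs-server> <export-path>
create_nfs_volume() {
  local name="$1" server="$2" export_path="$3"
  docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=${server},rw" \
    --opt "device=:${export_path}" \
    "$name"
}

# Afterwards the volume is used by name, e.g.:
#   create_nfs_volume shared 192.168.1.1 /srv/share
#   docker run -it -v shared:/data ubuntu-1504-test /bin/bash
```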
Creating and mounting a data volume container
Docker also offers the possibility to create data volume containers to share persistent data between containers. Whereas containers usually do not persist data, data volume containers exist precisely to hold persistent data in their volumes. You can mount volumes from more than one container if you like, but I play with only one for now:
[me@dockerhost ~]$ docker create -v /datavol --name datavolumecontainer ubuntu-1504-test /bin/true
80ee0ef23d475857ae435f447fd8c772452c97ca04f76259289de9f94f454674
[me@dockerhost ~]$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
80ee0ef23d47 ubuntu-1504-test "/bin/true" 25 seconds ago Created datavolumecontainer
It’s recommended to reuse an existing image, which saves space because the containers share the same common layers, so I used my Ubuntu 15.04 test image. Once the data volume container is created, it can be attached to other containers with the --volumes-from parameter. The data volume is automatically mounted at /datavol, as specified when we created it:
[me@dockerhost ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu1 ubuntu-1504-test
root@f7c55e9e1254:/# cd /datavol
root@f7c55e9e1254:/datavol# ll
total 8
drwxr-xr-x 2 root root 4096 Dec 22 19:45 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:01 ../
root@f7c55e9e1254:/datavol# echo "This file was created by root in container f7c55e9e1254" > f7c55e9e1254_root.txt
root@f7c55e9e1254:/datavol# ll
total 12
drwxr-xr-x 2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:03 ../
-rw-r--r-- 1 root root 56 Dec 22 20:02 f7c55e9e1254_root.txt
From another terminal let’s start another docker container and examine it.
[me@dockerhost ~]$ docker run -i -t --volumes-from datavolumecontainer --name ubuntu2 ubuntu-1504-test
root@cd67b342aa9d:/# cd datavol/
root@cd67b342aa9d:/datavol# ll
total 12
drwxr-xr-x 2 root root 4096 Dec 22 20:03 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r-- 1 root root 56 Dec 22 20:02 f7c55e9e1254_root.txt
root@cd67b342aa9d:/datavol# echo "This file was created by root in docker cd67b342aa9d" > cd67b342aa9d_root.txt
root@cd67b342aa9d:/datavol# ll
total 16
drwxr-xr-x 2 root root 4096 Dec 22 20:09 ./
drwxr-xr-x 34 root root 4096 Dec 22 20:09 ../
-rw-r--r-- 1 root root 53 Dec 22 20:09 cd67b342aa9d_root.txt
-rw-r--r-- 1 root root 56 Dec 22 20:02 f7c55e9e1254_root.txt
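Because the data lives in the volume rather than in either of these containers, it can also be backed up through --volumes-from. The following sketch follows the backup pattern from the Docker documentation: a throwaway container mounts the volumes of datavolumecontainer plus the current host directory at /backup, then archives the volume with tar. The helper name backup_volume and the archive name are hypothetical:

```shell
# Hypothetical helper: archive a data volume container's volume into a tar
# file in the current host directory, using a throwaway container (--rm).
# Usage: backup_volume <data-volume-container> <volume-path> <image> <archive-name>
backup_volume() {
  local source="$1" path="$2" image="$3" archive="$4"
  docker run --rm --volumes-from "$source" -v "$(pwd):/backup" "$image" \
    tar cvf "/backup/${archive}" "$path"
}
```

For example, backup_volume datavolumecontainer /datavol ubuntu-1504-test datavol.tar should leave datavol.tar, containing both *_root.txt files, in the current directory.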
If the image already contains a directory at the path specified in the data volume container, the mounted volume hides that directory from the image and only the files from the data volume are shown. However, when the volume is first created, Docker copies the image’s content at that path into it, so initially both look the same. Let’s try it with /var/log and first check what my test image contains there:
root@<container-id>:/# ls /var/log
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp
OK, so let’s create a data volume container for /var/log, mount it into a container, and check what we see:
[me@dockerhost ~]$ docker create -v /var/log --name logvolume ubuntu-1504-test /bin/true
[me@dockerhost ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
root@<container-id>:/# ls /var/log/
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp
We see all the files from the image. But what happens when we modify one of them, for example by changing its content?
root@<container-id>:/# echo "empty file" > /var/log/bootstrap.log
root@<container-id>:/# cat /var/log/bootstrap.log
empty file
root@<container-id>:/# exit
A new container using the same volume sees the modified file:
[me@dockerhost ~]$ docker run -i -t --volumes-from logvolume ubuntu-1504-test /bin/bash
root@<container-id>:/# cat /var/log/bootstrap.log
empty file
root@<container-id>:/# exit
Without the data volume, the file from the image is not hidden, so it shows the original content, as expected:
[me@dockerhost ~]$ docker run -i -t ubuntu-1504-test /bin/bash
root@<container-id>:/# cat /var/log/bootstrap.log
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using RSA key ID C0B21F32
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key (2012) <ftpmaster@ubuntu.com>"
gpgv: Signature made Fri Apr 24 18:46:59 2015 UTC using DSA key ID 437D05B5
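To repeat this comparison without interactive shells, a hypothetical helper can run cat once with and once without --volumes-from. It uses --rm so the throwaway containers are cleaned up; this should not touch the volume itself, which is still referenced by logvolume:

```shell
# Hypothetical helper: print a file once with the volumes of a data volume
# container mounted and once from the plain image, to see which copy wins.
# Usage: compare_with_volume <volume-container> <image> <file>
compare_with_volume() {
  local volumes_from="$1" image="$2" file="$3"
  printf '== with volumes from %s ==\n' "$volumes_from"
  docker run --rm --volumes-from "$volumes_from" "$image" cat "$file"
  printf '== image only ==\n'
  docker run --rm "$image" cat "$file"
}
```

After the change above, compare_with_volume logvolume ubuntu-1504-test /var/log/bootstrap.log should print "empty file" in the first section and the gpgv lines in the second.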
Removing data volumes
On the Docker website I read something interesting:
To delete the volume from disk, you must explicitly call docker rm -v against the last container with a reference to the volume. This allows you to upgrade, or effectively migrate data volumes between containers.…
If you remove containers without using the -v option, you may end up with “dangling” volumes; volumes that are no longer referenced by a container.
Remember the data volume container “logvolume”? All containers that used it have been stopped and removed, and so has “logvolume” itself. Unfortunately I did not use the -v option, so I ended up with a “dangling” volume:
[me@dockerhost ~]$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
80ee0ef23d47 ubuntu-1504-test "/bin/true" 11 hours ago Created datavolumecontainer
51e408c0232d ubuntu-1504-test "/bin/bash" 40 hours ago Created angry_bartik
3e0c3abf14ec ubuntu-1504-test "/bin/bash" 40 hours ago Created elated_hodgkin
82b442fb1bc1 ubuntu-1504-test "/bin/bash" 45 hours ago Up 45 hours adoring_brattain
[me@dockerhost ~]$ docker volume ls -f dangling=true
DRIVER VOLUME NAME
local 6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36
[me@dockerhost ~]$ ls /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/
alternatives.log apt bootstrap.log btmp dmesg dpkg.log faillog fsck lastlog wtmp
[me@dockerhost ~]$ cat /var/lib/docker/volumes/6239291866bcfd652581b12c8b8d9b13dd276de768432c2164745ebb80cedd36/_data/bootstrap.log
empty file
Now only docker volume rm can remove that volume. Let’s do it correctly with “datavolumecontainer”, which we used with the containers “ubuntu1” and “ubuntu2”. Both containers have been stopped and removed, so only “datavolumecontainer” is left, and its files can still be found on the host:
[me@dockerhost ~]$ docker inspect datavolumecontainer
...
"Mounts": [
{
"Name": "c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544",
"Source": "/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data",
...
[me@dockerhost ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
cd67b342aa9d_root.txt f7c55e9e1254_root.txt
So now we remove the data volume container with the -v option, and we can see that the files on disk are gone too:
[me@dockerhost ~]$ docker rm -v datavolumecontainer
datavolumecontainer
[me@dockerhost ~]$ ls /var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/
ls: cannot access '/var/lib/docker/volumes/c5b4db54a1671e0565a40f0ced83fe2cba5cc639e2451527e815a12cbfdf6544/_data/': No such file or directory
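Volumes that are already dangling, such as the one left behind by “logvolume”, can be listed with the dangling=true filter and removed one by one with docker volume rm. A minimal sketch, with a hypothetical helper name; Docker 1.13 and later also offer docker volume prune, which removes unused volumes in one step:

```shell
# Hypothetical helper: remove every dangling volume (one no longer
# referenced by any container), printing each name first.
remove_dangling_volumes() {
  local vol
  # Volume names cannot contain whitespace, so word splitting is safe here.
  for vol in $(docker volume ls -q -f dangling=true); do
    printf 'removing %s\n' "$vol"
    docker volume rm "$vol"
  done
}
```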
Conclusion
Persistent files are obviously very important, especially given that data inside containers is not persistent per se. Being able to mount not only host directories but also other kinds of storage gives you a lot of flexibility, although shared storage depends on a particular volume driver being installed. Data volume containers, on the other hand, do not rely on a driver and give you some abstraction: the data is logically stored in the container, and you do not need to know where it is physically stored (it is still on the host system, somewhere in /var/lib/docker/). Which solution to use depends on your scenario and the problem you want to solve.