Not everything can live without persistent storage, so let's see how Docker offers and handles it.
In a previous blog entry we saw that once a Docker container is deleted, the data written to the container is deleted as well. Obviously there are good reasons why you might want to keep certain data you have created in your container, either to reuse it later or to share it among containers. This is what data volumes are for.
A data volume is a directory or file in the Docker host’s filesystem that is mounted directly into a container. Data volumes are not controlled by the storage driver. Reads and writes to data volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.
Data volumes are designed to persist data independently of the container’s life cycle. Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.
Create Data Volumes
Data volumes can be added by using the option -v when running either docker create or docker run. So let’s start a container based on the ubuntu-1504-test image we have created and map a data volume to /testdatavol:
So the first command creates the data volume and maps it to /testdatavol, as we can see when performing the ls command inside the Docker container. Now let’s create some data:
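A minimal sketch of these steps, assuming the stock ubuntu image as a stand-in for ubuntu-1504-test. The container name voltest and the file name hello.txt are made up for illustration, and the container is started detached so that the following commands can be run against it:

```shell
IMAGE=ubuntu        # stand-in for the ubuntu-1504-test image used in the text
CONTAINER=voltest   # hypothetical container name

# Start a detached container with an anonymous data volume mounted at /testdatavol.
docker run -d --name "$CONTAINER" -v /testdatavol "$IMAGE" sleep 3600

# The mount point is visible inside the container.
docker exec "$CONTAINER" ls -ld /testdatavol

# Create some data inside the volume.
docker exec "$CONTAINER" sh -c 'echo "hello from the container" > /testdatavol/hello.txt'
```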
But where is the data actually stored? We can easily find out by inspecting our container:
The highlighted line shows the location of the data volume on the host, so we can examine it:
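On a Linux host the location can also be extracted directly instead of scanning the full JSON output. The sketch below assumes a container with a single volume that was started under the hypothetical name voltest. Reading below /var/lib/docker usually requires root, hence the sudo:

```shell
CONTAINER=voltest   # hypothetical name of the container with the data volume

# Print the host path (the Source field) of the container's mount.
SRC=$(docker inspect --format '{{ range .Mounts }}{{ .Source }}{{ end }}' "$CONTAINER")
echo "volume is stored at: $SRC"

# List the volume content directly on the host.
sudo ls -l "$SRC"
```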
Map to existing directory
Actually I am wondering what happens when we create a data volume for a directory that already exists in the image:
So we can see that the volume destination within the Docker container is “/bin”. On the host system we can then examine the directory and actually see all files from bin.
So let’s kill our recently created container and remove it. As expected, the data from the volume is kept persistent and therefore still exists on the host system:
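The whole experiment, from creating the volume on top of /bin to checking that it survives the container, might look like the following sketch. The image and the container name bintest are assumptions, and the behaviour may differ on images where /bin is a symbolic link:

```shell
IMAGE=ubuntu        # stand-in for the ubuntu-1504-test image used in the text
CONTAINER=bintest   # hypothetical container name

# Create a volume on top of a directory that already exists in the image.
docker run -d --name "$CONTAINER" -v /bin "$IMAGE" sleep 3600

# Remember where the volume lives on the host before the container is gone.
SRC=$(docker inspect --format '{{ range .Mounts }}{{ .Source }}{{ end }}' "$CONTAINER")

# Kill and remove the container (without -v, so the volume is kept).
docker kill "$CONTAINER"
docker rm "$CONTAINER"

# The volume directory is still present on the host (root is usually required).
sudo ls "$SRC"
```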
Mount host directories as data volumes
So far the data volumes are per container, so there is not much sharing of this data among containers. In addition to simply creating a data volume, you can map an existing directory from the host into the container by using the -v parameter as docker run -v {host path}:/{container path}. Let’s create a sample directory for this purpose and add a file to it:
Then we mount the host directory to /testdatavol/ within the container and examine the content:
As you can see, the file we have created on the host is accessible within the container. We can modify it or create a new file within the container and then see what happens on the host:
Something you should bear in mind when mapping host directories:
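The three steps can be sketched as follows. The directory location under the current working directory, the file name container.txt and the ubuntu image are assumptions for illustration:

```shell
IMAGE=ubuntu                    # stand-in for the ubuntu-1504-test image used in the text
HOST_DIR="$PWD/datacontainer"   # assumed location of the sample directory (must be absolute)

# Create the sample directory and a file on the host.
mkdir -p "$HOST_DIR"
echo "created on the host" > "$HOST_DIR/dockerhost_me.txt"

# Mount the host directory at /testdatavol and look at it from inside a container.
docker run --rm -v "$HOST_DIR":/testdatavol "$IMAGE" ls -l /testdatavol

# Write a new file from inside the container ...
docker run --rm -v "$HOST_DIR":/testdatavol "$IMAGE" \
    sh -c 'echo "created in the container" > /testdatavol/container.txt'

# ... and check the result on the host.
ls -l "$HOST_DIR"
```

Files created this way from inside the container belong to root on the host, because the container process runs as root unless told otherwise.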
Note: The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile, the VOLUME instruction does not support passing a host-dir, because built images should be portable. A host directory wouldn’t be available on all potential hosts.
By the way, the -v option can also be used to mount a single file from the host machine.
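A small sketch of such a single-file mount, with a made-up file name and the ubuntu image as an assumption:

```shell
IMAGE=ubuntu                  # stand-in image, an assumption
HOST_FILE="$PWD/single.txt"   # hypothetical file on the host (absolute path)

echo "just one file" > "$HOST_FILE"

# Mount only this file into the container and read it there.
docker run --rm -v "$HOST_FILE":/single.txt "$IMAGE" cat /single.txt
```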
File ownership
This is an interesting aspect to consider. When we check the file ownership on the host system, we notice that the file “dockerhost_me.txt” in my datacontainer directory belongs to user “me” and group “me”. I also made it readable and writable only by user “me”. Additionally I created a user “test” and a file “test.txt” that belongs only to that user:
Within the Docker container there is currently no user “me” or “test”, therefore the files do not belong to any known user and only show the user ID and group ID as owner information:
Docker starts a process inside its container as the “root” user, which by default has all privileges and can therefore modify all files regardless of ownership. So a process inside Docker may modify or destroy files within the data volume. This can be avoided by starting the container with a user other than root. Usually there are no additional users available unless the image added them. Such users can be referenced by name; otherwise you may pass any numeric ID. In my case I use ID 1001 (the ID which user “test” has on my host system):
But when using another user you must also ensure that this user has permission to write to the mapped directory on the host system.
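Both points can be tried out with a sketch like the one below. The host directory, the image and the file name from_1001.txt are assumptions, and 1001 is simply the ID mentioned above. Whether the write succeeds depends on the permissions of the directory on your host:

```shell
IMAGE=ubuntu                    # stand-in image, an assumption
HOST_DIR="$PWD/datacontainer"   # assumed host directory (absolute path)
RUN_AS=1001                     # numeric user ID from the text; adjust to your host

mkdir -p "$HOST_DIR"

# List the mounted files with numeric owner and group IDs.
docker run --rm -v "$HOST_DIR":/testdatavol "$IMAGE" ls -ln /testdatavol

# Run the process as UID 1001 instead of root and try to write to the volume.
docker run --rm -u "$RUN_AS" -v "$HOST_DIR":/testdatavol "$IMAGE" \
    sh -c 'id -u; touch /testdatavol/from_1001.txt || echo "no write permission for this user"'
```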
Mount a shared-storage volume
Docker also supports a range of volume plugins to mount shared storage like NFS or Azure file storage. This may come in handy, as shared volumes are host-independent and can be used from any container as long as the volume is shared accordingly and the necessary driver is installed. To install a volume driver, you need to follow the instructions in the plugin’s documentation, as the procedure differs slightly from plugin to plugin.
You can use shared volumes either by passing the appropriate parameters directly to the docker run command, or by creating the volume with the docker volume create command before using it in a container.
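The second variant can be sketched with the built-in local driver, so that no plugin is needed to try it. The volume name shareddata and the image are assumptions. Older Docker releases expect the name as docker volume create --name shareddata instead:

```shell
IMAGE=ubuntu        # stand-in image, an assumption
VOLUME=shareddata   # hypothetical volume name

# Create the volume up front. Without --driver the built-in "local" driver is used;
# a plugin-backed volume would add --driver <plugin> plus its plugin-specific options.
docker volume create "$VOLUME"

# One container writes to the named volume ...
docker run --rm -v "$VOLUME":/shared "$IMAGE" sh -c 'echo "shared data" > /shared/note.txt'

# ... and another one reads the same data.
docker run --rm -v "$VOLUME":/shared "$IMAGE" cat /shared/note.txt
```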
Creating and mounting a data volume container
Docker also offers the possibility to create Data Volume Containers to share persistent data between containers. Whereas containers usually do not persist data, Data Volume Containers do. You can mount more than one volume from different containers if you like, but I will play with only one for now:
It’s recommended to reuse an existing image so you can save space, as it uses the same common layers - so I used my ubuntu 15.04 test image. Once the data volume container is created, it can be attached to other containers by using the --volumes-from parameter. The data volume is automatically mapped to the location /datavol as specified when we created it:
From another terminal let’s start another Docker container and examine it:
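A compact sketch of this setup, using the container names that appear later in the text (datavolumecontainer, ubuntu1, ubuntu2). The ubuntu image, the /bin/true command and the file name shared.txt are assumptions; the containers run a single command each instead of an interactive shell:

```shell
IMAGE=ubuntu   # stand-in for the ubuntu 15.04 test image used in the text

# Create (but do not start) the data volume container with a volume at /datavol.
docker create -v /datavol --name datavolumecontainer "$IMAGE" /bin/true

# The first container attaches the volume and writes a file into it.
docker run --name ubuntu1 --volumes-from datavolumecontainer "$IMAGE" \
    sh -c 'echo "written by ubuntu1" > /datavol/shared.txt'

# A second container attaches the same volume and reads the file.
docker run --name ubuntu2 --volumes-from datavolumecontainer "$IMAGE" cat /datavol/shared.txt
```

The data volume container never has to run; it only needs to exist so that other containers can reference its volumes.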
In case the image already contains a directory with the same path as the one specified in the data volume container, the mounted volume hides the files from the image and only shows the files from the data volume container. Let’s demonstrate this with an example and first see what we have in my test image under /var/log:
OK, so when I create a data volume for /var/log and mount it into a container, let’s check what we see:
Well, we see all the files from the image, because Docker copies the existing content of the directory into the new volume when it is created. But what happens when we modify one of these, for example by changing its content:
Without mapping the data volume, the actual file from the image is not hidden, so it shows the original content as expected:
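The effect can be reproduced with a sketch like this one. It uses the container name logvolume from the text, the ubuntu image as a stand-in, and a hypothetical marker file instead of modifying an existing log file:

```shell
IMAGE=ubuntu   # stand-in for the test image used in the text

# Data volume container holding a volume for /var/log.
docker create -v /var/log --name logvolume "$IMAGE" /bin/true

# With the volume attached, changes end up in the volume ...
docker run --rm --volumes-from logvolume "$IMAGE" \
    sh -c 'echo "marker" > /var/log/marker.txt; ls /var/log'

# ... so a container started without it sees only the untouched image content.
docker run --rm "$IMAGE" \
    sh -c 'ls /var/log/marker.txt || echo "marker.txt is not in the image"'
```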
Removing data volumes
On the Docker website I have read something interesting
To delete the volume from disk, you must explicitly call docker rm -v against the last container with a reference to the volume. This allows you to upgrade, or effectively migrate data volumes between containers.
…
If you remove containers without using the -v option, you may end up with “dangling” volumes; volumes that are no longer referenced by a container
You remember the data volume container “logvolume”. All containers which used it are stopped and removed, and so is the container “logvolume” itself. Unfortunately I did not use the -v option, ending up with “dangling” volumes:
Only docker volume rm actually removes such a volume. Now let’s do it correctly with “datavolumecontainer”, which was used with the containers “ubuntu1” and “ubuntu2”. Both containers are stopped and removed, leaving only “datavolumecontainer”, and on the host’s disk you can still find the files:
So now we remove the data volume container with option -v and we can see that the files on the disk are gone as well:
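Both ways of cleaning up can be compared side by side in a sketch. The container name dvc_demo is hypothetical and chosen so that it does not clash with the containers above; the image is again an assumption:

```shell
IMAGE=ubuntu   # stand-in image, an assumption

# Case 1: remove the container without -v; its anonymous volume is left dangling.
docker create -v /datavol --name dvc_demo "$IMAGE" /bin/true
VOLUME=$(docker inspect --format '{{ range .Mounts }}{{ .Name }}{{ end }}' dvc_demo)
docker rm dvc_demo
docker volume ls -f dangling=true   # the volume is still listed here
docker volume rm "$VOLUME"          # explicit cleanup

# Case 2: remove the container with -v; the anonymous volume is deleted with it.
docker create -v /datavol --name dvc_demo "$IMAGE" /bin/true
docker rm -v dvc_demo
```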
Conclusion
Having persistent files is obviously very important, especially if you keep in mind that data in containers is per se not persistent. Being able to mount not only host directories but also other data storage gives you a lot of flexibility, but depends on a particular driver being installed. Data volume containers, on the other hand, do not rely on an additional driver and give you some abstraction - the data is logically stored in the container and you do not need to know where the data is physically stored (it is still on the host system somewhere in /var/lib/docker/). Which solution to use depends on your scenario and the problem you want to solve.