Say I have a Docker container running in Google Cloud Platform. Inside that container a custom library interacts with huge amounts of data from Google Cloud Storage. In order to debug that library without suffering from latency issues or egress costs, I would need to SSH into the VM and from there use docker exec to get into the container. Then I could debug my library using vim or emacs.
I wanted to use the Remote Debugger features of PyCharm, but to set up my remote cloud platform development machine for this I needed to go through a number of steps. Below is a step-by-step tutorial of how you can set up a remote VM to run a Docker container that you can debug into from your local desktop/laptop.
To complete this tutorial you'll need a Google Cloud Platform (GCP) account with admin access, you'll need to have the gcloud tools installed on your development machine, and you'll need PyCharm Professional (the free Community edition doesn't include remote debugging). Understanding some basics of SSH, command-line tools and Docker will make this a lot easier.
Notes on the format of the blog: wherever possible I try to use bold type for user interface elements that you'll interact with, italic type for filenames, directory names and values you'll enter in fields, and code blocks will be reserved for code examples and bash commands. I'm human, there will be errors.
This tutorial was completed using the following versions:
- PyCharm Professional: 2017.1.4
- GCP VM Image Container-Optimized OS: 59.9460.64.0 stable
- GCP VM Image Docker version: 1.11.2
- gcloud Google Cloud SDK: 159.0.0
- Mac OS: El Capitan 10.11.6 (but this all should work from Windows or Linux as PyCharm is cross platform)
GitHub repos that will be used in the tutorial:
- https://github.com/davidraleigh/blog-remote-debug-python
- https://github.com/davidraleigh/remote-debug-docker
Firewall Rules
First off you'll need to create a new firewall rule for your project so you can SSH into the Docker container's port. We'll use port number 52022. Go to the table of contents in the Google Cloud console and select Networking and then Firewall Rules:
The fields you'll have to edit are Name, Target Tags, Source IP ranges, and Protocols and ports. You should add a description, but that's optional. If you want to specify that only your IP address can access the machine, you should define the Source IP ranges as something besides 0.0.0.0/0. Below you can see all the settings I've used; if you copy all these settings, this tutorial should work:
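If you prefer the command line, a roughly equivalent rule can be created with gcloud. This is just a sketch assuming the names used above (a rule and tag named container-ssh, port 52022); tighten the source range if you want to restrict access to your own IP:

```
# hypothetical gcloud equivalent of the console settings above
gcloud compute firewall-rules create container-ssh \
    --allow tcp:52022 \
    --target-tags container-ssh \
    --source-ranges 0.0.0.0/0 \
    --description "SSH access to the remote debug container"
```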
Create VM with Proper Permissions
Go to Compute Engine in the GCP console's table of contents and select Create an Instance. You'll want to change the Boot disk to be a Docker-enabled image (in this case the Container-Optimized OS, a Chromium OS) and maybe increase the size beyond 10 gigs if you plan on using this for development of lots of different Docker images (images can be pretty damn large):
Under the Firewall section of your instance creation dialog select the Allow HTTP traffic field. You may need to check with your Networking Firewall rules to make sure that port 80 is open for your IP address (by default GCP projects make it open to all addresses). Below the Firewall section you’ll need to select the Networking tab and place the Target Tags you defined earlier in the Firewall section of this tutorial (mine was container-ssh) in the Network tags field:
Once you've finished these settings, Create your VM and it will be ready to set up for debugging.
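If you'd rather script the instance creation, something along these lines should produce a comparable VM (the instance name, zone and disk size here are illustrative):

```
# hypothetical gcloud equivalent of the Create Instance dialog above
gcloud compute instances create remote-debug-demo \
    --zone us-central1-a \
    --image-family cos-stable \
    --image-project cos-cloud \
    --boot-disk-size 20GB \
    --tags container-ssh,http-server
```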
Get a Docker Image with Updated Source Code on your Remote Development VM
You can either SSH into your VM, clone the repo you want to debug there, and build it (great if you have a slow connection at your local machine and you don't want to push an image into your container registry), or you can build your test image locally, push it to your container registry of choice, and then pull it from that registry onto your remote VM.
Git Clone your Code onto the Remote Development VM
SSH into your remote VM (in my case remote-debug-demo) and use git to clone the following repo:
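The commands look roughly like this (the instance name matches the one used above; your zone and project may differ):

```
# from your local machine
gcloud compute ssh remote-debug-demo

# then, on the remote VM
git clone https://github.com/davidraleigh/blog-remote-debug-python.git
cd blog-remote-debug-python
```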
Once your source code is there in your remote VM you’ll want to build your Docker image:
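A sketch of the build step, using test-image as the tag since that's the name the rest of this tutorial assumes:

```
# run from inside the cloned blog-remote-debug-python directory on the VM
docker build -t test-image .
```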
Pull an Updated Image to Your Remote Development VM
If you've already pushed your test image to an image repository (Docker Hub Registry, Google Container Registry, etc.) then you can SSH into your remote VM and pull it down (this example uses GCR):
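A sketch of the pull, with a hypothetical project and image name; depending on your setup you may need gcloud docker -- pull or a GCR credential helper to authenticate:

```
# on the remote VM: pull your previously pushed image (names are illustrative)
docker pull gcr.io/my-project/blog-remote-debug-python:latest

# tag it with the name used in the rest of this tutorial
docker tag gcr.io/my-project/blog-remote-debug-python:latest test-image
```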
Create a Debug Image
In order to debug your code with PyCharm you must be able to SSH into the running docker container. Rather than screw up your project’s Dockerfile, we’ll just use a Dockerfile that inherits from the image you want to use as your remote debugging image.
Get a Debuggable, SSH-Server-Enabled Dockerfile Project
The easiest way to do this is to use the Dockerfile and associated supervisord configuration files from the https://github.com/davidraleigh/remote-debug-docker repo. In your VM, clone this repo and follow the repo’s instructions:
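For example:

```
# on the remote VM
git clone https://github.com/davidraleigh/remote-debug-docker.git
cd remote-debug-docker
```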
Copy in SSH Public Key
In order to build a Docker image that has the SSH public key to approve your request, you'll need to write it to the authorized_keys file. Copy your google_compute_engine.pub public SSH key to your remote VM. It is the key that Google created when you installed gcloud and set up your account and configuration for GCP (at some point you should have executed the following commands: gcloud auth login, gcloud auth activate-service-account and gcloud config set project). So from your local dev machine you'll execute the following commands:
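A sketch of that copy step, assuming the remote-debug-docker build expects an authorized_keys file in its build context (older gcloud SDKs use gcloud compute copy-files instead of gcloud compute scp):

```
# from your local dev machine: copy your public key into the repo checkout on the VM
gcloud compute scp ~/.ssh/google_compute_engine.pub \
    remote-debug-demo:~/remote-debug-docker/authorized_keys
```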
Get PyCharm Helper Functions on Remote Development VM (optional)
The last thing you'll need for this all to work is to get a hold of the remote debug helpers that PyCharm installs on any remote debug machine. This is a little tricky. How I've done this in the past is that I've set up a remote debug VM with PyCharm and then gone into that remote VM and copied the ~/.pycharm_helpers directory to a Google Cloud Storage location for later use. It'd be nice if PyCharm just provided a distribution location for those helpers instead of having PyCharm copy them over the first time you connect to a remote machine. If you can't get a copy of the .pycharm_helpers directory you can just make an empty directory.
(Optional) Save Time Connecting to the Remote Machine the First Time
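The command isn't reproduced here, but it amounts to copying your stashed helpers into the remote-debug-docker build context; the bucket name below is hypothetical:

```
# on the remote VM, inside the remote-debug-docker directory
gsutil -m cp -r gs://my-bucket/.pycharm_helpers ./
```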
If Above Optional Doesn’t Work for You
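In that case an empty directory is enough to keep the Docker build happy; PyCharm will upload the helpers itself the first time you connect:

```
# on the remote VM, inside the remote-debug-docker directory
mkdir .pycharm_helpers
```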
Side note: after you've run your container once, if you don't have a copy of .pycharm_helpers in Google Cloud Storage, you can extract the helpers from your container so that new images you build will already have them. You can even upload them to your Google Cloud Storage directory:
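A sketch of that extraction, assuming a container named debug-container (a hypothetical name used throughout these examples) and that the helpers live under /root, since the container's user is root; the bucket name is hypothetical too:

```
# on the remote VM: copy the helpers PyCharm uploaded out of the running container...
docker cp debug-container:/root/.pycharm_helpers ./.pycharm_helpers

# ...and optionally stash them in your own bucket for next time
gsutil -m cp -r ./.pycharm_helpers gs://my-bucket/.pycharm_helpers
```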
Build Your Debug Image and Get it Running
Now you've tagged your development image with the name test-image and you've gotten your remote-debug-docker directory set up for creating a Docker container you can SSH into.
First let’s build the debug-image:
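Something like the following, assuming the repo's Dockerfile inherits from the test-image we built or pulled above (check the repo's instructions for the exact build arguments):

```
# on the remote VM, inside the remote-debug-docker directory
docker build -t debug-image .
```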
Now that we have built our image, let's run it and hook up all the ports so we can access it from outside GCP. Port 52022 of our remote VM is mapped to the Docker container's port 22, and since this tutorial uses Flask, port 5000 of the Docker container is mapped to the VM's port 80. --privileged is necessary for running supervisord:
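A sketch of the run command (the container name is just a convenience so we can refer to it later):

```
# --privileged is needed for supervisord; 52022->22 for SSH, 80->5000 for Flask
docker run -d --privileged --name debug-container \
    -p 52022:22 -p 80:5000 debug-image
```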
Now you should be able to SSH into this container from your local dev machine. I've assigned my remote dev VM a static IP address in Google Cloud in order to minimize hassle if the machine shuts down (of course this IP address will be abandoned after I've finished writing the blog):
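For example, using the static IP and the key pair discussed in this post:

```
# from your local dev machine
ssh -i ~/.ssh/google_compute_engine -p 52022 root@130.211.210.118
```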
Setup PyCharm Development Environment for Debugging
In PyCharm start a new Flask project called blog-remote-debug-python. Leave the Interpreter option set to whatever is the current default of your PyCharm environment.
If you've created a new project instead of cloning the blog-remote-debug-python repo then you'll need to update the blog-remote-debug-python.py file to match the one in this repo. The only thing that differs from PyCharm's default Flask app is that the __main__ method has been changed from:
to:
The reason for this is explained in a Docker forum post here.
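The two snippets aren't reproduced above, but assuming PyCharm's default Flask template, the change is essentially binding to all interfaces so the mapped port is reachable from outside the container:

```
# before: Flask binds to 127.0.0.1, which is unreachable through the port mapping
if __name__ == '__main__':
    app.run()

# after: bind to 0.0.0.0 so traffic mapped into the container actually reaches Flask
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```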
In PyCharm you should be able to press ^R on your keyboard to run this Flask project, open your browser to http://0.0.0.0:5000/ and see a "Hello World!" message. Press the Stop button in PyCharm's Navigation Bar to end the Flask app.
Now let’s add a Dockerfile to the project. Copy the file from this repo into your blog-remote-debug-python directory.
On your local dev machine build this image and run it:
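A sketch, run from the blog-remote-debug-python directory (the local image name here is illustrative):

```
docker build -t blog-remote-debug-python .
docker run --rm -p 5000:5000 blog-remote-debug-python
```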
Open your browser to http://0.0.0.0:5000/ and again you should see a “Hello World!” message.
Now that we know the image can create a functioning Docker container, let's see how this works with PyCharm's remote debugger settings. In PyCharm->Preferences we'll select the Project Interpreter from the lefthand table of contents. To the right of the currently defined interpreter is a cog symbol, like a gear; select that cog button and a drop-down will appear. From the dropdown select Add Remote:
In the Configure Remote Python Interpreter dialog select the SSH Credentials radio button. For the Host field you'll enter your remote development VM's IP address (in my case 130.211.210.118). For Port you'll change the default 22 to 52022. Remember that the remote VM is already using 22 as its SSH port, so for us to access the remote VM's Docker container SSH port we mapped the container's port 22 to the VM's port 52022 (that's why we added the 52022 firewall rule). The Username field will be root, as that's what we defined in the Dockerfile in the https://github.com/davidraleigh/remote-debug-docker repo. In the Auth type dropdown select Key Pair and then point to the google_compute_engine private key that is the pair to the google_compute_engine.pub file you copied inside of your container. The Python interpreter path is the location of the python interpreter on your Docker container:
Once you’ve selected OK you’ll be taken back to the Project Interpreter dialog. If you weren’t able to copy the pycharm_helpers from above, you’ll see PyCharm running a background process where it is uploading all the debug utilities necessary for remote debug.
With your newly created interpreter selected in the Project Interpreter drop down you’ll want to update the Path mappings field by selecting the Ellipsis, …, button:
In the Edit Project Path Mappings dialog you'll set the mapping for your local source to the location of your source code inside of your container. In the case of the tutorial the location of the source code is defined in the Dockerfile at this line, COPY . /opt/src/test. Your dialog should look something like this:
Technically, the above Path Mappings step could be skipped by doing the Deployment Configuration steps below.
In order to keep your local source code and your remote source in sync you have to setup a Deployment Configuration. This isn’t a deployment in the sense of something that your users will interact with. Select Tools->Deployment->Configuration:
In the Add Server dialog give your server a name and from the Type dropdown select SFTP.
In the Deployment dialog, the Connection tab should be filled out similarly to the Configure Remote Python Interpreter dialog. For the Root Path field select the base path for this tutorial. Once you've filled out the fields, press the Test FTP connection… button to confirm you're able to connect:
The Mappings tab in the Deployment dialog should look the same as the Project Path Mappings from above:
After selecting OK in the Deployment dialog you can now upload development files from your local machine to your remote machine. I usually select Automatic so that I don't have to right-click a file and push it to my remote debug server after every edit:
Debugging
Using the Select Run/Debug Configuration dropdown in the Navigation Bar near the top of PyCharm, select the Edit Configurations… option. You want to check to make sure that your Python interpreter is the remote interpreter we've just created and not one of your local Python interpreters:
You should now be able to put a breakpoint at the Hello World line in the blog-remote-debug-python.py file in our sample project, press ^D to debug, and once you visit the HTTP address associated with your IP address (in my case http://130.211.210.118/) you'll trigger the breakpoint and be able to look at the variables from your remote Docker container. You should also be able to update the blog-remote-debug-python.py file, save it, and those changes will be automatically uploaded to your container and reflected in your next debugging session with PyCharm (sometimes those changes can even take effect within one debug session).
Rebuilding Image
There will be times when you'll want to check out a different branch. You could SSH into your container remotely, or use the docker exec command from within your remote VM, and then check out a different branch (one that matches whatever branch is on your local machine). This will work pretty well for changes between branches.
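For example, reusing the hypothetical container name and the source path from the Dockerfile (this assumes git is available inside the image):

```
# from the remote VM
docker exec -it debug-container git -C /opt/src/test checkout my-feature-branch
```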
At other times there will be large changes to a Dockerfile or an inherited image that require a rebuild. To do so you'll need to follow these commands:
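A sketch of the rebuild cycle, reusing the hypothetical names and paths from above:

```
# on the remote VM: stop and remove the old container
docker stop debug-container && docker rm debug-container

# rebuild the project image and the SSH-enabled debug image
docker build -t test-image ~/blog-remote-debug-python
docker build -t debug-image ~/remote-debug-docker

# run the new debug container with the same port mappings as before
docker run -d --privileged --name debug-container \
    -p 52022:22 -p 80:5000 debug-image
```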
PyCharm will try to protect you from this new Docker container, so you’ll get a warning like this:
You can click Yes and all will work fine (unless someone just happens to be eavesdropping at the moment you rebuilt your docker image, but that seems unlikely).