Application Isolation With NGINX Unit

Original: https://www.nginx.com/blog/application-isolation-nginx-unit/

One of the most recent developments in NGINX Unit’s feature set is the support for application isolation, introduced in version 1.11.0 and implemented via Linux namespaces. It was announced just a few weeks ago, and there’s a reason for that: the developer behind the feature, Tiago de Bem Natel de Moura, joined the NGINX Unit team only this summer.

Let’s start with a brief recap of Linux namespaces: essentially, they are a kernel mechanism that enables a group of processes to share several types of system resources separately from the resources shared by other groups of processes. The kernel ensures that processes in a namespace access only the resources assigned to that namespace. Although processes in two different namespaces can share some resources, other resources are “invisible” to processes in the other namespace. The types of resources that can be isolated in namespaces vary by OS but include process and user IDs, interprocess communication entities, mount points in the file system, networking objects, and many more.

Sound a bit bland? Maybe, especially if you’re not into operating system technicalities. However, namespaces are one of the key factors behind the containerization revolution – segregating and isolating application processes within a single OS instance enables critical security and scaling mechanisms that are required to run applications in containers.

The Idea

OK, now we’ve established that namespaces are perhaps a Nice Thing to Have, but what is NGINX Unit’s take on this matter? Let’s outline the background before proceeding further, hearing from Tiago himself:

I was investigating better options for monitoring and intercepting traffic from applications. In my spare time, I was studying the internals of NGINX Unit and thought that process isolation may be a good fit. However, I was not sure of the best approach yet. Earlier, I considered eBPF and researched how it redirects packets at the kernel level, but then I had a different idea. Since NGINX Unit runs and manages applications in a way similar to container runtimes, what if we add [application] isolation support for NGINX Unit and use it in place of the runtimes? By chance, this happened to be one of the things that the NGINX Unit team envisioned for the future.

In a cluster, the container runtime starts and stops applications, so we are aware of everything running in the cluster. Meanwhile, NGINX Unit’s architecture does the same but also implements traffic monitoring and interception by default: the only way to reach the application is NGINX Unit’s shared memory model. The interesting thing is that we can even isolate the network, similar to skipping interface setup inside a container, but the app still can communicate [with the outside world] by sharing memory with NGINX Unit, without any expensive networking hacks.

The Configuration

From the configuration perspective, everything boils down to the new isolation object that defines namespace‑related settings within the application object.

The namespace options in the isolation object are system dependent because the types of resources that can be segregated into namespaces vary from one OS to another. Here’s a basic example that creates separate user ID and mount point namespaces for the app:

{
   "applications": {
      "isolation_app": {
         "type": "external",
         "executable": "/tmp/go-app",
         "isolation": {
            "namespaces": {
               "credential": true,
               "mount": true
            }
         }
      }
   }
}

Currently, NGINX Unit supports configuration of six of the seven namespace isolation types supported by the Linux kernel. The corresponding configuration options are cgroup, credential, pid, mount, network, and uname. The last type, ipc, is reserved.

By default, all isolation types are disabled (the options are set to false), which means that apps reside in NGINX Unit’s namespace. When you enable a certain isolation type for an app by setting its option to true, NGINX Unit creates a separate namespace of that type for the app. Thus, for example, an app can reside in the same namespaces as NGINX Unit except for having a separate mount or credential namespace to itself.
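The on/off semantics above can be sketched in Go. This is only an illustration of the rules as described here, not NGINX Unit's own validation code: the isolation type and the parseNamespaces helper are hypothetical names. The sketch decodes the namespaces object into booleans, so omitted options read as false (the default), and a quoted value such as "true" is rejected by the decoder because it is a string rather than a boolean.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isolation mirrors only the "namespaces" part of the isolation object.
// The type name is hypothetical and used for illustration only.
type isolation struct {
	Namespaces map[string]bool `json:"namespaces"`
}

// parseNamespaces decodes an isolation object and returns its namespace
// options. Options missing from the map read as false, the default.
func parseNamespaces(raw string) (map[string]bool, error) {
	var iso isolation
	if err := json.Unmarshal([]byte(raw), &iso); err != nil {
		return nil, err
	}
	return iso.Namespaces, nil
}

func main() {
	ns, err := parseNamespaces(`{"namespaces": {"credential": true, "mount": true}}`)
	if err != nil {
		fmt.Println("invalid isolation object:", err)
		return
	}
	for _, name := range []string{"cgroup", "credential", "mount", "network", "pid", "uname"} {
		fmt.Printf("%-10s separate namespace: %v\n", name, ns[name])
	}

	// A string where a boolean is expected fails to decode.
	_, err = parseNamespaces(`{"namespaces": {"mount": "true"}}`)
	fmt.Println("quoted value rejected:", err != nil)
}
```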

For more details about the options in the isolation object, see the NGINX Unit documentation.

Note: At the time of writing, all apps need to use the same ipc namespace as NGINX Unit; this is required for the shared memory mechanism. You can include the ipc option in the configuration, but its setting has no effect. This situation is subject to change in future releases.

User and Group ID Mapping

Application isolation in NGINX Unit includes support for UID and GID mapping, which you can configure when credential isolation is enabled (meaning your app runs in a separate credential namespace). Each mapping ties a range of IDs in your app’s namespace (let’s call it the container namespace) to a same‑length range of IDs in the credential namespace of the app’s parent process (let’s call it the host namespace).

For example, imagine you have an app running with non‑privileged user credentials and then enable credential isolation, creating a container namespace for the app. NGINX Unit allows you to map the UID of the non‑privileged user in the host namespace to UID 0 (root) inside the container namespace. By design, a UID of 0 in any namespace has full rights in that namespace, whereas the rights of its mapped counterpart in the host namespace remain restricted. Therefore, the app seemingly possesses root capabilities, but only for resources within its namespaces. The same considerations apply to GID mapping.

Here, we map a 10‑item range of UIDs starting at UID 500 in the host namespace to a UID range starting at UID 0 in the container namespace (host: 500–509, container: 0–9). Similarly, we map a 20‑item range of GIDs starting at GID 1000 in the host namespace to a range starting at GID 0 in the container namespace (host: 1000–1019, container: 0–19):

{  
   "applications": {  
      "isolation_app": {  
         "type": "external",
         "executable": "/bin/app",
         "isolation": {  
            "namespaces": {  
               "credential": true
            },
            "uidmap": [  
               {  
                  "container": 0,
                  "host": 500,
                  "size": 10
               }
            ],
            "gidmap": [  
               {  
                  "container": 0,
                  "host": 1000,
                  "size": 20
               }
            ]
         }
      }
   }
}
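The arithmetic behind such a mapping can be written as a small Go sketch. The idMap type and the toHost function are hypothetical helpers for illustration; the kernel performs the actual translation, and this code only reproduces the range calculation described above using the values from the example configuration.

```go
package main

import "fmt"

// idMap mirrors one entry of the uidmap or gidmap array.
// The type is a hypothetical helper, not part of NGINX Unit.
type idMap struct {
	Container, Host, Size int
}

// toHost translates an ID in the container namespace to the matching ID
// in the host namespace. It reports false if no entry covers the ID.
func toHost(maps []idMap, containerID int) (int, bool) {
	for _, m := range maps {
		if containerID >= m.Container && containerID < m.Container+m.Size {
			return m.Host + (containerID - m.Container), true
		}
	}
	return 0, false
}

func main() {
	// Values taken from the configuration above.
	uidmap := []idMap{{Container: 0, Host: 500, Size: 10}}
	gidmap := []idMap{{Container: 0, Host: 1000, Size: 20}}

	for _, id := range []int{0, 9, 10} {
		host, ok := toHost(uidmap, id)
		fmt.Printf("container UID %d: host UID %d, mapped: %v\n", id, host, ok)
	}
	host, ok := toHost(gidmap, 19)
	fmt.Printf("container GID 19: host GID %d, mapped: %v\n", host, ok)
}
```

Container UID 10 falls outside the 10‑item range, so the sketch reports it as unmapped.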

If you do not create explicit UID and GID mappings, by default the current effective UID (EUID) of the non‑privileged NGINX Unit process in the host namespace is mapped to the root UID in the container namespace. Also, note that UID/GID mapping is available only if the host OS supports user namespaces. Having said all that, let’s continue to discover the effects of application isolation on the applications running in NGINX Unit.

Getting Started: Basic Application Isolation

Let’s begin with the basics and see how the feature behaves at runtime. For this purpose, we’re employing one of the Go applications from our official repository (which we run during testing of new builds):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"nginx/unit"
	"os"
	"strconv"
)

type (
	NS struct {
		USER   uint64
		PID    uint64
		IPC    uint64
		CGROUP uint64
		UTS    uint64
		MNT    uint64
		NET    uint64
	}

	Output struct {
		PID int
		UID int
		GID int
		NS  NS
	}
)

func abortonerr(err error) {
	if err != nil {
		panic(err)
	}
}

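// getns returns the ID of the namespace of the given type for the current
// process, or 0 if the namespace link cannot be read.
func getns(nstype string) uint64 {
	// Readlink returns a string such as "pid:[4026531836]".
	str, err := os.Readlink(fmt.Sprintf("/proc/self/ns/%s", nstype))
	if err != nil {
		return 0
	}

	// Strip the "<nstype>:[" prefix and the trailing "]".
	str = str[len(nstype)+2:]
	str = str[:len(str)-1]
	val, err := strconv.ParseUint(str, 10, 64)
	abortonerr(err)
	return val
}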

func handler(w http.ResponseWriter, r *http.Request) {
	pid := os.Getpid()
	out := &Output{
		PID: pid,
		UID: os.Getuid(),
		GID: os.Getgid(),
		NS: NS{
			PID:    getns("pid"),
			USER:   getns("user"),
			MNT:    getns("mnt"),
			IPC:    getns("ipc"),
			UTS:    getns("uts"),
			NET:    getns("net"),
			CGROUP: getns("cgroup"),
		},
	}
	data, err := json.Marshal(out)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}

	w.Write(data)
}

func main() {
	http.HandleFunc("/", handler)
	unit.ListenAndServe(":7080", nil)
}

This code responds to requests with a JSON‑formatted inventory of the app’s process ID, user and group IDs, and namespace IDs, which it reads from the symbolic links in the /proc/self/ns/ directory. Let’s configure the app in NGINX Unit, omitting the isolation object for now:

{  
   "listeners": {  
      "*:8080": {  
         "pass": "applications/go-app"
      }
   },
   "applications": {  
      "go-app": {  
         "type": "external",
         "executable": "/tmp/go-app"
      }
   }
}
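NGINX Unit accepts configuration through its control API, typically with a PUT request to the /config section over the control socket. The Go sketch below shows one way to send the configuration above. The socket path in controlSocket is an assumption that varies by installation, and newConfigRequest and unixClient are hypothetical helper names.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

// controlSocket is an assumed path; check where your installation
// places the NGINX Unit control socket.
const controlSocket = "/var/run/control.unit.sock"

// newConfigRequest builds a PUT request that replaces the /config section.
// The host name in the URL is a placeholder because the client dials a
// Unix socket instead of resolving it.
func newConfigRequest(config []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, "http://localhost/config", bytes.NewReader(config))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// unixClient returns an HTTP client that sends every request over the
// Unix domain socket at the given path.
func unixClient(path string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", path)
			},
		},
	}
}

func main() {
	config := []byte(`{
   "listeners": {"*:8080": {"pass": "applications/go-app"}},
   "applications": {"go-app": {"type": "external", "executable": "/tmp/go-app"}}
}`)

	req, err := newConfigRequest(config)
	if err != nil {
		fmt.Println("cannot build request:", err)
		return
	}
	resp, err := unixClient(controlSocket).Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("control API status:", resp.Status)
}
```

Access to the control socket usually requires elevated privileges, so the program may need to run as root.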

The HTTP response from a running app instance:

$ curl -X GET http://localhost:8080

{  
   "PID": 5778,
   "UID": 65534,
   "GID": 65534,
   "NS": {  
      "USER": 4026531837,
      "PID": 4026531836,
      "IPC": 4026531839,
      "CGROUP": 4026531835,
      "UTS": 4026531838,
      "MNT": 4026531840,
      "NET": 4026531992
   }
}

Now we add the isolation object to enable application isolation. The application needs to restart for the isolation mechanism to take effect. Conveniently, NGINX Unit takes care of this behind the scenes, so the update is quite transparent from the end user’s perspective.

{  
   "listeners": {  
      "*:8080": {  
         "pass": "applications/go-app"
      }
   },
   "applications": {  
      "go-app": {  
         "type": "external",
         "user": "root",
         "executable": "/tmp/go-app",
         "isolation": {  
            "namespaces": {  
               "cgroup": true,
               "credential": true,
               "mount": true,
               "network": true,
               "pid": true,
               "uname": true
            },
            "uidmap": [  
               {  
                  "host": 1000,
                  "container": 0,
                  "size": 1000
               }
            ],
            "gidmap": [  
               {  
                  "host": 1000,
                  "container": 0,
                  "size": 1000
               }
            ]
         }
      }
   }
}

Notice the user option is set to root. This is required to enable mapping to UID/GID 0 in the container namespace.

We issue the command again:

$ curl -X GET http://localhost:8080

{  
   "PID": 1,
   "UID": 0,
   "GID": 0,
   "NS": {  
      "USER": 4026532180,
      "PID": 4026532184,
      "IPC": 4026531839,
      "CGROUP": 4026532185,
      "UTS": 4026532183,
      "MNT": 4026532181,
      "NET": 4026532187
   }
}

Now that we have enabled application isolation, the namespace IDs have changed: each one identifies a namespace newly created for the app rather than the one it previously shared with NGINX Unit. The app also sees itself as PID 1, running with UID and GID 0. The only ID that remains the same is IPC, for the reasons outlined above.
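To check the comparison mechanically, a short Go sketch can decode the two responses shown above and list the namespace IDs that differ. The changedNamespaces function and the constant names are hypothetical; the JSON values are copied from the two curl outputs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Responses copied from the two curl calls above.
const responseBefore = `{"PID": 5778, "UID": 65534, "GID": 65534, "NS": {
	"USER": 4026531837, "PID": 4026531836, "IPC": 4026531839, "CGROUP": 4026531835,
	"UTS": 4026531838, "MNT": 4026531840, "NET": 4026531992}}`

const responseAfter = `{"PID": 1, "UID": 0, "GID": 0, "NS": {
	"USER": 4026532180, "PID": 4026532184, "IPC": 4026531839, "CGROUP": 4026532185,
	"UTS": 4026532183, "MNT": 4026532181, "NET": 4026532187}}`

// changedNamespaces returns, in sorted order, the namespace types whose
// IDs differ between the two responses.
func changedNamespaces(before, after string) ([]string, error) {
	var b, a struct{ NS map[string]uint64 }
	if err := json.Unmarshal([]byte(before), &b); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(after), &a); err != nil {
		return nil, err
	}
	var changed []string
	for name, id := range b.NS {
		if a.NS[name] != id {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed, nil
}

func main() {
	changed, err := changedNamespaces(responseBefore, responseAfter)
	if err != nil {
		fmt.Println("cannot compare responses:", err)
		return
	}
	fmt.Println("namespaces with new IDs:", changed)
}
```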

Going Further: Application Isolation for Networking

To delve a little deeper, let’s explore the practical implications of application isolation for networking, which is sort of important for web apps. Our tool of choice for this is nsenter, available for many of the OS distributions supported by NGINX Unit. The utility runs arbitrary commands inside the namespaces of a target process, and we’ll use it to observe the changes caused by different settings in the isolation object of the same Go app we configured above. First, we find out the host PID:

# ps aux | grep go-app
1000      5795  0.0  0.3 424040  7380 ?        Sl   14:51   0:00 /tmp/go-app

Knowing the PID, we can enter the container namespace and explore its internals:

# nsenter --all -t 5795 /bin/sh
# ip a
1: lo:  mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# id
uid=0(root) gid=0(root) groups=0(root)

Note that only the loopback interface is available; however, the app is quite capable of serving external HTTP requests via NGINX Unit. Next, we remove the network option from the list of namespaces in the config to see the resulting network interface configuration of the app with network isolation disabled:

{  
   "listeners": {  
      "*:8080": {  
         "pass": "applications/go-app"
      }
   },
   "applications": {  
      "go-app": {  
         "type": "external",
         "user": "root",
         "executable": "/tmp/go-app",
         "isolation": {  
            "namespaces": {  
               "cgroup": true,
               "credential": true,
               "mount": true,
               "pid": true,
               "uname": true
            },
            "uidmap": [  
               {  
                  "host": 1000,
                  "container": 0,
                  "size": 1000
               }
            ],
            "gidmap": [  
               {  
                  "host": 1000,
                  "container": 0,
                  "size": 1000
               }
            ]
         }
      }
   }
}

Then we repeat the same steps as above:

# ps aux | grep go-app
nobody    7615  0.0  0.4 403552  8356 ?        Sl   15:12   0:00 /tmp/go-app
# nsenter --all -t 7615 /bin/sh
# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:34:01:6d:37:22 brd ff:ff:ff:ff:ff:ff
    inet 192.168.128.41/21 brd 192.168.134.255 scope global dynamic eth0
       valid_lft 600225sec preferred_lft 600225sec
    inet6 fe80::5054:ff:fe6e:3621/64 scope link
       valid_lft forever preferred_lft forever

Now there’s also the interface that the app process inherits from NGINX Unit at startup (eth0). Voilà!
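The same check can be made from inside the app rather than with nsenter. The sketch below uses Go's standard net package to list the interfaces visible to the current process. The describeInterfaces function is a hypothetical helper, and the output depends entirely on the network namespace in which the program runs.

```go
package main

import (
	"fmt"
	"net"
)

// describeInterfaces returns one line per network interface visible to
// the current process, including its flags and assigned addresses.
func describeInterfaces() ([]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, ifc := range ifaces {
		addrs, err := ifc.Addrs()
		if err != nil {
			return nil, err
		}
		lines = append(lines, fmt.Sprintf("%s flags=%s addrs=%v", ifc.Name, ifc.Flags, addrs))
	}
	return lines, nil
}

func main() {
	lines, err := describeInterfaces()
	if err != nil {
		fmt.Println("cannot list interfaces:", err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

With network isolation enabled, a process would be expected to report only the loopback interface, matching the first ip a listing above.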

What’s Next

We have come to expect our users to question things, so many of you may be wondering: Is that all there is? Of course not! At this early stage of implementation, application isolation is rather low level, so NGINX Unit needs other features before our users can reap its benefits in full. For example, avoiding the need to maintain individual container‑related options in application configuration will simplify the setup and make it less prone to errors.

Currently, we’re working to add the rootfs capability to the app isolation implementation to securely confine the app to a file system directory. That directory becomes the file system root from the app’s point of view, which for all practical purposes enables you to make apps into easily configurable containers. Yes, that’s right; we are fast‑forwarding to implement app containerization – the NGINX Unit way. As always, stay tuned and take your time to toy with the new features while we’re at it! Feel free to share your first impressions, concerns, and ideas for improvement in our GitHub repository or via our mailing list.

Retrieved by Nick Shadrin from nginx.com website.