CoreOS on VMWare using the VMWare GuestInfo API

CoreOS bills itself as “Linux for Massive Server Deployments”, but it turns out it’s excellent for smaller deployments as well. CoreOS was intended to run at big cloud providers (Azure, AWS, Digital Ocean, etc.) or on clouds backed by OpenStack. I’ve been running CoreOS for a while now, on premises in VMWare. It has been kind of a pain: clone the template; interrupt the boot to enable autologin; set a password for “core”; reboot again; re-mount the disk read-write; paste in a hand-crafted cloud-config.yml. Fortunately for me (and you!), VMWare has an API to inject settings into guests, and CoreOS has added support for reading those settings in coreos-cloudinit. Information about the supported properties is in the CoreOS documentation.

My new process is:

  1. Clone the template to a new VM
  2. Inject the config through the guest vmx file
  3. Boot the VM

It’s all automated though, so mostly I get other productive work or coffee drinking done while it’s conjured into existence.

The short version is that you base64 encode and inject your cloud-config, along with any additional parameters. For me it looks like:

Property                        | Value                        | Explanation
hostname                        | coreos0                      | The hostname, obviously 🙂
interface.0.name                | ens192                       | The name of the interface to assign id #0 for the settings below. The kernel names the first NIC ens192 in VMWare.
interface.0.role                | private                      | This is my internal (and only) NIC.
interface.0.dhcp                | no                           | Turn off DHCP. I like it on; most like it off.
interface.0.ip.0.address        | 192.168.229.20/24            | IP address in CIDR notation
interface.0.route.0.gateway     | 192.168.229.2                | Default gateway
interface.0.route.0.destination | 0.0.0.0/0                    | Destination for the default route (all traffic)
dns.server.0                    | 192.168.229.2                | DNS server
coreos.config.data              | [some ugly string of base64] | Base64 encoded cloud-config.yml
coreos.config.data.encoding     | base64                       | Tells coreos-cloudinit how to decode coreos.config.data
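The cloud-config itself is whatever you would normally feed to coreos-cloudinit. As a rough illustration (the SSH key and the unit here are placeholders, not my actual config), the file that gets base64 encoded might be as small as:

#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB3Nza... core@example
coreos:
  units:
    - name: etcd.service
      command: start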

Ok, so this is all great, but how do you get it into VMWare? You have two choices: you can manually edit the .VMX file (boring!) or you can use PowerShell. The script I use is on GitHub, but the workflow is basically:

  1. Clone the VM
  2. Connect to the datastore and fetch the vmx file
  3. Add the guestinfo to the VMX file
  4. Upload the VMX file back to the datastore
  5. Boot the VM
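If you’d rather not download and re-upload the .vmx, the same keys can be set through the vSphere API. Here is a rough PowerCLI sketch of that workflow; it is not the actual script from GitHub, and the vCenter, host, and template names are placeholders:

# Rough PowerCLI sketch: clone, inject guestinfo.*, boot.
# Not the actual script from GitHub; names and paths are placeholders.
Connect-VIServer vcenter.example.com

# 1. Clone the template to a new VM
$vm = New-VM -Name coreos0 -Template coreos-template -VMHost esx01.example.com

# 2. Base64 encode the cloud-config so it can ride along in a single property
$yaml    = Get-Content -Raw .\cloud-config.yml
$encoded = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($yaml))

# 3. Write the guestinfo.* keys (the prefix is required, see below)
$settings = @{
    'guestinfo.hostname'                        = 'coreos0'
    'guestinfo.interface.0.name'                = 'ens192'
    'guestinfo.interface.0.role'                = 'private'
    'guestinfo.interface.0.dhcp'                = 'no'
    'guestinfo.interface.0.ip.0.address'        = '192.168.229.20/24'
    'guestinfo.interface.0.route.0.gateway'     = '192.168.229.2'
    'guestinfo.interface.0.route.0.destination' = '0.0.0.0/0'
    'guestinfo.dns.server.0'                    = '192.168.229.2'
    'guestinfo.coreos.config.data'              = $encoded
    'guestinfo.coreos.config.data.encoding'     = 'base64'
}
foreach ($key in $settings.Keys) {
    New-AdvancedSetting -Entity $vm -Name $key -Value $settings[$key] -Confirm:$false -Force | Out-Null
}

# 4. Boot the VM
Start-VM -VM $vm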

 

The script will pick up a cloud-config.yml, base64 encode it, and inject it for you. Check out the source on GitHub to learn more. If you’re looking at the CoreOS documentation on the VMWare backdoor, note that you need to put “guestinfo.” in front of all the properties, for example guestinfo.dns.server.0. The VMWare RPC API only passes properties that start with guestinfo.

This is how it looks when it’s written out to the VMX:

guestinfo.coreos.config.data.encoding = "base64"
guestinfo.interface.0.dhcp = "no"
guestinfo.coreos.config.data = "I2Nsyb2JAd............WJ1bnR1MA0K"
guestinfo.dns.server.0 = "192.168.229.2"
guestinfo.interface.0.ip.0.address = "192.168.229.20/24"
guestinfo.interface.0.route.0.gateway = "192.168.229.2"
guestinfo.interface.0.role = "private"
guestinfo.interface.0.route.0.destination = "0.0.0.0/0"
guestinfo.interface.0.name = "ens192"
guestinfo.hostname = "coreos0"

Wanna see it in action?


Splunk forwarders on CoreOS with JournalD

The team at TNWDevLabs started a new effort to develop an internal SaaS product. It’s a greenfield project, and since everything is new, it let us pick up some new technology and workflows, including neo4j and nodejs. In my role as DevOps Engineer, the big change was running all the application components in Docker containers hosted on CoreOS.

CoreOS is a minimalist version of Linux, basically life support for Docker containers. It is generally not recommended to install applications on the host, which raises a question: “How am I going to get my application logs into Splunk?”
With a more traditional Linux system, you would just install a Splunk forwarder on the host, and pick up files from the file system.

[Diagram: a traditional Linux host with a Splunk forwarder reading log files from the file system]

With CoreOS, you can’t really install applications on the host, so there is the possibility of putting a forwarder in every container.

[Diagram: a Splunk forwarder inside every container]

This would work, but it seems wasteful to have a number of instances of Splunk running on the same host, and it doesn’t give you any information about the host.
CoreOS leverages SystemD, which has an improved logging facility called JournalD. All system events from CoreOS (updates, etcd, fleet, etc.) are written to JournalD. Apps running inside Docker containers generally log to stdout, and all those events are sent to JournalD as well. This improved logging facility means getting JournalD into Splunk is the obvious solution.

The first step was to get the Splunk Universal Forwarder into a Docker container. There were already some around, but I wanted to take a different approach. The idea is that instead of trying to manage .conf files and get them into the container, I leverage the Deployment Server feature already built into Splunk. The result is a public image called tnwinc/splunkforwarder. It takes two parameters passed in as environment variables: DEPLOYMENTSERVER and CLIENTNAME. These two parameters are fed into $SPLUNK_HOME/etc/system/local/deploymentclient.conf when the container is started. This is the bare minimum to get the container up and talking to the deployment server.
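The result inside the container is roughly this deploymentclient.conf (a sketch of what gets generated from those two variables):

[deployment-client]
clientName = coreos0

[target-broker:deploymentServer]
targetUri = splunk.example.com:8089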

[Diagram: the splunkforwarder container on each CoreOS host pointed at the Splunk deployment server]

Setting up CoreOS

Running on CoreOS, the container is started as a service. The service definition might look like:

ExecStart=/usr/bin/docker run -h %H -e DEPLOYMENTSERVER=splunk.example.com:8089 -e CLIENTNAME=%H -v /var/splunk:/opt/splunk --name splunk tnwinc/splunkforwarder

-h %H: Sets the hostname inside the container to match the hostname of the CoreOS host. %H gets expanded to the CoreOS hostname when the .service file is processed.
-e DEPLOYMENTSERVER: The host and port of your deployment server.
-e CLIENTNAME=%H: This is the friendly client name; %H expands the same way as above.
-v /var/splunk:/opt/splunk: This exposes the real directory /var/splunk as /opt/splunk inside the container. The directory persists on disk when the container is restarted.
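Put together, the full unit might look something like the sketch below; the Description, ExecStartPre, ExecStop, and Restart lines are my assumptions rather than the original splunk.service:

[Unit]
Description=Splunk Universal Forwarder
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f splunk
ExecStart=/usr/bin/docker run -h %H -e DEPLOYMENTSERVER=splunk.example.com:8089 -e CLIENTNAME=%H -v /var/splunk:/opt/splunk --name splunk tnwinc/splunkforwarder
ExecStop=/usr/bin/docker stop splunk
Restart=always

[Install]
WantedBy=multi-user.target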

Now that Splunk is running and has a place to live, we need to feed data to it. To do this, I set up another service, which uses the journalctl tool to export the journal to a text file. This is required because Splunk can’t read the native JournalD binary format:

ExecStart=/bin/bash -c '/usr/bin/journalctl --no-tail -f -o json > /var/splunk/journald'

This command dumps everything in the journal out to JSON format (more on that later), then tails the journal. The process doesn’t exit, and continues to write out to /var/splunk/journald, which exists as /opt/splunk/journald inside the container.

Also note that since the journal will continue to grow (and with it the exported file), I have an ExecStartPre directive that trims the journal before the export happens:

ExecStartPre=/usr/bin/journalctl --vacuum-size=10M

Since I’m not appending, every time the service starts, the file is replaced. You may want to consider a timer to restart the service on a regular interval, based on usage.
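The export service then amounts to a unit along these lines (again a sketch; the unit name, ordering, and Restart policy are assumptions):

[Unit]
Description=Export the journal to a file the Splunk forwarder can read
After=systemd-journald.service

[Service]
ExecStartPre=/usr/bin/journalctl --vacuum-size=10M
ExecStart=/bin/bash -c '/usr/bin/journalctl --no-tail -f -o json > /var/splunk/journald'
Restart=always

[Install]
WantedBy=multi-user.target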

Get the data into Splunk

This was my first experience with Deployment Server; it’s pretty slick. The clients picked up their configuration and reached out to the deployment server. My app lives in $SPLUNK_HOME/etc/deployment-apps/journald/local. I’m not going to re-hash the process of setting up a deployment server; there is great documentation at Splunk.com on how to do it.
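In this case the app boils down to three small .conf files:

$SPLUNK_HOME/etc/deployment-apps/journald/local/
    inputs.conf    # what to monitor
    outputs.conf   # where to send it
    props.conf     # how to parse it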

My inputs.conf simply monitors the file inside the container:

[monitor:///opt/splunk/journald]
sourcetype = journald

The outputs.conf then feeds it back to the appropriate indexer:

[tcpout]
defaultGroup = default-autolb-group
[tcpout:default-autolb-group]
server = indexer.example.com:9997
[tcpout-server://indexer.example.com:9997]

Do something useful with it

Nothing about getting the data into Splunk so far is special, but the props.conf is fun, so I’m covering it separately. Exporting the journal as JSON structures the data in a very nice format, and the following props.conf helps Splunk understand it:

[journald]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = 10
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT = %s
TIME_PREFIX = \"__REALTIME_TIMESTAMP\" : \"
pulldown_type = 1
TZ=UTC

KV_MODE = json: Magically parse JSON data. Thanks, Splunk!
TIME_PREFIX: This ugly bit of regex pulls the timestamp out of the __REALTIME_TIMESTAMP field.
TIME_FORMAT: Standard strptime token for seconds since the epoch.
MAX_TIMESTAMP_LOOKAHEAD: JournalD timestamps are in microseconds, which is 16 digits. This setting tells Splunk to read only the first 10 (the seconds).

Once that app was published to all the agents, I could query the data out of Splunk. It looks great!

[Screenshot: journald events from CoreOS searched in Splunk]

This is where the JSON output format from journald really shines. I get PID, UID, the command line, executable, message, and more. For me, coming from a Windows background, this is the kind of logging we’ve had for 20 years, now finally in Linux, and easily analyzed with Splunk.
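To give a feel for those fields, a single exported entry looks roughly like this (values invented; the field names are the standard journald ones):

{ "__REALTIME_TIMESTAMP" : "1419876543210987", "_HOSTNAME" : "coreos0",
  "_PID" : "812", "_UID" : "0", "_COMM" : "docker", "_EXE" : "/usr/bin/docker",
  "_CMDLINE" : "/usr/bin/docker run ...", "_SYSTEMD_UNIT" : "splunk.service",
  "MESSAGE" : "Starting Splunk Universal Forwarder..." }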

The journal provides a standard transport, so I don’t have to think about application logging on Linux ever again. The contract with the developers is: You get it into the journal, I’ll get it into Splunk.