
Packaging guidelines for macOS

Note: This post may be a little out of date as it was originally written in 2015. But I’m posting it here as the fundamentals have not really changed much.

Credits: Thanks to Gary Larizza for his post on AFP548.com, where most of this document’s content was sourced (https://www.afp548.com/2010/06/03/the-commandments-of-packaging-in-os-x).

Overview

When managing Mac OS X devices, you will inevitably have to deploy files or applications to many devices. There are many ways to achieve this; however, the most effective and best-practice method is to use packages.
While packaging is quite simple, it can very quickly become complex. This document provides some guidelines to help you avoid simple mistakes and prevent confusion when creating packages.

Tools

There are many tools out there used to create packages; Apple offers its own built-in command-line tools such as pkgbuild. This guide will not go into detail about how to use any of these tools – it is up to the system admin’s own personal preference which tools they use to create their packages.
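For example, a minimal pkgbuild invocation that wraps a payload directory into a component package looks something like this (the identifier, version and paths are illustrative):

pkgbuild --root ./payload \
         --identifier com.example.myapp \
         --version 1.2.3 \
         --install-location / \
         MyApp-1.2.3.pkg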
However, version control is very important, as is the ability to quickly and accurately create and recreate packages. The ability for packages to be peer reviewed and for package versions to be easily diff’d is also important, and the admin’s choice of tools should take this into account. It is also highly recommended that a version control system such as git is used in combination with package creation.
Below is a list of tools that are recommended for creating packages:

 

Packages by Whitebox

 http://s.sudre.free.fr/Software/Packages/about.html

A great GUI-driven tool for creating flat and distribution packages, with an easy-to-learn interface. It is still quite powerful and allows a great deal of control over how your packages are created. A build file is created which saves information on how the package should be built, such as the payload, pre/post flight scripts, additional resources, etc.

Cost: $0 – FREE

 

The Luggage

https://github.com/unixorn/luggage

A completely text driven package building system perfect for use with version control systems such as Git.  Files can easily be reviewed to see what will be in the package without any extra work.

The big benefit of using The Luggage is that because the packages are created from makefiles, those makefiles can easily be diff’d to see changes, and they make it easy to walk other users through the creation process. No GUI panes to navigate.

Cost: $0 – FREE

 

Munki PKG 

https://github.com/munki/munki-pkg

Munki PKG is a simple tool very similar to The Luggage which builds packages in a consistent, repeatable manner from source files and scripts in a project directory.

Files, scripts and metadata are stored in a way that is easy to track and manage using a version control system like git.

Cost: $0 – FREE

 

The Guidelines

Installation method

Your installer should not require any input from the end user.

DO NOT:

  • Assume that your package will be installed interactively via the GUI or to the currently booted volume.  More often than not packages will be deployed to machines via management systems such as Munki or Casper. Because of this you should ensure that your package can be installed to machines that are unattended (at the login window without a console user logged in)

DO:  

  • Ensure that your package can be installed via the command line and by any management framework with and without a user logged in.

 

Installation target

DO NOT:

  • Assume that your package will be installed to the currently booted volume. Your package might not necessarily be installed to the currently booted volume, so ensure that any scripts in your package use the correct variables passed to them by the installer application. For example, reference the target volume in your scripts by using the variable $3 (in bash) rather than using absolute file references.
  • Use tools such as sw_vers to get the operating system version. These tools will only report the OS of the currently booted volume.

DO:

  • Check the SystemVersion.plist on the target volume ($3)
  • Check if the boot volume (/) is the same as the target volume ($3) if any of your scripts require it.
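For example, a minimal postinstall sketch (illustrative, not from any particular vendor) that reads the OS version from the target volume rather than the booted one:

#!/bin/bash
# $3 is the target volume handed to the script by the installer
target_vol="$3"
os_vers=$(/usr/libexec/PlistBuddy -c "Print :ProductVersion" \
    "${target_vol}/System/Library/CoreServices/SystemVersion.plist")

if [ "${target_vol}" = "/" ]; then
    echo "Installing to the booted volume (OS ${os_vers})"
else
    echo "Installing to ${target_vol} (OS ${os_vers}) – skipping any launchctl work"
fi
exit 0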

Unnecessary actions.

DO NOT:

  • Perform ‘helpful’ things like using osascript to open a Finder window showing your newly installed application, or opening a browser window to the installed software’s homepage. If the software is being installed unattended at the LoginWindow, these things will simply cause errors in your installation process.
  • Require unnecessary reboots if you can accomplish the same thing by loading/unloading LaunchDaemons/LaunchAgents – If you go down this path, remember that it is even more important to check if you are installing to the boot volume or not.
  • Automatically add files to the Dock, Desktop or anywhere outside of /Applications or other required directories. If you wish to add Dock items, use another package/script/profile/tool to achieve that.
  • Ask for admin/elevated privileges if they are not needed for installation, i.e. installing into
    /Users/Shared
  • Create separate installers for different architectures/OS versions. If you have separate payloads for separate architectures/OS versions, perform your architecture/OS check on the target volume, not the currently booted operating system (see rule 2).

DO: 

  • Use a distribution meta-package to provide a single package that will correctly determine OS/Architecture of the destination volume and install the appropriate payload.
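As a rough sketch, productbuild can synthesize and then build such a distribution package from existing component packages (the package names here are illustrative):

# Generate a starting distribution file from two component packages
productbuild --synthesize \
    --package MyApp-payloadA.pkg \
    --package MyApp-payloadB.pkg \
    Distribution.xml

# Edit Distribution.xml to add your OS/architecture checks, then build
productbuild --distribution Distribution.xml \
    --package-path . \
    MyApp-dist.pkg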

 Licensing 

Licensing should be managed by Systems Administrators. Wherever possible licensing files should be packaged separately to the application being deployed. This allows for a single application package to be deployed to multiple sites with different licensing files applied later depending upon the licence that is appropriate for that site.

Licensing information might be supplied via a global plist/config profile/KMS  or other.

This also prevents unauthorised installation of software should your application package be obtained by an unauthorised third party.

DO NOT:

  • Place licensing and registration files in the user’s home directory. Wherever possible, use a global location such as /Library.
  • Build licensing/registration mechanisms into the installer GUI.

DO:

  • Allow a scriptable licensing interface to your software

 

Pre/Post install scripts

Use pre and post install scripts only when necessary, and follow all other rules with your scripts.

For example, it would be silly to use a package to install some files on disk and then use a post install script to set the permissions of those files. Instead correctly set the permissions of the files in the payload.

This also allows package contents to be reviewed via lsbom.
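For example, you can review what a built component package will lay down, and with what ownership and permissions, straight from its bill of materials (package name is illustrative):

# Expand a flat component package and list its bill of materials
pkgutil --expand MyApp-1.2.3.pkg /tmp/MyApp-expanded
lsbom /tmp/MyApp-expanded/Bom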

DO NOT:

  • Use postinstall scripts to create or modify files – do this in the package payload.
  • If you must use post-install scripts, do not use osascript to move and copy files – use CLI tools such as cp and mv in bash.
  • Use any kind of GUI scripting, see Rule 1.
  • Use sudo in your scripts – your script is already running as root.

DO:

  • Exit your script with 0 on success, or non-zero on failure.
  • Trap error codes in your scripts
  • Use globbing in your scripts, because no one likes repetition and computers are built to do the work for us so let them.
  • Ensure your scripts handle paths with spaces in them.
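Where a script really is needed, a minimal postinstall skeleton following these rules might look like this (the clean-up path and file pattern are purely illustrative):

#!/bin/bash
# Report failures and exit non-zero so the installer knows something went wrong
set -e
trap 'echo "postinstall failed on line $LINENO" >&2' ERR

target_vol="$3"                                                  # target volume from the installer
old_support="${target_vol}/Library/Application Support/My App"   # note the space in the path

# Glob rather than listing each legacy file by hand, and quote every path
for old_file in "${old_support}"/legacy-*.plist; do
    [ -e "${old_file}" ] || continue                             # the glob matched nothing
    /bin/rm -f "${old_file}"
done

exit 0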

Naming Conventions and Version Numbers

Naming conventions are necessary and helpful. For example VPN.pkg is NOT helpful.

Give your packages meaningful names and version numbers, including the vendor and product name along with important version numbers and vendor identification codes.

DO:

  • List your vendor and product name in your package name
  • Give packages meaningful names with version numbers. Remember 1.15 is greater than 1.2 in most situations.

Supporting Operating System Versions

If you are going to support running your application or payload on operating systems back to, say, version 10.8, then it should go without saying that you need to TEST your package on every version from 10.8 to the most current.

DO NOT:

  • Change the ownership and permissions of core Operating System folders and files

DO:

  • Keep your config data and cache data separate
  • Follow the directory structure mandated by the target platform’s software deployment guidelines
  • Provide an uninstaller or uninstall script
  • Use the documented OS X .pkg format and not just a .pkg wrapper for a 3rd-party solution that installs the software for you – the obvious exception being Adobe software.

Be Descriptive

Even if you are not planning on having your package installed via the GUI you should still make it GUI-friendly.

DO:

  • Provide a welcome message, read-me, and a description of what’s happening and what’s being installed.
  • Comment your pre/post install scripts thoroughly.

 

Snapshotting and Re-Packaging

Try to avoid using snapshot methods to create packages – a common tool used to create snapshot packages is JAMF’s Composer.

Snapshotting is generally considered bad juju and the result of a lazy (not in a good way) sysadmin.

Packages created from snapshots lack the nuances and intent of the original package. They can often miss critical files or modifications to the file system.

If you are unable to use a vendor package, consider the following:

DO:

  • Attempt to unpack and reverse engineer the package – use tools such as Pacifist (https://www.charlessoft.com/) and pkgutil --expand to determine what the package is attempting to achieve.
  • Try to modify the existing vendor package using things like providing a custom Choices.XML to select certain packages in a meta/distribution package for installation.
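As a sketch, those two approaches look something like this (package and file names are illustrative):

# Unpack a vendor package to inspect its payload and scripts
pkgutil --expand VendorApp.pkg /tmp/VendorApp-expanded

# Capture the available installer choices from a distribution package,
# edit the resulting XML, then apply it at install time
installer -showChoiceChangesXML -pkg VendorApp.pkg -target / > choices.xml
installer -applyChoiceChangesXML choices.xml -pkg VendorApp.pkg -target /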

Product Signing

Gatekeeper was introduced in OS X 10.8 as a way to alert users to unsigned packages. For this reason, it is best practice to sign your installer packages with a Developer ID certificate, letting your users know your packages can be trusted. It also allows packages to be installed in the GUI when Gatekeeper is configured to allow apps downloaded from the App Store and identified developers.

Unsigned packages are not an issue when not using the GUI installer however.

DO:

  • Use productsign to sign your packages with an Apple Developer ID certificate
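A typical signing run looks something like this (the certificate name and package names are placeholders):

productsign --sign "Developer ID Installer: Example Org (ABC123DEFG)" \
    MyApp-1.2.3.pkg MyApp-1.2.3-signed.pkg

# Verify the signature afterwards
pkgutil --check-signature MyApp-1.2.3-signed.pkg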

Monitoring Apple Caching Server

Update 10-04-2017 :

So after running this in test/pre-prod for some time, I realised a couple of problems with my initial configuration.

  • My math was off. I was graphing two values, ‘bytes.fromcache.toclients’ and ‘bytes.fromorigin.toclients’.

This is not correct. What we actually want is the total of ALL data sent to clients, which is the sum of three values: bytes.fromcache.toclients, bytes.fromorigin.toclients and bytes.frompeers.toclients.

Then we can accurately see exactly how much data was served to client devices versus how much data was pulled from Apple over the WAN (bytes.fromorigin.toclients).

  • The way I was importing data into InfluxDB was incorrect. I was importing each table from sqlite as a ‘measurement’ into InfluxDB.

This is not the way it should be done with Influx; rather, we should create a ‘measurement’ and then add fields to that measurement.

The reason for this is mainly that InfluxDB cannot do any math between different measurements, i.e. it can’t join measurements or sum across measurements.

For example, if I wanted a query showing the sum of the three metrics I mentioned above, and those three metrics were different measurements in InfluxDB, it would not work.

Luckily all that was required to fix these two issues was to simply adjust my logstash configuration.

Logstash is super flexible and allows for multiple inputs and outputs and basic if /then logic.

I have updated the post below to include the new information

Summary

So I have a lot of Apple Caching Servers to manage. The problem for me is monitoring their stats. How much data have they saved the site where they are located? Sure you can look at the Server.app stats and get a general idea. You could also look at the raw data from:

serveradmin fullstatus caching

You could also use the fantastic script from Erik Gomez, Cacher, which can trigger server alerts to send email notifications, as well as send Slack notifications, to provide you with some statistics about your caching server.

And this is all great for a small number of caching servers, but once your fleet starts getting up into the 100+ territory, we really need something better. Management are always asking me for stats for this site and that site, or for a mixture of the sites, or a region, or all of the sites combined. Collecting this data and then creating graphs in Excel with the above methods is rather painful.

There has to be a better way!

Enter the ILG stack (InfluxDB, Logstash and Grafana)

If you have had a poke around Server.app 5.2 caching server on macOS 10.12, you may have noticed that there is a Metrics.sqlite database in

/Library/Server/Caching/Logs/

Let’s have a look at what’s in this little database:

$ sqlite3 Metrics.sqlite
SQLite version 3.14.0 2016-07-26 15:17:14
Enter ".help" for usage hints.

Let’s turn on headers and columns

sqlite> .headers ON
sqlite> .mode column

Now let’s see what tables we have in here

sqlite> .tables
statsData  version

statsData sounds like what we want – let’s see what’s in there.

sqlite> select * from statsData;
entryIndex  collectionDate  expirationDate  metricName               dataValue
----------  --------------  --------------  -----------------------  ----------
50863       1487115473      1487720273      bytes.fromcache.topeers  0
50864       1487115473      1487720273      requests.fromclients     61
50865       1487115473      1487720273      imports.byhttp           0
50866       1487115473      1487720273      bytes.frompeers.toclien  0
50867       1487115473      1487720273      bytes.purged.total       0
50868       1487115473      1487720273      replies.fromorigin.tope  0
50869       1487115473      1487720273      bytes.purged.youngertha  0
50870       1487115473      1487720273      bytes.fromcache.toclien  907
50871       1487115473      1487720273      bytes.imported.byxpc     0
50872       1487115473      1487720273      requests.frompeers       0
50873       1487115473      1487720273      bytes.fromorigin.toclie  227064
50874       1487115473      1487720273      replies.fromcache.topee  0
50875       1487115473      1487720273      bytes.imported.byhttp    0
50876       1487115473      1487720273      bytes.dropped            284
50877       1487115473      1487720273      replies.fromcache.tocli  4
50878       1487115473      1487720273      replies.frompeers.tocli  0
50879       1487115473      1487720273      imports.byxpc            0
50880       1487115473      1487720273      bytes.purged.youngertha  0
50881       1487115473      1487720273      bytes.fromorigin.topeer  0
50882       1487115473      1487720273      replies.fromorigin.tocl  58
50883       1487115473      1487720273      bytes.purged.youngertha  0

Well now this looks like the kind of data we are after!

Looks like all the data is stored in bytes, so no conversions from MB/KB/TB need to be done. Bonus.
It also looks like each stat or measurement (e.g. bytes.fromcache.topeers) is written to this DB after, or very shortly after, a transaction or event occurs on the caching server, such as a GET request for content from a device. This means that we can add all these stats up over a day and get a much more accurate idea of how much data the caching server is seeing.
This solves the problem that the Cacher script by Erik runs into when the server reboots.

In Cacher, the script looks for a summary of how much data the server has served since the service has started by scraping the Debug.log. You have probably seen this in the Debug log

2017-02-22 09:41:10.137 Since server start: 1.08 GB returned to clients, 973.5 MB stored from Internet, 0 bytes from peers; 0 bytes imported.

Cacher then checks the last value of the previous day, compares it to the latest value for the end of the report day, and works out the difference to arrive at a figure for how much data was served for that report day. While this works great on a stable caching server that never reboots or has the service restarted on you, it is a little too fragile for my needs. I’m sure Erik would also like a more robust method of generating that information as well.
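With the metrics database, that per-day figure can instead be produced with a simple query – for example, summing each metric over the last day (the path is the one shown above):

sqlite3 /Library/Server/Caching/Logs/Metrics.sqlite \
  "SELECT metricName, SUM(dataValue) FROM statsData
   WHERE collectionDate > strftime('%s','now','-1 day')
   GROUP BY metricName;"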

Looking back at the Metrics.sqlite DB, if you are wondering about those collectionDate and expirationDate values, they are epoch timestamps. This is also a bonus, as they are very easy to convert into something human readable with a command like:

$ date -j -f %s 1487115473
Wed 15 Feb 2017 10:37:53 AEDT

But also makes it easy to do comparisons and do simple math with if you need to.

Having all this information in a sqlite database already makes it quite easy-ish for us to pick up this data with Logstash, feed it into an InfluxDB instance and then visualise it with Grafana.


With this setup I was able to very easily show the statistics of all our caching servers at once. Of course we can also drill down into individual schools caching servers to reveal those results as well.

YAY PRETTY GRAPHS!

[Screenshot: a Grafana dashboard of caching server statistics]

The nuts and bolts

So how do we get this set up? This is not going to be a step-by-step walkthrough, but it should be enough to get you going. You can then make your own changes for how you want to set it up in your own environment – everyone’s prod environment is a little different, but this should be enough to get you set up with a PoC environment.

Let’s start with getting Logstash set up on your caching server.

Requirements:

  • macOS 10.12.x +
  • Server.app 5.2.x +
  • Java8
  • Java8JDK
  • The Java JVM script from the always helpful Rich and Frogor (script here)

Start by:

  • Getting your caching server up and running.
  • Install Java 8 and the Java 8 JDK.
  • Run the JVM script
  • Confirm that you have Java 8 installed correctly by running  java -version from the command line

If all has gone well, you should get something like this back:

# java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Now we are ready to install Logstash.

  • Download the latest tar ball from here: https://www.elastic.co/downloads/logstash
  • Store it somewhere useful like /usr/local
    • Extract the tar with tar -zxvf logstash-5.2.1.tar.gz -C /usr/local
    • This will extract it into the /usr/local directory for you

Now we need to add some plugins; this is where it gets a little tricky.
If you have authenticated proxy servers, you are going to have a bad time, so let’s pretend you don’t.

Installing Logstash plugins

First lets get the plugin that will allow Logstash to send output to InfluxDB

Run the logstash plugin binary and install the plugin logstash-output-influxdb:

$ cd /usr/local/logstash-5.2.1/bin
$ ./logstash-plugin install logstash-output-influxdb

Now we will install the SQLite JDBC connector that allows Logstash to access the sqlite db that caching server saves its metrics into.

  • Download the sqlite-jdbc-3.16.1 JDBC driver from here: https://bitbucket.org/xerial/sqlite-jdbc/downloads/
  • Create a directory in our Logstash dir to save it – I like to put it in ./plugins
    • mkdir -p /usr/local/logstash-5.2.1/plugins
  • Copy the sqlite driver jar into our new directory
    • cp sqlite-jdbc-3.16.1.jar /usr/local/logstash-5.2.1/plugins

Ok we now have Logstash installed and ready to go! Next up we make a configuration file to do all the work.

Thinking about the InfluxDB Schema

Before we start pumping data into Influx, we should probably think about how we are going to structure that data.

I came up with a very basic schema: seven ‘measurements’, each grouping together related metric names from the caching server’s sqlite database.

Because of the way math works in InfluxDB, this allows me to write a query like:

SELECT sum("bytes.fromcache.toclients") + sum("bytes.fromorigin.toclients") + sum("bytes.frompeers.toclients") as TotalServed FROM "autogen"."bytestoclients" WHERE "site_code" = '1234' AND $timeFilter GROUP BY time(1d)

This query will add the three metrics together to give a total of all bytes from cache, origin and peers to client devices.
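Once data is flowing, you can sanity-check the same math outside Grafana with the influx command-line client (wherever you have it installed, for example inside the InfluxDB container). $timeFilter is Grafana-only, so a plain time clause is used here, and the site_code value is just the example from above:

influx -database 'caching' -execute "SELECT sum(\"bytes.fromcache.toclients\") + sum(\"bytes.fromorigin.toclients\") + sum(\"bytes.frompeers.toclients\") AS TotalServed FROM \"autogen\".\"bytestoclients\" WHERE \"site_code\" = '1234' AND time > now() - 7d GROUP BY time(1d)"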

 

Creating the configuration file

This is the most challenging part, and a huge shoutout goes to @mosen for all his help on this – I definitely wouldn’t have been able to get this far without his help.

The configuration file we need contains three basic components, the inputs, a filter and the outputs.

The inputs

The input is where we get our data from – in our case, the sqlite DB – so our input is going to be the sqlite JDBC plugin, and we need to configure it so that it knows what information to get and where to get it from.

It’s pretty straightforward and should make sense, but I’ll describe each item below.

We are going to have an input for each measurement we want; this way we can write a sqlite query to get the metric names or sqlite tables we want to put into that measurement.

For example the below input is going to get the following tables:

  1. bytes.fromcache.toclients
  2. bytes.fromorigin.toclients
  3. bytes.frompeers.toclients
  4. bytes.fromcache.topeers
  5. bytes.fromorigin.topeers

From the sqldatabase by running the query in the statement section.

Then we add a label to this input so that we can refer to it later and ensure the data from this input is put into the correct measurement in the InfluxDB output.

The label is set using the type key in the input, as you can see below:

input {
    jdbc {
        jdbc_driver_library => "/usr/local/logstash-5.2.1/plugins/sqlite-jdbc-3.16.1.jar"
        jdbc_driver_class => "org.sqlite.JDBC"
        jdbc_connection_string => "jdbc:sqlite:/Library/Server/Caching/Logs/Metrics.sqlite"
        jdbc_user => ""
        schedule => "* * * * *"
        statement => "SELECT * FROM statsData WHERE metricName LIKE 'bytes.%.toclients' OR metricName LIKE 'bytes.%.topeers'"
        tracking_column => "entryindex"
        use_column_value => true
        type => "bytestoclients"
    }
}

The Logstash documentation is pretty good and describes each of the above items, check out the documentation here

The only thing to really worry about here is the schedule. This is in regular cron-style format; with the setting above, Logstash will check the Metrics.sqlite database every minute and submit information to InfluxDB.

This is probably far too often for a production system. For testing it’s fine, as you will see almost instant results, but before you go to production you should consider running this on a more sane schedule – perhaps every hour or two (e.g. schedule => "0 * * * *" for hourly), or whatever suits your environment.

So in the completed logstash config file we will end up with a jdbc input for each sqlite statement or query we need to run to populate the 7 measurements we add to influx.

The filter

The filter is applied to the data that we have retrieved with the input. This is where we add some extra fields and tags to go with our data, which lets us use some logic in the Logstash file to direct the right input to the right output, and also lets us group and search our data based on which server it came from.
Think of these fields as a way to ‘tag’ the data coming from this caching server with information about which physical caching server it is.

In my environment I have 4 tags that I want the data to have that I can search on and group with. In my case:

region – This is the physical region of where the server is located

site_name – This is the actual name of the site

site_code –  This is a unique number that each site is assigned

school_type  – In my case this is either primary school or high school

We are going to use some logic here to add a tag to the inputs depending upon what the type is. We can’t use the ‘type’ directly in the output section, so we have to convert it into a tag; we can then send that to the output and do logic on it.

We also remove any unneeded fields, such as collectiondate and expirationdate, with the date/match/remove_field section.

Then we add our location and server information by adding our tags (region, site_name, etc.) with the mutate filter.


filter {
    if [type] == "bytestoclients" {
        mutate {
            add_tag => [ "bytestoclients" ]
        }
     }
    date {
        match => [ "collectiondate", "UNIX" ]
        remove_field => [ "collectiondate", "expirationdate" ]
    }
    mutate {
        add_field => {
            "region" => "Region 1"
            "site_name" => "Site Name Alpha"
            "site_code" => "1234"
            "school_type" => "High School"
         }
    }
}

Again the documentation from Logstash is pretty good to describe how each of these items works, check here for the documentation

The important parts above that you might want to modify are the fields that are added with the mutate section.

The output

Now we are getting closer. The output section is where we tell Logstash what to do with all the data we have ingested, filtered and mutated.
Again, all of this is pretty straightforward, but there are a couple of things I’ll talk about:

We are going to use an if statement to check whether the data coming from our input contains a certain string in a tag.

For example, if the string ‘bytestoclients’ exists in a tag, then we should use a certain output.

This allows us to direct the inputs we created above to a specific output. Each output has a measurement name and a list of fields (data points) that will be sent to Influx.

We have to list each metric name in the coerce_values section to ensure the data is sent as a float or integer because otherwise it will be sent as a string and this is no good for our math.

There is also an open issue on GitHub with the InfluxDB output plugin where we can’t use a variable to handle this. Ideally we would simply be able to use something like:

coerce_values => {

    "%{metricname}" => "integer"

}

But unfortunately this does not work, and we must list out each metric name – like an animal.

output {
    if "bytestoclients" in [tags] {
        influxdb {
            allow_time_override => true
            host => "my.influxdb.server"
            measurement => "bytestoclients"
            idle_flush_time => 1
            flush_size => 100
            send_as_tags => [ "region", "site_code", "site_name", "school_type" ]
            data_points => {
                "%{metricname}" => "%{datavalue}"
                "region" => "%{region}"
                "site_name" => "%{site_name}"
                "site_code" => "%{site_code}"
                "school_type" => "%{school_type}"
            }
            coerce_values => {
                "bytes.fromcache.toclients" => "integer"
                "bytes.fromorigin.toclients" => "integer"
                "bytes.frompeers.toclients" => "integer"
                "bytes.fromorigin.topeers" => "integer"
                "bytes.fromcache.topeers" => "integer"
            }
            db => "caching"
            retention_policy => "autogen"
        }
    }
}

Really, the only interesting things here are:

send_as_tags : This is where we send the fields we created in the mutate section to influx as tags. The trick here, which is barely documented if at all, is that we also need to specify them as data points.

data_points : Here we need to add our tags (extra fields we added from mutate) as datapoints to send to influxdb, we use the %{name} syntax just like we would use a $name variable in bash. This will then replace the variable with the content of the field from the mutate section.

retention_policy : This is the retention policy of the Influx DB. Again, documentation was a bit hard to find on this one, but the default retention policy is not actually called ‘default’ as seems to be mentioned everywhere – the default policy is actually called ‘autogen’.

Consult the InfluxDB documentation for more info

Completed conf file

So now we have those sections filled out we should have a complete conf file that looks somewhat like this:

Install the conf file

  • Create a directory in our logstash dir to store our conf file
    • mkdir -p /usr/local/logstash-5.2.1/conf
  • Create the conf file and move it into this new location
    • cp logstash.conf /usr/local/logstash-5.2.1/conf/

Running Logstash

Ok, so now we have Logstash installed and configured, we need a way to get it running and using our configuration file.
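Before wiring it up to anything permanent, it’s worth a quick manual run to make sure the config parses and events flow (paths assume the layout used above):

cd /usr/local/logstash-5.2.1
# Validate the configuration file and exit
./bin/logstash -f ./conf/logstash.conf --config.test_and_exit
# Run in the foreground and watch the output
./bin/logstash -f ./conf/logstash.conf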

Of course this is a perfect place to use a launch daemon. I won’t go into much depth as there are many great resources out there on how to create and use launchdaemons.
If you haven’t already, go ahead and check out launchd.info.

Here is a launchd job that I’ve created already – just pop it into your /Library/LaunchDaemons folder, give your machine a reboot, and Logstash should start running.

Setting up InfluxDB and Grafana

There are lots and lots of guides on the web for how to get these two items setup, so I won’t go into too much detail. My preferred method of deployment for these kinds of things is to use Docker.

This makes it very quick to deploy and manage the service.

I’ll assume that you already have a machine that is running docker and have a basic understanding of how docker works.
If not, again there are tons of guides out there and it really is pretty simple to get started.

InfluxDB

You can get an influxdb instance setup very quickly with the below command, this will create a db called caching, you can of course give it any name you like, but you will need to remember it when we connect Grafana to it later on.

docker run -d -p 8083:8083 -p 8086:8086 -e PRE_CREATE_DB=caching --expose 8090 --expose 8099 --name influxdb tutum/influxdb

You should now have InfluxDB up and running on your docker machine.
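You can also check the HTTP API from the command line – the /ping endpoint on port 8086 should answer with a 204 (replace the hostname with your Docker host):

curl -sI http://my-docker-host:8086/ping
# Expect: HTTP/1.1 204 No Content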

Port 8083 is the admin web app port and you can check your influxDB is up and running by pointing your web browser to your docker machine IP address on port 8083. You should then get your influx DB web app like this:

[Screenshot: the InfluxDB admin web UI]
Grafana

You can also setup Grafana on the same machine with the following command, this will automatically ‘link’ the Grafana instance to the InfluxDB and allow communication between the two containers.

docker run -d -p 3000:3000 --link influxdb:influxdb --name grafana grafana/grafana

Now you should also have a Grafana instance running on your docker machine on port 3000. Load up a web browser and point it to your docker machine IP address on port 3000 and you should get the Grafana web app like this:

[Screenshot: the Grafana login page]

The default login should be admin/admin

Login and add a data source

[Screenshot: adding the InfluxDB data source in Grafana]

Setting up the dashboards

So now we get to the fun stuff, displaying the data!

Start by creating a new dashboard

[Screenshot: creating a new dashboard]

Now select the Graph panel.

[Screenshot: selecting the Graph panel]

On the Panel Title select edit

[Screenshot: the panel’s edit menu]

Now we can get to the guts of it, creating the query to display the information we want

Under the Metrics heading, click on the A to expand the query.

[Screenshot: expanding query A under the Metrics heading]

From here it is pretty straight forward as Grafana will help you by giving you pop up menus of the items you can choose:

[Screenshot: the query builder’s drop-down menus]

What might be a bit strange is that the FROM is actually the retention policy, which is weird, you might think that the FROM should be the name of the database. But no, its the name of the default retention policy which in our case should be autogen.

If you need to remove an item just click it and a menu will appear allowing you to remove it, heres an example of removing the mean() item

[Screenshot: removing the mean() aggregation]

So to display some information you can start with a query like this:

[Screenshot: the starting query in the query builder]

This is going to select all the data from the database caching, with the retention policy of autogen, in the field called bytes.fromcache.toclients

Next we are going to select all of those values in that bytes.fromcache.toclients measurement, by telling it to select field(value)

Then we click plus next to the field(value) and from the aggregations menu choose sum() this will then add the values all together.

Then we want to display that total grouped by 1 day – time(1d)

This will show us how much data has been delivered to client devices, from the cache on our caching server in 1 day groupings.

Phew, ok thats the query done.

But, thats just going to show us how much data came from the “cache”, its not going to show us how much data was delivered to clients from cache+peers+origin.

So for that query, we have to do a little trick.

We select the measurement bytestoclients.

Then we select the field bytes.fromcache.toclients, click the plus, and add our other fields so the query looks something like this (the query builder joins the sums with commas):

SELECT sum("bytes.fromcache.toclients"), sum("bytes.fromorigin.toclients"), sum("bytes.frompeers.toclients") FROM "autogen"."bytestoclients" WHERE $timeFilter GROUP BY time(1d)

But you might notice that this doesn’t show us a single bar graph like we want, so we have to manually edit the query to remove those commas.

Hit the toggle edit mode button, then remove the commas and add a plus symbol instead, so the query becomes:

SELECT sum("bytes.fromcache.toclients") + sum("bytes.fromorigin.toclients") + sum("bytes.frompeers.toclients") FROM "autogen"."bytestoclients" WHERE $timeFilter GROUP BY time(1d)

Now we need to format the graph to look pretty. Under the Axes heading we need to change the unit to bytes.

[Screenshot: setting the unit to bytes under Axes]

Under the Legend heading, we can also add the Total so that it prints the total next to our measurement on our graph.

[Screenshot: enabling the Total value under Legend]

And to finish it off we will change the display from lines to bars. Under the Display heading check bars and uncheck lines.

[Screenshot: switching the display from lines to bars]

Almost there.

From the top right, let’s select a date range to display – this week, for example.

[Screenshot: selecting the date range]

AND BOOM!

You can of course change the heading from Panel Title to something more descriptive, add your own headings and axis titles etc etc

Of course you can also add additional queries to the graph so you can see multiple measurements at once for comparison.

For example, we might want to see how much data was sent to clients, and how much data had to be retrieved from Apple.

We just add another query under the metrics heading.

So lets add the data from the bytes.fromorigin.toclients field

We can also use the WHERE filter to select only the data from a particular caching server rather than all of the caching server data that is being shown above.

That should be enough to get you going and creating some cool dashboard for your management types.

 

Introducing Serveralerts.py

I manage a lot of Apple Caching Servers. I have a fully automated build process for them; however, I have come across the need to manage the email notifications for these servers. For example, sometimes the person who receives the emails about events or notifications has moved on, and we need to remove their address and add someone else.

Unfortunately Apple does not make a tool that can do this in a scriptable or automated fashion.

That’s where this tool comes in. It can create and modify the alertData.db that stores these email addresses.

It allows you to create the database if it doesn’t already exist. It also allows you to add, remove and list all the currently configured email accounts in the database.

Huge shout out to the one they call @mosen and the one they call @carl for all their help with this as this has been my first Python project and I’ve had to ask so many dumb questions. Thanks guys appreciate the help!

I’m sure my code is absolutely horrid, but YOLO, it works. It’s on GitHub here: https://github.com/hunty1/Serveralerts

Pull Requests greatly appreciated!

Reversing the AD Plugin UID algorithm

Background:
I’ll start by saying we have a rather large AD with over a million users. #humblebrag

I had a ticket escalated to me that was quite odd. A user had logged into their Mac, but a number of applications and command line tools were reporting a different user account.

After a bit of troubleshooting I found that both of these users had the same UID on the Mac. So I started to dig into how this could be, and how the AD plugin generates its UID. The post below is the result of that work.

I think it is pretty well known that the AD Plugin uses the first 32 Bits of the 128 Bit objectGUID in AD to determine the UID on the Mac.

But after a bit more digging, it turns out it’s not quite that simple.

I’ll work with a few examples here and show you how the AD Plugin determines the UID that will be used and provide a script that will allow you to determine the UID of your users accounts in AD. From here you can check to see if you have any UID collisions.

First lets start with a user in AD.

If we inspect the users account record in AD with something like Apache directory studio, we can see the objectGUID which is a 128 bit hex string.

For example:
[Screenshot: the user record in Apache Directory Studio showing the objectGUID]

Here we can see an objectGUID with a value of:

6C703CF1-B5D1-41F8-880B-317728CBD4F5

Now the AD Plugin will read this value and take the first 32 Bits which is: 6C703CF1

It then converts this hex value to decimal. This can be achieved by using the following command:

echo "ibase=16; 6C703CF1" | bc

Which will return: 1819294961

Now you might say, well that’s easy!

And I thought so too. But there’s a slight issue with this. The UID on the Mac has a maximum decimal value, as it is a 32-bit integer, so the maximum number the UID can be is 2147483647.

In this example the hex value 6C703CF1 converts nicely into a 32 Bit integer (as in, its value is less than or equal to 2147483647) and so can be used as the Mac UID without any further work.

But lets look at another example:

[Screenshot: a second user record in Apache Directory Studio showing the objectGUID]

Here we can see an objectGUID with a value of:

BEB08781-0DAF-4B12-9EB6-AF33CBA90876

Now if we do our conversion on this as we did before:

echo "ibase=16; BEB08781" | bc

We end up with a result of: 3199240065

Unfortunately this number is larger than the maximum 32 Bit integer allowed by the Mac UID.

So what do we do?

Turns out Apple uses a table to convert the first character of these 32-bit GUID prefixes into a different number, and then recalculates the UID based on this new 32-bit value.

For example with the GUID above, BEB08781, we take the first character B and replace it with the number 3 to end up with: 3EB08781

Now when we do the conversion:

echo "ibase=16; 3EB08781" | bc

We get a value of: 1051756417

Which fits perfectly into our 32 Bit integer Mac UID.

The table of conversion looks like this:

Building a script
So now we know this, how do we build a script to do this conversion for us?

Interacting with records in AD is usually done with `ldapsearch`, and that’s how I work with all my AD queries from my machine. It allows me to target specific OUs, it is generally easier for me to work with than dscl, and I don’t need to have my machine bound to AD for it to work.

So first, let’s start with a basic ldapsearch to get the user’s objectGUID.
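The exact search will depend on your directory, but it looks something like this (the server, bind account and search base are placeholders):

ldapsearch -LLL -H ldap://dc01.my.domain \
    -D 'svc_ldap@my.domain' -W \
    -b 'OU=Accounts,DC=My,DC=Domain' \
    '(sAMAccountName=john.smith)' objectGUID sAMAccountName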

This should return an output like this:

dn: CN=Smith\, John,OU=Accounts,OU=My Users,OU=My House,OU=My Room,DC=My,DC=Domain
objectGUID:: TE0kyRyv8UCppPeXes5JTg==
sAMAccountName: john.smith

Now, the objectGUID here does not match what we see in AD with Apache Directory Studio, and that is because it is encoded in base64, as denoted by the double colon "::" after the objectGUID attribute name.

So to convert this into something we can work with we need to decode it from base64 and then hex dump it.

So to achieve that, we need a small function that decodes the value and reorders the bytes.
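A sketch along these lines reproduces the example below – the first three fields of an objectGUID are stored little-endian, which is why their bytes get reversed (base64 -D is the macOS flag):

decode_guid() {
    # base64-decode the objectGUID and dump it as a plain hex string
    local hex
    hex=$(echo "$1" | base64 -D | xxd -p)
    # Reassemble as a GUID: the first three fields are byte-reversed
    printf '%s-%s-%s-%s-%s\n' \
        "${hex:6:2}${hex:4:2}${hex:2:2}${hex:0:2}" \
        "${hex:10:2}${hex:8:2}" \
        "${hex:14:2}${hex:12:2}" \
        "${hex:16:4}" \
        "${hex:20:12}" | tr '[:lower:]' '[:upper:]'
}

decode_guid "TE0kyRyv8UCppPeXes5JTg=="    # prints C9244D4C-AF1C-40F1-A9A4-F7977ACE494E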

This then converts our objectGUID from ldapsearch into:

C9244D4C-AF1C-40F1-A9A4-F7977ACE494E

By now we should have all the bits we need to write a script to:
1. Pull the objectGUID from AD using ldapsearch
2. Convert that objectGUID from Base64 into text
3. Convert the first 32 Bit from hex to decimal
4. Decide if that decimal value is larger than the maximum for a 32 Bit integer
5. If it is larger, we then know what number to replace the first character of that objectGUID with
6. Recalculate that new objectGUID into decimal to determine the UID the AD Plugin will set for that user on the Mac.
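A rough bash sketch of steps 3–6 for a single GUID prefix is below. The remapping here just clears the high bit of the first hex character, which reproduces the B → 3 example above; I’m assuming the full conversion table follows the same pattern, so treat it as illustrative rather than definitive:

guid_prefix="BEB08781"                      # first 32 bits of the objectGUID
uid=$((16#${guid_prefix}))                  # step 3: hex to decimal
if [ "${uid}" -gt 2147483647 ]; then        # step 4: larger than a 32-bit signed integer?
    # step 5: remap the first character (assumption: drop the high bit, so B -> 3)
    first=$(( 16#${guid_prefix:0:1} & 7 ))
    # step 6: recalculate the UID from the remapped prefix
    uid=$(( 16#$(printf '%X' "${first}")${guid_prefix:1} ))
fi
echo "${uid}"                               # 1051756417 for this example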

Completed script
With the script below you can target a user account DN in the search base, and it will return that user’s DN and objectGUID in clear text, as well as the UID that will be used on a Mac when that user logs in.

Bonus points
For bonus points, you might want to target a container of Users, say OU=Users and then iterate through that container outputting the UID’s for those users so you can then check for duplicates.

So here is an ugly bash script that does just that.

Automating macOS Server alerts

Update: 14-11-2016

So I thought I better add support for adding multiple email addresses.

Each email address that gets imported needs an updated index number to go into the Z_PK column. So the script will now import multiple email addresses and assign a Z_PK index number for each email address.

The script now will read in a CSV file instead of taking stdin for input. You will need to specify the location of the CSV file in the script, or modify as needed.

Update: 17-10-2016

So after a bit of further testing, it appears that the alertData.db does not get created automatically when Server.app is installed; it requires the Alerts tab to be selected in Server.app for the db to be created. This presented a problem for me, as I am automating the deployment of these macOS Server machines and I want to include the alert email settings with zero touch. After some more digging around in the alertData.db, I was able to find the tables and values needed to create a bare database that enables alerts for the caching service. The updated script will now create the alertData.db if it does not exist, enable email alerts for caching server (no other services are enabled) and then set the notification recipient’s email address.

If you wish to enable notifications for extra services such as Mail, you should add a ‘1’ to the ZMAILENABLED and ZPUSHENABLED columns for the relevant service in the script where it inserts these values to the relevant column.

For example:
The table (ZADALERTBUNDLE) contains the following column names and types:

(Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZENABLED INTEGER, ZMAILENABLED INTEGER, ZPUSHENABLED INTEGER, ZBUNDLE VARCHAR, ZNAME VARCHAR)

The columns we are interested in are ZMAILENABLED and ZPUSHENABLED. These columns will accept a data type of: INTEGER. In actual fact, this is really a boolean with either a 0 (False) or a 1 (True) value assigned.

Here is an example of what having the caching service enabled for mail and push notifications looks like in our alertData.db.
You can also see that the Mail service has mail and push notifications disabled

Z_PK        Z_ENT       Z_OPT       ZENABLED    ZMAILENABLED  ZPUSHENABLED  ZBUNDLE            ZNAME      
----------  ----------  ----------  ----------  ------------  ------------  -----------------  -----------
2           2           2           1           1             1             Caching            Caching    
5           2           2           1           0             0             Mail               Mail       

Note the two ‘0’s in the ZMAILENABLED and ZPUSHENABLED columns mentioned earlier – setting these to 1 means the service will have push and email alerts enabled for it.

Now we know what controls the enabled/disabled checkboxes in the Alerts section of Server.app, it is trivial to modify this directly in the alertData.db.
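For example, flipping the Mail service on for both alert types straight from the command line looks something like this (the table and column names are from the dump above; adjust the WHERE clause to suit your database):

sudo sqlite3 /Library/Server/Alerts/alertData.db \
  "UPDATE ZADALERTBUNDLE SET ZMAILENABLED = 1, ZPUSHENABLED = 1 WHERE ZNAME = 'Mail';"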

Original post below:

In macOS Server it used to be possible to edit the alert settings with something like :

serveradmin settings info:notifications:certificateExpiration:who = john.smith@contoso.com

Unfortunately it appears that it is no longer possible to add an email address to the alert settings via the command line.

This presented an issue for me as I automate the deployment of many macOS servers where we want to have an email address entered in the alert settings so that user receives notifications from their server.

After a little bit of digging the location where this information is now saved was found in

/Library/Server/Alerts/alertData.db

Specifically it is stored in a TABLE called ZADMAILRECIPIENT in a COLUMN called ZADDRESS VARCHAR

So now we know that, all we have to do is work out a way of adding in our desired values to that table.

The quick and dirty of it is something like this:

sqlite3 /Library/Server/Alerts/alertData.db "INSERT or REPLACE INTO ZADMAILRECIPIENT VALUES('2','4','1','1','john.smith@apple.com','en')"

But in my quest to force myself into using python, I wrote a quick python script that takes the email address as the first argument and then writes this into our db, which makes it easy to put into my deployment workflow.

To use it, simply run it like this:

./script.py

You will need to edit the location of your csv file. Currently this is set in line number 96 of the script.

The script is as below.

Your one stop formatting shop

Update 22 Dec 2016

So I finally got around to creating a new NBI based on 10.12.
When I tried to run this script on 10.12, I found that Apple had changed the output in the diskutil command.

So previously I was searching for the text of either yes or no for the removable media label.
Under 10.12, this is no longer the case, with Apple replacing the word “No” with “fixed”

To combat this, the only thing to do is, of course, use Python, with which I have a love/hate relationship. So in the interests of just getting it done, I have updated the script, replacing the following:

 $(echo "$DISK_INFO" | awk '/Solid State:/ {print $3}') = "No")

with

$(diskutil info -plist $DISK | plutil -convert json -o - - | python -c 'import sys, json; print json.load(sys.stdin)["SolidState"]')

This gets the output from diskutil as a plist, converts it into json and then uses python to print out the value for the key ‘SolidState’ which is returned as a boolean (true/false)

This is much better than parsing text which may change in the future.

Update – 9 Aug 2016

Well, it turns out I made some assumptions in my first draft of the formatting script around how to identify a machine for a Fusion Drive. It also turns out I left out what to do if a FileVault-enabled disk is discovered. I have updated the script to handle all these cases.

The script should now detect the correct device IDs for the SSD and HDD if found. It will also check to see whether a FileVault disk is locked or unlocked. If it is unlocked, it will proceed to delete the Core Storage volume; if the FileVault volume is locked, it will throw an error to the user via a CocoaDialog message box.

The script will also now check to ensure that the drives it formats are internal, physical, non-removable disks. As SD cards can often present as internal, physical, this could be a complication; luckily they also show up as removable, so by checking this we can avoid formatting any SD cards that may be in the machine.

As I also do a lot of testing with VMware Fusion, I have a check in the script to ensure that VMware disks are formatted correctly as well. This is because VMware Fusion disks show up as external, physical disks rather than internal, physical disks.

In my environment I use DeployStudio and Imagr to deploy our “image” to client devices.

Recently I came across an issue with some iMacs that have a Fusion drive.

When I use DeployStudio, I was targeting the restore task to “First drive available”

[Screenshot: DeployStudio restore task targeting "First drive available"]

This had always worked very well for me in the past, however I noticed that a few of the latest iMacs had failed to complete the image build (via Munki) due to a full HD.

When I checked their computer record in Munki Report it was pretty clear what had happened.

[Screenshots: MunkiReport storage graphs showing the SSD nearly full]

For some reason, the fusion drive has ‘un-fused’ and DeployStudio has installed our base OS image onto the SSD.

Turns out this is a bit of an issue with DeployStudio. There are quite a few posts on the deploy studio forums about this.

There are a few solutions out there like having a workflow that is for Fusion Drive Mac’s that uses DeployStudio’s Fusion Drive task first and then you can just target the volume name of your new Fusion Drive in the Restore Task

[Screenshot: DeployStudio restore task targeting the Fusion Drive volume by name]

But I really like to have One Workflow To Rule Them All! So I didn’t like that solution, also users don’t know if their Mac has a fusion drive or not so there is confusion there.

Instead I came up with a script that will check a machine to see if it has a Fusion Drive, or at least the components for a Fusion Drive, i.e. an SSD and an HDD.

The script will then create a new Fusion Drive, deleting any current FD if it already exists, create a new volume on the Fusion Drive called Macintosh HD.

The script will also be able to tell if the machine does not have a fusion drive, in this case the script will simply locate the internal HDD or SSD and format it and create a volume called Macintosh HD.

So now I simply run this script as the first item in the workflow and ensure that my Restore Task targets my new volume called Macintosh HD whether it be on a Fusion Drive LVG or a regular JHFS+ Partition.

The contents of the script are as below:

When Apple Caching Server just can’t even right now

Update: This was logged as a bug to Apple and has been resolved in iOS 10 and macOS 10.12

See http://www.openradar.me/radar?id=4958891762778112 for details

Background

Apple Caching Server is pretty cool and it really makes a lot of sense in a large environment.

However, large environments often have a rather complex network topology which makes configuration and troubleshooting a little more difficult.

I just happen to work in a very large environment with a complex network topology.

We have many public WAN IP’s which our client devices and Apple caching servers use to get out to the internet – via authenticated proxies no less.

Apple has some pretty good, although a bit ambiguous in parts, documentation on configuring Apple Caching for complex networks here: http://help.apple.com/serverapp/mac/5.1/#/apd6015d9573

Essentially we have a network that looks a little bit like this:

[Diagram: multiple sites with clients and caching servers reaching the internet through authenticated proxies over several WAN IP ranges]

 

Apple Caching Server supports this network topology; however, we need to provide our client devices access to a DNS TXT service record in their default search domain so that the client device will know all of our WAN IP ranges.

So how does this caching server thing work on the client anyway?

There is a small binary/framework on the client device that does a ‘discovery’ of Apple caching servers approximately every hour – or if it has not yet populated a special file on disk, it will run immediately when requested by a content downloading service such as the App Store.

This special binary does this discovery by grabbing some info about the client device such as the LAN IP address and subnet range, and then it looks up our special DNS Text record ( _aaplcache._tcp. ) and sends all of this data to the Apple Locator service at: lcdn-locator.apple.com

Apple then matches the WAN IP ranges and the LAN IP  ranges provided and sends back a config that the special process writes out to disk. This config file contains the URL of a caching server that it should use (if one has been registered)

This special file on disk is called diskCache.plist. If the client has been able to successfully locate a caching server, you should see in this file a line like this:

"localAddressAndPort" => "10.10.10.10:49313"

Where 10.10.10.10:49313 is the IP address and port of the caching server the client should use.

Now this diskcache.plist file exists in a folder called com.apple.AssetCacheLocatorService inside /var/folders. The exact location is stored in the DARWIN_USER_CACHE_DIR variable. This can be revealed by running:

getconf DARWIN_USER_CACHE_DIR

Which should output a directory path like this:

/var/folders/yd/y87k7kk14494j_9c0y814r8c0000gp/C/

Then you can just use plutil -p to read the diskCache.plist

sudo plutil -p /var/folders/yd/y87k7kk14494j_9c0y814r8c0000gp/C/com.apple.AssetCacheLocatorService/diskCache.plist

And it should give you some output like this

*Thanks to n8felton for the info about the /var/folders !

Now all of this is fine and no problem, it all works as expected.


Except when it doesn’t.

At some sites, we were seeing a failure of client devices to pull their content from their caching server. The client device would simply pull its content over the WAN.

After a lot of trial and error and wire-sharking (is that a thing?) we found the problem.

As I mentioned earlier we were having _some_ client devices not able to pull their content from the caching server. After investigation on the client we found that they were not populating their diskcache.plist with the information we need from the apple locator service.

How come?

Well in our environment, we utilise a RODC at each site. This AD RODC (Read only domain controller) also operates as a DNS server. It is also the primary DNS server that is provided to clients via DHCP.

We have a few “issues” with our RODCs from time to time, and quite often we just shut them down and let the clients talk to our main DCs and DNS servers over the WAN. However, when we shut down the RODCs we don’t remove them from the DHCP server’s DNS option, so clients still receive a DHCP packet with a primary DNS server pointing at the now powered-off RODC, along with a secondary and a third DNS server that they will use.

As expected the clients seem quite happy with this, the clients are able to perform DNS lookups and browse the internet as expected even though their primary DNS server is non-responsive.

BUT it seems that the special little caching service discovery tool on the client devices does not fail over and use the secondary (or third) DNS server. It seems that this tool only does the DNS lookup for our TXT record against the primary DNS server.

So because this DNS TXT record lookup fails, the caching service discovery tool doesn’t get a list of WAN IP address ranges to send to the Apple locator URL and thus never gets a response back about which caching server it should use!
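You can reproduce the failure by querying the TXT record against each DNS server the client was handed, rather than letting the resolver choose (the record’s domain and the server addresses here are placeholders):

# A healthy DNS server answers with the TXT record
dig @10.0.0.53 +short TXT _aaplcache._tcp.my.domain
# The powered-off RODC that the client has as its primary just times out
dig @10.0.0.10 +short TXT _aaplcache._tcp.my.domain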

The fix.

Once we manually remove the non-responsive primary DNS server from the DHCP packet, so the client device now only gets our 2 functional DNS servers as the primary and secondary servers, the caching service discovery tool is able to lookup our DNS TXT record and receive the correct caching server URL from the Apple locator service and everything is right in the world again!
