Wednesday, July 18, 2018

How to Increase Quality with a Code Coverage Hack

In this post I'll summarize what code coverage is, how it can be abused, but also how it can be leveraged to gently increase design and architecture quality, reduce bug regressions, and provide verifiable documentation.  But first a short story:

The Hawthorne Effect


From 1924 to 1932, Western Electric ran productivity experiments on workers at its Hawthorne Works factory.  The story goes like this:

First, they increased lighting and observed that productivity went up.  Enthused, they increased lighting further and productivity went even higher.

Before reporting the fantastic news that increasing productivity would be cheap and easy, they tried decreasing lighting as a control.  To their horror, instead of decreasing as predicted, productivity soared yet again!

What was going on?!

The conclusion they eventually reached was that lighting was completely unrelated to productivity and that the workers were more productive the more they felt they were being observed.  This psychological hack has been dubbed The Hawthorne Effect or the observer effect.

Leverage A Psychological Hack


If you're wondering what this has to do with software development, the Hawthorne Effect is a tool that astute managers and team leads can leverage to gently increase quality on a team.  Instead of forcing unit testing on possibly reluctant team members, leads can regularly report on the one metric (code coverage), and as with the Hawthorne Effect, teams will feel the effects of being observed and naturally want to improve the number.

If it sounds too good to be true, keep in mind this is obviously more relevant for newer or less mature teams than for high-functioning ones.  Or perhaps you doubt that quality will increase in conjunction with code coverage.  Before we can get there we should cover (see what I did there) what it is.

What is Coverage?


According to Wikipedia

Coverage is a measure used to describe the degree to which the source code of a program is executed when a particular test suite runs.

Or, put another way, it's a percentage that shows how many lines of production code have been touched by unit tests.  An example will help.

Consider this method:

public static string SayHello(string name)
{
    if (string.IsNullOrEmpty(name))
    {
        return "Hello Friend";
    }
    else
    {
        return "Hello " + name;
    }
}

If you have just a single (XUnit) unit test like this:

[Fact]
public void Test1()
{
    var actual = Class1.SayHello("Bob");
    Assert.Equal("Hello Bob", actual);
}

Then it will cover every line of code except for the "Hello Friend" line.

On C#-based projects there's an amazing tool called NCrunch that runs tests continuously.  It calculates the SayHello method as five lines of code.  It shows covered lines with green dots and uncovered lines with white ones:



Since four of those lines are touched by tests the result is a code coverage of 4/5 or 80%.
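
To turn that last white dot green, a second test exercising the empty-name branch would bring the method to 5/5, or 100%:

[Fact]
public void Test2()
{
    // covers the string.IsNullOrEmpty branch that Test1 misses
    var actual = Class1.SayHello("");
    Assert.Equal("Hello Friend", actual);
}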



As a quick aside, I find continuous testing tools like NCrunch and its JavaScript cousin Wallaby.js to be extremely motivating -- fun even.  Doesn't that white dot just bug the OCD in you?  They're also a huge productivity enhancer thanks to their nearly instantaneous feedback.  And another bonus: they also report coverage statistics.  If you're looking to increase quality on a team, consider continuous testing tools; they pay for themselves quickly.

How to Cheat


If you're concerned that sneaky developers will find some way to cheat the number, make themselves look good, and not increase quality at all, you're not entirely wrong.  As with any metric, coverage can be cheated, abused, and broken.  For one thing, I've known at least one developer (not me, I swear) who wrote a unit test that used reflection to loop through every class and every property, verifying that setting each property and subsequently getting it returned the value that was set.

Was it a valuable test?  Debatable.  Did it increase code coverage significantly, making one team look better than the others to a naive observer?  Absolutely.
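
For illustration, a sketch of that kind of reflection test might look something like the following (restricting it to string properties and parameterless constructors keeps the sketch short; System.Linq and System.Reflection are assumed):

[Fact]
public void AllProperties_RoundTrip()
{
    // loop over every concrete, non-generic class in the production assembly
    var types = typeof(Class1).Assembly.GetTypes()
        .Where(t => t.IsClass && !t.IsAbstract && !t.ContainsGenericParameters
            && t.GetConstructor(Type.EmptyTypes) != null);

    foreach (var type in types)
    {
        var instance = Activator.CreateInstance(type);

        // for every readable + writable string property (no indexers), verify
        // the getter returns whatever the setter was given
        var properties = type.GetProperties()
            .Where(p => p.CanRead && p.CanWrite
                && p.PropertyType == typeof(string)
                && p.GetIndexParameters().Length == 0);

        foreach (var property in properties)
        {
            property.SetValue(instance, "test");
            Assert.Equal("test", property.GetValue(instance));
        }
    }
}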

On the opposite side of the spectrum consider this code:

public bool IsValid()
{
    return Regex.IsMatch(Email,
        @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|" + 
        @"[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)" +
        @"(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|" + 
        @"(([0-9a-z][-0-9a-z]*[0-9a-z]*\.)" + 
        @"+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$");
}

A developer could get 100% code coverage for that method with a single short test.  Unfortunately, that one-line method has an insane amount of complexity and should actually have perhaps hundreds of tests, none of which will increase the code coverage metric beyond the first.
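
For instance, a single test along these lines (the User class name is an assumption; the original only shows the IsValid method) would report that method as fully covered:

[Fact]
public void IsValid_ReturnsTrue_ForSimpleEmail()
{
    // one short test touches the single return statement,
    // so coverage reports 100% for IsValid()
    var user = new User { Email = "bob@example.com" };
    Assert.True(user.IsValid());
}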

The thing about cheating code coverage is that even if developers are doing it, they're still writing tests.  And as long as they keep a continued focus on writing tests, they'll necessarily begin to focus on writing code that's testable.  And as Michael Feathers, author of Working Effectively with Legacy Code, points out in one of my favorite presentations, The Deep Synergy Between Testability and Good Design,

Testable code is necessarily well designed

Go watch the video if you don't believe me (or even if you do and haven't watched it yet). 





The trick, however, is to keep the focus on code coverage over time, not just as a one-time event.

The Ideal Number


In order to maintain a focus on code coverage over time, perhaps setting a target goal would be a good approach.  I'm usually a little surprised when I ask "What's the ideal coverage number?" at local user groups and routinely hear answers like 80%, 90%, or even 100%.  In my view the correct answer is "better than last sprint" -- or at least no worse.  Or in the immortal words of Scott Adams (creator of Dilbert): goals are for losers, systems are for winners.

To that end, I love that tools like VSTS don't just report code coverage, they show a chart of it over time.  But while incorporating coverage in the continuous integration process is a great starting point, as it provides a single source of truth, great teams incorporate the number into other places.  

I've written about the importance of a retrospective before, but I feel it's also the perfect venue to leverage the Hawthorne Effect and bring up coverage on a recurring basis.  The retrospective can also be an excellent opportunity for positivity.  For instance, a code coverage of 0.02% may not sound great, but if coverage was 0.01% the prior sprint, that could legitimately show up in a retrospective under "what we did well" as "doubled the code coverage!"

Summary


Even if a team abuses the code coverage metric to some degree, a sustained interest in testing through ongoing reporting can gradually and incrementally allow a team to reap the benefits of unit testing.  As a team writes more tests their code will become more testable, their testable code will become more loosely coupled and better architected, their bugs will regress less often, they'll end up with verifiable documentation, and small refactorings will become more common because they are safer and easier.  In short, the team will increase in maturity and their product will increase in quality.

As always if you agree or disagree I'd love to hear about it in the comments or on twitter.

Wednesday, May 23, 2018

Intro to Cake: Cross Platform Build Automation in C# [video]

This is a version of my "Intro to Cake" presentation redone for YouTube. 

In case you aren't aware (or missed my post Why Use Cake? 4 Reasons.), Cake is a fantastic tool for automating the compilation, test, package, and deploy of .Net projects.  In the presentation I explore the why and what of Cake (C# Make), and compare it with popular build automation solutions like powershell, make, ant, psake, and VSTS tasks or Jenkins plugins.

Friday, May 11, 2018

Why Use Cake? 4 Reasons.

Continuous Integration servers like Team City or Visual Studio Team Services can provide an incredible amount of power.  They can distill a breathtaking range of devops complexity to a few checkboxes thanks to tasks or 3rd party plug-ins.

Unfortunately, many build managers fail to realize that GUI-based plugins (or Continuous Integration Logic, to coin a phrase) come at a cost.  Until recently, I certainly didn't recognize these costs.  Cake and other make-like tools reduce these costs.  Here are the 4 main reasons I prefer make-like tools:


1. Reduce Vendor Lock-In


And in the darkness bind them

On my current project we've changed CI platforms twice in the last two years due to changing InfoSec requirements.  It was a large amount of work to build a cross platform mobile devops pipeline in the first place, but to rebuild it two more times after that was a staggering amount of rework.

If there's one thing I've learned through this experience it's this: my ideal CI server build definition now consists of a single task: Run Make-Like Tool.  In my case that tool happens to be Cake, because it's based in C#, it's cross platform and open source, and it has a gigantic set of plugins for accomplishing a wide range of tasks in the .Net toolchain and beyond.
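
To give a flavor, here's a minimal sketch of a Cake build script; the solution and project paths (and the use of the DotNetCore aliases) are illustrative assumptions:

var target = Argument("target", "Default");

Task("Build")
    .Does(() =>
{
    // compile the solution (path is hypothetical)
    DotNetCoreBuild("./src/MyApp.sln");
});

Task("Test")
    .IsDependentOn("Build")
    .Does(() =>
{
    // run the unit tests (path is hypothetical)
    DotNetCoreTest("./src/MyApp.Tests/MyApp.Tests.csproj");
});

Task("Default")
    .IsDependentOn("Test");

RunTarget(target);

The CI server then only needs to invoke the Cake bootstrapper (typically build.ps1 or build.sh), which is what makes the build definition a single task.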

But regardless of which Make-Like tool you use -- ant, gradle, rake, psake, fake, etc. -- the point is that by eliminating a CI Server's custom tools and marketplace plugins, you reduce dependence on a particular CI server, eliminate vendor lock-in, and increase portability.  Then, should you ever be forced to move from the cloud to in-house or back, the migration will be a small effort.

2. Democratize DevOps


In my experience most teams end up with one owner of the build pipeline.  If something breaks or a developer adds a feature that requires build script changes then nothing can move forward until the build manager, or BM if you will, can investigate.

But what happens if the BM is on vacation?  What happens if they're overloaded fighting fires, deploying builds, or fixing production servers that just went offline?  What happens if they get hit by a bus?

When we moved away from custom Jenkins tasks to Cake on my last project, the build scripts became available to every developer.  And because the build automation scripts were in the exact same language (C#) and location (source control) as the production system code, regular developers felt more comfortable jumping in and contributing, fixing, and extending as necessary.

The evolution reminded me of projects where database access was once owned exclusively by stored-procedure-writing DBAs.  When those projects moved to ORMs, at first there was fear that everyone would break everything.  Very quickly, however, the removal of strict firewalls replaced fear with flexibility, deadlocks with agility, and hard dependencies with increased productivity.  Some people became less busy, others became more so, but overall the project moved noticeably faster.

3. Reduce Impedance Mismatch


Imagine you're tasked with implementing a feature that affects both production code and the build automation process -- something like crash reporting, swapping unit testing frameworks, or incorporating build environment information into the app (e.g. use a green background if in UAT).  In a world of extensive Continuous Integration Logic you (or a BM) must create a custom build definition for the feature branch for testing.  Then when you move the feature to the develop branch you must remember to update the associated build definitions there.  Then you have to remember to update again when you merge to master, and possibly again for release branches.

The problem is that production code is frequently tightly coupled with build automation logic whether you like it or not, yet build definitions stick rigidly with a particular branch -- they fail to move with production code.

By placing all build automation logic in scripts that live under the same source control as production code you can leverage the tight coupling instead of fighting it.  Build automation logic will flow smoothly through branches and merges, and no one ever needs to remember to update build definition logic, since all build definitions are identical.

And as a bonus, because your scripts are under source control you get version history, and you can run a blame to determine that, oh yeah, it was actually you who wrote that crap.

4. Simplify Debugging


The best part of moving to make-like scripts and eliminating Continuous Integration Logic is improving the "developer experience".  When things go wrong on a CI server with custom logic you can't set breakpoints, environmental differences are inaccessible, logging options are limited, and you frequently have to wait very long times to see the results of any changes (i.e. the build manager inner loop, to coin another phrase).

With Cake I get feedback fast.  I can start a debugging session and inspect variables.  I can temporarily remove time consuming dependent tasks whose output I happen to know is cached.  I never have to wait for other people's builds to complete.  Also, having IntelliSense (code completion) is a productivity boost, and having the flexibility of writing my own custom plugins in C# has been extremely powerful.

Summary


The hidden costs of Continuous Integration Logic may not exceed their convenience for every project.  However, having seen the pain first-hand on several occasions, my preference will be for make-like tools on all future projects.  I've also migrated all my personal projects to Cake knowing it'll be the last CI migration they'll need.  That is a great feeling.

Hopefully this post has helped illuminate some hidden costs of which you were unaware.  If I missed one, or you disagree, please respond in the comments to start a discussion, or hit me up on twitter.

Monday, April 30, 2018

How To Push Updates to Raspberry Pi UWP Apps in Prod

Updating Raspberry Pi apps in the field can be tricky.  This post covers the general problem and addresses some specific side-loading problems you are likely to run into.


The (Re)Deployment Problem


Imagine you have an IoT app like a kiosk, digital sign, or temperature reader that you want to productize and ship onto small (inexpensive) devices like Raspberry Pi's.  If you're already in the Microsoft ecosystem or you want features like BitLocker, Automatic Updates, or Enterprise Data Protection (the ability to remote wipe a lost device), then the Windows IoT operating system is an obvious choice (plus the Windows IoT Core edition is free).

But once you've developed your app and are ready to ship, how do you quickly and consistently get your app (or more likely a set of foreground and background apps) onto devices?  More importantly, when you have a bug fix or feature enhancement how do you push it out to devices in the field?

The Initial Image


In most regards, the Microsoft documentation on building Windows IoT images is excellent.  After installing some prerequisite software, you use a set of provided command line tools to create numerous XML files that together describe an image.  You then compile those XML files into a multi-gigabyte .ffu image file that you can write to an SD card using the Windows IoT Core Dashboard software (or better yet, the flashsd command line tool).

Now you can plug your newly flashed SD cards into a bunch of Raspberry Pi's and send them all over the world.  However, unless you're Chuck Norris, you're going to need to ship bug fixes to those Raspberry Pi's, and visiting each one is out of the question.  There are two primary options: store-based, and side loading.

Store-Based Updates


Microsoft allows developers to publish Windows IoT apps to the Microsoft Store, at which point it will push app updates down onto devices and install them automatically.  This option is extremely secure and reliable.  However, there are several downsides.


  1. There's a special sign up process that can be time consuming.  I imagine it's gotten better, but it took me several weeks to get approved.  
  2. Background apps require extra work to publish because each one requires a special empty Foreground UWP app that developers must submit to the store (see Special Instructions for Headless Apps).  
  3. It's tricky to get multiple dependent apps to publish simultaneously as a package.  
  4. Updates require an approval process and on top of that can take up to 24 hours to get pushed to devices.


Side Loading Updates


If you need complete control over the deployment process, for instance to ensure that multiple dependent packages upgrade in a certain order, then there is an additional option.  You can manage your own app deployments using the PackageManager API along with a special .appxmanifest permission that allows side-loading app updates.  Naturally, you'll need a server for hosting your app update files and informing your clients that an update is available.  You then create a background UWP app that monitors your server and downloads and installs packages when it detects changes.
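
As a rough sketch, the update itself boils down to a call to PackageManager.UpdatePackageAsync (the bundle path below is hypothetical; the special manifest permission is, as best I recall, the restricted packageManagement capability):

// requires the Windows.Management.Deployment namespace
private async Task UpdateAppAsync()
{
    var packageManager = new PackageManager();

    // assumes the background app has already downloaded the new bundle locally;
    // this path is hypothetical
    var bundleUri = new Uri(@"C:\Data\Updates\MyApp_1.0.1.0_ARM.appxbundle");

    // ForceApplicationShutdown closes the running app so the update can apply
    var result = await packageManager.UpdatePackageAsync(
        bundleUri, null, DeploymentOptions.ForceApplicationShutdown);

    if (!string.IsNullOrEmpty(result.ErrorText))
    {
        // log result.ErrorText / result.ExtendedErrorCode and decide whether
        // to retry or fall back to a full uninstall + install
    }
}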

While this option has the benefit of more control, keep in mind the following disadvantages:


  1. Hand rolling the side-loading code increases the attack surface of your app.
  2. You'll have to maintain a dedicated server and SSL certificate.
  3. It may be more brittle, as the entire process could break if your URL changes, your SSL certificate expires, or something breaks in your updater background app.


Matthijs Hoekstra beautifully documents one way to do this in his blog post Auto updater for my side loaded UWP Apps.  His approach involves sending notifications to the updater background app using the UWP App-To-App Communication approach I described in a Visual Studio Magazine article I published last year.

If you prefer having a single app both monitor changes and install updates, thus isolating all deployment responsibilities into a single app, you may want to take a look at the SirenOfShame.Uwp.Maintenance project of the Siren of Shame UWP App.   If you check out that code, you'll see I download the app to a temporary directory (perhaps a new UWP requirement), provide a lot of logging, and use certificate pinning for better security.  I designed that project to be generic enough that you could copy-paste it into a new project and only make a few changes.

Regardless, if you combine side-loading and image creation you are likely to run into problems.  Here are the problems and solutions I've found.

Problems


Package failed updates


ErrorCode = -2147009293
System.Runtime.InteropServices.COMException (0x80073CF3): Package failed updates, dependency or conflict validation.

You'll get this error when you include an appx in your image instead of an appxbundle.  Including an appx is what they tell you to do in Step 4 of Lab 1b when they say "Generate app bundle: Never".  The problem is you can only side-load an appxbundle if you originally installed an appxbundle.  To get around this, ignore the directions and set "Generate app bundle" to "Always" when you generate the app you include in your image.  When you run the newAppxPkg command you can just reference your appxbundle instead of your appx, and everything works exactly the same from there.

Removal failed


ErrorCode = -2147009286
System.Runtime.InteropServices.COMException (0x80073CFA): Removal failed. Please contact your software vendor.

You'll get this error if you try to uninstall a package via PackageManager.RemovePackageAsync() that had been installed into an image as an appx.  Instead, always include an appxbundle instead of an appx when you build images (see above).

Install failed


ErrorCode = -2147009287
System.Runtime.InteropServices.COMException (0x80073CF9): Install failed. Please contact your software vendor.

You'll get this error if you try to sideload via .UpdatePackageAsync() an app that you initially included in an image but then subsequently F5-deployed over top of from Visual Studio.  The solution here is to put a try/catch around your call to .UpdatePackageAsync(), and if there's a COMException, try an uninstall followed by an install, as sketched below.
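
A hedged sketch of that fallback might look like this (the package name and variables are illustrative, and System.Linq is assumed for FirstOrDefault):

try
{
    await packageManager.UpdatePackageAsync(
        bundleUri, null, DeploymentOptions.ForceApplicationShutdown);
}
catch (COMException)
{
    // likely 0x80073CF9 because an F5 deploy replaced the imaged appx;
    // fall back to removing the installed package and installing fresh
    var installed = packageManager.FindPackagesForUser(string.Empty)
        .FirstOrDefault(p => p.Id.Name == "MyApp"); // package name is hypothetical
    if (installed != null)
    {
        await packageManager.RemovePackageAsync(installed.Id.FullName);
    }
    await packageManager.AddPackageAsync(
        bundleUri, null, DeploymentOptions.ForceApplicationShutdown);
}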

Summary


I've spent numerous hours getting this process right and I hope this document saves someone some time.  If so please share your IoT deployment experience in comments or ping me @lprichar.

Monday, January 29, 2018

Securing Communications via Certificate Pinning in UWP


If you've ever clicked the "Decrypt HTTPS Traffic" button in Fiddler you know how extremely easy it is to initiate a man-in-the-middle attack, and watch (and even modify) the encrypted traffic between an application and a server.  You can see passwords and app private information and all kinds of very interesting data that the app authors probably never intended to have viewed or modified.

It's also easy to protect against man-in-the-middle attacks, but few apps do.

For instance, I own a Ring doorbell and have the Ring (UWP) app installed in Windows so I can (among other things) see when outgoing Siren of Shame packages are picked up by the post.  Here's a recent HTTPS session between the app and the server:


I wonder what would happen if I modified the value of "bypass_account_verification" to True upon requests to https://api.ring.com/clients_api/profile?  You can do that type of thing with little effort in the FiddlerScript section, which I show in a supplementary episode of Code Hour:





If you're writing an app, your risk of man-in-the-middle attacks isn't limited to curious developers willing to install a Fiddler root certificate in order to hide all HTTPS snooping errors.  Consider this scary and articulate stack overflow answer:

Anyone on the road between client and server can stage a man in the middle attack on https. If you think this is unlikely or rare, consider that there are commercial products that systematically decrypt, scan and re-encrypt all ssl traffic across an internet gateway. They work by sending the client an ssl cert created on-the-fly with the details copied from the "real" ssl cert, but signed with a different certificate chain. If this chain terminates with any of the browser's trusted CA's, this MITM will be invisible to the user.

The under-utilized solution for app developers is: certificate pinning.

UWP Pinning?  No Soup For You


Certificate pinning, or public key pinning, is the process of limiting the servers that your application is willing to communicate with, primarily for the purpose of eliminating man in the middle attacks.

If the Ring app above had implemented certificate pinning, then they would have received errors on all HTTPS requests that Fiddler had intercepted and re-signed in transit.  My personal banking app in Windows does this and on startup gives the error "We're sorry, we're unable to complete your request.  Please try again" if it detects that the signing certificate isn't from whom it should be (even if it is fully trusted).

Implementing certificate pinning is usually pretty easy in .Net.  Typically it involves setting the ServerCertificateValidationCallback delegate on ServicePointManager.  It then looks something like this:

public static async Task Main(string[] args)
{
    // Set the validation callback (delegate) for all requests in the AppDomain
    ServicePointManager.ServerCertificateValidationCallback = PinPublicKey;

    WebRequest request = WebRequest.Create("https://...");
    WebResponse response = await request.GetResponseAsync();
    // ...
}

private static bool PinPublicKey(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors)
{
    if (certificate == null || chain == null)
        return false;

    if (sslPolicyErrors != SslPolicyErrors.None)
        return false;

    // Verify against the known public key (PUB_KEY is a constant holding the expected server public key as a hex string)
    String pk = certificate.GetPublicKeyString();
    return pk.Equals(PUB_KEY);

}

That works for all requests in the AppDomain (which, incidentally, is bad for library providers, but convenient for regular app developers).  You could also do it on a request-by-request basis by setting the ServerCertificateCustomValidationCallback property of the HttpClientHandler for an HttpClient (see example below).

Either way, notice the GetPublicKeyString() method.  That's a super-useful method that'll extract the public key so you can compare it with a known value.  As OWASP describes in the Pinning Cheat Sheet, this is safer than pinning the entire certificate because it avoids problems if the server rotates its certificates.

That works beautifully in Xamarin and .Net Core.  Unfortunately, there's no ServicePointManager in Universal Windows Platform (UWP) apps.  Also, as you'll see we won't be given an X509Certificate object so getting the public key is harder.  There's also virtually zero documentation on the topic and so the following section represents a fair amount of time I spent fiddling around.

UWP Certificate Pinning Solved (Kinda)


As described by this Windows Apps Team blog there are two HttpClients in UWP:

Two of the most used and recommended APIs for implementing the HTTP client role in a managed UWP app are System.Net.Http.HttpClient and Windows.Web.Http.HttpClient. These APIs should be preferred over older, discouraged APIs such as WebClient and HttpWebRequest (although a small subset of HttpWebRequest is available in UWP for backward compatibility).

If you're tempted to use System.Net.Http.HttpClient because it's cross platform or because you want to use the ServerCertificateCustomValidationCallback property I mentioned earlier, then you're in for an unpleasant surprise when you attempt to write the following code:

HttpMessageHandler handler = new HttpClientHandler
{
    ServerCertificateCustomValidationCallback = OnCertificateValidate
};

var httpClient = new System.Net.Http.HttpClient(handler);

UWP will give you this response:

System.PlatformNotSupportedException: The value 'System.Func`5[System.Net.Http.HttpRequestMessage,System.Security.Cryptography.X509Certificates.X509Certificate2,System.Security.Cryptography.X509Certificates.X509Chain,System.Net.Security.SslPolicyErrors,System.Boolean]' is not supported for property 'ServerCertificateCustomValidationCallback'.

Even using Paul Betts' awesome ModernHttpClient doesn't get around the problem. The only solution I've found is to use the Windows.Web.Http.HttpClient and the ServerCustomValidationRequested event like this:

using (var filter = new HttpBaseProtocolFilter())
{
    // todo: probably remove this in production, avoids overly aggressive cache
    filter.CacheControl.ReadBehavior = HttpCacheReadBehavior.NoCache;
    filter.ServerCustomValidationRequested += FilterOnServerCustomValidationRequested;
    var httpClient = new Windows.Web.Http.HttpClient(filter);
    var result = await httpClient.GetStringAsync(new Uri(url));
    // always unsubscribe to be safe
    filter.ServerCustomValidationRequested -= FilterOnServerCustomValidationRequested;
}

Notice the CacheControl setting.  I thought I was going mad for a while when requests stopped showing up in Fiddler.  Turns out Windows.Web.Http.HttpClient's cache is so aggressive that, unlike System.Net.Http.HttpClient, it won't make subsequent requests to a URL it's seen before; it'll just return the previous result.

The last piece of the puzzle is the FilterOnServerCustomValidationRequested method and how to extract a public key from a certificate without the benefit of an X509Certificate:

private void FilterOnServerCustomValidationRequested(
    HttpBaseProtocolFilter sender, 
    HttpServerCustomValidationRequestedEventArgs args
    ) {

    if (!IsCertificateValid(
        args.RequestMessage, 
        args.ServerCertificate, 
        args.ServerCertificateErrors))
    {
        args.Reject();
    }
}

private bool IsCertificateValid(
    Windows.Web.Http.HttpRequestMessage httpRequestMessage, 
    Certificate cert, 
    IReadOnlyList<ChainValidationResult> sslPolicyErrors)
{
    // disallow self-signed certificates or certificates with errors
    if (sslPolicyErrors.Count > 0)
    {
        return false;
    }

    // by default reject any requests that don't use ssl or match up to our known base url
    if (!RequestRequiresCheck(httpRequestMessage.RequestUri)) return false;

    var certificateSubject = cert?.Subject;
    bool subjectMatches = certificateSubject == CertificateCommonName;

    var certificatePublicKeyString = GetPublicKey(cert);
    bool publicKeyMatches = certificatePublicKeyString == CertificatePublicKey;

    return subjectMatches && publicKeyMatches;
}

private static string GetPublicKey(Certificate cert)
{
    var certArray = cert?.GetCertificateBlob().ToArray();
    var x509Certificate2 = new X509Certificate2(certArray);
    var certificatePublicKey = x509Certificate2.GetPublicKey();
    var certificatePublicKeyString = Convert.ToBase64String(certificatePublicKey);
    return certificatePublicKeyString;
}

private bool RequestRequiresCheck(Uri uri)
{
    return uri.IsAbsoluteUri &&
        uri.AbsoluteUri.StartsWith("https://", StringComparison.CurrentCultureIgnoreCase) &&
        uri.AbsoluteUri.StartsWith(HttpsBaseUrl, StringComparison.CurrentCultureIgnoreCase);
}

There may be a less expensive version of the GetPublicKey() method that involves indexing into the type array, but the above seems pretty clean to me.  The only possible issue is you might need to reference the System.Security.Cryptography.X509Certificates nuget package from Microsoft depending on your UWP version.

You can see my final version in the Maintenance project of the Siren of Shame UWP app I'm building, along with a possible drop-in CertificatePinningHttpClientFactory.

Summary


Hopefully this clarifies what certificate pinning is, why you'd want it, and how to implement it.  If you found it useful or have any questions please share in the comments or hit me up on twitter.