Monday, February 24, 2020

Strongly Typed, Dependency Managed Azure in C#: Introducing Cake.AzureCLI

The story, nay legend, of providing strongly typed, cross platform, dependency managed access to all 2,935 Azure CLI commands in C#.



You can now have strongly typed, cross platform, dependency managed access to all 2,935 Azure CLI commands in C#, with full intellisense including examples. That's because I just published a Cake plugin for AzureCLI called Cake.AzureCli.


This blog post is a little about what it is and how to use it, but it's more about how I built it. That's because I had a blast solving this problem and my solution might even entertain you: parsing thousands of help files through the CLI, storing results in 16 meg intermediate JSON files, and code generating 276K lines of code with T4 templates.


In the process I apparently also broke Cake's static site generator.


Oops.

But first, I suppose the most relevant information is the what and the how.

Have & Eat Your Cake.AzureCLI


This plugin runs in Cake. If you aren't familiar with Cake, please check out Code Hour Episode 16, Intro to Cake, where I go over what it is and why you should care.


If you don't have a spare hour right now: it's a dependency management system (like make, ant, maven, or rake) except in C#. It also has a huge plugin ecosystem, one that's now "slightly" larger with access to all of Azure CLI.

Right, you didn't watch the video, and you're still skeptical, right? You're wondering what was wrong with the official azure-sdk NuGet packages. The answer is: they aren't Cake enabled, so they don't support dependency management. If that statement isn't meaningful to you, please watch just the "Scripts" section of my talk starting at 9:08.

Now that you're 100% convinced let's dig in. Using Cake.AzureCLI is as simple as adding a preprocessor directive to pull it from NuGet:
#addin "nuget:?package=Cake.AzureCli&version=1.2.0"
And then accessing commands through Az(). A simple program to log in and list all your resource groups might look like this:

var target = Argument("target", "Default");
var username = Argument<string>("username", null);
var password = Argument<string>("password", null);

Task("Login")
   .Does(() =>
{
   // 'az login' is accessed via Az().Login()
   Az().Login(new AzLoginSettings {
      Username = username,
      // all commands can be customized if necessary with a ProcessArgumentBuilder
      Arguments = new ProcessArgumentBuilder()
         // anything appended with .AppendSecret() will be rendered as [REDACTED]
         //    if cake is run with `-verbosity=diagnostic`
         .Append("--password").AppendSecret(password)
   });
});

Task("ListResourceGroups")
   .IsDependentOn("Login") // yayy dependency management!
   .Does(() =>
{
   // listing names of all resource groups
   Information("Resource Groups:");
   // all results are strongly typed as dynamic if results are json
   dynamic allResourceGroups = Az().Group.List(new AzGroupListSettings());
   foreach (var resourceGroup in allResourceGroups) {
      Information(resourceGroup.name);
   }
});

RunTarget(target);
And that should hopefully provide enough background to go create SQL instances, scale Kubernetes clusters up or down, and provision VMs with dependency management, from the comfort of a language you know and love.

The Making Of


"But Lee, I'm dying to know, how did you build this work of art?"
Oh, I'm so very glad you asked. Writing something this large by hand was obviously not going to work. Plus it needs to be easy to update when the Azure team releases new versions. Code generation it was. And I always wanted to learn T4 templates.

I first came up with a data structure, always a solid place to start. I wanted something that would support Azure CLI, but that could also be used to generically represent any CLI tool, because ideally this solution could work for other CLI programs as well. I came up with this:



A Program contains a single root Group (az). Groups can contain other Groups recursively (e.g. az contains az aks which contains az aks nodepool). Groups can contain Commands (e.g. az aks contains az aks create). And for documentation Commands can have Examples and Arguments.
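That description maps onto a handful of small classes. Here is a minimal sketch of the shape, not the plugin's actual model: the class names carry a Cli prefix I made up, and every property name is an assumption based only on the paragraph above.

```csharp
using System.Collections.Generic;

// Hypothetical model classes; the real ones live in the project's .Core
// assembly and may be named and shaped differently.
public class CliProgram
{
    public string Name { get; set; } = "";
    public CliGroup RootGroup { get; set; } = new CliGroup(); // e.g. "az"
}

public class CliGroup
{
    public string Name { get; set; } = "";
    public List<CliGroup> Groups { get; } = new List<CliGroup>();       // nested groups
    public List<CliCommand> Commands { get; } = new List<CliCommand>(); // leaves of the tree
}

public class CliCommand
{
    public string Name { get; set; } = "";
    public List<CliArgument> Arguments { get; } = new List<CliArgument>();
    public List<CliExample> Examples { get; } = new List<CliExample>();
}

public class CliArgument
{
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
}

public class CliExample
{
    public string Description { get; set; } = "";
    public string CommandLine { get; set; } = "";
}

public static class CliTree
{
    // Walks the tree depth-first and counts every command under a group.
    public static int CountCommands(CliGroup group)
    {
        var count = group.Commands.Count;
        foreach (var child in group.Groups)
        {
            count += CountCommands(child);
        }
        return count;
    }
}
```

Because groups only ever contain groups and commands, any serializer that handles nested lists can round-trip this shape without custom converters.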

It's basically a tree, with Commands as leaves, and so will work nicely in JSON. But how to populate it?

Fill 'Er Up

"Well, that's not how I would have done it"

said a skeptical co-worker when I told him I was executing thousands of az [thing] --help commands and parsing the results. See, AzureCLI was written in Python and is open source, so theoretically I could have downloaded their source and generated what I needed from there in Python.

But I really wanted a more generic approach that I or someone else could apply to any CLI program. So I parsed each "xyz --help" into an intermediary object: a Page. That's basically just a collection of headers, name-value pairs, and paragraphs. Then I converted pages to groups or commands and recursed to produce a 350,385 line, 15 megabyte behemoth.
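To make the Page idea concrete, here is a rough sketch of such a parser. It is not the plugin's code: the PageSection and HelpParser names are invented, and it assumes a simplified help layout where headers start in column zero and name-value pairs are indented and separated by " : ". Real help output has wrapped lines and other wrinkles this ignores.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical intermediate model: headers, name-value pairs, paragraphs.
public class PageSection
{
    public string Header { get; set; } = "";
    public List<KeyValuePair<string, string>> NameValues { get; } =
        new List<KeyValuePair<string, string>>();
    public List<string> Paragraphs { get; } = new List<string>();
}

public class Page
{
    public List<PageSection> Sections { get; } = new List<PageSection>();
}

public static class HelpParser
{
    public static Page Parse(string helpText)
    {
        var page = new Page();
        PageSection current = null;

        foreach (var rawLine in helpText.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r', ' ', '\t');
            if (line.Length == 0) continue;

            // Assumption: an unindented line starts a new section.
            if (!char.IsWhiteSpace(line[0]))
            {
                current = new PageSection { Header = line.TrimEnd(':') };
                page.Sections.Add(current);
                continue;
            }

            if (current == null) continue; // indented text before any header

            // Assumption: "name : value" marks a pair, anything else is prose.
            var text = line.Trim();
            var separator = text.IndexOf(" : ", StringComparison.Ordinal);
            if (separator > 0)
            {
                current.NameValues.Add(new KeyValuePair<string, string>(
                    text.Substring(0, separator).Trim(),
                    text.Substring(separator + 3).Trim()));
            }
            else
            {
                current.Paragraphs.Add(text);
            }
        }

        return page;
    }
}
```

Keeping the Page free of any group or command concepts is what makes the approach reusable: only the later conversion step needs to know what a "Commands" or "Subgroups" header means for a particular tool.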



Incidentally, a fun side effect of this approach is that you can see all the changes between Azure CLI versions, e.g. this commit shows changes from 2.0.77 to 2.1.0 (although GitHub doesn't like showing diffs across 16 Meg files in the browser for some reason, can't imagine why).

T4 Templates


I'd never used T4 templates. Turns out they're super awesome. Well, super powerful, and pretty awesome anyway. They are a little annoying when every time you hit save or tab off a .tt file it takes 13 seconds to generate your 178 thousand lines of code -- even on an 8 core i9 with 64 gigs of RAM and an SSD. Oh, and at that scale Visual Studio seems to crash periodically, although I'm sure ReSharper doesn't help.

But whatever, they work. And this part is cool: if you set hostspecific=true, then you can access this.Host to get the current directory, read a JSON file, and deserialize it to model objects that live in a .Core project that you can reference inside of the .tt file yet not reference inside of your main project (Cake.AzureCli). If you're interested, check out Az.tt.

What to generate was interesting too. The easy part is exposing a method to cake. You just write an extension method like this:

public static class AzAliases
{
    [CakeMethodAlias]
    public static AzCliGroup Az(this ICakeContext context)
    {
        return new AzCliGroup(context);
    }
}
And Cake is good to go. But what about generating 2,935 extension methods? Turns out, not such a great idea. The intellisense engine in Visual Studio Code is powered by OmniSharp. As awesome as OmniSharp is, it just isn't quite powerful enough to generate intellisense quickly or accurately with that architecture. However, if you group commands into "namespaces" like Az().Aks.Create() instead of AzAksCreate(), then you get nice intellisense at every level:
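In code, the two shapes compare roughly as follows. This is a hand-written sketch with made-up class names (FlatAz, AzRoot, and friends) that simply return the command line they stand for. The generated plugin's types are different and take settings objects.

```csharp
// Flat style: every command is a method on one type, so a completion
// list after "Az" has to offer all of them at once.
public static class FlatAz
{
    public static string AzAksCreate() => "az aks create";
    public static string AzAksNodepoolAdd() => "az aks nodepool add";
    public static string AzGroupList() => "az group list";
}

// Grouped style: each CLI group becomes its own small class, so each
// "." only has to offer that group's children.
public class AzRoot
{
    public AzAks Aks { get; } = new AzAks();
    public AzGroup Group { get; } = new AzGroup();
}

public class AzAks
{
    public AzAksNodepool Nodepool { get; } = new AzAksNodepool();
    public string Create() => "az aks create";
}

public class AzAksNodepool
{
    public string Add() => "az aks nodepool add";
}

public class AzGroup
{
    public string List() => "az group list";
}
```

With the grouped shape, new AzRoot().Aks.Nodepool.Add() reads almost exactly like the command it wraps, and the editor only has a few members to rank at each step.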




Conclusion


While this project may not solve world hunger just yet, I do hope it'll make someone's life a little easier. More importantly, I hope this technique will entertain or better yet inspire someone (you) to create something cool. If it does, please let me know about it in the comments or on twitter.

Wednesday, January 22, 2020

Conquer ASP.Net Boilerplate Query Performance in LINQPad, (Announcing LINQPad.ABP)

Ever made it to production only to realize your code fails miserably at scale? When performance problems rear their gnarly heads and their name is Entity Framework, there is but one blade to slice that Gordian knot: LINQPad.
However, if you're using the ASP.Net Boilerplate (ABP) framework, the situation is a tad more dire. That's because ABP uses a repository pattern, which, it turns out, is less than compatible with LINQPad's DataContext-centric approach.
In this post I'll describe two ways to use LINQPad with ABP apps to solve performance problems:
1. Rewrite repository pattern based queries to run in LINQPad with a data context.  This works well for small problems.
2. Enable LINQPad to call directly into your code, with support for authentication, multi-tenancy, and the unit of work and repository patterns, with the help of a NuGet package I just released called LINQPad.ABP. This helps immensely for more complex performance problems.

Repository Pattern 101


The Repository and Unit of Work patterns used in ASP.Net Boilerplate apps are a wonderful abstraction for simplifying unit testing, enabling annotation based transactions, and handling database connection management.  As described in this article from the Microsoft MVC docs:
The repository and unit of work patterns are intended to create an abstraction layer between the data access layer and the business logic layer of an application. Implementing these patterns can help insulate your application from changes in the data store and can facilitate automated unit testing or test-driven development (TDD)
That document gives a nice diagram to show the difference between an architecture with and without a repository pattern:



However, these abstractions present a problem for LINQPad. When using LINQPad to diagnose performance problems it would often be super convenient to call directly into your app's code to see the queries translated into SQL and executed. However, even if LINQPad understood dependency injection, it would have no idea how to populate an IRepository, what to do with a [UnitOfWork(TransactionScopeOption.RequiresNew)] attribute, or what value to return for IAbpSession.UserId or IAbpSession.TenantId. Fortunately, I just released a NuGet package to make that easy.
But first, the simplest way to solve the problem for single queries is just to rewrite the query without a repository pattern and paste it into LINQPad.

Undoing Repository Pattern


This is where this blog post gets into the weeds.  If you'd rather watch me perform the following steps please check out my latest episode of Code Hour:




Otherwise, here it is in written form:
Step one is to enable ABP's data context to support a constructor that takes a connection string. Add the following code to your DataContext:
#if DEBUG
        private string _connectionString;

        /// <summary>
        /// For LINQPad
        /// </summary>
        public MyProjDbContext(string connectionString)
            : base(new DbContextOptions<MyProjDbContext>())
        {
            _connectionString = connectionString;
        }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            if (_connectionString == null)
            {
                base.OnConfiguring(optionsBuilder); // Normal operation
                return;
            }

            // We have a connection string, so point the supplied builder at it
            optionsBuilder.UseSqlServer(_connectionString);
            base.OnConfiguring(optionsBuilder);
        }
#endif

Then in LINQPad you can:
  1. Add a connection
  2. Choose "Use a typed data context from your own assembly"
  3. Select a Path to Custom Assembly like "MyProjSolution\server\src\MyProj.Web.Host\bin\Debug\netcoreapp2.1\MyProj.EntityFrameworkCore.dll"
  4. Enter a Full Type Name of Typed DbContext like "MyProj.EntityFrameworkCore.MyProjDbContext"
  5. Tell LINQPad to instantiate your DbContext via a constructor that accepts a string, then provide your connection string


Now, when you start a new query you can write:
var thing = this.Things.Where(t => t.Id == 1);
thing.Dump();

And if you run it you'll see the resulting SQL statement.
Not bad.  If you paste in any real code you'll need to add using statements and replace _thingRepository.GetAll() with this.Things and you'll be translating LINQ to SQL in no time.


Pretty cool.  It certainly works for simple queries.
However, in my experience performance problems rarely crop up in the easy parts of the system.  Performance problems always seem to happen in places where multiple classes interact because there was simply too much logic for the author to have stuffed it all into one class and have been able to sleep at night.

Enter: LINQPad.ABP


To enable LINQPad to call directly into ABP code you'll need to set up dependency injection, define a module that starts up your core module, specify a current user and tenant to impersonate, and somehow override the default unit of work to use LINQPad's context rather than ABP's. That's a lot of work.
Fortunately, I just published LINQPad.ABP, an Open Source NuGet Package that does all this for you.  To enable it:
  1. In LINQPad add a reference to "LINQPad.ABP"
  2. Add a custom Module that's dependent on your project's specific EF Module
  3. Create and Cache a LINQPad ABP Context
  4. Start a UnitOfWork and specify the user and tenant you want to impersonate
The code will look like this:
// you may need to depend on additional modules here eg MyProjApplicationModule
[DependsOn(typeof(MyProjEntityFrameworkModule))]
// this is a lightweight custom module just for LINQPad
public class LinqPadModule : LinqPadModuleBase
{
    public LinqPadModule(MyProjEntityFrameworkModule abpProjectNameEntityFrameworkModule)
    {
        // tell your project's EF module to refrain from seeding the DB
        abpProjectNameEntityFrameworkModule.SkipDbSeed = true;
    }

    public override void InitializeServices(ServiceCollection services)
    {
        // add any custom dependency injection registrations here
        IdentityRegistrar.Register(services);
    }
}

async Task Main()
{
    // LINQPad.ABP caches (expensive) module creation in LINQPad's cache
    var abpCtx = Util.Cache(LinqPadAbp.InitModule(), "LinqPadAbp");

    // specify the tenant or user you want to impersonate here
    using (var uowManager = abpCtx.StartUow(this, tenantId: 5, userId: 8))
    {
        // retrieve what you need with IocManager in LINQPad.ABP's context
        // (IThingService is a placeholder name; substitute your own service type)
        var thingService = abpCtx.IocManager.Resolve<IThingService>();
        var entity = await thingService.GetEntityByIdAsync(1045);
        entity.Dump();
    }
}

That may look like a lot of code, but trust me, it's way simpler than it would otherwise be. And now you can call into your code and watch every single query and how it gets translated to SQL.

Summary


I hope this helps you track down a hard bug faster. If it does, then please subscribe, like, comment, and/or let me know on twitter.

Tuesday, December 10, 2019

Multi-Tenancy is Hard: ASP.Net Boilerplate Makes it Easy

If you're liable to start a new web project that even might need multi-tenancy, you should probably use ASP.Net Boilerplate (ABP). As I've blogged about previously, ABP will save weeks of dev time on new websites, even without multi-tenancy. However, as soon as you bring on a second customer, I'd estimate you'll eliminate over a month of development time (extrapolating from my 2 ABP project data points, solid math).


But what even is multi-tenancy? What are typical solutions? And how does ABP save so much dev time? Fortunately, I just released a new episode of Code Hour to answer these questions:



If you don't have 35 minutes to invest right now (less at chipmunk speed, even less if you stop after ~6 minutes when I switch to live coding) then let me tl;dr (tl;dw? 😜):

Multi-tenancy is a software architecture in which a single application is shared between multiple customers. Each customer only sees their own data and is completely unaware that there are other customers.

There are several ways to approach the problem, as described in ABP's Multi-Tenancy Documentation.

1. Multiple Deployment - Multiple Database


This is the approach with the least up-front work. There's no need for a framework; you just deploy your app multiple times, once per tenant. This offers the best performance (because tenants can be scaled independently) and the best data isolation (e.g. database backups will never contain other customers' data).

In exchange it comes with the highest maintenance cost and the most challenging deployments. The maintenance challenge is that you'll need to pay for an app and database for each customer, and if you're passing those costs on, it could be detrimental to smaller customers. The deployment risk is that you'll have to be extremely structured in deploying the app and database scripts to all environments, and carefully consolidate error logs.

But there are three other common solutions, and in these scenarios ABP brings huge benefits to the table:

2. Single Deployment - Single Database


This is my favorite approach because it's the simplest and least expensive to maintain. In this solution each database table contains a foreign key to a tenants table. All database queries must filter to retrieve items for the current user's tenant and insert records with the current user's tenant. Done by hand, it would be a pain to apply these filters to every single query. Enter ABP.

When an entity implements the IMustHaveTenant interface, ABP gives it a foreign key to a Tenants table. Then, silently in the background, ABP figures out the tenant of the currently logged in user and, for all queries, only returns the records from that tenant. If a user creates an entity with IMustHaveTenant, then ABP also automatically sets the correct foreign key. No code is required and all database queries pick up this filter (just like the soft delete I described in Be a Hero On Day 1).
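To get a feel for what that saves you, here is roughly what the hand-written equivalent would look like. This is a sketch, not ABP's implementation: the interface below is a local stand-in declared so the example compiles on its own, and Invoice, ForTenant, and StampTenant are invented names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Local stand-in for ABP's interface, declared here only for the sketch.
public interface IMustHaveTenant
{
    int TenantId { get; set; }
}

// Hypothetical entity that belongs to exactly one tenant.
public class Invoice : IMustHaveTenant
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}

public static class TenantFilter
{
    // The filter you would otherwise have to remember on every query.
    public static IQueryable<T> ForTenant<T>(this IQueryable<T> query, int currentTenantId)
        where T : class, IMustHaveTenant
    {
        return query.Where(e => e.TenantId == currentTenantId);
    }

    // The assignment you would otherwise have to remember on every insert.
    public static T StampTenant<T>(T entity, int currentTenantId)
        where T : class, IMustHaveTenant
    {
        entity.TenantId = currentTenantId;
        return entity;
    }
}
```

Forgetting a single ForTenant call in hand-written code leaks one customer's data to another, which is why having the framework apply the filter everywhere is such a big deal.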

The downside to this approach is that one tenant with a lot of data could affect the performance of other tenants, and some users might worry about security since all data lives in the same database. So there's another approach ABP provides:

3. Single Deployment - Multiple Database


When a user from the host (host = a singleton tenant that can create other tenants) creates a tenant, they can specify a connection string specific to that tenant. ABP even offers a cool solution to data migrations that I explain in the video (at ~12:55). The end result is much better data isolation and great performance, but still a potentially high price tag, since you could be paying for one database per customer.


4. Single Deployment - Hybrid Databases


ABP offers the best of the last two solutions by allowing some tenants to live in shared database instances and others to live in their own databases. This offers data isolation and performance to tenants that need it (or will pay for it), and value for tenants that don't (or won't).

What's awesome about ABP is that it works identically from a code perspective for all of the above multi-tenancy approaches. The only difference is whether a tenant's connection string property is provided or not. The filtering, permissions, and migrations are otherwise all identical.
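The "connection string provided or not" rule boils down to a one-line decision. The sketch below shows only that idea. The Tenant class and TenantConnections helper are hypothetical names for illustration, not ABP's types.

```csharp
// Hypothetical tenant record; only the connection string matters here.
public class Tenant
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string ConnectionString { get; set; } // null or empty = shared database
}

public static class TenantConnections
{
    // Tenants with their own connection string get their own database;
    // everyone else falls back to the shared one.
    public static string Resolve(Tenant tenant, string sharedConnectionString)
    {
        return string.IsNullOrWhiteSpace(tenant.ConnectionString)
            ? sharedConnectionString
            : tenant.ConnectionString;
    }
}
```

Because the decision is made per tenant at connection time, moving a tenant from the shared database to a dedicated one is a data change rather than a code change.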

Summary


If you're interested in more details (such as how to disable tenant filtering) please check out the video (and like and subscribe and all that). Also, hit me up on twitter or in the comments if you have any questions, comments, or threats.

Thursday, September 12, 2019

Stored Procedures in ASP.Net Boilerplate

Using stored procedures in ASP.Net Boilerplate is a little trickier than you might imagine.  There's the problem of getting them into the database with EF Code-First migrations.  Then there's the issue of how to call them through the repository pattern while keeping things unit testable.

And did you know there are three different ways to call stored procedures? Which one you use depends on whether they return an existing entity, return nothing, or return something else entirely. That last option is the trickiest.

Fortunately, I just released Episode 23 of Code Hour, which lays it all out:



All the code is available in this tidy little sproc pull request.

See also: the official ASP.Net Boilerplate documentation on stored procedures.

Monday, August 26, 2019

3 Ways To Refactor EF Linq Queries w/o Killing Perf

Extracting a method from an Entity Framework LINQ query can quietly kill performance.  Here are three easy solutions including: Expressions, Extension Methods, and LinqKit.


Enumeration<Problem>

Last week I was shocked to discover that refactoring Entity Framework LINQ queries for readability or reusability by extracting a method can quietly swap a query out of SQL and into in-memory processing and kill performance.

Here's a simplified version of my problem.

private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
   var usersQuery = Users.Where(u =>
      (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
      (filter.EndDate == null || u.CreationTime <= filter.EndDate) &&
      u.Name.StartsWith(prefix));
   return await usersQuery.ToListAsync();
}


I had a site-wide filtering object supplied by the front-end, but then I needed to do something else specific to the task at hand like the .StartsWith().

Then elsewhere I needed something very similar:

private async Task<List<User>> GetUsersWithoutRoles(IMainFilterDto filter)
{
   var usersQuery = Users.Include(i => i.Roles).Where(u =>
      (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
      (filter.EndDate == null || u.CreationTime <= filter.EndDate) &&
      !u.Roles.Any());
   return await usersQuery.ToListAsync();
}

Uch.  The common code between the two isn't DRY and feels awful.  If I ever needed to change it, perhaps by replacing > with >= I'd have to track down all the places with that code.  I was tempted to extract it like this:

private bool ApplyMainFilter(IMainFilterDto filter, User u)
{
       return (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
              (filter.EndDate == null || u.CreationTime <= filter.EndDate);
}

And use it like this:

private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
    var usersQuery = Users.Where(u =>
        ApplyMainFilter(filter, u) &&
        u.Name.StartsWith(prefix));
    return await usersQuery.ToListAsync();
}

That certainly reads better. And when I tested it, it returned the exact same results. Sadly, when I ran it through LINQPad, the original query (where filter has a non-null start date but null end date) turned from this:

SELECT [stuff]
FROM
[Users] AS [u]
WHERE ([u].[CreationTime] > @__filter_StartDate_0) AND (([u].[Name] LIKE @__prefix_1 + N'%' AND (LEFT([u].[Name], LEN(@__prefix_1)) = @__prefix_1)) OR (@__prefix_1 = N''))

into:

SELECT [stuff]
FROM
[Users] AS [u]
WHERE ([u].[Name] LIKE @__prefix_1 + N'%' AND (LEFT([u].[Name], LEN(@__prefix_1)) = @__prefix_1)) OR (@__prefix_1 = N'')

It dropped out all the code in ApplyMainFilter()!  That may not look terrible in this simple example, but imagine more complex scenarios.  It could result in a lot more records returning from the database.  It could create a network bottleneck or put excess strain on the middleware.

Worst of all it could prevent the database from doing what it does best: use indexes to optimize query execution.  This could mean bypassing existing indexes, preventing query optimization with future indexes, or reducing the effectiveness of performance recommendations in e.g. the Azure SQL database by hiding the problem from the database entirely.

Incidentally, if you'd prefer to see a video of the problem and solutions, check out Episode 22 of Code Hour:



return solution[0]

The solution turned out to be fairly easy once I identified the problem.  Understanding how Entity Framework works internally helped.  It's all about expression trees, which I've written about before (ok, I wrote that 11 years ago, but the fundamentals it describes are still solid).

Anticipating all the possible ways someone might pass arbitrary C# language into a where clause and turning it all into SQL is a hard problem.  I needed to give Entity Framework a hand.  One way to do that is to return a fully parseable expression tree like Expression<Func<User, bool>> rather than just a bool or a Func<User, bool>.  It looked like this:

private Expression<Func<User, bool>> GetMainFilterQuery(IMainFilterDto filter)
{
    return u => (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
        (filter.EndDate == null || u.CreationTime <= filter.EndDate);

}

Executed like this:

private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
    var usersQuery = Users
        .Where(GetMainFilterQuery(filter))
        .Where(u => u.Name.StartsWith(prefix));
    return await usersQuery.ToListAsync();
}

Isn't that an aesthetically pleasing solution?  It's reusable, reads well, and converts to SQL.
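If you want to see why the expression-returning version translates and the bool-returning helper doesn't, you can inspect the trees yourself without a database. The sketch below uses only System.Linq.Expressions. The User class is trimmed down, and MethodCallFinder, FilterDemo, and CreatedAfter are names invented for the demo. A call to a custom helper shows up in the tree as an opaque method call that a query provider has no SQL for, while the inlined comparison shows up as plain property access and comparison nodes. What a provider does with the opaque call depends on the version: older EF Core releases fall back to evaluating it in memory, which matches what I saw above, while newer ones throw instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Trimmed-down entity, just enough for the demo.
public class User
{
    public string Name { get; set; } = "";
    public DateTime CreationTime { get; set; }
}

// Collects the name of every method call node found in an expression tree.
public class MethodCallFinder : ExpressionVisitor
{
    public List<string> Calls { get; } = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Calls.Add(node.Method.Name);
        return base.VisitMethodCall(node);
    }
}

public static class FilterDemo
{
    // A plain C# helper: fine for in-memory use, meaningless to a SQL translator.
    public static bool CreatedAfter(User u, DateTime start) => u.CreationTime > start;

    // The tree contains a call to CreatedAfter and nothing about its body.
    public static Expression<Func<User, bool>> Opaque(DateTime start)
    {
        return u => CreatedAfter(u, start);
    }

    // The tree contains the comparison itself.
    public static Expression<Func<User, bool>> Transparent(DateTime start)
    {
        return u => u.CreationTime > start;
    }

    public static List<string> CallsIn(Expression expression)
    {
        var finder = new MethodCallFinder();
        finder.Visit(expression);
        return finder.Calls;
    }
}
```

CallsIn(FilterDemo.Opaque(...)) reports the CreatedAfter call, while CallsIn(FilterDemo.Transparent(...)) reports none, because the comparison is represented as a binary node rather than a method call.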

But Wait, There's More

But if you're up for reading further, I thought I'd present one more interesting option. If you're into fluent-style APIs, then an extension method approach may be perfect:

public static class QueryUtils
{
    public static IQueryable<User> AppendMainFilterQuery(
        this IQueryable<User> existingQuery, IMainFilterDto filter)
    {
        return existingQuery.Where(u =>
            (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
            (filter.EndDate == null || u.CreationTime <= filter.EndDate));
    }
}

Which is a little harder to read, but allows this:

private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
    var usersQuery = Users
        .Where(u => u.Name.StartsWith(prefix))
        .AppendMainFilterQuery(filter);
    return await usersQuery.ToListAsync();
}

That reads nicely, is reusable, and like the 1st solution keeps the SQL exactly how it was initially.

OR LinqKit?

I ran all this by a smart co-worker who recommended I check out LinqKit in case I ever needed to do anything more complicated.  Among other things LinqKit allows you to build up expressions across multiple methods.  For instance if I needed an OR clause instead of an AND clause it might look like this:

private ExpressionStarter<User> GetMainFilterPredicate(IMainFilterDto filter)
{
    var predicate = PredicateBuilder.New<User>().Start(u =>
        (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
        (filter.EndDate == null || u.CreationTime <= filter.EndDate));
    return predicate;
}

private Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
    var predicate = GetMainFilterPredicate(filter);
    predicate = predicate.Or(u => u.Name.StartsWith(prefix));
    return Users.Where(predicate).ToListAsync();
}

Pretty nifty.
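If you're curious what OR-ing two expressions involves under the hood, here is a hand-rolled sketch using only System.Linq.Expressions. It is not LinqKit's implementation, and ParameterSwapper and PredicateSketch are names I made up. The one subtlety is that each lambda has its own parameter object, so the right-hand body has to be rewritten to use the left-hand parameter before the two bodies can be joined.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Rewrites every use of one lambda parameter to another.
public class ParameterSwapper : ExpressionVisitor
{
    private readonly ParameterExpression _from;
    private readonly ParameterExpression _to;

    public ParameterSwapper(ParameterExpression from, ParameterExpression to)
    {
        _from = from;
        _to = to;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node == _from ? _to : base.VisitParameter(node);
    }
}

public static class PredicateSketch
{
    // Builds "x => left(x) || right(x)" as a single, fully inspectable tree.
    public static Expression<Func<T, bool>> OrElse<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        var parameter = left.Parameters[0];
        var rightBody = new ParameterSwapper(right.Parameters[0], parameter)
            .Visit(right.Body);

        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, rightBody),
            parameter);
    }
}
```

The result is an ordinary Expression<Func<T, bool>>, so it can be handed to Where on any IQueryable<T> just like the hand-written filters earlier in the post.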

Summary

I like the 1st approach if I don't need anything more complex, but regardless, identifying how not to refactor LINQ queries is the important part.  If you have any other creative solutions please share in the comments or hit me up on twitter.