Introduction to the cue tooling layer
So you have used cue to create your configuration, but how do you actually do something with it? Write a bash script to wrap calls to cue?
One special feature of cue compared to other configuration languages is the ability to execute things from your configuration. After all, cue means Configure, Unify, Execute, and in this article we'll explore the execute part of cue.
We call this execution part of cue the tooling layer or scripting layer.
This tooling layer gives the user an easy way to actually do something with the configuration directly in cue without relying on shell scripting. For example you might want to write your configuration to a file and run a CLI tool with it, or even apply the kubernetes resource manifests you have defined to a cluster.
Instead of wrapping the configuration in code or scripts, we wrap the code that exploits the configuration inside the configuration itself (oh my, that's meta!) and use cue to run the code.
While a normal cue configuration evaluation is hermetic, the tooling layer allows interaction with the outside world by permitting side effects such as writing files or making HTTP calls. These kinds of side effects can only be declared in <something>_tool.cue files and can only be run with the cue cmd subcommand. When using cue eval or cue export the tool files won't be evaluated at all.
Let's dive in by defining a very simple cue command.
Defining commands and tasks
cue will look for a command field at the root of _tool.cue files to determine the list of available commands.
Let's create our first command in a file that ends with _tool.cue:
package foo

import "tool/cli"

command: "hello-world": cli.Print & {
	text: "Hello!"
}
We have defined a hello-world command which uses the cli.Print task. This task simply outputs text on stdout.
Let's run it!
cue cmd hello-world
Hello!
# The command is also exposed directly as a subcommand of `cue`
cue hello-world
Hello!
A command can contain multiple tasks and you can organize them as you want. There is no particular structure to follow when defining tasks inside commands. For example, this is fine:
package foo

import "tool/cli"

command: "hello-world": {
	print: something: on: screen: cli.Print & {
		text: "Hello!"
	}
	another: {
		task: cli.Print & {
			text: "Woot!"
		}
	}
}
A task consists of a cue schema (cli.Print in this example) and some Go code built into the cue CLI that is executed when you run the command.
The definition of a task is just data, like the rest of your configuration. It is not a function call or anything of the sort. Because it is just data, you can use any cue construct to generate or define tasks.
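For example, here is a minimal sketch of a task whose content comes from an ordinary field elsewhere in the configuration (the greeting field is made up for illustration):
package foo

import "tool/cli"

// An ordinary configuration field...
greeting: "Hello from plain data!"

// ...referenced by a task: the task is just more data.
command: "from-data": cli.Print & {
	text: greeting
}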
Task types
There aren't many builtin tasks provided by cue, but the ones that are available are pretty generic and allow you to derive the specific tasks you might need in your project.
The different tasks you can use in the tooling layer can be found here. Currently you can find tasks to manipulate files, execute commands and make HTTP calls.
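As a sketch of what such a derived task could look like, here is a hypothetical #Greet task built on top of cli.Print (the name and the extra field are made up for illustration):
package foo

import "tool/cli"

// A project-specific task derived from the generic cli.Print task.
#Greet: cli.Print & {
	// name parameterizes the greeting; only text is read by the runner.
	name: string
	text: "Hello \(name)!"
}

command: greet: #Greet & {name: "reader"}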
Discovering commands
cue help cmd can list the different commands you have defined:
[...]
Available Commands:
hello-world
oops
[...]
It is possible to provide some "special" fields or a comment on the command to improve this output:
command: {
	// Greeting command
	//
	// Greets you on the command line
	"hello-world": cli.Print & {
		text: "Hello!"
	}

	"hello-world-2": cli.Print & {
		$short: "Greeting command"
		$long:  "Greets you on the command line"
		text:   "Hello!"
	}
}
cue help cmd
[...]
Available Commands:
hello-world Greeting command
hello-world-2 Greeting command
[...]
cue help cmd hello-world
Greets you on the command line
Usage:
cue cmd hello-world [flags]
[...]
Command options
In some cases a command might need some dynamic input provided by the user. This could be an HTTP API endpoint URL, an optional flag to trigger some specific tasks, etc.
The cue tooling layer doesn't provide any specific feature for this, but we can use cue's generic injection mechanism to handle it (see cue help injection).
For example, let's add a field to our greeting command that customizes the greeting output:
package foo

import "tool/cli"

who: *"world" | string @tag(who)

command: "hello-world": cli.Print & {
	text: "Hello \(who)!"
}
The field who has a default value of world, but it can be changed using the who tag on the CLI:
cue cmd hello-world
Hello world!
cue cmd -t who=cue hello-world
Hello cue!
With injection, all cue unification constraint rules still apply.
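For example, here is a small sketch that restricts who to a fixed set of names; injecting any other value with -t would then fail unification and abort the command:
package foo

import "tool/cli"

// Only these values are accepted; `cue cmd -t who=bob hello-world`
// would fail unification.
who: *"world" | "cue" | "gopher" @tag(who)

command: "hello-world": cli.Print & {
	text: "Hello \(who)!"
}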
Note: because of some parsing limitation the -t k=v option must be written before the name of the command.
Handling dependencies between tasks
It is common to have multiple tasks that must be run in a specific order.
To handle this situation cue resolves dependencies between tasks much like terraform does between resources.
If a task references a field from another task, cue will automatically treat this as a dependency:
package foo

import (
	"tool/file"
	"tool/cli"
)

command: foo: {
	cmd1: file.Read & {
		filename: "file.txt"
		contents: string
	}
	cmd2: cli.Print & {
		text: cmd1.contents
	}
}
In this example we can see that the cmd2.text field references the cmd1.contents field. Because of this, cmd2 will only run after cmd1 has completed successfully.
We can also see here a very powerful feature of cue tasks: cmd1.contents isn't concrete and will be resolved at runtime when the file is actually read. The value will be filled in by cue in the document and must respect unification constraint rules. In this case the value must be a valid string.
Once cmd1.contents is filled with a concrete value, cmd2 can be run.
Much like terraform's depends_on, if a task needs to run after another but doesn't reference any of its fields, you can simply add a field that references the other task:
package foo

import (
	"tool/file"
	"tool/exec"
)

command: foo: {
	cmd1: exec.Run & {
		cmd: "mkdir -p generated"
	}
	cmd2: file.Create & {
		$after:   cmd1
		filename: "./generated/file.txt"
		contents: "foo"
	}
}
In this example we introduce an $after field in cmd2 that references cmd1 to materialize the dependency between the two tasks.
Since file.Create is not a definition you can use any field name you'd like; it just needs not to clash with the fields of file.Create in this example. $after has no particular meaning for the cue tooling layer.
Dynamic tasks
Until now we defined tasks that didn't use any real configuration. Let's see how tasks can refer to and use some configuration.
Imagine we manage a list of users in a cue configuration and we want to provision them through some API.
First, we want a clear schema of what a user looks like, and we'll define a few users:
package users

// This is what a user looks like
#User: {
	username:   string
	first_name: string
	last_name:  string
	email:      =~"@example.com$"
	role:       *"developer" | "admin"
}

users: [Username=string]: #User & {username: Username}

// Our list of users
users: {
	jdoe: {
		first_name: "John"
		last_name:  "Doe"
		email:      "jdoe@example.com"
		role:       "admin"
	}
	fday: {
		first_name: "Francis"
		last_name:  "Day"
		email:      "francis@example.com"
	}
}
This is a pretty simple cue configuration. We have a #User definition which constrains the values of the users struct, and two users are defined.
Next we want to create these users in some API. We can create a cue command for this using the tool/http package.
I'm using https://requestbin.com to post the data.
package users

import (
	"tool/http"
	"encoding/json"
)

command: create: {
	for u in users {
		// Generate a task for every user
		"\(u.username)": http.Post & {
			// Our dummy API
			url: "https://enelnux7735ki.x.pipedream.net/users"
			request: {
				header: "Content-Type": "application/json"
				// Marshal the user info to JSON
				body: json.Marshal(u)
			}
			// We expect a 200 HTTP code from the API.
			// Other codes will make the task fail.
			response: statusCode: 200
		}
	}
}
What's interesting here is that we use normal cue constructs to generate the tasks (one per user). With a simple comprehension we can generate a task for every user we need to provision.
We can also transform the data that is sent to the API. Here we marshal it to JSON, but we could also imagine filtering out some fields or adding others that the API requires.
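For example, here is a sketch of shaping the payload before sending it, assuming a hypothetical API that expects a full_name field and no role:
package users

import (
	"tool/http"
	"encoding/json"
)

command: create: {
	for u in users {
		"\(u.username)": http.Post & {
			url: "https://enelnux7735ki.x.pipedream.net/users"
			request: {
				header: "Content-Type": "application/json"
				// Shape the payload: drop role, add a derived full_name
				body: json.Marshal({
					username:  u.username
					full_name: "\(u.first_name) \(u.last_name)"
					email:     u.email
				})
			}
			response: statusCode: 200
		}
	}
}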
And finally, the success of the tasks is determined by unification. I've set response.statusCode to 200 so that if the API responds with some other code, unification will fail because cue will try to unify 200 with a different value, and thus the task will fail.
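Because success is just unification, the constraint doesn't have to be a single value. For instance, the response constraint in the task above could be relaxed to accept any 2xx code:
// Accept any 2xx status code; anything outside the range
// fails unification and therefore fails the task.
response: statusCode: >=200 & <300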
Going further
With a classic REST API this probably won't work: once a user exists in the system, a POST request for that user will most likely trigger an HTTP 400 response.
Can cue handle this? Well yes: since cue injects the HTTP response data and status into the document, it should be possible to specialize tasks based on the result of some field, or even inject new tasks.
In our imaginary API, to determine if we need to create or update a user we first need to get the user and then based on the status code proceed with the appropriate HTTP call (POST or PUT).
package users

import (
	"tool/http"
	"encoding/json"
)

users_api_base_url: "https://my-user-api.example.com/users"

command: create: {
	for u in users {
		// Get the user from the API
		"\(u.username)": http.Get & {
			url: "\(users_api_base_url)/\(u.username)"
			// We handle only these status codes
			response: statusCode: 200 | 404
		}

		// Common data for creating / updating a user
		"create_or_update_\(u.username)": http.Do & {
			request: {
				header: "Content-Type": "application/json"
				body: json.Marshal(u)
			}
		}

		// User doesn't exist, do a POST on the users/ URL
		if create[u.username].response.statusCode == 404 {
			"create_or_update_\(u.username)": http.Post & {
				url: users_api_base_url
			}
		}

		// User exists, do a PUT on the users/username URL
		if create[u.username].response.statusCode == 200 {
			"create_or_update_\(u.username)": http.Put & {
				url: "\(users_api_base_url)/\(u.username)"
			}
		}
	}
}
It looks like we are writing imperative code, but we're not. We're still defining data and specializing the create_or_update task based on the result of the get task.
As you can see, a lot can be done using the cue tooling layer, but don't get too crazy!
Under the hood
So we've seen that tasks are just data, like any cue configuration; the difference is that cue runs some code associated with each particular task. So how does it know which code has to be run?
If we look at the cli.Print documentation we see:
// Print sends text to the stdout of the current process.
Print: {
	$id: *"tool/cli.Print" | "print" // for backwards compatibility

	// text is the text to be printed.
	text: string
}
Every task has an $id field that identifies its kind. When cue evaluates the command, it walks all the values and finds all the tasks, which are denoted by this $id field. This is an implementation detail that may change in the future.
The value of the $id field is used to know which Go code has to be run for a particular task.
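To illustrate, here is a sketch of the hello-world task written by hand with an explicit $id instead of importing tool/cli:
package foo

// The $id selects the built-in Print runner; everything else is plain data.
command: "hello-raw": {
	$id:  "tool/cli.Print"
	text: "Hello!"
}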
Note also that tasks are not proper cue definitions as they arguably should be. This is for historical reasons: they were introduced in cue before definitions existed.
Let's give cue an unknown task to run and see what happens:
package foo

command: oops: {
	$id:  "tool/cli.Oops"
	text: "Hello!"
}
cue cmd oops
runner of kind "tool/cli.Oops" not found:
./oops_tool.cue:3:10
Right: cue has found that this is a potential task, but it has no Go code registered to run it, so it exits with an error.
Currently there is no way to provide additional tasks to the cue CLI, but that may become possible in the future.
Tooling layer caveats
I noticed that if something is wrong in a _tool.cue file, the error reporting is not very good when using cue cmd my-command. In such cases I try to put as much code as possible outside of the _tool.cue files (but no tasks, obviously) and debug the issue with cue eval.
In some cases task dependencies are not properly discovered when using comprehensions or guards; related issue: https://github.com/cue-lang/cue/issues/1088
You can't control error handling. If a task fails you cannot control what happens next: cue will stop the command right there and exit.
You can't run commands that are declared in imported modules/packages. This would be a neat feature to allow package authors to distribute associated commands easily.
Conclusion
In conclusion, the cue tooling layer is just a way to describe side effects as data in order to exploit your configuration.
Because tasks are just structured data, you benefit from all cue constructs and unification guarantees.
No need to export the data and import it back into some script or other tool to run actions; everything is contained and driven by the configuration itself!
In a future article we will explore the tool/flow API from the cue Go library that is used by cue cmd.