Running Duplicate Batch Jobs in HashiCorp Nomad

By default, HashiCorp Nomad prevents duplicate batch jobs from executing. This is by design because duplicate job submissions could result in unnecessary work. From Nomad’s perspective, an unchanged job qualifies as a duplicate batch job.

However, there are times when a duplicate batch job or an unchanged job is the correct approach. One example is a batch job that executes a calculation and outputs the results. In this scenario, there is likely no need to change the Nomad job specification. Running nomad run for that job would be the desired behavior, but due to Nomad's default behavior, the batch job placement would fail.

To get around this default behavior, you can use a couple of techniques to inject variation in ways that don't require you to alter the job’s content. This blog presents two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future.

»Use a UUID as an Ever-Changing Value

The meta block of a Nomad job specification allows for arbitrary user-defined key-value pairs. By combining HCL2 functions with the meta block, you can inject variation into a batch job without altering the job specification template. Specifically, you can use the uuidv4() function to inject variation and thus ensure the job is unique every time you run the command nomad run.

To see how it works, create a file called uuid.nomad and copy the content below into it. This batch job runs the Hello World Docker example. Note how the meta block is setting a key-value pair and using the uuidv4() function:

job "uuid.nomad" {
  datacenters = ["dc1"]
  type        = "batch"

  meta {
    run_uuid = "${uuidv4()}"
  }

  group "uuid" {
    task "hello-world" {
      driver = "docker"

      config {
        image = "hello-world:latest"
      }
    }
  }
}
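As a side note, Nomad exposes meta values to the task as environment variables with a NOMAD_META_ prefix, so the task itself can see which run it belongs to. The sketch below is an illustrative variation, not part of the example above; the busybox image and shell echo are assumptions made for demonstration:

```hcl
# Illustrative variation: meta values are available inside the task
# as NOMAD_META_<key> environment variables.
task "hello-world" {
  driver = "docker"

  config {
    image   = "busybox:latest"
    command = "sh"
    # $NOMAD_META_run_uuid is expanded by the shell at runtime,
    # not by Nomad's HCL parser.
    args = ["-c", "echo run $NOMAD_META_run_uuid"]
  }
}
```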

Start a local Nomad server by issuing the command nomad agent -dev:

$ nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:
...
client: node registration complete

Ensure the Nomad server is up and running. Then navigate to the directory where you created the file uuid.nomad and issue the command nomad run uuid.nomad. This will submit the batch job to Nomad:

$ nomad run uuid.nomad
==> Monitoring evaluation "44c8a150"
  Evaluation triggered by job "uuid.nomad"
==> Monitoring evaluation "44c8a150"
  Allocation "4fb444d4" created: node "fd14a894", group "uuid"
  Evaluation status changed: "pending" -> "complete"
==> Evaluation "44c8a150" finished with status "complete"

Check the status of the job allocation by using the nomad alloc status command:

$ nomad alloc status 4fb444d4
ID                  = 4fb444d4-3c5c-51d7-3820-c3752796aad7
Eval ID             = 44c8a150
Name                = uuid.nomad.uuid[0]
Node ID             = fd14a894
Node Name           = myDeskTop
Job ID              = uuid.nomad
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description =
Created             = 2m30s ago
Modified            = 2m27s ago

Task "hello-world" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB

Task Events:
Started At     = 2021-05-28T19:47:41Z
Finished At    = 2021-05-28T19:47:41Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-05-28T12:47:41-07:00  Terminated  Exit Code: 0
2021-05-28T12:47:41-07:00  Started     Task started by client
2021-05-28T12:47:38-07:00  Driver      Downloading image
2021-05-28T12:47:38-07:00  Task Setup  Building Task Directory
2021-05-28T12:47:38-07:00  Received    Task received by client

The output indicates a successful job with an exit code of 0. Submit the job again through the command nomad run uuid.nomad:

$ nomad run uuid.nomad
==> Monitoring evaluation "fd2e5e6d"
  Evaluation triggered by job "uuid.nomad"
  Allocation "9528b83d" created: node "fd14a894", group "uuid"
  Evaluation status changed: "pending" -> "complete"
==> Evaluation "fd2e5e6d" finished with status "complete"

The job ran again, bypassing the default behavior because the generated UUID value differed from the previous run. You can verify that the job ran twice in the Nomad UI by looking at the jobs overview, as shown here:

[Screenshot: Nomad UI jobs overview]

You can see in the Recent Allocations view that the two jobs ran successfully.

»Use an HCL2 Variable

You can achieve the same behavior of injecting variability by utilizing the meta block in a job specification and a variable.

Start by creating a file named variable.nomad and copy the content below into the file. This batch job does exactly the same thing as uuid.nomad, except that this version uses an HCL2 variable:

job "variable.nomad" {
  datacenters = ["dc1"]
  type        = "batch"

  meta {
    run_index = "${floor(var.run_index)}"
  }

  group "variable" {
    task "hello-world" {
      driver = "docker"

      config {
        image = "hello-world:latest"
      }
    }
  }
}

variable "run_index" {
  type        = number
  description = "An integer that, when changed from the current value, causes the job to restart."

  validation {
    condition     = var.run_index == floor(var.run_index)
    error_message = "The run_index must be an integer."
  }
}

Go ahead and submit the batch job by running the command nomad run -var run_index=1 variable.nomad:

$ nomad run -var run_index=1 variable.nomad
==> Monitoring evaluation "387bfe35"
  Evaluation triggered by job "variable.nomad"
  Allocation "de54c080" created: node "185068cf", group "variable"
==> Monitoring evaluation "387bfe35"
  Evaluation status changed: "pending" -> "complete"
==> Evaluation "387bfe35" finished with status "complete"

Check the status of the job with the nomad alloc status command:

$ nomad alloc status de54
ID                  = de54c080-e3f3-cef3-1d9e-b1a4d956106c
Eval ID             = 387bfe35
Name                = variable.nomad.variable[0]
Node ID             = 185068cf
Node Name           = myDeskTop
Job ID              = variable.nomad
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description =
Created             = 26s ago
Modified            = 24s ago

Task "hello-world" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB

Task Events:
Started At     = 2021-05-28T20:50:48Z
Finished At    = 2021-05-28T20:50:48Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-05-28T13:50:48-07:00  Terminated  Exit Code: 0
2021-05-28T13:50:48-07:00  Started     Task started by client
2021-05-28T13:50:46-07:00  Driver      Downloading image
2021-05-28T13:50:46-07:00  Task Setup  Building Task Directory
2021-05-28T13:50:46-07:00  Received    Task received by client

The output reveals that the job completed successfully. If you were to submit the job again with nomad run -var run_index=1 variable.nomad, the job allocation would fail because the index value is the same as the previously submitted batch job. The screenshot below was taken after the same batch job was submitted three times:

[Screenshot: Nomad UI showing three evaluations for the job]

Three evaluations were created, but only the first submission resulted in an allocation:

[Screenshot: Nomad UI showing a single allocation]

For Nomad to accept the job, you need to provide a unique value. Go ahead and change the index to 2 and issue the command nomad run -var run_index=2 variable.nomad:

$ nomad run -var run_index=2 variable.nomad
==> Monitoring evaluation "522bce96"
  Evaluation triggered by job "variable.nomad"
  Allocation "298d7cf7" created: node "185068cf", group "variable"
  Evaluation status changed: "pending" -> "complete"
==> Evaluation "522bce96" finished with status "complete"

This submission is accepted because it contains a unique value, an index value of 2. You can confirm the allocation was successful by visiting the Nomad UI or by running the command nomad alloc status:

[Screenshot: Nomad UI showing the second allocation]

»Next Steps

This post shared two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future. There are many more Nomad tutorials available on the HashiCorp Learn platform, where you can expand your Nomad knowledge and skills.



Read more here: https://www.hashicorp.com/blog/running-duplicate-batch-jobs-in-hashicorp-nomad

Content Attribution

This content was originally published by Kerim Satirli at HashiCorp Blog, and is syndicated here via their RSS feed. You can read the original post over there.
