Handle errors and debug

Errors come in two categories: expected and unexpected. The Rhize BPMN engine has ways to handle both.

A robust workflow should have built-in logic to anticipate errors. For unexpected issues, Rhize also creates a trace for each workflow, which you can use to observe the behavior and performance of the workflow at each element as it executes. You can also use debug flags and variables to trace the variable context as it changes across the workflow.

Strategies to handle errors

All error handling likely uses some conditional logic. The workflow author anticipates the error and then writes some logic to conditionally handle it. However, you have many ways to handle conditions. When deciding how to direct flows, consider both the context of the error and overall readability of your diagram. This section describes some key strategies.

Gateways

Use exclusive gateways for any type of error handling. For example, you might define a normal range for a value, then send alerts when the value falls outside of this range. If it makes sense, these error branches also might flow into an early end event.

A BPMN workflow with customResponse in the output of the end node. Download this workflow from BPMN templates.
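An exclusive gateway evaluates the conditions on its outgoing flows and follows exactly one branch, falling back to a default flow when no condition holds. The following Python sketch mimics that routing for the range check described above. The branch names, the 0 to 100 range, and the helper function are hypothetical; in a real workflow, these conditions live on the gateway's sequence flows.

```python
from typing import Callable

# Hypothetical condition list: each entry pairs a branch name with a predicate.
# As with an exclusive gateway, the first condition that holds wins,
# and the default branch is taken when none match.
Condition = tuple[str, Callable[[float], bool]]


def exclusive_gateway(value: float, conditions: list[Condition], default: str) -> str:
    """Return the single branch that an exclusive gateway would follow."""
    for branch, predicate in conditions:
        if predicate(value):
            return branch
    return default


# Assumed normal range, for illustration only.
LOW, HIGH = 0.0, 100.0

conditions: list[Condition] = [
    ("send-alert-low", lambda v: v < LOW),
    ("send-alert-high", lambda v: v > HIGH),
]

if __name__ == "__main__":
    for reading in (42.0, -5.0, 150.0):
        print(reading, "->", exclusive_gateway(reading, conditions, default="continue"))
```

Because only one branch fires, overlapping conditions resolve in order, so put the most specific checks first.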

JSON schema validation

Validation can greatly limit the scope of possible errors. To validate your JSON payloads, use the JSON schema task.

The JSON schema task outputs a boolean value, valid, that indicates whether the input conforms to the schema that you set. You can then set a condition based on whether valid is true, and create logic to handle errors accordingly. For example, this schema requires that the input variables include a property arr whose value is an array of numbers.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Root",
  "type": "object",
  "properties": { "arr": { "type": "array", "items": { "type": "number" } } },
  "required": ["arr"]
}

In a production workflow, you might use this exact schema to validate the input for a function that calculates statistics (perhaps choosing a different variable name).

A conditional that branches when the JSON schema task receives invalid input. Download the template.
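To make the schema's rules concrete, here is a hand-written Python sketch of the same check. It is not how the JSON schema task is implemented; it only mirrors the rules of this particular schema (an object, a required arr property, numeric items) and returns the kind of boolean that the valid variable carries.

```python
from typing import Any


def is_number(item: Any) -> bool:
    # JSON Schema "number" accepts integers and floats but not booleans.
    # Python treats bool as a subclass of int, so exclude it explicitly.
    return isinstance(item, (int, float)) and not isinstance(item, bool)


def validate_payload(payload: Any) -> bool:
    """Mirror the example schema: an object with a required "arr" array of numbers."""
    if not isinstance(payload, dict):  # "type": "object"
        return False
    if "arr" not in payload:  # "required": ["arr"]
        return False
    arr = payload["arr"]
    if not isinstance(arr, list):  # "type": "array"
        return False
    return all(is_number(item) for item in arr)  # "items": {"type": "number"}


if __name__ == "__main__":
    print({"valid": validate_payload({"arr": [1, 2.5, 3]})})
    print({"valid": validate_payload({"arr": [1, "2"]})})
```

Note that an empty array passes, because the schema constrains only the type of each item, not the array's length.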

JSONata conditions

Besides exclusive gateways, it may make sense to use JSONata ternary expressions in any of the many parameters that accept JSONata expressions. For example, this expression creates one message body if valid is true and another if not:

=
{
  "message": $.valid ? "payload is valid" : "Invalid payload" 
}
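If you think more readily in a general-purpose language, the expression behaves like the Python conditional below. This is only an analogy, not how Rhize evaluates JSONata; the ctx dictionary is a stand-in for the workflow's variable context.

```python
from typing import Any


def build_message(ctx: dict[str, Any]) -> dict[str, str]:
    # Analogue of: { "message": $.valid ? "payload is valid" : "Invalid payload" }
    # In this sketch, a missing or false "valid" falls through to the else branch.
    return {"message": "payload is valid" if ctx.get("valid") else "Invalid payload"}


if __name__ == "__main__":
    print(build_message({"valid": True}))
    print(build_message({"valid": False}))
```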

Check JSONata output

If a field has no value, JSONata omits it from the output. For example, the following expression outputs only {"name": "Rhize"}, because the $err variable is never defined.

=(
  $name := "Rhize";
  {
    "name": $name,
    "error": $err
  }
)

{
  "name": "Rhize"
}

You can use this behavior to direct flows. For example, an exclusive gateway might have a condition such as $exists(err) that flows into an error-handling branch.
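A rough Python model of this behavior can help when you reason about gateway conditions. In the sketch below, a key whose variable is unbound is skipped when the object is built, mirroring how JSONata drops a key whose value is undefined. The route function plays the role of a gateway whose condition is $exists(err). The template format and branch names are hypothetical. A value of JSON null is not the same as undefined: JSONata keeps a null-valued key, and so does this sketch.

```python
from typing import Any

# Maps each output key to the variable it reads from.
# This stands in for the JSONata object constructor { "name": $name, "error": $err }.
TEMPLATE = {"name": "name", "error": "err"}


def build_object(template: dict[str, str], bindings: dict[str, Any]) -> dict[str, Any]:
    # Keys whose variable is unbound (undefined) are omitted entirely.
    # A variable bound to None (JSON null) is kept, because null is a value.
    return {key: bindings[var] for key, var in template.items() if var in bindings}


def route(ctx: dict[str, Any]) -> str:
    # Analogue of a gateway condition $exists(err): true whenever the key is present.
    return "handle-error" if "err" in ctx else "continue"


if __name__ == "__main__":
    print(build_object(TEMPLATE, {"name": "Rhize"}))
    print(route({"name": "Rhize"}))
    print(route({"name": "Rhize", "err": "timeout"}))
```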

Create event logging

To persist a record of errors, you can route gateway branches to mutation tasks that use the addEvent operation. The added event might record a successful operation, an error, or both, so that the events emitted in the workflow are stored in your manufacturing knowledge graph. This strategy increases the observability of errors and facilitates future analysis. It can also be useful when combined with the debugging strategies described in the next section.

Strategies to debug

For detailed debugging, you can use an embedded instance of Grafana Tempo to inspect each step of the workflow, node by node. To debug on the fly, you may also find it useful to use customResponse and intermediate message throws to print variables and output at different checkpoints.

Debug from the API calls

When you first test or run a workflow, consider starting the testing and debugging process from an API trigger. All API triggers return information about the workflow state (for example, COMPLETED or ABORTED). With the createAndRunBpmnSync operation, you can also use customResponse to return information from the workflow’s variable context. For details of how this works, read the guide to triggering workflows.

For example, consider a workflow that has two nodes: a Message throw event and a REST task.

When the message completes, the user writes message sent into customResponse as an output variable.
When the REST task completes, the response is saved into customResponse.

So the jobState property reports on the overall workflow status, and customResponse serves as a checkpoint to report the state of each node execution. You can also request the dataJSON field, which reports the entire variable context at the last node. Now imagine that the user has started the workflow from the API and receives this response:

{
  "data": {
    "createAndRunBpmnSync": {
      "jobState": "ABORTED",
      "customResponse": "message sent",
      "traceID": "993ee32af9522f5b35b4ec80f4ff58a8"
    }
  }
}

The ABORTED state indicates that the workflow failed somewhere. However, customResponse still holds message sent, the value written after the message event executed, rather than a REST response. So the message event completed, and the problem is likely with the REST node.
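If you test repeatedly, you can script the same reasoning. This Python sketch parses a createAndRunBpmnSync response shaped like the one above and names the node that most likely failed. The ordered checkpoint list, which pairs each node with the customResponse value it writes on success, is a hypothetical convention for this two-node example. The REST task has no fixed marker because it writes whatever the endpoint returns.

```python
import json
from typing import Optional

# Hypothetical checkpoint map for the two-node example: each node in execution
# order, paired with the customResponse value it writes when it completes.
# The REST task writes its own (unpredictable) response, so it has no fixed marker.
NODES: list[tuple[str, Optional[str]]] = [
    ("Message throw event", "message sent"),
    ("REST task", None),
]

SAMPLE = """
{
  "data": {
    "createAndRunBpmnSync": {
      "jobState": "ABORTED",
      "customResponse": "message sent",
      "traceID": "993ee32af9522f5b35b4ec80f4ff58a8"
    }
  }
}
"""


def suspect_node(response_text: str) -> str:
    """Name the node that most likely failed, based on the last checkpoint reached."""
    result = json.loads(response_text)["data"]["createAndRunBpmnSync"]
    if result["jobState"] != "ABORTED":
        return "none: job state is " + result["jobState"]
    checkpoint = result.get("customResponse")
    # If customResponse still holds a node's marker, that node finished and
    # the node after it is where execution most likely stopped.
    for index, (_, marker) in enumerate(NODES[:-1]):
        if marker is not None and marker == checkpoint:
            return NODES[index + 1][0]
    return "undetermined: inspect the trace in Tempo"


if __name__ == "__main__":
    print(suspect_node(SAMPLE))
```

This only narrows down the search; the trace remains the authoritative record of what failed.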

You could also use a similar strategy with intermediate message events. However, while customResponse and messages are useful debugging methods, they are also limited: they are the BPMN equivalents of printf() debugging. For full-featured debugging, use the traceID to explore the workflow through Tempo.

Debug in Tempo

These instructions cover only the minimum you need to use Tempo. To discover the many ways you can filter your BPMN traces for debugging and analysis, refer to the official Grafana Tempo documentation.

Rhize creates a unique ID and trace for each workflow that runs. This ID is reported as the traceID in the response to the createAndRunBpmn mutation operations. Each node in the trace is instrumented, emitting spans at every input and output as the node executes. With the trace ID, you can find the workflow run in Tempo and follow its behavior.

To inspect a workflow in Tempo:

Go to your Grafana instance.
Select Explore and then Tempo.
From the TraceQL tab, enter the traceID and run the query. Alternatively, use the Search tab and filter by the bpmn-engine service to find traces for all workflows.
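If you prefer to fetch a trace outside the Grafana UI, Tempo also exposes an HTTP API that includes a GET /api/traces/<traceID> endpoint. The Python sketch below only builds and validates that URL. The base URL http://tempo:3200 is an assumption about your deployment, and whether Tempo is reachable outside Grafana depends on how Rhize is installed. The sketch expects the 32-hexadecimal-character form of the trace ID shown in the earlier response and rejects anything else.

```python
import re

# Assumed address of the Tempo HTTP API; adjust for your deployment.
TEMPO_BASE_URL = "http://tempo:3200"

# The traceID in the example response is 32 hexadecimal characters (128 bits).
TRACE_ID_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def trace_url(trace_id: str, base_url: str = TEMPO_BASE_URL) -> str:
    """Build a Tempo trace-by-ID URL after checking the ID's format."""
    if not TRACE_ID_PATTERN.fullmatch(trace_id):
        raise ValueError(f"expected a 32-character hex trace ID, got {trace_id!r}")
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


if __name__ == "__main__":
    print(trace_url("993ee32af9522f5b35b4ec80f4ff58a8"))
```

You could fetch the resulting URL with any HTTP client, for example urllib.request.urlopen from the standard library.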

Screenshot of a compact view of spans for a BPMN process in Tempo

Each workflow instance displays spans that trace the state of each node at its start, during execution, and at its end. When debugging, you are likely most interested in the spans that result in ABORTED. To inspect the errors:

Select the nodes with errors.
Use the events property to inspect each one for exceptions.

For example, this REST task failed because the URL was invalid.

A detailed view of an error for a BPMN process in Tempo.
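If you export a trace instead of browsing it, you can script the same two inspection steps. The Python sketch below walks a simplified list of spans and collects the exception messages from spans whose status is an error. The dictionary shape is invented for this example and is not Tempo's export format. The span names and error text are also made up. Only the exception event name and the exception.message attribute follow the OpenTelemetry semantic conventions for recorded exceptions.

```python
from typing import Any

# Simplified, hypothetical span records. Real Tempo/OTLP payloads nest these
# fields differently; only the "exception" event name and the
# "exception.message" attribute follow OpenTelemetry conventions.
SPANS: list[dict[str, Any]] = [
    {"name": "Message throw event", "status": "OK", "events": []},
    {
        "name": "REST task",
        "status": "ERROR",
        "events": [
            {"name": "exception", "attributes": {"exception.message": "invalid URL"}},
        ],
    },
]


def exceptions_by_span(spans: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each errored span's name to the messages of its exception events."""
    found: dict[str, list[str]] = {}
    for span in spans:
        if span["status"] != "ERROR":
            continue  # Step 1: keep only the nodes with errors.
        # Step 2: read exception events from the span's events.
        found[span["name"]] = [
            event["attributes"].get("exception.message", "")
            for event in span["events"]
            if event["name"] == "exception"
        ]
    return found


if __name__ == "__main__":
    print(exceptions_by_span(SPANS))
```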

Also note the names of the spans in the previous two screenshots. Names that convey semantic information make it easier to find specific nodes and to understand and follow the overall workflow. Well-named nodes make debugging easier, which is one reason we recommend always following a set of naming conventions when you author BPMN workflows.

Add the debug flag

For granular debugging, it also helps to trace the variable context as it passes from node to node. To facilitate this, Rhize provides a debugging option that you can pass in multiple ways:

From an API call with the debug:true argument.
In the process variable context, by setting __traceDebug: true.
In the BPMN service configuration, by setting OpenTelemetry.defaultDebug to true.
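The Python sketch below illustrates the second option. It adds __traceDebug to a workflow's input variables before they are serialized for an API call. It also models the three options as alternatives, any one of which enables debugging. The helper names are hypothetical, and treating the settings as a logical OR is an assumption; check the Rhize documentation for how conflicting settings are resolved.

```python
import json
from typing import Any


def with_trace_debug(variables: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the input variables with __traceDebug set to true."""
    return {**variables, "__traceDebug": True}


def debug_enabled(api_debug: bool, variables: dict[str, Any], default_debug: bool) -> bool:
    # Assumption: any one of the three settings enables debugging.
    #   api_debug     -> the debug:true API argument
    #   variables     -> the process variable context (__traceDebug)
    #   default_debug -> OpenTelemetry.defaultDebug in the BPMN service config
    return api_debug or variables.get("__traceDebug") is True or default_debug


if __name__ == "__main__":
    payload = with_trace_debug({"arr": [1, 2, 3]})
    print(json.dumps(payload))
    print(debug_enabled(False, payload, False))
```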

When debugging is enabled in any of these ways, Tempo reports the entire variable context in the Span Attributes at the end of each node.

The process variable context at the end of a node in a BPMN workflow.
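With debugging on, each node's end span carries a full snapshot of the variable context, so comparing consecutive snapshots shows what each node changed. The Python sketch below does that comparison on snapshots you have already copied out of Tempo. The node names and values are hypothetical, and the sketch leaves out extracting them from span attributes.

```python
from typing import Any

# Hypothetical snapshots of the variable context at the end of each node, in order.
SNAPSHOTS: list[tuple[str, dict[str, Any]]] = [
    ("Validate input", {"arr": [1, 2, 3], "valid": True}),
    ("Build message", {"arr": [1, 2, 3], "valid": True, "message": "payload is valid"}),
]


def context_changes(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Describe the keys that a node added, removed, or changed."""
    changes: dict[str, Any] = {}
    for key in after.keys() - before.keys():
        changes[key] = ("added", after[key])
    for key in before.keys() - after.keys():
        changes[key] = ("removed", before[key])
    for key in before.keys() & after.keys():
        if before[key] != after[key]:
            changes[key] = ("changed", before[key], after[key])
    return changes


if __name__ == "__main__":
    # Pair each snapshot with the one before it and report the differences.
    for (_, prev), (node, curr) in zip(SNAPSHOTS, SNAPSHOTS[1:]):
        print(node, context_changes(prev, curr))
```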
