Handle errors and debug
Errors come in two categories: expected and unexpected. The Rhize BPMN engine has ways to handle both.
A robust workflow should have built-in logic to anticipate errors. For unexpected issues, Rhize also creates a trace for each workflow, which you can use to observe the behavior and performance of the workflow at each element as it executes sequentially. You can also use debug flags and variables to trace variable context as it transforms across the workflow.
Strategies to handle errors
All error handling likely uses some conditional logic. The workflow author anticipates the error and then writes some logic to conditionally handle it. However, you have many ways to handle conditions. When deciding how to direct flows, consider both the context of the error and overall readability of your diagram. This section describes some key strategies.
Gateways
Use exclusive gateways for any type of error handling. For example, you might define a normal range for a value, then send alerts when the value falls outside this range. Where it makes sense, these error branches can also flow into an early end event.
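The range check described above can be sketched in Python to make the branching logic concrete. This is only an illustration of the decision an exclusive gateway makes, not Rhize code: the range limits and the branch names `normal` and `alert` are hypothetical.

```python
# Hypothetical normal range for a measured value; choose limits that fit your process.
LOW, HIGH = 20.0, 80.0


def choose_branch(value: float, low: float = LOW, high: float = HIGH) -> str:
    """Pick exactly one outgoing flow, the way an exclusive gateway does."""
    if low <= value <= high:
        return "normal"
    # Out-of-range values take the error branch, which might send an alert
    # and then flow into an early end event.
    return "alert"


print(choose_branch(55.0))
print(choose_branch(93.5))
```

In a diagram, each return value would correspond to a condition on one outgoing sequence flow of the gateway.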

JSON schema validation
Validation can greatly limit the scope of possible errors. To validate your JSON payloads, use the JSON schema task.
The JSON schema task outputs a boolean value that indicates whether the input conforms to the schema that you set.
You can then set a condition based on whether this valid variable is true, and create logic to handle errors accordingly.
For example, this schema requires that the input variables include a property arr whose value is an array of numbers.
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Root",
  "type": "object",
  "properties": {
    "arr": {
      "type": "array",
      "items": { "type": "number" }
    }
  },
  "required": ["arr"]
}
In a production workflow, you might use this exact schema to validate the input for a function that calculates statistics (perhaps choosing a different variable name).
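To see what this schema accepts and rejects, the following Python sketch hand-codes the same three rules (the input is an object, it has an arr property, and arr is an array of numbers). It is a stand-in for illustration only and does not use or reproduce the JSON schema task; the function name `is_valid` is hypothetical.

```python
import json


def is_valid(payload: str) -> bool:
    """Return True when payload is a JSON object whose "arr" is an array of numbers."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    # "type": "object" and "required": ["arr"]
    if not isinstance(data, dict) or "arr" not in data:
        return False
    arr = data["arr"]
    if not isinstance(arr, list):
        return False
    # bool is a subclass of int in Python, but JSON Schema does not count
    # true/false as numbers, so exclude it explicitly.
    return all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in arr)


print(is_valid('{"arr": [1, 2.5, 3]}'))
print(is_valid('{"arr": [1, "2"]}'))
print(is_valid('{"values": [1, 2]}'))
```

The boolean result plays the role of the valid variable that a later gateway condition can test.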

JSONata conditions
Besides the logical gateway, it may make sense to use JSONata ternary expressions in one of the many parameters that accepts JSONata expressions.
For example, this expression creates one message body if valid is true and another if not:
=
{
"message": $.valid ? "payload is valid" : "Invalid payload"
}
Check JSONata output
If a field has no value, JSONata outputs nothing.
For example, the following expression outputs only {"name": "Rhize"}, because no $err field exists.
=(
  $name := "Rhize";
  {
    "name": $name,
    "error": $err
  }
)

{
  "name": "Rhize"
}
You can use this behavior to direct flows.
For example, an exclusive gateway may have a condition such as $exists(err) that routes the flow into an error-handling branch.
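As a rough analogy for readers more familiar with Python, the sketch below mimics both behaviors: fields with no value are left out of the output, and a later check for the presence of an error field decides the route. It approximates the JSONata behavior described above and is not JSONata itself; the function names and the route labels `handle_error` and `continue` are hypothetical.

```python
def build_output(context: dict) -> dict:
    """Build the output object, leaving out any field that has no value in the context."""
    fields = {"name": "name", "error": "err"}  # output key -> context key
    return {out_key: context[src] for out_key, src in fields.items() if src in context}


def next_flow(output: dict) -> str:
    """Route like a gateway condition that tests whether an error field exists."""
    return "handle_error" if "error" in output else "continue"


ok = build_output({"name": "Rhize"})
failed = build_output({"name": "Rhize", "err": "timeout"})

print(ok, next_flow(ok))
print(failed, next_flow(failed))
```

Here a key that is absent from the Python dict stands in for a JSONata field that has no value.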
Create event logging
To persist error handling, you can set gateways that flow to mutation tasks that use the addEvent operation.
The added event may be a successful operation, an error, or both, creating a record of events emitted in the workflow that are stored in your manufacturing knowledge graph.
This strategy increases the observability of errors and facilitates future analysis.
It may also be useful when combined with the debugging strategies described in the next section.
Strategies to debug
For detailed debugging, you can use an embedded instance of Grafana Tempo to inspect each step of the workflow, node by node.
To debug on the fly, you may also find it useful to use customResponse and intermediate message throws to print variables and output at different checkpoints.
Debug from the API calls
When you first test or run a workflow, consider starting the testing and debugging process from an API trigger.
All API triggers return information about the workflow state (for example, COMPLETED or ABORTED).
With the createAndRunBpmnSync operation, you can also use the customResponse to provide information from the workflow’s variable context.
For details of how this works, read the guide to triggering workflows.
For example, consider a workflow that has two nodes, a Message throw event and a REST task.
- When the message completes, the user writes Message sent into customResponse as an output variable.
- When the REST task completes, the response is saved into customResponse.
So the jobState property reports on the overall workflow status, and customResponse serves as a checkpoint to report the state of each node execution.
You can also request the dataJSON field, which reports the entire variable context at the last node.
Now imagine that the user has started the workflow from the API and receives this response:
{
  "data": {
    "createAndRunBpmnSync": {
      "jobState": "ABORTED",
      "customResponse": "message sent",
      "traceID": "993ee32af9522f5b35b4ec80f4ff58a8"
    }
  }
}
Note how ABORTED indicates the workflow failed somewhere.
Yet, the value of customResponse must have been set after the message event executed.
So the problem is likely with the REST node.
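The same reasoning can be written as a short Python sketch that reads the response and names the node to inspect first. It assumes the two-node example workflow and the checkpoint value shown in the response; the function name `suspect_node` and the node labels are hypothetical, and a real workflow would need its own mapping of checkpoints to nodes.

```python
import json
from typing import Optional

# The example response from above, as returned by the API trigger.
RESPONSE = """
{
  "data": {
    "createAndRunBpmnSync": {
      "jobState": "ABORTED",
      "customResponse": "message sent",
      "traceID": "993ee32af9522f5b35b4ec80f4ff58a8"
    }
  }
}
"""


def suspect_node(raw: str) -> Optional[str]:
    """Name the node most likely to have failed in the two-node example workflow."""
    result = json.loads(raw)["data"]["createAndRunBpmnSync"]
    if result["jobState"] != "ABORTED":
        return None  # the workflow did not abort, so there is nothing to diagnose
    if result.get("customResponse") == "message sent":
        # The message event wrote its checkpoint, so it completed;
        # the failure happened afterwards.
        return "REST task"
    # No checkpoint was written, so the first node is the likely culprit.
    return "Message throw event"


print(suspect_node(RESPONSE))
```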
You could also use a similar strategy with intermediate message events.
However, while customResponse and messages are undoubtedly useful debugging methods, they are also limited: they are the BPMN equivalent of printf() debugging.
For full-featured debugging, use the traceID to explore the workflow through Tempo.
Debug in Tempo
Rhize creates a unique ID and trace for each workflow that runs.
This ID is reported as the traceID in the createAndRunBPMN mutation operation.
Within this trace, each node is instrumented, with spans emitted at every input and output along each node of execution.
With the trace ID, you can find the workflow run in Tempo and follow the behavior.
To inspect a workflow in Tempo:
- Go to your Grafana instance.
- Select Explore and then Tempo.
- From the TraceQL tab, enter the traceID and run the query. Alternatively, use the Search tab with the bpmn-engine to find traces for all workflows.
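If you prefer to look up a trace outside the Grafana UI, Tempo also provides an HTTP endpoint that returns a trace by ID (`/api/traces/<traceID>`). The sketch below only builds that URL after checking the ID format. Whether this endpoint is reachable in your Rhize deployment is an assumption, the base URL is a placeholder, and the 32-character hex check assumes IDs shaped like the example in this guide.

```python
import re
from urllib.parse import urljoin

# Assumption: base URL of a reachable Tempo HTTP API. Replace with your own.
TEMPO_BASE_URL = "http://localhost:3200/"


def trace_url(trace_id: str, base_url: str = TEMPO_BASE_URL) -> str:
    """Build the Tempo trace-lookup URL for a 32-character hex trace ID."""
    cleaned = trace_id.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{32}", cleaned):
        raise ValueError(f"not a 32-character hex trace ID: {trace_id!r}")
    return urljoin(base_url, f"api/traces/{cleaned}")


print(trace_url("993ee32af9522f5b35b4ec80f4ff58a8"))
```

Checking the ID before querying catches copy-and-paste mistakes, such as a truncated ID, before they turn into an empty search result.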

Each workflow instance displays spans that trace the state of each node at its start, execution, and end states.
When debugging, you are likely interested in the spans that result in ABORTED.
To inspect the errors:
- Select the nodes with errors.
- Use the events property to inspect for exceptions.
For example, this REST task failed because the URL was invalid.

Also note the names of the spans in the previous two screenshots. Names that convey semantic information make it easier to find specific nodes and to understand and follow the overall workflow. Well-named nodes make debugging easier. This is one of the reasons we recommend always following a set of naming conventions when you author BPMN workflows.
Adding the debug flag
For granular debugging, it also helps to trace the variable context as it passes from node to node. To facilitate this, Rhize provides a debugging option that you can pass in multiple ways:
- From an API call, with the debug:true argument.
- In the process variable context, by setting __traceDebug: true.
- In the BPMN service configuration, by setting OpenTelemetry.defaultDebug to true.
When the debugging variable is set, Tempo reports the entire variable context in the Span Attributes at the end of each node.
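For the second option, a small helper can add the flag to the variables you send when you start a workflow. This Python sketch only prepares the JSON for the variable context; the helper name `with_trace_debug` and the sample variables are hypothetical, and how you pass the result to the API depends on the trigger you use.

```python
import json


def with_trace_debug(variables: dict, enabled: bool = True) -> str:
    """Return the variable context as JSON with the __traceDebug flag set."""
    context = dict(variables)  # copy so the caller's dict is not mutated
    context["__traceDebug"] = enabled
    return json.dumps(context)


print(with_trace_debug({"arr": [1, 2, 3]}))
```

Keeping the flag in one helper makes it easy to switch debugging off again once the workflow behaves as expected.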
