Step Functions¶
Overview¶
- Model workflows as state machines.
- Serverless workflows to orchestrate lamba functions.
- Support sequence, parallel, conditions, timeouts, error handling.
- Integrates with EC2, EC2, On-Prem Servers, API gateway etc.
- Maximum execution time is 1 year.
- Supports human/manual approval feature.
- Useful for order fulfillment, data processing (ETL), web applications (payment processing).
States¶
State | Description |
---|---|
Task | Do some work. Invoke an AWS service, run an activity. Activity will poll the step function for work, and return the results. |
Choice | Test for a condition to send to a branch. |
Fail/Succeed | Stop execution with fail/success. |
Pass | Pass input to an output, or inject some fixed data without doing any work. |
Wait | Wait until a specified date/time, or delay for an amount of time. |
Map | Dynamically iterate over steps. |
Parallel | Being parallel branches of execution. |
Error Handling¶
The default failure behaviour is to fail the entire execution if a state reports an error.
An error can occur due to:
- Task failures.
- Machine definition issues (no matching rule etc).
- Transient issues (network partition event).
Try to include data in the error messages to help with troubleshooting.
Pre-defined Error Codes¶
There's four pre-defined error codes that can be returned -
Error Code | Description |
---|---|
States.ALL | Catch all errors that occur inside the lambda function. |
States.Timeout | Task ran longer than TimeoutSeconds, or no heartbeat received. |
States.TaskFailed | Execution failure. |
States.Permissions | Insufficient privileges to execute the code. |
Retrying¶
Applies to tasks, or parallel state. A task can use multiple retry methods that are evaluated in order.
There's several parameters that can be used to modify how often a retry is attempted, how many times to retry before giving up, and a back off rate to increase the interval between each subsequent retry.
Method | Description |
---|---|
IntervalSecond | Wait a set number of seconds and try again. |
MaxAttempts | Retry immediately and give up after X attempts (default is 3). |
BackoffRate | Wait an exponentially increasing amount of time before trying again. |
Example
``` json
"HelloWorld": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"Retry": [
{
"ErrorEquals": ["CustomError"],
"IntervalSeconds": 1,
"MaxAttempts": 2,
"BackoffRate": 2.0
},
{
"ErrorEquals": ["States.TaskFailed"],
"IntervalSeconds": 30,
"MaxAttempts": 2,
"BackoffRate": 2.0
},
{
"ErrorEquals": ["States.ALL"],
"IntervalSeconds": 5,
"MaxAttempts": 5,
"BackoffRate": 2.0
}
],
"End": true
}
```
Catching Errors¶
Applies to Tasks or parallel steps, and has two attributes to control what errors to catch, what the next step should be, and what to include in the result
Attribute | Description |
---|---|
ErrorEquals | Match a specific type of error. |
Next | Proceed to next step. |
ResultPath | Way to include the input, into the output of the next task. |
Example
``` json
"HelloWorld": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"Catch": [
{
"ErrorEquals": ["CustomError"],
"Next": "CustomErrorCallback"
},
{
"ErrorEquals": ["States.TaskFailed"],
"Next": "ReservedTypeCallback"
},
{
"ErrorEquals": ["States.ALL"],
"Next": "NextTask",
"ResultPath": "$.error"
}
],
"End": true
},
"NextTask": {
"Type": "Pass",
"Result": "This is a fallback from a reserved error code.",
"End": true
},
"CustomErrorFallback": {
"Type": "Pass",
"Result": "This is a fallback from a custom lambda function exception.",
"End": true
}
```
Standard Step Functions¶
- Max duration of 1year.
- Execution start rate up to 2,000/sec.
- State transition rate over 4,000/sec per account.
- Priced per state transition.
- Can be listed and described with step function APIs, and visually debugged through the console.
- Can be inspected in CloudWatch Logs by enabling logging on the state machine.
- Exactly-once workflow execution.
Express Step Functions¶
- Max duration of 5mins.
- Execution start rate up to 100,000/sec.
- State transition rate is unlimited.
- Prices by number of executions run, their duration, and memory consumption (cheaper).
- Can be inspected in CloudWatch Logs by enabling logging on the state machine.
- At-least-once workflow execution.
Last update: June 30, 2021