Build multi-step applications and AI workflows with AWS Lambda durable functions

Modern applications increasingly require complex, long-running coordination between services, such as multi-step payment processing, AI agent orchestration, or approval processes awaiting human decisions. Building these has traditionally required significant effort to implement state management, handle failures, and integrate multiple infrastructure services.

Starting today, you can use AWS Lambda durable functions to build reliable multi-step applications directly within the familiar AWS Lambda experience. Durable functions are regular Lambda functions with the same event handler and integrations you already know. You write sequential code in your preferred programming language, and durable functions track progress, automatically retry on failures, and suspend execution for up to one year at defined points, without paying for idle compute during waits.

AWS Lambda durable functions use a checkpoint and replay mechanism, known as durable execution, to deliver these capabilities. After enabling a function for durable execution, you add the new open source durable execution SDK to your function code. You then use SDK primitives like “steps” to add automatic checkpointing and retries to your business logic and “waits” to efficiently suspend execution without compute charges. When execution terminates unexpectedly, Lambda resumes from the last checkpoint, replaying your event handler from the beginning while skipping completed operations.
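To make the checkpoint and replay idea concrete, here is a minimal in-memory sketch in plain Python. It does not use the durable execution SDK. ToyDurableContext, its positional checkpoint keys, and the simulated crash are hypothetical simplifications. The sketch shows the handler rerunning from the top after a failure, while the already-checkpointed step returns its stored result instead of executing again.

```python
from typing import Any, Callable


class ToyDurableContext:
    """Hypothetical in-memory stand-in for a durable execution context."""

    def __init__(self, checkpoints: dict[int, Any]) -> None:
        self.checkpoints = checkpoints  # persisted across invocations
        self.position = 0               # restarts at 0 on every replay
        self.executed: list[str] = []   # steps that actually ran this time

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        pos = self.position
        self.position += 1
        if pos in self.checkpoints:
            return self.checkpoints[pos]  # completed earlier: skip, reuse result
        result = fn()                     # run the business logic
        self.checkpoints[pos] = result    # checkpoint before moving on
        self.executed.append(name)
        return result


def handler(ctx: ToyDurableContext, crash: bool) -> str:
    reserved = ctx.step("reserve", lambda: "reserved")
    if crash:
        raise RuntimeError("simulated crash between steps")
    charged = ctx.step("charge", lambda: "charged")
    return f"{reserved}+{charged}"


checkpoints: dict[int, Any] = {}

# First invocation: "reserve" completes and is checkpointed, then it crashes.
first = ToyDurableContext(checkpoints)
try:
    handler(first, crash=True)
except RuntimeError:
    pass

# Replay: the handler starts from the top, but "reserve" is not re-executed.
second = ToyDurableContext(checkpoints)
print(handler(second, crash=False))  # reserved+charged
print(second.executed)               # ['charge']
```

Keying checkpoints by call order is one reason replayed code needs to be deterministic. If the handler called its steps in a different order on replay, the stored results would line up with the wrong steps.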

Getting started with AWS Lambda durable functions

Let me walk you through how to use durable functions.

First, I create a new Lambda function in the console and select Author from scratch. In the Durable execution section, I select Enable. Note that the durable function setting can only be configured during function creation. It currently can’t be changed for existing Lambda functions.

After I create my Lambda durable function, I can get started with the provided code.

Lambda durable functions introduce two core primitives that handle state management and recovery:

Steps: The context.step() method adds automatic retries and checkpointing to your business logic. After a step is completed, it is skipped during replay.
Wait: The context.wait() method pauses execution for a specified duration. It terminates the function and suspends execution, then resumes it after the duration elapses, with no compute charges during the wait.
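The following toy model sketches how a wait can end an invocation and resume later without running compute in between. ToyContext, the Suspend exception, and the loop that stands in for the Lambda service are all hypothetical. The real SDK handles suspension and re-invocation for you.

```python
from typing import Any, Callable


class Suspend(Exception):
    """Toy signal: end this invocation and resume at `wake_at` (seconds)."""

    def __init__(self, wake_at: float) -> None:
        super().__init__(wake_at)
        self.wake_at = wake_at


class ToyContext:
    def __init__(self, log: dict[str, Any], now: float) -> None:
        self.log = log  # checkpoint log that survives between invocations
        self.now = now  # simulated clock for this invocation

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name not in self.log:
            self.log[name] = fn()  # run once, then replay the stored value
        return self.log[name]

    def wait(self, name: str, seconds: float) -> None:
        # The first time through, record when the wait ends.
        wake_at = self.log.setdefault(name, self.now + seconds)
        if self.now < wake_at:
            raise Suspend(wake_at)  # stop here; nothing runs until wake_at
        # On a later invocation at or after wake_at, the wait is complete.


def handler(ctx: ToyContext) -> str:
    ctx.step("send-reminder", lambda: "sent")
    ctx.wait("cool-off", seconds=86_400)  # one day
    return ctx.step("follow-up", lambda: "done")


log: dict[str, Any] = {}
invocations = 0
now = 0.0
while True:
    invocations += 1
    try:
        result = handler(ToyContext(log, now))
        break
    except Suspend as suspended:
        # Stand-in for the service: advance the clock and re-invoke.
        now = suspended.wake_at

print(result, invocations, now)  # done 2 86400.0
```

The workflow finishes in two invocations. The one-day gap between them is only a recorded timestamp, not a running process.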

Additionally, Lambda durable functions provide other operations for more complex patterns. create_callback() creates a callback that you can use to await results from external events, such as API responses or human approvals. wait_for_condition() pauses until a specific condition is met, such as when polling a REST API for process completion. The parallel() and map() operations support advanced concurrency use cases.
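As a rough illustration of the polling pattern that wait_for_condition() addresses, the sketch below checks a status repeatedly, with exponential backoff, until a predicate holds. wait_for_condition_sketch and its parameters are hypothetical and are not the SDK’s signature. In a durable function, each pause would be a suspension. Here, an injectable sleep callback stands in for those pauses so the example stays fast and testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_condition_sketch(
    check: Callable[[], T],
    is_done: Callable[[T], bool],
    initial_delay: float = 1.0,
    backoff_rate: float = 2.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `check` until `is_done` accepts its result, backing off between polls."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        state = check()
        if is_done(state):
            return state
        if attempt < max_attempts:
            sleep(delay)          # a durable function would suspend here instead
            delay *= backoff_rate
    raise TimeoutError(f"condition not met after {max_attempts} attempts")


# Simulated REST API that reports progress on each poll.
statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
slept: list[float] = []
final = wait_for_condition_sketch(
    check=lambda: next(statuses),
    is_done=lambda s: s == "SUCCEEDED",
    sleep=slept.append,  # record delays instead of really sleeping
)
print(final, slept)  # SUCCEEDED [1.0, 2.0]
```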

Building a production-ready order processing workflow

Now let’s expand the default example to build a production-ready order processing workflow. This demonstrates how to use callbacks for external approvals, handle errors properly, and configure retry strategies. I keep the code intentionally concise to focus on these core concepts. In a full implementation, you could enhance the validation step with Amazon Bedrock to add AI-powered order analysis.

Here’s how the order processing workflow works:

First, validate_order() checks order data to ensure all required fields are present.
Next, send_for_approval() sends the order for external human approval. The workflow then waits for a callback response, suspending execution without compute charges.
Then, process_order() completes order processing.
Throughout the workflow, try/except error handling distinguishes between terminal errors, which stop execution immediately, and recoverable errors inside steps, which trigger automatic retries.
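The split between terminal and recoverable errors can be sketched without the SDK. In this hypothetical example, run_step retries exceptions raised inside a step up to a fixed number of attempts. A validation check placed outside any step raises once and is never retried. The helper names and the attempt count are assumptions for illustration.

```python
from typing import Any, Callable


def run_step(fn: Callable[[], Any], max_attempts: int = 3) -> Any:
    """Toy step runner: exceptions raised inside the step are retried."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: the error propagates and fails the run
    raise ValueError("max_attempts must be at least 1")


def handler(order: dict, flaky_failures: int) -> str:
    # Check outside any step: a terminal error that is raised once, never retried.
    if "order_id" not in order:
        raise ValueError("invalid order: missing order_id")

    calls = {"n": 0}

    def process() -> str:
        # Inside the step: the first `flaky_failures` calls fail transiently.
        calls["n"] += 1
        if calls["n"] <= flaky_failures:
            raise ConnectionError("transient API failure")
        return f"processed {order['order_id']} after {calls['n']} attempt(s)"

    return run_step(process)


print(handler({"order_id": "o-1"}, flaky_failures=2))  # processed o-1 after 3 attempt(s)
try:
    handler({}, flaky_failures=0)
except ValueError as err:
    print("terminal:", err)  # terminal: invalid order: missing order_id
```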

Here’s the complete order processing workflow, with the step definitions and the main handler:

import random
from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)
from aws_durable_execution_sdk_python.config import (
    Duration,
    StepConfig,
    CallbackConfig,
)
from aws_durable_execution_sdk_python.retries import (
    RetryStrategyConfig,
    create_retry_strategy,
)


@durable_step
def validate_order(step_context: StepContext, order_id: str) -> dict:
    """Validates order data using AI."""
    step_context.logger.info(f"Validating order: {order_id}")
    # In production: calls Amazon Bedrock to validate order completeness and accuracy
    return {"order_id": order_id, "status": "validated"}


@durable_step
def send_for_approval(step_context: StepContext, callback_id: str, order_id: str) -> dict:
    """Sends the order for approval using the provided callback token."""
    step_context.logger.info(f"Sending order {order_id} for approval with callback_id: {callback_id}")

    # In production: send callback_id to the external approval system.
    # The external system calls the Lambda SendDurableExecutionCallbackSuccess or
    # SendDurableExecutionCallbackFailure API with this callback_id when approval is complete.

    return {
        "order_id": order_id,
        "callback_id": callback_id,
        "status": "sent_for_approval",
    }


@durable_step
def process_order(step_context: StepContext, order_id: str) -> dict:
    """Processes the order with retry logic for transient failures."""
    step_context.logger.info(f"Processing order: {order_id}")
    # Simulate a flaky API that fails about 60% of the time
    if random.random() > 0.4:
        step_context.logger.info("Processing failed, will retry")
        raise Exception("Processing failed")
    return {
        "order_id": order_id,
        "status": "processed",
        "timestamp": "2025-11-27T10:00:00Z",
    }


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> dict:
    try:
        order_id = event.get("order_id")

        # Step 1: Validate the order
        validated = context.step(validate_order(order_id))
        if validated["status"] != "validated":
            raise Exception("Validation failed")  # Terminal error - stops execution
        context.logger.info(f"Order validated: {validated}")

        # Step 2: Create a callback
        callback = context.create_callback(
            name="awaiting-approval",
            config=CallbackConfig(timeout=Duration.from_minutes(3)),
        )
        context.logger.info(f"Created callback with id: {callback.callback_id}")

        # Step 3: Send for approval with the callback_id
        approval_request = context.step(send_for_approval(callback.callback_id, order_id))
        context.logger.info(f"Approval request sent: {approval_request}")

        # Step 4: Wait for the callback result
        # Execution suspends here until the external system calls
        # SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure
        approval_result = callback.result()
        context.logger.info(f"Approval received: {approval_result}")

        # Step 5: Process the order with a custom retry strategy
        retry_config = RetryStrategyConfig(max_attempts=3, backoff_rate=2.0)
        processed = context.step(
            process_order(order_id),
            config=StepConfig(retry_strategy=create_retry_strategy(retry_config)),
        )
        if processed["status"] != "processed":
            raise Exception("Processing failed")  # Terminal error

        context.logger.info(f"Order successfully processed: {processed}")
        return processed

    except Exception as error:
        context.logger.error(f"Error processing order: {error}")
        raise  # Re-raise to fail the execution

This code demonstrates several important concepts:

Error handling: The try/except block handles terminal errors. When an unhandled exception is raised outside of a step (like the validation check), it terminates the execution immediately. This is useful when there’s no point in retrying, such as with invalid order data.
Step retries: Inside a step, exceptions trigger automatic retries based on either the default retry strategy (as in step 1) or a configured RetryStrategy (as for process_order in step 5). This handles transient failures like temporary API unavailability.
Logging: I use context.logger in the main handler and step_context.logger inside steps. The context logger suppresses duplicate logs during replay.
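A quick calculation helps size the retry configuration used in step 5. This sketch assumes two things: that max_attempts counts the first attempt plus its retries, and that each delay grows by backoff_rate from an initial delay. The 5-second initial delay is a hypothetical value, not an SDK default, so check the SDK documentation for the actual semantics. The sketch also estimates how often the simulated process_order step, which fails whenever random.random() > 0.4, would exhaust all three attempts.

```python
def backoff_delays(max_attempts: int, backoff_rate: float, initial_delay: float) -> list[float]:
    """Delays (seconds) before each retry, assuming exponential backoff."""
    # One fewer delay than attempts: there is no wait before the first attempt.
    return [initial_delay * backoff_rate ** i for i in range(max_attempts - 1)]


def prob_all_attempts_fail(p_fail: float, max_attempts: int) -> float:
    """Probability that every independent attempt fails."""
    return p_fail ** max_attempts


# Values from the example: max_attempts=3, backoff_rate=2.0.
# initial_delay=5.0 is a hypothetical value, not an SDK default.
print(backoff_delays(3, 2.0, initial_delay=5.0))  # [5.0, 10.0]

# process_order fails when random.random() > 0.4, that is, with probability 0.6.
print(round(prob_all_attempts_fail(0.6, 3), 3))   # 0.216
```

With about a 22% chance that every attempt fails, some test runs will end in a failed execution. Raising max_attempts or lowering the simulated failure rate makes a successful run more likely.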

Now I create a test event with order_id and invoke the function asynchronously to start the order workflow. I navigate to the Test tab and fill in the optional Durable execution name to identify this execution. Note that durable functions provide built-in idempotency. If I invoke the function twice with the same execution name, the second invocation returns the existing execution result instead of creating a duplicate.
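The idempotency behavior can be pictured as a store keyed by execution name. ToyExecutionStore below is a hypothetical in-memory stand-in. It ignores concerns such as executions that are still running, and only shows that a repeated name returns the existing result instead of starting a second execution.

```python
from typing import Any, Callable


class ToyExecutionStore:
    """Hypothetical in-memory map from execution name to execution result."""

    def __init__(self) -> None:
        self.results: dict[str, Any] = {}
        self.started = 0  # how many executions were actually created

    def invoke(self, execution_name: str, workflow: Callable[[dict], Any], event: dict) -> Any:
        if execution_name in self.results:
            # Same name: return the existing execution's result, no duplicate run.
            return self.results[execution_name]
        self.started += 1
        self.results[execution_name] = workflow(event)
        return self.results[execution_name]


def workflow(event: dict) -> dict:
    return {"order_id": event["order_id"], "status": "processed"}


store = ToyExecutionStore()
first = store.invoke("order-123-run", workflow, {"order_id": "123"})
second = store.invoke("order-123-run", workflow, {"order_id": "123"})
print(first == second, store.started)  # True 1
```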

I can monitor the execution by navigating to the Durable executions tab in the Lambda console.

Here I can see each step’s status and timing. The execution shows CallbackStarted followed by InvocationCompleted. This indicates that the function has terminated and execution is suspended, which avoids idle charges while it waits for the approval callback.

I can now complete the callback directly from the console by choosing Send success or Send failure, or programmatically by using the Lambda API.
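The callback lifecycle can be sketched as a small state machine. A callback is created in a pending state. Reading its result before completion suspends the workflow, and success or failure is recorded exactly once. ToyCallbacks and CallbackPending are hypothetical. In Lambda, completion arrives through the SendDurableExecutionCallbackSuccess and SendDurableExecutionCallbackFailure APIs referenced in the example code.

```python
import uuid
from typing import Any


class CallbackPending(Exception):
    """Toy signal that the workflow must suspend until the callback completes."""


class ToyCallbacks:
    """Hypothetical in-memory callback registry: PENDING -> SUCCEEDED or FAILED."""

    def __init__(self) -> None:
        self._state: dict[str, tuple[str, Any]] = {}

    def create(self) -> str:
        callback_id = str(uuid.uuid4())
        self._state[callback_id] = ("PENDING", None)
        return callback_id

    def send_success(self, callback_id: str, result: Any) -> None:
        self._complete(callback_id, "SUCCEEDED", result)

    def send_failure(self, callback_id: str, error: str) -> None:
        self._complete(callback_id, "FAILED", error)

    def _complete(self, callback_id: str, status: str, payload: Any) -> None:
        # A callback can be completed only once.
        if self._state[callback_id][0] != "PENDING":
            raise ValueError(f"callback {callback_id} already completed")
        self._state[callback_id] = (status, payload)

    def result(self, callback_id: str) -> Any:
        status, payload = self._state[callback_id]
        if status == "PENDING":
            raise CallbackPending(callback_id)
        if status == "FAILED":
            raise RuntimeError(f"callback failed: {payload}")
        return payload


callbacks = ToyCallbacks()
cid = callbacks.create()
try:
    callbacks.result(cid)
except CallbackPending:
    print("suspended: waiting for approval")
callbacks.send_success(cid, {"approved": True})
print(callbacks.result(cid))  # {'approved': True}
```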

I choose Send success.

After the callback completes, the execution resumes and processes the order. If the process_order step fails because of the simulated flaky API, it automatically retries based on the configured strategy. Once an attempt succeeds, the execution completes successfully.

Monitoring executions with Amazon EventBridge

You can also monitor durable function executions using Amazon EventBridge. Lambda automatically sends execution status change events to the default event bus, so you can build downstream workflows, send notifications, or integrate with other AWS services.

To receive these events, create an EventBridge rule on the default event bus with this pattern:

{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}
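To see which events this pattern selects, the following sketch implements a small subset of EventBridge pattern matching. Each key in the pattern must be present in the event, and the event's value must equal one of the listed values. Real EventBridge supports far more, including prefix, numeric, and exists filters. The sample events below are hypothetical, trimmed-down shapes that omit the detail payload.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Subset of EventBridge matching: exact-value lists and nested objects."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed values.
            return False
    return True


pattern = {
    "source": ["aws.lambda"],
    "detail-type": ["Durable Execution Status Change"],
}

# Hypothetical sample events; real events also carry a "detail" object.
status_event = {"source": "aws.lambda", "detail-type": "Durable Execution Status Change"}
other_event = {"source": "aws.s3", "detail-type": "Object Created"}

print(matches(pattern, status_event))  # True
print(matches(pattern, other_event))   # False
```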

Things to know

Here are key points to note:

Availability: Lambda durable functions are now available in the US East (Ohio) AWS Region. For the latest Region availability, visit the AWS Capabilities by Region page.
Programming language support: At launch, AWS Lambda durable functions support JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are fast-moving, so you can easily update dependencies as new features become available.
Using Lambda versions: When deploying durable functions to production, use Lambda versions to ensure replay always happens on the same code version. If you update your function code while an execution is suspended, replay uses the version that started the execution. This prevents inconsistencies from code changes during long-running workflows.
Testing your durable functions: You can test durable functions locally without AWS credentials using the separate testing SDK with pytest integration. For more complex integration testing, use the AWS Serverless Application Model (AWS SAM) command line interface (CLI).
Open source SDKs: The durable execution SDKs are open source for JavaScript/TypeScript and Python. You can review the source code, contribute improvements, and stay up to date with the latest features.
Pricing: To learn more about AWS Lambda durable functions pricing, refer to the AWS Lambda pricing page.
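The point about using Lambda versions matters because replay re-runs your handler against recorded checkpoints. This hypothetical sketch models each published version as an ordered list of step names and checks recorded checkpoints against it. Replaying on the version that started the execution lines up. Replaying the same checkpoints on newer code that inserts a step does not. The version numbers and step names are made up for illustration.

```python
# Hypothetical published versions, each modeled as the ordered steps its handler runs.
VERSIONS: dict[str, list[str]] = {
    "1": ["validate", "approve", "process"],
    "2": ["validate", "fraud-check", "approve", "process"],  # newer code adds a step
}


def replay(version: str, recorded: list[str]) -> list[str]:
    """Match recorded checkpoints against a version; return steps left to run."""
    steps = VERSIONS[version]
    for i, name in enumerate(recorded):
        current = steps[i] if i < len(steps) else None
        if current != name:
            raise RuntimeError(
                f"inconsistent replay at step {i}: recorded {name!r}, code has {current!r}"
            )
    return steps[len(recorded):]


# An execution started on version 1 and suspended after two steps.
recorded = ["validate", "approve"]

print(replay("1", recorded))  # ['process']
try:
    replay("2", recorded)
except RuntimeError as err:
    print(err)  # inconsistent replay at step 1: recorded 'approve', code has 'fraud-check'
```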

Get started with AWS Lambda durable functions by visiting the AWS Lambda console. To learn more, refer to the AWS Lambda durable functions documentation page.

Happy building!

— Donnie
