Wednesday, November 20, 2019

AWS uses Step Functions to orchestrate Amazon EMR workloads

Amazon Web Services announced that AWS Step Functions, its service for adding serverless workflow automation to applications, now integrates with Amazon EMR. The steps of a workflow can run anywhere, including in AWS Lambda functions, on Amazon Elastic Compute Cloud (EC2), or on-premises.

Workflows are made up of a series of steps, with the output of one step acting as the input to the next. Step Functions makes application development simpler and more intuitive because it translates a workflow into a state machine diagram that is easy to understand, easy to explain to others, and easy to change.

Users can monitor each step of execution as it happens, which means they can identify and fix problems quickly. Step Functions automatically triggers and tracks each step, and retries when there are errors, so that the application executes in order and as expected.
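As a rough illustration of chained steps and automatic retries, the sketch below builds a two-step workflow in the Amazon States Language (the JSON format Step Functions uses) as a Python dictionary. The state names, Lambda ARNs, and retry settings are hypothetical placeholders and do not come from the announcement.

```python
import json

# Hypothetical Lambda ARNs; replace with real function ARNs before deploying.
EXTRACT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:extract"
TRANSFORM_ARN = "arn:aws:lambda:us-east-1:123456789012:function:transform"

definition = {
    "Comment": "The output of Extract becomes the input of Transform",
    "StartAt": "Extract",
    "States": {
        "Extract": {
            "Type": "Task",
            "Resource": EXTRACT_ARN,
            # Retry up to 3 times on any error, waiting 5s, 10s, then 20s.
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "Transform",
        },
        "Transform": {"Type": "Task", "Resource": TRANSFORM_ARN, "End": True},
    },
}


def execution_order(machine):
    """Follow the Next pointers from StartAt to list states in run order."""
    order, name = [], machine["StartAt"]
    while name is not None:
        order.append(name)
        name = machine["States"][name].get("Next")
    return order


print(execution_order(definition))
print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition; the helper only walks a linear chain and is there to show the order in which the steps would run.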

Step Functions connects to Amazon EMR to create data processing and analysis workflows with minimal code, saving time and optimizing cluster utilization. For example, building data processing pipelines for machine learning is time-consuming and difficult. With this new integration, users have a simple way to orchestrate workflows, including parallel executions and dependencies on the result of a previous step, and to handle failures and exceptions when running data processing jobs.

Specifically, a Step Functions state machine can create or terminate an EMR cluster and change a cluster's termination protection setting. In this way, users can reuse an existing EMR cluster for their workflow, or create one on demand during execution of a workflow. A state machine can also add or cancel an EMR step on a cluster.

Each EMR step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster, including tools such as Apache Spark, Hive, or Presto.
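A minimal sketch of such a workflow, assuming the Step Functions service integration resource names for EMR (`createCluster`, `addStep`, `terminateCluster`), is shown below as a Python dictionary. The cluster name, release label, instance types, IAM role names, and S3 path are illustrative assumptions, not values from the announcement.

```python
import json

EMR = "arn:aws:states:::elasticmapreduce:"

definition = {
    "StartAt": "CreateCluster",
    "States": {
        "CreateCluster": {
            "Type": "Task",
            "Resource": EMR + "createCluster.sync",
            "Parameters": {
                "Name": "example-cluster",          # hypothetical name
                "ReleaseLabel": "emr-5.28.0",       # assumed release
                "Applications": [{"Name": "Spark"}],
                "ServiceRole": "EMR_DefaultRole",
                "JobFlowRole": "EMR_EC2_DefaultRole",
                "VisibleToAllUsers": True,
                "Instances": {
                    "KeepJobFlowAliveWhenNoSteps": True,
                    "InstanceGroups": [
                        {"Name": "Master", "InstanceRole": "MASTER",
                         "InstanceType": "m5.xlarge", "InstanceCount": 1},
                        {"Name": "Core", "InstanceRole": "CORE",
                         "InstanceType": "m5.xlarge", "InstanceCount": 2},
                    ],
                },
            },
            # Keep the cluster details so later states can read the ClusterId.
            "ResultPath": "$.Cluster",
            "Next": "RunSparkJob",
        },
        "RunSparkJob": {
            "Type": "Task",
            "Resource": EMR + "addStep.sync",
            "Parameters": {
                "ClusterId.$": "$.Cluster.ClusterId",
                "Step": {
                    "Name": "example-spark-step",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        # Hypothetical script location.
                        "Args": ["spark-submit", "s3://example-bucket/job.py"],
                    },
                },
            },
            "ResultPath": "$.Step",
            "Next": "TerminateCluster",
        },
        "TerminateCluster": {
            "Type": "Task",
            "Resource": EMR + "terminateCluster",
            "Parameters": {"ClusterId.$": "$.Cluster.ClusterId"},
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

To reuse an existing cluster instead, the `CreateCluster` and `TerminateCluster` states could be dropped and the cluster ID passed in as execution input. Changing termination protection or cancelling a step would follow the same pattern with the `setClusterTerminationProtection` and `cancelStep` resources.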

The offering can also modify the size of an EMR cluster instance fleet or group, allowing users to manage scaling programmatically depending on the requirements of each step of the workflow. For example, the user may increase the size of an instance group before adding a compute-intensive step, and reduce the size after it has completed.
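The scale-up, run, scale-down pattern described here might look like the following sketch, which assumes the `modifyInstanceGroupByName` integration resource. The group name, instance counts, step details, and the expectation that the execution input carries a `ClusterId` field are all hypothetical.

```python
import json

EMR = "arn:aws:states:::elasticmapreduce:"


def scale_state(group_name, count, next_state=None):
    """Build a Task state that resizes an instance group by name."""
    state = {
        "Type": "Task",
        "Resource": EMR + "modifyInstanceGroupByName",
        "Parameters": {
            "ClusterId.$": "$.ClusterId",
            "InstanceGroupName": group_name,
            "InstanceGroup": {"InstanceCount": count},
        },
        # JSON null: discard the result and pass the state input through.
        "ResultPath": None,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


definition = {
    "StartAt": "ScaleUp",
    "States": {
        "ScaleUp": scale_state("Core", 10, "HeavyStep"),
        "HeavyStep": {
            "Type": "Task",
            "Resource": EMR + "addStep.sync",
            "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                    "Name": "compute-intensive-step",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/heavy.py"],
                    },
                },
            },
            "ResultPath": "$.Step",
            "Next": "ScaleDown",
        },
        "ScaleDown": scale_state("Core", 2),
    },
}

print(json.dumps(definition, indent=2))
```

One caveat with this sketch: the resize state is assumed to return once the request is accepted rather than when the new instances are ready, so a real workflow may need a wait or polling state before the heavy step.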

When creating or terminating a cluster, or adding an EMR step to a cluster, users can use synchronous integrations so that the workflow moves to the next step only when the corresponding activity has completed on the EMR cluster.
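In a state machine definition, a synchronous integration is selected by appending `.sync` to the task's resource name. The helper below is a small sketch of that convention; the function name and the set of sync-capable actions (taken from the three operations named above) are assumptions for illustration.

```python
EMR_PREFIX = "arn:aws:states:::elasticmapreduce:"

# Actions the article describes as supporting synchronous integration.
SYNC_CAPABLE = {"createCluster", "terminateCluster", "addStep"}


def emr_resource(action, wait=False):
    """Return the Task resource string, with .sync when the workflow should wait."""
    if wait and action not in SYNC_CAPABLE:
        raise ValueError(f"{action} has no synchronous variant in this sketch")
    return EMR_PREFIX + action + (".sync" if wait else "")


print(emr_resource("addStep", wait=True))
print(emr_resource("terminateCluster"))
```

Without the suffix, the workflow moves on as soon as EMR accepts the request; with it, the state completes only after the cluster or step has finished.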
