Imagine you have a service built on a serverless architecture: several processes are split into Lambdas that make up a Step Function, and you need to know each time a particular Lambda is invoked or fails, or when an entire state machine fails. If you have this need or something similar, this post is for you.
In this article you will learn the flow to follow when you want to implement an alarm, and you will see two examples and the small differences between them. Along the way we'll get to know CloudWatch so you can use its tools to monitor your AWS resources effectively.
Okay, but where do we start?
🧠 Step 1: Identifying our resource to be monitored
The first step is to navigate the sea of possible metrics in CloudWatch. From the AWS console, go to CloudWatch, find the Metrics section in the navigation panel, and then choose All metrics.

You may now feel overwhelmed by the number of options. To continue with the example we will look for specific metrics, but as the following image shows, practically any resource can be monitored: API calls in API Gateway, the capacity of a DynamoDB table, or the number of messages published through SNS.

🧠 Step 2: Choosing metrics
This time you will work with the two most common types of metrics and their options, and you will see small differences that, unfortunately, are not spelled out in the official AWS documentation.
Search for a Lambda metric and a States metric for a state machine.
Lambda metrics: Here you will see three grouping options; the most effective way to monitor a single Lambda is By Function Name. The available metrics are: Duration, Errors, Invocations, Throttles, ConcurrentExecutions and UnreservedConcurrentExecutions. We will use Invocations and Errors in this example.

State metrics: As with Lambdas, state machine metrics can be grouped in several ways; for this example you'll use the Execution metrics. These are: ExecutionsSucceeded, ExecutionsFailed, ExecutionsAborted, ExecutionThrottled, ExecutionsTimedOut, ExecutionsStarted and ExecutionTime. You will use ExecutionsFailed in this example.

💬 Important facts about metrics:
Let me clarify a couple of important facts about metrics.
Metrics can be viewed and monitored through a graph in the AWS console, either by selecting them in Metrics/All metrics or by creating a custom panel/dashboard for one or more metrics. We'll see examples of how to create dashboards to hold your graphs and keep everything tidy and organized.
However, there is another element built on metrics that we will use in this post: alarms. Together with the SNS service, alarms save us from constantly reviewing the graphs in the console. Instead, we can have an email sent to us when a resource fails or exceeds a certain limit.
🧠 Step 3: How to define a metric alarm
Let's see how an alarm on a metric is defined. According to the official documentation there are many possible fields, some required and others optional; you can read more here: https://docs.aws.amazon.com/es_es/AWSCloudFormation/latest/UserGuide/aws-properties-cw-alarm.html. But I'll give you an example below:
💡 This is an example of how two alarms can be defined for a Lambda: one that alerts each time the function is invoked and another each time it fails.
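As a minimal sketch of what those two alarm definitions could look like, the Python below builds the two `AWS::CloudWatch::Alarm` resources as plain dictionaries and prints them as a CloudFormation-style JSON fragment. The function name, topic ARN, logical resource names and the threshold settings are hypothetical placeholders, not values taken from the original project.

```python
import json


def lambda_alarm(function_name, metric_name, topic_arn):
    """Build an AWS::CloudWatch::Alarm resource for one Lambda metric."""
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmName": f"{function_name}-{metric_name}",
            "Namespace": "AWS/Lambda",
            "MetricName": metric_name,
            # Lambda metrics are addressed by function name.
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "Statistic": "Sum",
            "Period": 60,  # seconds
            "EvaluationPeriods": 1,
            "Threshold": 1,
            # Fire as soon as at least one data point is counted in the period.
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",
            "AlarmActions": [topic_arn],
        },
    }


# Hypothetical values, replace with your own function name and topic ARN.
FUNCTION_NAME = "my-service-dev-processOrder"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-service-alarms"

resources = {
    "InvocationsAlarm": lambda_alarm(FUNCTION_NAME, "Invocations", TOPIC_ARN),
    "ErrorsAlarm": lambda_alarm(FUNCTION_NAME, "Errors", TOPIC_ARN),
}

print(json.dumps({"Resources": resources}, indent=2))
```

The only thing that changes between the two alarms is `MetricName`, which is why a small helper is enough here.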
💡 This example defines an alarm for a StateMachine that sends an email when an execution fails.
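A state machine alarm of this kind might look roughly like the sketch below, again written as a Python dictionary that mirrors the CloudFormation resource. The ARNs and the alarm name are hypothetical, and the threshold values are assumptions chosen so that a single failed execution triggers the alarm.

```python
import json


def state_machine_failed_alarm(state_machine_arn, topic_arn):
    """Build an alarm that fires when a state machine execution fails."""
    # The last ARN segment is the state machine name; used only for the alarm name.
    machine_name = state_machine_arn.rsplit(":", 1)[-1]
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmName": f"{machine_name}-ExecutionsFailed",
            "Namespace": "AWS/States",
            "MetricName": "ExecutionsFailed",
            # Step Functions metrics are addressed by ARN, not by name.
            "Dimensions": [
                {"Name": "StateMachineArn", "Value": state_machine_arn}
            ],
            "Statistic": "Sum",
            "Period": 60,  # seconds
            "EvaluationPeriods": 1,
            "Threshold": 1,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",
            "AlarmActions": [topic_arn],
        },
    }


# Hypothetical ARNs, replace with the ones from your own account.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-service-alarms"

alarm = state_machine_failed_alarm(STATE_MACHINE_ARN, TOPIC_ARN)
print(json.dumps({"Resources": {"OrderFlowFailedAlarm": alarm}}, indent=2))
```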
💬 Small Differences
First, keep in mind that the metric name is what identifies the metric; the remaining fields define over what time window and in what form the metric is evaluated and displayed on the graph.
For example, the Period field is the length of time, in seconds, over which the statistic is applied. It is required for a metric-based alarm. Valid values are 10, 30, 60, and any multiple of 60.
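To catch a bad value before deploying, the rule quoted above can be expressed as a small check. This is only a sketch of that stated rule; `is_valid_period` is a hypothetical helper, not part of any AWS SDK.

```python
def is_valid_period(seconds):
    """Return True if `seconds` is 10, 30, 60 or any multiple of 60."""
    # Reject non-integers (bool is a subclass of int, so exclude it explicitly).
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        return False
    return seconds in (10, 30) or (seconds >= 60 and seconds % 60 == 0)


for candidate in (10, 30, 45, 60, 90, 300):
    print(candidate, is_valid_period(candidate))
```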
✅ To learn more about each field in the example: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-cw-alarm.html
The Dimensions field defines which resource the alarm is attached to and how it is identified (by name or by ARN).
This is the difference between defining an alarm for a Lambda and for a StateMachine: the Lambda is identified by its function name, and the StateMachine by its ARN.
✅ Regarding Name and Value: when the alarm targets a resource by name, the Name is the dimension key published by the service (FunctionName for Lambda) and the Value must be the exact name of the resource, in this case the Lambda. When an ARN is used as the Value, the Name is the resource type plus Arn, in this case StateMachineArn. This rule may vary, so if you want to be sure, create the alarm directly in the console as a test and then, under Actions, choose View source.

This technique is very useful when you don't know the service well or are not sure how to define the alarm. The option opens a window like the following, where you can see the correct, automatically generated definition.

The AlarmActions field defines which service or action is triggered when the alarm condition is met (in this case, a notification is published to an SNS topic).
Second, the dashboard has its own way of being defined, usually in a dashboard.js file together with a .json widget definition. A dashboard can hold N widgets, each reflecting a different alarm. Let's look at an example:
📌 Fact: You can view the metric information by selecting it in the panel and going to the Source tab.
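As a rough illustration of a dashboard with two widgets, the sketch below builds the dashboard body and wraps it in an `AWS::CloudWatch::Dashboard` resource. It uses Python dictionaries rather than the dashboard.js and .json files mentioned above, and the region, dashboard name, function name and state machine ARN are all hypothetical.

```python
import json


def metric_widget(title, namespace, metric, dim_name, dim_value, x, y):
    """Build one dashboard widget that graphs a single metric."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,  # the dashboard grid is 24 units wide
        "height": 6,
        "properties": {
            "title": title,
            "region": "us-east-1",  # hypothetical region
            "stat": "Sum",
            "period": 60,
            # Each entry: [namespace, metric name, dimension name, dimension value]
            "metrics": [[namespace, metric, dim_name, dim_value]],
        },
    }


# Hypothetical resource identifiers.
FUNCTION_NAME = "my-service-dev-processOrder"
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow"

dashboard_body = {
    "widgets": [
        metric_widget("Lambda errors", "AWS/Lambda", "Errors",
                      "FunctionName", FUNCTION_NAME, x=0, y=0),
        metric_widget("Failed executions", "AWS/States", "ExecutionsFailed",
                      "StateMachineArn", STATE_MACHINE_ARN, x=12, y=0),
    ]
}

dashboard_resource = {
    "Type": "AWS::CloudWatch::Dashboard",
    "Properties": {
        "DashboardName": "my-service-dashboard",
        # CloudFormation expects the dashboard body as a JSON string.
        "DashboardBody": json.dumps(dashboard_body),
    },
}

print(json.dumps(dashboard_resource, indent=2))
```

Placing the second widget at `x=12` puts the two graphs side by side on the same row.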

The panels/dashboards look like this:

Perfect, now you know how to define an alarm and what path to follow each time you want to monitor a resource in AWS. Since every bit of help counts, I prepared a repository with an example structure, where you can see how to reference the alarms from serverless.yml and how to manage environment variables, such as the email address or ARNs, across different environments.
https://github.com/JazminTrujilloEyzaguirre/CloudWatch_ProjectExample
✅ One last tip: Alarms can be defined in the same project as your resources (your application with its Lambdas or Step Functions), or in a separate project just for alarms, since resources are referenced by name or ARN. This keeps things organized and avoids overloading your projects.
Finally, these alarms create and integrate with the SNS service, which lets you add the email addresses that will receive alarm notifications. You can review them in the SNS console.
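One possible way to declare that topic and its email subscription alongside the alarms is sketched below as an `AWS::SNS::Topic` resource. The topic name, logical ID and email address are hypothetical; an alarm in the same template can then point its AlarmActions at the topic with a `Ref`.

```python
import json


def alarm_topic(topic_name, emails):
    """Build an AWS::SNS::Topic with one email subscription per address."""
    return {
        "Type": "AWS::SNS::Topic",
        "Properties": {
            "TopicName": topic_name,
            "Subscription": [
                {"Protocol": "email", "Endpoint": address} for address in emails
            ],
        },
    }


# Hypothetical values, typically injected per environment.
ALERT_EMAIL = "alerts@example.com"

resources = {"AlarmTopic": alarm_topic("my-service-alarms", [ALERT_EMAIL])}

# Ref on an SNS topic resolves to its ARN, which is what AlarmActions expects.
alarm_actions = [{"Ref": "AlarmTopic"}]

print(json.dumps({"Resources": resources}, indent=2))
print(json.dumps({"AlarmActions": alarm_actions}))
```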

Under Topics you can see all the topics that have been created.

Once you have located the topic, make sure the email address you added appears in the Subscriptions section. To receive notifications, you must first confirm the subscription from an email similar to this one:

The notifications you should expect when your alarm goes off look like this:

Each resource handles metric alarms in its own way, even if they look similar at first glance. I hope this example helped and that you now know where to start if you have a particular need. If you still have doubts, write to us! We can't wait to help you.