Troubleshooting actions on running scopes

There are situations where you want to perform a specific action on a running scope. For example, if you see unusual behavior in production, you might want to kill a specific instance to check whether the problem is tied to that instance. You could make a new deployment instead, but that can be time-consuming and is often more than the situation calls for.

These actions can be triggered through the Performance view in the UI. Any change they make is not persisted, which means that if you make a new deployment, those changes will be lost.

Use case scenario

Let’s suppose you have a traffic spike in production. Through the Performance view, you can quickly change the scaling behavior by adding more instances to the underlying Autoscaling Group. If you do a deployment, the changes you previously made will be overridden by the scope configuration.

For example, consider a scope with scaling enabled and a min/max range of 2 and 5 instances, respectively. If there’s a traffic spike, you can quickly set the number of instances to 10 to handle the increase. On the next deployment, the instance count returns to the range of 2 to 5, because that's what the scope configuration specifies. These troubleshooting actions are meant for acting quickly on a running scope without needing a deployment.

Currently supported actions

We have a set of actions you can perform on a running scope. If there are other actions you’d find useful, please reach out! Currently, these actions are:

Instance Kill

This action terminates an instance or pod. You can kill a specific instance or a group of them at once. Use this action with care, as it can cause service degradation.

Autoscaling Fix

This is a quick way to scale your infrastructure by adding or removing instances/pods, depending on the desired number you set. Along with this number, you can also specify whether to leave autoscaling enabled or not.

Autoscaling Stop

If a scope has autoscaling enabled, you can stop it. This is useful if the scaling rules you previously set were too aggressive and you want to avoid further scaling.

Autoscaling Start

If a scope has autoscaling enabled and was previously stopped, you can enable it again.

Setup

This setup applies only to instance_kill on server-based scopes; the other actions require no further setup.

To kill an instance, nullplatform needs permission to terminate it. Granting that permission takes two steps:

1. Create a new policy
2. Attach the policy to the role we assume

New Policy

From the AWS Management Console, in IAM > Policies, click "Create policy" and select the JSON tab. Then add the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:TerminateInstances",
            "Resource": "*"
        }
    ]
}

After setting this JSON, name the policy np-troubleshooting-manager. Any new action you want to add in the future can be added to this policy.

Attach Policy to Role

From the AWS Management Console, in IAM > Roles, click the null-scope-and-deploy-manager role. Click "Attach policies" and search for the policy you just created. Attach it to the role.

That's it. Now you're ready to use the troubleshooting actions.

API Examples

If you need to create a troubleshooting action without going through the UI, you can use the API.

Please refer to our API reference for more details about usage.

Creating an Action

`instance_kill`

You can kill a single instance/pod or several at once. The instance_id parameter is used for both server-based scopes and K8s-based scopes.

For server-based scopes, you need the instance ID, which you can find in the Performance view, in the EC2 Dashboard, or with the AWS CLI describe-instances command.

For K8s-based scopes, you need the pod name, which you can find in the Performance view, in the EKS Dashboard (or equivalent), or with the kubectl get pods command.

POST /scope/<SCOPE_ID>/action
{
    "name": "instance_kill",
    "parameters": {
        "instance_id": "i-abcdef123456"
    }
}

POST /scope/<SCOPE_ID>/action
{
    "name": "instance_kill",
    "parameters": {
        "instance_id": [
            "i-abcdef123456",
            "i-defghi654321"    
        ]
    }
}
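The two request bodies above differ only in whether instance_id is a single string or a list. The following is a minimal Python sketch of building either form. The helper name build_instance_kill and the BASE_URL value are hypothetical, and authentication headers are left out because this page does not describe them; check the API reference for both. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# Hypothetical host, used only for illustration; take the real one
# from the API reference.
BASE_URL = "https://api.example.com"


def build_instance_kill(scope_id, instance_ids):
    """Build (but do not send) a POST /scope/<SCOPE_ID>/action request."""
    # One ID is sent as a string and several IDs as a list,
    # mirroring the two request bodies shown above.
    if isinstance(instance_ids, str):
        value = instance_ids
    else:
        value = list(instance_ids)
    body = {"name": "instance_kill", "parameters": {"instance_id": value}}
    return urllib.request.Request(
        url=f"{BASE_URL}/scope/{scope_id}/action",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_instance_kill(1234, ["i-abcdef123456", "i-defghi654321"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing a plain string such as "i-abcdef123456" instead of a list produces the single-instance body.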

`autoscaling_fix`

With this action, you set the desired number of instances/pods. Under the hood, if you set 10 instances and you currently have 5, nullplatform will add 5 more instances. If you set 5 instances and you currently have 10, nullplatform will remove 5 instances.

Along with the desired number of instances, you can also specify whether to leave autoscaling enabled (in case the scope previously had scaling enabled through the scope details).

POST /scope/<SCOPE_ID>/action
{
    "name": "autoscaling_fix",
    "parameters": {
        "desired_instances": 10,
        "autoscaling_enabled": false
    }
}
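The add/remove arithmetic described for autoscaling_fix can be sketched in a few lines of Python. This only illustrates the calculation from the text; it is not nullplatform's implementation, and the function names instance_delta and describe_fix are hypothetical.

```python
def instance_delta(current: int, desired: int) -> int:
    """Positive result: instances to add. Negative: instances to remove."""
    if current < 0 or desired < 0:
        raise ValueError("instance counts cannot be negative")
    return desired - current


def describe_fix(current: int, desired: int) -> str:
    """Describe in words what reaching the desired count involves."""
    delta = instance_delta(current, desired)
    if delta > 0:
        return f"add {delta} instances"
    if delta < 0:
        return f"remove {-delta} instances"
    return "no change"


print(describe_fix(5, 10))   # add 5 instances
print(describe_fix(10, 5))   # remove 5 instances
```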

`autoscaling_stop`

If the scope has autoscaling enabled, you can stop it. This is useful if the scaling rules you previously set were too aggressive and you want to avoid further scaling.

POST /scope/<SCOPE_ID>/action
{
    "name": "autoscaling_stop",
    "parameters": {}
}

`autoscaling_start`

If the scope has autoscaling enabled and was previously stopped, you can enable it again.

POST /scope/<SCOPE_ID>/action
{
    "name": "autoscaling_start",
    "parameters": {}
}

Reading an Action

GET /scope/<SCOPE_ID>/action/<ACTION_ID>
{
    "id": 5678,
    "status": "failed",
    "entity": "scope",
    "entity_id": 1234,
    "name": "instance_kill",
    "parameters": {
        "instance_id": "i-abcdef123456"
    },
    "messages": [
        {
            "level": "ERROR",
            "message": "Instance id i-abcdef123456 not found"
        }
    ],
    "created_at": "2024-09-26T19:24:02.714Z",
    "updated_at": "2024-09-26T19:24:05.801Z",
    "nrn": "organization=1:account=2:namespace=3:application=4:scope=1234"
}
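When reading an action, the status and messages fields tell you whether it succeeded and why it may not have. Here is a minimal Python sketch of pulling the error text out of a response shaped like the one above. The helper name error_messages is hypothetical, SAMPLE is an abbreviated copy of that response, and "failed" is the only status value this page shows, so handling of other values is left out.

```python
import json

# Abbreviated copy of the sample response above.
SAMPLE = """
{
    "id": 5678,
    "status": "failed",
    "name": "instance_kill",
    "parameters": {"instance_id": "i-abcdef123456"},
    "messages": [
        {"level": "ERROR", "message": "Instance id i-abcdef123456 not found"}
    ]
}
"""


def error_messages(action: dict) -> list:
    """Return the text of every ERROR-level message in an action."""
    return [
        entry["message"]
        for entry in action.get("messages", [])
        if entry.get("level") == "ERROR"
    ]


action = json.loads(SAMPLE)
if action["status"] == "failed":
    for text in error_messages(action):
        print(f"Action {action['id']} ({action['name']}) failed: {text}")
```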