Retry timeout rule

While you use Kubernetes Engine, the system can time out during two processes: provisioning and scaling.

1. Provision timeout

During the provisioning of a Kubernetes Engine cluster, issues may arise that lead to a timeout, calculated according to the table below:

Provisioning either succeeds or fails. When it fails, one of two situations applies:

Provisioning succeeded but the status sync was lost:
- The Kubernetes cluster was actually created successfully.
- When you select Retry, the system re-syncs the data and does not re-run the provision process.
Provisioning failed due to a processing error:
- If status = failed, a Retry option is displayed so the user can try again.
- If status = provisioning, the system starts counting time from when the request was submitted.
  - After 20 minutes with no status change, the system transitions status to slowing.
  - After 20 minutes in slowing status with no change, the system transitions status to pending.
  - After 40 minutes in pending status with no change, the system transitions status to error.
  - When status = error, the user can select Retry. The system resets the counter and restarts the provisioning process from the beginning.
- Total time from the start of provisioning to full timeout: 1 hour 20 minutes.
- When status = error, the user can retry.
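
The escalation rules above can be sketched as a small lookup. This is only an illustration of the timings listed in this section, not the service's actual implementation; the names `PROVISION_STAGES` and `provision_status` are hypothetical.

```python
from datetime import timedelta

# (status, time allowed in that status before escalating), as listed above.
PROVISION_STAGES = [
    ("provisioning", timedelta(minutes=20)),
    ("slowing", timedelta(minutes=20)),
    ("pending", timedelta(minutes=40)),
]


def provision_status(elapsed: timedelta) -> str:
    """Return the status after `elapsed` time with no status change.

    `elapsed` is measured from when the provision request was submitted.
    """
    boundary = timedelta(0)
    for status, allowed in PROVISION_STAGES:
        boundary += allowed
        if elapsed < boundary:
            return status
    # All stages exhausted: the user can now select Retry.
    return "error"


# The stage durations add up to the documented full timeout.
TOTAL_PROVISION_TIMEOUT = sum(
    (allowed for _, allowed in PROVISION_STAGES), timedelta(0)
)
```

For example, `provision_status(timedelta(minutes=45))` falls in the pending stage, and `TOTAL_PROVISION_TIMEOUT` equals 1 hour 20 minutes.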

2. Scaling timeout

After a Kubernetes cluster is created successfully, the core processor scales nodes up or down during autoscaling or manual scaling.

Scaling either succeeds or fails. When it fails, one of two situations applies:

Scaling succeeded but the status sync was lost:
- The Kubernetes cluster actually has the additional workers that were requested.
- When you select Retry, the system re-syncs the data and does not re-run the scaling process.
Scaling failed due to a processing error:
- If status = failed, a Retry option is displayed so the user can try again.
- If status = processing, the system starts counting time from when the request was submitted.
  - After 10 minutes (adjusted for the number of workers being added) with no change to failed or success, the system transitions status to slowing.
  - After 20 minutes in slowing status with no change, the system transitions status to pending.
  - After 30 minutes in pending status with no change, the system transitions status to error.
  - When status = error, the user can select Retry. The system resets the counter and restarts the scaling process from the beginning.
- Total time from the start of scaling to full timeout: 1 hour (longer when more than 5 new workers are added).

When the cluster reaches error status, the user can select Retry.
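
The scaling timeline can be sketched in the same way. This section does not state how the first stage is adjusted for the number of workers, so the sketch below takes that duration as a parameter (`first_stage`, defaulting to the documented 10 minutes) rather than guessing a formula. All function names are hypothetical.

```python
from datetime import timedelta

DEFAULT_FIRST_STAGE = timedelta(minutes=10)


def scaling_stages(first_stage: timedelta = DEFAULT_FIRST_STAGE):
    """Return (status, time allowed in that status) pairs for scaling.

    `first_stage` is an assumption: the adjustment rule for larger
    scale-ups is not specified here, so the caller supplies the value.
    """
    return [
        ("processing", first_stage),
        ("slowing", timedelta(minutes=20)),
        ("pending", timedelta(minutes=30)),
    ]


def scaling_status(
    elapsed: timedelta, first_stage: timedelta = DEFAULT_FIRST_STAGE
) -> str:
    """Return the status after `elapsed` time with no status change."""
    boundary = timedelta(0)
    for status, allowed in scaling_stages(first_stage):
        boundary += allowed
        if elapsed < boundary:
            return status
    # All stages exhausted: the user can now select Retry.
    return "error"


def total_timeout(first_stage: timedelta = DEFAULT_FIRST_STAGE) -> timedelta:
    """Sum the stage durations to get the full timeout."""
    return sum(
        (allowed for _, allowed in scaling_stages(first_stage)), timedelta(0)
    )
```

With the default first stage, `total_timeout()` is 1 hour. Passing a longer `first_stage` shows how the full timeout grows for larger scale-ups: a purely illustrative 15-minute first stage gives 1 hour 5 minutes.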