Skip to main content

Retry timeout rule

During use of Kubernetes Engine, the system may time out in the following 2 processes:

  • Provision
  • Scaling

1. Timeout Provision

During the provisioning of a Kubernetes Engine cluster, issues may arise that lead to a timeout, calculated according to the table below:

Status changeCondition
Provisioning => Slowing20 min
Slowing => Pending20 min
Pending => Error (timeout)40 min

When provisioning, one of two outcomes occurs:

  • Provision succeeds
  • Provision fails

When provision fails, two situations may occur:

  • Provision succeeded but status sync was lost:

    • The Kubernetes cluster was actually created successfully.
    • When you select Retry, the system re-syncs the data and does not re-run the provision process.
  • Provision failed due to a processing error:

    • If status = failed, a Retry option is displayed so the user can try again.
    • If status = provisioning, the system starts counting time from when the request was submitted.
      • After 20 minutes with no status change, the system transitions status to slowing.
      • After 20 minutes in slowing status with no change, the system transitions status to pending.
      • After 40 minutes in pending status with no change, the system transitions status to error.
      • When status = error, the user can Retry. The system resets the counter and restarts the provision process from the beginning.
    • Total time from the start of provisioning to full timeout: 1 hour 20 minutes.
    • When status = error, the user can retry.

2. Timeout Scaling

After a Kubernetes cluster is successfully created, during autoscaling or manual scaling, the core processor scales nodes up or down:

Status change1 <= workers added < 5workers added > 5
Running => Slowing10 min10 min + (workers added - 5) × 3 min
Slowing => Pending20 min20 min + (workers added - 5) × 3 min
Pending => Error (timeout)30 min30 min + (workers added - 5) × 3 min

When scaling, one of two outcomes occurs:

  • Scaling succeeds
  • Scaling fails

When scaling fails, two situations may occur:

  • Scaling succeeded but status sync was lost:

    • The Kubernetes cluster actually has the additional workers as requested.
    • When you select Retry, the system re-syncs the data and does not re-run the scaling process.
  • Scaling failed due to a processing error:

    • If status = failed, a Retry option is displayed so the user can try again.
    • If status = processing, the system starts counting time from when the request was submitted.
      • After 10 minutes (adjusted by number of workers added) with no status change (failed/success), the system transitions status to slowing.
      • After 20 minutes in slowing status with no change, the system transitions status to pending.
      • After 30 minutes in pending status with no change, the system transitions status to error.
      • When status = error, the user can Retry. The system resets the counter and restarts the scaling process from the beginning.
    • Total time from the start of scaling to full timeout: 1 hour (increases for more than 5 new workers).

When the cluster reaches error status, the user can select Retry.