
Google Kubernetes Engine monitoring integration

Monitor your Google Kubernetes Engine clusters, along with their nodes and applications, using Site24x7's integration.

Setup and configuration

  • Adding Google Kubernetes Engine while configuring a new Google Cloud monitor

    If you have not configured a Google Cloud monitor yet, add one by following the steps below:

    1. Log in to your Site24x7 account.
    2. Go to Cloud > GCP > Add GCP Monitor, or go to Admin > Cloud Monitoring > Google Cloud Platform (GCP).
    3. Provide a unique display name for identification purposes.
    4. Upload a service account JSON file so that Site24x7 can authenticate and perform resource discovery.
    5. Select Google Kubernetes Engine from the Select the Resources for Monitoring list.
    6. Select existing Notification Profiles, User Alert Groups, Tags, and IT Automation Templates, or add new ones. You can also integrate Site24x7's alarms with your preferred third-party service.
    7. Click Start GCP Monitoring.
  • Adding Google Kubernetes Engine to an existing Google Cloud monitor

    If you already have a Google Cloud monitor configured for the service account, you can add Google Kubernetes Engine by following the steps below:

    1. Log in to your Site24x7 account.
    2. Go to Cloud > GCP and select your GCP monitor.
    3. Click the hamburger icon next to Service View and select Edit, which brings you to the Edit GCP Monitor page.
    4. On the Edit GCP Monitor page, select Google Kubernetes Engine from the Select the Resources for Monitoring list and click Save.
    5. After successful configuration, go to Cloud > GCP > Google Kubernetes Engine. Now you can view the discovered Google Kubernetes Engine resources.
Note

It will take approximately five minutes to discover new GCP resources.
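
Before uploading the service account JSON file in the setup steps above, it can help to confirm that the file is well-formed. The following is a minimal sketch, not part of Site24x7: the function name `check_service_account_key` is hypothetical, and the field list is an assumption based on the standard layout of a Google Cloud service account key file. It only checks the file's shape and does not verify that the account has the permissions needed for discovery.

```python
import json

# Fields assumed to be present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list:
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # A user credential or other key type will not work as a service account key.
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems


# Example with an obviously incomplete (hypothetical) key.
sample = json.dumps({"type": "service_account", "project_id": "my-project"})
for problem in check_service_account_key(sample):
    print(problem)
```

To check a real file, read it with `open(path).read()` and pass the contents to the function. An empty list means the basic structure looks right.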

Polling frequency

Site24x7's Google Kubernetes Engine monitor collects metric data every five minutes and the status of your Google Kubernetes Engine resources every minute.
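
As a rough illustration of what these intervals mean in practice, the short sketch below computes how many polls occur per resource per day. The function name `polls_per_day` is hypothetical, and the only inputs are the two intervals stated above.

```python
METRIC_INTERVAL_MIN = 5  # metric data is collected every five minutes
STATUS_INTERVAL_MIN = 1  # resource status is collected every minute


def polls_per_day(interval_minutes: int) -> int:
    """Number of polls in a 24-hour day at the given interval."""
    return (24 * 60) // interval_minutes


print(polls_per_day(METRIC_INTERVAL_MIN))  # 288
print(polls_per_day(STATUS_INTERVAL_MIN))  # 1440
```

A change in a metric value can therefore take up to one full metric interval (five minutes) to appear in the collected data.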

Supported metrics

Metric name | Description | Statistic | Unit
Container CPU Usage Time | The cumulative CPU usage on all cores used by the container | Average | Seconds
Container Limit Cores | The CPU cores limit of the container | Average | Count
Container CPU Limit Utilization | The fraction of the CPU limit that is currently in use on the instance. This value cannot exceed one, as usage cannot exceed the limit. | Average | Count
Container Request Cores | The total number of CPU cores requested by the container | Total | Count
Container CPU Request Utilization | The fraction of the requested CPU that is currently in use on the instance. This value can be greater than one, as usage can exceed the request. | Average | Count
Container Ephemeral Storage Limit | The total local ephemeral storage limit | Total | Bytes
Container Ephemeral Storage Request | The total local ephemeral storage request | Average | Bytes
Container Ephemeral Storage Usage | The total local ephemeral storage usage | Total | Bytes
Container Memory Limit | The total memory limit of the container | Total | Bytes
Container Memory Limit Utilization | The fraction of the memory limit that is currently in use on the instance. This value cannot exceed one, as usage cannot exceed the limit. | Average | Count
Container Page Faults | The total number of page faults, broken down by type | Total | Count
Container Memory Request | The total memory request of the container | Total | Bytes
Container Memory Request Utilization | The fraction of the requested memory that is currently in use on the instance. This value can be greater than one, as usage can exceed the request. | Average | Count
Container Memory Usage | The total memory usage | Total | Bytes
Container Restart Count | The total number of times the container has restarted | Total | Count
Container Uptime | The average time in seconds that the container has been running | Average | Seconds
Container Accelerator Duty Cycle | The percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Values are integers between 0 and 100. | Total | Percentage
Container Accelerator Memory Total | The total accelerator memory | Total | Bytes
Container Accelerator Memory Used | The total accelerator memory currently allocated | Total | Bytes
Container Request Accelerators | The total number of accelerator devices requested by the container | Total | Count
Overall GPU Utilization | The percentage of GPU resources currently being used | Average | Percentage
Overall GPU Temperature Model | The average temperature across all GPUs in the system | Average | Celsius
GPU Utilization | The specific utilization rate of an individual GPU | Average | Percentage
FrameBuffer Memory Utilization | The percentage of framebuffer memory currently in use | Average | Bytes
GPU Temperature Model | The average temperature of an individual GPU | Average | Celsius
SM Clock Speed | The clock speed of the streaming multiprocessor within the GPU | Average | MHz
Memory Temperature Model | The average temperature of the memory modules in the system | Average | Celsius
Total Power Usage Model | The total power consumption of the GPU | Total | Watts
Memory Clock Speed | The clock speed of the GPU's memory | Average | MHz
Graphic Engine Active | The percentage of time the graphics engine is actively processing tasks | Average | Percentage
Memory Bandwidth Utilization | The rate at which data is being transferred within the GPU memory | Average | Bytes
SM Utilization | The utilization rate of the streaming multiprocessor in the GPU | Average | Percentage
Tensor Utilization | The utilization rate of tensor cores within the GPU | Average | Percentage
FP64 Utilization | The utilization rate of 64-bit floating point operations within the GPU | Average | Percentage
FP32 Utilization | The utilization rate of 32-bit floating point operations within the GPU | Average | Percentage
FP16 Utilization | The utilization rate of 16-bit floating point operations within the GPU | Average | Percentage
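
The difference between the limit utilization and request utilization metrics is easiest to see with numbers. The sketch below assumes each utilization is simply usage divided by the configured limit or request, which is how the descriptions above read. The function name `utilization` and the sample values are hypothetical.

```python
def utilization(usage: float, reference: float) -> float:
    """Fraction of a configured request or limit that is currently in use."""
    if reference <= 0:
        raise ValueError("request or limit must be greater than zero")
    return usage / reference


# Hypothetical container: requests 0.5 cores, limited to 1.0 core, using 0.75 cores.
usage_cores = 0.75
request_cores = 0.5
limit_cores = 1.0

print(utilization(usage_cores, request_cores))  # 1.5  (usage exceeds the request)
print(utilization(usage_cores, limit_cores))    # 0.75 (usage stays within the limit)
```

A request utilization above one is not an error by itself. It indicates that the container is consuming more than it asked for, which is a useful signal when choosing threshold values.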

Threshold configuration

  • Global configuration
    1. In the Site24x7 web client, go to the Admin section on the left navigation pane.
    2. Select Configuration Profiles from the left pane and select Threshold and Availability from the drop-down menu.
    3. Click Add Threshold Profile in the top-right corner.
    4. For Monitor Type, select Google Kubernetes Engine.
    5. Now you can set the threshold values for the metrics listed above.
  • Monitor-level configuration
    1. In the Site24x7 web client, go to Cloud > GCP > Google Kubernetes Engine.
    2. Select the resource you would like to set a threshold for, then click the hamburger icon.
    3. Select Edit, which directs you to the Edit Google Kubernetes Engine Monitor page.
    4. You can set the threshold values for the metrics with the Threshold and Availability option.
    5. You can also configure IT Automation at the attribute level.

IT Automation

With Site24x7's IT Automation tools, you can streamline your operations, reduce manual effort, and proactively address performance issues. The alarm engine in Site24x7 continually evaluates system events based on the thresholds you have set. When a breach occurs, the mapped automation associated with that event is triggered, ensuring prompt remediation and minimizing the impact on your IT infrastructure.
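
The evaluate-then-trigger flow described above can be pictured with a small sketch. This is only an illustration of the general pattern, not Site24x7's alarm engine: the `Threshold` class, the `evaluate` function, and the `restart_workload` action are all hypothetical names, and the comparison is a simple greater-than check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Threshold:
    metric: str                          # metric name to watch
    limit: float                         # value above which the threshold is breached
    action: Callable[[str, float], str]  # hypothetical automation mapped to the breach


def evaluate(sample: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Run the mapped action for every threshold that the sample breaches."""
    results = []
    for threshold in thresholds:
        value = sample.get(threshold.metric)
        if value is not None and value > threshold.limit:
            results.append(threshold.action(threshold.metric, value))
    return results


def restart_workload(metric: str, value: float) -> str:
    # Stand-in for a real automation such as invoking a script or URL.
    return f"restart triggered: {metric}={value}"


thresholds = [Threshold("Container Memory Limit Utilization", 0.9, restart_workload)]
sample = {"Container Memory Limit Utilization": 0.95, "Container Restart Count": 2}
print(evaluate(sample, thresholds))
# ['restart triggered: Container Memory Limit Utilization=0.95']
```

Metrics without a configured threshold, such as the restart count in this sample, are ignored, so only mapped breaches lead to an action.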

How to configure IT Automation for a monitor

Configuration Rules

With Site24x7's Configuration Rules, you can optimize your monitoring setup, save time, and efficiently manage your monitoring resources. Associate different monitor groups or add specific tags to multiple monitors simultaneously. This eliminates the need to manually edit each monitor individually, saving you valuable time and effort.

How to add Configuration Rules

Summary

The Summary tab shows performance data over time for the metrics listed above. To view the summary:

  1. Go to Cloud > GCP > Google Kubernetes Engine.
  2. Select a resource.
  3. Click the Summary tab.

Configuration Details

The Configuration Details tab provides the configuration details of the selected Google Kubernetes Engine resource. To view the configuration details:

  1. Go to Cloud > GCP > Google Kubernetes Engine.
  2. Select a resource.
  3. Click the Configuration Details tab.

Reports

Use Site24x7's reports to get in-depth data about the various parameters of your monitored resources and to track service performance.

To view reports for a Google Kubernetes Engine resource:

  1. Go to the Reports section on the left navigation pane.
  2. Select Google Kubernetes Engine from the menu on the left.
  3. For a single selected monitor, you can view the Availability Summary Report, Performance Report, and Inventory Report. For all Google Kubernetes Engine monitors together, you can view the Summary Report, Availability Summary Report, Health Trend Report, and Performance Report.

You can also get reports from the Summary tab of the Google Kubernetes Engine monitor:

  1. Click the Summary tab.
  2. Get the Availability Summary Report of the monitor by clicking Availability.
  3. You can also find the Performance Report of the monitor by clicking any chart title.
