Introduction

Stackdriver Monitoring, now called Cloud Monitoring and part of Google Cloud's operations suite, collects metrics, events, and metadata from applications and infrastructure on Google Cloud Platform (GCP). Together with Cloud Logging and the suite's other diagnostic tools, it provides visibility into the performance, uptime, and overall health of your applications and infrastructure.

Key Concepts

  1. Metrics: Quantitative data points collected over time, such as CPU usage, memory usage, and network traffic.
  2. Dashboards: Customizable visual displays of metrics and logs.
  3. Alerts: Notifications triggered when specific conditions are met.
  4. Uptime Checks: Tests to ensure that your services are available and responsive.
  5. Service Monitoring: Defining service-level objectives (SLOs) for your services and tracking their performance and health against them.
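Each metric is stored as a set of time series. A series is identified by a metric type, such as `compute.googleapis.com/instance/cpu/utilization`, and by the monitored resource that produced it, such as `gce_instance`. Dashboards and alerts select series with a filter string. The sketch below assembles such a filter. `build_filter` is a hypothetical convenience function and is not part of any Google library.

```python
from typing import Optional


def build_filter(metric_type: str, resource_type: Optional[str] = None) -> str:
    """Hypothetical helper: build a Cloud Monitoring filter string.

    Selects time series by metric type and, optionally, by the type of
    monitored resource that produced them.
    """
    clauses = [f'metric.type="{metric_type}"']
    if resource_type:
        clauses.append(f'resource.type="{resource_type}"')
    # Combine the clauses with an explicit AND
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Prints: metric.type="compute.googleapis.com/instance/cpu/utilization" AND resource.type="gce_instance"
    print(build_filter("compute.googleapis.com/instance/cpu/utilization", "gce_instance"))
```

The same filter form appears in the dashboard and alert examples later in this module.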

Setting Up Stackdriver Monitoring

Step 1: Enable Stackdriver Monitoring

  1. Navigate to the GCP Console: Open the GCP Console at console.cloud.google.com.
  2. Select Your Project: Choose the project you want to monitor.
  3. Enable the API: Go to the API Library and enable the "Cloud Monitoring API".

Step 2: Install the Monitoring Agent

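Compute Engine reports CPU, disk I/O, and network metrics for every instance without an agent. System metrics such as memory and disk-space usage, however, require the monitoring agent. To install the legacy Stackdriver Monitoring agent on an instance, run the commands for its distribution. Google has since deprecated this agent in favor of the Ops Agent, which collects both metrics and logs.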

# For Debian/Ubuntu
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh
sudo apt-get update
sudo apt-get install stackdriver-agent
sudo service stackdriver-agent start

# For RHEL/CentOS
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh
sudo yum install stackdriver-agent
sudo service stackdriver-agent start
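Which command set applies depends on the instance's distribution family. Under the assumption that the instance has a standard `/etc/os-release` file, the sketch below picks between the two sets. `parse_os_release` and `agent_install_commands` are hypothetical helpers written for illustration. The commands they return are exactly the ones listed above.

```python
from typing import Dict, List

# The two command sets from the installation steps above.
DEBIAN_COMMANDS = [
    "curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh",
    "sudo bash add-monitoring-agent-repo.sh",
    "sudo apt-get update",
    "sudo apt-get install stackdriver-agent",
    "sudo service stackdriver-agent start",
]
RHEL_COMMANDS = [
    "curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh",
    "sudo bash add-monitoring-agent-repo.sh",
    "sudo yum install stackdriver-agent",
    "sudo service stackdriver-agent start",
]


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines (values optionally quoted) from /etc/os-release."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def agent_install_commands(os_release_text: str) -> List[str]:
    """Hypothetical helper: choose the command set for this distribution family."""
    info = parse_os_release(os_release_text)
    # ID names the distribution itself; ID_LIKE lists the families it derives from.
    families = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if families & {"debian", "ubuntu"}:
        return DEBIAN_COMMANDS
    if families & {"rhel", "centos", "fedora"}:
        return RHEL_COMMANDS
    raise ValueError(f"unsupported distribution: {info.get('ID', 'unknown')}")
```

On an instance, you could pass the contents of `/etc/os-release` to `agent_install_commands` and run the returned commands in order. Distributions outside these two families raise an error rather than guessing.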

Step 3: Configure Monitoring

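  1. Open Monitoring: In the GCP Console, navigate to Monitoring. Earlier versions of the product required you to create a Workspace at this step. Workspaces have since been replaced by metrics scopes: every project has a metrics scope that initially contains only that project.
  2. Add Resources: Resources in the project are monitored automatically, including Compute Engine instances, App Engine services, and Kubernetes clusters. To monitor resources that live in other projects, add those projects to the metrics scope.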

Creating Dashboards

Dashboards provide a visual representation of your metrics. Here's how to create a custom dashboard:

  1. Navigate to Monitoring: In the GCP Console, go to Monitoring > Dashboards.
  2. Create a Dashboard: Click "Create Dashboard" and give it a name.
  3. Add Widgets: Add widgets to display metrics, such as line charts, bar charts, and heatmaps.
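If you want to keep dashboards in version control instead of building them by hand, you can write the same kind of definition in the JSON form used by the Dashboards REST API. The sketch below produces a one-widget CPU utilization dashboard. The function name `cpu_dashboard_config` is hypothetical. The filter assumes Compute Engine instances (`gce_instance`).

```python
import json


def cpu_dashboard_config(display_name: str = "My Custom Dashboard") -> dict:
    """Sketch of a one-widget dashboard in the Dashboards REST API's JSON form."""
    return {
        "displayName": display_name,
        "gridLayout": {
            "columns": 2,
            "widgets": [
                {
                    "title": "CPU Utilization",
                    "xyChart": {
                        "dataSets": [
                            {
                                "timeSeriesQuery": {
                                    "timeSeriesFilter": {
                                        "filter": (
                                            'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                                            'AND resource.type="gce_instance"'
                                        ),
                                        # Average the 0-1 utilization gauge over 60-second windows
                                        "aggregation": {
                                            "alignmentPeriod": "60s",
                                            "perSeriesAligner": "ALIGN_MEAN",
                                        },
                                    }
                                },
                                "plotType": "LINE",
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Redirect this output to a file such as dashboard.json
    print(json.dumps(cpu_dashboard_config(), indent=2))
```

You can pass a file produced this way to `gcloud monitoring dashboards create --config-from-file=dashboard.json`.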

Example: CPU Usage Dashboard

# Example of creating a CPU utilization dashboard with the Python client library
# (pip install google-cloud-monitoring-dashboards)
from google.cloud import monitoring_dashboard_v1

project_id = "your-project-id"  # replace with your GCP project ID

client = monitoring_dashboard_v1.DashboardsServiceClient()
project_name = f"projects/{project_id}"

dashboard = monitoring_dashboard_v1.Dashboard(
    display_name="CPU Usage Dashboard",
    # Widgets belong to the layout, not to the Dashboard itself
    grid_layout=monitoring_dashboard_v1.GridLayout(
        columns=2,
        widgets=[
            monitoring_dashboard_v1.Widget(
                title="CPU Utilization",
                xy_chart=monitoring_dashboard_v1.XyChart(
                    data_sets=[
                        monitoring_dashboard_v1.XyChart.DataSet(
                            time_series_query=monitoring_dashboard_v1.TimeSeriesQuery(
                                time_series_filter=monitoring_dashboard_v1.TimeSeriesFilter(
                                    filter=(
                                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                                        'AND resource.type="gce_instance"'
                                    ),
                                    # Average the 0-1 utilization gauge over 60-second windows
                                    aggregation=monitoring_dashboard_v1.Aggregation(
                                        alignment_period={"seconds": 60},
                                        per_series_aligner=monitoring_dashboard_v1.Aggregation.Aligner.ALIGN_MEAN,
                                    ),
                                )
                            )
                        )
                    ]
                ),
            )
        ],
    ),
)

response = client.create_dashboard(
    request={"parent": project_name, "dashboard": dashboard}
)
print(f"Created dashboard: {response.name}")
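The `aggregation` setting controls how raw points become one value per alignment period before they are charted. The sketch below imitates what `ALIGN_MEAN` with a 60-second period does. It is only an illustration: the service's exact window boundaries and timestamps may differ. `align_mean` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def align_mean(points, period=60):
    """Illustration of ALIGN_MEAN: average raw (timestamp_s, value) points per window."""
    windows = defaultdict(list)
    for ts, value in points:
        # Assign each point to the fixed window [start, start + period)
        windows[ts // period * period].append(value)
    return [(start, mean(values)) for start, values in sorted(windows.items())]


if __name__ == "__main__":
    raw = [(0, 0.25), (30, 0.75), (60, 1.0), (90, 0.5), (120, 0.5)]
    print(align_mean(raw))  # [(0, 0.5), (60, 0.75), (120, 0.5)]
```

Longer alignment periods smooth out short spikes in the chart, at the cost of hiding them.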

Setting Up Alerts

Alerts notify you when specific conditions are met. Here's how to set up an alerting policy:

  1. Navigate to Monitoring: In the GCP Console, go to Monitoring > Alerting.
  2. Create a Policy: Click "Create Policy" and define the conditions for the alert.
  3. Set Notification Channels: Choose how you want to be notified, such as email, SMS, or Slack.
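A threshold condition such as "CPU utilization above 80% for 5 minutes" has two parts: a comparison and a duration that the violation must last. The sketch below approximates that logic on already-aligned points. It is an illustration under simplifying assumptions, not Cloud Monitoring's exact evaluation algorithm. `first_violation` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def first_violation(
    aligned: List[Tuple[int, float]],
    threshold: float = 0.8,
    duration: int = 300,
) -> Optional[int]:
    """Illustration of a "value > threshold for duration" condition.

    `aligned` holds (timestamp_s, value) points, already aligned (for example
    to 60-second means). Returns the first timestamp at which the value has
    been above `threshold` continuously for at least `duration` seconds,
    measured between point timestamps, or None if that never happens.
    """
    run_start = None
    for ts, value in aligned:
        if value > threshold:  # strictly greater, like COMPARISON_GT
            if run_start is None:
                run_start = ts  # a new run of violations begins
            if ts - run_start >= duration:
                return ts
        else:
            run_start = None  # any dip to or below the threshold resets the clock
    return None


if __name__ == "__main__":
    busy = [(t, 0.9) for t in range(0, 600, 60)]
    print(first_violation(busy))  # 300
```

The duration requirement keeps a brief spike from paging anyone. Only sustained load triggers the alert.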

Example: CPU Usage Alert

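# Example of creating a CPU utilization alert with the Python client library
# (pip install google-cloud-monitoring)
from google.cloud import monitoring_v3

project_id = "your-project-id"  # replace with your GCP project ID
# Full resource name of an existing notification channel (replace CHANNEL_ID)
notification_channel = f"projects/{project_id}/notificationChannels/CHANNEL_ID"

client = monitoring_v3.AlertPolicyServiceClient()
project_name = f"projects/{project_id}"

alert_policy = monitoring_v3.AlertPolicy(
    display_name="High CPU Usage Alert",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="CPU utilization above 80%",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.8,  # utilization is a fraction from 0 to 1
                duration={"seconds": 300},  # must stay above the threshold for 5 minutes
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period={"seconds": 60},
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                    )
                ],
            ),
        )
    ],
    notification_channels=[notification_channel],
)

response = client.create_alert_policy(name=project_name, alert_policy=alert_policy)
print(f"Created alert policy: {response.name}")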
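A threshold of 0.8 means 80% only when the metric is a 0-1 fraction. The `cpu/usage_time` metric reports vCPU-seconds consumed. Aligned with `ALIGN_RATE`, it becomes the number of vCPUs in use, so on a 4-vCPU instance a 0.8 threshold would fire at just 20% utilization. The `cpu/utilization` metric divides that rate by the instance's vCPU count (reported as `cpu/reserved_cores`). The sketch below is a rough illustration of that arithmetic. `align_rate` and `to_utilization` are hypothetical names, and the sample values are made up.

```python
def align_rate(deltas, period=60):
    """Rough illustration of ALIGN_RATE on a DELTA metric such as cpu/usage_time.

    Sums the CPU-seconds reported in each window and divides by the window
    length, giving vCPUs in use (CPU-seconds per second).
    """
    totals = {}
    for ts, cpu_seconds in deltas:
        start = ts // period * period
        totals[start] = totals.get(start, 0.0) + cpu_seconds
    return [(start, total / period) for start, total in sorted(totals.items())]


def to_utilization(vcpus_in_use, reserved_cores):
    """Divide by the instance's vCPU count to get a 0-1 utilization fraction."""
    return vcpus_in_use / reserved_cores


if __name__ == "__main__":
    # Hypothetical 4-vCPU instance reporting 48 CPU-seconds in each 30-second sample
    rates = align_rate([(0, 48.0), (30, 48.0)])
    print(rates)  # [(0, 1.6)]
    print(to_utilization(rates[0][1], reserved_cores=4))  # 0.4
```

Here the rate of 1.6 exceeds 0.8, so a threshold on `usage_time` would fire even though the instance is only 40% utilized.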

Practical Exercise

Exercise: Create a Custom Dashboard and Alert

  1. Create a Custom Dashboard:

    • Navigate to Monitoring > Dashboards.
    • Create a new dashboard named "My Custom Dashboard".
    • Add a widget to display CPU usage for your Compute Engine instances.
  2. Set Up an Alert:

    • Navigate to Monitoring > Alerting.
    • Create an alerting policy named "High CPU Alert".
    • Set the condition to trigger when CPU usage exceeds 80% for more than 5 minutes.
    • Configure the notification channel to send an email to your address.

Solution

  1. Custom Dashboard:

    • Follow the steps in the "Creating Dashboards" section to create a dashboard and add a CPU usage widget.
  2. Alert:

    • Follow the steps in the "Setting Up Alerts" section to create an alerting policy for high CPU usage.
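The exercise's email channel can also be created from code. The sketch below assumes the google-cloud-monitoring library and application credentials are available. `email_channel_fields` and `create_email_channel` are hypothetical helper names. The channel type `email` and its `email_address` label are the ones Cloud Monitoring uses for email channels.

```python
def email_channel_fields(address: str, display_name: str = "My email") -> dict:
    """Hypothetical helper: fields for an email notification channel."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    # proto-plus exposes the proto field "type" as "type_" in Python
    return {"type_": "email", "display_name": display_name, "labels": {"email_address": address}}


def create_email_channel(project_id: str, address: str) -> str:
    """Create the channel and return its full resource name (needs credentials)."""
    # Imported lazily so email_channel_fields works without the library installed
    from google.cloud import monitoring_v3

    client = monitoring_v3.NotificationChannelServiceClient()
    channel = monitoring_v3.NotificationChannel(**email_channel_fields(address))
    created = client.create_notification_channel(
        name=f"projects/{project_id}", notification_channel=channel
    )
    # Form: projects/PROJECT_ID/notificationChannels/CHANNEL_ID
    return created.name
```

The returned resource name is the value that belongs in an alert policy's `notification_channels` list.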

Conclusion

Stackdriver Monitoring is an essential tool for maintaining the health and performance of your applications and infrastructure on GCP. By setting up dashboards and alerts, you can proactively monitor and respond to issues, ensuring high availability and reliability for your services. In the next module, we will explore Cloud Deployment Manager and how to automate the deployment of your GCP resources.

© Copyright 2024. All rights reserved