Introduction
Stackdriver Monitoring, now called Cloud Monitoring and part of Google Cloud's operations suite, collects metrics, events, and metadata from applications and infrastructure on Google Cloud Platform (GCP). It gives you visibility into the performance, uptime, and overall health of your applications and infrastructure. Log collection and search are handled by the companion Cloud Logging service.
Key Concepts
- Metrics: Quantitative data points collected over time, such as CPU usage, memory usage, and network traffic.
- Dashboards: Customizable visual displays of metrics and logs.
- Alerts: Notifications triggered when specific conditions are met.
- Uptime Checks: Tests to ensure that your services are available and responsive.
- Service Monitoring: Monitoring the performance and health of your services.
Setting Up Stackdriver Monitoring
Step 1: Enable Stackdriver Monitoring
- Navigate to the GCP Console: Open the GCP Console at console.cloud.google.com.
- Select Your Project: Choose the project you want to monitor.
- Enable the API: Go to the API Library and enable the "Cloud Monitoring API".
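The API can also be enabled from a terminal with `gcloud services enable monitoring.googleapis.com`. The following is a minimal sketch that wraps that command in Python, assuming the gcloud CLI is installed and authenticated. The function names `enable_api_command` and `enable_api` and the project ID are illustrative, not part of any Google library.

```python
import shlex
import subprocess
from typing import List

MONITORING_API = "monitoring.googleapis.com"


def enable_api_command(project_id: str, service: str = MONITORING_API) -> List[str]:
    """Build the gcloud command that enables `service` in `project_id`."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", service, f"--project={project_id}"]


def enable_api(project_id: str, dry_run: bool = True) -> str:
    """Return the command as a string; run it only when dry_run is False."""
    cmd = enable_api_command(project_id)
    if not dry_run:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical project ID; the default dry run only prints the command.
    print(enable_api("my-project-id"))
    # gcloud services enable monitoring.googleapis.com --project=my-project-id
```

Keeping the dry run as the default makes it safe to check the generated command before it changes anything in the project.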
Step 2: Install the Monitoring Agent
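Compute Engine reports basic metrics such as CPU usage without any agent. To collect additional system metrics such as memory and disk usage, install the Stackdriver Monitoring agent on each instance. Google now recommends the newer Ops Agent for new deployments; the commands below install the legacy agent covered in this course: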
    # For Debian/Ubuntu
    curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
    sudo bash add-monitoring-agent-repo.sh
    sudo apt-get update
    sudo apt-get install stackdriver-agent
    sudo service stackdriver-agent start

    # For RHEL/CentOS
    curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
    sudo bash add-monitoring-agent-repo.sh
    sudo yum install stackdriver-agent
    sudo service stackdriver-agent start
Step 3: Configure Monitoring
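- Create a Workspace: In the GCP Console, navigate to Monitoring. Older versions of the console prompt you to create a Workspace; current versions set up the equivalent metrics scope for the project automatically.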
- Add Resources: Add the resources you want to monitor, such as Compute Engine instances, App Engine services, or Kubernetes clusters.
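Once resources report metrics, charts, alerting conditions, and API queries select them with a monitoring filter string. The following is a small sketch of how such a filter can be assembled. The helper name `build_filter` is hypothetical; the `metric.type`, `resource.type`, and `resource.labels.*` selectors are part of the Cloud Monitoring filter syntax.

```python
from typing import Dict, Optional


def build_filter(
    metric_type: str,
    resource_type: Optional[str] = None,
    resource_labels: Optional[Dict[str, str]] = None,
) -> str:
    """Join selectors into one filter string using AND."""
    values = [metric_type, resource_type or ""] + list((resource_labels or {}).values())
    if any('"' in value for value in values):
        # This sketch does not handle escaping, so reject embedded quotes.
        raise ValueError("values must not contain double quotes")

    clauses = [f'metric.type="{metric_type}"']
    if resource_type:
        clauses.append(f'resource.type="{resource_type}"')
    # Sort labels so the same input always yields the same string.
    for key, value in sorted((resource_labels or {}).items()):
        clauses.append(f'resource.labels.{key}="{value}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    # CPU utilization of Compute Engine instances in one zone.
    print(
        build_filter(
            "compute.googleapis.com/instance/cpu/utilization",
            resource_type="gce_instance",
            resource_labels={"zone": "us-central1-a"},
        )
    )
```

The resulting string can be used wherever a filter is expected, for example in a dashboard widget or an alerting condition.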
Creating Dashboards
Dashboards provide a visual representation of your metrics. Here's how to create a custom dashboard:
- Navigate to Monitoring: In the GCP Console, go to Monitoring > Dashboards.
- Create a Dashboard: Click "Create Dashboard" and give it a name.
- Add Widgets: Add widgets to display metrics, such as line charts, bar charts, and heatmaps.
Example: CPU Usage Dashboard
    # Example: create a CPU usage dashboard with the Python client library.
    # Dashboards live in a separate package:
    #   pip install google-cloud-monitoring-dashboards
    from google.cloud import monitoring_dashboard_v1 as dashboards

    project_id = "your-project-id"  # replace with your project ID
    client = dashboards.DashboardsServiceClient()
    project_name = f"projects/{project_id}"

    # One XY chart plotting the CPU usage rate of each Compute Engine instance.
    cpu_widget = dashboards.Widget(
        title="CPU Usage",
        xy_chart=dashboards.XyChart(
            data_sets=[
                dashboards.XyChart.DataSet(
                    time_series_query=dashboards.TimeSeriesQuery(
                        time_series_filter=dashboards.TimeSeriesFilter(
                            filter=(
                                'metric.type="compute.googleapis.com/instance/cpu/usage_time" '
                                'AND resource.type="gce_instance"'
                            ),
                            aggregation=dashboards.Aggregation(
                                alignment_period={"seconds": 60},
                                per_series_aligner=dashboards.Aggregation.Aligner.ALIGN_RATE,
                            ),
                        )
                    )
                )
            ]
        ),
    )

    # Widgets belong to the layout, not to the dashboard itself.
    dashboard = dashboards.Dashboard(
        display_name="CPU Usage Dashboard",
        grid_layout=dashboards.GridLayout(columns=2, widgets=[cpu_widget]),
    )

    response = client.create_dashboard(
        request={"parent": project_name, "dashboard": dashboard}
    )
    print(f"Created dashboard: {response.name}")
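The `ALIGN_RATE` aligner in the example turns the raw `usage_time` samples, which are CPU-seconds consumed since the previous sample, into a per-second rate for each alignment period. The following is a simplified model of that calculation in plain Python. It is an illustration under stated assumptions, not the service's exact algorithm, and the helper name `align_rate` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def align_rate(
    samples: Iterable[Tuple[int, float]], alignment_period_s: int
) -> Dict[int, float]:
    """Convert (timestamp_s, delta) samples into a per-second rate per period.

    Each sample is assigned to the period containing its timestamp. The deltas
    in a period are summed and divided by the period length in seconds.
    """
    if alignment_period_s <= 0:
        raise ValueError("alignment_period_s must be positive")

    totals: Dict[int, float] = defaultdict(float)
    for timestamp_s, delta in samples:
        period_start = (timestamp_s // alignment_period_s) * alignment_period_s
        totals[period_start] += delta

    return {
        start: total / alignment_period_s for start, total in sorted(totals.items())
    }


if __name__ == "__main__":
    # CPU-seconds used in three consecutive one-minute windows.
    usage_time = [(0, 30.0), (60, 45.0), (120, 54.0)]
    print(align_rate(usage_time, alignment_period_s=60))
    # {0: 0.5, 60: 0.75, 120: 0.9}
```

The result is in CPU-seconds per second, so an instance with two fully busy vCPUs would report about 2.0 rather than 1.0. To chart or alert on a fraction of total capacity, the `compute.googleapis.com/instance/cpu/utilization` metric is the more direct choice.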
Setting Up Alerts
Alerts notify you when specific conditions are met. Here's how to set up an alerting policy:
- Navigate to Monitoring: In the GCP Console, go to Monitoring > Alerting.
- Create a Policy: Click "Create Policy" and define the conditions for the alert.
- Set Notification Channels: Choose how you want to be notified, such as email, SMS, or Slack.
Example: CPU Usage Alert
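    # Example: create a CPU utilization alert with the Python client library.
    #   pip install google-cloud-monitoring
    from google.cloud import monitoring_v3

    project_id = "your-project-id"  # replace with your project ID
    # Full resource name of an existing notification channel.
    notification_channel_id = (
        f"projects/{project_id}/notificationChannels/your-channel-id"
    )

    client = monitoring_v3.AlertPolicyServiceClient()
    project_name = f"projects/{project_id}"

    alert_policy = monitoring_v3.AlertPolicy(
        display_name="High CPU Usage Alert",
        # A combiner is required even when there is only one condition.
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[
            monitoring_v3.AlertPolicy.Condition(
                display_name="CPU Usage",
                condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                    # cpu/utilization is a fraction between 0.0 and 1.0, so a
                    # threshold of 0.8 means 80% regardless of vCPU count.
                    filter=(
                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type="gce_instance"'
                    ),
                    comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                    threshold_value=0.8,
                    duration={"seconds": 300},
                    aggregations=[
                        monitoring_v3.Aggregation(
                            alignment_period={"seconds": 60},
                            per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                        )
                    ],
                ),
            )
        ],
        notification_channels=[notification_channel_id],
    )

    response = client.create_alert_policy(name=project_name, alert_policy=alert_policy)
    print(f"Created alert policy: {response.name}")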
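A threshold condition combines three settings: a comparison, a threshold value, and a duration for which the comparison must hold without interruption. The following is a simplified model of that logic in plain Python, useful for reasoning about when a policy would fire. It is a sketch rather than the service's actual evaluation, and the function name `first_violation` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_violation(
    points: Iterable[Tuple[int, float]], threshold: float, duration_s: int
) -> Optional[int]:
    """Return the first timestamp at which value > threshold has held
    continuously for at least duration_s seconds, or None if it never does.

    `points` are (timestamp_s, value) pairs in ascending time order.
    """
    breach_start: Optional[int] = None
    for timestamp_s, value in points:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp_s
            if timestamp_s - breach_start >= duration_s:
                return timestamp_s
        else:
            # Any point at or below the threshold resets the window.
            breach_start = None
    return None


if __name__ == "__main__":
    values = [0.5, 0.85, 0.9, 0.7, 0.85, 0.9, 0.95, 0.9, 0.88, 0.91, 0.5]
    # One aligned point per minute, starting at t = 0 seconds.
    series = [(i * 60, v) for i, v in enumerate(values)]
    print(first_violation(series, threshold=0.8, duration_s=300))
    # 540
```

In this sample the short spike at 60 to 120 seconds does not trigger anything because the value drops to 0.7 at 180 seconds. The sustained breach starting at 240 seconds reaches the five-minute mark at 540 seconds.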
Practical Exercise
Exercise: Create a Custom Dashboard and Alert
- Create a Custom Dashboard:
  - Navigate to Monitoring > Dashboards.
  - Create a new dashboard named "My Custom Dashboard".
  - Add a widget to display CPU usage for your Compute Engine instances.
- Set Up an Alert:
  - Navigate to Monitoring > Alerting.
  - Create an alerting policy named "High CPU Alert".
  - Set the condition to trigger when CPU usage exceeds 80% for more than 5 minutes.
  - Configure the notification channel to send an email to your address.
Solution
- Custom Dashboard:
  - Follow the steps in the "Creating Dashboards" section to create a dashboard and add a CPU usage widget.
- Alert:
  - Follow the steps in the "Setting Up Alerts" section to create an alerting policy for high CPU usage.
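For readers who prefer to check their solution against a concrete definition, the exercise can also be written as the JSON documents used by the Cloud Monitoring REST API. The following is a sketch that builds both documents as plain Python dictionaries. The helper names, project ID, and channel ID are placeholders, and the choice of the `cpu/utilization` metric with `ALIGN_MEAN` is one reasonable reading of "CPU usage exceeds 80%".

```python
import json
from typing import Any, Dict

CPU_FILTER = (
    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
    'AND resource.type="gce_instance"'
)
# Mean utilization over one-minute alignment periods.
AGGREGATION = {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}


def dashboard_body(display_name: str) -> Dict[str, Any]:
    """Dashboard with a single XY chart of CPU utilization."""
    data_set = {
        "timeSeriesQuery": {
            "timeSeriesFilter": {"filter": CPU_FILTER, "aggregation": AGGREGATION}
        }
    }
    widget = {"title": "CPU Usage", "xyChart": {"dataSets": [data_set]}}
    return {
        "displayName": display_name,
        "gridLayout": {"columns": 2, "widgets": [widget]},
    }


def alert_policy_body(display_name: str, channel_name: str) -> Dict[str, Any]:
    """Policy that fires when utilization stays above 80% for 5 minutes."""
    condition = {
        "displayName": "CPU usage above 80%",
        "conditionThreshold": {
            "filter": CPU_FILTER,
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.8,
            "duration": "300s",
            "aggregations": [AGGREGATION],
        },
    }
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [condition],
        "notificationChannels": [channel_name],
    }


if __name__ == "__main__":
    # Placeholder channel; an email channel must already exist in the project.
    channel = "projects/your-project-id/notificationChannels/your-channel-id"
    print(json.dumps(dashboard_body("My Custom Dashboard"), indent=2))
    print(json.dumps(alert_policy_body("High CPU Alert", channel), indent=2))
```

Comparing this output with what the console shows for your own dashboard and policy is a quick way to confirm that the threshold, duration, and notification channel match the exercise.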
Conclusion
Stackdriver Monitoring helps you maintain the health and performance of your applications and infrastructure on GCP. By setting up dashboards and alerts, you can detect problems early and respond before they affect the availability and reliability of your services. In the next lesson, we will explore Cloud Deployment Manager and how to automate the deployment of your GCP resources.