
Overview
In Google Cloud Platform (GCP), effective identity and access management is the foundation of a secure and efficient cloud environment. For serverless workloads, Google Cloud Run services rely on Service Accounts to act as their identity, granting them the necessary permissions to interact with other GCP APIs and resources. However, mismanagement of these identities creates significant operational and financial risks.
Two common failures disrupt this critical relationship: a Cloud Run service referencing a deleted or disabled service account, and the retention of enabled service accounts that are no longer in use. A missing identity causes immediate application failure, halting business operations. Conversely, an inactive but enabled identity creates a dormant security vulnerability—a potential entry point for attackers that can go undetected for months. Mastering the lifecycle of Cloud Run service accounts is a core discipline for any team serious about FinOps and cloud governance.
Why It Matters for FinOps
The consequences of mismanaged service accounts extend beyond technical errors, translating directly into tangible business costs. From a FinOps perspective, every identity-related failure is a source of waste. An application outage caused by a missing service account leads to direct revenue loss, penalties for breaking Service Level Agreements (SLAs), and wasted engineering hours spent on reactive troubleshooting instead of value-added work.
Furthermore, inactive service accounts represent a hidden liability. They are assets with no corresponding business function, contributing to a bloated attack surface and increasing compliance risk. During audits for frameworks like SOC 2 or PCI DSS, failing to de-provision unused identities can result in findings that jeopardize certifications and delay sales cycles. Proper governance over these identities is essential for maintaining operational excellence and protecting the bottom line.
What Counts as “Idle” in This Article
For the purposes of this article, an "idle" service account in the context of Google Cloud Run falls into one of two categories:
- Missing or Disabled: This refers to a service account that is explicitly linked to an active Cloud Run service but has been deleted from IAM or is in a disabled state. This is an immediate operational issue, as the Cloud Run instance cannot authenticate to perform its duties, resulting in execution failures.
- Inactive or Unused: This describes a service account that is enabled but has not been used to authenticate or access any resources for an extended period (e.g., 90 days). While not causing an immediate failure, these accounts represent unnecessary security risk and are a sign of poor hygiene. They are essentially dormant waste waiting to become a problem.
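The two categories above can be expressed as a small classification function. This is a minimal sketch, not a GCP API call: the record shape (a dict with hypothetical `disabled` and `last_authenticated` keys, or `None` when the account no longer exists) is an assumption, and the data would need to come from your own inventory export.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_service_account(
    account: Optional[dict],
    now: datetime,
    inactive_after: timedelta = timedelta(days=90),
) -> str:
    """Return 'missing', 'disabled', 'inactive', or 'active'.

    `account` is a hypothetical inventory record; None means the
    account referenced by a Cloud Run service was not found in IAM.
    """
    if account is None:
        return "missing"
    if account.get("disabled", False):
        return "disabled"
    last_auth = account.get("last_authenticated")
    # An account with no recorded authentication is treated as inactive.
    # A freshly created account also lands here, so check its age first.
    if last_auth is None or now - last_auth > inactive_after:
        return "inactive"
    return "active"
```

The first two results map to the "Missing or Disabled" category and the third to "Inactive or Unused". The 90-day default mirrors the example threshold used in this article and should be tuned to your longest job schedule.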
Common Scenarios
These misconfigurations often arise from predictable operational patterns.
Scenario 1
An automated cleanup script or a well-intentioned administrator deletes a service account that appears inactive based on recent authentication data. They are unaware that the account is tied to a critical Cloud Run service that only runs periodically, such as a monthly report generator. The next time the job triggers, it fails, causing a high-priority incident.
Scenario 2
A team manages its infrastructure using code, but the Cloud Run service and its associated service account are defined in separate modules or state files. During a refactoring, the service account resource is removed from the code and deleted by the automation tool. The Cloud Run service, however, is not updated and is now pointing to a non-existent identity.
Scenario 3
Following security best practices, a team deletes the default Compute Engine service account to reduce the risk posed by its broad Editor permissions. However, they neglect to first audit which Cloud Run services were deployed using this default identity. As a result, multiple services that relied on it for authentication immediately break.
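The audit that Scenario 3 skips can be sketched as a filter over a list of Cloud Run services. The default Compute Engine service account email follows the pattern `PROJECT_NUMBER-compute@developer.gserviceaccount.com`; the service record shape (`name` and `service_account` keys) is a hypothetical stand-in for whatever your inventory export produces.

```python
def default_compute_sa(project_number: str) -> str:
    """Build the default Compute Engine service account email for a project."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def services_on_default_identity(services: list[dict], project_number: str) -> list[str]:
    """Return names of services that run as the default compute identity."""
    default_email = default_compute_sa(project_number)
    return [
        svc["name"]
        for svc in services
        # A service with no explicit identity is counted as well, because
        # Cloud Run falls back to the default account in that case.
        if not svc.get("service_account") or svc["service_account"] == default_email
    ]
```

An empty result is the precondition for safely disabling or deleting the default account. Any service names returned need a dedicated identity first.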
Risks and Trade-offs
The primary tension in managing service accounts is between security and availability. Aggressively deleting any account that appears unused can reduce the attack surface, but it carries a high risk of causing an outage—the classic "don’t break prod" dilemma. Deleting an identity is a destructive action that can be difficult to reverse correctly.
A more balanced approach involves a "soft delete" policy. By disabling an account first, you initiate a "scream test." If a critical service depends on that identity, the failure surfaces the next time the service runs and can be quickly remediated by re-enabling the account. For workloads that run infrequently, such as the monthly report generator in Scenario 1, the test period must be longer than the longest interval between runs. This trade-off, a brief and controlled test period, is far safer than permanent deletion and provides a buffer to validate dependencies before taking an irreversible action.
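The soft-delete policy can be modeled as a small decision function. This is a sketch under stated assumptions: the timestamps `last_used` and `disabled_at` are hypothetical fields you would track yourself, and the 90-day and 30-day defaults are example values rather than recommendations from GCP.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_action(
    last_used: Optional[datetime],
    disabled_at: Optional[datetime],
    now: datetime,
    inactive_after: timedelta = timedelta(days=90),
    grace: timedelta = timedelta(days=30),
) -> str:
    """Return 'keep', 'disable', 'wait', or 'delete' for one account."""
    if disabled_at is not None:
        # The scream test is running: delete only once the grace period
        # has passed without anyone asking for the account back.
        return "delete" if now - disabled_at >= grace else "wait"
    if last_used is None or now - last_used >= inactive_after:
        return "disable"
    return "keep"
```

Nothing in this flow jumps from "keep" straight to "delete", which is the point of the policy. Re-enabling an account during the grace period amounts to clearing `disabled_at`.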
Recommended Guardrails
To prevent these issues from recurring, FinOps and platform engineering teams should establish clear governance guardrails.
- Ownership and Tagging: Implement a mandatory tagging policy for all service accounts. Tags should clearly identify the business owner, the associated application or service, and the creation date. This context is crucial for making informed de-provisioning decisions.
- Deprovisioning Lifecycle: Formalize a "disable-first" policy for all service accounts. An account flagged as inactive should be disabled for a defined period (e.g., 30-60 days) before it is scheduled for deletion.
- Automated Alerts: Configure monitoring to generate alerts on DeleteServiceAccount events. These alerts should trigger a cross-check to verify if the deleted account is associated with any active Cloud Run services, enabling a rapid response.
- Infrastructure as Code (IaC) Standards: Mandate that a service account and the Cloud Run service that depends on it are managed within the same IaC module and lifecycle. This creates an explicit dependency, preventing the identity from being destroyed without updating the service that uses it.
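The cross-check described in the Automated Alerts guardrail could look roughly like the following. It assumes two hypothetical inputs: a list of Cloud Run service records with `name` and `service_account` keys, and a dict of known service accounts keyed by email. How the alert delivers those inputs depends on your monitoring setup and is not shown.

```python
def find_broken_bindings(
    services: list[dict],
    accounts: dict[str, dict],
) -> list[tuple[str, str, str]]:
    """Return (service name, account email, reason) for each broken binding.

    `accounts` maps email to a record with an optional 'disabled' flag.
    An email absent from the mapping is treated as deleted.
    """
    broken = []
    for svc in services:
        email = svc["service_account"]
        account = accounts.get(email)
        if account is None:
            broken.append((svc["name"], email, "missing"))
        elif account.get("disabled", False):
            broken.append((svc["name"], email, "disabled"))
    return broken
```

Running the same check on a schedule, and not only in response to a deletion alert, also catches accounts that were disabled rather than deleted.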
Provider Notes
GCP
In Google Cloud, the relationship between a Cloud Run service and its identity is fundamental. Each service revision is configured to run as a specific Service Account, which dictates its permissions. To effectively govern these relationships, teams can leverage Cloud Asset Inventory to track all service account configurations and monitor for changes over time. Additionally, Organization Policies can be used to enforce standards, such as restricting the use of default service accounts.
Binadox Operational Playbook
Binadox Insight: Mismanaged service accounts are a hidden FinOps liability. They create waste through both operational downtime and dormant security risks, directly impacting your unit economics and increasing the total cost of ownership.
Binadox Checklist:
- Audit all Cloud Run services to identify any pointing to missing or disabled service accounts.
- Inventory all existing service accounts and tag them with clear owners and application dependencies.
- Implement a formal "disable-first, wait, then delete" policy for de-provisioning unused accounts.
- Prohibit the use of the default Compute Engine service account for new Cloud Run workloads.
- Ensure service accounts and their dependent resources are managed within the same Infrastructure as Code (IaC) module.
Binadox KPIs to Track:
- Number of production incidents caused by missing or invalid service account identities.
- Percentage of service accounts with no authentication activity in the last 90 days.
- Mean Time to Resolution (MTTR) for identity-related service outages.
- Percentage of service accounts with complete and accurate ownership tags.
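As a rough illustration, the KPIs above can be computed from an inventory of account records and a list of incident durations. All field names here (`last_authenticated`, `tags`, and the required tag keys) are hypothetical, and the incident data is assumed to come from your own incident tracker.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tag keys for owner, application, and creation date.
REQUIRED_TAGS = ("owner", "application", "created")


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 for an empty population."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def kpi_summary(
    accounts: list[dict],
    incident_durations: list[timedelta],
    now: datetime,
    inactive_after: timedelta = timedelta(days=90),
) -> dict:
    """Compute the four KPIs from plain Python records."""
    inactive = sum(
        1
        for a in accounts
        if a.get("last_authenticated") is None
        or now - a["last_authenticated"] > inactive_after
    )
    tagged = sum(
        1 for a in accounts
        if all(a.get("tags", {}).get(key) for key in REQUIRED_TAGS)
    )
    mttr = (
        sum(incident_durations, timedelta()) / len(incident_durations)
        if incident_durations
        else None
    )
    return {
        "incident_count": len(incident_durations),
        "pct_inactive": pct(inactive, len(accounts)),
        "mttr": mttr,
        "pct_fully_tagged": pct(tagged, len(accounts)),
    }
```

Tracking these values per reporting period, rather than as a single snapshot, shows whether the guardrails are actually reducing identity-related incidents.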
Binadox Common Pitfalls:
- Deleting a service account without verifying its dependencies on infrequent or scheduled Cloud Run jobs.
- Managing service accounts and Cloud Run services in separate, uncoordinated IaC states.
- Overlooking services that were deployed using the default service account before disabling or deleting it.
- Recreating a deleted account with the same email address without redeploying the Cloud Run service to bind to the new unique ID.
Conclusion
Proactive hygiene for Google Cloud Run service accounts is not just a security task—it is a critical FinOps discipline. By establishing clear ownership, implementing lifecycle management policies, and leveraging automation, organizations can prevent costly downtime, reduce their security risk, and ensure their cloud environment remains both efficient and compliant.
The first step is to gain visibility into your current state. Begin by auditing your existing service accounts and their connections to Cloud Run services. From there, you can build the guardrails and processes needed to manage these critical identities effectively throughout their lifecycle.