Securing Publicly Accessible BigQuery Datasets in GCP

Overview

Google BigQuery is a powerful, serverless data warehouse capable of storing and analyzing petabytes of an organization’s most critical data. This includes everything from customer analytics and financial records to proprietary intellectual property. Because it houses such valuable information, BigQuery is a high-priority target for malicious actors. Its security model is deeply integrated with Google Cloud’s Identity and Access Management (IAM), which dictates who can access, query, or modify datasets.

A common but critical misconfiguration occurs when a BigQuery dataset is made publicly accessible. This exposure happens when IAM policies are improperly configured to grant permissions to broad, public identifiers like allUsers or allAuthenticatedUsers. The former grants access to anyone on the internet, while the latter, often misunderstood, grants access to anyone with a Google account—not just users within your organization. This seemingly small error effectively removes the security perimeter around your most sensitive data, exposing it to significant risk.
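For concreteness, here is an illustrative sketch of how such an exposure can surface in a dataset's access metadata (roughly the shape shown by `bq show --format=prettyjson`; the exact keys vary depending on how the grant was made):

```json
{
  "access": [
    {"role": "OWNER", "userByEmail": "data-team@example.com"},
    {"role": "READER", "specialGroup": "allAuthenticatedUsers"},
    {"role": "READER", "iamMember": "allUsers"}
  ]
}
```

Either READER entry quietly opens the dataset: the first to anyone with a Google account, the second to the entire internet.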

Why It Matters for FinOps

Exposing a BigQuery dataset isn’t just a security failure; it’s a major FinOps concern that introduces severe financial and operational waste. The most direct impact is the “denial of wallet” attack. BigQuery’s on-demand pricing is based on the volume of data each query processes, and that cost is billed to the project where the query job runs. If a misconfigured policy lets a malicious actor run query jobs inside your project against an exposed dataset, they can execute complex, resource-intensive operations that scan terabytes of data. That unauthorized activity lands directly on your bill, leading to sudden and astronomical cloud charges.
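The arithmetic behind “denial of wallet” is simple and worth making explicit. A minimal sketch, assuming BigQuery's published on-demand rate of $6.25 per TiB scanned (verify current pricing for your edition and region before relying on this figure):

```python
# Illustrative "denial of wallet" math. The price below is an
# assumption based on BigQuery's published on-demand rate at the
# time of writing; check current pricing before relying on it.
ON_DEMAND_PRICE_PER_TIB = 6.25  # USD per TiB scanned (assumed)

def query_cost_usd(bytes_processed: int,
                   price_per_tib: float = ON_DEMAND_PRICE_PER_TIB) -> float:
    """Estimate the on-demand cost of a single query from bytes scanned."""
    tib = bytes_processed / 2**40
    return tib * price_per_tib

# One full scan of a hypothetical 50 TiB exposed dataset:
single_scan = query_cost_usd(50 * 2**40)

# An attacker scripting 1,000 such scans billed to your project:
hostile_run = 1_000 * single_scan

print(f"one scan: ${single_scan:,.2f}; 1,000 scans: ${hostile_run:,.2f}")
```

A single full scan of that dataset runs a few hundred dollars; a scripted loop turns it into six figures in hours, which is why this misconfiguration is a budget problem, not only a confidentiality one.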

Beyond direct costs, this misconfiguration creates operational drag. Responding to a data breach, conducting forensic analysis to determine the extent of the exposure, and managing regulatory reporting consumes significant engineering and compliance resources. These reactive efforts divert teams from value-creating initiatives. From a governance perspective, a public dataset is a clear failure of cost control and security guardrails, undermining the trust and predictability essential for a mature FinOps practice.

What Counts as “Idle” in This Article

In the context of this security issue, a resource isn’t “idle” in the traditional sense of unused CPU or memory. Instead, a publicly accessible BigQuery dataset is considered idle from a governance and risk management perspective. It represents a misconfigured asset that is not serving its intended, secure business purpose.

The key signals of this risky state include:

  • The presence of allUsers in an IAM policy attached to a dataset containing non-public data.
  • The use of allAuthenticatedUsers for a dataset that should be restricted to internal or specifically authorized external principals.
  • Any dataset where access controls do not align with a documented and approved data sharing strategy.

Essentially, if a dataset’s accessibility level creates unmanaged risk and potential for financial waste, it has failed its governance objectives and is a liability.
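The first two signals above are mechanically detectable. A minimal sketch of the audit logic, assuming dataset metadata in roughly the shape returned by `bq show --format=prettyjson` (an `access` list of entries; the key names checked here are assumptions covering the common grant forms):

```python
# Flag any dataset access entry that grants access to a public
# principal. The metadata shape mirrors `bq show --format=prettyjson`
# output; adapt the keys to whatever your export actually contains.
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}

def public_bindings(dataset_metadata: dict) -> list[dict]:
    """Return the access entries that expose the dataset publicly."""
    flagged = []
    for entry in dataset_metadata.get("access", []):
        # Public access can appear under different keys depending on
        # how the grant was made; check the common ones.
        principals = {
            entry.get("specialGroup"),
            entry.get("iamMember"),
        }
        if principals & PUBLIC_PRINCIPALS:
            flagged.append(entry)
    return flagged

meta = {
    "access": [
        {"role": "OWNER", "userByEmail": "data-team@example.com"},
        {"role": "READER", "specialGroup": "allAuthenticatedUsers"},
    ]
}
print(public_bindings(meta))  # only the allAuthenticatedUsers entry
```

The third signal, alignment with a documented sharing strategy, cannot be automated this way; it requires the ownership metadata and approval records discussed under the guardrails below.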

Common Scenarios

Scenario 1

A development team, needing to grant temporary access to a third-party contractor, adds allAuthenticatedUsers to a dataset’s IAM policy as a quick workaround, assuming it applies only to users in their organization. The “temporary” permission is forgotten and remains active as the dataset is promoted to a production environment, exposing sensitive analytics to anyone with a Google account.

Scenario 2

An organization launches a public data initiative but uses direct IAM policies instead of purpose-built GCP tools like Analytics Hub. By granting allUsers the ability to view and query the data, they inadvertently also expose themselves to “denial of wallet” attacks, as they have no control over the volume or complexity of queries run by the public.

Scenario 3

A legacy dataset, created before mature cloud governance policies were established, retains its original open permissions. During a cloud migration or project consolidation, this dataset is overlooked in security audits. It remains a silent, ticking time bomb until it is discovered by a security scanner or, worse, a malicious actor.

Risks and Trade-offs

Addressing public BigQuery datasets requires balancing security with operational needs. The primary risk of inaction is severe: data exfiltration, regulatory fines for non-compliance with frameworks like PCI DSS or HIPAA, and uncontrolled financial spend. The knee-jerk reaction might be to lock everything down immediately.

However, the trade-off is the potential to break legitimate workflows. Some datasets may be intentionally public as part of an open-data program. A blanket policy to remove all public access without verification could disrupt these services. Therefore, remediation must include a verification step to confirm a dataset’s intended audience. The guiding principle should be “never break production,” which means carefully auditing before revoking permissions and ensuring that any changes are communicated to resource owners.

Recommended Guardrails

Proactive governance is the most effective way to prevent public dataset exposure. Instead of relying on reactive cleanup, organizations should implement robust guardrails.

  • Policy Enforcement: Implement Google Cloud Organization Policies, such as Domain Restricted Sharing, to prevent IAM policies from granting access to identities outside of approved domains.
  • Tagging and Ownership: Enforce a strict tagging policy where every BigQuery dataset has a designated owner and a data classification tag (e.g., public, confidential, pii). This clarifies intent and streamlines audits.
  • Automated Audits: Use automated security posture management tools to continuously scan for IAM policies containing allUsers or allAuthenticatedUsers and alert the appropriate team owner.
  • Approval Workflows: Require a formal review and approval process for any request to make a dataset public, ensuring it aligns with business objectives and that cost implications are understood.
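The automated-audit guardrail can also run before deployment rather than after. A hedged sketch of a pre-deploy check, assuming an IAM policy in the JSON shape produced by `gcloud projects get-iam-policy --format=json` (the role and member values in the demo are placeholders):

```python
# Pre-deploy guardrail sketch: reject any IAM binding whose members
# include a public principal. In CI, a non-empty result would fail
# the build before the policy ever reaches production.
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}

def policy_violations(policy: dict) -> list[str]:
    """Return one message per public binding found in the policy."""
    violations = []
    for binding in policy.get("bindings", []):
        for member in sorted(PUBLIC_PRINCIPALS & set(binding.get("members", []))):
            violations.append(f"{member} granted {binding.get('role')}")
    return violations

policy = {
    "bindings": [
        {"role": "roles/bigquery.dataViewer", "members": ["allUsers"]},
        {"role": "roles/bigquery.dataEditor",
         "members": ["group:data-team@example.com"]},
    ]
}
print(policy_violations(policy))
```

Wiring this into the pipeline that applies IAM changes means a public grant has to be explicitly allow-listed through the approval workflow instead of slipping in as a quick workaround.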

Provider Notes

GCP

Google Cloud provides several native services and concepts to help manage and secure BigQuery datasets. The core of access control is managed through Google Cloud IAM, where roles and permissions are assigned to principals (users, groups, service accounts). A critical guardrail is the Organization Policy Service, which allows administrators to enforce constraints across the entire resource hierarchy. Specifically, the iam.allowedPolicyMemberDomains constraint is highly effective at preventing accidental public exposure. For an additional layer of network-based security, VPC Service Controls can create a service perimeter that isolates sensitive BigQuery data and prevents data exfiltration, even if IAM permissions are misconfigured.
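As an illustrative sketch, the Domain Restricted Sharing constraint can be expressed as an org policy file for `gcloud org-policies set-policy`; the organization ID and customer ID below are placeholders, and allowed values for this constraint are Cloud Identity customer IDs rather than domain names:

```yaml
# Illustrative policy file (placeholders throughout); applied with:
#   gcloud org-policies set-policy policy.yaml
name: organizations/123456789012/policies/iam.allowedPolicyMemberDomains
spec:
  rules:
    - values:
        allowedValues:
          - C0xxxxxxx   # your Cloud Identity customer ID
```

With this in place, attempts to grant roles to allUsers, allAuthenticatedUsers, or any identity outside the listed customer IDs are rejected at policy-set time across the resource hierarchy.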

Binadox Operational Playbook

Binadox Insight: The “denial of wallet” threat turns a security misconfiguration into a direct financial liability. An exposed BigQuery dataset isn’t just a data leak risk; it’s an open invitation for attackers to spend your cloud budget. This linkage is critical for getting budget-holder buy-in for stronger security governance.

Binadox Checklist:

  • Systematically audit all BigQuery dataset IAM policies for the principals allUsers and allAuthenticatedUsers.
  • For each identified public dataset, verify with the business owner whether the public access is intentional and approved.
  • Implement the “Domain Restricted Sharing” Organization Policy to programmatically block new public permissions.
  • Establish a data classification standard and enforce its application through tagging on all new datasets.
  • Integrate automated checks for public datasets into your CI/CD pipeline to catch misconfigurations before they reach production.
  • Review FinOps dashboards for anomalous BigQuery query costs, which can be an early indicator of unauthorized use.
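The last checklist item can be approximated with a simple statistical flag. A minimal sketch, assuming daily BigQuery spend figures (real values would come from your billing export; the window and threshold are arbitrary starting points to tune):

```python
# Flag days whose BigQuery spend sits far above the trailing average,
# an early indicator of unauthorized query activity. Thresholds and
# sample data are illustrative assumptions.
from statistics import mean, stdev

def anomalous_days(daily_costs: list[float],
                   window: int = 7,
                   z_threshold: float = 3.0) -> list[int]:
    """Return indices of days exceeding the trailing mean by z_threshold sigmas."""
    flagged = []
    for i in range(window, len(daily_costs)):
        trailing = daily_costs[i - window:i]
        mu, sigma = mean(trailing), stdev(trailing)
        if sigma > 0 and (daily_costs[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Seven quiet days, then the kind of spike a scripted scan might cause:
costs = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4, 13.0, 480.0]
print(anomalous_days(costs))  # the spike day is flagged
```

This is deliberately crude; dedicated cost-anomaly tooling handles seasonality and gradual drift better, but even a check like this catches the step-change signature of a denial-of-wallet run.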

Binadox KPIs to Track:

  • Number of publicly accessible BigQuery datasets discovered per month.
  • Mean Time to Remediate (MTTR) for discovered public dataset exposures.
  • Percentage of BigQuery datasets with proper ownership and data classification tags.
  • Number of anomalous BigQuery cost spikes attributed to query activity.

Binadox Common Pitfalls:

  • Misunderstanding that allAuthenticatedUsers means “all users in my company” instead of “anyone with a Google account.”
  • Forgetting to revoke “temporary” public permissions granted during development or troubleshooting.
  • Failing to audit legacy datasets created before security and FinOps governance policies were in place.
  • Making a dataset public for data sharing without considering the financial risk of uncontrolled public queries.

Conclusion

Securing BigQuery datasets from public access is a foundational element of a mature Google Cloud security and FinOps strategy. The risk extends far beyond data confidentiality, creating significant potential for financial waste through “denial of wallet” attacks and imposing a heavy operational burden on engineering teams.

The path forward involves a combination of diligent auditing and proactive governance. By implementing automated guardrails, enforcing clear ownership, and fostering a culture of security awareness, you can ensure your data warehouse remains a secure, cost-effective asset for driving business insights, not an unmanaged liability.