What if your PDFs, transcripts, and logs could live in the same place as your BI dashboards? For years, Snowflake was known primarily as a cloud native data warehouse built for structured analytics. It was the go-to solution for SQL analysts, BI teams, and data engineers working with neat rows and columns. Meanwhile, many teams dealing with documents, images, logs, and raw application data assumed they needed separate storage such as Amazon S3, Google Cloud Storage, Azure Blob, or NoSQL databases.
In 2025, that separation no longer has to exist. Snowflake is now a multimodal data platform that can store, process, and query unstructured data.
So yes, Snowflake can store unstructured data, but more importantly, it can use it. This capability offers significant architectural advantages for modern data teams. In this blog post, we’ll break down exactly how and why it matters.
What is unstructured data?
Unstructured data refers to any information that doesn't fit neatly into traditional rows and columns. This includes:
- Documents: PDF, DOCX, TXT files
- Images: PNG, JPG, TIFF formats
- Audio and video files: Media content and recordings
- Logs and event data: Application and system logs
- Communication data: Email threads and chat transcripts
- Markup and structured text: HTML, XML, JSON blobs
- Binary files: Application-specific file formats
As organisations increasingly generate massive volumes of this data, the need for unified platforms that can both store and analyse unstructured content has become critical.
How does Snowflake store unstructured data?
Snowflake stages for unstructured data
Snowflake manages unstructured data through stages: storage locations that reference files either within Snowflake's managed infrastructure or in external cloud storage:
- Internal Stages: Files are stored within Snowflake's managed storage, offering quick setup and seamless integration
- External Stages: Files remain in external cloud locations (Amazon S3, Azure Blob Storage, Google Cloud Storage), with Snowflake accessing them via metadata references
You can also combine both approaches for optimal performance and scalability based on your specific requirements.
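As a rough sketch, creating the two stage types might look like this (the stage names, bucket URL, and storage integration are placeholders):

```sql
-- Internal stage: files live in Snowflake-managed storage
CREATE STAGE docs_internal
  DIRECTORY = (ENABLE = TRUE)          -- directory table for file metadata
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- External stage: files stay in your own S3 bucket
CREATE STAGE docs_external
  URL = 's3://my-bucket/documents/'
  STORAGE_INTEGRATION = my_s3_int      -- pre-created storage integration
  DIRECTORY = (ENABLE = TRUE);
```

Enabling the directory table up front means you can later query file listings with SQL instead of repeated LIST calls.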
The FILE data type in Snowflake for unstructured files and metadata
Snowflake provides a dedicated FILE data type for unstructured data. A FILE value represents a reference to a file stored in an internal or external stage, without storing the actual file content in the table itself. This approach allows:
- Efficient storage and cost management
- Fast metadata querying
- Seamless integration with processing pipelines
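A minimal sketch of the idea, assuming the FILE data type, TO_FILE, and the FL_* metadata functions available in recent Snowflake releases (the table, stage, and file names are invented):

```sql
-- The table stores references to staged files, not the bytes themselves
CREATE TABLE invoices (
  invoice_id INT,
  scan       FILE          -- reference to a staged file
);

INSERT INTO invoices
  SELECT 1, TO_FILE('@docs_internal', 'invoices/2025/inv_0001.pdf');

-- Metadata queries stay fast because only the reference is stored
SELECT invoice_id,
       FL_GET_RELATIVE_PATH(scan) AS path,
       FL_GET_CONTENT_TYPE(scan)  AS content_type
FROM invoices;
```

Check the FILE data type documentation for your account's release; the feature and its helper functions are comparatively new.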
Accessing unstructured files in Snowflake
Snowflake provides familiar commands for file management:
- PUT: Upload files to stages
- GET: Download files from stages
- LIST: View files stored in stages
These operations mirror cloud storage interactions while maintaining Snowflake's security and governance standards.
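Run from a client such as SnowSQL, a typical round trip might look like this (stage and paths are placeholders):

```sql
-- Upload a local file to a stage path
PUT file:///tmp/report.pdf @docs_internal/reports/ AUTO_COMPRESS = FALSE;

-- Inspect what is there
LIST @docs_internal/reports/;

-- Download it back to a local directory
GET @docs_internal/reports/report.pdf file:///tmp/downloads/;
```

Note that PUT and GET execute from a client connection, not from the Snowflake worksheet UI.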
Processing and querying unstructured data in Snowflake
Storage is just the beginning. Snowflake's real power lies in its ability to process and extract insights from unstructured data.
Snowflake Cortex AI and Document AI for PDFs, images and hybrid search
Cortex AI enables advanced analytics on unstructured data directly within Snowflake:
- Document analysis: Extract text, summarise content, and perform batch LLM inference on PDFs and documents
- Image processing: Run classification and analysis on stored images
- Multimodal SQL functions: Query and transform documents, images, and audio using SQL-powered pipelines
- Schema-aware extraction: Automatically extract structured tables from unstructured documents like invoices and reports
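As an illustrative sketch, assuming Cortex is enabled in your region (stage, path, and model name are placeholders):

```sql
-- Layout-aware text extraction from a staged PDF
WITH parsed AS (
  SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
           @docs_internal, 'reports/report.pdf',
           {'mode': 'LAYOUT'}):content::STRING AS report_text
)
-- LLM inference over the extracted text
SELECT SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large',
         'Summarise in three bullet points: ' || report_text)
FROM parsed;
```

The same pattern scales to batch inference by selecting from a directory table instead of a single hard-coded path.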
Snowpark for custom processing
With Snowpark, you can:
- Extract text from PDFs using Python
- Perform image classification with embedded ML models
- Parse JSON or log files into VARIANT columns
- Run OCR, NLP, and generate embeddings via external functions
- Build semantic search capabilities over document collections
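For example, a hypothetical Python UDF registered via SQL could pull text out of staged PDFs. The function name is ours, and pypdf availability depends on the Snowflake Anaconda channel, so treat this as a sketch:

```sql
-- Hypothetical UDF: read a staged PDF and return its text
CREATE OR REPLACE FUNCTION extract_pdf_text(scoped_url STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python', 'pypdf')
HANDLER = 'run'
AS $$
from snowflake.snowpark.files import SnowflakeFile
from pypdf import PdfReader

def run(scoped_url):
    # SnowflakeFile streams the staged file into the UDF sandbox
    with SnowflakeFile.open(scoped_url, 'rb') as f:
        reader = PdfReader(f)
        return '\n'.join(page.extract_text() or '' for page in reader.pages)
$$;

-- Apply it across every PDF listed in the stage's directory table
SELECT relative_path,
       extract_pdf_text(BUILD_SCOPED_FILE_URL(@docs_internal, relative_path)) AS txt
FROM DIRECTORY(@docs_internal)
WHERE relative_path ILIKE '%.pdf';
```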
VARIANT data type for semi-structured data
The VARIANT data type handles semi-structured data formats like JSON, XML, Parquet, and Avro:
- Store complex, nested data structures
- Query JSON fields directly using SQL
- Maintain schema flexibility while preserving query performance
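A small sketch of the pattern (table name and sample payload are invented):

```sql
-- Land raw JSON events into a VARIANT column
CREATE TABLE app_events (raw VARIANT);

INSERT INTO app_events
  SELECT PARSE_JSON('{"user": {"id": 42}, "tags": ["beta", "mobile"]}');

-- Path notation queries nested fields; FLATTEN explodes arrays into rows
SELECT raw:user.id::INT AS user_id,
       t.value::STRING  AS tag
FROM app_events,
     LATERAL FLATTEN(input => raw:tags) t;
```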
Why a unified data architecture matters
In most companies, data still lives in many places and tools. Dashboards sit on a legacy SQL warehouse, logs go to a separate observability stack, and documents and images disappear into unmanaged cloud buckets or shared drives.
Instead of stitching together a dozen point solutions, you can use Snowflake as the backbone of your data architecture and keep external systems only where they add unique value.
Real-world use cases of handling unstructured data in Snowflake
Here is how this looks in practice. Below is our recent project, plus common patterns we see when teams bring documents, images, logs, and app data into Snowflake and put them to work.
Global finance, AI-ready in 90 days
A multinational finance firm spending more than 800K per month on cloud was battling rising costs and fragmented data. They needed a governed place for documents, logs, and tables. We used OpenFlow to ingest both structured and unstructured data into Snowflake, tracked lineage and policies in Horizon Catalog, set consistent business logic with semantic views, and enabled natural language querying through Cortex AI SQL. The result was about an 80% reduction in ingestion latency, real-time cost visibility with FinOps, and a platform ready for analytics, ML, and AI at scale.
Read how a global finance firm managed unstructured data in Snowflake →
Limitations and considerations of Snowflake
Snowflake’s unstructured data capabilities are strong, but it won’t fully replace your data lake or media platform. For B2B teams planning at scale, keep these practical constraints in mind:
- Not a pure object storage replacement: Snowflake complements rather than replaces S3/GCS for massive-scale raw object storage
- File retrieval performance: Binary object retrieval speed varies by file size and stage type
- Compute costs: AI and ML workloads require careful resource management
- Specialised use cases: Intensive video/audio editing workflows still belong in purpose-built media systems
Best practices for managing unstructured data in Snowflake in 2025
1. Keep big binaries in external object storage, keep brains in Snowflake
Register S3, Blob, or GCS as external stages and reference files via the FILE type; keep only hot assets in internal stages for speed.
2. Standardize file layout and formats from day one
Use predictable paths (org/source/system/YYYY/MM/DD/id) and checksums; prefer compressed columnar formats like Parquet, with extracted text or page JSON beside PDFs and images.
3. Store metadata and embeddings in Snowflake, not in files
Put raw files in stages, but keep metadata, chunks, and embeddings in Snowflake tables linked by stable URIs for fast search and governance. Use directory tables to catalog staged files.
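A sketch of that split, assuming the VECTOR type available in recent Snowflake releases (table and column names are illustrative):

```sql
-- Metadata, chunks, and embeddings live in a regular table,
-- keyed by the stable URL of the staged source file
CREATE TABLE doc_index (
  file_url   STRING,
  chunk_id   INT,
  chunk_text STRING,
  embedding  VECTOR(FLOAT, 768)
);

-- The directory table catalogs the staged files themselves
SELECT relative_path, size, last_modified
FROM DIRECTORY(@docs_external)
WHERE relative_path ILIKE '%.pdf';
```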
4. Orchestrate ingest → extract → enrich → index → serve with Snowpark
Run OCR, NLP, and parsers as Snowpark tasks and UDFs; batch, log runs, and make jobs idempotent so reruns are safe. See the implementation flow in processing files with Snowpark.
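One way this could look, assuming a hypothetical extract_pdf_text UDF and a doc_text target table (both illustrative):

```sql
-- Hourly extract task; MERGE keeps reruns idempotent
CREATE TASK extract_new_docs
  WAREHOUSE = etl_wh
  SCHEDULE = '60 MINUTE'
AS
  MERGE INTO doc_text t
  USING (
    SELECT relative_path,
           extract_pdf_text(
             BUILD_SCOPED_FILE_URL(@docs_internal, relative_path)) AS txt
    FROM DIRECTORY(@docs_internal)
  ) s
  ON t.relative_path = s.relative_path
  WHEN NOT MATCHED THEN
    INSERT (relative_path, txt) VALUES (s.relative_path, s.txt);

-- Tasks are created suspended; resume to start the schedule
ALTER TASK extract_new_docs RESUME;
```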
5. Treat AI as a costed product
Separate warehouses for ELT and AI, strict auto-suspend, resource monitors, caching, and reuse of embeddings and summaries. Get a baseline with the FinOps savings calculator.
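For instance, a dedicated AI warehouse with a hard credit cap might be set up like this (sizes and quotas are placeholders to tune for your workload):

```sql
-- Separate, tightly auto-suspending warehouse for AI workloads
CREATE WAREHOUSE ai_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60            -- seconds of idle time before suspending
  AUTO_RESUME = TRUE;

-- Monthly credit cap with an early-warning notification
CREATE RESOURCE MONITOR ai_monthly_cap
  WITH CREDIT_QUOTA = 200
       FREQUENCY = MONTHLY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE ai_wh SET RESOURCE_MONITOR = ai_monthly_cap;
```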
6. Govern at the row, column, and file edge
Classify on arrival, enforce row and column policies with masking, and keep least-privilege stage access and full lineage. For role design patterns, see Snowflake role hierarchy best practices.
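A sketch of column masking plus least-privilege stage access (the role, policy, and table names are illustrative):

```sql
-- Mask extracted document text for non-privileged roles
CREATE MASKING POLICY doc_mask AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('DOC_ANALYST') THEN val
       ELSE '***REDACTED***' END;

ALTER TABLE doc_text MODIFY COLUMN txt SET MASKING POLICY doc_mask;

-- Least-privilege access to the underlying files
GRANT READ ON STAGE docs_internal TO ROLE doc_analyst;
```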
Need a hand?
Our Snowflake experts at Snowstack can audit your current setup, design a lean reference architecture, and prove value with a focused pilot. Read how we deliver in How we work or talk to a Snowflake expert.
Talk with a Snowflake consultant→
Final thoughts
Snowflake doesn’t just store unstructured data; it makes it usable for search, analytics, and AI. With stages, the FILE data type, VARIANT, Snowpark, and Cortex, you can land documents, images, and logs alongside your tables, extract text and entities, generate embeddings, and govern everything under a single security and policy model. The winning pattern is simple: keep raw binaries in low-cost object storage, centralise metadata and embeddings in Snowflake, and start with one focused, high-value use case you can scale.
Ready to try this in your stack?
Book a 30-minute call with our Snowflake consultant →



