Skip to content

Conversation

@hwang-db
Copy link
Contributor

Revamping this example of modular AWS DB workspace deployment - we can now define multiple environments with yaml files to deploy m workspaces into n VPCs and create multiple catalogs with isolated underlying infra. All the workspaces are using backend PL and CMK feature; the VPCs are configured with NAT for egress.

@hwang-db hwang-db requested review from a team as code owners April 24, 2025 11:47
Copy link
Contributor

@nkvuong nkvuong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

half way through the review - some minor changes suggested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should keep this file, but make sure we update the providers to the latest versions - https://developer.hashicorp.com/terraform/language/files/dependency-lock

}
}
```
You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
You should separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.

You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.

Since we are using CMK (customer managed key) for encryption on root S3 bucket and Databricks managed resources, you also need to provide an AWS IAM ARN for `cmk_admin`. The format will be: `arn:aws:iam::123456:user/xxx`. You need to create this user and assign KMS admin role to it.
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down `client_id` and `client_secret` values.

workspace_3 = var.workspace_3_config
}
export TF_VAR_aws_account_id=xxxx
export AWS_ACCESS_KEY_ID=your_aws_role_access_key_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid this, tf should just use awscli profile to auth

In the default setting, this template creates one VPC (with one public subnet and one private subnet for hosting VPCEs). Each incoming workspace will add 2 private subnets into this VPC. If you need to create multiple VPCs, you should copy paste the VPC configs and change accordingly, or you can wrap VPC configs into a module, we leave this to you.

At this step, your workspaces deployment and VPC networking infra should have been successfully deployed and you will have `n` config json files for `n` workspaces deployed, under `/artifacts` folder, to be used in another Terraform project to deploy workspace objects including IP Access List.
Then in root level `variables.tf`, change the region default value to your region.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you add this to the yml config as well?

@@ -0,0 +1,43 @@
# VPC Configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should start with three dashes

Suggested change
# VPC Configuration
---
# VPC Configuration

@@ -0,0 +1,107 @@
terraform {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still suggest customer to configure log delivery? system tables should replace this now as parsing the audit logs is quite time-consuming

Comment on lines +32 to +33
deploy_metastore: "false"
existing_metastore_id: "xxxxx-xxxx-xxxx-xxxx-xxxxxx"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should combine these into a single flag existing_metastore_id

}

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pin the module version?

}


resource "aws_iam_role_policy" "cross_account" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we already have data.databricks_aws_crossaccount_policy

@alexott alexott requested review from a team and Copilot October 9, 2025 09:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR revamps the AWS Databricks modular privatelink example to implement a config file-driven deployment approach. The changes enable deploying multiple Databricks workspaces across multiple VPCs using YAML configuration files, with support for Unity Catalog catalogs and isolated infrastructure.

  • Complete restructuring to use YAML configuration files instead of hardcoded Terraform variables
  • New modular architecture with account-level and workspace-level pipelines
  • Enhanced Unity Catalog support with isolated S3 buckets and external locations

Reviewed Changes

Copilot reviewed 59 out of 69 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
main.tf Replaced hardcoded workspace configs with YAML-driven configuration parsing
variables.tf Simplified to core variables, removed workspace-specific config variables
providers.tf Updated provider versions and added time provider
outputs.tf Modified to output workspace URLs from new module structure
identity_management.tf Added comprehensive user and group management using YAML config
databricks_account_tf_pipeline/ New module for account-level resource deployment including workspaces and metastore
databricks_workspace_tf_pipeline/ New module for workspace-level resources including UC catalogs
configs/ New YAML configuration files for environment and user management
README.md Updated documentation reflecting the new config-driven approach
Files not reviewed (1)
  • examples/aws-databricks-modular-privatelink/.terraform.lock.hcl: Language not supported

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

principals {
type = "AWS"
identifiers = [var.cmk_admin]
identifiers = [data.aws_caller_identity.current.account_id]
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The account ID should be formatted as an ARN. This should be arn:aws:iam::${data.aws_caller_identity.current.account_id}:root to properly identify the root user of the account.

Suggested change
identifiers = [data.aws_caller_identity.current.account_id]
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]

Copilot uses AI. Check for mistakes.
principals {
type = "AWS"
identifiers = [var.cmk_admin]
identifiers = [data.aws_caller_identity.current.account_id]
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The account ID should be formatted as an ARN. This should be arn:aws:iam::${data.aws_caller_identity.current.account_id}:root to properly identify the root user of the account.

Suggested change
identifiers = [data.aws_caller_identity.current.account_id]
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants