Skip to main content

Operations

This section is for running your implementation in a controlled way after deployment.

Operational pillars​

  • Data quality checks.
  • Execution monitoring.
  • Failure triage.
  • Replay and recovery.
  • Environment-safe troubleshooting.

Data quality​

Data quality is enforced both by contract-aware validation and by the way Silver quarantine behavior is reported.

  • Track failure reasons, not just failure counts.
  • Distinguish contract authoring failures from row-level transform failures.
  • Treat invalid transformLogic as a pipeline-authoring problem and mixed row failures as quarantine candidates.
  • Separate casting failures from constraint failures.
  • Make quarantine tables visible to operators, not just developers.

Monitoring​

  • Monitor pipeline completion, duration, input volume, and quarantine rate.
  • Keep operational logging attached to the data plane, not buried in a separate dashboard with no context.
  • Prefer a small set of trusted signals over many unactionable metrics.

Common recovery runbooks​

Key Vault soft-delete conflict​

If redeployment fails because a vault already exists in deleted state, check whether the environment expects recovery or purge-safe behavior before forcing a new name.

  • List deleted vaults first.
  • Recover when purge protection is enabled.
  • Treat the recovery path as part of the environment lifecycle, not a one-off operator trick.

Databricks metastore assignment failures​

When metastore creation succeeds but workspace assignment fails:

  • Verify the workspace identifier and access path.
  • Verify the deployment identity has the required Databricks account permissions.
  • Document any required manual recovery only after confirming the normal deployment path cannot complete it.

Python dependency or import failures in pipelines​

  • Verify the dependency installation step ran in the expected directory.
  • Verify the script is executed through the intended environment tooling.
  • Prefer fixing the install or packaging path rather than adding ad hoc pipeline workarounds.

Bicep validation or policy failures​

  • Build or validate the affected template locally.
  • Recheck naming, tagging, and policy expectations before retrying deployment.
  • Capture the fix back into docs when the failure exposed a missing guardrail.

Fabric operations focus​

  • Emphasize notebook execution context, workspace permissions, and lakehouse object visibility.
  • Track which operational tasks are fully implemented versus documented placeholders.
  • Keep Fabric-specific runbooks explicit until parity improves.

Troubleshooting sequence​

  1. Confirm the failing layer: infra, orchestration, landing, Bronze, or Silver.
  2. Confirm whether the behavior is shared or platform-specific.
  3. Inspect contract and checkpoint assumptions before changing code.
  4. Capture the remediation steps in durable documentation or runbooks.