Operations

This section is for running your implementation in a controlled way after deployment.

Operational pillars

Data quality checks.
Execution monitoring.
Failure triage.
Replay and recovery.
Environment-safe troubleshooting.

Data quality

Data quality is enforced both by contract-aware validation and by the way Silver quarantine behavior is reported.

Track failure reasons, not just failure counts.
Distinguish contract authoring failures from row-level transform failures.
Treat invalid transformLogic as a pipeline-authoring problem and mixed row failures as quarantine candidates.
Separate casting failures from constraint failures.
Make quarantine tables visible to operators, not just developers.

Monitoring

Monitor pipeline completion, duration, input volume, and quarantine rate.
Keep operational logging attached to the data plane, not buried in a separate dashboard with no context.
Prefer a small set of trusted signals over many unactionable metrics.

Common recovery runbooks

Key Vault soft-delete conflict

If redeployment fails because a vault already exists in deleted state, check whether the environment expects recovery or purge-safe behavior before forcing a new name.

List deleted vaults first.
Recover when purge protection is enabled.
Treat the recovery path as part of the environment lifecycle, not a one-off operator trick.

Databricks metastore assignment failures

When metastore creation succeeds but workspace assignment fails:

Verify the workspace identifier and access path.
Verify the deployment identity has the required Databricks account permissions.
Document any required manual recovery only after confirming the normal deployment path cannot complete it.

Python dependency or import failures in pipelines

Verify the dependency installation step ran in the expected directory.
Verify the script is executed through the intended environment tooling.
Prefer fixing the install or packaging path rather than adding ad hoc pipeline workarounds.

Bicep validation or policy failures

Build or validate the affected template locally.
Recheck naming, tagging, and policy expectations before retrying deployment.
Capture the fix back into docs when the failure exposed a missing guardrail.

Fabric operations focus

Emphasize notebook execution context, workspace permissions, and lakehouse object visibility.
Track which operational tasks are fully implemented versus documented placeholders.
Keep Fabric-specific runbooks explicit until parity improves.

Troubleshooting sequence

Confirm the failing layer: infra, orchestration, landing, Bronze, or Silver.
Confirm whether the behavior is shared or platform-specific.
Inspect contract and checkpoint assumptions before changing code.
Capture the remediation steps in durable documentation or runbooks.

Operational pillars​

Data quality​

Monitoring​

Common recovery runbooks​

Key Vault soft-delete conflict​

Databricks metastore assignment failures​

Python dependency or import failures in pipelines​

Bicep validation or policy failures​

Fabric operations focus​

Troubleshooting sequence​