CLI Reference¶
dbx provides access to its functions in a CLI-oriented fashion. Each individual command has a detailed help screen, accessible via dbx command_name --help.
We encourage you to use dbx both for local development and in CI/CD pipelines.
Note
dbx works with your PAT (Personal Access Token) in exactly the same way as databricks-cli. This means that if the following environment variables are defined:
DATABRICKS_HOST
DATABRICKS_TOKEN
dbx will use them to perform actions. This allows you to securely store these variables in your CI/CD tool and access them from within the pipeline.
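For example, in a CI/CD pipeline these variables can be exported before invoking dbx. The host and token values below are placeholders; in a real pipeline they would come from the CI tool's secret storage:

```shell
# Export connection details so that dbx (like databricks-cli) picks them up.
# Both values below are placeholders, not real credentials.
export DATABRICKS_HOST="https://my-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi-placeholder-token"

# Any subsequent dbx command in this shell will authenticate using these
# variables, e.g.:
# dbx deploy
```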
dbx¶
dbx [OPTIONS] COMMAND [ARGS]...
Options
- --version¶
Show the version and exit.
configure¶
Configures the project environment in the current folder.
This command can be used multiple times to change the configuration of a given environment.
If the project file (located at .dbx/project.json) does not exist, it will be initialized.
There is no strict requirement to configure the project file via this command; you can also edit it directly in any file editor.
dbx configure [OPTIONS]
Options
- --workspace-dir <workspace_dir>¶
Workspace directory for the MLflow experiment. If not provided, the default directory will be /Shared/dbx/projects/<current-folder-name>.
- --artifact-location <artifact_location>¶
Artifact location in DBFS. If not provided, the default location will be dbfs:/dbx/<current-folder-name>.
- -e, --environment <environment>¶
Environment name. If not provided, default will be used.
- --debug¶
Debug Mode. Shows full stack trace on error.
- --profile <profile>¶
CLI connection profile to use. The default profile is DEFAULT.
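Since the project file can also be edited directly, a minimal .dbx/project.json can be written by hand. The layout below is an illustrative sketch (the exact schema may differ between dbx versions, so compare it against a file generated by dbx configure):

```shell
# Create a minimal project file by hand instead of running `dbx configure`.
# NOTE: this schema is an illustrative sketch; validate it against a file
# actually generated by `dbx configure` for your dbx version.
mkdir -p .dbx
cat > .dbx/project.json <<'EOF'
{
  "environments": {
    "default": {
      "profile": "DEFAULT",
      "workspace_dir": "/Shared/dbx/projects/my-project",
      "artifact_location": "dbfs:/dbx/my-project"
    }
  }
}
EOF
```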
datafactory¶
Azure Data Factory integration utilities.
dbx datafactory [OPTIONS] COMMAND [ARGS]...
reflect¶
Reflects job definitions to Azure Data Factory.
During the reflection, the following actions will be performed:
- The input specs file will be parsed
- For each defined cluster, a new linked service will be created
- For each defined job, a job object in the ADF pipeline will be reflected. Please note that chaining jobs into a pipeline shall be done on the ADF side. No other steps in the Data Factory pipeline will be changed by the execution of this command.
dbx datafactory reflect [OPTIONS]
Options
- --specs-file <specs_file>¶
Required. Path to the deployment result specification file.
- --subscription-name <subscription_name>¶
Required. Name of the Azure subscription.
- -g, --resource-group <resource_group>¶
Required. Resource group name.
- --factory-name <factory_name>¶
Required. Factory name.
- -n, --name <name>¶
Required. Pipeline name.
- --debug¶
Debug Mode. Shows full stack trace on error.
- -e, --environment <environment>¶
Environment name. If not provided, default will be used.
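Putting the required options together, a reflect invocation could look like the sketch below. The subscription, resource group, factory and pipeline names are hypothetical, and the specs file is assumed to have been produced by a prior dbx deploy --write-specs-to-file run:

```shell
# Reflect previously deployed job definitions into an ADF pipeline.
# All Azure resource names below are hypothetical placeholders.
dbx datafactory reflect \
  --specs-file=.dbx/deployment-result.json \
  --subscription-name="my-subscription" \
  -g "my-resource-group" \
  --factory-name="my-factory" \
  -n "my-pipeline" \
  -e default
```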
deploy¶
Deploy project to artifact storage.
This command takes the project in the current folder (.dbx/project.json shall exist) and performs a deployment to the given environment.
During the deployment, the following actions will be performed:
- The Python package will be built and stored in the dist/* folder (can be disabled via --no-rebuild)
- The deployment configuration for the given environment (see -e for details) will be taken from the deployment file, defined via --deployment-file (default: conf/deployment.json). You can specify the deployment file in either JSON or YAML; [.json, .yaml, .yml] are all valid file extensions.
- For each job defined in --jobs, all local file references will be checked
- Any found file references will be uploaded to MLflow as artifacts of the current deployment run
- If --requirements-file is provided, all requirements will be added to the job definition
- The wheel file location will be added to the libraries section (can be disabled with --no-package)
- If a job with the given name exists, it will be updated; if not, it will be created
- If --write-specs-to-file is provided, the final job spec will be written into the given file. For example: --write-specs-to-file=.dbx/deployment-result.json
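The steps above read the deployment configuration from the deployment file. A minimal JSON deployment file for the default environment might look roughly like this; the job fields follow the Databricks Jobs API, and all names and cluster settings below are illustrative:

```shell
# A minimal conf/deployment.json sketch. The job entries follow the
# Databricks Jobs API schema; this exact layout is illustrative and may
# differ between dbx versions, so validate it against your own deployments.
mkdir -p conf
cat > conf/deployment.json <<'EOF'
{
  "default": {
    "jobs": [
      {
        "name": "my-sample-job",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        },
        "spark_python_task": {
          "python_file": "my_package/jobs/sample.py"
        }
      }
    ]
  }
}
EOF
```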
dbx deploy [OPTIONS]
Options
- --deployment-file <deployment_file>¶
Path to deployment file in one of these formats: [json, yaml]
- --jobs <jobs>¶
Comma-separated list of job names to be deployed. If not provided, all jobs from the deployment file will be deployed.
- --requirements-file <requirements_file>¶
- --no-rebuild¶
Disable package rebuild
- --no-package¶
Do not add package reference into the job description
- --files-only¶
Do not create jobs, only deploy files.
- --tags <tags>¶
Additional tags for the deployment in the format tag_name=tag_value. The option can be repeated multiple times.
- --write-specs-to-file <write_specs_to_file>¶
Writes final job definitions into a given local file. Helpful when the final representation of a deployed job is needed for other integrations. Please note that the output file will be overwritten if it exists.
- --branch-name <branch_name>¶
The name of the current branch. If not provided or empty, dbx will try to detect the branch name.
- --debug¶
Debug Mode. Shows full stack trace on error.
- -e, --environment <environment>¶
Environment name. If not provided, default will be used.
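Combining the options above, a typical deployment of a single job might be invoked as follows (the job name is hypothetical):

```shell
# Deploy only the job "my-sample-job" to the "default" environment,
# writing the resulting job spec for later use
# (e.g. by `dbx datafactory reflect`).
dbx deploy \
  --jobs=my-sample-job \
  --deployment-file=conf/deployment.json \
  --write-specs-to-file=.dbx/deployment-result.json \
  -e default
```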
execute¶
Executes the given job on an interactive cluster.
This command is well suited for interactively executing your code on interactive clusters.
Warning
There are some limitations for dbx execute:
- Only clusters which support the %pip magic can work with execute.
- Currently, only Python-based execution is supported.
The following set of actions will be done during execution:
- If the interactive cluster is stopped, it will be automatically started
- The package will be rebuilt from the source (can be disabled via --no-rebuild)
- The job configuration will be taken from the deployment file for the given environment
- All referenced files will be uploaded to the MLflow experiment
- The code will be executed in a separate context. Other users can work with the same package on the same cluster without any limitations or overlapping.
- Execution results will be printed out in the shell. If the result was an error, the command will have an error exit code.
dbx execute [OPTIONS]
Options
- --cluster-id <cluster_id>¶
Cluster ID.
- --cluster-name <cluster_name>¶
Cluster name.
- --job <job>¶
Required. Job name to be executed.
- --deployment-file <deployment_file>¶
Path to deployment file in one of these formats: [json, yaml]
- --requirements-file <requirements_file>¶
- --no-rebuild¶
Disable package rebuild
- --no-package¶
Do not add package reference into the job description
- -e, --environment <environment>¶
Environment name. If not provided, default will be used.
- --debug¶
Debug Mode. Shows full stack trace on error.
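For example, to run a job from the deployment file on a named interactive cluster (cluster and job names below are hypothetical):

```shell
# Execute "my-sample-job" on the interactive cluster "my-dev-cluster"
# in the "default" environment, skipping the package rebuild.
dbx execute \
  --cluster-name="my-dev-cluster" \
  --job=my-sample-job \
  --no-rebuild \
  -e default
```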
launch¶
Finds the job deployment and launches it on an automated or interactive cluster.
This command will launch the given job by its name in the given environment.
Note
The job shall be deployed prior to being launched.
dbx launch [OPTIONS]
Options
- --job <job>¶
Required. Job name.
- --trace¶
Trace the job until it finishes.
- --kill-on-sigterm¶
If provided, kills the job on SIGTERM (Ctrl+C) signal
- --existing-runs <existing_runs>¶
Strategy to handle existing active job runs. Behaviour of the options:
- wait will wait for all existing job runs to be finished
- cancel will cancel all existing job runs
- pass will simply pass the check and try to launch the job directly
Options: wait | cancel | pass
- --as-run-submit¶
Run the job as run submit.
- --tags <tags>¶
Additional tags to search for the latest deployment. Format: --tags="tag_name=tag_value". The option can be repeated multiple times.
- --parameters <parameters>¶
Parameters of the job. If provided, default job arguments will be overridden. Format: --parameters="parameter1=value1". The option can be repeated multiple times.
- --parameters-raw <parameters_raw>¶
Parameters of the job as a raw string. If provided, default job arguments will be overridden and the --parameters argument will be ignored. Example command: dbx launch --job="my-job-name" --parameters-raw='{"key1": "value1", "key2": 2}'. Please note that no parameter preprocessing will be done.
- --branch-name <branch_name>¶
The name of the current branch. If not provided or empty, dbx will try to detect the branch name.
- -e, --environment <environment>¶
Environment name. If not provided, default will be used.
- --debug¶
Debug Mode. Shows full stack trace on error.
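As a usage sketch, launching a previously deployed job and tracing it until completion could look like this (the job name and parameter are hypothetical):

```shell
# Launch the deployed job, waiting for any active runs to finish first,
# trace it until completion, and override one default argument.
dbx launch \
  --job="my-sample-job" \
  --existing-runs=wait \
  --trace \
  --parameters="parameter1=value1" \
  -e default
```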