CLI Reference

dbx provides access to its functions in a CLI-oriented fashion.

Each individual command has a detailed help screen accessible via dbx command_name --help.
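
For example:

dbx deploy --help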

We encourage you to use dbx for both local development and CI/CD pipelines.

Note

dbx works with your PAT (Personal Access Token) in exactly the same way as databricks-cli. This means that if the following environment variables:

  • DATABRICKS_HOST

  • DATABRICKS_TOKEN

are defined, dbx will use them to perform actions. This allows you to securely store these variables in your CI/CD tool and access them from within the pipeline.
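
For example, in a CI/CD pipeline you might export these variables before calling dbx (the values below are placeholders):

export DATABRICKS_HOST="https://<your-workspace-host>"
export DATABRICKS_TOKEN="<your-personal-access-token>"
dbx deploy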

dbx

dbx [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

configure

Configures project environment in the current folder.

This command can be used multiple times to change the configuration of a given environment. If the project file (located at .dbx/project.json) does not exist, it will be initialized. There is no strict requirement to configure the project file via this command; you can also edit it directly in any file editor.

dbx configure [OPTIONS]

Options

--workspace-dir <workspace_dir>

Workspace directory for the MLflow experiment.

If not provided, the default directory will be /Shared/dbx/projects/<current-folder-name>.

--artifact-location <artifact_location>

Artifact location in DBFS.

If not provided, the default location will be dbfs:/dbx/<current-folder-name>.

-e, --environment <environment>

Environment name.

If not provided, the default will be used.

--debug

Debug Mode. Shows full stack trace on error.

--profile <profile>

CLI connection profile to use.

The default profile is DEFAULT.
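
As an illustration, a project could be configured for a hypothetical test environment like this (all names and paths below are placeholders):

dbx configure \
    --environment=test \
    --profile=test-profile \
    --workspace-dir=/Shared/dbx/projects/my-project \
    --artifact-location=dbfs:/dbx/my-project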

datafactory

Azure Data Factory integration utilities.

dbx datafactory [OPTIONS] COMMAND [ARGS]...

reflect

Reflects job definitions to Azure Data Factory.

During the reflection, the following actions will be performed:

  1. The input specs file will be parsed

  2. For each defined cluster, a new linked service will be created

  3. For each defined job, a job object will be reflected in the ADF pipeline. Please note that chaining jobs into a pipeline must be done on the ADF side. No other steps in the Data Factory pipeline will be changed by the execution of this command.

dbx datafactory reflect [OPTIONS]

Options

--specs-file <specs_file>

Required Path to deployment result specification file

--subscription-name <subscription_name>

Required Name of Azure subscription

-g, --resource-group <resource_group>

Required Resource group name

--factory-name <factory_name>

Required Factory name

-n, --name <name>

Required Pipeline name

--debug

Debug Mode. Shows full stack trace on error.

-e, --environment <environment>

Environment name.

If not provided, the default will be used.
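
A sample invocation might look like the following (all names below are illustrative):

dbx datafactory reflect \
    --specs-file=.dbx/deployment-result.json \
    --subscription-name=my-subscription \
    -g my-resource-group \
    --factory-name=my-factory \
    -n my-pipeline \
    --environment=test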

deploy

Deploy project to artifact storage.

This command takes the project in the current folder (the file .dbx/project.json must exist) and performs a deployment to the given environment.

During the deployment, the following actions will be performed:

  1. The Python package will be built and stored in the dist/ folder (this can be disabled via --no-rebuild)

  2. The deployment configuration for the given environment (see -e for details) will be taken from the deployment file, defined via --deployment-file (default: conf/deployment.json). You can specify the deployment file in either JSON or YAML; [.json, .yaml, .yml] are all valid file extensions.

  3. For each job defined in --jobs, all local file references will be checked

  4. Any found file references will be uploaded to MLflow as artifacts of the current deployment run

  5. If --requirements-file is provided, all requirements will be added to the job definition

  6. The wheel file location will be added to the libraries (this can be disabled with --no-package)

  7. If a job with the given name exists, it will be updated; if not, it will be created

  8. If --write-specs-to-file is provided, the final job spec will be written into the given file, for example: --write-specs-to-file=.dbx/deployment-result.json

dbx deploy [OPTIONS]

Options

--deployment-file <deployment_file>

Path to deployment file in one of these formats: [json, yaml]

--jobs <jobs>

Comma-separated list of job names to be deployed. If not provided, all jobs from the deployment file will be deployed.

--requirements-file <requirements_file>
--no-rebuild

Disable package rebuild

--no-package

Do not add a package reference to the job definition

--files-only

Do not create jobs, only deploy files.

--tags <tags>

Additional tags for the deployment. Format: (--tags="tag_name=tag_value"). The option might be repeated multiple times.

--write-specs-to-file <write_specs_to_file>

Writes the final job definitions into a given local file. Helpful when the final representation of a deployed job is needed for other integrations. Please note that the output file will be overwritten if it exists.

--branch-name <branch_name>

The name of the current branch. If not provided or empty, dbx will try to detect the branch name.

--debug

Debug Mode. Shows full stack trace on error.

-e, --environment <environment>

Environment name.

If not provided, the default will be used.
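
For instance, a hypothetical deployment of two jobs to a test environment, with an extra tag and a spec output file, might look like this (job and tag names are placeholders):

dbx deploy \
    --jobs=my-etl-job,my-ml-job \
    --environment=test \
    --tags="cost-center=1234" \
    --write-specs-to-file=.dbx/deployment-result.json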

execute

Executes the given job on an interactive cluster.

This command is well suited for interactively executing your code on an interactive cluster during development.

Warning

There are some limitations for dbx execute:

  • Only clusters that support the %pip magic command can work with execute.

  • Currently, only Python-based execution is supported.

The following set of actions will be performed during execution:

  1. If the interactive cluster is stopped, it will be automatically started

  2. The package will be rebuilt from the source (this can be disabled via --no-rebuild)

  3. The job configuration will be taken from the deployment file for the given environment

  4. All referenced files will be uploaded to the MLflow experiment

  5. Code will be executed in a separate context. Other users can work with the same package on the same cluster without any limitations or overlapping.

  6. Execution results will be printed out in the shell. If the result was an error, the command will exit with a non-zero exit code.

dbx execute [OPTIONS]

Options

--cluster-id <cluster_id>

Cluster ID.

--cluster-name <cluster_name>

Cluster name.

--job <job>

Required Job name to be executed

--deployment-file <deployment_file>

Path to deployment file in one of these formats: [json, yaml]

--requirements-file <requirements_file>
--no-rebuild

Disable package rebuild

--no-package

Do not add a package reference to the job definition

-e, --environment <environment>

Environment name.

If not provided, the default will be used.

--debug

Debug Mode. Shows full stack trace on error.
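
An illustrative invocation against a named interactive cluster (cluster and job names are placeholders):

dbx execute \
    --cluster-name=my-dev-cluster \
    --job=my-etl-job \
    --environment=test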

launch

Finds the job deployment and launches it on an automated or interactive cluster.

This command launches the given job by its name in the given environment.

Note

The job must be deployed before it can be launched.
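
A typical sequence might therefore be (the job name is a placeholder):

dbx deploy --jobs=my-etl-job
dbx launch --job=my-etl-job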

dbx launch [OPTIONS]

Options

--job <job>

Required Job name.

--trace

Trace the job until it finishes.

--kill-on-sigterm

If provided, kills the job run when dbx receives a SIGTERM (Ctrl+C) signal

--existing-runs <existing_runs>

Strategy to handle existing active job runs.

Behaviour of each value:

  • wait will wait for all existing job runs to be finished

  • cancel will cancel all existing job runs

  • pass will simply pass the check and try to launch the job directly

Options

wait | cancel | pass

--as-run-submit

Run the job as a one-time run via the Runs Submit API.

--tags <tags>

Additional tags to search for the latest deployment. Format: (--tags="tag_name=tag_value"). Option might be repeated multiple times.

--parameters <parameters>

Parameters of the job.

If provided, default job arguments will be overridden. Format: (--parameters="parameter1=value1"). Option might be repeated multiple times.

--parameters-raw <parameters_raw>

Parameters of the job as a raw string.

If provided, default job arguments will be overridden. If provided, --parameters argument will be ignored. Example command: dbx launch --job="my-job-name" --parameters-raw='{"key1": "value1", "key2": 2}'. Please note that no parameters preprocessing will be done.

--branch-name <branch_name>

The name of the current branch. If not provided or empty, dbx will try to detect the branch name.

-e, --environment <environment>

Environment name.

If not provided, the default will be used.

--debug

Debug Mode. Shows full stack trace on error.
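
For example, a hypothetical launch that traces the run, cancels any active runs first, and overrides a job parameter (all names and values are placeholders):

dbx launch \
    --job=my-etl-job \
    --environment=test \
    --existing-runs=cancel \
    --parameters="date=2021-01-01" \
    --trace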