CLI Reference

dbx provides access to its functions in a CLI-oriented fashion.

Each individual command has a detailed help screen accessible via dbx <command_name> --help.

We encourage you to use dbx for both local development and CI/CD pipelines.

Note

dbx works with your PAT (Personal Access Token) in exactly the same way as databricks-cli.

This means that if the following environment variables:

  • DATABRICKS_HOST

  • DATABRICKS_TOKEN

are defined, dbx will use them to perform actions.
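For example, a CI/CD pipeline might export these variables as secrets before invoking dbx (the host and token values below are placeholders):

export DATABRICKS_HOST="https://<your-workspace-url>"
export DATABRICKS_TOKEN="<personal-access-token>"
dbx deploy -e default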

This allows you to store these variables securely in your CI/CD tool and access them from within the pipeline. In general, we don’t recommend storing your tokens in a config file inside the CI pipeline, since this might be insecure. For Azure-based environments, you can also consider using AAD-based authentication. For local development, please use Databricks CLI profiles; they are very convenient when you’re working with multiple environments.

dbx

dbx [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

configure

Configures a project environment in the current folder.

This command can be used multiple times to change the configuration of a given environment. If the project file (located at .dbx/project.json) does not exist, it will be initialized. There is no strict requirement to configure the project file via this command; you can also edit it directly in any text editor.

dbx configure [OPTIONS]

Options

--workspace-dir <workspace_dir>

Workspace directory for MLflow experiment.

If not provided, default directory will be /Shared/dbx/projects/<current-folder-name>.

--artifact-location <artifact_location>

Artifact location in DBFS.

If not provided, default location will be dbfs:/dbx/<current-folder-name>.

-e, --environment <environment>

Environment name.

If not provided, default will be used.

--debug

Debug Mode. Shows full stack trace on error.

--profile <profile>

CLI connection profile to use.

The default profile is DEFAULT.
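A typical invocation might look like the following (the environment, profile, and folder names below are placeholders):

dbx configure -e test --profile my-staging-profile --workspace-dir /Shared/dbx/projects/my-project --artifact-location dbfs:/dbx/my-project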

datafactory

Azure Data Factory integration utilities.

dbx datafactory [OPTIONS] COMMAND [ARGS]...

reflect

Reflects job definitions to Azure Data Factory.

During the reflection, the following actions will be performed:

  1. The input specs file will be parsed

  2. For each defined cluster, a new linked service will be created

  3. For each defined job, a job object will be reflected into the ADF pipeline.
    Please note that chaining jobs into a pipeline must be done on the ADF side.
    No other steps in the Data Factory pipeline will be changed by this command.

dbx datafactory reflect [OPTIONS]

Options

--specs-file <specs_file>

Required Path to deployment result specification file

--subscription-name <subscription_name>

Required Name of Azure subscription

-g, --resource-group <resource_group>

Required Resource group name

--factory-name <factory_name>

Required Factory name

-n, --name <name>

Required Pipeline name

--debug

Debug Mode. Shows full stack trace on error.

-e, --environment <environment>

Environment name.

If not provided, default will be used.
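As an illustrative example (all Azure resource names below are placeholders), reflecting the specs produced by a previous deployment might look like this:

dbx datafactory reflect --specs-file=.dbx/deployment-result.json --subscription-name="my-subscription" -g my-resource-group --factory-name my-factory -n my-pipeline -e default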

deploy

Deploys the project to artifact storage.

This command takes the project in the current folder (the file .dbx/project.json must exist) and performs a deployment to the given environment.

During the deployment, the following actions will be performed:

  1. The Python package will be built and stored in the dist/ folder (can be disabled via --no-rebuild)

  2. The deployment configuration for the given environment (see -e for details) will be taken
    from the deployment file, defined via --deployment-file.
    The deployment file can be written in JSON, YAML, or Jinja-based JSON or YAML;
    [.json, .yaml, .yml, .j2] are all valid file extensions.
  3. For each job defined in --jobs, all local file references will be checked

  4. Any found file references will be uploaded to MLflow as artifacts of the current deployment run

  5. [DEPRECATED] If --requirements-file is provided, all requirements will be added to the job definition

  6. The wheel file location will be added to the libraries (can be disabled with --no-package)

  7. If a job with the given name exists, it will be updated; if not, it will be created

  8. If --write-specs-to-file is provided, the final job spec will be written to the given file.
    For example, this option can look like this: --write-specs-to-file=.dbx/deployment-result.json.

dbx deploy [OPTIONS]

Options

--job <job>

Deploy a single job by its name. --jobs and --job cannot both be provided.

--jobs <jobs>

Comma-separated list of job names to be deployed. If not provided, all jobs from the deployment file will be deployed. --jobs and --job cannot both be provided.

--requirements-file <requirements_file>

[DEPRECATED]

--no-rebuild

Disable package rebuild

--no-package

Do not add package reference into the job description

--files-only

Do not create jobs, only deploy files.

--tags <tags>

Additional tags for the deployment, in the format tag_name=tag_value. The option may be repeated multiple times.

--write-specs-to-file <write_specs_to_file>

Writes the final job definitions into a given local file. Helpful when the final representation of a deployed job is needed for other integrations. Please note that the output file will be overwritten if it exists.

--branch-name <branch_name>

The name of the current branch. If not provided or empty, dbx will try to detect the branch name.

--jinja-variables-file <jinja_variables_file>

Path to a file with variables for Jinja template. Only works when Jinja-based deployment file is used. Read more about this functionality in the Jinja2 support doc.

--debug

Debug Mode. Shows full stack trace on error.

-e, --environment <environment>

Environment name.

If not provided, default will be used.

--deployment-file <deployment_file>

Path to deployment file.
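Some illustrative invocations (the job name and file paths below are placeholders, assuming a deployment file at conf/deployment.yml):

dbx deploy -e default
dbx deploy --job=my-job --deployment-file=conf/deployment.yml
dbx deploy --files-only --write-specs-to-file=.dbx/deployment-result.json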

execute

Executes a given job on an interactive cluster.

This command is well suited for interactively executing your code on interactive clusters.

Warning

There are some limitations for dbx execute:

  • Only clusters that support the %pip magic can work with execute.

  • Currently, only Python-based execution is supported.

The following set of actions will be done during execution:

  1. If the interactive cluster is stopped, it will be automatically started

  2. The package will be rebuilt from the source (can be disabled via --no-rebuild)

  3. The job configuration will be taken from the deployment file for the given environment

  4. All referenced files will be uploaded to the MLflow experiment

  5. Code will be executed in a separate context. Other users can work with the same package
    on the same cluster without any limitations or overlapping.
  6. Execution results will be printed out in the shell. If the result was an error, the command will exit with an error code.

dbx execute [OPTIONS]

Options

--cluster-id <cluster_id>

Cluster ID.

--cluster-name <cluster_name>

Cluster name.

--job <job>

Required Job name to be executed

--task <task>

Task name (task_key field) inside the job to be executed. Required if the --job is a multitask job.

--requirements-file <requirements_file>

[DEPRECATED]

--no-rebuild

Disable package rebuild

--no-package

Do not add package reference into the job description

--upload-via-context

Upload files via execution context

-e, --environment <environment>

Environment name.

If not provided, default will be used.

--debug

Debug Mode. Shows full stack trace on error.

--deployment-file <deployment_file>

Path to deployment file.

--jinja-variables-file <jinja_variables_file>

Path to a file with variables for Jinja template. Only works when Jinja-based deployment file is used. Read more about this functionality in the Jinja2 support doc.
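As an illustrative example (the cluster and job names below are placeholders):

dbx execute --cluster-name="dev-cluster" --job=my-job -e default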

init

Generates a new project from a template.

Launching this command without the --parameters argument will open the cookiecutter dialogue to enter the required parameters.

dbx init [OPTIONS]

Options

--template <template>

Built-in dbx template used to kick off the project.

Options

python_basic

--path <path>

External template used to kick off the project. Cannot be used together with the --template option.

--package <package>

Python package containing an external template used to kick off the project. Cannot be used together with the --template option.

--checkout <checkout>

Checkout argument for cookiecutter. Used only if --path is used.

-p, --parameters <parameters>

Additional parameters for project creation, in the format parameter=value.
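For example, an illustrative invocation might be (the parameter name below depends on the chosen template):

dbx init --template=python_basic -p project_name=my_project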

--no-input

--debug

Debug Mode. Shows full stack trace on error.

launch

Finds the job deployment and launches it on an automated or interactive cluster.

This command will launch the given job by its name in a given environment.

Note

The job must be deployed before it can be launched.

dbx launch [OPTIONS]

Options

--job <job>

Required Job name.

--trace

Trace the job until it finishes.

--kill-on-sigterm

If provided, kills the job on SIGTERM (Ctrl+C) signal

--existing-runs <existing_runs>

Strategy to handle existing active job runs.

Options behaviour:

  • wait will wait for all existing job runs to be finished

  • cancel will cancel all existing job runs

  • pass will simply pass the check and try to launch the job directly

Options

wait | cancel | pass

--as-run-submit

Run the job as run submit.

--tags <tags>

Additional tags to search for the latest deployment. Format: (--tags="tag_name=tag_value"). The option may be repeated multiple times.

--parameters <parameters>

Parameters of the job.

If provided, default job arguments will be overridden. Format: (--parameters="parameter1=value1"). The option may be repeated multiple times.

--parameters-raw <parameters_raw>

Parameters of the job as a raw string.

If provided, default job arguments will be overridden and the --parameters argument will be ignored. Example command: dbx launch --job="my-job-name" --parameters-raw='{"key1": "value1", "key2": 2}'. Please note that no parameter preprocessing will be done.

--branch-name <branch_name>

The name of the current branch. If not provided or empty, dbx will try to detect the branch name.

--include-output <include_output>

If provided, adds run output to the console output of the launch command. Please note that this option is only supported for Jobs V2.X+. For jobs created without a tasks section, output won’t be printed. If not provided, run output will be omitted.

Options behaviour:

  • stdout will add stdout and stderr to the console output

  • stderr will add only stderr to the console output

Options

stdout | stderr

-e, --environment <environment>

Environment name.

If not provided, default will be used.

--debug

Debug Mode. Shows full stack trace on error.
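A couple of illustrative invocations (the job name and parameter values below are placeholders):

dbx launch --job=my-job --trace -e default
dbx launch --job=my-job --existing-runs=cancel --parameters="date=2022-01-01"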

sync

Sync local files to Databricks and watch for changes, with support for syncing to either a path in DBFS or a Databricks Repo via the dbfs and repo subcommands. This lets you incrementally sync local files to Databricks for quick, iterative development in an IDE, with the ability to test changes almost immediately in Databricks notebooks.

Suppose you are using the Repos for Git integration feature and have cloned a git repo within Databricks where you have Python notebooks stored as well as various Python modules that the notebooks import. You can edit any of these files directly in Databricks. The dbx sync repo command provides an additional option: edit the files in a local repo on your computer in an IDE of your choice and sync the changes to the repo in Databricks as you make changes.

For example, when run from a local git clone, the following will sync all the files to an existing repo named myrepo in Databricks and watch for changes:

dbx sync repo -d myrepo

At the top of your notebook you can turn on autoreload so that execution of cells will automatically pick up the changes:

%load_ext autoreload
%autoreload 2

The dbx sync repo command syncs to a repo in Databricks. If that repo is a git clone you can see the changes made to the files, as if you’d made the edits directly in Databricks. Alternatively, you can use dbx sync dbfs to sync the files to a path in DBFS. This keeps the files independent from the repos but still allows you to use them in notebooks either in a repo or in notebooks existing in your workspace.

For example, when run from a local git clone in a myrepo directory under a user first.last@somewhere.com, the following will sync all the files to the DBFS path /tmp/users/first.last/myrepo:

dbx sync dbfs

The destination path can also be specified, as in: -d /tmp/myrepo.

When executing notebooks in a repo, the root of the repo is automatically added to the Python path so that imports work relative to the repo root. This means that aside from turning on autoreload you don’t need to do anything else special for the changes to be reflected in the cell’s execution. However, when syncing to DBFS, for the imports to work you need to update the Python path to include this target directory you’re syncing to. For example, to import from the /tmp/users/first.last/myrepo path used above, use the following at the top of your notebook:

import sys

if "/dbfs/tmp/users/first.last/myrepo" not in sys.path:
    sys.path.insert(0, "/dbfs/tmp/users/first.last/myrepo")

The dbx sync commands have many options for controlling which files/directories to include/exclude from syncing, which are well documented below. For convenience, all patterns listed in a .gitignore at the source will be excluded from syncing. The .git directory is excluded as well.

dbx sync [OPTIONS] COMMAND [ARGS]...

dbfs

Syncs from a source directory to DBFS.

dbx sync dbfs [OPTIONS]

Options

--use-gitignore, --no-use-gitignore

Controls whether the .gitignore is used to automatically exclude files/directories from syncing.

--polling-interval <polling_interval_secs>

Use file system polling instead of file system events and set the polling interval (in seconds)

--watch, --no-watch

Controls whether the tool should watch for file changes after the initial sync. With --watch, which is the default, it will watch for file system changes and rerun the sync whenever any changes occur to files or directories matching the filters. With --no-watch the tool will quit after the initial sync.

-ep, --exclude-pattern <exclude_patterns>

A pattern specifying files and/or directories to exclude from syncing, relative to the source directory. This uses the same format as gitignore. For examples, see the documentation of --include-pattern.

--allow-delete-unmatched, --disallow-delete-unmatched

Specifies how to handle files/directories that would be deleted in the remote destination because they don’t match the current set of filters.

For example, suppose you have used the option -i foo to sync only the foo directory and later quit the tool. Then suppose you restart the tool using -i bar to sync only the bar directory. In this situation, it is unclear whether your intention is to 1) sync over bar and remove foo in the destination, or 2) sync over bar and leave foo alone in the destination. Due to this ambiguity, the tool will ask to confirm your intentions.

To avoid having to confirm, you can use either of these options:

  • --allow-delete-unmatched will delete files/directories in the destination that are not present locally with the current filters. So for the example above, this would remove foo in the destination when syncing with -i bar.

  • --disallow-delete-unmatched will NOT delete files/directories in the destination that are not present locally with the current filters. So for the example above, this would leave foo in the destination when syncing with -i bar.

-fip, --force-include-pattern <force_include_patterns>

A pattern specifying files and/or directories to sync, relative to the source directory, regardless of whether these files and/or directories would otherwise be excluded.

See the documentation of --include-pattern for usage.

-ip, --include-pattern <include_patterns>

A pattern specifying files and/or directories to sync, relative to the source directory. This uses the same format as gitignore. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

For example:

  • foo will match any file or directory named foo anywhere under the source

  • /foo/ will only match a directory named foo directly under the source.

  • *.py will match all Python files.

  • /foo/*.py will match all Python files directly under the foo directory.

  • /foo/**/*.py will match all Python files anywhere under the foo directory.

You may also store a list of patterns inside a .syncinclude file under the source path. Patterns in this file will be used as the default patterns to include. This essentially behaves as the opposite of a gitignore file, but with the same format.
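For instance, an illustrative invocation combining an include pattern with an exclude pattern (the patterns below are placeholders) might look like this:

dbx sync dbfs -ip "*.py" -ep "tests/"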

-e, --exclude <exclude_dirs>

A directory to exclude from syncing, relative to the source directory. This directory must exist.

For example:

  • -e foo will exclude directory foo directly under the source directory from syncing

  • -e foo/bar will exclude directory foo/bar directly under the source directory from syncing

-fi, --force-include <force_include_dirs>

A directory to sync, relative to the source directory. This directory must exist. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

Unlike --include, this will sync a directory regardless of files/directories that are excluded from syncing. This can be useful when, for example, the .gitignore lists a directory that you want to have synced. The patterns in the .gitignore are used by default to exclude files/directories from syncing.

For example:

  • -fi foo will sync a directory foo directly under the source directory

  • -fi foo/bar will sync a directory foo/bar directly under the source directory

-i, --include <include_dirs>

A directory to sync, relative to the source directory. This directory must exist. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

For example:

  • -i foo will sync a directory foo directly under the source directory

  • -i foo/bar will sync a directory foo/bar directly under the source directory

--dry-run

Log what the tool would do without making any changes.

--full-sync

Ignores any existing sync state and syncs all files and directories matching the filters to the destination.

-s, --source <source>

The local source path to sync from. If the current working directory is a git repo, then the tool by default uses that path as the source. Otherwise the source path will need to be specified.

--profile <profile>

The Databricks CLI connection profile containing the host and API token to use to connect to Databricks.

-d, --dest <dest_path>

A path in DBFS to sync to. For example, -d /tmp/project would sync from the local source path to the DBFS path /tmp/project.

Specifying this path is optional. By default the tool will sync to the destination /tmp/users/<user_name>/<source_base_name>. For example, given local source path /foo/bar and Databricks user first.last@somewhere.com, this would sync to /tmp/users/first.last/bar. This path is chosen as a safe default option that is unlikely to overwrite anything important.

When constructing this default destination path, the user name is determined using the Databricks API. If it cannot be determined, or to use a different user for the path, you may use the --user option.

-u, --user <user_name>

Specify the user name to use when constructing the default destination path. This has no effect when --dest is already specified. If this is an email address then the domain is ignored. For example -u first.last and -u first.last@somewhere.com will both result in first.last as the user name.
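Putting it together, an illustrative invocation (the paths below are placeholders) that syncs a specific source directory to an explicit DBFS destination once, without watching for changes, might be:

dbx sync dbfs -s ./my_project -d /tmp/projects/my_project --no-watch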

repo

Syncs from a source directory to a Databricks Repo.

dbx sync repo [OPTIONS]

Options

--use-gitignore, --no-use-gitignore

Controls whether the .gitignore is used to automatically exclude files/directories from syncing.

--polling-interval <polling_interval_secs>

Use file system polling instead of file system events and set the polling interval (in seconds)

--watch, --no-watch

Controls whether the tool should watch for file changes after the initial sync. With --watch, which is the default, it will watch for file system changes and rerun the sync whenever any changes occur to files or directories matching the filters. With --no-watch the tool will quit after the initial sync.

-ep, --exclude-pattern <exclude_patterns>

A pattern specifying files and/or directories to exclude from syncing, relative to the source directory. This uses the same format as gitignore. For examples, see the documentation of --include-pattern.

--allow-delete-unmatched, --disallow-delete-unmatched

Specifies how to handle files/directories that would be deleted in the remote destination because they don’t match the current set of filters.

For example, suppose you have used the option -i foo to sync only the foo directory and later quit the tool. Then suppose you restart the tool using -i bar to sync only the bar directory. In this situation, it is unclear whether your intention is to 1) sync over bar and remove foo in the destination, or 2) sync over bar and leave foo alone in the destination. Due to this ambiguity, the tool will ask to confirm your intentions.

To avoid having to confirm, you can use either of these options:

  • --allow-delete-unmatched will delete files/directories in the destination that are not present locally with the current filters. So for the example above, this would remove foo in the destination when syncing with -i bar.

  • --disallow-delete-unmatched will NOT delete files/directories in the destination that are not present locally with the current filters. So for the example above, this would leave foo in the destination when syncing with -i bar.

-fip, --force-include-pattern <force_include_patterns>

A pattern specifying files and/or directories to sync, relative to the source directory, regardless of whether these files and/or directories would otherwise be excluded.

See the documentation of --include-pattern for usage.

-ip, --include-pattern <include_patterns>

A pattern specifying files and/or directories to sync, relative to the source directory. This uses the same format as gitignore. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

For example:

  • foo will match any file or directory named foo anywhere under the source

  • /foo/ will only match a directory named foo directly under the source.

  • *.py will match all Python files.

  • /foo/*.py will match all Python files directly under the foo directory.

  • /foo/**/*.py will match all Python files anywhere under the foo directory.

You may also store a list of patterns inside a .syncinclude file under the source path. Patterns in this file will be used as the default patterns to include. This essentially behaves as the opposite of a gitignore file, but with the same format.

-e, --exclude <exclude_dirs>

A directory to exclude from syncing, relative to the source directory. This directory must exist.

For example:

  • -e foo will exclude directory foo directly under the source directory from syncing

  • -e foo/bar will exclude directory foo/bar directly under the source directory from syncing

-fi, --force-include <force_include_dirs>

A directory to sync, relative to the source directory. This directory must exist. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

Unlike --include, this will sync a directory regardless of files/directories that are excluded from syncing. This can be useful when, for example, the .gitignore lists a directory that you want to have synced. The patterns in the .gitignore are used by default to exclude files/directories from syncing.

For example:

  • -fi foo will sync a directory foo directly under the source directory

  • -fi foo/bar will sync a directory foo/bar directly under the source directory

-i, --include <include_dirs>

A directory to sync, relative to the source directory. This directory must exist. When this option is used, no files or directories will be synced unless specifically included by this or other include options.

For example:

  • -i foo will sync a directory foo directly under the source directory

  • -i foo/bar will sync a directory foo/bar directly under the source directory

--dry-run

Log what the tool would do without making any changes.

--full-sync

Ignores any existing sync state and syncs all files and directories matching the filters to the destination.

-s, --source <source>

The local source path to sync from. If the current working directory is a git repo, then the tool by default uses that path as the source. Otherwise the source path will need to be specified.

--profile <profile>

The Databricks CLI connection profile containing the host and API token to use to connect to Databricks.

-d, --dest-repo <dest_repo>

Required The name of the Databricks Repo to sync to.

Repos exist in the Databricks workspace under a path of the form /Repos/<user>/<repo>. This specifies the <repo> portion of the path.

-u, --user <user_name>

The user who owns the Databricks Repo to sync to.

Repos exist in the Databricks workspace under a path of the form /Repos/<user>/<repo>. This specifies the <user> portion of the path.

This is optional, as the user name is determined automatically using the Databricks API. If it cannot be determined, or to use a different user for the path, the user name may be specified using this option.
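As an illustrative example (the repo and user names below are placeholders), syncing once to a repo owned by a specific user, without watching for changes:

dbx sync repo -d myrepo -u first.last --no-watch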