Core dependency management and build options

Building the project package

By default, dbx is oriented towards Python-based projects that can be packaged as a .whl file. To simplify deployment, the dbx deploy command automatically builds a .whl file into the dist/ folder.

However, in some cases there is no need to build the Python-based wheel file. Such cases might be:

  • You’re not using Python in your project (e.g. your project is written in Scala or Java)

  • You’re only using notebooks

  • You’re using an external packaging tool such as poetry

To disable the local package rebuild during deployment, pass the --no-rebuild switch to the deployment command:

dbx deploy --no-rebuild
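
When an external packaging tool produces the wheel, the build step and the deployment step are usually run one after the other. The following is a minimal sketch of that ordering, not part of dbx itself: the helper name deployment_commands and the Poetry invocation are assumptions made for illustration, and only the --no-rebuild switch comes from the text above.

```python
from typing import List, Optional


def deployment_commands(external_build: Optional[List[str]] = None) -> List[List[str]]:
    """Return the commands to run, in order, as argument lists (hypothetical helper)."""
    if external_build is None:
        # Default behaviour: dbx builds the wheel into dist/ on its own.
        return [["dbx", "deploy"]]
    # An external tool builds the package first, so dbx is told not to rebuild it.
    return [list(external_build), ["dbx", "deploy", "--no-rebuild"]]


# Assumed Poetry invocation; replace it with the command of your own build tool.
POETRY_BUILD = ["poetry", "build", "--format", "wheel"]

for command in deployment_commands(POETRY_BUILD):
    print(" ".join(command))
```

The lists can be passed to subprocess.run one by one in a CI script; they are only printed here so that the sketch has no side effects.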

Core dependency management

Note

By core dependency package we mean the .whl file located in the dist/ folder.

The core dependency package is an important part of the dependencies that need to be deployed if you develop your job with Python packaging mechanisms.

By default, dbx uploads the package file and adds it into the job definition (if it’s a single task job), or into every task (if it’s a multitask-job).

In the resulting definition you’ll be able to see the package file referenced as:

"libraries": [
        {
            "whl": "dbfs:/Shared/dbx/projects/<some_project>/<some-hash>/artifacts/dist/<some-package>-<version>-py3-none-any.whl"
        }
]
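
The placement rule described above (job level for a single-task job, every task for a multitask job) can be illustrated with a short sketch. This is not the dbx implementation; the function attach_core_package and the example path are hypothetical, and the job definitions are reduced to the fields needed here.

```python
import copy


def attach_core_package(job: dict, whl_path: str) -> dict:
    """Return a copy of the job definition with the wheel reference added."""
    result = copy.deepcopy(job)
    reference = {"whl": whl_path}
    # A multitask job carries libraries per task; a single-task job carries
    # them on the job definition itself.
    targets = result["tasks"] if "tasks" in result else [result]
    for target in targets:
        libraries = target.setdefault("libraries", [])
        if reference not in libraries:
            libraries.append(reference)
    return result


# Hypothetical path, used only for illustration.
WHL = "dbfs:/some/path/dist/example-0.1.0-py3-none-any.whl"

single = attach_core_package({"name": "single-task-job"}, WHL)
multi = attach_core_package(
    {"name": "multitask-job", "tasks": [{"task_key": "first"}, {"task_key": "second"}]},
    WHL,
)
print(single["libraries"])
print([task["libraries"] for task in multi["tasks"]])
```

Working on a deep copy keeps the input definition unchanged, which makes it easy to compare the definition before and after the reference is added.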

However, in some cases you may want to omit the core dependency reference from the job definition, for example:

  • You’re not using Python in your project (e.g. your project is written in Scala or Java)

  • You’re only using notebooks and they’re not dependent on the .whl file since the code is shipped together with Repos

In such cases, you can do one of the following:

  1. Disable the package file reference globally for the whole deployment.
    In this case the package file won’t be added to the libraries section at either the job level or the task level.
    This can be achieved by passing the --no-package switch to the deployment command:
dbx deploy --no-package
  2. Disable package file references on a per-job or per-task level
    by providing the following in the deployment configuration file:
{
    "default": {
        "jobs": [
            {
                "name": "single-task-job",
                "deployment_config": {
                    "no_package": true
                },
                "notebook_task": {
                    "notebook_path": "/Repos/some/notebook"
                }
            },
            {
                "name": "multitask-job",
                "job_clusters": [
                    {
                        "new_cluster": {
                            "spark_version": "9.1.x-cpu-ml-scala2.12",
                            "num_workers": 1,
                            "node_type_id": "{some-node-type-id}"
                        },
                        "job_cluster_key": "basic-cluster"
                    }
                ],
                "tasks": [
                    {
                        "task_key": "first-task",
                        "deployment_config": {
                            "no_package": true
                        },
                        "job_cluster_key": "basic-cluster",
                        "notebook_task": {
                            "notebook_path": "/Repos/some/notebook"
                        }
                    },
                    {
                        "task_key": "second-task",
                        "job_cluster_key": "basic-cluster",
                        "spark_python_task": {
                            "python_file": "file://some/entrypoint.py",
                            "parameters": [
                                "--conf-file",
                                "file:fuse://some/conf/file.yml"
                            ]
                        }
                    }
                ]
            }
        ]
    }
}

As the example above shows, per-job and per-task deployment properties can be provided in the deployment_config section.
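
To check which jobs and tasks opt out of the package reference before deploying, the deployment file can be inspected with a few lines of Python. The sketch below assumes the JSON layout shown above (an environment name mapping to a jobs list); the function package_opt_outs is a hypothetical helper and not a dbx API.

```python
import json


def package_opt_outs(deployment: dict, environment: str = "default") -> list:
    """List the jobs and tasks whose deployment_config sets no_package to true."""
    opted_out = []
    for job in deployment[environment]["jobs"]:
        if job.get("deployment_config", {}).get("no_package", False):
            opted_out.append(job["name"])
        # Only multitask jobs have a "tasks" list.
        for task in job.get("tasks", []):
            if task.get("deployment_config", {}).get("no_package", False):
                opted_out.append(job["name"] + "/" + task["task_key"])
    return opted_out


# Trimmed-down version of the deployment file above.
EXAMPLE = json.loads("""
{
    "default": {
        "jobs": [
            {"name": "single-task-job", "deployment_config": {"no_package": true}},
            {
                "name": "multitask-job",
                "tasks": [
                    {"task_key": "first-task", "deployment_config": {"no_package": true}},
                    {"task_key": "second-task"}
                ]
            }
        ]
    }
}
""")

print(package_opt_outs(EXAMPLE))
# prints ['single-task-job', 'multitask-job/first-task']
```

In this example second-task does not appear in the result, because it has no deployment_config section and therefore keeps the package reference.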