Core dependency management and build options
Building the project package
By default, dbx is heavily oriented towards Python-based projects that can be compiled into a .whl file. To support this during deployment, dbx automatically creates a .whl file in the dist/ folder as part of the dbx deploy command.
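The wheel build relies on standard Python packaging conventions in your project. As a sketch only (the package name, version, and file layout below are placeholders, not anything dbx prescribes), a minimal setup.py producing a dist/ wheel could look like:

```python
# Minimal packaging sketch; names and versions here are hypothetical.
# A standard wheel build (e.g. via setuptools) produces dist/<package>-<version>-py3-none-any.whl.
from setuptools import find_packages, setup

setup(
    name="some_project",                                  # placeholder package name
    version="0.1.0",                                      # placeholder version
    packages=find_packages(exclude=["tests", "tests.*"]), # include project code, skip tests
    install_requires=[],                                  # runtime dependencies go here
)
```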
However, in some cases there is no need to build the Python-based wheel file. Such cases might be:
- You’re not using Python in your project (e.g. your project is written in Scala or Java)
- You’re only using notebooks
- You’re using an external packaging tool such as poetry
To disable the local package rebuild during deployment, provide the --no-rebuild switch:
dbx deploy --no-rebuild
Core dependency management
Note
By core dependency package we mean the .whl file located in the dist/
folder.
The core dependency package is an essential part of the dependencies that need to be deployed if you develop your job using Python packaging mechanisms.
By default, dbx
uploads the package file and adds it into the job definition (if it’s a single task job), or into every task (if it’s a multitask-job).
In the resulting definition you’ll be able to see the package file referenced as:
"libraries": [
{
"whl": "dbfs:/Shared/dbx/projects/<some_project>/<some-hash>/artifacts/dist/<some-package>-<version>-py3-none-any.whl"
}
]
However, in some cases you might want to omit the core dependency reference in the job definition, for example:
- You’re not using Python in your project (e.g. your project is written in Scala or Java)
- You’re only using notebooks, and they don’t depend on the .whl file since the code is shipped together with Repos
In such cases, you can do one of the following:
- Disable the package file reference globally for the whole deployment. In this case the package file won’t be added to the libraries section, neither on the job level nor on the task level. This is achieved by providing the --no-package switch to the deployment command:
dbx deploy --no-package
- Disable package file references on a per-job or per-task level by providing the following in the deployment configuration file:
{
    "default": {
        "jobs": [
            {
                "name": "single-task-job",
                "deployment_config": {
                    "no_package": true
                },
                "notebook_task": {
                    "notebook_path": "/Repos/some/notebook"
                }
            },
            {
                "name": "multitask-job",
                "job_clusters": [
                    {
                        "new_cluster": {
                            "spark_version": "9.1.x-cpu-ml-scala2.12",
                            "num_workers": 1,
                            "node_type_id": "{some-node-type-id}"
                        },
                        "job_cluster_key": "basic-cluster"
                    }
                ],
                "tasks": [
                    {
                        "task_key": "first-task",
                        "deployment_config": {
                            "no_package": true
                        },
                        "job_cluster_key": "basic-cluster",
                        "notebook_task": {
                            "notebook_path": "/Repos/some/notebook"
                        }
                    },
                    {
                        "task_key": "second-task",
                        "job_cluster_key": "basic-cluster",
                        "spark_python_task": {
                            "python_file": "file://some/entrypoint.py",
                            "parameters": [
                                "--conf-file",
                                "file:fuse://some/conf/file.yml"
                            ]
                        }
                    }
                ]
            }
        ]
    }
}
environments:
  default:
    jobs:
      - name: single-task-job
        deployment_config:
          no_package: true
        notebook_task:
          notebook_path: "/Repos/some/notebook"
      - name: multitask-job
        job_clusters:
          - new_cluster:
              spark_version: 9.1.x-cpu-ml-scala2.12
              num_workers: 1
              node_type_id: "{some-node-type-id}"
            job_cluster_key: basic-cluster
        tasks:
          - task_key: first-task
            deployment_config:
              no_package: true
            job_cluster_key: basic-cluster
            notebook_task:
              notebook_path: "/Repos/some/notebook"
          - task_key: second-task
            job_cluster_key: basic-cluster
            spark_python_task:
              python_file: file://some/entrypoint.py
              parameters:
                - "--conf-file"
                - file:fuse://some/conf/file.yml
As the examples above show, it’s possible to provide per-job or per-task deployment properties in the deployment_config section.
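For projects that need neither the wheel build nor the package reference at all (e.g. a pure Scala or Java deployment), the two switches described above can be combined in a single command:

```shell
# Skip both the local wheel build and the package reference in all job definitions
dbx deploy --no-rebuild --no-package
```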