Path adjustment logic during deployment
During deployment, dbx supports uploading local files and properly referencing them in the job definition.
Any paths referenced in the deployment file that start with file:// or file:fuse:// will be uploaded to the artifact storage. References are resolved relative to the root of the project.
There are two ways a file path can be resolved and referenced in the final deployment definition:
Standard - a definition such as file://some/path/in/project/some.file will be resolved into dbfs://&lt;artifact storage prefix&gt;/some/path/in/project/some.file.
FUSE - a definition such as file:fuse://some/path/in/project/some.file will be resolved into /dbfs/&lt;artifact storage prefix&gt;/some/path/in/project/some.file.
The latter type of path resolution might come in handy when the consuming system doesn't know how to work with cloud storage protocols, since a FUSE path looks like an ordinary local file path.
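The two resolution rules can be sketched roughly as follows. This is a minimal illustration, not dbx's actual internals; the function name and signature are hypothetical:

```python
def resolve_path(reference: str, artifact_prefix: str) -> str:
    """Rewrite file:// and file:fuse:// references against the artifact storage prefix.

    Hypothetical sketch of the resolution rules described above.
    """
    # FUSE references become local-looking /dbfs/... paths.
    if reference.startswith("file:fuse://"):
        local = reference[len("file:fuse://"):]
        return f"/dbfs/{artifact_prefix}/{local}"
    # Standard references become dbfs://... URIs.
    if reference.startswith("file://"):
        local = reference[len("file://"):]
        return f"dbfs://{artifact_prefix}/{local}"
    # Anything else (e.g. "./placeholder_1.py") is left untouched.
    return reference
```

Note that plain relative paths without a file:// or file:fuse:// prefix pass through unchanged, which is why the third parameter in the examples below is not rewritten.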
Please find more examples of path resolution below, in both the JSON and YAML deployment file formats:
{
    "default": {
        "jobs": [
            {
                "name": "your-job-name",
                "new_cluster": {
                    "spark_version": "7.3.x-cpu-ml-scala2.12",
                    "node_type_id": "some-node-type",
                    "aws_attributes": {
                        "first_on_demand": 0,
                        "availability": "SPOT"
                    },
                    "num_workers": 2
                },
                "libraries": [],
                "max_retries": 0,
                "spark_python_task": {
                    "python_file": "file://placeholder_1.py",
                    "parameters": [
                        "file:fuse://placeholder_1.py",
                        "./placeholder_1.py"
                    ]
                }
            }
        ]
    }
}
environments:
default:
jobs:
- name: "your-job-name"
new_cluster:
spark_version: "7.3.x-cpu-ml-scala2.12"
node_type_id: "some-node-type"
aws_attributes:
first_on_demand: 0
availability: "SPOT"
num_workers: 2
libraries: []
max_retries: 0
spark_python_task:
python_file: "file://placeholder_1.py"
parameters:
- "file:fuse://placeholder_1.py"
- "./placeholder_1.py"
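For the examples above, the spark_python_task section of the final deployment definition would look roughly like this (keeping the document's &lt;artifact storage prefix&gt; placeholder; the exact prefix depends on your project configuration):

```yaml
spark_python_task:
  python_file: "dbfs://<artifact storage prefix>/placeholder_1.py"
  parameters:
    - "/dbfs/<artifact storage prefix>/placeholder_1.py"
    - "./placeholder_1.py"
```

The python_file reference uses the standard dbfs:// form, the first parameter uses the FUSE /dbfs/ form, and the plain relative path is left as-is.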