Packaging arbitrary files in Python Packages

While writing a Python package with dbx, you may need to add arbitrary files to it.

Such arbitrary files can include:

  • *.sql files where your Spark SQL logic resides

  • small static data files that are used in your pipelines

  • test files, e.g. when you need to have a file in a specific format

Standard Python packaging tools can collect such arbitrary files and package them together with the main package code.

This example is written for setuptools (setup.py) packaging. For tools like poetry and other packaging formats, please check their respective docs.

Referencing files

First of all, we’ll need to reference the files in the packaging configuration (setup.py in this example).

Imagine having the following project structure:

├── <package-name>
│       ├──
│       └── resources
│           ├── raw
│           │   └── username.csv
│           └── sql
│               └── create_table.sql

It’s good practice to keep all arbitrary files in a separate directory (in this case, <package-name>/resources).

In setup.py, the package_data argument is responsible for referencing files from this folder:

from setuptools import setup

setup(  # other arguments (name, packages, ...) omitted here
    package_data={"": ["resources/sql/*.sql", "resources/raw/*.csv"]},
)
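The patterns in package_data are resolved relative to each package directory, so it can be useful to check what they match before building. The following is a minimal sketch, not part of dbx or setuptools: the helper `matched_files` and the package name `my_package` are hypothetical, and `pathlib` globbing only approximates how setuptools expands these patterns.

```python
import tempfile
from pathlib import Path


def matched_files(package_dir, patterns):
    """Return files under package_dir matched by the glob patterns, as sorted relative paths."""
    package_dir = Path(package_dir)
    matches = set()
    for pattern in patterns:
        for path in package_dir.glob(pattern):
            if path.is_file():
                matches.add(path.relative_to(package_dir).as_posix())
    return sorted(matches)


# Build a throwaway directory that mirrors the project layout described above.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_package"  # hypothetical package name
    (pkg / "resources" / "raw").mkdir(parents=True)
    (pkg / "resources" / "sql").mkdir(parents=True)
    (pkg / "resources" / "raw" / "username.csv").write_text("id,name\n")
    (pkg / "resources" / "sql" / "create_table.sql").write_text("SELECT 1\n")

    found = matched_files(pkg, ["resources/sql/*.sql", "resources/raw/*.csv"])
    print(found)
```

If a file you expect is missing from the result, the pattern or the directory layout is the first thing to check.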

Using the referenced files

To access the referenced files, do the following in Python:

import pkg_resources

# Resolve filesystem paths to the files shipped inside the package
raw_csv_path = pkg_resources.resource_filename(
    "<package-name>", "resources/raw/username.csv"
)
query_path = pkg_resources.resource_filename(
    "<package-name>", "resources/sql/create_table.sql"
)

The returned paths can be used to read these files locally for any purpose.
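Recent setuptools releases mark pkg_resources as deprecated in favour of the standard library, so on Python 3.9+ the same lookup can be done with importlib.resources. The sketch below is an illustration under assumptions, not the approach this guide prescribes: it builds a throwaway package named `demo_pkg` (a hypothetical name) in a temporary directory so that it runs without an installed project, and the SQL text is made up for the example.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package so the example runs without an installed project.
tmp = tempfile.mkdtemp()
pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
(pkg / "resources" / "sql").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "resources" / "sql" / "create_table.sql").write_text(
    "CREATE TABLE IF NOT EXISTS users (id INT, name STRING)"
)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# files() returns a traversable object rooted at the package directory.
query_file = resources.files("demo_pkg") / "resources" / "sql" / "create_table.sql"
query = query_file.read_text(encoding="utf-8")
print(query)
```

In a real project you would replace `demo_pkg` with your own package's import name and skip the setup part; the text in `query` could then be passed to whatever runs your SQL.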