
Troubleshooting

BLAS/LAPACK

If you see errors related to BLAS/LAPACK libraries, see this StackOverflow post for guidance.

UV

The default Python version and gentropy dependencies are managed by uv. To perform a fresh installation, run make setup-dev.

Adding new dependencies or updating existing ones

To add new dependencies or update existing ones, update the pyproject.toml file. This can be done automatically with the uv add ${package} command. Refer to the uv documentation for more information.

Java

Officially, PySpark requires Java 8, 11, or 17. To support hail (a gentropy dependency), Java 11 is recommended.

Pre-commit

If pre-commit throws an error message such as SyntaxError: Unexpected token '?', followed by a JavaScript traceback, the issue is likely your system NodeJS version.

One solution is to upgrade your system NodeJS version. However, this is not always possible: for example, as of July 2023 the NodeJS package in the Ubuntu repository was several major versions behind the latest release.

Another solution is to remove Node, NodeJS, and npm from your system entirely. pre-commit will then no longer rely on a system NodeJS and will install a suitable version of its own.

On Ubuntu, this can be done with sudo apt remove node nodejs npm, followed by sudo apt autoremove. Depending on your existing installation, you may also need to remove some files manually; see this StackOverflow answer for guidance.

macOS

Some functions on macOS may throw a Java error:

python3.10/site-packages/py4j/protocol.py:326: Py4JJavaError

This can be resolved by adding the following line to your ~/.zshrc:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Creating a development Dataproc cluster (OT users only)

To start a Dataproc cluster in development mode, run:

make create-dev-cluster

Tip

This command works only if you have committed and pushed all your changes to the remote repository.
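A quick pre-flight check for this condition might look like the sketch below. It is a hypothetical helper, not part of the gentropy Makefile. It assumes git is installed and that your branch tracks an upstream; a branch with no upstream is treated as "not pushed". Untracked files also count as uncommitted here, which is deliberately conservative.

```python
import subprocess


def working_tree_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints nothing when there are no uncommitted or untracked files."""
    return porcelain_output.strip() == ""


def unpushed_commit_count(rev_list_output: str) -> int:
    """Parse the output of `git rev-list --count @{upstream}..HEAD`."""
    return int(rev_list_output.strip())


def ready_for_dev_cluster(repo_dir: str = ".") -> bool:
    """Return True only if everything is committed and pushed to the upstream branch."""

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo_dir, *args], capture_output=True, text=True, check=True
        ).stdout

    try:
        if not working_tree_is_clean(git("status", "--porcelain")):
            return False
        # Commits reachable from HEAD but not from the upstream branch are unpushed.
        return unpushed_commit_count(git("rev-list", "--count", "@{upstream}..HEAD")) == 0
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git repository, git missing, or no upstream configured.
        return False


if __name__ == "__main__":
    print("ready" if ready_for_dev_cluster() else "commit and push your changes first")
```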

The command creates a new Dataproc cluster with the following configuration:

  • package installed from the branch you currently have checked out (for example dev or feature/xxx)
  • uv installed on the cluster (to speed up installation and dependency resolution)
  • a CLI script to run gentropy steps

This process requires gentropy to be installable from a git repository; see VCS support in the pip documentation.
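For illustration, a git-based install uses a PEP 508 direct reference of the form `name @ git+URL@ref`. The sketch below only builds such a requirement string and the matching pip command; it does not describe how the cluster setup actually installs gentropy. The repository URL is an assumption, and the ref (for example dev or feature/xxx) must already exist on the remote, which is why unpushed changes are not picked up.

```python
import sys

# Assumption: public gentropy repository URL; substitute your own remote if it differs.
GENTROPY_REPO = "https://github.com/opentargets/gentropy.git"


def vcs_requirement(package: str, repo_url: str, ref: str) -> str:
    """Build a PEP 508 direct reference that pip resolves by cloning `repo_url` at `ref`."""
    return f"{package} @ git+{repo_url}@{ref}"


def pip_install_command(requirement: str) -> list[str]:
    """Build the argv for installing `requirement` with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Print (rather than run) the install command for the `dev` branch.
    print(" ".join(pip_install_command(vcs_requirement("gentropy", GENTROPY_REPO, "dev"))))
```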