Python Tooling Volume 2

Introduction

Since I wrote “Modern Python tools for modern Python developers”, I’ve made a number of changes to my Python workflows as I’ve discovered and started using new tools. I want to share these tools to help you potentially improve your workflow and efficiency.

Tools

Most of my tools are still the same from my previous post, so I’ll focus on the changes I’ve made.

Environment Management

First off, let’s talk about environment management again. I am still using Poetry, but there are a number of competitors in this space that do similar things, and I want to discuss them. Pradyun Gedam has a great article on this topic here, but I would like to dive a little deeper into some of the tools.

To start, I still think Poetry is a wonderful tool. I think it has a very intuitive CLI that works the same on every operating system. poetry install installs all the dependencies. poetry shell spawns a shell with the environment already activated, so there is no need to remember source .venv/bin/activate versus .venv\Scripts\activate. poetry run lets you run a command inside the virtual environment without having it activated, which suits automated situations like CI/CD or Docker. poetry build builds a .tar.gz and a .whl file of your package. poetry publish uploads the package to PyPI. Additionally, Poetry has a lock file format that installs the same dependency versions every time. At my last job, some of our Python applications were being used to make business decisions, so having reproducible installations was of great concern for us.

However, Poetry has two major pain points. The first is that, by default, Poetry creates virtual environments in its own cache directory rather than in the project directory. While this can be changed, it’s annoying, and it can cause unexpected results in CI/CD. The second is Poetry’s refusal to follow a number of PEPs (Python Enhancement Proposals) that are trying to modernize Python packaging around the pyproject.toml file.

Two tools that address some of these pain points are Hatch and pdm.

Both are interesting. Hatch is a semi-official tool that is under the umbrella of the Python Packaging Authority. It does nearly everything Poetry does and more, and strictly adheres to Python PEPs. Unfortunately, as of writing, I think its strict adherence to Python PEPs makes it not viable. From the FAQ:

The only caveat is that currently there is no support for re-creating an environment given a set of dependencies in a reproducible manner. Although a standard lock file format may be far off since PEP 665 was rejected, resolving capabilities are coming to pip. When that is stabilized, Hatch will add locking functionality and dedicated documentation for managing applications.

So currently, Hatch does not have any sort of lockfile, and therefore dependency versions are not guaranteed to be the same from one install to the next. As someone who has had too many things break due to dependency upgrades, this is a feature I will not go without. I definitely look forward to the future of Hatch, however. Its ability to create multiple Python environments like tox is a great idea, though not being able to locate them in the project directory is a major downside.
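To make the lockfile point concrete, here is a minimal sketch of the guarantee I care about: every pinned version should match what is actually installed. The function names and the version numbers are purely illustrative assumptions, and this is not how Poetry or any other tool verifies its lock file.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(
    pinned: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Report every package whose installed version differs from its pin."""
    drift = {}
    for name, wanted in pinned.items():
        actual = installed.get(name)
        if actual != wanted:
            drift[name] = f"pinned {wanted}, found {actual or 'nothing'}"
    return drift


# Illustrative pins only; a real lock file records many more details (hashes, markers).
EXAMPLE_PINS = {"requests": "2.28.1", "attrs": "22.2.0"}
```

In practice you would build the installed mapping with something like {name: installed_version(name) for name in EXAMPLE_PINS}. An empty result from find_drift means the environment matches the pins.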

The other tool, pdm, takes heavy inspiration from Poetry, but has been altered to follow PEP 517 and PEP 621, with experimental support for PEP 582. Unlike Poetry, it creates environments directly in the project directory. It also has awesome support for a script system, like npm.

The only thing I find missing from pdm is a shell command like Poetry’s. While there is a third-party plugin for it (pdm-shell), I found that, at least on Windows, it doesn’t work very well. Given more time to mature, I think pdm may become a very enticing Poetry alternative. It does all the same things right as Poetry, but fixes some of the annoyances and adds even more features.

Pre-Commit

The major new tool I’ve been using is pre-commit. While I was vaguely aware of it before, I never really tried it, nor understood what it could do for me.

At its simplest, pre-commit is a tool written in Python that takes a list of commands and runs them against the files that have changed when you run git commit. It relies on git hooks to do this. While pre-commit works with a large number of languages and ecosystems like Docker, it primarily seems to be used with Python tools. pre-commit automatically installs every tool in an isolated virtual environment.
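To illustrate the idea (and only the idea), here is a rough sketch of a hook runner: ask git for the staged files, match them against each hook, and run the command on the matches. The hook table and function names are my own assumptions, and pre-commit’s real implementation is considerably more involved than this.

```python
import fnmatch
import subprocess
from typing import Dict, List


def staged_files() -> List[str]:
    """Return paths staged for the next commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def plan_commands(files: List[str], hooks: Dict[str, List[str]]) -> List[List[str]]:
    """Pair each hook command with the changed files matching its glob pattern."""
    planned = []
    for pattern, command in hooks.items():
        matching = [f for f in files if fnmatch.fnmatch(f, pattern)]
        if matching:  # skip hooks that have nothing to check
            planned.append(command + matching)
    return planned


def run_hooks(hooks: Dict[str, List[str]]) -> int:
    """Run every planned command; a non-zero result should abort the commit."""
    exit_code = 0
    for command in plan_commands(staged_files(), hooks):
        if subprocess.run(command).returncode != 0:
            exit_code = 1
    return exit_code


# Hypothetical hook table: glob pattern -> command to run on matching files.
EXAMPLE_HOOKS = {
    "*.py": ["black", "--check"],
    "*.md": ["markdownlint"],
}
```

A git hook script would call run_hooks(EXAMPLE_HOOKS) and exit with its return value. The sketch skips the part that makes pre-commit so convenient, namely installing each tool into its own isolated environment.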

To use pre-commit, you need to create a .pre-commit-config.yaml file in the root of your repository (.yaml versus .yml is very important). In this file, you list hooks you would like to use from git repositories like so:

repos:
  - hooks:
      - id: check-json
      - id: check-toml
      - id: check-yaml
      - id: check-case-conflict
      - id: trailing-whitespace
      - id: check-merge-conflict
      - id: mixed-line-ending
    repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0

  - hooks:
      - id: absolufy-imports
    repo: https://github.com/MarcoGorelli/absolufy-imports
    rev: v0.3.1

pre-commit is very picky that rev must be an exact commit hash or tag of the repo. What pre-commit does here is download the repository at the specified rev and read the .pre-commit-hooks.yaml from that repository to configure a virtual environment.

With your .pre-commit-config.yaml file created, you can install pre-commit with

pip install pre-commit

and install the git hook with

pre-commit install

Now, whenever you run git commit, the command pre-commit run runs first, which checks only the files that have changed. To check all files, run

pre-commit run --all-files

instead.

What I like so much about pre-commit is that it creates a single source of truth for the checks that should be run, both locally and in CI/CD. In the past, getting my team members to remember to run the myriad of tools (black, isort, autoflake) we used to format and check code before committing was always a pain, and forgetting was a mistake I often made myself. This would inevitably result in CI/CD failing for simple mistakes that could have been caught easily. That frustrates developers, and it requires setting up CI/CD steps to run all these tools individually, where keeping the command-line options in sync can be challenging (looking at you, autoflake, with no config file).

With pre-commit, developers get code checked before every commit, and it’s very easy to create a CI/CD pipeline to run the exact same checks, just in case someone forgets to run pre-commit install:

Clone repo
Install dependencies
Run pre-commit run --all-files --show-diff-on-failure

If you’re using Poetry, there’s also a plugin available to automatically run pre-commit install after poetry install: poetry-pre-commit-plugin

poetry self add poetry-pre-commit-plugin

There are a few downsides to pre-commit, however. The first, and the biggest one to me, is that updating the tools it uses is challenging to automate. Dependabot (my auto-updater of choice, since it’s built in to GitHub) does not support it at time of writing: dependabot/dependabot-core#1524

There are a few workarounds. The first is Renovate, which does support it, so if you’re already using that, fantastic. Additionally, the pre-commit creator’s CI service automatically updates tools for you. The option I have decided to go with is a GitHub Action on a schedule that updates the tools and opens a pull request. The downside is that pull requests created by GitHub Actions can’t trigger subsequent GitHub Actions (to prevent infinite loops), so verifying the changes requires manual effort:

# Inspired by: https://browniebroke.com/blog/gh-action-pre-commit-autoupdate/
name: Pre-Commit Update

on:
  workflow_dispatch:
  schedule:
    - cron: 0 8 * * 1

jobs:
  auto-update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          ref: main

      - name: Install pre-commit-update
        run: pip install pre-commit-update

      - name: Run pre-commit-update
        run: pre-commit-update

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          token: ${{ github.token }}
          branch: update/pre-commit-update
          title: Update Pre-Commit hooks
          commit-message: Update Pre-Commit hooks
          body: |
            Update versions of tools in pre-commit configs to latest version.
            Please verify the changes locally, as a pull request created by GitHub Actions cannot trigger workflows:
            ```bash
            git fetch
            git checkout update/pre-commit-update
            git pull
            poetry run pre-commit run --all-files
            ```
          labels: dependencies

You may notice that I’m also using the third-party pre-commit-update tool, since I find it works better than the built-in pre-commit autoupdate command. Poetry in particular conflicts with this built-in command.
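Conceptually, an updater only has to find the newest tag for each repo and rewrite the matching rev line. The sketch below shows just the rewriting half as a plain text substitution. It assumes rev appears after repo within each entry, as in the configs in this post. The function name and the version number in the usage note are made up, and this is not how pre-commit-update or pre-commit autoupdate work internally.

```python
import re

# Match "repo: <url>" and "rev: <value>" lines, with or without a leading "- ".
REPO_LINE = re.compile(r"^\s*(?:-\s+)?repo:\s*(\S+)\s*$")
REV_LINE = re.compile(r"^(\s*(?:-\s+)?rev:\s*)\S+\s*$")


def bump_rev(config_text: str, repo_url: str, new_rev: str) -> str:
    """Return config_text with the rev of the entry for repo_url replaced."""
    updated = []
    in_target = False  # True between the matching repo line and its rev line
    for line in config_text.splitlines():
        repo_match = REPO_LINE.match(line)
        if repo_match:
            in_target = repo_match.group(1) == repo_url
        elif in_target:
            rev_match = REV_LINE.match(line)
            if rev_match:
                line = rev_match.group(1) + new_rev  # keep the original indentation
                in_target = False
        updated.append(line)
    return "\n".join(updated) + "\n"
```

For example, bump_rev(text, "https://github.com/psf/black", "99.0.0") would touch only the black entry, where 99.0.0 is a placeholder rather than a real release. Working on the raw text instead of re-serializing the YAML keeps comments and key order intact.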

Type Checking

While I’m still using pyright for type checking, since it’s tightly integrated with VS Code, I have started running it locally through pre-commit as well, rather than installing it with npm and defining a package.json file. This is made possible by the community pyright Python package.

repos:
  - hooks:
      - id: pyright
    repo: https://github.com/RobertCraigie/pyright-python
    rev: v1.1.289

However, since pre-commit installs tools into their own virtual environment, this means pyright needs to be told where the actual virtual environment is. In pyproject.toml, add:

[tool.pyright]
    typeCheckingMode = "basic" # turn on type checking
    venvPath         = "."     # parent venv path
    venv             = ".venv" # venv directory name

Additionally, if using Poetry, you also need to force Poetry to create the virtual environment in your project folder, rather than in its normal cache directory. Otherwise, this will not work in CI/CD or on other developers’ machines. Add the file poetry.toml:

[virtualenvs]
in-project = true

With these settings added, everything will work as you would expect.
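If pyright still reports missing imports, a quick sanity check is to confirm that venvPath and venv really point at a virtual environment. The sketch below does that with the two settings passed in as a plain dict. The helper names are my own, reading the values out of pyproject.toml is left out for brevity, and checking for pyvenv.cfg is a heuristic based on what the standard venv module creates.

```python
from pathlib import Path
from typing import Dict


def resolve_venv(project_root: Path, pyright_settings: Dict[str, str]) -> Path:
    """Join venvPath and venv relative to the directory holding pyproject.toml."""
    return project_root / pyright_settings["venvPath"] / pyright_settings["venv"]


def looks_like_venv(path: Path) -> bool:
    """Heuristic: environments created by the venv module contain pyvenv.cfg."""
    return (path / "pyvenv.cfg").is_file()


# Mirrors the [tool.pyright] values used in this post.
EXAMPLE_SETTINGS = {"venvPath": ".", "venv": ".venv"}
```

Calling looks_like_venv(resolve_venv(Path.cwd(), EXAMPLE_SETTINGS)) from the project root should return True once poetry install has created the in-project environment.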

Linting

In the past, I have used isort for sorting imports, autoflake for removing unused imports, and flake8 to lint code. I recently learned about ruff, which basically does all three in one, with proper pyproject.toml support. Probably the best part about it is that it can automatically fix a number of issues, like sorting imports (isort), removing unused imports (autoflake), and common mistakes like var == None.
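In case it’s unclear why var == None deserves a lint rule at all, here is a small demonstration. The == operator goes through __eq__, which any class can override, while is compares identity and cannot be fooled. The class and function names are invented for this example, and E711 is the pycodestyle code for this comparison.

```python
class AlwaysEqual:
    """Toy class whose __eq__ claims equality with everything, including None."""

    def __eq__(self, other: object) -> bool:
        return True


def is_missing_equality(value: object) -> bool:
    """The flagged pattern: equality comparison against None."""
    return value == None  # noqa: E711


def is_missing_identity(value: object) -> bool:
    """The corrected pattern: identity comparison against None."""
    return value is None


if __name__ == "__main__":
    odd = AlwaysEqual()
    print(is_missing_equality(odd))  # True, because __eq__ was consulted
    print(is_missing_identity(odd))  # False, because odd is not the None object
```

For ordinary values the two functions agree, which is exactly why the bug is easy to miss until an object with a custom __eq__ shows up.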

Interestingly, like pyright, it’s not written in Python. Much like some JavaScript tools, it’s written in Rust, so it is much faster than comparable tools written in pure Python.

Using it with pre-commit is easy, as there is a specific repo for it:

- hooks:
    - args:
        - --fix
      id: ruff
  repo: https://github.com/charliermarsh/ruff-pre-commit
  rev: v0.0.223

There is also a VS Code extension which highlights issues, and helps automatically fix them: https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff

Conclusion

In conclusion, pre-commit makes it so much easier to have a consistent experience between local development and the CI/CD pipeline. While the tools I’m using are largely the same (black, pyright, pyleft), how I’m using them has become much easier to manage.

As an example, here’s roughly what I would set up with pre-commit and GitHub Actions for pull requests on a Python project:

repos:
  - hooks:
      - id: check-json
      - id: check-toml
      - id: check-yaml
      - id: check-case-conflict
      - id: trailing-whitespace
      - id: check-merge-conflict
    repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
  - hooks:
      - id: poetry-check
      - args:
          - --no-update
        id: poetry-lock
    repo: https://github.com/python-poetry/poetry
    rev: 1.3.2
  - hooks:
      - id: black
    repo: https://github.com/psf/black
    rev: 22.12.0
  - hooks:
      - args:
          - --fix
        id: ruff
    repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.223
  - hooks:
      - id: pyleft
    repo: https://github.com/nathanvaughn/pyleft
    rev: v1.1.4
  - hooks:
      - id: pyright
    repo: https://github.com/RobertCraigie/pyright-python
    rev: v1.1.289
  - hooks:
      - id: markdownlint
    repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.33.0

name: Tests

on:
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    if: "!contains(github.event.head_commit.message, 'ci skip')"

    strategy:
      fail-fast: false
      matrix:
        python_version: ["3.11", "3.10", "3.9", "3.8", "3.7"]

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Install Poetry
        run: |
          python -m pip install pip wheel pipx --upgrade
          pipx install poetry

      - name: Setup Python ${{ matrix.python_version }}
        uses: actions/setup-python@v4
        with:
          # last version is default
          python-version: |
            3.11
            ${{ matrix.python_version }}
          cache: poetry

      - name: Cache Pre-Commit
        uses: actions/cache@v3
        with:
          path: ~/.cache/pre-commit
          key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
          restore-keys: pre-commit-

      - name: Install Python Dependencies
        run: poetry install --sync

      - name: Run Pre-Commit Checks
        run: poetry run pre-commit run --all-files --color=always --show-diff-on-failure

      - name: Run Tests
        run: poetry run pytest -v
