Python Tooling

Introduction
With the new year, I wanted to go over some of my favorite Python tools that I learned about and used over the past year, and really just describe my starting point for Python-based projects. If you think:
Wow, Python is easy to get started with, but taking it seriously is hard.
then I hope this post will help you reconsider. Yes, for a long time Python tooling was not great, which made using Python outside of small scripts difficult. However, the community has come a long way in making Python easier to use in more robust settings.
Tools
Let’s get right into the tools that I’ve come to love this past year.
Environment Management
First and foremost, dependency management with Python sucks. pip is the default package manager for Python, comes with nearly any Python installation, and is the de facto standard.
The first annoying thing about pip is that if you naively run pip install <package>, it will install that package to your system-level Python interpreter. If you have multiple projects on your machine, you now have a mess of dependencies from multiple projects that may conflict with each other. npm, for example, will create a node_modules folder in the same directory and install the packages there. With Python, you have to explicitly create a virtual environment for every single project you want to work on. Thankfully, Python these days ships with the venv module built in, but it's annoying that you have to do this, and even when you use python -m venv, the command to activate the virtual environment is different on different operating systems.
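For reference, the dance looks something like this (a minimal sketch; .venv is just a conventional directory name):

    # create a virtual environment in the .venv directory
    python -m venv .venv

    # activate it on Linux/macOS
    source .venv/bin/activate

    # activate it on Windows
    .venv\Scripts\activate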
Secondly, pip is painful to use in a repeatable manner. Let's say you want to add the requests package as a dependency to your project. If you write your requirements.txt file like this:
    requests
then every time you run python -m pip install -r requirements.txt, you'll get the latest version of the requests package. This is almost always not what you want. Every time you build/install/distribute code, dependencies should be exactly the same, always. I've lost track of the number of times version updates of dependencies have broken my program because I wasn't careful in pinning version numbers.
So, to fix this, you write your requirements.txt file like this:

    requests==2.26.0
Now you're closer, but requests itself depends on 4 other packages. So while you'll get the same version of requests every time, you may get different versions of its dependencies, depending on how requests specifies its dependency versions. To capture everything, you now need to write your requirements.txt file more like this (the exact pins below are just illustrative):
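    requests==2.26.0
    certifi==2021.5.30
    charset-normalizer==2.0.4
    idna==3.2
    urllib3==1.26.6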
You can see how this can become a nightmare, especially if you ever actually want to update one of your dependencies.
Most sane package managers (such as npm, shockingly enough) use two separate files to track dependencies: one for the top-level project dependencies, and then a lock file which has the exact version of every dependency and sub-dependency. pip has no such concept.
pipenv is an attempt to remedy these problems, and gets close, but man, long story short, it sucks to use: it's super slow, it doesn't work well with cross-platform teams (as in, generating a lock file on Windows versus Linux will often yield different results), and it has an unintuitive CLI. I find instead that poetry works really well for this.
poetry is sort of an all-in-one replacement for pip, venv, and the setup.py file. You create a pyproject.toml file (the new de facto standard for the Python community on project configuration) with information similar to the following:
    # project information
    [tool.poetry]
    name = "pyleft"
    version = "1.0.0"
    description = "Python type annotation existence checker"
    license = "MIT"
    readme = "README.md"
    homepage = "https://github.com/NathanVaughn/pyleft"
    repository = "https://github.com/NathanVaughn/pyleft.git"
    authors = ["Nathan Vaughn <REDACTED>"]
    classifiers = [
        "Intended Audience :: Developers",
        "Topic :: Software Development :: Libraries :: Python Modules",
        "Topic :: Software Development :: Quality Assurance",
    ]

    # dependencies along with supported Python versions
    [tool.poetry.dependencies]
    python = ">=3.6.2,<4.0"
    toml = ">=0.10.0,<1"
    pathspec = ">=0.9.0, <1"

    # development dependencies
    [tool.poetry.dev-dependencies]
    pytest = "^6.2.4"
    black = "^21.9b0"
    isort = "^5.9.3"

    # needed to compile as a package
    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"
Then you can run poetry install to automatically create a virtual environment and install all dependencies. Run poetry shell to activate the virtual environment, or poetry update to update dependencies to the latest versions allowed by your version specifiers. If you want to build a package, just run poetry build with no need to faff around with a confusing setup.py file. Publishing to PyPI is just poetry publish.
For me, it's really been a game-changer for dependency management and makes life so much easier.
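The day-to-day workflow ends up being just a handful of commands:

    # create the virtual environment and install all dependencies
    poetry install

    # activate the virtual environment
    poetry shell

    # update dependencies within your version specifiers
    poetry update

    # build the package and publish it to PyPI
    poetry build
    poetry publish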
Two tips for using poetry:

- By default, poetry will create virtual environments in a cache directory. I prefer to keep them in the same directory as the project, so run poetry config virtualenvs.in-project true to enable this.
- poetry always wants to install things in a virtual environment. When running anything automated, especially on disposable systems, this is annoying, and you must prefix every python command with poetry run. A much easier way is to run poetry config virtualenvs.create false to disable virtual environment creation. Additionally, you can also do poetry export -o requirements.txt to export the dependencies to a pip requirements file you can install with python -m pip install -r requirements.txt. This is really helpful, especially with Docker; see the sketch below.
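As a sketch of the Docker use case (the base image, paths, and app.py entrypoint are assumptions for illustration):

    FROM python:3.10

    WORKDIR /app

    # copy only the files needed to resolve dependencies
    COPY pyproject.toml poetry.lock ./

    # export the locked dependencies and install them with plain pip,
    # skipping virtual environment creation inside the container
    RUN python -m pip install poetry && \
        poetry export -o requirements.txt && \
        python -m pip install -r requirements.txt

    # copy in the rest of the application
    COPY . .

    # assumed entrypoint for illustration
    CMD ["python", "app.py"]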
Testing
There's not a ton to say about testing with Python. I like pytest and have been using it for years. However, a great addition to pytest is the pytest-cov plugin. Once you install it, just add a couple options to your pytest command to create a coverage report.
A simple

    pytest

becomes

    pytest --cov=. --cov-report=html --cov-branch --cov-context=test
To get the full branch and context information shown in the HTML report, you also need to add a snippet to your pyproject.toml file.
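A minimal version of that snippet, using coverage.py's show_contexts option:

    # show which tests covered each line in the HTML report
    [tool.coverage.html]
    show_contexts = true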
You can add these options to your pyproject.toml file as well, so you don't have to remember them.
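For example, something like this (assuming pytest 6+, which reads the [tool.pytest.ini_options] table from pyproject.toml):

    [tool.pytest.ini_options]
    addopts = "--cov=. --cov-report=html --cov-branch --cov-context=test"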
Static Analysis
Besides running your code, there is also a lot that can be analyzed statically, without needing to execute a single line.
Formatting
I'm a stickler about code formatting. I generally don't care what it is, I just want it to be consistent. I also want to be able to more or less hit "format" in my editor and have all my code magically fixed. For Python, there is one excellent tool for this: black. black is a Python code formatter based on the Henry Ford quote:
A customer can have a car painted any color that he wants, so long as it is black.
black has almost no formatting options other than line length, and it's fantastic.
However, black doesn't really do anything about the imports in your Python code. For that, we need two more tools.
The first is isort. isort really only does one thing, and that is to take your imports, group them by type (standard library, third-party, first-party), and then alphabetize them.
For example, this (an illustrative snippet of my own, where mypackage stands in for a first-party module):
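    import requests
    import os
    from mypackage import utils
    import sys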
would become:
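    import os
    import sys

    import requests

    from mypackage import utils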
However, isort doesn't have the power to remove unused imports for you. The second tool, for that, is autoflake. autoflake is able to automatically remove unused imports and variables. Unfortunately, autoflake doesn't support any sort of config file at all, so all options must be specified via the CLI.
With the combination of those three tools, no matter how terribly you write your code, it will come out cleaned up every time. This is generally what I do in CI in my projects to enforce formatting:
    # install packages
    python -m pip install black isort autoflake
    # first, run black to format
    python -m black .
    # now, sort the imports
    python -m isort . --profile black
    # finally, remove unused imports
    python -m autoflake . --in-place --recursive --remove-all-unused-imports
If feasible, I then have the changes committed back to the branch, or add --check to each command to make it fail if there are any differences.
Type Checking
Python type hinting is something I've already talked about a great deal here, but in short, it's a fantastic way to check for issues in your code without needing to execute it. For this, I like to use pyright.
While pyright does a great job at checking type hints for any issues, it doesn't actually check to make sure that all your type hints exist. I like to require 100% type hinting in my code repos, and it's easy to accidentally forget them, so I made the tool pyleft to help with this. pyleft doesn't check if type hints are correct, it just makes sure they are there. For example:
    > pyleft .
    - tests\files\fail_1.py
        Argument 'two' of function 'add:1' has no type annotation
    - tests\files\fail_2.py
        Function 'add:1' has no return type annotation
    - tests\files\fail_3.py
        Function 'drive:2' has no return type annotation
    - tests\files\fail_4.py
        Argument 'one' of function 'wheels:4' has no type annotation
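The kind of code that trips that first error looks something like this (my own illustrative snippet, not the actual test file):

    # pyleft flags 'two' because it has no type annotation
    def add(one: int, two) -> int:
        return one + two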
Combined with pyright or even mypy, this is a great way to check that your code is fully type checked.
Linting
Last but not least is linting. While all of the above helps manage dependencies, run tests, validate formatting, and ensure type safety, the last piece of the puzzle is ensuring best practices through linting. I'll be honest, I'm still not super sold on this: I feel like I spend a lot more time hitting "ignore" than it brings value. Especially tools like bandit, which, last I tried, would complain about web requests being insecure, even with a hardcoded URL. However, for now I've been using flake8 to help check for easy things like using == None instead of the better is None.
However, flake8 inexplicably doesn't support pyproject.toml files. To get around this, I've used pyproject-flake8 as a wrapper around flake8 to add pyproject.toml support.
To support black formatting, I add the following ignores (the usual black-compatibility settings; the exact snippet below is a sketch):
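    # black handles line length itself, so bump the limit and
    # ignore the two rules that conflict with black's style
    [tool.flake8]
    max-line-length = 88
    extend-ignore = "E203,W503"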
Then, to run, use pflake8 instead of flake8:
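    python -m pflake8 .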
Conclusion
In conclusion, Python tooling has come a long way. As an example, here’s roughly what I would set up in GitHub Actions for pull requests on a Python project:
    name: Tests

    on:
      pull_request:
        branches:
          - main

    jobs:
      test:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            # whatever Python versions you choose to support
            python_version: ["3.10", "3.9", "3.8", "3.7", "3.6"]

        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          - name: Setup Python ${{ matrix.python_version }}
            uses: actions/setup-python@v2
            with:
              python-version: ${{ matrix.python_version }}

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Tests
            run: pytest -v

      formatting:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # latest supported version is probably best
          - name: Setup Python 3.10
            uses: actions/setup-python@v2
            with:
              python-version: "3.10"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Black
            run: python -m black . --check

          - name: Run Isort
            run: python -m isort . --profile black --check

          - name: Run Autoflake
            run: python -m autoflake . --recursive --remove-all-unused-imports --check

      type-checking:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # run on the lowest supported version
          - name: Setup Python 3.6
            uses: actions/setup-python@v2
            with:
              python-version: "3.6"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Install Pyright
            run: npm install pyright

          - name: Run Pyright
            run: npx pyright .

          - name: Run Pyleft
            run: python -m pyleft .

      linting:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # latest supported version is probably best
          - name: Setup Python 3.10
            uses: actions/setup-python@v2
            with:
              python-version: "3.10"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Pflake8
            run: python -m pflake8 .
This is just a starting point and can be modified to fit your project. There are a lot of things you can do, like adding caching, uploading code coverage, using a private package registry, making slower jobs like tests run only after the formatting checks have passed, etc. It also shouldn't be too hard to port to other CI systems such as Azure Pipelines or GitLab.
Hopefully this has helped you rethink how to manage your Python projects and maintain code quality. As an example, I recommend looking at my pyleft project, which features all of this in action and is pretty small and digestible.