Python Tooling

Introduction
With the new year, I wanted to go over some of my favorite Python tools that I learned about and used over the past year, and really just describe my starting point for Python-based projects. If you think:
Wow, Python is easy to get started with, but taking it seriously is hard.
then I hope this post will help you reconsider. Yes, for a long time Python tooling was not great, which made using Python outside of small scripts difficult. However, the community has come a long way in making Python easier to use in more robust settings.
Tools
Let’s get right into the tools that I’ve come to love this past year.
Environment Management
First and foremost, dependency management with Python sucks. pip is the default package manager for Python, comes with nearly any Python installation, and is the de facto standard.
The first annoying thing about pip is that if you naively run pip install <package>, it will install that package to your system-level Python interpreter. If you have multiple projects on your machine, you now have a mess of dependencies from multiple projects that may conflict with each other. npm, for example, will create a node_modules folder in the same directory and install the packages there. With Python, you have to explicitly create a virtual environment for every single project you want to work on. Thankfully, Python these days ships with the venv module built in, but it's annoying that you have to do this, and even when you use python -m venv, the command to activate the virtual environment is different on different operating systems.
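For reference, the dance looks something like this (a minimal sketch; .venv is just a conventional directory name):

    # create a virtual environment in the .venv directory
    python -m venv .venv

    # activate it on Linux/macOS
    source .venv/bin/activate

    # activate it on Windows
    .venv\Scripts\activate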
Secondly, pip is painful to use in a repeatable manner. Let's say you want to add the requests package as a dependency to your project. If you write your requirements.txt file like this:
    requests
then every time you run python -m pip install -r requirements.txt, you'll get the latest version of the requests package. This is almost always not what you want. Every time you build/install/distribute code, dependencies should be exactly the same, always. I've lost track of the number of times version updates of dependencies have broken my program because I wasn't careful in pinning version numbers.
So, to fix this, you write your requirements.txt file like this:

    requests==2.26.0
Now you're closer, but requests itself depends on 4 other packages. So while you'll get the same version of requests every time, you may get different versions of its dependencies, depending on how requests specifies its dependency versions. To capture everything, you now need to write your requirements.txt file more like this (the exact pins below are just illustrative):
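    requests==2.26.0
    certifi==2021.5.30
    charset-normalizer==2.0.4
    idna==3.2
    urllib3==1.26.6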
You can see how this can become a nightmare, especially if you ever actually want to update one of your dependencies.
Most sane package managers (such as npm, shockingly enough) use two separate files to track dependencies: one for the top-level project dependencies, and then a lock file which has the exact version of every dependency and sub-dependency. pip has no such concept.
pipenv is an attempt to remedy these problems, and gets close, but man, long story short, it sucks to use: it's super slow, it doesn't work well with cross-platform teams (as in, generating a lock file on Windows versus Linux will often yield different results), and it has an unintuitive CLI. I find instead that poetry works really well for this.
poetry is sort of an all-in-one replacement for pip, venv, and the setup.py file. You create a pyproject.toml file (the new de facto standard for the Python community on project configuration) with information similar to the following:
    # project information
    [tool.poetry]
    name = "pyleft"
    version = "1.0.0"
    description = "Python type annotation existence checker"
    license = "MIT"
    readme = "README.md"
    homepage = "https://github.com/NathanVaughn/pyleft"
    repository = "https://github.com/NathanVaughn/pyleft.git"
    authors = ["Nathan Vaughn <REDACTED>"]
    classifiers = [
        "Intended Audience :: Developers",
        "Topic :: Software Development :: Libraries :: Python Modules",
        "Topic :: Software Development :: Quality Assurance",
    ]

    # dependencies along with supported Python versions
    [tool.poetry.dependencies]
    python = ">=3.6.2,<4.0"
    toml = ">=0.10.0,<1"
    pathspec = ">=0.9.0, <1"

    # development dependencies
    [tool.poetry.dev-dependencies]
    pytest = "^6.2.4"
    black = "^21.9b0"
    isort = "^5.9.3"

    # needed to compile as a package
    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"
Then you can run poetry install to automatically create a virtual environment and install all dependencies. Run poetry shell to activate the virtual environment, or poetry update to update dependencies to the latest versions allowed by your version specifiers. If you want to build a package, just run poetry build with no need to faff around with a confusing setup.py file. Publishing to PyPI is just poetry publish.
For me, it's really been a game-changer for dependency management and makes life so much easier.
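The day-to-day workflow ends up being just a handful of commands:

    # create the virtual environment and install all dependencies
    poetry install

    # activate the virtual environment
    poetry shell

    # update dependencies within your version specifiers
    poetry update

    # build the package and publish it to PyPI
    poetry build
    poetry publish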
Two tips for using poetry:

- By default, poetry will create virtual environments in a cache directory. I prefer to keep them in the same directory as the project, so run poetry config virtualenvs.in-project true to enable this.
- poetry always wants to install things in a virtual environment. When running anything automated, especially on disposable systems, this is annoying, and you must prefix every python command with poetry run. A much easier way is to run poetry config virtualenvs.create false to disable virtual environment creation. Additionally, you can also do poetry export -o requirements.txt to export the dependencies to a pip requirements file you can install with python -m pip install -r requirements.txt. This is really helpful, especially with Docker; see the sketch below.
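As a sketch of the Docker use case (the base image, paths, and app.py entrypoint are assumptions for illustration):

    FROM python:3.10

    WORKDIR /app

    # copy only the files needed to resolve dependencies
    COPY pyproject.toml poetry.lock ./

    # export the locked dependencies and install them with plain pip,
    # skipping virtual environment creation inside the container
    RUN python -m pip install poetry && \
        poetry export -o requirements.txt && \
        python -m pip install -r requirements.txt

    # copy in the rest of the application
    COPY . .

    # assumed entrypoint for illustration
    CMD ["python", "app.py"]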
Testing
There's not a ton to say about testing with Python. I like pytest and have been using it for years. However, a great addition to pytest is the pytest-cov plugin. Once you install it, just add a couple options to your pytest command to create a coverage report.
A simple

    pytest

becomes

    pytest --cov=. --cov-report=html --cov-branch --cov-context=test
To get the full branch and context information shown in the HTML report, you also need to add a snippet to your pyproject.toml file.
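A minimal version of that snippet, using coverage.py's show_contexts option:

    # show which tests covered each line in the HTML report
    [tool.coverage.html]
    show_contexts = true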
You can add these options to your pyproject.toml file as well, so you don't have to remember them.
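For example, something like this (assuming pytest 6+, which reads the [tool.pytest.ini_options] table from pyproject.toml):

    [tool.pytest.ini_options]
    addopts = "--cov=. --cov-report=html --cov-branch --cov-context=test"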
Static Analysis
Besides running your code, there is also a lot that can be analyzed statically, without needing to execute a single line.
Formatting
I'm a stickler about code formatting. I generally don't care what it is, I just want it to be consistent. I also want to be able to more or less hit "format" in my editor and have all my code magically fixed. For Python, there is one excellent tool for this: black. black is a Python code formatter based on the Henry Ford quote:
A customer can have a car painted any color that he wants, so long as it is black.
black has almost no formatting options other than line length, and it's fantastic.
However, black doesn't really do anything about the imports in your Python code. For that, we need two more tools.
The first is isort. isort really only does one thing, and that is to take your imports, group them by type (standard library, third-party, first-party), and then alphabetize them.
For example, this (an illustrative snippet of my own, where mypackage stands in for a first-party module):
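    import requests
    import os
    from mypackage import utils
    import sys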
would become:
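    import os
    import sys

    import requests

    from mypackage import utils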
However, isort doesn't have the power to remove unused imports for you. The second tool, for that, is autoflake. autoflake is able to automatically remove unused imports and variables. Unfortunately, autoflake doesn't support any sort of config file at all, so all options must be specified via the CLI.
With the combination of those three tools, no matter how terribly you write your code, it will come out cleaned up every time. This is generally what I do in CI in my projects to enforce formatting:
    # install packages
    python -m pip install black isort autoflake
    # first, run black to format
    python -m black .
    # now, sort the imports
    python -m isort . --profile black
    # finally, remove unused imports
    python -m autoflake . --in-place --recursive --remove-all-unused-imports
If feasible, I then have the changes committed back to the branch, or add --check to each command to make it fail if there are any differences.
Type Checking
Python type hinting is something I've already talked about a great deal here, but in short, it's a fantastic way to check for issues in your code without needing to execute it. For this, I like to use pyright.
While pyright does a great job at checking type hints for any issues, it doesn't actually check to make sure that all your type hints exist. I like to require 100% type hinting in my code repos, and it's easy to accidentally forget them, so I made the tool pyleft to help with this. pyleft doesn't check if type hints are correct, it just makes sure they are there. For example:
    > pyleft .
    - tests\files\fail_1.py
        Argument 'two' of function 'add:1' has no type annotation
    - tests\files\fail_2.py
        Function 'add:1' has no return type annotation
    - tests\files\fail_3.py
        Function 'drive:2' has no return type annotation
    - tests\files\fail_4.py
        Argument 'one' of function 'wheels:4' has no type annotation
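The kind of code that trips that first error looks something like this (my own illustrative snippet, not the actual test file):

    # pyleft flags 'two' because it has no type annotation
    def add(one: int, two) -> int:
        return one + two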
Combined with pyright or even mypy, this is a great way to check that your code is fully type checked.
Linting
Last but not least is linting. While all of the above helps manage dependencies, run tests, validate formatting, and ensure type safety, the last piece of the puzzle is ensuring best practices through linting. I'll be honest, I'm still not super sold on this: I feel like I spend a lot more time hitting "ignore" than it brings value. Especially tools like bandit, which, last I tried, would complain about web requests being insecure, even with a hardcoded URL. However, for now I've been using flake8 to help check for easy things like using == None instead of the better is None.
However, flake8 inexplicably doesn't support pyproject.toml files. To get around this, I've used pyproject-flake8 as a wrapper around flake8 to add pyproject.toml support.
To support black formatting, I add the following ignores (the usual black-compatibility settings; the exact snippet below is a sketch):
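    # black handles line length itself, so bump the limit and
    # ignore the two rules that conflict with black's style
    [tool.flake8]
    max-line-length = 88
    extend-ignore = "E203,W503"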
Then, to run, use pflake8 instead of flake8:
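    python -m pflake8 .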
Conclusion
In conclusion, Python tooling has come a long way. As an example, here’s roughly what I would set up in GitHub Actions for pull requests on a Python project:
    name: Tests

    on:
      pull_request:
        branches:
          - main

    jobs:
      test:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            # whatever Python versions you choose to support
            python_version: ["3.10", "3.9", "3.8", "3.7", "3.6"]

        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          - name: Setup Python ${{ matrix.python_version }}
            uses: actions/setup-python@v2
            with:
              python-version: ${{ matrix.python_version }}

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Tests
            run: pytest -v

      formatting:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # latest supported version is probably best
          - name: Setup Python 3.10
            uses: actions/setup-python@v2
            with:
              python-version: "3.10"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Black
            run: python -m black . --check

          - name: Run Isort
            run: python -m isort . --profile black --check

          - name: Run Autoflake
            run: python -m autoflake . --recursive --remove-all-unused-imports --check

      type-checking:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # run on the lowest supported version
          - name: Setup Python 3.6
            uses: actions/setup-python@v2
            with:
              python-version: "3.6"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Install Pyright
            run: npm install pyright

          - name: Run Pyright
            run: npx pyright .

          - name: Run Pyleft
            run: python -m pyleft .

      linting:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout Code
            uses: actions/checkout@v2

          # latest supported version is probably best
          - name: Setup Python 3.10
            uses: actions/setup-python@v2
            with:
              python-version: "3.10"

          - name: Setup Poetry
            uses: Gr1N/setup-poetry@v7

          - name: Configure Poetry
            run: poetry config virtualenvs.create false

          - name: Install Python Dependencies
            run: poetry install

          - name: Run Pflake8
            run: python -m pflake8 .
This is just a starting point and can be modified to fit your project. There are a lot of things you can do, like adding caching, uploading code coverage, using a private package registry, making slower jobs like tests run only after the formatting checks have passed, etc. It also shouldn't be too hard to port to other CI systems such as Azure Pipelines or GitLab.
Hopefully this has helped you rethink how to manage your Python projects and maintain code quality. As an example, I recommend looking at my pyleft project, which features all of this in action and is pretty small and digestible.