Border-Patrol

Border-Patrol logs all imported packages and their versions to support you while debugging. In 95% of all cases in which something suddenly breaks in production, the cause is a different version of one of your requirements. Pinning the versions of all your dependencies, including the dependencies of dependencies, inside a virtual environment helps you to overcome this problem, but it is quite cumbersome and therefore not always done in practice. Sometimes, for example when you are using PySpark, you also might not be 100% sure which library versions are installed on some cluster nodes.

With Border-Patrol you can easily find the culprit by comparing the logs of the last working version with those of the failing one, since Border-Patrol lists all imported packages and their corresponding versions right at the end of your application's run, even if it crashed.
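The comparison described above can be automated. The following is a minimal sketch, not part of Border-Patrol: the helper names parse_report and diff_reports are hypothetical, the sample paths are made up, and the code assumes the report reaches the log as the plain PACKAGE / VERSION / PATH table shown in the Usage section, without logger prefixes on each line.

```python
def parse_report(text):
    """Return {package: version} from the table in a report (hypothetical helper)."""
    versions = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["PACKAGE", "VERSION", "PATH"]:
            in_table = True  # header row found, data rows follow
        elif in_table and len(fields) >= 2:
            versions[fields[0]] = fields[1]
        elif in_table:
            break  # a blank or malformed line ends the table
    return versions


def diff_reports(old_text, new_text):
    """Return {package: (old_version, new_version)} for every difference."""
    old, new = parse_report(old_text), parse_report(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            # None means the package is absent from that report
            changes[name] = (old.get(name), new.get(name))
    return changes


# Made-up sample reports; the paths are placeholders.
working = """Following packages were imported:
PACKAGE   VERSION   PATH
numpy     1.15.1    /env/site-packages/numpy/__init__.py
pandas    0.23.4    /env/site-packages/pandas/__init__.py
"""
failing = """Following packages were imported:
PACKAGE   VERSION   PATH
numpy     1.15.1    /env/site-packages/numpy/__init__.py
pandas    0.24.0    /env/site-packages/pandas/__init__.py
six       1.12.0    /env/site-packages/six.py
"""

for package, (before, after) in diff_reports(working, failing).items():
    print(f"{package}: {before} -> {after}")
```

Packages that appear in only one of the two reports are listed with None on the missing side, which also reveals imports that were added or dropped between runs.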

Usage

Border-Patrol is really simple to use: install it with pip install border-patrol and import it before any other package, e.g.:

from border_patrol import with_print_stdout

import pandas as pd

If you run those lines in a script, you will get output similar to this:

Python version is 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 14:01:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Following packages were imported:
PACKAGE         VERSION   PATH
border_patrol   0.1       /Users/fwilhelm/Sources/border_patrol/src/border_patrol
cycler          0.10.0    /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/cycler.py
dateutil        2.7.5     /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/dateutil/__init__.py
matplotlib      2.2.3     /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/matplotlib/__init__.py
numpy           1.15.1    /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/numpy/__init__.py
pandas          0.23.4    /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/pandas/__init__.py
pyparsing       2.3.0     /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/pyparsing.py
pytz            2018.7    /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/pytz/__init__.py
six             1.11.0    /Users/fwilhelm/anaconda/envs/lib/python3.6/site-packages/six.py

If you import with_print_stdout, Border-Patrol will use print as its output function, whereas with_print_stderr will print to standard error. Since most production applications use the logging module instead, you can tell Border-Patrol to use it by importing with_log_{error|warning|info|debug}. For instance, from border_patrol import with_log_info will log the final report at the INFO logging level.

If you want even more fine-grained control, you can import the BorderPatrol class directly from the border_patrol package and use its register() and unregister() methods to activate and deactivate it, respectively. At any point, the tracking can be circumvented by using border_patrol.builtin_import.

How does it work?

Border-Patrol is actually quite simple. It overrides the __import__ function in Python’s builtins module to track every imported module. For each module, the corresponding package is determined and the version number is retrieved from the __version__ attribute, which most professional libraries provide at the package level. If this fails, the distribution name for the package is determined with the help of pkg_resources, which is part of setuptools; e.g. scikit-learn is the distribution containing the sklearn package. The distribution name is then used to look up the version number, also via pkg_resources, similar to how pip would do it.

Finally, Border-Patrol registers an atexit handler that is called when your application finishes and reports all imported modules. To avoid any problems from registering these hooks more than once, Border-Patrol is implemented as a singleton. Note that it is not thread-safe.
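The combination of a singleton and an atexit handler could look roughly like the following sketch. The ImportReporter class is a hypothetical stand-in, not the real BorderPatrol class; only the method names register() and unregister() are taken from the description above, and the package data is filled in by hand.

```python
import atexit


class ImportReporter:
    """Hypothetical singleton that prints a report when the interpreter exits."""

    _instance = None

    def __new__(cls):
        # Unguarded check-then-create: two threads could race here, which is
        # one reason a simple singleton like this is not thread-safe.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._registered = False
            cls._instance.packages = {}  # package name -> version
        return cls._instance

    def register(self):
        """Activate the exit report; repeated calls have no further effect."""
        if not self._registered:
            atexit.register(self.report)
            self._registered = True

    def unregister(self):
        """Deactivate the exit report again."""
        if self._registered:
            atexit.unregister(self.report)
            self._registered = False

    def report(self, out=print):
        out("Following packages were imported:")
        for name, version in sorted(self.packages.items()):
            out(f"{name} {version}")


reporter = ImportReporter()
reporter.packages["example_pkg"] = "1.0"  # made-up entry for illustration
reporter.register()
reporter.register()  # second call does not add a second handler
```

Handlers registered with atexit run on normal interpreter termination, including after an unhandled exception, but not when the process is killed by a signal that Python does not handle, when a fatal internal error occurs, or when os._exit() is called.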

Note

This project has been set up using PyScaffold 3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.
