Pip vs Package Manager for handling Python Packages

Python packages are frequently hosted in many distributions’ repositories. After reading this tutorial, specifically the section titled “Do you really want to do this”, I have avoided using pip, preferring the system repository and only resorting to pip when I need to install a package that isn’t in the repository.

However, because this leads to an inconsistent installation method, would it be better to use only pip? What are the benefits/drawbacks of using pip over the system’s own repository for packages that are available in both places?

The link I included states:

The advantage of always using standard Debian / NeuroDebian packages, is that the packages are carefully tested to be compatible with each other. The Debian packages record dependencies with other libraries so you will always get the libraries you need as part of the install.

I use Arch. Is this the case with other package-management systems besides apt?

Asked By: mas


The biggest disadvantage I see with using pip to install Python modules on your system, whether as system modules or as user modules, is that your distribution’s package management system won’t know about them. This means they won’t be used for any other package which needs them and which you may want to install in the future (or which might start using one of those modules following an upgrade); you’ll then end up with both pip-managed and distribution-managed versions of the modules, which can cause issues (I ran into yet another instance of this recently). So your question ends up being an all-or-nothing proposition: if you only use pip for Python modules, you can no longer use your distribution’s package manager for anything which wants to use a Python module…

The general advice given in the page you linked to is very good: try to use your distribution’s packages as far as possible, only use pip for modules which aren’t packaged, and when you do, do so in your user setup and not system-wide. Use virtual environments as far as possible, in particular for module development. Especially on Arch, you shouldn’t run into issues caused by older modules; even on distributions where that can be a problem, virtual environments deal with it quite readily.

It’s always worth considering that a distribution’s library and module packages are packaged primarily for the use of other packages in the distribution; having them around is a nice side-effect for development using those libraries and modules, but that’s not the primary use-case.

Answered By: Stephen Kitt

Another reason to go with the package manager is that updates are applied automatically, which is critical for security. If the vulnerable package Equifax used had been automatically updated via yum-cron-security, the hack might never have happened.

On my personal dev box I use pip; in prod I use packages.

Answered By: Joe M

If we’re talking about installing Python packages to use in code you’re writing, use pip.

For each project you’re working on, create a virtual environment, and then only use pip to install the things that that project needs. That way, you install all the libraries you use in a consistent way, and they’re contained and don’t interfere with anything you install via your package manager.
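For example, a minimal sketch of the per-project setup described above (“myproject” is just a hypothetical name; this assumes python3 with the built-in venv module):

```shell
# Create one isolated environment per project.
python3 -m venv myproject-venv          # the environment lives in ./myproject-venv
. myproject-venv/bin/activate           # pip and python now resolve inside it
python -c 'import sys; print(sys.prefix)'   # prints a path inside myproject-venv
# Anything you "pip install" from here on stays inside the venv and
# never touches packages installed by the system package manager.
```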

If you’re planning to release any Python code, you’ll typically add a setup.py or requirements.txt to your project, which will allow pip to automatically get all its dependencies, letting you easily create or recreate a virtual environment for that project.
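Roughly, a requirements.txt is just a list of packages with version constraints (the package names and versions below are illustrative, not from the answer):

```shell
# A hypothetical requirements.txt pinning a project's dependencies.
cat > requirements.txt <<'EOF'
requests==2.31.0
click>=8.0,<9.0
EOF
# In a fresh virtual environment you would then recreate the whole set
# with (needs network access, so shown commented here):
#   pip install -r requirements.txt
cat requirements.txt
```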

Answered By: SpoonMeiser

TL;DR

  • use pip (+ virtualenv) for stuff (libs, frameworks, maybe dev tools) your projects (that you develop) use
  • use the package manager for applications you use (as an end-user)

Development dependencies

If you’re developing software in Python, you’ll want to use pip for all of the project’s dependencies, be they runtime dependencies, build-time dependencies or stuff needed for automated testing and automated compliance checks (linter, style checker, static type checker, …).

There are several reasons for this:

  • This allows you to use virtualenv (either directly or through virtualenvwrapper or pipenv or other means) to separate dependencies of different projects from each other and to isolate the python applications you use “in production” (as a user) from any exotic shenanigans (or even just incompatibilities) that may go on in development.
  • This allows you to track all of a project’s dependencies in a requirements.txt (if your project is an application) or setup.py (if your project is a library or framework) file. This can be checked into revision control (e.g. Git) together with the source code, so that you always know which version of your code relied on what versions of your dependencies.
  • The above enables other developers to collaborate on your project even if they don’t use the same Linux distribution, or even the same operating system (provided the dependencies they need are also available on Mac, Windows, or whatever they happen to use).
  • You don’t want automatic updates of your operating system’s package manager to break your code. You should update your dependencies, but you should do so consciously and at times you choose, so that you can be ready to fix your code or roll back the update. (Which is easy if you track the complete dependency declaration in your revision control system, together with your code.)
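The last two points can be sketched as follows (assumes an existing virtualenv; the commit command is illustrative):

```shell
# Capture the exact versions the project currently runs with, so that
# dependency updates are deliberate and can be rolled back.
python3 -m venv app-venv
app-venv/bin/pip freeze > requirements.txt   # one "pkg==version" line per installed package
# Check the file into revision control next to the code, e.g.:
#   git add requirements.txt && git commit -m "Pin dependency versions"
```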

If you feel you need to separate direct and indirect dependencies (or distinguish between the acceptable version range for a dependency and the actual version used, cf. “version pinning”), look into pip-tools and/or pipenv. This will also allow you to distinguish between build and test dependencies. (The distinction between build and runtime dependencies can probably be encoded in setup.py.)
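The pip-tools workflow looks roughly like this: you list only your direct dependencies in a requirements.in, and pip-compile resolves and pins the full tree (package names here are examples; the resolution step needs pip-tools installed and network access, so it is shown commented):

```shell
# Direct dependencies only -- the indirect ones are resolved for you.
cat > requirements.in <<'EOF'
flask
requests
EOF
# With pip-tools installed:
#   pip-compile requirements.in    # writes a fully pinned requirements.txt
#   pip-sync requirements.txt      # makes the virtualenv match it exactly
cat requirements.in
```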

Applications you use

For stuff that you use as a normal application and that just happens to be written in Python, prefer your operating system’s package manager. It’ll make sure the application stays reasonably up to date and compatible with other stuff installed by the package manager. Most Linux distributions also vet their packages, so you’re unlikely to get malware.

If something you need isn’t available in your distribution’s default package repo, you can check out additional package repos (e.g. Launchpad for deb-based distros) or use pip anyway. If the latter, use --user to install into your user’s home directory instead of system-wide, so that you’re less likely to break your Python installation. (For stuff you only need temporarily or seldom, you can even use a virtualenv.)
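A hedged sketch of the --user route (“httpie” is just an example tool, not something the answer names; the install itself needs network access, so it is shown commented):

```shell
# Per-user install instead of system-wide:
#   python3 -m pip install --user httpie   # files land under ~/.local
#   export PATH="$HOME/.local/bin:$PATH"   # so the installed scripts are found
# Sanity check that pip itself is available and working:
python3 -m pip --version
```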

Answered By: das-g

Summary

There are three general categories of modules you’re dealing with:

  1. Those supporting programs installed for all users with the OS package system. (This may even include tools and libraries used by people programming in Python; see below.) For these you use the OS packages where you can, and pip installs to the system directories where necessary.
  2. Those supporting programs installed by a particular user only for her own use, and also for certain aspects of her "day-to-day" use of Python as a scripting language. For these she uses pip --user, perhaps pyenv or pythonz, and similar tools and tactics.
  3. Those supporting development and use of a specific application. For these you use virtualenv (or a similar tool).

Each level here may also be getting support from a previous level. For example, our user in (2) may be relying on a Python interpreter installed via OS packages.

Going into this in a bit more detail:

System Programs and Packages

Programs written in Python that you want to "just run" are easy: just use the OS install tools and let them bring in whatever they need; this is no different from a non-Python program. This is likely to bring in Python tools/libraries (such as the Python interpreter itself!) that users on your machine may start to rely on; this isn’t a problem so long as they understand the dependency and, ideally, know alternative means to handle it on hosts that don’t provide those dependencies.

A common and simple example of such a dependency is some of my personal scripts in ~/.local/bin/ that start with #!/usr/bin/env python. These will work fine (so long as they run under Python 2) on RH/CentOS 7 and most (but not all) Ubuntu installs; they will not work under a basic Debian install or on Windows. Much as I dislike my personal environment having much in the way of dependencies on OS packages (I work on a number of different OSes), something like this I find fairly acceptable; my backup plan on the rare hosts that don’t have a system Python and can’t get one is to go with a User system as described below.

People using a system Python interpreter are also usually dependent on the system pip3. That’s about where I usually draw the line on my system dependencies; everything from virtualenv forward I deal with myself. (For example, my standard activate script relies on whatever pip3 or pip is in the path, but downloads its own copy of virtualenv to bootstrap the virtual environment it’s creating.)

That said, there are probably circumstances where it’s perfectly reasonable to make more of a development environment available. You might have Python interfaces into complex packages (such as a DBMS) where you want to use the system version of that and you feel it’s best you also let the system choose the particular Python library code you’ll use to talk to it. Or you may be deploying a lot of hosts with a basic development environment for a Python class, and find it easiest to automate with standard system packages.

User "Day-to-day" Programs and Packages

Users may have Python libraries or programs that don’t fit well into a virtual environment, either because they’re needed to help create virtual environments in the first place (e.g., virtualenvwrapper) or because they’re things you commonly use from the command line even while doing non-Python work. Even when system versions of these are available, users may feel more comfortable installing their own (e.g., because they want to be using the latest version of the tool and its dependencies).

Generally pip --user is what people will be using for this, though certain dependencies, such as the Python interpreter itself, require a bit more than that. pyenv and pythonz are useful for building personal interpreters (whether installed in ~/.local/bin to be the default interpreter or otherwise), and of course one can always just build "by hand" from source if the dev libraries are available.
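A hedged sketch of what that looks like in practice (the version number is an example, and pyenv must already be installed; both the pyenv and the from-source routes need network access and build tools, so they are shown commented):

```shell
# Personal interpreter via pyenv:
#   pyenv install 3.12.3          # compile this version under ~/.pyenv
#   pyenv global 3.12.3           # make it the user's default python
# The same idea "by hand" from a CPython source tree:
#   ./configure --prefix="$HOME/.local" && make && make install
# Either way, whichever interpreter wins on PATH is what scripts with
# "#!/usr/bin/env python3" will get:
command -v python3
```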

I try to keep the bare minimum set of things installed here: virtualenvwrapper (because I use it constantly) and perhaps the latest version of pip. I try to avoid dependencies outside the standard library or on Python 3 for personal scripts I write to be used across many hosts. (Though we’ll see how long I can hold out with that as I move more and more of these personal scripts to Python.)

Separate Application Development and Runtime Environments

This is the usual virtualenv thing. For development you should almost always be using a virtualenv to ensure that you’re not using system dependencies, or often more than one to test against different Python versions.

These virtual environments are also good for applications with a lot of dependencies where you want to avoid polluting your user environment. For example I usually set up a virtualenv for running Jupyter notebooks and the like.
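For instance, a dedicated environment just for notebooks might look like this (installing and launching Jupyter needs network access, so those steps are shown commented):

```shell
# One environment for all notebook work, separate from the user setup:
python3 -m venv jupyter-venv
#   jupyter-venv/bin/pip install jupyter
#   jupyter-venv/bin/jupyter notebook
```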

Answered By: cjs