Pro Python 3, pp. 373-390

Distribution

  • J. Burton Browning
  • Marty Alchin

Abstract

Once you have a working application, the next step is to decide how and where to distribute it. You might be writing it for yourself, but most likely you will have a wider audience and have a set schedule for releasing it. There are a number of decisions to be made and tasks to be performed before you can do that, however. This process consists primarily of packaging and distribution, but it begins with licensing.

Licensing

Before releasing any code to the public, you must decide on a license that will govern its use. A license will allow you to convey to your users how you intend your code to be used, how you expect others to use it, what you ask from them in return, and what rights you expect them to confer on users of their own code after integrating with yours. These are complex questions that can’t be answered in a universal way for every project. Instead, you’ll need to consider a number of issues.

Your own philosophy plays a key role, as it affects many other decisions. Some people intend to earn a living from their code, which could mean the source code won’t be released at all; instead, the work could be offered as a service that customers pay to use. Others are more interested in helping people learn to do things better, faster, easier, or more reliably. Perhaps the most common license is the GPL.

GNU General Public License

When people think of open source, the GNU General Public License (GPL)[1] is often the first thing to come to mind. As one of the vanguards of the free software movement, its primary goal is to preserve a certain group of freedoms for the users of software. The GPL requires that if you distribute your program to others, you must also make the source code of that program available to them. That way they’re free to make modifications to your code as they see fit, in order to better support their own needs.

Furthermore, the promise of the GPL is that any users who do alter your code can only distribute their modifications under the GPL or a license that ensures at least the same freedoms. This way users of the software can be confident that if it doesn’t work to their satisfaction, they have a way to make it better no matter how far removed it may be from the original author.

Because the GPL places requirements on any modifications made to the original code and code that links to it, it’s sometimes referred to as “viral.” That’s not necessarily an insult; it simply refers to the fact that the GPL forces the same license on anything that uses it. In other words, it spreads through software in much the same way as a traditional virus. This isn’t unique to the GPL, but it’s the feature many in the business world think of first when they think of the GPL and open source in general.

Because the goal of the GPL is to preserve freedoms for computer users, it can be seen as restricting the freedom of programmers. The freedom of a programmer to distribute an application without divulging the source code restricts the freedom of a user to modify that code. Of those two opposing forces, the GPL is designed to preserve the user’s freedoms by placing a number of restrictions on the behavior of programmers.

The GPL and Python

The GPL was written primarily for statically compiled languages, such as C and C++, so it often speaks in terms of code in “object form” that may be “statically linked” to other code. In other words, when you create a C++ executable, the compiler inserts the code from the libraries you reference to make a stand-alone program. These terms are central to its vocabulary, but aren’t as clearly understood when applied to dynamic languages such as Python. Many Python applications use the GPL because of its overall philosophy, but its terms have yet to be tested in court in the context of a Python application.

It may seem like such details wouldn’t really matter because Python code is generally distributed as source code anyway; after all, compiled Python bytecode isn’t compatible with all the various systems in which the code might be used. There are exceptions, such as using py2exe to build a Windows executable from a Python application. But because the GPL also applies to any other applications that use the code, these details become important if, for example, a statically compiled application uses GPL Python code internally for some features. It has yet to be seen whether such use would trigger the GPL’s requirements on the distribution of that new application’s source code.

Because these restrictions must also be passed on to any other application that includes GPL code, the available licenses that can work with it are limited. Any other license you might consider must include at least the same restrictions as the GPL, although additional restrictions can be added if necessary. One example of this is the AGPL.

Affero General Public License

With the proliferation of the Internet, it’s now quite common for users to interact with software without ever obtaining a copy of that software directly. Because the GPL relies on distribution of code to trigger the requirement to also distribute source code, online services such as web sites and mail systems are exempt from that requirement. Some have argued that those exemptions violate the spirit of the GPL by exploiting a loophole in its provisions.

To close that loophole, the Affero General Public License (AGPL) was created. This license contains all the restrictions of the GPL as well as the added feature that any user interacting with the software, even by way of a network, will trigger the distribution clause. That way, web sites that incorporate AGPL code must divulge the source code for any modifications they’ve made and any additional software that shares common internal data structures with it. Although it has been slow to gain wide adoption, its approval by the Open Source Initiative (OSI) gives this license important support.

Note

Even though the terminology and philosophy of the AGPL are very similar to the GPL, its applicability to Python is a bit more clear. Because just interacting with the software triggers the terms of the license, it doesn’t matter as much whether the code is compiled from a static language such as C or built from a dynamic language such as Python. This also has yet to be tested in court for Python cases, however.

Because the AGPL is more restrictive than the GPL itself, it’s possible for a project that uses AGPL to incorporate code that was originally licensed with the standard GPL. All of the protections of the GPL remain intact, while some extra ones are added. There’s also a variant of the GPL that incorporates fewer restrictions, called the LGPL.

GNU Lesser General Public License

Because the GPL states that statically linking one piece of code to another triggers its terms, many small utility libraries were used less often than they might otherwise have been. These libraries typically don’t constitute an entire application on their own, but because their usefulness requires tight integration with the host application, many developers avoided them in order to avoid their own applications being also bound to the GPL.

The GNU Lesser General Public License (LGPL) was created to handle these cases by removing the static linking clause. Thus, a library released under the LGPL could be freely used in a host application without requiring the host be bound by the LGPL or any other specific license. Even proprietary, commercial applications with no intention of releasing any source code can incorporate code licensed with the LGPL.

All of the other terms remain intact, however, so any modifications to the LGPL code must be distributed as source code if the code itself is distributed in any way. For this reason, many LGPL libraries have extremely flexible interfaces that allow their host applications as many options as possible without having to modify the code directly.

Essentially, the LGPL leans more toward using the notion of open source to foster a more open programming community than to protect the rights of the software’s eventual audience. Further down that road is one of the most liberal open source licenses available: BSD.

Berkeley Software Distribution License

The Berkeley Software Distribution (BSD) license provides a way to release code with the intent of fostering as much adoption as possible. It does this by placing relatively few limitations on the use, modification, and distribution of the code by other parties. In fact, the entire text of the license consists of just a few bullet points and a disclaimer. Referring to BSD as a single license is a misnomer, however, as there are actually a few variations. In its original form, the license consisted of four points:
  • Distributing the source code to the program requires that the code retain the original copyright, the text of the license, and its disclaimer.

  • Distributing the code as a compiled binary program requires the copyright, license text, and disclaimer be included somewhere in the documentation or other materials provided with the distributed code.

  • Any advertising used to promote the final product must attribute the BSD-licensed code as being included in the product.

  • Neither the name of the organization that developed the software nor the names of any of its contributors may be used to specifically endorse the product without explicit consent beyond the license itself.

Notice that this contains no requirement that the source code be distributed at all, even when distributing compiled code. Instead, it only requires that the appropriate attribution is retained at all times and that it remains clear that there are two separate parties involved. This allows BSD-licensed code to be included in proprietary, commercial products with no need to release the source code behind it, making it fairly attractive to large corporations.

The advertising clause caused some headaches with organizations trying to use BSD-licensed code, however. The primary problem is that as the code itself changed hands and was maintained by different organizations, each organization that had a hand in its development must be mentioned by name in any advertising materials. In some cases that could be dozens of different organizations, accounting for a significant portion of advertising space, especially when software often contains quite a few other disclaimers for other reasons.

To address those concerns, another version of the BSD license was created without the advertising clause. This license is called the New BSD license, and it includes all the other requirements of the original. The removal of the advertising clause meant that changes in management of the BSD-licensed code had very little impact on organizations using it, which broadened its appeal considerably.

One further reduction of the BSD license is called the Simplified BSD license. In this variation even the nonendorsement clause is removed, leaving only the requirements that the text of the license and its disclaimer be included. In order to still avoid untrue endorsement, the disclaimer in this version includes an extra sentence that clearly states that the views of both groups are independent of each other.

Other Licenses

The options listed here are some of the more commonly chosen, but there are many more available. The OSI maintains a list of open source licenses[2] that have been examined and approved as preserving the ideals of open source. In addition, the Free Software Foundation maintains its own list of licenses[3] that have been approved as preserving the ideals of free software.

Note

The difference between free software and open source is primarily philosophical, but does have some real-world implications. In a nutshell, free software preserves the freedom of users of that software, whereas open source focuses on the software development model. Not all licenses are approved for both uses, so you may need to decide which is more important to you.

Once you have a license in place, you can start the process of packaging and distributing your code to others who can make use of it.

Packaging

It’s not very easy to distribute a bunch of files individually, so you’ll first have to bundle them up. This process is called packaging, but it shouldn’t be confused with the standard Python notion of a package. Traditionally, a package is simply a directory with an __init__.py file in it, which can then be used as a namespace for any modules contained in that directory.
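
As a minimal sketch of that traditional notion, the following builds a throwaway package in a temporary directory and imports a module through it. The names demo_pkg and greet are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk; "demo_pkg" and "greet" are made-up names.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # marks the directory as a package
(pkg_dir / "greet.py").write_text("def hello():\n    return 'hello'\n")

# Make the parent directory importable, then use the package as a namespace.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
greet = importlib.import_module("demo_pkg.greet")
print(greet.hello())
```

The directory name becomes the first part of the import path, and each module inside it is reached through that namespace.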

For the purposes of distribution, a package also includes documentation, tests, a license, and installation instructions. These are arranged in such a way that the individual parts can be easily extracted and installed into appropriate locations. Typically, the structure looks something like this:

AppName/
    LICENSE.txt
    README.txt
    MANIFEST.in
    setup.py
    app_name/
        __init__.py
        ...
    docs/
        ...
    tests/
        __init__.py
        ...

As you can see, the actual Python code package is a subdirectory of the overall application package, and it sits as a peer alongside its documentation and tests. The documentation contained in the docs directory can contain any form of documentation you prefer, but is usually filled with plain text files formatted using reStructuredText, as described in Chapter 8. The tests directory contains tests such as those described in Chapter 9. The LICENSE.txt file contains a copy of your chosen license and README.txt provides an introduction to your application, its purpose, and its features.

The more interesting features of this overall package are setup.py and MANIFEST.in, which aren’t otherwise part of the application’s code.

setup.py

Inside your package, setup.py is the script that will actually install your code into an appropriate location on a user’s system. In order to be as portable as possible, this script relies on the distutils package provided in the standard distribution. That package contains a setup() function that uses a declarative approach to make the process easier to work with and more generic.

Located within distutils.core, the setup() function accepts a wide array of keyword arguments, each of which describes a particular feature of the package. Some pertain to the package as a whole, whereas others list individual contents that are included in the package. Three of these arguments are required for any package to be distributed using standard tools:
  • name: This string contains the public name of the package as it will be displayed to those who are looking for it. Naming a package can be a complex and difficult task, but as it’s highly subjective, it’s well beyond the scope of this book.

  • version: This is a string containing the dot-separated version number of the application. It’s common for first releases to use a version of '0.1' and increase from there. The first number is typically a major version indicating a promise of compatibility. The second is a minor version number, representing a collection of bug fixes or significant new features that don’t break compatibility. The third is typically reserved for security releases that introduce no new functionality or other bug fixes.

  • url: This string references the main web site where users can learn more about the application, find more documentation, request support, file bug reports, or do other tasks. It typically serves as a central hub for information and activity surrounding the code.
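
Because the version is a dot-separated string, comparing two versions as plain text can mislead: '0.10' sorts before '0.9'. The sketch below assumes purely numeric components and uses a hypothetical helper, parse_version, to compare them as tuples of integers. It is an illustration only, not how the packaging tools themselves compare versions.

```python
def parse_version(version):
    """Split a dot-separated version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))

releases = ["0.10", "0.9.1", "0.9", "1.0"]

# Tuples compare number by number, so (0, 10) correctly follows (0, 9, 1).
ordered = sorted(releases, key=parse_version)
print(ordered)
```

Version strings with non-numeric parts, such as '1.0rc1', would need a richer parser than this one.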

In addition to these three required elements, there are several optional arguments that can provide further detail about the application:
  • author: The name of the author(s) of the application.

  • author_email: An email address where the author can be reached directly.

  • maintainer: If the original author is no longer maintaining the application, this field contains the name of the person now responsible for it.

  • maintainer_email: An email address where the maintainer can be reached directly.

  • description: This string provides a brief description of the purpose of the program. Think of it as a one-line description that could be shown in a list alongside others.

  • long_description: As its name implies, this is a longer description of the application. Rather than being used in lists, this one is typically shown when a user requests more detail about the specific application. Because this is all specified in Python code, many distributions simply read the contents of README.txt into this argument.
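
One way to read README.txt into long_description might look like the following sketch. The helper read_long_description and the temporary project directory are assumptions made for illustration; the metadata dictionary simply collects the keyword arguments described above.

```python
import tempfile
from pathlib import Path

def read_long_description(project_dir, filename="README.txt"):
    """Return the README contents, or an empty string if the file is missing."""
    readme = Path(project_dir) / filename
    return readme.read_text(encoding="utf-8") if readme.exists() else ""

# Stand-in project directory so the sketch runs anywhere.
project = Path(tempfile.mkdtemp())
(project / "README.txt").write_text(
    "MyApp\n=====\n\nA demo application.\n", encoding="utf-8"
)

metadata = {
    "name": "MyApp",
    "version": "0.1",
    "description": "A demo application.",
    "long_description": read_long_description(project),
}
# In a real setup.py these values would be passed along as setup(**metadata).
print(metadata["long_description"])
```

Keeping the text in README.txt means the description shown to users and the file shipped in the package never drift apart.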

Beyond this metadata, the setup() function is responsible for maintaining a list of all the files necessary to distribute the application, including all Python modules, documentation, tests, and licenses. Like the other information, these details are supplied using additional keyword arguments. All paths listed here are relative to the main package directory where setup.py itself is located:
  • license: This string names the license under which the program is distributed, such as 'BSD' or 'GPL'. It isn’t a path; the full text of the license conventionally lives in a file such as LICENSE.txt at the top of the package, though you can name that file whatever you prefer.

  • packages: This argument accepts a list of package names where the actual code is located. Unlike license, these values are Python import paths, using periods to separate individual packages along the path.

  • package_dir: If your Python packages aren’t in the same directory as setup.py, this argument provides a way to tell setup() where to find them. Its value is a dictionary that maps a package name to its location in the filesystem. One special key you can use is an empty string, which will use the associated value as a root directory to look for any packages that don’t have an explicit path specified.

  • package_data: If your package relies on data files that aren’t written in Python directly, those files will only get installed if referenced in this argument. It accepts a dictionary that maps package names to their contents, but unlike package_dir, the values in this dictionary are lists, with each value in the list being a path specification to the files that should be included. These paths may include asterisks to indicate broad patterns to match against, similar to what you can query on the command line.
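
To make the relationship between package_dir and package_data more concrete, here is a rough sketch that resolves a package name to a directory and expands its data patterns with glob. The src layout, file names, and the helpers package_path and expand are all hypothetical, and the code only approximates what the build step does.

```python
import glob
import os
import tempfile

# Hypothetical layout: code lives under src/, with data files inside my_app.
root = tempfile.mkdtemp()
pkg_path = os.path.join(root, "src", "my_app")
os.makedirs(os.path.join(pkg_path, "templates"))
for name in ("__init__.py", "defaults.cfg", os.path.join("templates", "base.html")):
    open(os.path.join(pkg_path, name), "w").close()

package_dir = {"": "src"}  # '' is the root for packages without an explicit path
package_data = {"my_app": ["*.cfg", "templates/*.html"]}

def package_path(package):
    """Translate an import path into a directory, honouring the '' entry."""
    if package in package_dir:
        return os.path.join(root, package_dir[package])
    return os.path.join(root, package_dir.get("", ""), *package.split("."))

def expand(package):
    """List the data files matched by a package's patterns, relative to it."""
    base = package_path(package)
    matches = []
    for pattern in package_data.get(package, []):
        matches.extend(glob.glob(os.path.join(base, pattern)))
    return sorted(os.path.relpath(match, base) for match in matches)

print(expand("my_app"))
```

Note that __init__.py is not listed: Python modules are picked up through packages, so package_data only needs to name the non-Python files.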

There are other options for more complex configurations, but these should cover most of the bases. For more information, consult the distutils documentation.[4] Once you have the pieces in place, you’ll have a setup.py that looks something like this:

from distutils.core import setup
setup(name='MyApp',
      version='0.1',
      author='Marty Alchin',
      author_email='marty@propython.com',
      url='http://propython.com/',
      packages=['my_app', 'my_app.utils'],
)

MANIFEST.in

In addition to setup.py specifying what files should be installed on a user’s system, a package distribution also includes a number of files that are useful to the user without being installed directly. These files, such as documentation, should be available to users with the package but don’t have any code value, so they shouldn’t be installed in an executable location. The MANIFEST.in file controls how these files should be added to the package.

MANIFEST.in is a plain text file, populated with a series of commands that tell distutils what files to include in the package. The filename patterns used in these commands follow the same conventions as the command line, allowing asterisks to serve as a wildcard for a broad range of filenames. For example, a simple MANIFEST.in might include any text files in the package’s docs directory:

include docs/*.txt

This simple instruction will tell distutils to find all the text files in the docs directory and include them in the final package. Additional patterns could be included by separating the patterns with a space. There are a few different commands available, each of which comes in an include and an exclude version:
  • include: The most obvious option, this command will look for all files that match any of the given patterns and include them in the package. They’ll be placed in the package at the same location as they were found in the original directory structure.

  • exclude: The opposite of include, this will tell distutils to ignore any files that match any of the patterns given here. This provides a way to avoid including some files, without having to explicitly list every included file in an include command. A common example would exclude TODO.txt in a package that specifically includes all text files.

  • recursive-include: This command requires a directory as its first argument, prior to any filename patterns. It then looks inside that directory and any of its subdirectories for any files that match the given patterns.

  • recursive-exclude: Like recursive-include, this command takes a directory first, followed by filename patterns. Any files that are found by this command are not included in the package, even if they’re found by one of the inclusion commands.

  • global-include: This command finds matching files anywhere in the project, regardless of where they may be within the directory structure. By looking inside directories, it works much like recursive-include, but because it looks through all directories, it doesn’t need to take any argument other than the filename patterns to look for.

  • global-exclude: Like global-include, this finds matching files anywhere in the source project, but the files found are excluded from the final package.

  • graft: Rather than looking for matching files, this command accepts a set of directories that are simply included in the package in their entirety.

  • prune: Like graft, this command takes a set of directories, but it excludes them from the package completely, even if there were matching files inside.
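
The way these commands combine can be sketched with a small simulation. This is a simplified model written for illustration, not the code distutils actually runs: the file list, the helper names, and the matching rules (patterns never cross a directory separator for include and exclude) are assumptions.

```python
from fnmatch import fnmatch

def matches_anchored(path, pattern):
    """Match component by component so '*' never crosses a '/'."""
    path_parts, pattern_parts = path.split("/"), pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatch(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )

def apply_manifest(files, commands):
    """Apply (command, args...) tuples in order; return the selected files."""
    selected = set()
    for command, *args in commands:
        if command in ("include", "exclude"):
            hits = {f for f in files if any(matches_anchored(f, p) for p in args)}
        elif command in ("recursive-include", "recursive-exclude"):
            directory, patterns = args[0], args[1:]
            hits = {f for f in files if f.startswith(directory + "/")
                    and any(fnmatch(f.rsplit("/", 1)[-1], p) for p in patterns)}
        elif command in ("global-include", "global-exclude"):
            hits = {f for f in files
                    if any(fnmatch(f.rsplit("/", 1)[-1], p) for p in args)}
        elif command in ("graft", "prune"):
            hits = {f for f in files if any(f.startswith(d + "/") for d in args)}
        else:
            raise ValueError(f"unknown command: {command}")
        if command in ("exclude", "recursive-exclude", "global-exclude", "prune"):
            selected -= hits
        else:
            selected |= hits
    return sorted(selected)

# Hypothetical project listing, relative to the directory holding setup.py.
files = ["README.txt", "TODO.txt", "docs/index.txt", "docs/api/reference.txt",
         "docs/build/notes.txt", "docs/build/output.html"]
commands = [("include", "*.txt"), ("recursive-include", "docs", "*.txt"),
            ("exclude", "TODO.txt"), ("prune", "docs/build")]
print(apply_manifest(files, commands))
```

Because the commands run in order, a later exclude or prune can remove files that an earlier include selected, which is what makes the TODO.txt example above work.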

With both setup.py and MANIFEST.in in place, distutils provides an easy way to bundle up the package and prepare it for distribution.

The sdist Command

To finally create the distributable package, your new setup.py is actually executable directly from a command line. Because this script is also used to install the package later, you must specify what command you’d like it to carry out. Users who obtain the package later will use the install command, but to package up a source distribution, the command is sdist:

$ python setup.py sdist
running sdist
...

This command processes the declarations made in setup.py as well as the instructions from MANIFEST.in to create a single archive file that contains all of the files you’ve specified for distribution. The type of archive file you get by default depends on the system you’re running, but sdist provides a few options that you can specify explicitly. Simply pass a comma-separated list of formats to the --formats option to generate specific types:
  • zip: The default on Windows machines, this format creates a zip file.

  • gztar: The default on Unix machines, including Mac OS, this creates a gzipped tarball. To also create this archive on a Windows system, you’ll need an implementation of tar installed, such as the one available through Cygwin.[5]

  • bztar: This format uses the alternative bzip2 compression on the archive tarball. This also requires an implementation of tar installed.

  • ztar: This uses the simpler compress algorithm to compress the tarball. As with the others, an implementation of tar is required to use this option.

  • tar: Rather than using compression, this option simply bundles up a tarball if an implementation of the tar utility is available.

When you run the sdist command, archive files for each of the formats you specified will be created and placed inside a new dist directory within your project. The names of each archive will simply use the name and version you supplied in setup.py, separated by a hyphen. The example provided earlier would result in files such as MyApp-0.1.zip.
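
The naming scheme can be illustrated without running sdist at all. The sketch below uses shutil.make_archive from the standard library to bundle a stand-in project tree into a dist directory in two formats. It is not what sdist does internally; the staging directory and placeholder files are assumptions for the example.

```python
import shutil
import tempfile
from pathlib import Path

name, version = "MyApp", "0.1"

# Stand-in project tree; a real sdist gathers files via setup.py and MANIFEST.in.
workspace = Path(tempfile.mkdtemp())
staging = workspace / "build" / f"{name}-{version}"
staging.mkdir(parents=True)
(staging / "setup.py").write_text("# placeholder\n")
(staging / "README.txt").write_text("MyApp demo\n")

dist = workspace / "dist"
dist.mkdir()

archives = []
for fmt in ("zip", "gztar"):
    # make_archive appends the right extension for each format.
    path = shutil.make_archive(
        str(dist / f"{name}-{version}"), fmt,
        root_dir=str(staging.parent), base_dir=staging.name,
    )
    archives.append(Path(path).name)

print(sorted(archives))
```

Each archive unpacks into a single MyApp-0.1 directory, which keeps the extracted files from scattering into whatever directory the user happens to be in.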

Let’s try all of the preceding steps in one example. Follow along with each step to create your zip package:
  1. Create a folder you can easily access via a command prompt, such as c:\test.

  2. In the folder, create the following two files, named setup.py and MyApp.py:

    # setup.py
    from distutils.core import setup
    setup(name='MyApp',
          version='0.1',
          author='Alchin and Browning',
          author_email='authors@propython.com',
          url='http://www.propython.com/',
          py_modules=['MyApp'],  # include MyApp.py in the package
    )

    # MyApp.py
    print("Hello Burton and Marty!")
    gone = input("Enter to close: ")

  3. Open a command prompt, change into the test directory, and type the command:

    python setup.py sdist

  4. Press Enter. (If Python does not start, check your search path and ensure that your system can find Python.)


This will create a dist directory in the test folder with the zip file for your package.

Of course that was a very simple overview, but you have the flexibility to add a manifest file, change compression options, and so on.

Distribution

Once you have these files in place, you’ll need a way to distribute them to the public. One option is to simply host your own web site and serve up the files from there. That’s typically the best way to market your code to a wide audience because you have an opportunity to put the documentation online in a more readable way, show examples of it in use, offer testimonials from people who are already using it, and anything else you can come up with.

The only problem with simply hosting it yourself is that it becomes fairly difficult to find using automated tools. Many packages rely on the presence of other applications, so it’s often useful to be able to install them directly from inside a script, without having to navigate to a web site and find the right link to download. Ideally, those tools would be able to translate a unique package name into a way to download that package and install it without assistance.

This is where the Python Package Index (PyPI)[6] comes into play. The secret code name of PyPI is “cheeseshop,” an allusion to the Monty Python Cheese Shop sketch, in which John Cleese tries to purchase cheese from a shop run by Michael Palin that has none available.

PyPI is an online collection of Python packages that all follow a standardized structure, so they can be discovered more easily. Each has a unique name that can be used to locate it, and the index keeps track of which version is the latest and references the URL to that package. All you need to do is add your package to the index and it will be much easier for your users to work with.

Uploading to PyPI for the first time requires registration on the site. A PyPI account will allow you to manage your application details later and upload new versions and updates. Once you have an account, you can run python setup.py register to set up a page for your application at PyPI. This is an interactive script that will offer you three options for registering your account:
  • Use an existing PyPI account. If you’ve created an account on the PyPI web site already, you can specify your username and password here.

  • Register a new PyPI account. If you’d rather create an account at the command line, you can enter your details here and have the account created during registration.

  • Generate a new PyPI account. If you’d like to take a simpler approach, this option will take the username you’re already using in your operating system, generate a password automatically, and register an account for that combination.

Once you choose your option, the register script will offer to save your account information locally, so you won’t have to go through that step every time. With an account in place, the script will register the application with PyPI, using the information in setup.py. In particular, the name and long_description fields will combine to form a simple web page, with other details shown in a list.

With a page in place to hold the application, the last step is to upload the code itself using the upload command. This must be done as part of a distribution build, even if you had previously built a distribution. That way, you can specify exactly what type of distributions you’d like to send to PyPI. For example, you can upload packages for both Windows and non-Windows users in a single step:

$ python setup.py sdist --formats=zip,gztar upload

The distribution files are named according to the name of the application and its version number at the time the distribution was created. The entry in PyPI also contains a reference to the version number, so you can’t upload the same distribution type of the same version more than once. If you try, you’ll get an error from setup.py indicating that you’ll need to create a new version number in order to upload a changed distribution.

Exciting Python Extensions: Secrets Module

The secrets module offers Python programmers some handy tools for generating random numbers and passwords. Its main feature, though, is that the random values it produces are cryptographically strong.

The secrets module, introduced in Python 3.6, has many functions available in it. One is random number generation. And while this has been covered with some other libraries, it is still interesting to examine.

The exact source of the random numbers depends on your operating system, but for cryptographic work this module will generally do a better job than the other random number generators available in Python. Such cryptographic uses include passwords, authentication, and tokens. Read on to see how handy this module is.

Random Numbers

There are quite a few random token and random number generation options. To see how they work, consider the next example, which picks random numbers from 0 up to, but not including, 100.

# Secrets example 1: print ten random integers in the range 0-99
from secrets import randbelow

for _ in range(10):
    print(randbelow(100))

In the preceding example we selected 10 random values from 0 to 99. Not exciting, but these values come from a source that is suitable for cryptographic use. Next we will consider random password generation.

Password Generation

In this next example, we will use both the string library and the secrets library to generate a password from ASCII letters (lowercase and uppercase), digits, and punctuation:

# Generate a six-character password from letters, digits, and punctuation
import string
from secrets import choice

# ascii_letters already contains both lowercase and uppercase letters
chars = string.ascii_letters + string.digits + string.punctuation
password = ''.join(choice(chars) for i in range(6))
print(password)
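
A randomly drawn password may happen to contain no digit or no punctuation at all. If a policy requires at least one character from each class, one common approach is to keep drawing until a candidate qualifies. The following is a sketch of that idea; the function name make_password and the default length are assumptions, not part of the secrets module.

```python
import string
from secrets import choice

alphabet = string.ascii_letters + string.digits + string.punctuation

def make_password(length=10):
    """Draw passwords until one has a lowercase, uppercase, digit, and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every class")
    while True:
        candidate = "".join(choice(alphabet) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate

print(make_password())
```

Rejecting and redrawing whole candidates keeps every qualifying password equally likely, which would not be true if missing characters were patched into fixed positions.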

If you need a token for cryptographic work, there are several options, including the URL-safe token_urlsafe. Consider the following example:

# Generate a URL-safe token from 10 random bytes
from secrets import token_urlsafe
value = token_urlsafe(10)
print('token is: ', value)
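
The other token functions follow the same pattern. As a brief sketch, the example below generates the same amount of randomness in three forms and then checks a token with compare_digest; the variable names and the choice of 16 bytes are arbitrary.

```python
import secrets

raw = secrets.token_bytes(16)      # 16 random bytes
hexed = secrets.token_hex(16)      # 16 random bytes as 32 hexadecimal characters
url = secrets.token_urlsafe(16)    # 16 random bytes as URL-safe text

print(len(raw), len(hexed), url)

# compare_digest is designed to reduce the risk of timing attacks
# when checking a supplied token against the expected one.
supplied = url
print(secrets.compare_digest(supplied, url))
```

In each case the argument is a number of bytes, not a number of characters, so the text forms come out longer than the number passed in.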

The password example used choice to pick individual characters, but the same function selects a random item from any sequence. You might try the following:

#Generate a secrets random choice
from secrets import *
value = choice(['one', 'two', 'three'])
print (value)

Lastly, if you wanted to enter values and select a random set from them, try the following:

# Pick three random values from those the user enters
from secrets import choice

# Split the input on whitespace so that each value is a separate item
foo = input('Enter 10 values separated by spaces:  ').split()
wow = ' '.join(choice(foo) for i in range(3))
print('These are three exciting choices at random:>   ', wow)

There’s nothing here to save the world from a zombie apocalypse, but these examples are still very interesting uses of the Python secrets module.

Taking It With You

As you can see, the process of packaging and distributing a Python application using PyPI is actually fairly straightforward. Beyond PyPI, it’s usually a good idea to put together a dedicated project web site, where you can better promote and support your code. Always remember that distribution isn’t the last step. Your users will expect a certain amount of support and interaction as they use your code and hope to improve it, so it’s best to find a medium that supports those goals for you and your users.

Applications of all different sizes, audiences, and goals are fair game for distribution. It doesn’t matter if you’re writing a small utility to help automate common tasks or an entire framework to power a set of features for other users’ code. The next chapter will show you how to build such a framework from start to finish, building on many of the techniques shown throughout this book.

Footnotes

  1. See GNU Operating System, “GNU General Public License,” http://propython.com/gpl .

  2. See Open Source Initiative, “Licenses by Name,” http://propython.com/osi-licenses .

  3. See GNU Operating System, “Various Licenses and Comments about Them,” http://propython.com/fsf-licenses .

  4. See Distributing Python Modules, “2. Writing the Setup Script,” http://propython.com/distutils-setup .

  5.

  6. See Python Package Index (PyPI), http://propython.com/pypi .

Copyright information

© J. Burton Browning and Marty Alchin 2019

Authors and Affiliations

  • J. Burton Browning (1)
  • Marty Alchin (2)

  1. Oak Island, USA
  2. Agoura Hills, USA
