
    What Happens When you Import a Python Module?

    Deep dive into the import system

    Photo by Mike van den Bos from Unsplash

    Reusability is one of the key measures of code quality: it is the extent to which code can be used in different programs with minimal change. In Python, we use the import statement to reuse code from a module. But have you ever been curious about how import is implemented behind the scenes? In this article, we will take a deep dive into Python's import system. We will also discuss an interesting problem: circular imports. Grab a tea, and let’s get straight to the article.

    Module vs. Package

    Python is organized into modules and packages. A module is one Python file and a package is a collection of modules. Consider the following example of importing a module:

    import random
    random.randint(1,10)

    random is a module in the Python standard library. The first line imports the random module and makes it available under the name random; the second line calls its randint() function. If you open an IDE and step into the import, you will see that the code lives in a file named random.py.

    You can also import randint like this:

    from random import randint
    randint(1,10)

    Let’s check out an example from a package:

    import pandas
    pandas.DataFrame()

    At first glance, you can’t really tell whether pandas is a module or a package. But if you step into the import, it takes you to pandas/__init__.py instead of pandas.py. A package contains submodules or, recursively, sub-packages, and __init__.py is the entry point of the package.

    The import statement is not the only way to import a module. The function importlib.import_module() and the built-in __import__() can also be used.

    >>> import importlib
    >>> importlib.import_module('random')
    >>> __import__('random')
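
    As a quick check, the sketch below suggests that all three approaches hand back the same cached module object. The variable names via_importlib and via_dunder are only illustrative.

```python
import importlib
import random

# importlib.import_module() takes the module name as a string,
# which is handy when the name is only known at runtime.
via_importlib = importlib.import_module("random")

# __import__() is the low-level function behind the import statement.
via_dunder = __import__("random")

# All three names refer to the same module object.
print(via_importlib is random)  # True
print(via_dunder is random)     # True
print(via_importlib.randint(1, 10))
```

    In everyday code the import statement is usually enough; importlib.import_module() is mostly useful when the module name is computed at runtime.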

    Package.__init__.py

    So what is __init__.py?

    A regular Python package contains an __init__.py file. When the package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. This file can be left empty.

    Let’s see an example. I have a folder structure like this. p1 is my package and m1 is a submodule.

    folder structure (Created by Xiaoxu Gao)

    Inside m1.py, I have a variable DATE that I want to use in main.py. I will create several versions of __init__.py and see how each affects the import in main.py.

    # m1.py
    DATE = "2022-01-01"

    Case1: empty __init__.py file.

    Since the __init__.py file is empty, importing p1 does not import any submodule, so p1 has no attribute m1. If we import m1 explicitly using from p1 import m1, then everything inside m1.py becomes available. But then we are not really importing a package, we are importing a module. As you can imagine, if your package has a lot of submodules, you need to import every module explicitly, which can be quite tedious.

    # main.py
    import p1
    p1.m1.DATE
    >> AttributeError: module 'p1' has no attribute 'm1'

    from p1 import m1
    from p1 import m2, m3 ...  # need to explicitly import every submodule
    m1.DATE
    >> Works!!
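
    To try Case 1 without setting up files by hand, here is a minimal, self-contained sketch. It builds a throwaway package in a temporary directory; the package name demo_pkg is made up for this illustration and stands in for p1.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: demo_pkg/__init__.py (empty) and demo_pkg/m1.py.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "m1.py").write_text('DATE = "2022-01-01"\n')

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

# The empty __init__.py did not import m1, so it is not an attribute yet.
before = hasattr(demo_pkg, "m1")
print(before)  # False

# Importing the submodule explicitly loads it and binds it on the package.
from demo_pkg import m1

print(m1.DATE)                  # 2022-01-01
print(hasattr(demo_pkg, "m1"))  # True
```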

    Case2: import submodules in __init__.py file

    Instead of leaving it empty, we import everything from m1 in the __init__.py file. Then import p1 in main.py makes the variables defined in m1.py available on the package, and you can access p1.DATE directly without knowing which module it comes from.

    # __init__.py
    from .m1 import * # or from p1.m1 import *
    from .m2 import *
    # main.py
    import p1
    p1.DATE

    You might have noticed the dot before m1. It is a shortcut that tells Python to search in the current package. It’s an example of a relative import. The equivalent absolute import names the current package explicitly: from p1.m1 import *.

    There is a caveat though. If another submodule in the package defines a variable with the same name, the one that is imported later will overwrite the previous one.
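
    The caveat can be reproduced with a small sketch. The package name clash_pkg and the second date value are made up for the demonstration; the point is only that the later star import wins.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two submodules that both define DATE.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "clash_pkg"
pkg_dir.mkdir()
(pkg_dir / "m1.py").write_text('DATE = "2022-01-01"\n')
(pkg_dir / "m2.py").write_text('DATE = "1999-12-31"\n')

# m2 is star-imported last, so its DATE replaces the one from m1.
(pkg_dir / "__init__.py").write_text("from .m1 import *\nfrom .m2 import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import clash_pkg

print(clash_pkg.DATE)     # 1999-12-31
print(clash_pkg.m1.DATE)  # 2022-01-01, still reachable through the submodule
```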

    The advantage of having a non-empty __init__.py is to make all the submodules already available for the client when they import the package, so the client code looks neater.

    How does Python find modules and packages?

    The system for finding modules and packages in Python is called the import machinery, which consists of finders, loaders, caching, and an orchestrator.

    Import Machinery (Created by Xiaoxu Gao)
    1. Search module in cached sys.modules

    Every time you import a module, the first place searched is the sys.modules dictionary. The keys are module names and the values are the module objects themselves. sys.modules acts as a cache: if the module is there, it is returned immediately; otherwise, the import machinery goes on to search for it.

    Back to the previous example. When we import p1 with an __init__.py that imports m1, two entries are added to sys.modules: the top-level package p1, backed by __init__.py, and the submodule p1.m1, backed by m1.py.

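    import p1
    import sys
    print(sys.modules)
    {
    'p1': <module 'p1' from '/xiaoxu/sandbox/p1/__init__.py'>,
    'p1.m1': <module 'p1.m1' from '/xiaoxu/sandbox/p1/m1.py'>,
    ...
    }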

    If we import it twice, the second import will read from the cache. But if we deliberately delete the entry from sys.modules dictionary, then the second import will return a new module object.

    # read from cache
    import p1
    import sys
    old = p1
    import p1
    new = p1
    assert old is new
    # read from system
    import p1
    import sys
    old = p1
    del sys.modules['p1']
    import p1
    new = p1
    assert old is not new

    2. Search module spec

    If the module is not in the sys.modules dictionary, Python asks each finder in sys.meta_path, a list of meta path finder objects, whether it can locate the module by calling the finder's find_spec() method.

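    import sys
    print(sys.meta_path)
    [<class '_frozen_importlib.BuiltinImporter'>,
    <class '_frozen_importlib.FrozenImporter'>,
    <class '_frozen_importlib_external.PathFinder'>]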

    The BuiltinImporter is used for built-in modules. The FrozenImporter is used to locate frozen modules. The PathFinder is responsible for finding modules on the import path, and it relies on the following:

    • sys.path
    • sys.path_hooks
    • sys.path_importer_cache
    • __path__
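
    To see which finder claims a given module, the sketch below walks sys.meta_path by hand. The helper first_finder_for is hypothetical and only handles top-level module names; the real machinery also passes the parent package's __path__ for submodules. Which finder locates a given module can vary with the Python version and installation.

```python
import sys


def first_finder_for(name):
    """Return the name of the first meta path finder that yields a spec."""
    for finder in sys.meta_path:
        # Be defensive: skip any finder that lacks a find_spec() method.
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        # path=None means "this is a top-level import".
        spec = find_spec(name, None)
        if spec is not None:
            return getattr(finder, "__name__", type(finder).__name__)
    return None


print(first_finder_for("sys"))   # BuiltinImporter
print(first_finder_for("json"))  # typically PathFinder
print(first_finder_for("surely_missing_module_xyz"))  # None
```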

    Let’s check out what is in sys.path.

    import sys
    print(sys.path)
    [ '/xiaoxu/sandbox',
    '/xiaoxu/.pyenv/versions/3.9.0/lib/python39.zip',
    '/xiaoxu/.pyenv/versions/3.9.0/lib/python3.9',
    '/xiaoxu/.pyenv/versions/3.9.0/lib/python3.9/lib-dynload',
    '/xiaoxu/.local/lib/python3.9/site-packages',
    '/xiaoxu/.pyenv/versions/3.9.0/lib/python3.9/site-packages']

    PathFinder uses its find_spec() method to build the module's spec, which is stored on the module as __spec__. Each module has a specification object that holds the metadata of the module. One of its attributes is loader, which tells the import machinery which loader to use when creating the module.

    import p1
    print(p1.__spec__)
    ModuleSpec(name='p1', loader=<_frozen_importlib_external.SourceFileLoader object at 0x...>, origin='/xiaoxu/sandbox/p1/__init__.py', submodule_search_locations=['/xiaoxu/sandbox/p1'])
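
    A spec can also be looked up without importing the module yourself. The sketch below uses importlib.util.find_spec() on the standard library package json, chosen here only because it is available everywhere; the printed path and loader depend on your installation.

```python
import importlib.util

# Ask the import machinery for the spec of a module by name.
spec = importlib.util.find_spec("json")

print(spec.name)                   # json
print(spec.origin)                 # location of the package's entry file
print(type(spec.loader).__name__)  # the loader class chosen for this module

# A package has a list of search locations; a plain module has None here.
print(spec.submodule_search_locations is not None)  # True, json is a package
```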

    3. Load the module

    Once the module spec is found, the import machinery uses the loader attribute to create and initialize the module, and stores it in the sys.modules dictionary. The import system reference in the Python documentation includes pseudo code for what happens during the loading portion of import.
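
    The loading step can be approximated by hand with importlib.util. This is a simplified sketch, not the actual implementation: it skips error handling and locking, and the module name manual_mod is made up for the example.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# A throwaway module file to load manually.
path = Path(tempfile.mkdtemp()) / "manual_mod.py"
path.write_text('GREETING = "hello"\n')

# 1. Build a spec that records the module's name, origin, and loader.
spec = importlib.util.spec_from_file_location("manual_mod", str(path))

# 2. Create an empty module object from the spec.
module = importlib.util.module_from_spec(spec)

# 3. Cache it in sys.modules before running its code.
sys.modules["manual_mod"] = module

# 4. Let the loader execute the file's code in the module's namespace.
spec.loader.exec_module(module)

print(module.GREETING)                      # hello
print(sys.modules["manual_mod"] is module)  # True
```

    Caching the module before executing it is what makes the half-initialized module in a circular import visible to other modules.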

    Python Circular Imports

    Finally, let’s look at an interesting import problem: circular imports. A circular import occurs when two or more modules depend on each other. In this example, m2.py depends on m1.py and m1.py depends on m2.py.

    module dependency (Created by Xiaoxu Gao)
    # m1.py
    import m2
    m2.do_m2()
    def do_m1():
        print("m1")

    # m2.py
    import m1
    m1.do_m1()
    def do_m2():
        print("m2")

    # main.py
    import m1
    m1.do_m1()
    AttributeError: partially initialized module 'm1' has no attribute 'do_m1' (most likely due to a circular import)

    Python couldn’t find the attribute do_m1 in module m1. So why does this happen? The graph illustrates the process. When main.py imports m1, Python goes through m1.py line by line. The first thing it finds is import m2, so it starts executing m2.py. The first line there is import m1, but since Python hasn’t gone through everything in m1.py yet, we get a half-initialized module object. When m2.py then calls m1.do_m1(), which Python hasn’t defined yet, it raises an AttributeError.

    Circular Imports (Created by Xiaoxu Gao)
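
    The half-initialized state can be observed without triggering the error. In the sketch below, the module names circ_a and circ_b are made up and play the roles of m1 and m2; instead of calling the missing function, circ_b only records whether it exists at the moment circ_b runs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# circ_a imports circ_b before defining do_a, like m1.py above.
(root / "circ_a.py").write_text(
    "import circ_b\n"
    "def do_a():\n"
    "    print('a')\n"
)

# circ_b imports circ_a back and checks what is defined so far.
(root / "circ_b.py").write_text(
    "import circ_a\n"
    "SEEN = hasattr(circ_a, 'do_a')\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import circ_a
import circ_b

print(circ_b.SEEN)              # False: do_a did not exist yet when circ_b ran
print(hasattr(circ_a, "do_a"))  # True: circ_a has finished executing by now
```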

    So how do we fix a circular import? In general, circular imports are the result of bad design. Most of the time, the dependency isn’t actually required. A simple solution is to merge both functions into a single module.

    # m.py
    def do_m1():
        print("m1")
    def do_m2():
        print("m2")

    # main.py
    import m
    m.do_m1()
    m.do_m2()

    Sometimes, the merged module can become very large. Another solution is to defer the import of m2 until it is needed. This can be done by placing import m2 inside the function do_m1(). In this case, Python will define all the functions in m1.py first and load m2.py only when do_m1() is called.

    # m1.py
    def do_m1():
        import m2
        m2.do_m2()
        print("m1")
    def do_m1_2():
        print("m1_2")

    # m2.py
    import m1
    def do_m2():
        m1.do_m1_2()
        print("m2")

    # main.py
    import m1
    m1.do_m1()

    Many codebases use deferred imports not necessarily to solve circular dependencies but to speed up startup time. An example from Airflow is the advice not to write top-level code that is not necessary to build DAGs, because the speed of parsing top-level code affects both the performance and the scalability of Airflow.

    # example from Airflow doc
    import pendulum

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="example_python_operator",
        schedule_interval=None,
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
        catchup=False,
        tags=["example"],
    ) as dag:

        def print_array():
            import numpy as np  # <- THIS IS HOW NUMPY SHOULD BE IMPORTED IN THIS CASE

            a = np.arange(15).reshape(3, 5)
            print(a)
            return a

        run_this = PythonOperator(
            task_id="print_the_context",
            python_callable=print_array,
        )
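
    The effect of a deferred import can be checked through sys.modules. In this sketch, heavy_dep is a made-up stand-in for an expensive dependency such as numpy, and compute is a hypothetical function that needs it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a stand-in for an expensive dependency.
root = Path(tempfile.mkdtemp())
(root / "heavy_dep.py").write_text("VALUE = 42\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def compute():
    # Deferred import: executed on the first call, served from
    # the sys.modules cache on every later call.
    import heavy_dep
    return heavy_dep.VALUE


loaded_before = "heavy_dep" in sys.modules
result = compute()
loaded_after = "heavy_dep" in sys.modules

print(loaded_before, result, loaded_after)  # False 42 True
```

    Defining compute() costs nothing at import time; the dependency is only loaded once the function actually runs.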

    Conclusion

    As always, I hope you find this article useful and inspiring. We take many things in Python for granted, but it gets interesting when discovering how it works internally. Hope you enjoyed it, Cheers!

    What Happens When you Import a Python Module? Republished from Source https://towardsdatascience.com/what-happens-when-you-import-a-python-module-ad6c0efd2640?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed
