What Happens When You Import a Python Module?

    Deep dive into the import system

    Photo by Mike van den Bos from Unsplash

    Reusability is one of the key measures of code quality. It is the extent to which code can be used in different programs with minimal change. In Python, we use the import statement to bring in code from a module. But have you ever been curious about how import is implemented behind the scenes? In this article, we will dive deep into Python's import system. We will also discuss an interesting problem: circular imports. Grab a tea, and let’s get straight to the article.

    Module vs. Package

    Python code is organized into modules and packages. A module is a single Python file and a package is a collection of modules. Consider the following example of importing a module:

    import random

    random is a module in Python's standard library. The import statement loads the random module and makes it available, after which you can access its functions, such as randint(). If you open an IDE and debug the import, you will see that the code lives in a single file, random.py.

    You can also import randint like this:

    from random import randint

    Let’s check out an example from a package:

    import pandas

    At first glance, you can’t really tell whether it’s a module or a package. But if you debug the import, it will take you to the package's __init__.py file instead of a single pandas.py file. A package contains submodules or, recursively, sub-packages, and __init__.py is the entry point of the package.

    The import statement is not the only way to import a module; functions such as importlib.import_module() and the built-in __import__() can also be used.

    >>> import importlib
    >>> importlib.import_module('random')
    >>> __import__('random')
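The functional forms are mostly useful when the module name is only known at run time. The sketch below is a minimal illustration using standard library modules; the helper name load_by_name is hypothetical and not part of importlib.

```python
import importlib
import sys


def load_by_name(module_name):
    """Import a module whose name is only known at run time (hypothetical helper)."""
    return importlib.import_module(module_name)


random_module = load_by_name("random")

# The import statement and import_module() share the sys.modules cache,
# so both hand back the very same module object.
import random

same_object = random_module is random and sys.modules["random"] is random

# For dotted names the two functions differ: __import__ returns the
# top-level package, import_module returns the submodule itself.
top = __import__("xml.dom")
sub = importlib.import_module("xml.dom")
print(same_object, top.__name__, sub.__name__)
```

In application code importlib.import_module() is usually the more convenient of the two, precisely because it returns the module that was asked for.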

    So what is __init__.py?

    A regular Python package contains an __init__.py file. When the package is imported, this file is implicitly executed and the objects it defines are bound to names in the package’s namespace. This file can be left empty.

    Let’s see an example. I have a folder structure like this. p1 is my package and m1 is a submodule.

    folder structure (Created by Xiaoxu Gao)

    Inside m1.py, I have a variable DATE that I want to use in a client script. I will create several versions of __init__.py and see how each one affects the import in that script.

    DATE = "2022-01-01"

    Case 1: empty __init__.py file.

    Since the __init__.py file is empty, no submodule is imported when we import p1, so p1 doesn’t know that m1 exists. If we import m1 explicitly using from p1 import m1, then everything inside m1.py is imported. But then we are not really importing a package; we are importing a module. As you can imagine, if your package has many submodules, you need to import each one explicitly, which can be quite tedious.

    import p1
    p1.m1.DATE
    # >> AttributeError: module 'p1' has no attribute 'm1'

    from p1 import m1
    from p1 import m2, m3  # ... and so on: every submodule must be imported explicitly
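To try the empty __init__.py case without creating files by hand, the following sketch builds the p1 package in a temporary directory and imports it from there. The temporary location and the variable names are assumptions made for the demo; only p1, m1 and DATE come from the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: p1/__init__.py (empty) and p1/m1.py.
root = Path(tempfile.mkdtemp())
package_dir = root / "p1"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "m1.py").write_text('DATE = "2022-01-01"\n')

sys.path.insert(0, str(root))   # make the temporary directory importable
importlib.invalidate_caches()   # let the path finders notice the new files

import p1

# With an empty __init__.py the submodule has not been loaded yet.
submodule_loaded_before = hasattr(p1, "m1")

# Importing the submodule explicitly loads it and binds it on the package.
from p1 import m1

submodule_loaded_after = hasattr(p1, "m1")
print(submodule_loaded_before, submodule_loaded_after, m1.DATE)
```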

    Case 2: import submodules in the __init__.py file

    Instead of leaving it empty, we import everything from m1 in the __init__.py file. Then import p1 in the client script picks up the variables defined in m1.py, and you can call p1.DATE directly without knowing which module it comes from.

    # p1/__init__.py
    from .m1 import *  # or: from p1.m1 import *
    from .m2 import *

    # client script
    import p1
    print(p1.DATE)

    You might have noticed the dot before m1. It is a shortcut that tells Python to search in the current package. It’s an example of a relative import. The equivalent absolute import names the current package explicitly, like from p1.m1 import * .
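The mapping from a relative name to an absolute one can be checked with importlib.util.resolve_name(), which only manipulates names and does not import anything. In this small sketch the nested package p1.sub is hypothetical and is used only to show a two-dot import.

```python
import importlib.util

# ".m1" seen from inside package "p1" means "p1.m1".
absolute = importlib.util.resolve_name(".m1", package="p1")

# Two dots climb one level: "..m1" seen from the (hypothetical)
# sub-package "p1.sub" also resolves to "p1.m1".
sibling = importlib.util.resolve_name("..m1", package="p1.sub")

print(absolute, sibling)
```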

    There is a caveat though. If another submodule in the package contains the same variable, the one that is imported later will overwrite the previous one.
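The overwrite can be reproduced with a throwaway package. This is a sketch under assumptions: the package name shadow_demo and the second date value are made up for the demo, and the files are written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
package_dir = root / "shadow_demo"   # hypothetical package name
package_dir.mkdir()
(package_dir / "m1.py").write_text('DATE = "2022-01-01"\n')
(package_dir / "m2.py").write_text('DATE = "1999-12-31"\n')
# m2 is star-imported last, so its DATE replaces the one from m1.
(package_dir / "__init__.py").write_text(
    "from .m1 import *\n"
    "from .m2 import *\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import shadow_demo

print(shadow_demo.DATE)      # the value defined in m2
print(shadow_demo.m1.DATE)   # the original is still reachable via the submodule
```

Importing names explicitly (for example from .m1 import DATE) instead of using a star import makes such collisions easier to spot.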

    The advantage of a non-empty __init__.py is that all the submodules are already available to clients when they import the package, so the client code looks neater.

    How does Python find modules and packages?

    The system for finding modules and packages in Python is called the import machinery. It comprises finders, loaders, a cache, and an orchestrator.

    Import Machinery (Created by Xiaoxu Gao)

    1. Search module in cached sys.modules

    Every time you import a module, the first thing searched is the sys.modules dictionary. The keys are module names and the values are the module objects themselves. sys.modules acts as a cache: if the module is there, it is returned immediately; otherwise, Python searches for it on the system.

    Back to the previous example. When we import p1, two entries are added to sys.modules: the top-level module p1 and the submodule p1.m1.

    import p1
    import sys
    print(sys.modules)
    # {..., 'p1': <module 'p1' from ...>, ...}

    If we import it twice, the second import reads from the cache. But if we deliberately delete the entry from the sys.modules dictionary, the second import returns a new module object.

    # read from cache
    import p1
    import sys
    old = p1
    import p1
    new = p1
    assert old is new

    # read from system
    import p1
    import sys
    old = p1
    del sys.modules['p1']
    import p1
    new = p1
    assert old is not new
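Another way to observe the cache is to count how often a module's body actually runs. The sketch below assumes a throwaway module named noisy_mod (a hypothetical name) that prints a line when executed, and captures that output to count the executions.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "noisy_mod.py").write_text('print("running noisy_mod")\n')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import noisy_mod               # first import: the module body runs
    import noisy_mod               # cache hit: the body does not run again
    del sys.modules["noisy_mod"]   # evict the cached module
    import noisy_mod               # cache miss: the body runs a second time

run_count = buffer.getvalue().count("running noisy_mod")
print(run_count)
```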

    2. Search module spec

    If the module is not in the sys.modules dictionary, Python asks each meta path finder in sys.meta_path, in order. Each finder has a find_spec() method that reports whether it can find the module.

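    import sys
    print(sys.meta_path)
    # typical output on CPython:
    # [<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]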

    The BuiltinImporter is used for built-in modules. The FrozenImporter is used to locate frozen modules. The PathFinder is responsible for finding modules on the import path, using the following:

    • sys.path
    • sys.path_hooks
    • sys.path_importer_cache
    • __path__
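The search over the meta path finders can be imitated by hand. The function below is a simplified sketch, not the interpreter's actual implementation: it asks each finder in sys.meta_path for a spec and stops at the first one that answers. The name first_spec is hypothetical.

```python
import sys


def first_spec(module_name):
    """Return (finder, spec) from the first meta path finder that knows the module."""
    for finder in sys.meta_path:
        # Be defensive: skip any entry that does not offer find_spec().
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        spec = find_spec(module_name, None)  # path is None for top-level modules
        if spec is not None:
            return finder, spec
    return None, None


for name in ("sys", "random"):
    finder, spec = first_spec(name)
    print(name, "->", finder, spec.origin)
```

On a standard CPython installation sys is expected to be claimed by the built-in importer and random by the path-based finder, though extra finders installed by third-party tools can change what is printed.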

    Let’s check out what is in sys.path.

    import sys
    print(sys.path)
    # ['/xiaoxu/sandbox', ...]

    PathFinder uses its find_spec method to produce the __spec__ of the module. Each module has a specification object that holds the module's metadata. One of its attributes is loader, which tells the import machinery which loader to use while creating the module.

    import p1
    print(p1.__spec__)
    # ModuleSpec(name='p1', loader=..., origin='/xiaoxu/sandbox/p1/...', submodule_search_locations=['/xiaoxu/sandbox/p1'])
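A spec can also be inspected without importing the module, using importlib.util.find_spec(). The sketch below compares a single-file module with a package from the standard library; the choice of random and json is only for illustration.

```python
import importlib.util

module_spec = importlib.util.find_spec("random")   # a single-file module
package_spec = importlib.util.find_spec("json")    # a package

for spec in (module_spec, package_spec):
    print(spec.name)
    print("  loader:", type(spec.loader).__name__)
    print("  origin:", spec.origin)
    # None for plain modules, a list of directories for packages.
    print("  submodule_search_locations:", spec.submodule_search_locations)
```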

    3. Load the module

    Once the module spec is found, the import machinery uses its loader attribute to initialize the module and stores it in the sys.modules dictionary. The Python documentation on the import system includes pseudocode that shows what happens during the loading portion of import.
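The loading step can be approximated with public importlib.util functions. This is a simplified sketch of the idea rather than the machinery's exact code; the module name handmade and its temporary file are assumptions made for the demo.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# A throwaway source file to load; the name "handmade" is hypothetical.
source_path = Path(tempfile.mkdtemp()) / "handmade.py"
source_path.write_text('DATE = "2022-01-01"\n')

# 1. Describe the module with a spec (normally a finder does this).
spec = importlib.util.spec_from_file_location("handmade", str(source_path))

# 2. Create an empty module object from the spec.
module = importlib.util.module_from_spec(spec)

# 3. Cache it before execution, so that imports of it during execution
#    find this same, partially initialized object.
sys.modules[spec.name] = module

# 4. Run the module's code in the module's namespace.
try:
    spec.loader.exec_module(module)
except BaseException:
    # A module that fails to execute must not stay in the cache.
    del sys.modules[spec.name]
    raise

import handmade   # served from sys.modules, no second execution

print(module.DATE, handmade is module)
```

Step 3 is the detail that matters for circular imports: the module is visible in sys.modules before its body has finished running.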

    Python Circular Imports

    Finally, let’s look at an interesting import problem: circular imports. A circular import occurs when two or more modules depend on each other. In this example, m1.py depends on m2.py and m2.py depends on m1.py.

    module dependency (Created by Xiaoxu Gao)

    # m1.py (function bodies are illustrative)
    import m2
    m2.do_m2()
    def do_m1():
        print("m1")

    # m2.py
    import m1
    def do_m2():
        m1.do_m1()

    # main.py
    import m1
    # AttributeError: partially initialized module 'm1' has no attribute 'do_m1' (most likely due to a circular import)

    Python couldn’t find the attribute do_m1 on module m1. So why does this happen? The graph illustrates the process. When we import m1, Python executes it line by line. The first thing it finds is import m2, so it goes off to import m2.py. The first line there is import m1, but since Python hasn’t finished executing m1.py yet, we get a half-initialized module object. When the code then calls m1.do_m1(), which Python hasn’t defined yet, it raises an AttributeError exception.
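The failure can be reproduced in isolation by writing the files to a temporary directory and running the entry script in a fresh interpreter. The file contents below are one possible arrangement that triggers the error; they are an assumption for the demo, not necessarily the exact code behind the traceback above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# m1 imports m2 at the top and calls into it before do_m1 is defined.
(workdir / "m1.py").write_text(
    "import m2\n"
    "m2.do_m2()\n"
    "def do_m1():\n"
    "    print('m1')\n"
)
# m2 imports m1, which at that moment is only partially initialized.
(workdir / "m2.py").write_text(
    "import m1\n"
    "def do_m2():\n"
    "    m1.do_m1()\n"
)
(workdir / "main.py").write_text("import m1\n")

# Run in a separate interpreter so the demo does not touch this process's
# sys.modules; the script's directory is put on sys.path automatically.
result = subprocess.run(
    [sys.executable, "main.py"],
    cwd=str(workdir),
    capture_output=True,
    text=True,
)
print(result.returncode)
print(result.stderr.splitlines()[-1])   # the final line of the traceback
```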

    Circular Imports (Created by Xiaoxu Gao)

    So how do we fix a circular import? In general, circular imports are the result of bad design. Most of the time, the dependency isn’t actually required. A simple solution is to merge both functions into a single module.

    # m.py
    def do_m1(): ...
    def do_m2(): ...

    # main.py
    import m

    Sometimes, the merged module can become very large. Another solution is to defer the import of m2 until it is needed. This can be done by placing import m2 inside the function do_m1(). In this case, Python first loads all the functions in m1.py and loads m2.py only when it is needed.

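    # m1.py (function bodies are illustrative)
    def do_m1():
        import m2  # deferred until do_m1() is called
        m2.do_m2()
    def do_m1_2(): ...

    # m2.py
    import m1
    def do_m2():
        m1.do_m1_2()

    # main.py
    import m1
    m1.do_m1()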
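The deferred version can be checked the same way, in a fresh interpreter. As before, the function bodies are assumptions chosen for the demo: each function prints its own name so the call order is visible.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# m1 no longer imports m2 at the top; the import runs inside do_m1().
(workdir / "m1.py").write_text(
    "def do_m1():\n"
    "    import m2  # deferred import\n"
    "    m2.do_m2()\n"
    "    print('m1')\n"
    "def do_m1_2():\n"
    "    print('m1_2')\n"
)
# By the time m2 is imported, m1 is fully initialized.
(workdir / "m2.py").write_text(
    "import m1\n"
    "def do_m2():\n"
    "    m1.do_m1_2()\n"
    "    print('m2')\n"
)
(workdir / "main.py").write_text("import m1\nm1.do_m1()\n")

result = subprocess.run(
    [sys.executable, "main.py"],
    cwd=str(workdir),
    capture_output=True,
    text=True,
)
print(result.returncode)
print(result.stdout.split())
```

The cycle between the two modules still exists, but it is only followed after m1 has finished loading, so every attribute m2 needs is already defined.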

    Many codebases use deferred imports not necessarily to solve circular dependencies but to speed up startup. For example, the Airflow documentation advises against top-level code that is not needed to build DAGs, because the speed of parsing top-level code affects both the performance and the scalability of Airflow.

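    # example from the Airflow docs (dag_id and task_id values are illustrative)
    import pendulum

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="example_python_operator",
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    ) as dag:

        def print_array():
            import numpy as np  # imported only when the task runs

            a = np.arange(15).reshape(3, 5)
            return a

        run_this = PythonOperator(
            task_id="print_the_context",
            python_callable=print_array,
        )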


    As always, I hope you find this article useful and inspiring. We take many things in Python for granted, but it gets interesting when discovering how it works internally. Hope you enjoyed it, Cheers!
