Deep dive into the import system
Reusability is one of the key measures of code quality: the extent to which code can be used in different programs with minimal change. In Python, we use import to bring in code from a module. But have you ever been curious about how import is implemented behind the scenes? In this article, we will dive deep into Python’s import system. We will also discuss an interesting problem: circular imports. Grab a tea, and let’s get straight to the article.
Module vs. Package
Python code is organized into modules and packages. A module is a single Python file and a package is a collection of modules. Consider the following example of importing a module:
random is a module from Python’s standard library. The first line of the example imports the random module and makes it available to use, and the code then accesses randint(). If you open an IDE and debug the import, you will see the code sit in
You can also import
randint like this:
from random import randint
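To make the two import styles concrete, here is a minimal sketch that uses both and checks that they reach the same function. The variable names (value_a, value_b, same_function) and the range 1 to 6 are only illustrative.

```python
import random
from random import randint

# `import random` binds the module object to the name `random`,
# so the function is reached through the module.
value_a = random.randint(1, 6)

# `from random import randint` binds only the function to the name `randint`.
value_b = randint(1, 6)

# Both names refer to the same function object.
same_function = random.randint is randint
print(value_a, value_b, same_function)
```

Which style to use is mostly a readability choice: the first keeps the origin of randint visible at the call site, the second is shorter.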
Let’s check out an example from a package:
At first glance, you can’t really tell whether it’s a module or a package. But if you debug the import, it will take you to pandas/__init__.py instead of pandas.py. A package contains submodules or, recursively, sub-packages, and __init__.py is the entry point of the package.
The import statement is not the only way to import, though: functions such as importlib.import_module() and the built-in __import__() can also be used.
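As a small sketch of these two functions, the snippet below imports modules by name from strings. The variable names are illustrative. One difference worth knowing: for a dotted name, importlib.import_module() returns the submodule, while __import__() returns the top-level package.

```python
import importlib

# Equivalent to `import random`, but the name is a string,
# so it can be chosen at runtime.
random_mod = importlib.import_module("random")

# For a dotted name, import_module returns the submodule itself ...
path_mod = importlib.import_module("os.path")

# ... while the built-in __import__ returns the top-level package.
top_level = __import__("os.path")

print(random_mod.__name__)  # random
print(path_mod.__name__)    # posixpath on Unix-like systems, ntpath on Windows
print(top_level.__name__)   # os
```

In application code, importlib.import_module() is usually the more convenient of the two.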
>>> import importlib
So what is __init__.py?
A regular Python package contains an __init__.py file. When the package is imported, this __init__.py file is implicitly executed and the objects it defines are bound to names in the package’s namespace. This file can be left empty.
Let’s see an example. I have a folder structure like this.
p1 is my package and m1 is a submodule. In m1.py, I have a variable DATE that I want to use in main.py. I will create several versions of __init__.py and see how each one affects the import.
DATE = "2022-01-01"
When the __init__.py file is empty and we import p1, no submodule is imported, so p1 does not know that m1 exists. If we import m1 explicitly using from p1 import m1, then everything inside m1.py will be imported. But then we are no longer importing a package; we are importing a module. As you can imagine, if your package has a lot of submodules, you need to import every module explicitly, which can be quite tedious.
import p1
p1.m1.DATE
>> AttributeError: module 'p1' has no attribute 'm1'

from p1 import m1
from p1 import m2, m3 ...  # needs to explicitly import every submodule
m1.DATE
>> Works!!
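If you want to reproduce this behaviour without creating files by hand, here is a minimal sketch that builds a throwaway package in a temporary directory. The package name demo_empty is hypothetical and stands in for p1; only the empty __init__.py and the DATE variable mirror the example above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: demo_empty/__init__.py (empty) and demo_empty/m1.py.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_empty"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "m1.py").write_text('DATE = "2022-01-01"\n')

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_empty

# The empty __init__.py did not import m1, so the attribute is missing.
try:
    demo_empty.m1
    submodule_visible_before = True
except AttributeError:
    submodule_visible_before = False

# Importing the submodule explicitly loads it and binds it on the package.
from demo_empty import m1

print(submodule_visible_before)  # False
print(m1.DATE)                   # 2022-01-01
print(demo_empty.m1 is m1)       # True
```

Note the last line: once a submodule has been imported anywhere, it also becomes reachable as an attribute of its parent package.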
Case2: import submodules in __init__.py
Instead of leaving it empty, we import everything from the submodules in the __init__.py file. Then import p1 in main.py will recognize the variables in m1.py, and you can directly call p1.DATE without knowing which module it comes from.
from .m1 import *  # or from p1.m1 import *
from .m2 import *
# main.py
You might have noticed the dot before m1. It is a shortcut that tells Python to search in the current package, and it is an example of a relative import. The equivalent absolute import names the current package explicitly, like from p1.m1 import *.
There is a caveat though. If another submodule in the package contains the same variable, the one that is imported later will overwrite the previous one.
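The overwrite is easy to see in a small experiment. The sketch below assumes a hypothetical package demo_star whose two submodules both define DATE; the second date value is made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package whose two submodules both define DATE.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_star"
pkg_dir.mkdir()
(pkg_dir / "m1.py").write_text('DATE = "2022-01-01"\n')
(pkg_dir / "m2.py").write_text('DATE = "2023-12-31"\n')  # hypothetical value

# __init__.py star-imports m1 first and m2 second, using relative imports.
(pkg_dir / "__init__.py").write_text("from .m1 import *\nfrom .m2 import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_star

# The name imported later wins at package level ...
print(demo_star.DATE)     # 2023-12-31
# ... but each submodule still keeps its own value.
print(demo_star.m1.DATE)  # 2022-01-01
```

One way to avoid such silent clashes is to import names explicitly in __init__.py (for example from .m1 import DATE) instead of using the star form.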
The advantage of a non-empty __init__.py is that all the submodules are already available to clients when they import the package, so the client code looks neater.
How does Python find modules and packages?
The system for finding modules and packages in Python is called the import machinery, which comprises finders, loaders, a cache, and an orchestrator.
1. Search module in cache
Every time you import a module, the first place searched is the sys.modules dictionary. The keys are module names and the values are the module objects themselves. sys.modules acts as a cache: if the module is there, it is returned immediately; otherwise, the import machinery has to search for it.
Back to the previous example. When we import p1, two entries are added to sys.modules: the top-level package (its __init__.py) and its submodule.
If we import it twice, the second import will read from the cache. But if we deliberately delete the entry from the sys.modules dictionary, then the second import will return a new module object.
# read from cache
import p1
old = p1
import p1
new = p1
assert old is new

# read from system
import sys
old = p1
del sys.modules["p1"]
import p1
new = p1
assert old is not new
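Here is a self-contained version of the same experiment that you can run without the p1 package. It is only a sketch: the module name demo_cached and its VALUE variable are hypothetical, and the module is written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
(root / "demo_cached.py").write_text("VALUE = 1\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_cached                    # first import: found, loaded, cached
first = sys.modules["demo_cached"]

import demo_cached                    # second import: served from sys.modules
second = sys.modules["demo_cached"]

del sys.modules["demo_cached"]        # evict the cache entry
import demo_cached                    # cache miss: the module is loaded again
third = sys.modules["demo_cached"]

print(first is second)  # True
print(first is third)   # False
```

Deleting entries from sys.modules is fine for experiments like this one, but code that still holds a reference to the old module object keeps using it.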
2. Search module spec
If the module is not in the sys.modules dictionary, then Python searches for it using a list of meta path finder objects, calling each finder’s find_spec() method to see if the module can be imported.
BuiltinImporter is used for built-in modules. FrozenImporter is used to locate frozen modules. PathFinder is responsible for finding modules located on the module search path.
Let’s check out what is in
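To see the finders and the search paths on your own machine, a minimal sketch like the following prints sys.meta_path and the first few entries of sys.path. The exact output depends on your interpreter and environment, and tools such as test runners may register extra finders.

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

# sys.meta_path holds the meta path finders, consulted in order.
for finder in sys.meta_path:
    print(finder)

# The three default finders are normally present, possibly among others.
defaults_present = all(
    finder in sys.meta_path
    for finder in (BuiltinImporter, FrozenImporter, PathFinder)
)
print(defaults_present)

# PathFinder searches these directories for top-level modules.
for entry in sys.path[:5]:
    print(repr(entry))
```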
PathFinder will use its find_spec method to look for the __spec__ of the module. Each module has a specification object that holds the module’s metadata. One of its attributes is loader, which tells the import machinery which loader to use when creating the module.
print(p1.__spec__)
ModuleSpec(name='p1', loader=, origin='/xiaoxu/sandbox/p1/__init__.py', submodule_search_locations=['/xiaoxu/sandbox/p1'])
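You can also ask for a spec yourself without importing anything. The sketch below uses the standard library package json and the built-in module sys as stand-ins for p1; the printed loaders and paths will differ from machine to machine.

```python
import importlib.util
from importlib.machinery import PathFinder

# Ask the import machinery for a spec by module name.
spec = importlib.util.find_spec("json")
print(spec.name)                        # json
print(spec.loader)                      # the loader that would create the module
print(spec.origin)                      # path to json/__init__.py on this machine
print(spec.submodule_search_locations)  # set because json is a package

# Calling PathFinder directly searches sys.path only.
path_spec = PathFinder.find_spec("json")
print(type(path_spec.loader).__name__)

# A built-in module has no file behind it, which shows in its origin.
builtin_spec = importlib.util.find_spec("sys")
print(builtin_spec.origin)              # built-in
```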
3. Load the module
Once the module spec is found, the import machinery uses its loader attribute to initialize the module and stores it in the sys.modules dictionary. You can read this pseudo code to understand what happens during the loading portion of import.
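The loading step can be approximated by hand with importlib.util. This is a simplified sketch, not the real machinery (which also handles errors, locking, and packages); the module name demo_loaded and its file are hypothetical.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# A throwaway source file to load.
source = Path(tempfile.mkdtemp()) / "demo_loaded.py"
source.write_text('DATE = "2022-01-01"\n')

# 1. Build a spec that names the module and carries its loader.
spec = importlib.util.spec_from_file_location("demo_loaded", str(source))

# 2. Create an empty module object from the spec.
module = importlib.util.module_from_spec(spec)

# 3. Cache it in sys.modules before executing, so that imports triggered
#    during execution can find the (still partially initialized) module.
sys.modules[spec.name] = module

# 4. Let the loader run the module's code inside the module's namespace.
spec.loader.exec_module(module)

print(module.DATE)  # 2022-01-01
```

Step 3 happening before step 4 is exactly what makes partially initialized modules visible to other code.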
Python Circular Imports
Finally, let’s look at an interesting import problem: circular imports. A circular import occurs when two or more modules depend on each other. In this example, m1.py and m2.py depend on each other.
m1.do_m1()
AttributeError: partially initialized module 'm1' has no attribute 'do_m1' (most likely due to a circular import)
Python couldn’t find the attribute do_m1 on module m1. So why does this happen? The graph illustrates the process. When we run import m1, Python goes through m1.py line by line. The first thing it finds is import m2, so it goes on to import m2.py. The first line there is import m1, but since Python hasn’t gone through everything in m1.py yet, we get a half-initialized module object. When we call m1.do_m1(), which Python hasn’t seen yet, it raises an AttributeError exception.
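The failure can be reproduced with two tiny modules. This is a sketch under assumptions: the names cyc_m1 and cyc_m2 and their contents are hypothetical stand-ins for m1.py and m2.py, written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# cyc_m1 imports cyc_m2 before it defines do_m1.
(root / "cyc_m1.py").write_text(
    "import cyc_m2\n"
    "\n"
    "def do_m1():\n"
    "    return 'm1'\n"
)

# cyc_m2 imports cyc_m1 back and uses it immediately, at import time.
(root / "cyc_m2.py").write_text(
    "import cyc_m1\n"
    "\n"
    "cyc_m1.do_m1()  # cyc_m1 is only partially initialized here\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    import cyc_m1
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The second import of cyc_m1 does not fail by itself: it simply returns the half-built module from sys.modules. The error only appears when an attribute that has not been defined yet is accessed.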
So how do we fix a circular import? In general, circular imports are the result of bad design. Most of the time, the dependency isn’t actually required. A simple solution is to merge both functions into a single module.
Sometimes, the merged module can become very large. Another solution is to defer the import of m2 until it is needed. This can be done by placing import m2 inside the function do_m1(). In this case, Python will load all the functions in m1.py first and then load m2.py only when it is needed.
import m1
def do_m2():
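As a runnable sketch of the deferred import, the two modules below depend on each other without failing. The names fix_m1 and fix_m2, the helper() function, and the return values are all hypothetical and only illustrate the pattern.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# fix_m1 no longer imports fix_m2 at module level.
(root / "fix_m1.py").write_text(
    "def helper():\n"
    "    return 'helper from m1'\n"
    "\n"
    "def do_m1():\n"
    "    import fix_m2  # deferred: runs only when do_m1() is called\n"
    "    return fix_m2.do_m2()\n"
)

# fix_m2 can import fix_m1 at the top, because by the time fix_m2 is
# loaded, fix_m1 has been fully initialized.
(root / "fix_m2.py").write_text(
    "import fix_m1\n"
    "\n"
    "def do_m2():\n"
    "    return 'm2 got ' + fix_m1.helper()\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import fix_m1

result = fix_m1.do_m1()
print(result)  # m2 got helper from m1
```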
Many codebases use deferred imports not necessarily to solve circular dependencies but to speed up startup time. An example from Airflow is the recommendation not to write top-level code that is not necessary to build DAGs. This is because of the impact that top-level code parsing speed has on both the performance and scalability of Airflow.
# example from Airflow doc
from airflow import DAG
from airflow.operators.python import PythonOperator
start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
) as dag:
import numpy as np  # <- THIS IS HOW NUMPY SHOULD BE IMPORTED IN THIS CASE
a = np.arange(15).reshape(3, 5)
run_this = PythonOperator(
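The same idea can be tried without Airflow or numpy. In the sketch below, a hypothetical module slow_dep sleeps for a moment at import time to stand in for an expensive dependency, and compute() imports it only when called. This is an illustration of the pattern, not Airflow code.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# A throwaway module that is slow to import (stand-in for a heavy library).
root = Path(tempfile.mkdtemp())
(root / "slow_dep.py").write_text(
    "import time\n"
    "time.sleep(0.1)  # simulated import cost\n"
    "VALUE = 42\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def compute():
    import slow_dep  # deferred: the cost is paid on the first call only
    return slow_dep.VALUE


# Defining compute() did not trigger the import.
loaded_at_startup = "slow_dep" in sys.modules

start = time.perf_counter()
first = compute()
first_call = time.perf_counter() - start

start = time.perf_counter()
second = compute()
second_call = time.perf_counter() - start

print(loaded_at_startup, first, second)
print(f"first call: {first_call:.3f}s, second call: {second_call:.3f}s")
```

The second call is fast because the import statement inside the function finds slow_dep in sys.modules, as described in the caching step earlier.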
As always, I hope you find this article useful and inspiring. We take many things in Python for granted, but it gets interesting when you discover how they work internally. Hope you enjoyed it, Cheers!
What Happens When you Import a Python Module? Republished from Source https://towardsdatascience.com/what-happens-when-you-import-a-python-module-ad6c0efd2640?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed