Create Clean and Robust Tests with Property-Based Testing
Imagine you are trying to figure out whether the function processing_fn is working properly. You use pytest to test the function with an example.
The test passed, but you know that one example is not enough. You need to test the function with more examples to make sure that the function is working properly with any data.
To do that, you might use pytest's parametrize marker, but it is difficult to come up with every example that might result in a failure.
Even if you take the time to write all those examples, it takes a long time for you to run all of the tests.
Wouldn’t it be nice if there were a testing strategy that allows you to:
- Write tests easily
- Generate good data for testing
- Detect falsifying examples quickly
- Produce small and straightforward tests
That is when Hypothesis and Pandera come in handy.
Pandera is a simple Python library for validating a pandas DataFrame.
To install Pandera, type:
pip install pandera
Hypothesis is a flexible and easy-to-use library for property-based testing.
Example-based tests use concrete examples and concrete expected outputs. Property-based tests generalize these concrete examples into essential features.
As a result, property-based tests allow you to write cleaner tests and specify the behavior of the code better.
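To make the contrast concrete, here is a minimal sketch that is not taken from the original article. It tests Python's built-in sorted, chosen only for illustration, first with one concrete example and then with a property that should hold for any list of integers.

```python
from hypothesis import given, strategies as st


def test_sorted_example():
    # Example-based: one concrete input and one concrete expected output.
    assert sorted([3, 1, 2]) == [1, 2, 3]


@given(st.lists(st.integers()))
def test_sorted_property(xs):
    # Property-based: for any list of integers, the output is in
    # non-decreasing order and keeps the same number of elements.
    result = sorted(xs)
    assert all(a <= b for a, b in zip(result, result[1:]))
    assert len(result) == len(xs)
```

The property-based test never names a specific input. Hypothesis generates the lists and reports the simplest one it can find that breaks an assertion.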
To install Hypothesis, type:
pip install hypothesis
This article will show you how to use these two tools to generate synthetic pandas DataFrames for testing.
First, we will use Pandera to test if the output of a function satisfies some constraints when given one input.
In the code below, we:
- Use pandera.DataFrameSchema to specify some constraints for the output, such as the datatype and the range of the values of a column.
- Use the pandera.check_output decorator to test if the output of the function satisfies the constraints.
Since there is no error when running this code, the output is valid.
Next, we will use Hypothesis to create data for testing based on the constraints given by the schema defined with pandera.DataFrameSchema.
Specifically, we will add:
- schema.strategy(size=5) to specify the search strategy that describes how to generate and simplify the data
- @given to run the test function over a wide range of matching data from the specified strategy
Run the tests with pytest by passing the path of the test file to the pytest command.
We found a falsifying example in less than 2 seconds! The output is also very simple. For example, instead of choosing an example like the following that could result in an error:
0 1 2
1 2 1
2 3 0
3 4 0
4 5 1
Hypothesis chooses an example that is simpler and easier to understand:
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
This is very cool because:
- We do not need to specify any concrete examples.
- The examples are straightforward enough for us to quickly understand the behavior of the tested function.
- We find the falsifying example in a short amount of time.
Congratulations! You have just learned how to use Pandera and Hypothesis to generate synthetic data for testing. I hope this article gives you the knowledge needed to create robust and clean tests for your Python functions.
Feel free to play and fork the source code of this article here:
Hypothesis and Pandera: Generate Synthesis Pandas DataFrame for Testing Republished from Source https://towardsdatascience.com/hypothesis-and-pandera-generate-synthesis-pandas-dataframe-for-testing-e5673c7bec2e?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed