
    7 Simple Ways To Enhance Your Matplotlib Charts

    Improve your matplotlib figures with these simple steps

    Photo by Mikael Blomkvist: https://www.pexels.com/photo/person-holding-white-ipad-on-brown-wooden-table-6476589/

    Matplotlib is one of the most popular data visualisation libraries available within Python. It is typically the first data visualisation library that you come across when learning Python. Even though you can generate figures with a few lines of code, the default plots are often visually unappealing and uninformative.

    To combat this, we can enhance the communication power of the figures with a few extra lines of code. Within this article, we will cover how we can go from a basic matplotlib scatter plot to one that is more visually appealing and more informative to the end user/reader.

    Before and after enhancing a matplotlib figure. Image by the author.

    In the following examples of how a scatter plot can be enhanced within matplotlib, we will be using a subset of a larger dataset that was used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). It is released under an NLOD 2.0 licence from the Norwegian Government, details of which can be found here: Norwegian Licence for Open Government Data (NLOD) 2.0.

    The full dataset can be accessed at the following link: https://doi.org/10.5281/zenodo.4351155.

    For this tutorial, we will need to import both matplotlib and pandas.

    The data is then read into a dataframe using the pandas function read_csv(). We will also calculate a density porosity column (DPHI) from the bulk density column (RHOB), which we will plot against neutron porosity.

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('data/Xeek_Well_15-9-15.csv')
    df['DPHI'] = (2.65 - df['RHOB'])/1.65
    df.describe()

    When we run the above code we get back the following summary of the data with our new DPHI column.

    Statistical summary of well log measurements within well 15/9–15 from the Xeek / Force 2020 dataset. Image by the author.
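
The constants in the DPHI line appear to follow the standard density porosity relation: (matrix density - bulk density) / (matrix density - fluid density). Here is a minimal sketch of that calculation, assuming 2.65 g/cc is a quartz matrix density and 1.0 g/cc is the fluid density, which gives the 1.65 denominator. The function name, its defaults and the sample values are illustrative and not part of the original code.

```python
import pandas as pd


def density_porosity(rhob, rho_matrix=2.65, rho_fluid=1.0):
    """Return density porosity (decimal) from bulk density in g/cc.

    rho_matrix and rho_fluid are assumed values (quartz matrix, water).
    """
    return (rho_matrix - rhob) / (rho_matrix - rho_fluid)


# Made-up bulk density values, used only to show the calculation.
example = pd.DataFrame({'RHOB': [2.65, 2.32, 1.0]})
example['DPHI'] = density_porosity(example['RHOB'])
print(example)
```

With these defaults, a bulk density equal to the matrix density gives a porosity of 0 and a bulk density equal to the fluid density gives a porosity of 1.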

    After the data has been successfully loaded, we can create our first scatter plot. To do this we are going to plot neutron porosity (NPHI) on the x-axis and density porosity (DPHI) on the y-axis.

    We will also set the figure size to 10 x 8.

    plt.figure(figsize=(10,8))
    plt.scatter(df['NPHI'], df['DPHI'])

    With these two lines of code, we get the above plot. Looks very bland, doesn’t it? Let’s add some colour to make it more visually appealing and to allow us to gain some insight into the data.

    To do that we are going to colour the data by gamma ray (GR) and set the colour range between 0 and 100 (vmin and vmax parameters).

    We will need to display the colour bar by using plt.colorbar().

    Finally, we will set the x and y limits of the chart to go from 0 to 0.8 by calling upon plt.xlim() and plt.ylim(). This will make both axes start from 0 and go to a maximum of 0.8.

    plt.figure(figsize=(10,8))
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r')
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar()

    The colour map we are using is Viridis in reverse. This colour map provides a nice contrast between high and low values, whilst maintaining uniformity and being colour blind friendly.
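
To see what vmin and vmax do to the colours, the sketch below maps a few hypothetical gamma ray readings through the same range and colour map. It assumes matplotlib 3.5 or later for the matplotlib.colormaps registry. The readings are invented for illustration.

```python
import matplotlib
from matplotlib.colors import Normalize

cmap = matplotlib.colormaps['viridis_r']  # Viridis in reverse
norm = Normalize(vmin=0, vmax=100)        # same range as the scatter plot

# Hypothetical gamma ray readings, including one above vmax.
for gr in (0, 50, 100, 150):
    print(gr, cmap(norm(gr)))

# Readings beyond vmax are drawn with the colour at the top of the range.
same_colour = cmap(norm(150)) == cmap(norm(100))
print(same_colour)
```

In other words, any gamma ray value above 100 is shown in the same colour as 100, so the colour bar range should be chosen to cover the values you want to distinguish.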

    When we run the above code, we get back the following plot.

    Basic scatter plot from matplotlib showing density porosity vs neutron porosity. Image by the author.

    If you want to find out more about choosing colour maps and why some colour maps are not suitable for everyone, then I highly recommend checking out this video here:

    The next change we will make is to remove the black box surrounding the plot. Each side of this box is called a spine. Removing the top and right sides helps make our plot cleaner and more visually appealing.

    We can remove the right and top spines by calling upon plt.gca().spines[side].set_visible(False) where side can be top, right, left or bottom.

    plt.figure(figsize=(10,8))
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r')
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar()
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)

    After removing the spines, our plot looks cleaner and has less clutter. Plus the colour bar now feels like it is part of the plot rather than appearing to be segmented.

    Matplotlib scatter plot after removing right and top spines (edges). Image by the author.
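
When several spines need hiding, the repeated call can be written as a loop. This is a small sketch using an Axes object and placeholder points rather than the well data.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter([0.1, 0.3, 0.5], [0.2, 0.25, 0.4])  # placeholder points

# Hide the chosen spines in one pass instead of repeating the call.
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)
```

The left and bottom spines are untouched, so the axes still read as a conventional x/y frame.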

    When looking at the above scatter plot, we may know what each axis represents, but how are others going to understand what this plot is about, what the colours represent and what is plotted against what?

    Adding a title and axis labels to our plot is an essential part of creating effective visualisations. These can be added with plt.xlabel, plt.ylabel and plt.title. To each of these we pass the text we want to appear and any font attributes, such as font size.

    Also, it is good practice to include the units of measurement in the label. This helps readers to understand the plot better.

    plt.figure(figsize=(10,8))
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r')
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.show()

    When we run the above code we get the following plot. Right away we know what is plotted on the axes, what the chart is about and what the colour range represents.

    Matplotlib scatter plot after adding a title and labels to the axes. Image by the author.
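
The same labels can also be set through matplotlib's object-oriented interface, which some readers may find easier to keep track of as a figure grows. The sketch below uses three placeholder points in place of the well data. The label text is taken from the example above.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))

# Placeholder values standing in for NPHI, DPHI and GR.
points = ax.scatter([0.1, 0.3, 0.5], [0.2, 0.25, 0.4],
                    c=[20, 60, 90], vmin=0, vmax=100, cmap='viridis_r')

# Axis labels (with units) and limits set in a single call.
ax.set(xlabel='Neutron Porosity (dec)', ylabel='Density Porosity (dec)',
       xlim=(0, 0.8), ylim=(0, 0.8))
ax.set_title('Density Porosity vs Neutron Porosity Scatter Plot',
             fontsize=14, fontweight='bold')
fig.colorbar(points, ax=ax, label='Gamma Ray (API)')
```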

    Depending on the purpose of the plot, we may want to add a grid so that readers can easily navigate it. This is especially important if we want to extract values quantitatively from the plot. However, there are times when grid lines are considered “junk” and are best left off, for example if you just want to show general trends within a dataset and don’t want the reader to focus too much on the raw values.

    In this example, we will add some faint gridlines so that they do not detract too much from the data. To do this we need to add in plt.grid() to our code.

    plt.figure(figsize=(10,8))
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r')
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.grid()
    plt.show()

    However, when we do this we will find that the grid lines appear on top of our plot, and it does not look visually appealing.

    Matplotlib scatter plot after adding gridlines. Image by the author.

    To plot the grid lines behind, we need to move the plt.grid() line so that it is before the call to plt.scatter() and add in the parameter for zorder. This controls the order in which components of the chart are plotted. It should be noted that these values are relative to other items on the plot.

    For the grid we want the zorder value to be less than the value we use for the scatter plot. In this example, I have set the zorder to 1 for the grid and 2 for the scatter plot.

    Additionally, we will add in a few more parameters for the grid, namely color which controls the colour of the grid lines and alpha which controls the transparency of the lines.

    plt.figure(figsize=(10,8))
    plt.grid(color='lightgray', alpha=0.5, zorder=1)
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r', zorder=2)
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.show()

    This returns a much nicer plot and the grid lines are not too distracting.

    Matplotlib scatter plot after grid lines have been moved to the back of the plot and behind the data. Image by the author.
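
An alternative to setting zorder values by hand is Axes.set_axisbelow(True), which asks matplotlib to draw ticks and grid lines beneath the other plot elements. This is a sketch of that approach with placeholder points. It is an alternative to the approach used in this article, not a replacement for it.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter([0.1, 0.3, 0.5], [0.2, 0.25, 0.4], s=200)  # placeholder points

ax.grid(color='lightgray', alpha=0.5)
ax.set_axisbelow(True)  # draw ticks and grid lines below the data
```

This keeps the grid behind the points without needing a zorder argument on each plotting call.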

    Next up is changing the size of each of the data points. At the moment, the points are relatively large and where we have a high density of data the data points can overlay each other.

    One way to counteract this is to reduce the size of the data points. This is achieved through the s parameter within the plt.scatter() function. In this example, we will set it to 5.

    plt.figure(figsize=(10,8))
    plt.grid(color='lightgray', alpha=0.5, zorder=1)
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, zorder=2, s=5, cmap='viridis_r')
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.show()

    When we run this code, we can see more of the variation within the data, and a better idea of a point’s true position.

    matplotlib scatter plot after changing point size. Image by the author.
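
Note that s is the marker area in points squared, not the diameter, so the visual width of a marker scales with the square root of s. Here is a quick sketch with two placeholder points. The value 20 is an arbitrary comparison size.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
big = ax.scatter([0.2], [0.4], s=20)   # arbitrary comparison size
small = ax.scatter([0.4], [0.4], s=5)  # size used in this tutorial

# Area ratio of 4 corresponds to a diameter ratio of 2.
diameter_ratio = (big.get_sizes()[0] / small.get_sizes()[0]) ** 0.5
print(diameter_ratio)
```

So reducing s from 20 to 5 halves the width of each point rather than quartering it.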

    When creating data visualisations there are often times when we want to draw the reader’s attention to a specific point of interest. This could include anomalous data points or key results.

    To add an annotation we can use the following line:

    plt.annotate('Text We Want to Display', xy=(x,y), xytext=(x_of_text, y_of_text))

    Where xy is the point on the chart we want to point to and xytext is the position of the text.

    If we wanted to, we could also include an arrow pointing from the text to the point on the chart. This is useful if the text annotation is further away from the point in question.
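
As a small sketch of how xy, xytext and the arrow fit together, the snippet below annotates a single location on an empty set of axes. The coordinates and the label text are placeholders.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 0.8)
ax.set_ylim(0, 0.8)

# xy is where the arrow head lands; xytext is where the text is placed.
note = ax.annotate('Point of interest', xy=(0.42, 0.17), xytext=(0.5, 0.05),
                   arrowprops=dict(arrowstyle='->', lw=2))
```

Passing arrowprops is what draws the arrow. Without it, only the text is drawn, at the xytext position.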

    To highlight a point further, or to add one where no data point exists, we can draw another scatter plot on top of the existing one, passing in a single x and y value along with the colour and marker style we want.

    plt.figure(figsize=(10,8))
    plt.grid(color='lightgray', alpha=0.5, zorder=1)
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r',
                zorder=2, s=5)
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.scatter(0.42 ,0.17, color='red', marker='o', s=100, zorder=3)
    plt.annotate('Shale Point', xy=(0.42 ,0.17), xytext=(0.5, 0.05),
    fontsize=12, fontweight='bold',
    arrowprops=dict(arrowstyle='->',lw=3), zorder=4)
    plt.show()

    When this code is run we get the following plot back. We can see that a potential shale point is highlighted on the plot by a red circle, with a clean annotation and an arrow pointing to it.

    matplotlib scatter plot of density porosity vs neutron porosity with a text annotation and arrow. Image by the author.

    Note that the point selected is just for highlighting purposes and that a more detailed interpretation would be needed to identify the true shale point within this data.

    If there is an entire area on a plot that we want to highlight, we can add a simple rectangle (or another shape) to shade that region.

    For this, we need to import Rectangle from matplotlib.patches and then call upon the following line.

    plt.gca().add_patch(Rectangle((x_position, y_position), width, height, alpha=0.2, color='yellow'))

    The x_position and y_position represent the lower left corner of the rectangle. From there the width and height are added.

    We can also add in some text indicating what that area represents:

    plt.text(x_position, y_position, s='Text You Want to Display', fontsize=12, fontweight='bold', ha='center', color='grey')

    ha sets the horizontal alignment of the text. If it is set to 'center', then x_position marks the horizontal centre of the text string. If it is set to 'left', then x_position marks the left-hand edge of the text string.
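
The sketch below shows how the rectangle's corner, width and height relate to a centred label. It uses an empty set of axes, and the label text and the variable names are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
ax.set_xlim(0, 0.8)
ax.set_ylim(0, 0.8)

# Lower left corner plus width and height define the shaded region.
x0, y0, width, height = 0.4, 0.4, 0.4, 0.4
box = Rectangle((x0, y0), width, height, alpha=0.2, color='yellow')
ax.add_patch(box)

# With ha='center', the x value is the horizontal midpoint of the text,
# so the midpoint of the box centres the label over it.
label = ax.text(x0 + width / 2, y0 + height - 0.05, 'Highlighted region',
                fontsize=12, fontweight='bold', ha='center', color='grey')
```

Computing the text position from the rectangle's own dimensions keeps the label centred if the shaded region is later moved or resized.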

    from matplotlib.patches import Rectangle

    plt.figure(figsize=(10,8))
    plt.grid(color='lightgray', alpha=0.5, zorder=1)
    plt.scatter(df['NPHI'], df['DPHI'], c=df['GR'], vmin=0, vmax=100, cmap='viridis_r',
                zorder=2, s=5)
    plt.xlim(0, 0.8)
    plt.ylim(0, 0.8)
    plt.colorbar(label='Gamma Ray (API)')
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title('Density Porosity vs Neutron Porosity Scatter Plot', fontsize=14, fontweight='bold')
    plt.xlabel('Neutron Porosity (dec)')
    plt.ylabel('Density Porosity (dec)')
    plt.scatter(0.42 ,0.17, color='red', marker='o', s=100, zorder=3)
    plt.annotate('Shale Point', xy=(0.42 ,0.17), xytext=(0.5, 0.05),
    fontsize=12, fontweight='bold',
    arrowprops=dict(arrowstyle='->',lw=3), zorder=4)
    plt.text(0.6, 0.75, s='Possible Washout Effects', fontsize=12, fontweight='bold', ha='center', color='grey')
    plt.gca().add_patch(Rectangle((0.4, 0.4), 0.4, 0.4, alpha=0.2, color='yellow'))
    plt.show()

    This code returns the following plot, where we have our area highlighted.

    matplotlib scatter plot after adding a shaded area to highlight potential impacts caused by washout. Image by the author.

    Within this short tutorial, we have seen how we can go from a basic scatter plot generated by matplotlib, to one that is much more readable and visually appealing. This shows that with a little bit of work, we can get a much better plot that we can share with others and easily get our story across.

    We have seen how to remove unnecessary clutter by removing the spines, add gridlines to help with quantitative analysis, add titles and labels to show what we are displaying, and highlight key points that we want to bring to the reader’s attention.

    Before and after enhancing a matplotlib figure. Image by the author.

    7 Simple Ways To Enhance Your Matplotlib Charts Republished from Source https://towardsdatascience.com/7-simple-ways-to-enhance-your-matplotlib-charts-a232823efed9?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed
