Python vs R for Data Science: Did You Make the Right Choice?

    Both are good for data science, but what’s the most suitable option for you?

    Image via Shutterstock under license to Frank Andrade

    Yes, both Python and R are good options for data science, but each has its pros and cons. This means that if you’re new to data science, one option might be more suitable than the other, and if you already know one of them, learning the other might still be worth it.

    With Python and R, you can accomplish most of the data science tasks you can imagine, so there’s no debate about their capabilities, but other factors can make you choose one over the other.

    One tool might be more convenient for specific tasks, might be easier to learn for some types of users than for others, might open up different job opportunities, and the list goes on.

    Learning something new is tough, so make sure you’re making the right choice. Here are some things you need to know before learning Python and/or R for data science.

    What’s your background?

    If you’re new to data science, a simple way to choose between Python and R is to consider your background. If you have years of experience coding, learning a new programming language like Python or R wouldn’t be difficult, but things change if you’ve mostly worked with tools like Excel or SPSS in the past.

    Let’s have a look at who uses Python and R and what they use them for.

    R is a programming language created by statisticians that is mainly used for statistical computing. That said, R isn’t used only by statisticians, but also by data miners, bioinformaticians, and other professionals who use it for data analysis and for developing statistical software.

    On the other hand, Python is a general-purpose language that is used not only for data science but also for building GUIs, games, websites, and more. Professionals such as software engineers, web developers, data analysts, and business analysts use Python to accomplish a wide variety of tasks.

    To sum it up, if you’ve come from Excel, SAS, or SPSS, R would probably be easier to pick up, but if you’ve been coding in other programming languages for a while and have developed a programming mindset, Python would be easier to work with and get used to.

    Which one is more popular for data science? What do employers seek in Python and R specialists?

    The popularity of a tool is an important factor to keep in mind before learning it. Believe me, you don’t want to learn something that isn’t used at all in the real world.

    A quick comparison between the keywords “python data science” (blue) and “r data science” (red) on Google Trends reveals the interest in both programming languages over the past 5 years worldwide.

    Google Trends

    Judging by search interest, Python is clearly more popular than R for data science.

    On the other hand, when it comes to data science, employers seek different things in Python and R specialists. A comparison of job postings that contain the terms “data science” and “R” (but not “Python”) with those that contain “data science” and “Python” (but not “R”) revealed the most common data science tools and techniques in each set of postings.

    In the wordcloud, we can see that job postings with the terms data science and R often include things such as “research,” “SQL,” and “statistics,” while those with the terms data science and Python include “machine learning,” “SQL,” “research,” and tools such as AWS and Spark.

    Which one offers the best tools for data science?

    The data science workflow involves things such as data collection, exploration, and visualization. Although both Python and R will get the job done, the tools and packages each one offers have their pros and cons.

    Data Collection: Both R and Python support a wide variety of formats like CSV and JSON and, in addition to that, R allows you to turn files built in Minitab or SPSS into datasets. Also, both allow you to extract data from websites in order to build your own dataset, but Python has more advanced tools like Selenium and complete frameworks like Scrapy.
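
    As a minimal sketch of the file-format side of data collection, the snippet below parses a small CSV and a small JSON payload using only Python’s standard library. The sample data and the helper names `load_csv_rows` and `load_json_records` are made up for illustration; in practice, many people would reach for a library such as Pandas for this step.

```python
import csv
import io
import json

# Hypothetical sample data, inlined so the example runs without any files.
CSV_TEXT = "name,score\nAna,90\nBen,85\n"
JSON_TEXT = '[{"name": "Ana", "score": 90}, {"name": "Ben", "score": 85}]'


def load_csv_rows(text):
    """Parse CSV text into a list of dicts, converting score to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "score": int(row["score"])} for row in reader]


def load_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


csv_rows = load_csv_rows(CSV_TEXT)
json_rows = load_json_records(JSON_TEXT)

# Print the records parsed from each format for a side-by-side look.
print(csv_rows)
print(json_rows)
```

    Note that CSV values arrive as strings and need an explicit conversion, while JSON keeps its numeric types.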

    Data Exploration: This is a step where data scientists spend a good chunk of their time, so have a look at the packages used in both R and Python. In Python, we mainly use Pandas and NumPy to explore datasets, while R has different packages built for data exploration. A picture is worth a thousand words, so check out these simple exploratory data analyses done in R and Python to see the tools used in more detail.
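
    To give a rough idea of what a first pass at data exploration looks like, here is a dependency-free sketch that computes a few summary statistics with Python’s built-in `statistics` module. The `scores` data and the `summarize` helper are hypothetical; with Pandas, a call such as `DataFrame.describe()` reports similar figures in one step.

```python
import statistics

# Hypothetical sample column of numeric values.
scores = [90, 85, 70, 95, 60]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```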

    Data Visualization: In Python, you can use the Pandas library to make basic graphs, but whenever you want to create customizable and advanced visualizations, you need to learn libraries such as Matplotlib and Seaborn. The problem is that they can be hard to learn, their syntax can be hard to remember, and the visualizations they produce aren’t always the most attractive. In contrast, data visualization is where R shines. R comes with built-in support for many standard graphs and provides advanced tools like ggplot2 that improve the quality and aesthetics of your graphs.

    So should you learn R, Python, or both?

    At this point, you probably know which is the most suitable tool for you, but let me share with you what people I know do.

    Some people choose R over Python due to its powerful statistics-oriented nature and great visualization capabilities, while others prefer Python over R due to its versatility and flexibility, which allow them not only to do powerful data science tasks but also to go beyond them.

    If you already know one, learning the other would be worth it for the different job opportunities and tools they offer.
