R vs Python – which language will rule the data science industry?

Data science is one of the most promising and lucrative careers of the 21st century. It is one of those fields where demand far outstrips supply, and it is this demand that makes data science such a good career option.

To be a good data scientist, one must master at least one analytics tool. The most prominent open-source tools currently available for data science are R and Python. Which one is superior is a matter of debate and depends on various factors. Let us dive in and see where R and Python stand today as data science tools.

R is an open-source programming language and software environment developed primarily for statistical analysis and computing. R's source code is written mainly in C, Fortran and R itself. Python, on the other hand, is a general-purpose, high-level programming language.

While R was created mainly for data analysis and statistical computation, Python was not. It is a general-purpose language with no preference for any specific domain. Over time, however, many Python libraries with powerful, highly optimized data science capabilities have been written, for example pandas, NumPy and scikit-learn. Together, these libraries bring data science to Python.
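As a small illustration of what these libraries offer, the sketch below uses pandas to compute group-wise summary statistics, the kind of task an R user might do with `aggregate()` or `tapply()`. The data set and the names `df` and `summary` are made up for illustration, and the snippet assumes pandas is installed (for example via `pip install pandas`).

```python
import pandas as pd

# Hypothetical toy data set: monthly sales for two regions
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north", "south"],
    "month": [1, 1, 2, 2, 3, 3],
    "sales": [120.0, 95.0, 130.0, 101.0, 125.0, 110.0],
})

# Mean, sample standard deviation and count of sales for each region;
# groupby sorts the group keys, so the rows come out as north, south
summary = df.groupby("region")["sales"].agg(["mean", "std", "count"])
print(summary)
```

Running it prints a small table with one row per region, which is roughly what a few lines of base R would give you as well.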

Let's look at the advantages and downsides of both languages.

1) Cost: Both languages are open source and free of charge, so neither has an advantage over the other in this respect.

2) Installation: Before doing any serious analysis, you need to set up the software. Installing R is fairly straightforward, as it comes as a single executable installer, and once it is installed there are R GUIs to work with. Many machines (mainly Mac and Linux) come with Python pre-installed. On Windows, if it is not pre-installed, the best way to get it is to download it from the official Python website (www.python.org).

3) IDEs: For R we have RStudio, a full-fledged and mature IDE (Integrated Development Environment) with many features that make R code easier to write and maintain. The closest Python IDE to RStudio is Rodeo from Yhat. It is still in active development, but every new release adds a lot of features.

4) Packages and libraries: R has a huge number of packages available on CRAN. These packages provide tools and data structures that make implementing data science tasks a breeze. Python, on the other hand, has relatively fewer packages dedicated to statistics and data science, but it is quickly catching up with R in this respect.

5) Speed: R has a reputation for being slow, and to some extent the reputation is deserved. This is mainly because R's developers have prioritized getting the job done with minimal effort over optimizing the code, which is how R earned its "slow" tag. The good news is that these lags are gradually being addressed, and there are efforts to make R faster (for example, Microsoft R Open). Python, on the other hand, does not suffer from this problem to the same degree; it is reasonably fast, particularly when the heavy lifting is done by optimized libraries such as NumPy.

6) Getting help: Being able to get help on a particular topic is one of the most important things when programming in any language. Both R and Python have online forums, Stack Overflow tags and Facebook groups for discussing problems. However, the community around R is more vibrant and active than Python's. R also has the added advantage that, if you search for an R-related problem, it is highly likely that someone has faced the same problem before and the answer is readily available. This is not entirely true for Python. Python is relatively new to data science, and it will take some time to reach the same position as R.

7) Demand in jobs: Demand for data scientists with programming skills is very high. By comparison, demand for R programmers in the data science field is higher than for Python programmers, and the average salary of R programmers is also higher than that of Python programmers.

It is really difficult to declare one language the winner, because each has its own advantages and disadvantages. New algorithms tend to be released in R faster than in Python, so here R has the edge. On the other hand, Python solves the "two-language problem". After analyzing the data, the algorithms often need to be integrated into a server. R is not well suited to this, so we have to turn to another language such as Java. With Python this is not the case: we can do both the analysis and the integration in the same language. Here Python has the upper hand.
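To make the "same language" point concrete, here is a minimal sketch, using only Python's standard library, in which the analysis step (fitting a straight line by ordinary least squares) and the integration step (serving predictions over HTTP) live in one file. The toy data, the `fit_line`, `predict` and `serve` helpers, the `PredictHandler` class and port 8000 are all hypothetical choices for illustration; a real deployment would more likely use a proper web framework and a model from a library such as scikit-learn.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: analysis on hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
SLOPE, INTERCEPT = fit_line(x, y)


def predict(value):
    """Apply the fitted line to a new input."""
    return SLOPE * value + INTERCEPT


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests whose JSON body looks like {"x": 6}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            value = float(payload["x"])
        except (KeyError, TypeError, ValueError):
            self.send_error(400, 'expected a JSON body like {"x": 6}')
            return
        body = json.dumps({"prediction": predict(value)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Step 2: integration. Serve the model over HTTP (blocks until interrupted)."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a request such as `curl -X POST -d '{"x": 6}' http://localhost:8000/` should then return a JSON object with a `prediction` field. The HTTP plumbing is deliberately bare; what matters is that the model and the service share one language and one process.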

In conclusion, it is always an advantage to have both languages in your toolkit, so that you can tackle different programming problems with the most appropriate language.

What do you think about this? Please share your opinion in the comments section below.
