Neeraj Julka, the author of this article, is a senior strategy and consulting professional with 14 years of experience, currently working in the Market Research domain. He holds an MBA from Schulich School of Business and is a graduate of the Executive Program in Business Analytics (EPBA) by MISB Bocconi and Jigsaw Academy.
While this is an oft-documented and oft-discussed topic, I could not resist adding my two cents, given the steep learning curve and the time and effort it takes to learn R and Python. Multiple sources, including KDnuggets, forums (Stack Exchange, Kaggle) and numerous blogs and articles, publish statistics on user numbers and map the pros and cons of both. While most give R an edge in usability, visualization, flexibility and, well, salaries, Python definitely has bigger advantages from both a learning and a deployment perspective.
- Python is fast catching up in the US: There are numerous PyCons, an evolving support community and a lot of learners from myriad backgrounds. A bulk of them come from the IT sector, where prior programming knowledge makes it easier to adapt to Python's syntax. Since the US is the largest and fastest-growing market for analytics, trends in the US definitely matter.
- Supplementary learning resources use Python: For those learning machine learning (ML) algorithms, online platforms such as Udemy, Udacity and Coursera offer excellent additional material. Supplementary learning is essential to deepen knowledge in ML: though ML is not rocket science, it is complex and has quite a few facets, and sticking to one coach may not help develop an all-round perspective or an understanding of use cases. Quite a few of these resources use Python in addition to R, so knowing Python makes it easier to follow the material and solutions.
- The end use matters: The objective, at the end of the day, is to derive enough information to gain insight into the data or the challenge at hand. If visualization is the end goal, that is a rapidly evolving field which already has many out-of-the-box, point-and-click solutions. If the goal is to integrate algorithms into production environments, Python is handier than R.
- Programming knowledge always benefits: Machine learning and data analytics are moving to real-time deployment across big data platforms. Those from non-programming backgrounds will invariably have to develop programming skills to stay relevant in such scenarios. Python, being a high-level programming language, offers the benefit of learning a real programming language, and once you get the hang of one language, adapting to and learning others becomes significantly easier.
- Easily configurable, reusable syntax: Libraries like scikit-learn are extremely well documented, with their own tutorials and sample code that can be used as is. To try a different algorithm, at times all it takes is changing the name of the model and voila, you have built a separate one. The flow for all models is simple: split the data into train and test sets, build the model and validate it. Most of this is reusable code, all within the one scikit-learn package. R, on the other hand, has different packages for different algorithms, each with its own syntax that is not standardised.
- You get to know where you’ve erred: Errors in R take considerable effort to research, and resolving them can be time consuming and frustrating. In the initial learning phase, a simple task or assignment could take a couple of days to complete. No matter how much one practices, code is not something one remembers; it is impossible to memorise a few hundred lines of it. What one develops instead is the skill to navigate, use and understand the nuances of syntax and to deploy the program. In the initial phase, and perhaps even later at times, R can be excruciating because its error messages are not specific or helpful. Python, with its specific error messages, immediately comes to the rescue.
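To make the point about reusable syntax concrete, here is a minimal sketch of the split, build and validate flow in scikit-learn. The synthetic dataset, the three models chosen and the `fit_and_score` helper are assumptions made for illustration, not something taken from a real project; the point is only that swapping algorithms amounts to changing the model's name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 1: break the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_score(model):
    """Hypothetical helper: fit on the train set, return accuracy on the test set."""
    model.fit(X_train, y_train)                            # Step 2: build the model.
    return accuracy_score(y_test, model.predict(X_test))   # Step 3: validate it.


# Trying a different algorithm only means changing the model's name;
# the fit / predict calls stay exactly the same.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: fit_and_score(model) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Adding another algorithm would, under these assumptions, mean adding one more entry to the `models` dictionary; the rest of the flow is untouched.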
Having used R, SAS and Python for analysis in cases related to retail, statistical analysis and market size estimation, I have had a better experience with Python. Owing to that experience, coupled with the reasons enumerated above, I feel that Python has an edge over other tools, especially for machine learning algorithms. Though R is a powerhouse in its own right and has its own merits, those seeking a career in analytics may do well to add Python to their quiver of skills.