How to Become a Data Scientist: The Full Scoop
In today’s world, where data is still “sexy” and data skills remain in high demand, the true role of a Data Scientist can be ambiguous and is often confused with that of a Data Analyst or Data Engineer. Many people interested in a career in Data Science do not fully understand what the role entails or which skills they need to excel at it. They are also unsure whether they have the aptitude to build a career as a Data Scientist in the first place. This article addresses these and many other questions about a career in Data Science. Let’s get started!
A Data Scientist is…
To begin with, let’s understand the term ‘Data Scientist’. A search for its definition turns up several interesting candidates, but perhaps the most honest one is this:
“A Data Scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
If we broaden this definition, we can say that a Data Scientist is someone who:
- knows and can apply statistics and mathematics to a set of data
- can test hypotheses with experiments they design
- has enough programming knowledge to apply to the sourcing, processing, and storing of their data
- can communicate their findings through data visualizations and stories
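To make “test hypotheses with experiments they design” concrete, here is a minimal sketch in Python of a two-sample permutation test. This is one common way to check whether a difference between two experimental groups could plausibly be due to chance. The function name `permutation_test` and the group values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value). The p-value is the fraction of
    random relabelings whose absolute mean difference is at least as large
    as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical experiment: minutes spent on a page under two designs
    control = [12.1, 10.4, 11.8, 9.9, 12.5, 10.7, 11.2, 10.1]
    variant = [13.4, 12.9, 14.1, 12.2, 13.8, 12.7, 13.1, 14.0]
    diff, p = permutation_test(control, variant)
    print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

A small p-value suggests the difference is unlikely to be produced by random assignment alone. Designing the experiment well (randomization, adequate sample size) matters as much as running the test.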
An Overview of the Industry: Today and Tomorrow
According to a report by the International Data Corporation (IDC), the Big Data and Business Analytics market will grow to $203 billion by 2020. Banking is expected to be a major driver of this increase in spending, while IT and business services will account for most of the technology investment. Other industries, such as telecommunications, insurance, transportation, and utilities, will also increase their spending, further spurring growth.
IDC also estimates that the worldwide IoT market will grow to $1.7 trillion in 2020. Devices, connectivity, and IT services will likely make up two-thirds of that market in 2020, with devices (modules/sensors) alone representing more than 30 percent of the total.
As for the talent gap between supply and demand, IDC predicts that by 2018 the US alone will need 181,000 people with deep analytical skills, plus five times that number of positions requiring data management and interpretation capabilities.
How Much Does a Data Scientist Get Paid?
Data Scientist has been called the sexiest job of the 21st century, and it is also a high-paying one, both in India and worldwide. In fact, it is currently one of the top-paying jobs out there!
On average, a Data Scientist proficient in R earns Rs. 10.40 LPA (lakhs per annum), compared with Rs. 10.12 LPA for Python and Rs. 9.54 LPA for SAS. The best pay goes to those who can work with all three tools, who take home a cool Rs. 12.91 LPA.
Across all experience levels, data scientists earn more than the average in other domains. Pay starts at Rs. 6.4 LPA and exceeds Rs. 30 LPA at the top end of the spectrum.
For more on salaries and trends in the Analytics industry, see the Analytics Industry Report 2017 – Salaries & Trends.
All in all, very encouraging news for data scientists!
Successful Data Scientists Have These Traits
Before we discuss the technical and business skills you need to develop as a Data Scientist, let’s first look at some innate traits that the majority of successful Data Scientists share. If you are considering a career in Data Science and can answer yes to many of the questions below, you are good to go. If not, don’t despair. You can still embark on the Data Science journey, but you will have to go the extra mile and commit to developing these traits as far as you can. Ask yourself:
- Do you have an inherent aptitude for numbers?
- Are you a curious, persistent person?
- Do you have a knack for problem-solving and a quick, analytical mind?
- Are you adept at the art of visual storytelling?
- Are you an effective communicator? Can you communicate the story the data tells creatively and persuasively?
- Are you business-savvy (a good leader, a great organizer, and an amazing planner)?
- Are you a team player?
Basic Educational Background
Though a data scientist can technically come from any educational stream, there is a clear preference for those with a degree in science, statistics, or mathematics. If you are looking at a long career in technology, a bachelor’s degree in a computing-related field is worth it. These graduates have a definite advantage, because a good part of Data Science is about numbers and programming, and a solid foundation in computer science, math, modeling, and statistics makes the journey easier. That said, many people from a wide range of other disciplines have gone on to become successful Data Scientists.
Let’s now turn to the industry-relevant skills and tools that a Data Scientist needs to develop to succeed in the analytics industry today.
1. An integrated analytical skill set (statistics, algorithms, machine learning, and mathematics):
It is essential for a Data Scientist to have expertise in diverse analytical tools, because many analytics companies today use a combination of popular data analytics tools and technologies. At the very core, a data scientist needs to understand numbers and use analytics tools to piece data together and discover potential patterns and correlations through statistics. The mandate is clear: if you can’t use the tools, you can’t analyze the data. It is therefore vital that a data scientist understands correlation, multivariate regression, and other statistical aspects of modeling in order to use those tools effectively.
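To show what correlation and multivariate regression actually compute, here is a plain-Python sketch. The helper names (`pearson_r`, `solve`, `fit_ols`) are hypothetical. In practice you would use a statistical package such as R’s `lm()` rather than hand-rolled linear algebra. This sketch solves the textbook normal equations, while production libraries typically use a QR decomposition for better numerical stability.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix, so row operations are applied to b as well
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features, target):
    """Multivariate least squares; returns [intercept, b1, b2, ...].

    Solves the normal equations (X^T X) beta = X^T y, where X has a leading
    column of ones for the intercept.
    """
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, target)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical data generated from y = 1 + 2*x1 + 3*x2
    features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
    target = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]
    print("r(x1, y) =", round(pearson_r([f[0] for f in features], target), 3))
    print("coefficients:", [round(c, 6) for c in fit_ols(features, target)])
```

Because the sample data were generated without noise, the fitted coefficients recover the intercept 1 and the slopes 2 and 3. With real data, the residuals and the correlation between predictors are where the statistical judgment comes in.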
Though SAS is still one of the more popular data languages, data scientists today increasingly work on projects that use multiple tools. Recruiters therefore look for people with expertise in a combination of tools, such as SAS, R, and Hadoop. R is one of the great success stories of open-source software: it is free and can do most of what SAS can do. Hadoop is an open-source framework that spreads data over large clusters of commodity servers and processes it in parallel. Used together, R and Hadoop let organizations derive useful insights from their data more easily and economically.
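Hadoop’s original processing engine, MapReduce, splits work into a map step run in parallel on each block of data, a shuffle that groups intermediate results by key, and a reduce step that combines each group. The following single-machine Python sketch of the classic word-count example illustrates that pattern. It is not Hadoop’s API, and the function names are hypothetical. The threads only imitate the cluster’s parallel map step; they would not give a real speedup for CPU-bound work in CPython.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group the emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk stands in for a block of data stored on a separate server;
    # the thread pool imitates mappers running side by side on a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    chunks = [
        ["R and Hadoop", "Hadoop spreads data"],
        ["data processed in parallel", "R analyzes data"],
    ]
    print(word_count(chunks))
```

On a real cluster, the framework handles distributing the chunks, moving intermediate pairs between machines during the shuffle, and recovering from failed nodes. Those concerns are exactly what this sketch leaves out.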
2. Programming expertise:
A data scientist needs some level of programming expertise. Even if you don’t have a computer science degree, you need to be comfortable designing and writing programs in a variety of languages, such as Java, Python, C++, or C#. You need to be able to choose the right software packages or modules to run, modify them, or even design and develop new computational techniques to solve business problems (e.g., machine learning, natural language processing, graph/social network analysis, neural nets, and simulation modeling).
3. Visualization skills:
One of the core functions of a Data Scientist is to explore data visually and then communicate findings and insights using engaging, innovative visualization tools. As a Data Scientist, your main objective is to bring insights to management so that they can make better business decisions. Excellent data mining and modeling tools are of little use if the results of an analysis are poorly visualized. It is therefore essential that data scientists are adept at visual storytelling and can communicate the stories their data tell creatively and persuasively.
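Good visual storytelling starts with choosing a chart that makes the key comparison obvious. Normally you would reach for a plotting library such as matplotlib or R’s ggplot2. To keep this sketch dependency-free, it instead draws a sorted text bar chart of the average salaries quoted earlier in this article. The function name `text_bar_chart` is hypothetical.

```python
def text_bar_chart(data, width=40, unit=""):
    """Render a horizontal bar chart as text, scaling bars to the largest value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    # Sort descending so the biggest value leads and the ranking reads top to bottom
    for label, value in sorted(data.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}{unit}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Average salaries from the salary section of this article, in Rs. LPA
    salaries = {"R": 10.40, "Python": 10.12, "SAS": 9.54, "R + Python + SAS": 12.91}
    print(text_bar_chart(salaries, unit=" LPA"))
```

Sorting the bars and scaling them to a common maximum are the same design choices you would make in a real plotting tool. The chart makes the premium for knowing all three tools visible at a glance.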
Beyond these core skills, it is also useful to:
- Get Exposure to Large Data Sets: You will often need to work with extremely large data sets, so it pays to gain experience handling large volumes of data and, ideally, applying some data mining algorithms to them.
- Sharpen Your Business Skills: In the world of data analytics, business skills like negotiation, persuasion, creativity, and leadership are important. You need to feel the pulse of the business, understand its terminology, and have good organizational and communication skills to drive and influence change. Amid all the sorting, mining, and visualizing of data, planning and organizing skills will go a long way.
- Become an Adept Problem Solver: Problem solving lies at the core of everything a Data Scientist does. Approaching a problem with the right mindset and then breaking it down analytically leads to better solutions. Though problem-solving ability is largely intrinsic, you can improve and even master it through practice.
- Keep Abreast of Innovations and News in the Analytics Industry: It is also essential that you develop a keen understanding of the data analytics industry and keep abreast of the latest advancements in the field. Take the time to engage and connect with the data analytics community. Subscribe to journals, download free eBooks, and follow blogs by analytics experts. Take advantage of the wealth of free, high-quality information out there, so that when you are ready to apply for data science jobs, you are well prepared and truly an analytics expert in your own right.
- Keep Learning Constantly: In an increasingly data-driven, volatile, and highly competitive business environment, you need to stay one step ahead. Data Scientists who keep pace with the evolution of data technology will be rewarded, while those who do not will struggle. New, faster, often open-source technologies and tools for data analytics appear all the time, and the key is to keep up with them and keep learning.
- Embrace Diversity: Explore publicly available data sets that may be relevant to your domain. Take part in Kaggle competitions or dive deep into KDnuggets in your free time. That time is well spent, and it certainly adds value to your CV, especially if your favorite pastime is playing with numbers.
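In practice, working with large data sets often means processing data that does not fit in memory. The following sketch computes the count, mean, and sample standard deviation of a column of numbers in a single pass using Welford’s online algorithm. It reads one line at a time, so memory use stays constant however large the input is. The function name `streaming_stats` and the file name `values.txt` are hypothetical.

```python
import io
import math


def streaming_stats(lines):
    """Compute count, mean, and sample standard deviation in one pass.

    Uses Welford's online algorithm, so memory use stays constant no matter
    how many values the input yields (e.g. lines read lazily from a file).
    """
    count, mean, m2 = 0, 0.0, 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x = float(line)
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # note: uses the updated mean
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


if __name__ == "__main__":
    # io.StringIO stands in for open("values.txt") (a hypothetical file);
    # a real file object would also be read line by line, not all at once.
    fake_file = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
    print(streaming_stats(fake_file))
```

The same one-pass idea underlies many tools for big data. Statistics are accumulated incrementally (and, on a cluster, merged across machines) instead of loading everything into memory first.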
Making the Switch
Let’s now get down to the practicalities. You have done your due diligence, and you are ready to formally set off down the Data Science path. Here is what you need to do:
- Start getting exposure to data by adding simple data-related tasks to whatever work you currently do. It doesn’t matter which industry you are in, or whether you work in sales, marketing, or finance; just start playing with whatever data you have at hand.
- Go online and make use of the numerous free resources available. Join data analytics groups on LinkedIn and communities such as Data Science Central and AnalyticBridge. There are also many free courses on Udemy and Coursera that will give you a great head start. And read as many books about Data Science as you can. (A list of books we recommend appears at the end of this article.)
- Once you have exhausted the available online resources, start researching certifications and programs in Data Science. Make a wish list of what you want from the curriculum and match it against what is on offer. Give more weight to programs that have strong partnerships with companies in the data industry and that give you the opportunity to practice on real data sets.
- Finally, find a mentor, ideally someone who has made a similar transition and is well embedded in the industry. There is nothing like a friend to steer you in the right direction!
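Playing with the data you already have can start as small as totaling a spreadsheet column by category. The following Python sketch does a basic group-by on a CSV export using only the standard library. The column names, the sample rows, and the file name `sales.csv` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def total_by(rows, key_field, value_field):
    """Sum a numeric column grouped by a category column (a basic group-by)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical export from a sales spreadsheet; with a real file you would
    # use open("sales.csv", newline="") instead of io.StringIO.
    sample = io.StringIO(
        "region,amount\n"
        "North,1200\n"
        "South,800\n"
        "North,450\n"
        "East,300\n"
    )
    totals = total_by(csv.DictReader(sample), "region", "amount")
    for region, amount in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{region}: {amount:.0f}")
```

Once a summary like this becomes routine, the natural next steps are the tools discussed above: pandas or R data frames, then statistics and visualization on top.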
How to Prepare for an Analytics Interview
Now that you are all skilled up and ready to take on the world of analytics, here are some interview tips from Priti Sawant, a staffing expert.
Must-Read Books
- The Data Science Starter Kit
These books give you the tools you need to get started with data, from basic statistics to machine learning and new ways of thinking about visualization. If you are already experienced with data, the Starter Kit will push you further. The package includes 13 titles on R, data analysis, Python, machine learning, and visualization. You could also buy individual books, depending on your needs:
- Doing Data Science: Straight Talk from the Frontline by Cathy O’Neil and Rachel Schutt
- Data Science for Business by Foster Provost and Tom Fawcett
- R Cookbook by Paul Teetor
- Machine Learning for Hackers by Drew Conway and John Myles White
- R Graphics Cookbook by Winston Chang
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
- Agile Data Science: Building Data Analytics Applications with Hadoop by Russell Jurney
- Bad Data Handbook by Q. Ethan McCallum
- Data Analysis with Open Source Tools by Philipp K. Janert
- Mining the Social Web, 2nd Edition by Matthew A. Russell
- R in a Nutshell, 2nd Edition by Joseph Adler
- Interactive Data Visualization for the Web by Scott Murray
- Feedback Control for Computer Systems by Philipp K. Janert
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel
Eric Siegel, one of the most influential figures in this domain, explores the power and perils of prediction in this entertaining book, drawing on case studies from around the globe. Written for a general audience, it explains the basics of predictive modeling in layman’s terms.
Perfect for new data scientists, Predictive Analytics offers tangible, easy-to-understand insights into the complex world of data analysis. Read it to find out how institutions increasingly predict human behavior: whether you are going to click, buy, lie, or die, as the title suggests. The book also explains the “why” and the “how” of behavior prediction, highlighting the many ways in which predictive analytics can improve healthcare, fight crime, and boost sales through the careful analysis of big data.
- The Signal and the Noise: The Art and Science of Prediction by Nate Silver
Political forecaster Nate Silver won wide acclaim for correctly predicting the result in every single state in the 2012 US presidential election. In this book, he reveals how we can develop better foresight in an uncertain world. From the stock market to the poker table, and from earthquakes to the economy, he takes us on an enthralling insider’s tour of the high-stakes world of forecasting. He shows how we can use information more intelligently amid a flood of noisy data and make better predictions in our own lives. Without sound methods, the sheer abundance of data can lead predictions astray, especially given the limits of human cognition. Read ‘The Signal and the Noise’ to find out how forecasters overcome bias and unpredictability to uncover accurate, meaningful predictions in a vast sea of noisy data.
Image attribution: Background vector created by Kraphix – Freepik.com (http://www.freepik.com/free-photos-vectors/background)