The Definitive Guide to Becoming a Data Scientist
Data is at the heart of everything we do. Technology and science have disrupted traditional businesses with such force and velocity that it’s nearly impossible for a business to thrive without data today. This sounds like a recent development, but we have been collecting and storing data for a long time. It’s only recently that we have focused on analyzing this data at a large scale for business purposes.
There are a lot of muddled definitions of data science, data analytics and business analytics floating around today. In fact, every expert in the market has her own definition of these terms – adding to the confusion for the average person. However, we did our bit to explain the difference in this article.
For a quick, working definition, we can loosely say that Data Science is the domain or industry focused on working with all aspects of data – from data collection to analysis and reporting. Data analytics is the name of the skill (and art form) of working with data to draw insights. Business Analytics is the industry term for applying the tenets of Data Science inside a business setting – as opposed to an academic environment.
Both data analytics and business analytics involve similar workflows: Framing research questions, defining and conducting experiments, collecting data, analyzing the data with tools and business domain know-how, interpreting the results and communicating the results to stakeholders – along with recommendations for future action.
The data analytics workflow and the business analytics workflow are similar, but they differ in the distribution and nature of skills. In fact, in the realm of data science, there are many different skill sets that you need to acquire. However, these skills vary greatly depending on the environment and portion of the workflow you are focusing on.
What is Business Analytics
Many companies in many industries are devoting large budgets to developing analytics capabilities. Because it spans multiple industries, business analytics has many different avatars. For an e-commerce company like Amazon, it could refer to the detection of fraudulent transactions. For an online magazine, it could be the analysis of traffic. For a company’s HR department, it could refer to the analysis of employee behavior, while in retail, business analytics could be focused on improving sales revenues or optimizing the supply chain. As you can see, the definition of business analytics changes according to the industry vertical and application, but the process remains the same, and so does the objective: using statistical and numerical analysis to produce business insights that aid decision making.
State of analytics
Technology is at the heart of all the developments that allow us to harness the power of data analytics today. Technological advancements have increased the number of devices and systems that can generate data. For example, we generate data from our interactions with social media, the internet, travel, financial transactions, wearable devices and the Internet of Things – and that’s just the tip of the iceberg.
Luckily, similar technological advancements have given us the tools to handle and analyze the data we generate. With ever-increasing processing power and the dropping cost of storage, companies and individuals have many more affordable hardware options for doing business analytics. In fact, real-time data processing and stream analytics are now efficiently possible with the help of today’s supercomputers – giving us the power to perform thousands of calculations on massive datasets that are terabytes or more in size. Similarly, developments in data analytics software have made analysis faster, cheaper and more efficient.
Inevitably, the convergence of data generation and data analysis at never-before-seen levels points to one simple fact: data analytics is the future.
Given the pace at which these two complementary developments are growing, now’s the perfect opportunity for you to ride the data analytics wave. As we wrote earlier in our article titled “What is Analytics?”:
“Harvard Business Review recently noted, ‘data scientist’ is the sexiest job of the 21st century. Companies need trained data scientists who are well versed in analytics. Professionals with diverse analytics skills are in high demand. If you can work with data and communicate with stakeholders, you can fast track your success. All you need is the right training and a foot in the door.”
Though the terms data analytics and business analytics differ in application, we have already seen that the workflows they use are similar. As a data analyst, your responsibilities could span the entire workflow – or be limited to one small section, with other teammates working on other aspects of the workflow. Usually, the tasks involve data collection, data cleaning and organizing, data analysis, interpretation of results and communicating findings and recommendations. At a higher level, we can break down the data analytics workflow into three stages: Planning & Data Preparation, Data Analysis, and Insights & Reporting.
Stage 1: Planning & Data Preparation
Every data analytics project starts with asking the right questions. To attempt an answer to these questions, data analysts collect data and prepare it for analysis. They collect data from different sources and verticals, looking at how one variable changes in relation to another. An e-commerce analyst would collect information about the impact of offers on web traffic, while a retail firm’s analyst would look at how sales and footfall change according to the day of the week or the hour of the day.
After collection, the data is cleaned to eliminate characteristics that could skew the interpretation. Incomplete records and outlier data are also handled according to predefined approaches. Finally, the integrity of the data set is also maintained – ensuring that irrespective of the situation the data is used in, the results are accurate and representative of the sample.
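As a rough illustration of the cleaning step described above, the sketch below drops incomplete records and then filters outliers with the commonly used 1.5 × IQR rule. The footfall figures, the `clean_records` helper and the choice of the IQR rule are all assumptions made for this example; a real project would follow whatever predefined approach the team has agreed on.

```python
from statistics import quantiles

# Hypothetical hourly footfall counts; None marks an incomplete record.
raw_footfall = [120, 135, None, 128, 131, 950, 126, None, 133]


def clean_records(values, k=1.5):
    """Drop missing values, then drop points outside the k * IQR fences."""
    complete = [v for v in values if v is not None]
    if len(complete) < 4:
        # Too few points to estimate quartiles meaningfully; skip the outlier step.
        return complete
    q1, _, q3 = quantiles(complete, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in complete if low <= v <= high]


cleaned = clean_records(raw_footfall)
print(cleaned)
```

Whether an extreme value such as 950 is an error or a genuine spike (a sale day, for instance) is a judgment call that depends on the business context, so it is worth logging what was removed rather than discarding it silently.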
Stage 2: Data Analysis
After preparing the data, analysts slice and dice it to tease out hidden relationships and meaning. They use tools such as R, SAS and Python, among many others, and apply statistical methods and numerical analysis to arrive at findings. For example, an analyst could identify the correlation between website visitors to the company blog and the corresponding sales in the month, or the impact of offering free shipping on a product’s return policy.
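To make the blog-visitors example concrete, here is a minimal sketch that computes the Pearson correlation coefficient between two series. The monthly figures and the `pearson` helper are hypothetical and written only for illustration; in practice an analyst would more likely reach for a statistics library than hand-roll the formula.

```python
from math import sqrt

# Hypothetical monthly figures: blog visitors and units sold in the same month.
blog_visitors = [1200, 1500, 1100, 1800, 2100, 1700]
monthly_sales = [30, 38, 29, 45, 52, 41]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


r = pearson(blog_visitors, monthly_sales)
print(f"correlation between visitors and sales: {r:.2f}")
```

A value close to 1 suggests the two series move together, but it says nothing about which one drives the other; correlation alone is not evidence of causation.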
Data analysis generally comes in four flavors: Descriptive, Historical, Prescriptive and Predictive analytics. Descriptive analytics involves identifying key attributes of a data set. In simpler terms, analysts use statistics and data manipulation to summarize and explain the data. Historical data analytics looks at past data to figure out what happened – and, if possible, why it happened. A close follow-up to this is prescriptive analytics, in which analysts look at the data and recommend a course of action. Finally, with predictive analytics, data analysts attempt to forecast the future as accurately as possible. Depending on the needs of the business, data analysts may apply one or all of these flavors.
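Two of these flavors are easy to sketch in a few lines. The example below summarizes a data set (descriptive) and extrapolates a straight-line trend one period ahead (predictive). The sales figures and the helper names `describe`, `fit_trend` and `forecast_next` are assumptions made for illustration, and a least-squares line is only one of many possible forecasting methods.

```python
from statistics import mean, median, stdev

# Hypothetical monthly sales figures, oldest first.
monthly_sales = [30, 38, 29, 45, 52, 41]


def describe(values):
    """Descriptive analytics: summarize the key attributes of a data set."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_next(values):
    """Predictive analytics: extrapolate the fitted trend one period ahead."""
    intercept, slope = fit_trend(values)
    return intercept + slope * len(values)


print(describe(monthly_sales))
print(f"naive forecast for next month: {forecast_next(monthly_sales):.1f}")
```

A straight-line trend ignores seasonality and one-off events, so a forecast like this is best treated as a baseline to compare better models against.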
Stage 3: Insights & Reporting
Once the data analysis is complete, the data analyst needs to translate the findings into business insights. This process requires an understanding of the business context of the data analytics project. These insights are usually packaged in the form of a report with succinct text and accurate, meaningful visualizations. This stage may involve collaboration with colleagues and external vendors to package the data in a way that is easy to consume, so that the insights are easily understood.
Getting Started: The Three-Step Process
The key criteria for becoming a data scientist are the ability to think logically, a love for numbers and the willingness to learn. The broad steps that you need to take to become a practicing data scientist are:
- Learn technical skills
- Learn business skills
- Practice your skills
All data analysis requires the use of powerful technical tools. To get the maximum return on your effort, you need to pick up the skills required to harness these tools. A strong foundational knowledge of Microsoft Excel, R, SAS, Python and other data analytics tools will equip you for the challenges you will face in your career. You will also need to pick up statistical and numerical analysis techniques that will enable you to analyze any data set you encounter.
Once you have learned the basics of data analysis and are confident about your skills, you need to learn how to apply those skills in a business context. In any data analytics project, gleaning insights is only half the work. The rest is about communicating the discovered insights in a meaningful way that influences business. This process also involves cooperating and coordinating with a number of people from different backgrounds, giving business context and relaying the importance of the outcomes.
Finally, practice makes perfect. To gain a foothold in the industry, you need to show that you have a practical understanding of the field, and a demonstration of the skills you have gained goes a long way in convincing recruiters. You can gain practice in a number of ways: internships, contests, competitions, community projects, personal projects and, of course, learning on the job. In fact, in many places, the easiest way to gain experience is to apply data analytics to your current role and demonstrate value, which adds a very valuable item to your resume and skill set.
Earlier, we had shared some thoughts on what it takes to become a data scientist. Here’s an excerpt from the article:
- Love of data: If you don’t enjoy working with numbers and data sets, Data Science may not be the right career path for you.
- Curiosity: Curiosity may kill the cat, but it’s the single most defining trait you should cultivate as a data scientist. Being able to make logical deductions by asking the right questions and analysing the relevant data is an invaluable asset.
- Skepticism: Like the hosts of ‘MythBusters’, it’s good to be a skeptic. This trait, as we have discovered, helps you test your assumptions, override your biases and consequently, become a better data scientist.
- Numeracy: There is a curious phenomenon that we have noticed – some people are almost ‘proud’ to say that they cannot multiply large numbers or perform simple arithmetic in their head. But, to be a good data scientist, you need to be ‘numerically literate’. This doesn’t mean being a math genius or a stats geek. What it requires is a comfort with numbers, and an enthusiasm for data.
- Vision: It’s quite easy to miss the forest for the trees. When dealing with numbers and datasets, you need to be able to pull back and see how your work fits in the whole system. You need to know how to ask the right questions and find the best answers.
- Business acumen: Most data analysts will work either in academia or corporate environments. Business acumen, when applied to data science, adds a dimension of relevance and precision when communicating with ‘non-data scientists’.
- Willingness to practice: Getting good at anything takes time. Malcolm Gladwell proposed that it may take 10,000 hours of practice to reach ‘expert status’. Timothy Ferriss and James Clear argue that you need ‘active practice’, i.e. your practice time should be spent on the things you’re bad at in order to grow. This requires ironclad will and determination.
- Creativity & visualization: Getting insights from data is only one part of the solution. Presenting these insights in easily understandable ways adds a lot of merit and ‘punch’ to your toolbox.
- Communication skills: Paraphrasing Seth Godin, “Ideas are a virus. They spread when you share.” For you to share your insights and help them spread, you need to be able to communicate effectively in print, vocally and on stage.
So, these are the things you need to know to get into Data Science. I hope this guide was comprehensive and an eye-opener of sorts. If you feel analytics is the way forward for your career, I highly recommend you get started with Data Science today. The time is right and the opportunities are ripe. Go for it!