With all that’s been going on at the moment and how precarious the job market is, you may be wondering how to secure your future and move up the corporate ladder. Things may look a little glum, with the unemployment rate rising to 7.78% in February, the highest since October 2019. The important thing to remember is that following the herd will not lead you anywhere; the herd mentality is poisonous to success. With colleges churning out graduates by the thousands, the actual employability of those graduates is a concern. A survey conducted for the Salary Study 2020 found that only 46% of students were employable. Clearly, classroom training without real-world exposure isn’t worth much. With salary hikes in 2020 not expected to ring any celebration bells, the outlook doesn’t look very promising. With such grim statistics staring us in the face, it’s easy to close our minds to the very real opportunities out there.

History has always seen some domains hold more importance than others at various points in time. Our job is to figure out which jobs have the most scope and build our future around them. The best and easiest way to do that is to upskill and work in fast-growing domains. With stiff competition, getting noticed is the key to success.

Technology has driven innovation and we are more dependent on it than ever before. India has welcomed technology with open arms, and initiatives like ‘Digital India’ will only encourage the integration of technology into all spheres of life. India had over 480 million Internet users in 2018, and the number is consistently increasing. With so much data being generated, organizations are recognizing the importance of drawing insights from data and using them to their advantage. How do you think Amazon or Myntra recommend items similar to your past shopping sprees? It’s done by analyzing the data collected from customers over a period of time.

Because of the growing importance of data, the demand for data analysts is at an all-time high, with 1.5 lakh new job openings expected for Data Science professionals in 2020. The Salary Study reports that Analytics is one of the hottest domains, with 97,000 unfilled openings in 2019 and a median salary of 14.4 lakhs for analytics practitioners.

Analytics professionals draw the highest salaries of all IT professionals. The growth of a domain is evidenced by the salaries its professionals earn, and Analytics is a burgeoning field.

Emerging technologies like Cybersecurity, Cloud Computing, and Artificial Intelligence (AI) are also experiencing a dearth of qualified professionals. These are fast-growing industries, and the number of job openings will only increase. The best way to stand out from your peers is to add to your skillset; in today’s world, more emphasis is placed on skills and experience.

As mentioned earlier, the herd mentality is to be disposed of if one wishes to progress in their career. Analyzing the facts and drawing your own conclusions is the only way to make the right decisions. The **Analytics India Salary Study 2020** by Analytics India Magazine & Jigsaw Academy was prepared to guide people intent on progressing in their careers. This comprehensive guide to the current state of the Analytics and Data Science domains, with an emphasis on salary trends, will give you all the information needed to make an informed decision about the best career course. Do your research and base your decisions on numbers, not hearsay.

The post Know how to secure your future with the ‘Analytics India Salary Study 2020’ appeared first on Analytics Training Blog.

Gone are the days when lasso-wielding cowboys fought crime atop their horses and saved the day. In this day and age, we need weapons of a different kind to battle crime. Data, algorithms, analytics: these are the newest additions helping law enforcement officials maintain law and order. ‘Predictive policing’ is the art of predicting crimes before they take place. It’s not about apprehending the right individuals, but about preventing the crime. Dozens of American cities, including Los Angeles, Washington, Atlanta (Georgia), Tacoma, and Santa Cruz, are using an analytics software called PredPol to beat crime. The technology is based on the simple fact that humans are creatures of habit. No matter how much we might like to deny it, we stick to patterns and our preferences and rarely stray from the known. The same trait is exploited by shopping websites that give us suggestions based on our past shopping sprees.

To oversimplify, data is collected from public records, social media, and other sources, and algorithms are used to determine the probability of a crime – the geographical location, approximate time, and demographic – so that action can be taken accordingly. Of course, it is not possible to pinpoint the exact date, time, nature, or culprit of the predicted crime, but it gives police officers an idea of the crime hotspots so that they can respond as soon as possible and prevent disasters. It’s all about being in the right place at the right time.

The fodder for predictive policing is data. The geographical location, climate, economic conditions, time (paydays, holidays, etc.), and a bunch of other factors are keyed in before the data is analyzed. The huge amount of data generated with every passing day makes it all the more difficult to sift through, and that’s where Big Data comes in. Big Data Analytics helps predict the kinds of crimes an area might be susceptible to at any given time; perhaps burglaries will be more prevalent in a certain neighborhood during a particular period. This helps mobilize the police force to watch out for such events, which is especially convenient when the task force is stretched thin and concentrated efforts are welcome. Tennessee witnessed a 30% decline in serious crime and a 15% decline in violent crime after implementing predictive analytics. Though human crime analysts have the advantage of instinct, this system has proven to be more effective.

Big Data has also been used to aid in criminal investigations and serve justice when needed. Let’s look at some of the ways analytics has proven to be a boon to the law enforcement sector.

**Gun Violence**

Guns are a common accessory in crimes, and the world is witnessing an increase in gun usage, especially in the US. It only makes sense to track these weapons of destruction to catch the criminals themselves. It has become possible to trace a weapon or bullet found at a crime scene to learn more about its manufacture, usage, and ownership, and so aid the investigation. Guns have more or less become a form of currency for criminals, being supplied across borders; keeping tabs on their movement might lead to the criminals themselves.

**Human trafficking**

Technology has made it possible to keep tabs on sex crimes like trafficking, assaults, and pedophilia. Without analytics, it’s hard for officers to draw inferences from the growing pool of data. Analytics makes it easier to keep tabs on offenders, make connections that might otherwise have gone unnoticed, and coordinate work across multiple departments. This has resulted in an increase in the number of cases being solved and people being rescued.

Think like a criminal to catch a criminal. Criminals are coming up with ingenious ways to commit crimes, but we have shiny new armor to combat them in the form of technology, and it will only get shinier with advancements in tech. Integrating Big Data Analytics and predictive policing is the way forward, as it has yielded concrete positive results. It’s always a smart move to head into battle with our strongest armor and arsenal to turn the tide in our favor.

The post Big Data Analytics To Beat Crime appeared first on Analytics Training Blog.

Businesses today are becoming increasingly data-driven, which has also led to an increase in the need for handling and understanding data. A scenario like this quite inevitably leads to a demand for data analysts and business analysts alike. However, professionals who are new to the field of data analysis may often get confused about the differences between the two.

While both data analysts and business analysts interpret data to inform business decisions, there are some fundamental differences between the two.

Where the primary role of a data analyst is to gather and analyse data, a business analyst approaches data from a more business-oriented point of view. Here is a more detailed breakdown of how a data analyst differs from a business analyst.

A data analyst is a specialist who collects data, processes it, and produces statistical reports on it. Companies can then make fruitful decisions based on the analysis data analysts provide. Organisations collect relevant data on logistics, market research, sales figures, transportation costs, and more, which is where the role of a data analyst fits in. A data analyst can help by studying and analysing the available raw data and offering profitable solutions to business problems.

The key skills of a data analyst include:

*Problem-solving and critical thinking:* An able data analyst should be able to form hypotheses and draw experiments and inferences from the data at their disposal.

*Data management and analysis:* A proficient data analyst should be comfortable and skilled in collecting, understanding, and manipulating large amounts of data.

*Programming:* A thorough knowledge of programming is not only useful but most of the time necessary to solve problems where ready-made software may not be a viable or flexible choice.

*Visualisation and communication skills:* A data analyst must be able to put forth findings in an accessible and informative manner to aid decision-making.

A data analyst’s job is broader and more extensive than a business analyst’s. Some of the avenues a data analyst may pursue a career in include data quality, higher education, sales, marketing, data assurance, and more.

The role of a business analyst is to help a business grow into an influential player in its market. A business analyst can indirectly impact the financial prospects of a company through the critical decisions they inform. The primary job of a business analyst therefore entails examining and interpreting data to drive changes in policies, information systems, and business processes. A business organisation can move towards better productivity, efficiency, and profitability when guided by an able business analyst. In simpler terms, a business analyst understands how a business works and determines ways to improve its existing processes by identifying and designing new features to implement.

The key skills of a business analyst include:

*Good leadership skills:* A business analyst needs to lead a team, helping and directing team members in solving problems.

*Enhanced analytical skills:* Outstanding analytical skills help a business analyst analyse data, user inputs, documents, workflows, and more.

*Technical knowledge:* A good grasp of database concepts, hardware capabilities, operating systems, and networking will come in handy for a business analyst to understand operations better.

*Processing and planning skills:* Planning the scope of the project, understanding its requirements, and knowing how to implement them are a must for any professional to succeed as a business analyst.

Lately, there has been a boom in business analyst jobs, especially in the information technology sector. Business analyst roles include computer systems analyst, information security analyst, budget analyst, financial analyst, management analyst, and more.

Whether it’s the job of a data analyst or a business analyst, each comes with its own set of advantages. As a business analyst, you get the opportunity to establish a broader network and stronger alliances, and you can have a fast-paced career with visible growth. If you choose to be a data analyst, you have more comprehensive job profiles to choose from, and the demand for data analysts is continually rising, with most jobs offering a handsome payoff too.

This article is an updated version of the article titled – **What’s The Difference Between Data Analysts And Business Analysts?**

The post The difference between data analysts & business analysts appeared first on Analytics Training Blog.

For every speaker, there are sayings that go, “be watchful with the words you use in your talk” and “you cannot have the same key words for every speech”.

To make a speech interesting and suited to the context, the set of words used should differ and be tweaked based on the audience. It’s very important for every speaker to be watchful with the words he or she uses. Be it a faculty member, a politician, a comedian, or a film star, one must be very cautious about the words used in a speech. While speaking, people tend to get carried away, and the use of certain inappropriate words can damage the speaker’s reputation.

Tweaking speeches is a difficult task, and a data-driven approach helps identify the distribution of words used in a speech and the topics the speaker wants to cover. Before every speech, the speaker reviews and finalizes different versions of the speech transcript; each transcript is reviewed multiple times to arrive at the final speech for delivery.

While preparing the transcript, the speaker lists out the ideas or topics to be conveyed in each version. Too many or too few ideas make the speech ineffective. The transcript is reviewed multiple times and changes are incorporated as needed.

Reviews of speech transcripts can be made more comprehensive with NLP techniques like topic modelling, which extracts hidden topics from a group of documents. Here, one document represents one speech transcript. For each extracted topic, we get a distribution of words, and for each document, we get a distribution of the topics it contains. This helps the speaker review the words being used.
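As a rough illustration, here is a minimal sketch of the idea using scikit-learn’s LatentDirichletAllocation; the `transcripts` list is a hypothetical stand-in for a speaker’s draft versions:

```python
# A minimal sketch of topic modelling on speech transcripts with
# scikit-learn. The `transcripts` list is hypothetical sample data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

transcripts = [
    "economy jobs growth budget taxes industry",
    "schools teachers students education exams",
    "economy trade exports growth manufacturing",
]

# Turn each transcript into a bag-of-words count vector.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(transcripts)

# Extract 2 hidden topics from the collection of transcripts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)

# Distribution of words per topic: print the top words for each topic.
words = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {idx}: {top_words}")

# Distribution of topics per document (one row per transcript version).
print(doc_topics)
```

Each row of `doc_topics` tells the speaker how heavily a draft leans on each topic, and the top words per topic reveal the vocabulary driving that lean.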

The post NLP & Topic Modelling to Extract Complex Data appeared first on Analytics Training Blog.

If you’re hoping to make a career switch to data science, there are a ton of questions to tackle: Which languages should I learn? Which skills do I need? Should I shell out money for a training program? But most of all, you might be wondering: where do I start?

With this article, we hope to provide a starting point.

The dominant traits of anyone who aims to become a data scientist are an intense curiosity and the dedication to seek out information.

It’s clear, then, that you don’t have to be the most technically sound person in town to become a data scientist. This should come as encouragement for all of you out there from non-technical backgrounds.

Here are some simple yet effective tips for those who want to transition from a non-technical background to become a data scientist.

It is highly recommended to enroll in a well-curated course. An ideal curriculum should cover the basics of programming in Python and R, deep learning, data visualization, Big Data handling, statistics, and probability.

The best part about having a degree in data science is that it would not only amp up the value of your CV but also enhance your knowledge of the field through assignments and examinations.

The most important first step is to speak and think like a data scientist. What does that mean? First, learn how data scientists speak. What terms do they throw around frequently (e.g., scikit-learn, matrix factorization, eigenvectors)? Don’t be afraid; just take notes on the words you don’t understand. Why? Learning the vocabulary is the first step in learning and communicating data science.

I alluded to this a bit earlier, but learning by doing is ultimately the best way to learn. Spend time looking at the kernels in Kaggle competitions to learn how other Kagglers approached the competition. At first, this will be *extremely daunting*: you *won’t understand 95% of the code you’re reading*, and *you probably won’t be able to run the code on your own computer even after you’ve cloned it.*

The most important part of Kaggle for an aspiring data scientist is the “Kernels” section, where fellow Kagglers post their solutions to the problems posed by the competition. Spend at least an hour of your time TYPING and CODING out a solution: practice typing each line, line by line, in your own Jupyter Notebook. Run the code and see what happens.

**This is where you need to be persistent.**

You aren’t going to learn anything if you get frustrated, so ease yourself into engaging with these challenges, and soon enough you’ll be able to understand the kernels you read.

**Remember, when setting goals, be realistic about them: make them SMART goals** – Specific, Measurable, Attainable, Realistic, Time-Bound.

In other words, don’t expect to be fluently reading Kaggle kernels within a week.

Give yourself a **specific, realistic and time-bound goal**:

Set small goals, write them down and check them off when you achieve them. When you feel frustrated, go back to these checkmarks and see how far you’ve come since yesterday.

Find a project you’re passionate about, whether it’s a problem you’d like to solve or a library you’d like to learn, and turn it into a project that you’ll put on your GitHub as a portfolio piece.

Finding a problem is best done through conversations. Engage with your community, your friends, or even strangers. Find out what bothers them, or talk to them about ideas you’ve always had.

Hash out your idea and keep it simple. Your project isn’t going to change the world; the most important part is to start on one.

When you step into the field of Data Science, you are more likely to have peers or superiors in the field with a STEM background. Remember that to become a data scientist, knowledge of certain core subjects is indispensable. Although it’s encouraging to know that willpower can get you anywhere in life, there has to be a methodical approach to what you do.

Strengthen your basics and read up on all that you can get your hands on related to data science. Understand that you are never going to finish learning, but you have to keep up the spirit of intellectual curiosity at all times.

This mentality will make your transition from a non-technical field to data science both hassle-free and interesting! For more inspiration, look up real-life examples of people who made it in data science despite their non-technical backgrounds.

A career transition is never easy, especially if you’ve just begun your journey. During my transition, I kept this quote close to my heart:

“The best time to start was yesterday, the next best time is **NOW.**”

The fact that you’ve read this entire article and are engaging with this sentence today should show you that you’re ready to start your transition.

The post 5 Tips to Switch Career Towards Data Science appeared first on Analytics Training Blog.

Regression is a statistical technique that finds a linear relationship between x (input) and y (output); hence the name Linear Regression. The equation for uni-variate regression can be given as:

**y = Beta1 + Beta2 * x**

where y is the output/target/dependent variable, x is the input/feature/independent variable, and Beta1 and Beta2 are the intercept and slope of the best-fit line respectively, also known as the regression coefficients.

The task is to find regression coefficients such that the line/equation **best fits** the given data. Regression makes assumptions about the data for the purpose of analysis, which makes it restrictive in nature: it fails to build a good model with datasets that don’t satisfy the assumptions, so it becomes imperative for a good model to accommodate them.

**Example:**

Let us consider an example where we are trying to predict the sales of a company based on its marketing spends in various media like TV, Radio, and Newspaper. Each row of the dataset records the spend in each medium along with the resulting sales.

Here the columns TV, Radio, and Newspaper are the input/independent variables and Sales is the output/dependent variable. We will try to fit a linear regression to this dataset; below is a sketch of the Python code for it.
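A minimal sketch of the fit with scikit-learn, assuming the data lives in a CSV named `advertising.csv` (the file name is a placeholder):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the advertising data (file name is a placeholder).
data = pd.read_csv("advertising.csv")
X = data[["TV", "Radio", "Newspaper"]]  # input/independent variables
y = data["Sales"]                       # output/dependent variable

# Fit an ordinary least squares linear regression.
model = LinearRegression()
model.fit(X, y)
```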

Once the linear regression model has been fitted to the data, we can use the predict function to see how well the model predicts sales for the given marketing spends.
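Continuing the sketch:

```python
predicted = model.predict(X)  # the model's estimate of Sales for each row of spends
```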

When we apply the regression equation to the given data, there will be a difference between the original values of y and the predicted values of y. These differences are referred to as residuals:

**Residual e = Observed value – Predicted Value**
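In code, continuing the same sketch:

```python
residuals = y - model.predict(X)  # observed value minus predicted value
```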

The score function returns the model’s R² on the given data, which reflects how well the model can predict for new datapoints.
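Continuing the sketch:

```python
r_squared = model.score(X, y)  # R^2: share of the variance in Sales explained
```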

**Assumptions for Linear Regression**

**1. Linearity**

Linear Regression can capture only linear relationships, hence the underlying assumption that there is a linear relationship between the features and the target. Plotting a scatterplot of each individual variable against the dependent variable and checking for a linear relationship is a tedious process; instead, we can check for linearity by plotting the actual target values from the dataset against the ones predicted by our linear model. If that trend looks linear, we can assume the features relate linearly to the target as well.
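A quick sketch of that check, continuing the example:

```python
import matplotlib.pyplot as plt

# Actual vs. predicted target: a roughly straight trend suggests linearity.
plt.scatter(y, model.predict(X))
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.show()
```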

**2. Normality check for Residuals**

To test the residuals for normality, we can use the Anderson-Darling test.
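A sketch using the Anderson-Darling normality test from statsmodels (the exact call used in the original post is assumed):

```python
from statsmodels.stats.diagnostic import normal_ad

# Anderson-Darling test on the residuals; H0: residuals are normally distributed.
statistic, p_value = normal_ad(residuals)
print(statistic, p_value)
```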

**Interpretation:**

Each test will return at least two things:

**Statistic:** A quantity calculated by the test that can be interpreted by comparing it to critical values from the distribution of the test statistic.

**p-value:** Used to interpret the test, in this case whether the sample was drawn from a Gaussian distribution.

If p-value <= alpha (0.05): reject H0 => the data is not normally distributed

If p-value > alpha (0.05): fail to reject H0 => the data can be assumed to be normally distributed

Since our p-value 2.88234545e-09 <= 0.05, we reject H0 in favour of the alternate hypothesis: the data is not normally distributed. To get the data to adhere to a normal distribution, we can apply log, square-root, or power transformations.

To figure out the most suitable transformation for our data, we can try each of them and check which one gives the best accuracy. Here, a power transformation is used for the dataset.
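A sketch of the power transformation with scikit-learn’s PowerTransformer (assuming, as the later sections suggest, that the inputs are what gets transformed):

```python
from sklearn.preprocessing import PowerTransformer

# Yeo-Johnson power transformation of the inputs, then refit the model.
pt = PowerTransformer()
X_t = pt.fit_transform(X)
model.fit(X_t, y)
residuals = y - model.predict(X_t)  # residuals of the refitted model
```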

After applying the transformation, we can once again check for normality.

Since the p-value 0.10111624927223171 > 0.05, we fail to reject H0, which states that the data is normally distributed. The regplot also shows the same.

**3. Multicollinearity**

Multicollinearity refers to correlation between the independent variables. It is considered a disturbance in the data; if present, it weakens the statistical power of the regression model. Pair plots and heat maps help in identifying highly correlated features.
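For example, a correlation heat map of the features (continuing the sketch):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise correlations between the independent variables.
sns.heatmap(data[["TV", "Radio", "Newspaper"]].corr(), annot=True)
plt.show()
```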

**Why Multicollinearity should be avoided in Linear Regression?**

The interpretation of a regression coefficient is that it represents the mean change in the target for each unit change in a feature when you hold all of the other features constant. However, when features are correlated, a change in one feature in turn shifts the others. The stronger the correlation, the more difficult it is to change one feature without changing another. It becomes difficult for the model to estimate the relationship between each feature and the target independently because the features tend to change in unison.

**Treatment**

The Variance Inflation Factor (VIF) is a measure of collinearity among predictor variables in a multiple regression. For a given feature, it is calculated as the ratio of the variance of its coefficient in the full model to the variance that coefficient would have if the feature were fit alone.

**V.I.F. = 1 / (1 – R^2)**, where R^2 is obtained by regressing the given feature on all the other features.

VIF measures how much the variance of an estimated regression coefficient increases when your predictors are correlated. The higher the VIF for the i-th regressor, the more highly correlated it is with the other variables.

A VIF <= 4 suggests no multicollinearity, whereas a value >= 10 implies serious multicollinearity.
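A sketch of the VIF computation with statsmodels:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF per feature; a constant column is added since VIF assumes an intercept.
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns[1:], start=1):
    print(col, variance_inflation_factor(X_const.values, i))
```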

Since the VIF values are not greater than 10, we find that the features are not seriously correlated, and hence we retain all 3 features.

**4. Autocorrelation**

Autocorrelation refers to the degree of correlation between the values of the same variables across different observations in the data. The concept of autocorrelation is most often discussed in the context of time series data in which observations occur at different points in time (e.g., air temperature measured on different days of the month). For example, one might expect the air temperature on the 1st day of the month to be more similar to the temperature on the 2nd day compared to the 31st day. If the temperature values that occurred closer together in time are, in fact, more similar than the temperature values that occurred farther apart in time, the data would be autocorrelated.

However, autocorrelation can also occur in cross-sectional data when the observations are related in some other way. In a survey, for instance, one might expect people from nearby geographic locations to provide more similar answers to each other than people who are more geographically distant. Similarly, students from the same class might perform more similarly to each other than students from different classes. Thus, autocorrelation can occur if observations are dependent in aspects other than time.

Autocorrelation can cause problems in conventional analyses (such as ordinary least squares regression) that assume independence of observations. In a regression analysis, autocorrelation of the regression residuals can also occur if the model is incorrectly specified. For example, if you are attempting to model a simple linear relationship but the observed relationship is non-linear (i.e., it follows a curved or U-shaped function), then the residuals will be autocorrelated.

**How to detect Autocorrelation**

Autocorrelation can be tested with the help of the Durbin-Watson test, whose null hypothesis is that there is no serial correlation. The Durbin-Watson test statistic is defined over the residuals e_t as:

**DW = Σ (e_t – e_(t–1))² / Σ e_t²**

with the numerator summed over t = 2, …, T and the denominator over t = 1, …, T.

The DW statistic lies between 0 and 4. DW = 2 implies no autocorrelation; 0 < DW < 2 implies positive autocorrelation, while 2 < DW < 4 indicates negative autocorrelation.
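In code, continuing the sketch:

```python
from statsmodels.stats.stattools import durbin_watson

# A value close to 2 suggests no autocorrelation in the residuals.
print(durbin_watson(residuals))
```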

Our DW value is around 2, which implies that there is no autocorrelation.

The presence of autocorrelation would imply that there is more information that our model is missing.

**5. Homoscedasticity**

Homoscedasticity describes a situation in which the error term (that is, the “noise” or random disturbance in the relationship between the features and the target) is the same across all values of the independent variables. A scatter plot of residual values vs. predicted values is a good way to check for homoscedasticity: there should be no clear pattern in the distribution, and if there is a specific pattern, the data is heteroskedastic.
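A sketch of that residual plot, continuing the example:

```python
import matplotlib.pyplot as plt

# Residuals vs. predicted values: a pattern-free cloud suggests
# homoscedasticity; a funnel or curve suggests heteroskedasticity.
plt.scatter(model.predict(X_t), residuals)
plt.axhline(0, color="red")
plt.xlabel("Predicted Sales")
plt.ylabel("Residuals")
plt.show()
```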

Generally, non-constant variance arises in the presence of outliers or extreme leverage values. It looks like these values get too much weight, thereby disproportionately influencing the model’s performance.

In a typical illustration, the leftmost graph shows no definite pattern, i.e., constant variance among the residuals; the middle graph shows a specific pattern where the error increases and then decreases with the predicted values, violating the constant-variance rule; and the rightmost graph also exhibits a specific pattern where the error decreases with the predicted values, depicting heteroscedasticity.

From our residual plot we can infer a U-shaped pattern, hence the data is heteroskedastic.

**How to handle Heteroskedasticity**

- Redefine the variables
- Use weighted regression
- Transform the dependent variable

Even after transforming, the accuracy remains the same for this data.

The coefficients and intercept for our final model are:
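Continuing the sketch, these can be read off the fitted model:

```python
print(model.coef_)       # coefficients for TV, Radio, Newspaper
print(model.intercept_)  # intercept of the fitted line
```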

The fitted equation then becomes:

**sales = 0.2755*TV + 0.6476*Radio + 0.00856*Newspaper – 0.2567**

**Question 1: My company is currently spending $100, $48, and $85 (in thousands) on TV, Radio, and Newspaper advertising. What will my sales be next quarter? I want to improve sales to 16 (million $).**

Create the test data and transform the inputs using the same power transformation we applied earlier to satisfy the normality test.
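A sketch of that step, reusing the transformer and model from earlier:

```python
# New marketing spends (in thousands of $), transformed like the training
# inputs before predicting.
new_spend = pd.DataFrame({"TV": [100], "Radio": [48], "Newspaper": [85]})
new_t = pt.transform(new_spend)
print(model.predict(new_t))
```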

Manually substituting the data points into the linear equation, and equivalently using the model’s predict function, we get sales of about 16.58 (million $).

**How much do I need to invest in TV advertisement to improve sales to 20M?**

Target – 20 million

Current sales – 16.58

Difference = 3.42

We compute the amount to be added to the transformed TV input as 3.42/0.2755 = 12.413.

**Substituting the increased TV value into the equation, we can see that sales now reach 20 (million $).**

Since we have applied a power transformation, we must apply the inverse transformation to get the spend back in its original units.
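A sketch of that step: bump the transformed TV feature by the computed amount, then map the row back to original spend units (the figure in the comment is the one reported in the post):

```python
# Add the required increase to the transformed TV feature, then invert
# the power transformation to recover the spend in original units.
adjusted = new_t.copy()
adjusted[0, 0] += 3.42 / 0.2755
print(pt.inverse_transform(adjusted)[0, 0])  # ~177.48 (thousand $)
```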

**They will have to invest 177.48 (thousand $) in TV advertising to increase their sales to 20M.**

**Question 2: How much do I need to invest in Radio advertisement to improve sales to 20M?**

Target – 20 million

Current sales – 16.58

Difference = 3.42

We compute the amount to be added to the transformed Radio input as **3.42/0.6476 = 5.28**.

**Substituting the increased Radio value into the equation, we can see that sales now reach 20 (million $).**

Since we have applied a power transformation, we again apply the inverse transformation to get the spend back in its original units.

**They will have to invest 73.76 (thousand $) in Radio advertising to increase their sales to 20M.**

Similarly, you can compute the figure for Newspaper and work out which medium’s marketing spend is lowest while still achieving the sales target of 20 (million $).

The post Fine-Tuning your Linear Regression Model appeared first on Analytics Training Blog.
