Business Intelligence Specialist at Bell
In this day and age, data analytics is not just a buzzword, but also a new fact of business life. A variety of corporations rely on data analytics to make business decisions. With the increasingly popularity of data analytics, more and more technology tools appeared in market. This article compares three top data analysis packages: Excel, SPSS and SAS in aspects of user friendliness, data analysis capability and popularity.
Choosing data analytics tool is a tradeoff of costs and benefits. However, sometimes people can be as partisan about their favorite statistical package as they can be about their favorite hockey team. These emotional attachments could cause big trouble. Each statistical package has its own strengths and weaknesses; the choice of data analytics tool should be based on rationales, rather than individual preference.
Selecting a data analytics tool is not just about picking a single package and sticking on it, but a process of learning and utilizing a toolkit of package. Analytics tool should not only satisfy current needs, but also address the strategic objectives of whole project. Knowing a single package is not enough; a combination of several tools is always the best way to solve problems.
Data analytics tools include two divisions: programming operation and windows operation. (Michael, Michele & Ambiga, 2013) Among the tools we discuss, SAS is more programming-oriented while Excel and SPSS are based on interactive windows. SAS is a business application developed by SAS Software. It is a large information system for decision-making and business intelligence support. Excel is developed by Microsoft, which is a spreadsheet with intuitive interface, outstanding computation capability and excellent chart tools. (esProc, 2012) SPSS is the abbreviation of Statistical Product and Service Solutions. The software was produced by SPSS Inc. and acquired by IBM in 2009. (SPSS, 2014)
From the perspective of operation interface, Excel executes user’s commands through interactive windows. Excel users can create spreadsheet, enter data, and insert chart through simply clicking the mouse, not requiring extra efforts to write code or learn programming language. In addition, Excel provides various functions for basic statistical calculation. After type function name, formulas and criteria will pop up automatically so users do not need to memorize function format by themselves. SPSS is also based on windows operation. Most features could be accessible via pull-down menus. It also provides a programming platform that uses a proprietary 4GL command syntax language. Some complex applications can only be programmed in syntax and are not accessible in menu structures. (Levesque, 2007) On the contrary, SAS is more programming oriented. Most of its features should be achieved with SAS programming language and each procedure includes several statements. Users need to remember all of the codes and any tiny errors in programming will cause the stop of execution.
Chart & Graphics
Firstly, a big advantage of Excel is its ability to produce beautiful data visualization on dashboards with graphs, story-telling chart, etc. (Alexander, 2014) Excel offers a great deal of templates for users to quickly create chart as well as edit layout. Users can easily format the graphics with easily pointing-and-clicking. Secondly, SPSS’s chart-builder is designed to produce various types of graphs. SPSS cannot create chart with the spreadsheet data automatically; users have to appoint x-axis, y-axis and variables manually. Besides, SPSS does not provide much use beyond standard statistical graphs; chart produced is not as visually appealing as Excel. Compared with tools above, graphic procedure in SAS is much more complex. Many hours might be spent trying to get SAS GRAPH to do just the right thing – a little tweak here, a little tweak there – and this process can go on, and on. (Rick, 2013)
To present the results, we will use a small example of making a bar chart in different platforms. In Excel, users only need to select the data and insert a bar chart. After that, they could easily click through the options for fill, line, shadow, 3-D effect and glow & soft edges to format the graph. In SPSS, users are required to go to the graph builder page first, and then designate chart-style (Horizontal Bar/Vertical Bar), variables, x-axis and y-axis. For SAS, its graphics are created largely using syntax language, which forces users to write a chart procedure followed by a series of statements regarding the group variables, sub-group variables, visual style and all other elements shown in the chart. While SAS Graphic procedure is very complicated, it has its own advantages: by weaving SAS macros into programming code, users have the ability to regenerate graphs on a routine basis, which provides them more flexibility and significantly reduces their work volume.
Flexibility In Building Model
As discussed above, Excel is based on windows operation and most features can be achieved through interactive menus, which make it extremely convenient for entering data and manipulating rows and columns. However, on the other side of the coin, the user friendliness is at the cost of analysis flexibility. With a cell as a unit, the support of excel for unstructured data is quite poor. (esProc, 2012) Problems of using Excel for statistical analysis include:
- Missing values are handled inconsistently, and sometimes incorrectly
- Many analysis can only be done on one column at one time, making it inconvenient to implement repetitive analysis on many columns
- Output is poorly organized, and there is no record of how an analysis was accomplished
Like Excel, SPSS is also built on interactive operation window. After incorporating with 4GL command syntax language, the software becomes much more flexible, allowing users to customize the model based on programming language. However, the syntax of SPSS is fairly poor with a complex command structure and its menu-style interface is a major obstacle of stepwise computation in regression analysis. (Eva, 2012) Compared with Excel and SPSS, SAS has shown its superiority in implementing advanced statistics analysis. First, SAS system is highly structured and strict to data, which makes it very flexible and trustworthy. Second, SAS is very powerful in the area of data management, allowing users to manipulate data in almost any format possible. Third, SAS is specifically robust in analysis of variance (ANOVA), general linear model and its extensions. Finally, SAS could show data analysis procedure step by step in its log window.
Information Management Capability
Information management has three levels: Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. (Michael, 2013) Descriptive Analytics focuses on answering the question what is happening now and what has happened in the past. It is a method of quantitatively describing the characteristics of a collection of information to make a statistical summary of the sample or observations. Predictive Analytics goes further, which takes use of the past information to predict future outcomes with some degree of likelihood. In business, predictive model exploits patterns hidden behind historical data to identify risks and opportunities. An important assumption of predictive model is outside environment is stable. Only in a constant environment, it is possible to utilize former experiences to direct future activities. However, society is an ever-changing ecosystem rather than an invariant setup. In this situation, predictive model is not enough and prescriptive analytics should be applied to see what is the best outcome given environmental uncertainty. (Michael, 2013)
In terms of information management, Excel is still on the level of descriptive analytics. Users can bring data into Excel, sort and organize data, and use it to create reports and scorecards. However, these results can only indicate what had happened in the past, not what will happen in the future. SPSS could be utilized to build predictive models and conduct other analytics tasks. Its visual interface allows users to leverage statistical and data mining algorithms without programming. But the menu-driven feature of SPSS is a major obstacle in handling complex predictive analysis, let alone prescriptive analysis. SAS is doubtlessly the strongest in information management. This advantage is demonstrated by the fact that SAS has a specific module for predictive and prescriptive analytics, which is capable to glean important information from large databases using sophisticated and powerful data analysis techniques.
In the last twenty years, Excel has been the most popular spreadsheet application on the planet. Microsoft gauges the number of Excel users worldwide are more than 750 million. (Tom Warren, 2011) Based on the estimate of Forrest Research, there are still 50 to 80 percent companies relying on Excel spreadsheet for critical business applications like financial reporting. (Forrest Research, 2011) In the field of advanced and predictive analytics, SAS holds a 35.4% market share with total revenue of 768.3 million. (IDC, 2014) Up to 2013, SAS had been used at more than 70,000 sites in 135 countries and held the leading position since IDC started tracking advanced analytics in 1997. (SAS, 2014) Although SPSS ranked second in advanced analytics industry, its market share (17.1%) is only about half of SAS. From the perspective of growth rate, the revenue of Microsoft in advanced analytics increased by 17% in 2013 when the growth rate of SPSS and SAS was 10.3% and 7% separately. (IDC, 2014)
One important dimension to measure the popularity of software for data analytics is the number of job advertisements for each. To evaluate how popular each data analytics platform is, I searched job postings mentioned Excel, SPSS and SAS at Indeed.ca on Nov 11th, 2014. The results showed as many as 30,012 jobs required Excel skills. 618 job postings mentioned working knowledge in SAS while only 160 wanted-ads referred to SPSS. Therefore, in the data analytics job market, advanced Excel skill is a basic requirement, and expertise in SAS and SPSS is an obvious asset. From the perspective of job advertisement, SAS is much more popular than SPSS in data analytics.
Work place usage frequency is another way to estimate the relative popularity of data analytics software. Based on the data mining survey of Rexer Analytics, 57% of business professionals listed some form of data analysis as part of their daily jobs. Excel was still the most popular data-handling tool with more than 95% people using it on daily basis. But one-third respondents figured out that they only used Excel for basic information management rather than a serious tool for complex data analytics. In terms of SAS, 20% of survey participants utilized it as primary data analysis tool, 8% used it as secondary tool while 10% used it occasionally. For SPSS, 12% of respondents used it as primary analytic tool and 10% used it occasionally in workplace. (Rexer Analytics, 2013). Therefore, the usage frequency of Excel is greater than SAS and SAS is greater than SPSS.
Recommendations of Choosing Analytics Tools
Clarify Objectives First
In today’s data analytics market, each product has its own strengths and weaknesses. A golden role of choosing statistic program is looking for the platform that could serve the research objective better rather than a cure-all solution for all issues. Excel spreadsheet is easy to learn and friendly to use. It has a clear advantage in basic descriptive analysis. SPSS is all you need if you wanted to minimize your data management efforts without formulating complex software programs. SAS is a likely choice for a person who manages complex data analytics on a continuous basis. Choosing analytics tool is a tradeoff between benefits and costs: if you only opened a statistic program three times a week, it is not worthy to spend hundreds of hours to learn SAS language; but for people who use SAS several hours a day, the steep learning curve and complex command structure are only a small price to pay for its extraordinary power. Therefore, it is extremely important to choose a tool that is the most appropriate to achieve your research goals.
Know More Than One Statistic Package
Having knowledge of more than one statistic package is the precondition of making a smart choice because a single package has no way to be the most suitable platform for all of the projects. There are a variety of data analysis products available in the market, which makes it necessary to do some research before determine which platform you are going to use. It is unwise to rely on one singe statistic package to do all jobs in the analysis process. A strategically combination of packages could provide the greatest coverage while use the least time. For example, SAS is extraordinarily strong in cutting edge data analysis, but quite poor for graphics building. In this situation, a combination of SAS and Excel would allow users to present the findings of SAS with nice charts and tables made by Excel, significantly increasing the data analysis efficiency.
Making Strategically Choice
It is not uncommon in the workplace that data analytics proceeds more slowly and more painfully because of failing to select optimal tools. An ideal analytics tool should not only satisfy current data analysis needs but also address the strategic goal of the whole project. Too many analysts select tools simply based on ease of use, which could deny advanced features in other statistic packages that are not so hard to learn but very useful in practices. Besides, some tools might have initial ease of use in the early stage of the research, but become useless soon because with project moves forward, more advanced analysis should be implemented and more complex tools are required. Therefore, it is necessary to forecast the project’s future needs and evaluate various statistical packages based on their utility in the long run.