Look at the chart above. At a cursory glance, it shows the sales trend of a product over a certain year. We can see that sales go up and down across the months. If I were a member of the management team, I would interpret the sales of this particular product as somewhat stable for that year. However, I would not be able to tell how good the product’s sales numbers are relative to its monthly (or yearly) sales targets, or how I am faring against competitors selling similar products. This is where context can enhance the quality of your data presentation to your audience.
We have all been in that situation. We go to a service counter for a specific purpose, be it renewing a passport, registering a child’s birth, banking matters, or something else. We take a queue number, which is quite a common system, and we wait our turn. People usually pass the waiting time with various activities, such as playing games, reading newspapers, or chatting among themselves. In many cases, as is common with public services that serve hundreds, if not thousands, of customers daily, the wait can be exhausting.
For some time now, whenever I patronize one of these services, I record the service counters’ service times from the current number in the queue up to my own. With a sufficient sample size, I was able to predict, with fair accuracy, how long it would take for my number to be called. In cases where I had to visit the same office multiple times, its past data helped me save a tremendous amount of time. I could take my number, crunch it in my tracker, and get an estimated time for my number to be called. I could then go have a coffee nearby, for instance, instead of just waiting at the service hall.
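The idea behind my tracker can be sketched in a few lines of code. This is only an illustrative sketch, not my actual tracker: the function name, the sample numbers, and the assumption that every customer takes roughly the average observed service time are all mine for the example.

```python
def estimate_wait(observed_service_times, current_number, my_number, counters=1):
    """Estimate minutes until my_number is called.

    observed_service_times: minutes each previously observed customer took.
    counters: how many counters are serving the queue in parallel (assumed equal pace).
    """
    if not observed_service_times:
        raise ValueError("need at least one observed service time")
    # Average service time per customer, from the recorded samples
    avg = sum(observed_service_times) / len(observed_service_times)
    # Customers still ahead of me in the queue
    people_ahead = my_number - current_number
    return people_ahead * avg / counters

# Example: five observed customers averaging 3 minutes each,
# 12 people ahead of me, 2 counters open
print(estimate_wait([2.5, 3.0, 4.0, 2.0, 3.5],
                    current_number=88, my_number=100, counters=2))  # → 18.0
```

With 18 minutes estimated, I know whether there is time for that coffee. The larger the sample of observed service times, the more reliable the average becomes.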
There are many areas, processes, and disciplines related to data science. A cursory search on Google alone reveals different frameworks, theories, and opinions on a wide variety of topics relating to data science. For the average person, all these diverse topics can be confusing when all they need is a cursory understanding of what data science is about. Despite that, in my opinion, the majority of the materials, diverse as they are, share a common pattern and theme when stripped down to the basics. As such, a generalization can be made from those patterns and themes that enables an understanding of data science in lay terms.
The Rule of Three
For that purpose, I shall use the “rule of three” to articulate this generalization. Why the rule of three? In lay terms, we humans tend to remember things more easily in groups of three. Perhaps you have heard of the magical number seven, plus or minus two (7 ± 2), which has been touted as the limit of our brain’s working memory capacity. The research paper by George A. Miller explains the magic number in greater detail. Other research by Steven J. Luck and Edward K. Vogel indicates this number is actually closer to four. In a follow-up study, Edward K. Vogel (same one) and Maro G. Machizawa use neurological measurements to show that working memory capacity differs between individual brains, which may explain why some people can hold more information at a time than others.
Regression analysis is a valuable tool in statistical analysis primarily because it allows analysts to predict, or regress as we prefer to call it, variables from sets of other variables. It is one of the techniques used in predictive analytics. Predictive analytics is a powerful tool to have in most scenarios, as it allows users to envision what an outcome might be based on several inputs combined into a mathematical model.
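To make this concrete, here is a minimal sketch of the simplest case, ordinary least squares on one input variable. The data points are made up for illustration (think of them as, say, advertising spend versus sales):

```python
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]             # input variable (illustrative)
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # observed outcomes (illustrative)
a, b = fit_line(xs, ys)
predicted = a * 6 + b            # predict the outcome for a new, unseen input
```

The fitted pair (a, b) is the “mathematical model” in this simple case: feed it a new input and it returns a predicted outcome.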
As predictive analytics has surged in popularity with the advent of things such as big data, I believe the topic of extrapolation merits some attention. Oxford Dictionaries defines extrapolation as to “extend the application of (a method or conclusion) to an unknown situation by assuming that existing trends will continue or similar methods will be applicable”, with assuming being the key word here. In mathematical terms, Wikipedia describes extrapolation as “the process of estimating, beyond the original observation interval, the value of a variable on the basis of its relationship with another variable”, with beyond the original observation interval being the key phrase. Therein lies a problem.
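The problem is easy to demonstrate with synthetic data. Suppose the true process is quadratic, but we only observe it on a small interval and fit a straight line, a reasonable model within that interval. Everything here (the process, the interval, the numbers) is an assumption chosen purely to illustrate the point:

```python
def true_process(x):
    # The real relationship, unknown to the analyst
    return 0.5 * x ** 2

# We only observe the process on the interval [0, 5]
xs = [0, 1, 2, 3, 4, 5]
ys = [true_process(x) for x in xs]

# Ordinary least-squares line fit on those observations
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

inside = a * 4 + b    # interpolation: close to true_process(4) = 8.0
outside = a * 20 + b  # extrapolation: far from true_process(20) = 200.0
```

Inside the observation interval, the line predicts about 8.3 against a true value of 8.0. At x = 20, far beyond the interval, it predicts about 48 against a true value of 200. The model did not become worse; we simply asked it about a region it never saw, assuming the trend would continue.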
For the past few months, I have posted on several data science topics; however, none of them were geared toward the average Joe. In the spirit of learning from the basics, I think it is only fitting that I dedicate this post, in lay terms, to those who want to understand what data science is without delving into its technicalities.
I am certain that a lot has been written about the definition of data science all over the web, but I hope my writing will enrich your understanding of the topic alongside other materials. Please note that this post is written in the most basic way possible without compromising a basic understanding of data science. As such, certain generalizations are made to aid comprehension. Please feel free to share any opinions in the comments section; I welcome all discussions on the matter.