We have all been in that situation: we go to a service counter for a specific purpose, be it passport renewal, registering a child's birth, banking matters, or something else. We take a queue number, a common system, and wait our turn. People usually pass the time by playing games, reading newspapers, or chatting among themselves. In many cases, as is common with public services that have to serve hundreds, if not thousands, of customers daily, the wait can be exhausting.
For some time now, whenever I patronise one of these services, I record the counters' service times from the number currently being served up to my own. With a sufficient sample size, I could predict, with fair accuracy, when my number would be called. In cases where I had to visit the same office multiple times, its past data saved me a great deal of time: I could take my number, enter it into my tracker, and get an estimate of when my number would be called. I could then go have a coffee nearby, for instance, instead of just waiting in the service hall.
Naturally, many variables affect the accuracy of such a calculation: the number of counters open, whether it is a weekday or a weekend, whether it is morning or afternoon, and other factors besides. As with most predictive statistical modelling, the more complete the variation in our data, the better the accuracy of the predictive model.
This is where today’s post comes in. I have gathered a fair amount of data over the past few years, but it is in no way robust enough to be useful in every practical situation. At times I have spent whole days sitting at public service counters just collecting data point after data point. As you may guess, collecting data without actually needing the government service can become a very tiring activity.
Nevertheless, I firmly believe such a tool could be very useful for the average person, if only to give them more control over their time while waiting. Since waiting at public offices is something many of us have to do, that is when I thought about crowd-sourcing the data collection.
Each of us regularly goes to banks, post offices, the department of motor vehicles, and other public services. It would be nice if some of us recorded the relevant data individually and then shared the results publicly. Algorithms could then be developed to predict service-time performance at any participating public office, which anyone interested could use to make their waiting time more predictable. Given enough variability, we could eventually predict waiting times even for irregular periods, such as school holidays.
The graphic below is a sample of one of the data sets I have gathered, which I believe is as good as any for our simple data collection. It has to be simple so that we, the crowd-sourcing participants, are not overwhelmed by a barrage of numbers and tables. The file was built in Microsoft Excel, but any compatible spreadsheet will do. For our crowd-sourced purpose, the data should be as accessible and simple as possible.
In the sample above, the blue-coloured number is the data collector’s own queue number, and the number being served when they entered the premises was 1059. The Time Called column records the actual time each number was called. Combined, these times can be used to build a simple model of when our number, 1069, will be called. The Duration between Numbers column calculates the elapsed time between two consecutive numbers, which essentially represents the average service duration across all counters for that interval. The data could be further segregated by individual counter if possible, but for a lay person’s purposes I find that level of detail unnecessary.
The Average Duration column holds the running average of all the durations gathered in the previous column. It is recalculated at each row to include that row’s Duration between Numbers figure, and as such it tends to stabilise towards a single value after a few samples. The Predicted Time column predicts when our number of interest, 1069, will be called. As you may notice, the larger the sample size and the closer we are to the number of interest, the more accurate the predicted time becomes, at least in this sample. This is illustrated below.
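The spreadsheet logic described above can be sketched in a few lines of code. The observations below are hypothetical (they mirror the structure of the sample sheet, not its exact figures), and `predict_call_time` is a name I introduce for illustration: it averages the service duration per queue number seen so far, then projects that average forward to the number of interest.

```python
from datetime import datetime, timedelta

# Hypothetical observations: (queue number, time it was called).
# These values illustrate the sheet's structure, not its actual data.
observations = [
    (1059, datetime(2024, 1, 8, 9, 0)),
    (1060, datetime(2024, 1, 8, 9, 4)),
    (1061, datetime(2024, 1, 8, 9, 7)),
    (1062, datetime(2024, 1, 8, 9, 12)),
]

MY_NUMBER = 1069  # the blue-coloured number in the sample


def predict_call_time(observations, my_number):
    """Average the duration per queue number so far, then project forward."""
    # Duration between Numbers: elapsed time divided by numbers served.
    durations = [
        (t2 - t1) / (n2 - n1)
        for (n1, t1), (n2, t2) in zip(observations, observations[1:])
    ]
    # Average Duration: running mean of all durations gathered so far.
    avg = sum(durations, timedelta()) / len(durations)
    # Predicted Time: project from the last called number.
    last_number, last_time = observations[-1]
    return last_time + avg * (my_number - last_number)


print(predict_call_time(observations, MY_NUMBER))
```

Re-running `predict_call_time` each time a new number is called reproduces the row-by-row behaviour of the spreadsheet: as the running average settles, so does the prediction.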
I realise this is a simplistic model; a proper statistical model would need many more samples and much more variability in the data set, along with a more sophisticated prediction method than a simple mean. However, the sample serves only to demonstrate my proposal: that if we, as members of a community, contribute waiting-time data together, it can provide fairly accurate predictions of waiting times at various service counters.
Even with this rather simplistic model, reasonable accuracy was achieved after only about seven data points, and that is from just one small segment of time. With whole pools of data to analyse, the model could be more accurate still.
I hope this entry has given an adequate picture of how this crowd-sourced effort could benefit the community and the public at large. I foresee a practical application of it and would like to develop it further. In coming posts, I will propose several templates that can be used at different service counters. I will also explain how the data can be gathered centrally and what kinds of analysis we can perform on it. Until then, if you have any comments, please share them below.