By: Kathleen Ayers, Data Integration Engineer Manager
In the world of data, bigger is better. Bigger data produces more insights and more robust analyses. A big database is a competitive advantage, and data companies are continually looking for ways to expand and enhance their data sets to get an edge on their competition.
Large databases can provide a seemingly endless number of ways to solve a problem or answer a question. In theory, those possibilities sound great. However, as Barry Schwartz explains in his TED Talk on the paradox of choice, while some choice is better than no choice, too much choice can produce paralysis rather than liberation.
Let’s say as a hotel revenue manager you notice a year-over-year decline in mid-week transient RevPAR over the past month, and you want to figure out the cause of the decline. You could approach your data from several different angles, but how can you find the best-suited, most accurate solution to your problem without feeling overwhelmed?
Here are three simple steps that will help you maximize your efficiency, minimize your choice paralysis and guide you to the best solution.
1. Formulate a research question
Determine the exact research question you need to answer before diving into your data. If you wander into the forest of data without a set destination, you may get lost (hopefully you left a trail of breadcrumbs). Approach your data with a clear intention and purpose in mind so that you can avoid distractions and tangents.
In our example, you state your research question as, “Why did transient RevPAR suffer a 10% year-over-year loss in April 2017?”
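For reference, RevPAR is room revenue divided by available room nights, so the 10% figure in the question is a comparison of two such ratios. Here is a minimal Python sketch of how that change could be computed. The function names and all of the numbers are made up for illustration and are not taken from any real hotel.

```python
def revpar(room_revenue: float, available_room_nights: int) -> float:
    """Revenue per available room: room revenue / available room nights."""
    if available_room_nights <= 0:
        raise ValueError("available_room_nights must be positive")
    return room_revenue / available_room_nights


def yoy_change(current: float, prior: float) -> float:
    """Year-over-year change, expressed as a fraction of the prior-year value."""
    if prior == 0:
        raise ValueError("prior-year value must be non-zero")
    return (current - prior) / prior


# Illustrative figures only: a hypothetical 100-room hotel over 30 April nights.
april_2016 = revpar(room_revenue=150_000.0, available_room_nights=3_000)
april_2017 = revpar(room_revenue=135_000.0, available_room_nights=3_000)

change = yoy_change(april_2017, april_2016)
print(f"Transient RevPAR change: {change:.1%}")
```

Writing the metric down this explicitly keeps the research question precise: it names the measure, the two periods being compared, and the size of the gap you are trying to explain.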
2. Write a hypothesis
Take an informed guess at the answer to your research question. Quickly brainstorm the potential paths and data points you could use to reach a conclusion and choose a path that seems promising. If your path leads to a dead end, back up and try again.
Given your knowledge of the hotel, you hypothesize “Transient RevPAR decreased in April due to the loss of two large corporate contracts.” When you test this hypothesis, you find that corporate business actually appears relatively stable; therefore, you try again with “Transient RevPAR decreased because of a decline in package offers.”
3. Restrict your sample size
Think about whether you really need to use ALL of the data available to draw a conclusion. In statistics, you learn that a representative sample of a population is often enough to draw meaningful conclusions. Sometimes, focusing on a small set of confirmation numbers or a small group of hotels will prove your point.
To test hypothesis 1, you can look specifically at transient corporate RevPAR in April 2016 versus April 2017. To test hypothesis 2, you can look at transient package rates that were offered each year, which leads you to find that several package rates have been discontinued since last year. In both cases, your hypothesis is specific enough to allow you to look at a limited set of data to find answers.
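As a rough illustration of restricting the data to what each hypothesis needs, the sketch below filters a small, made-up list of booking rows down to one segment and the two Aprils being compared. The field names (stay_month, segment, revenue), the segment labels, and the room-night count are all assumptions for this example, not a real schema.

```python
from collections import defaultdict

# Hypothetical, made-up booking rows; field names are assumptions for this sketch.
bookings = [
    {"stay_month": "2016-04", "segment": "corporate", "revenue": 60_000.0},
    {"stay_month": "2016-04", "segment": "package", "revenue": 40_000.0},
    {"stay_month": "2016-04", "segment": "group", "revenue": 50_000.0},
    {"stay_month": "2017-04", "segment": "corporate", "revenue": 59_000.0},
    {"stay_month": "2017-04", "segment": "package", "revenue": 26_000.0},
    {"stay_month": "2017-04", "segment": "group", "revenue": 50_000.0},
]
AVAILABLE_ROOM_NIGHTS = 3_000  # assumed to be the same in both months


def segment_revpar(rows, segment, months):
    """Return {month: RevPAR} using only rows for one segment and the given months."""
    revenue = defaultdict(float)
    for row in rows:
        # Ignore everything outside the slice the hypothesis is about.
        if row["segment"] == segment and row["stay_month"] in months:
            revenue[row["stay_month"]] += row["revenue"]
    return {month: revenue[month] / AVAILABLE_ROOM_NIGHTS for month in months}


months = ("2016-04", "2017-04")
corporate = segment_revpar(bookings, "corporate", months)  # hypothesis 1
package = segment_revpar(bookings, "package", months)      # hypothesis 2
print("corporate:", corporate)
print("package:", package)
```

With these invented numbers, the corporate slice barely moves between the two years while the package slice drops sharply, which mirrors the outcome described above: each hypothesis can be checked against a narrow slice rather than the full data set.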
Whether you are reading a paper report or querying your company database, specifying a research question, writing a hypothesis, and restricting your data set will help you navigate the sea of big data efficiently. By limiting your choices, you will find a clear and direct route to your solution.