After years of talk about big data, hearing about small data can feel like a major pivot by the manufacturers of buzzwords. Small data, however, represents its own revolution in how information is collected, analyzed and used. To make sense of it, it helps to get a handle on the similarities and differences between big and small data. You should also consider how the size of the data in a project affects the project as a whole, and what other aspects are worth looking at.
Both big and small data are typically the products of systems that extract information from available sources to conduct analysis and derive insights. At the big end of the scale, the goal is to filter through massive amounts of information to identify trends, undiscovered patterns and other bits of knowledge that occur at scales too large for an individual analyst to spot easily. At the small end of the scale, you tend to get more granular data that is often digestible for one person.
Macro trends tend to be big data. If you're trying to figure out how the bond spread relates to shifts in banking stocks, for example, you're probably working on the big end of the scale.
Small data is granular, and it may or may not be the product of drilling down into big data. For example, a company trying to target social media influencers likely isn't looking to just turn up numbers. Instead, it wants a list of names that it can connect with to put a marketing campaign into action.
Another feature of small data is that it's often most prominent at either end of the analysis cycle. When individual user information goes into a database, for example, that's all small data. At the other end, targeted insights, such as the previously mentioned social media marketing plan, are small data put to use.
Small data is also frequently more accessible to individual customers. It's hard to tell an e-commerce customer why they should care about macro trends, even if you're looking at what's going to be cool next season. Conversely, if you can identify an interest and send a coupon code, they can put small data to use right away.
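The contrast between a macro number and a granular, actionable list can be sketched in a few lines of Python. This is only an illustration: the account records, field names and thresholds below are hypothetical and do not come from any real dataset or platform API.

```python
from statistics import mean

# Hypothetical records; names and figures are made up for illustration.
accounts = [
    {"name": "Account A", "followers": 120_000, "engagement_rate": 0.041},
    {"name": "Account B", "followers": 8_500, "engagement_rate": 0.067},
    {"name": "Account C", "followers": 310_000, "engagement_rate": 0.012},
    {"name": "Account D", "followers": 64_000, "engagement_rate": 0.035},
]


def macro_view(records):
    """Big-data style output: one aggregate number describing everyone."""
    return mean(r["engagement_rate"] for r in records)


def drill_down(records, min_followers, min_engagement):
    """Small-data style output: a short list of names someone can act on."""
    return [
        r["name"]
        for r in records
        if r["followers"] >= min_followers
        and r["engagement_rate"] >= min_engagement
    ]


print(f"Average engagement: {macro_view(accounts):.3f}")
print("Accounts to contact:", drill_down(accounts, 50_000, 0.03))
```

The aggregate is useful for spotting a trend, while the filtered list is what a marketing team would actually work from.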
The best way to decide between big and small data is to consider the importance of using the right tool for the job. When sending coupon codes, for example, small data is a great tool because you can tailor each offer to an individual, a peer group or an identifiable demographic. As noted, this can get very granular, such as hyper-localizing a push notification so that a targeted offer goes out only when the customer is near a physical store location.
Small data can be the wrong tool for many jobs, too. An NHL goaltender may need to face 2,000 shots before they can be fully assessed, for example. Thinking too much about a single good or bad season can skew the assessment significantly. Player evaluation looks like a small data issue, but it calls for a big data mentality.
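The goaltender example comes down to sample size. Here is a rough sketch, assuming each shot is independent and that a save percentage behaves like a simple binomial proportion, which is a simplification. The .910 rate and the shot counts other than 2,000 are illustrative assumptions, not figures for any real player.

```python
import math


def save_pct_margin(true_rate: float, shots: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed save percentage."""
    # Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
    standard_error = math.sqrt(true_rate * (1 - true_rate) / shots)
    return z * standard_error


ASSUMED_TRUE_RATE = 0.910  # hypothetical underlying save percentage

for shots in (200, 1000, 2000):
    margin = save_pct_margin(ASSUMED_TRUE_RATE, shots)
    print(f"{shots:>5} shots: +/- {margin:.3f}")
```

Under these assumptions the margin shrinks with the square root of the shot count, so a small sample leaves far more room for a misleading read than a large one.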
A good way to think about the other factors in assessing big versus small is to use the three V's: volume, variety and velocity.
Volume speaks to the question of how much data there is. While it's always tempting to feed a model more data, the small data argument is that a few consumable metrics are often better. In other words, if you feel like a problem demands volume, it's likely a big data task. Otherwise, it's probably a small data issue.
Variety also indicates whether big or small data is the right way to go. If you need to drill down to a handful of metrics, small data is invaluable. If you need to look at many different data points, it may be a job for big data.
Velocity matters because data tends to come in waves. This gets a little trickier because both small and big data needs can require constant refreshes. Generally, if you're looking to accumulate data, it's big data. If you're trying to stay up to date, it's small data.