7 Steps to Start Thinking Like a Data Scientist

Having the skills needed to perform data science work is immensely beneficial in a wide range of industries and job functions. But at some point, it is also advantageous to develop a thought process that allows you to tackle problems like a data scientist. Here are 7 steps you can take to start thinking like one.

1. Understand How the Project Lifecycle Works

Every project should be guided through a lifecycle that runs from preparation to building and on to finishing. Preparation means setting goals, exploring the available data, and assessing how you’ll do the job. Building requires planning, analyzing problems, optimizing your approach, and then writing viable code. Finally, finishing requires you to perform revisions, deliver the project, and wrap up loose ends. The lifecycle installs rails around the project to ensure it doesn’t suffer from mission creep.

2. Know How Time Factors into Cost-Benefit Analysis

Scraping the web for all the data you need may prove time-consuming, especially if the data needs aggressive cleanup. On the other hand, purchasing data from a vendor can be expensive in terms of capital. There’s rarely a perfect balance between time and money, so stay attuned to which matters more on a particular project.
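One way to keep yourself honest is to reduce both options to a single dollar figure. Here’s a minimal sketch of that arithmetic; every hour count, rate, and price is a hypothetical placeholder for your own estimates.

```python
# Reduce "scrape it ourselves" vs. "buy it" to comparable dollar figures.
def total_cost(hours_of_labor: float, hourly_rate: float, cash_outlay: float) -> float:
    """Convert a data-sourcing option into a single dollar figure."""
    return hours_of_labor * hourly_rate + cash_outlay

# Option A: scrape and clean in-house (heavy on time, light on cash).
scrape = total_cost(hours_of_labor=80, hourly_rate=65, cash_outlay=0)

# Option B: buy a cleaned dataset from a vendor (light on time, heavy on cash).
vendor = total_cost(hours_of_labor=8, hourly_rate=65, cash_outlay=4500)

print(f"Scraping: ${scrape:,.0f}  Vendor: ${vendor:,.0f}")
print("Cheaper option:", "scrape" if scrape < vendor else "vendor")
```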

3. Know Why You’ve Chosen a Specific Programming Language

All programming languages have their unique strengths and weaknesses. For example, MATLAB is a very powerful language, but it often comes with licensing issues. Java handles work with a high level of precision, but it can be cumbersome. R is an excellent choice for people who need core math functions, but it can be limiting when it comes to more advanced functionality. It is essential to think about how your choice of a programming language will influence the outcome of your project.

4. Learn How to Think Outside of Your Segment of Data Science

It’s easy to get caught in the trap of thinking certain processes are somehow more academically valid than ones aimed at the consumer market, or vice versa. While something like A/B testing can feel simple and firmly grounded in the consumer sector, it may apply just as well to projects that seem more technically advanced. Be open-minded in digesting information from sectors that are different from your own.
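As a case in point, the statistics behind a “simple” consumer A/B test transfer directly to more technical experiments. Here’s a minimal two-proportion z-test using only Python’s standard library; the conversion counts are hypothetical.

```python
# A two-proportion z-test for an A/B test, standard library only.
from math import erf, sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: variant B converts 156 of 2,380 visitors.
z, p = two_proportion_z(conv_a=120, n_a=2400, conv_b=156, n_b=2380)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests a real difference
```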

5. Appreciate Why Convincing Others is Important

Another common trap in data science is simply staying in your lane. Being a zealous advocate for your projects can make a real difference in getting them approved and resourced.

Develop relationships that encourage the two-way transmission of ideas and arguments. If you’re in a leadership position at a company, foster conversations with individuals who are closer to where the data gets fed into the meat grinder of analysis. Likewise, those down the ladder should be confident in presenting their ideas to people further up the chain. A good project deserves a representative who’ll advocate for it.

6. Demand Clean Data at Every Stage of a Project

Especially when there’s pressure to deliver work products, cleaning up data can sometimes feel like a secondary concern. Oftentimes, data scientists get their inputs and outputs cleaned up to a condition of “good enough” to avoid additional mundane cleaning tasks.

Data sets rarely just go away when a job is done; retaining them is simply good practice for the sake of auditing and reuse. But that also means someone else may get stuck swimming through a data swamp when they were expecting a data lake. Leave every bit of data you encounter looking cleaner than you found it.
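In practice, that cleanup pass can be short. Here’s a minimal pandas sketch of the habit; the file and column names are hypothetical.

```python
# Leave the data cleaner than you found it: a minimal pandas pass.
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical input file

df.columns = df.columns.str.strip().str.lower()  # normalize headers
df = df.drop_duplicates()                        # drop exact duplicates
df["email"] = df["email"].str.strip().str.lower()            # tidy key fields
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df = df.dropna(subset=["email"])                 # remove unusable rows

df.to_csv("customers_clean.csv", index=False)    # hand off clean data
```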

7. Know When to Apply Critical Thinking

Data science should never be a machine that continually goes through the motions and automatically spits out results. A slew of problems can emerge when a project is too results-oriented without an eye toward critical thinking. You should always be thinking about issues like the following (a quick check for the first one is sketched after the list):

  • Overfitting
  • Correlation vs. causation
  • Bayesian inference
  • Getting fooled by noise
  • Independent replication of results
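Overfitting, the first item above, is also the easiest to probe mechanically: compare a model’s accuracy on the data it trained on against data it has never seen. Below is a minimal scikit-learn sketch on synthetic data; a large gap between the two scores is the classic warning sign.

```python
# A minimal overfitting check: train accuracy vs. held-out accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree is prone to memorizing the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(f"train accuracy: {model.score(X_train, y_train):.2f}")  # likely ~1.00
print(f"test accuracy:  {model.score(X_test, y_test):.2f}")    # noticeably lower
```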

Welcome criticism and be prepared to ask others to show how they’ve applied critical thinking to their efforts. Doing so could very well save a project from a massive misstep.

Top 5 Critical Big Data Project Mistakes to Avoid

Going to work on a big data project can leave you wondering whether your organization is handling the job as effectively as possible. It’s wise to learn from some of the most common mistakes people make on these projects. Let’s look at 5 critical big data project mistakes and how you can avoid them.

Not Knowing How to Match Tools to Tasks

It’s tempting to want to deploy the most powerful resources available. This, however, can be problematic for a host of reasons. The potential mismatch between your team members’ skills and the tools you’re asking them to use is the most critical. For example, you don’t want to have your top business analyst struggling to figure out how to modify Python code.

The goal should always be to simplify projects by providing tools that match your team’s skills well. If a learning curve is required, you’d much prefer to have non-technical analysts figuring out how to use a simpler tool. For example, if the choice of programming language comes down to Python or R, there’s no question you want the less technically inclined folks working with R.

Failing to Emphasize Data Quality

Nothing can wreck a big data project as quickly as poor data quality. The worst-case scenario is that low-quality, poorly structured data is fed into the system at the collection phase, ends up being used to produce analysis, and makes its way into insights and visualizations.

There’s no such thing as being too thorough in filtering quality issues at every stage. You’ll need to keep an eye out for problems like the following (a sketch of automating a few of these checks appears after the list):

  • Misaligned columns and rows in sources
  • Characters that were either scrubbed or altered during processing
  • Out-of-date data that needs to be fetched again
  • Poorly sourced data from unreliable vendors
  • Data used outside of acceptable licensing terms
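A few of these checks lend themselves to automation so they run at every stage instead of relying on spot inspections. Here’s a minimal pandas sketch; the source file and the `updated_at` timestamp column are hypothetical stand-ins for your own feeds.

```python
# A minimal automated quality report for a data feed.
import pandas as pd

def quality_report(df: pd.DataFrame, max_age_days: int = 30) -> dict:
    report = {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "null_cells": int(df.isna().sum().sum()),
    }
    if "updated_at" in df.columns:  # hypothetical timestamp column
        age = pd.Timestamp.now() - pd.to_datetime(df["updated_at"], errors="coerce")
        report["stale_rows"] = int((age > pd.Timedelta(days=max_age_days)).sum())
    return report

df = pd.read_csv("vendor_feed.csv")  # hypothetical source file
print(quality_report(df))
```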

Data Collection without Real Analysis

It’s easy to assemble a collection of data without really putting it to work. A company can accumulate a fair amount of useful data without doing analysis, after all. For example, there is usually some value in collecting customer service data even if you never run a serious analysis on it.

If you don’t emphasize doing analysis, delivering insights, and driving decision-making, though, you’re failing to capitalize on every available ounce of value from your data. You should be looking for:

  • Patterns within the data
  • Ways to benefit the end customer
  • Insights to provide to decision-makers
  • Suggestions that can be passed along

Most companies keep logs of the activities of all the users who visit their websites. Generally, these are only used to deal with security and performance problems after the fact. You can, however, mine web logs to identify UX failures, SEO problems, and response rates for email and social media marketing efforts.
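Pulling marketing signals out of those logs can be as simple as counting referrers. Here’s a minimal sketch against a combined-format access log; the file name and referrer patterns are hypothetical examples you’d adapt to your own campaigns.

```python
# Tally email and social referrals from a combined-format access log.
import re
from collections import Counter

# Matches: "GET /path HTTP/1.1" 200 1234 "referrer"
referrer_re = re.compile(r'"[A-Z]+ \S+ \S+" \d{3} \S+ "([^"]*)"')
campaigns = Counter()

with open("access.log") as log:  # hypothetical log file
    for line in log:
        match = referrer_re.search(line)
        if not match:
            continue
        ref = match.group(1)
        if "utm_medium=email" in ref:  # hypothetical campaign tag
            campaigns["email"] += 1
        elif "twitter.com" in ref or "facebook.com" in ref:
            campaigns["social"] += 1

print(campaigns)
```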

Not Understanding How or Why to Use Metrics

An analysis isn’t particularly noteworthy if it isn’t tied to a set of meaningful and valuable metrics. In fact, you may need to run an analysis on the data you have available just to establish what your KPIs are. Fortunately, some tools can provide confidence intervals indicating which relationships in your datasets are most likely to be relevant.

For example, a company may be watching daily unique users for a mobile app. Unfortunately, that company might miss unprincipled or inaccurate activity, such as bot traffic, that inflates those figures. In such a situation, it’s important to look at metrics that draw a straight line to meaningful performance. Even if the numbers are legitimate, a crowd of unprofitable users burning through your bandwidth contributes nothing to the bottom line.

Underutilizing Automation

One of the best ways to recoup your team’s valuable time is to automate as much of the process as possible. While the machines will always require human supervision, you don’t want to see professionals spending large amounts of time on mundane tasks like fetching and formatting data. Fortunately, machine learning tools can be trained quickly to handle jobs like formatting collected data. If at all possible, find a way to automate the most time- and attention-intensive phases of projects.
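Even without machine learning, much of the mundane fetching and formatting yields to a small script. Here’s a minimal sketch; the feed URL is a hypothetical placeholder, and in production you’d hand the scheduling to cron rather than a sleep loop.

```python
# A minimal fetch-and-format automation sketch.
import time
import pandas as pd

FEED_URL = "https://example.com/daily_metrics.csv"  # hypothetical data feed

def fetch_and_format() -> pd.DataFrame:
    df = pd.read_csv(FEED_URL)
    df.columns = df.columns.str.strip().str.lower()  # consistent headers
    return df.drop_duplicates()

while True:  # crude scheduler for illustration; cron is the usual choice
    snapshot = fetch_and_format()
    snapshot.to_csv(f"metrics_{int(time.time())}.csv", index=False)
    time.sleep(24 * 60 * 60)  # once a day
```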

What is Search-Driven Analytics?

One of the core features of a data analytics package is usually the dashboard. This gives users access to insights and data from a variety of sources.

Unfortunately, one of the biggest challenges that comes with using a dashboard is sifting through the available information. This can be especially problematic at businesses that generate hundreds of insights on a daily basis. So how exactly does someone get right to what they need without wading through numerous reports or going down an alphabetized list?

What is Search-Driven Analytics Used For?

An answer that’s growing in popularity is search. This isn’t quite search in the sense you’re familiar with from Google, although it operates in much the same way. Instead, we’re talking about a search-driven system that allows users to type in everyday sentences to get results from your data.

For example, someone might go into the dashboard and type in “How many customer service calls did we take in 2019?” The backend for the dashboard will understand the query and return a list of results that are ranked from most relevant to least. Within seconds, the user can click on the appropriate analysis product and find the information they need.

How Does Search-Driven Analytics Work?

The engine that drives most search-driven technology is natural language processing (NLP). A technology that has been around since the 1950s, NLP has become vastly more powerful in the last decade due to increases in parallel processing in CPUs and GPUs.

An NLP system is usually trained on a corpus of words so the machine can score the basic ideas that underpin each dataset. When a user types in a search, the NLP algorithm compares the query against the scores it holds for each dataset. The datasets that look most likely to match are then returned as results.
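The scoring described above can be approximated with something as lightweight as TF-IDF and cosine similarity, a simple stand-in for the heavier language models production systems use. In this minimal sketch, the dataset descriptions are hypothetical:

```python
# Rank datasets against a natural-language query with TF-IDF scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

datasets = {  # hypothetical descriptions of available datasets
    "shipments": "current shipments origin destination carrier asia europe",
    "deliveries": "delivery times average duration by route and carrier",
    "fuel": "monthly fuel costs by vehicle region and carrier",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(datasets.values()))

query = "how long does the average delivery take"
scores = cosine_similarity(vectorizer.transform([query]), matrix).ravel()

# Most relevant datasets first, mimicking a ranked result list.
for name, score in sorted(zip(datasets, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.2f}")
```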

Especially in settings where organizations have narrow interests, NLP can be very powerful and precise. A logistics firm, for example, might have an NLP setup that can answer questions like:

  • “Where are all current shipments from Asia?”
  • “How long does the average delivery take?”
  • “What were the company’s fuel costs for January?”

These are extremely natural questions for a user to want answered, and the search-driven dashboard can address them with ease in the vast majority of cases.

The Role of Data Visualizations

Another benefit of this approach is that data is usually fed straight into the dashboard in the form of visualizations. If you enter a query to see how many widgets were sold each month, you can click on the right item and a graph of widget sales will appear on the right-hand side. There’s no need to run an analysis engine or load anything into Excel to get a viable work product. It simply appears in a matter of seconds.

Why Use This Approach?

The beauty of a search-driven system is that it can help users browse through data, create ad-hoc reports and make decisions on the fly. If someone needs to pull up the total for all store inventories of a handful of items within a retail chain, for example, they can type that into the search bar and get an answer in seconds. They can then see the data, produce any necessary reports or paperwork and move forward with their task of refilling inventory levels.

Notably, this approach makes it much easier for less tech-savvy individuals to follow the data. In turn, that frees software engineers and data scientists within your organization to focus on building more robust systems, working with data and fixing advanced problems.

Conclusion

In the modern data-driven culture, a lot is made of bringing people on board. Just 5 or 10 years ago, a lack of technical expertise often excluded people from functioning in a data-driven business setting.

The increasing ease of access to data due to tools like search-driven analytics makes it possible to bring more people on board. Likewise, it allows users to get their answers quickly rather than trying to navigate through complex interfaces. Search-driven analytics allows organizations to be more efficient and effective in leveraging the data they have access to.

How Data Analytics is Transforming Higher Education

Institutions of higher education are among the best-positioned adopters of analytics platforms and big data methods. They often have thousands of students, and their total enrollment over many decades or even centuries can sometimes exceed a million. Students also frequently take more than 40 classes to complete a single bachelor’s degree. Without even getting into more granular individual data, these figures alone represent an abundance of data to work with.

Putting this data into action, however, has required a commitment to ongoing digital and data transformations. These efforts often have been aimed at improving institutional efficiency. While this is a big target, there are several ways to hit it. Let’s take a look at how data analytics systems are transforming higher education.

What’s Being Used?

Traditionally, a good analytics package has to be backed by solid infrastructure. This means deploying database servers, oftentimes cloud-based ones, that can securely store large amounts of raw data. Likewise, these servers have to be designed with privacy and security in mind to protect sensitive student data.

Most data scientists then use a variety of solutions to prep the data for analysis. It’s not uncommon to write bespoke code for this purpose in order to correct minor issues. The data then has to be checked to make sure nothing is:

  • Altered
  • Lost
  • Placed in the wrong column or row in the database

Big Data services can then be connected to analytics packages to conduct research, develop models, generate reports, and produce dashboards. From these, insights can be generated that decision-makers can utilize. Likewise, long-term data warehousing is used to maintain and share the information accumulated from these efforts.

How Can an Institution Use Data Insights?

Student retention is one of the most difficult challenges that higher education institutions face on a yearly basis. Every semester, thousands of students will decide that completing college just isn’t within reach. Frustration eventually sabotages their academic efforts, and dropping out becomes a real risk.

Predictive and prescriptive analytics are needed to address this problem. First, predictive analytics packages allow researchers to model patterns regarding which students are most likely to have academic trouble as well as when these concerns may become critical. Once a student’s issues have been identified, prescriptive analytics will provide administrators with a list of potential solutions to apply.

Suppose students from a specific geographic background have a history of running into learning difficulties within the first two years of college. The university might have all students take certain core classes that provide a solid baseline for identifying those at risk. For example, a section of writing might be included as a core class to distinguish students who struggle with the basic skills needed to produce academic-quality papers.

Upon finishing the first semester of these classes, underperformers might be flagged based on the challenges that similar students have faced. A prescriptive analysis can then be used to assign them to classes or provide them with academic resources that will provide appropriate remediation to close their skills deficits. Academic support may be provided in the form of tutoring, mentoring, and other various resources.
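The predictive half of that workflow often boils down to a classifier trained on early-semester signals. Here’s a minimal scikit-learn sketch; the features, grades, and labels are hypothetical illustrations rather than real student records.

```python
# A minimal sketch of flagging at-risk students with a classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: first-semester GPA, writing-course grade, credits attempted.
X = np.array([[3.2, 3.0, 15], [1.9, 1.5, 12], [2.4, 2.0, 9],
              [3.8, 3.7, 16], [2.1, 1.8, 12], [3.5, 3.2, 14]])
y = np.array([0, 1, 1, 0, 1, 0])  # 1 = struggled within two years

model = LogisticRegression().fit(X, y)

# Score two hypothetical incoming students; higher = flag for follow-up.
new_students = np.array([[2.0, 1.7, 10], [3.6, 3.4, 15]])
print(model.predict_proba(new_students)[:, 1].round(2))
```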

A number of problems in higher education can be handled this way. A university, for example, might use analytics to address:

  • Faculty retention rates
  • The allocation of budgets and supplies
  • Campus crime
  • Sports team performance
  • Frictions between the school and the surrounding community
  • Regulatory compliance issues

What Has to be Done

Higher education is a sector with a bit of a reputation for keeping to traditional practices. However, adopting data analytics is a bit like quitting smoking: there is no better day to get started than today.

While institutional review processes need to be preserved, they should not stand in the way of aggressively rolling out analytics. Decision-makers have to be onboarded into a data-centric culture, even if that means offering severance to folks who can’t get on board. Appropriate measures have to be taken to acquire machines, adapt existing networks, and integrate the university’s trove of data. With a long-term commitment to becoming more data-driven, an institution can pursue its goals and serve its stakeholders more efficiently.

How to Leverage Your CRM with Big Data & BI Tools

Customer relationship management (CRM) systems are widely used in many businesses and organizations today. While it’s great to have all your customer information compiled and accessible in one source, you may not be maximizing the value of your CRM. In particular, big data, business intelligence (BI) and analytics packages offer an opportunity to take your customer data to the next level. Let’s take a look at how you can achieve this with your customer data.

How to Leverage CRM as a Source for Analysis

Most CRM systems operate on top of databases that are fully capable of feeding data into analytics and BI systems. While skilled database programmers can get a lot of mileage out of writing complex queries, the reality is that most of the data can simply be pulled into the big data pipeline by way of database connectors. These are small application components used to talk with database systems like MySQL, MongoDB, FoxPro, and MSSQL.
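In practice, that connector step is often just a few lines of code. Here’s a minimal sketch using SQLAlchemy and pandas against a MySQL-backed CRM; the connection string, table, and column names are all hypothetical.

```python
# Pull CRM records into pandas through a SQLAlchemy database connector.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, table, and columns.
engine = create_engine("mysql+pymysql://analyst:secret@crm-db/crm")

customers = pd.read_sql(
    "SELECT customer_id, signup_date, last_order_date, lifetime_value "
    "FROM customers",
    engine,
)
print(customers.head())
```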

Once you’ve pulled the data into your analysis engine, a host of functions can be used to study it. For example, a company might use the information from their CRM to:

  • Perform a time-series analysis of customer performance
  • Analyze which appeals from email marketing content seem to drive the greatest returns
  • Determine which customers are at risk of moving on to competitors
  • Find appeals that reinforce customer loyalty
  • Spot customer service failures
  • Analyze social media postings by customers to assess their experience with the company

What Methods Are Used?

Suppose your business wants to determine which email campaign appeals are worth reusing. Working from copies of email content, you can conduct a word cloud analysis that shows which concepts were strongly featured. Response data from the CRM can then be used to identify which words and phrases have performed best. 

These items can then be organized into a BI dashboard widget that tells those writing the emails how to structure their content. For example, a market research firm might find that case studies drive more users into the marketing funnel than news about the practice does. Marketers can then write new campaign emails based on that data. Almost as important, they can also assess email performance and refine their approach until the material is exemplary.
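A bare-bones version of that word-performance analysis needs little more than a couple of dictionaries. In this sketch, the subject lines and response rates are hypothetical stand-ins for what you’d pull from the CRM:

```python
# Score words by the average response rate of the emails they appear in.
from collections import defaultdict

campaigns = [  # hypothetical (subject copy, response rate) pairs from the CRM
    ("new case study cuts churn in half", 0.062),
    ("practice news and quarterly updates", 0.018),
    ("case study boosts retention results", 0.055),
]

totals, counts = defaultdict(float), defaultdict(int)
for text, rate in campaigns:
    for word in set(text.split()):
        totals[word] += rate
        counts[word] += 1

avg_rate = {word: totals[word] / counts[word] for word in totals}
for word, score in sorted(avg_rate.items(), key=lambda pair: -pair[1])[:5]:
    print(f"{word}: {score:.3f}")  # words from high-performing emails rank first
```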

Techniques and data sources that are ideal for projects like this include:

  • Sentiment analysis
  • Marketing data
  • Pattern recognition
  • Word cloud analysis

Such analysis will also require a constant stream of data going from the CRM into the analytics engine and onward to the BI dashboards. Done right, this sort of big data program can convert something you’re already accumulating, such as customer relationship data, into insights that drive decision-making at all levels.

How Much Data is There?

Data for analysis can come from a slew of sources, and it’s important to have a CRM that allows you to access as many potential data sources as possible. For example, a company shouldn’t draw the line at collecting email addresses. You can also ask customers to include their social media accounts, such as handles from Twitter, LinkedIn, and Instagram.

Server logs should also be mined for interesting data points. You can, for example, study IP addresses and user logins to determine where a prospective customer might be in the marketing funnel. If you see that a lot of leads are dropping out at a certain stage, such as after signing up to receive your email newsletter, you can then start to analyze what’s misfiring at this stage in the process. 

Once the problem is corrected, you can even use the CRM data to identify which customers you should reconnect with or retarget. You might, for example, send an offer for discounted goods or services to increase your customer lifetime value.

Conclusion

At many businesses, the CRM system is a highly underutilized resource. By coupling it with big data and an effective BI package, you can quickly turn it into a sales-driving machine. Team members will be excited to see the new marketing and sales tools at their disposal, and customers will value the increased engagement.

Shedding Light on the Value in Dark Data

Hearing that your organization has dark data can make the data sound ominous and menacing. Saying that data is dark, however, is closer to what people mean when they say a room is dark: there’s the potential for someone to switch on the light and make what was unseen visible.

What is Dark Data?

Every operation in the world produces data, and most of those entities record at least some of it regardless of whether they make further use of it. For example, many businesses collect information about sales, inventories, losses, and profits just to satisfy the basic reporting requirements for taxes and how their companies are set up. You might also have a complete customer service department that’s producing data all the time through daily chats, emails, and many other forms of communication. Even maintaining a social media presence means creating data.

Such data is considered dark if it isn’t put to other uses. Shining a light on dark data can allow a company to:

  • Conduct analysis
  • Create sellable data products
  • Learn about relationships
  • Supply insights to decision-makers

By definition, dark data is an unutilized resource. Owning dark data is like keeping things in storage that never or rarely get used. In other words, if you tolerate the existence of dark data within your organization, you’re at risk of leaving money on the table. In fact, you may be taking a loss on dark data because you’re paying to store it without ever monetizing it.

How to Bring Dark Data into the Light

The first order of business is figuring out exactly what your organization has in the way of data sources. Some things will be fairly obvious, such as sales data from a POS and inventory numbers from an ICS. Other data sources may be trickier to find, but they can be discovered by the following (see the sketch after this list):

  • Surveying your team members to learn what data different departments collect
  • Conducting audits of computing systems to identify databases, log files, and spreadsheets
  • Scanning through social media feeds, including direct messages from customers
  • Collecting corporate data, such as financial statements and email correspondence
  • Studying call records
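Here’s a minimal sketch of the audit idea: walk a shared drive and tally anything that looks like a data file. The scan root and extension list are hypothetical starting points.

```python
# Walk a shared drive and tally files that look like data sources.
import os
from collections import Counter

DATA_EXTENSIONS = {".csv", ".xlsx", ".db", ".sqlite", ".log", ".json"}

found = Counter()
for root, _dirs, files in os.walk("/shared"):  # hypothetical scan root
    for name in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in DATA_EXTENSIONS:
            found[ext] += 1

for ext, count in found.most_common():
    print(f"{ext}: {count} file(s)")  # candidates for the dark data inventory
```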

It’s also wise to think about places where you could be collecting more data. For example, a customer service system that isn’t sending out surveys is letting a perfectly good opportunity go to waste.

What to Do Now That You See the Data

The second order of business is figuring out how to draw more insights from your data. Companies accomplish this by:

  • Creating data lakes and providing access to them
  • Auditing databases for potentially useful information
  • Implementing or expanding data science projects
  • Developing data-centric corporate cultural practices
  • Adding resources to do machine learning and stats work
  • Hiring new professionals who can dig into the data

Much of this hinges on moving forward with a data-centric culture. Even if you already feel you have one, there’s a lot to be said for taking stock of who your team members are and how experienced data users can help you put dark data to work.

The third order of business is establishing goals for your projects. If you run a company that has potential legal risk exposure due to compliance problems involving laws like HIPAA and the GDPR, for example, you might analyze the very way your organization stores information. A company that collects huge amounts of anonymous data from millions of users might figure out how to package that data into sellable information products, such as reports or datasets. You may even cut costs by determining what data is useless, potentially removing terabytes of information from storage.

Conclusion

Modern organizations collect so much data that it’s hard for them to clearly imagine what they have at their disposal. It’s important to take an extensive look at all the ways dark data may be residing in your systems. By being a bit more aggressive and imaginative, you can find ways to improve processes, cut costs, and even drive profits.

New Questions? How to Find Answers in the Face of Uncertainty

Recent world events have exposed a wide range of issues in terms of how companies implement processes and use data to make decisions. It’s abundantly clear that many enterprises came into the new year simply unprepared to make the sorts of decisions needed to persevere through difficult times, especially when the situation involves highly unexpected events.

If you’ve found yourself trying to answer questions you’ve never even pondered, you may be wondering what tools could help you get ahead of these circumstances. There’s a strong argument that having a data warehouse in place can make a major difference as organizations struggle to make these sorts of novel decisions. Whether you already have a big data warehousing system in place or are just now realizing the importance of one, it’s wise to think about precisely how prepared or unprepared you might be. Let’s look at how a well-implemented data warehouse operation can help you get answers quickly in a rapidly evolving situation.

What is a Data Warehouse? 

The difference between a massive collection of data and a data warehouse is that warehousing is designed to enable long-term use of data. A warehouse aggregates massive amounts of structured data from many sources, and it’s designed to enable analysis and reporting. While a database can answer queries, the information in a data warehouse can be used to find relationships between different data points.

How a Solid Data Warehouse Can Help Decision-Making

Suppose you were a producer of paper products based in California at the beginning of 2020. By the end of January, sources of wood pulp from China have dried up and competition for American sources has become fierce. You want to start looking for suppliers in Latin America, but you don’t know where to begin.

A good database might store all of that information, but storage alone isn’t what a data warehouse provides. For a data warehouse to contribute in a situation like this one, it needs to tie together data about supplier pricing and the shipping costs of getting material to California.

You don’t want to research all of this from scratch, given the time and money you’d sacrifice in the process. Instead, a well-prepared data warehouse should be constantly digesting the necessary data to give you actionable information at the touch of your company’s analytics dashboard. In fact, a top-quality system with predictive and prescriptive analytics should help you compare and contrast the available options in an instant. You might never have talked with suppliers in new territories before, but your data warehouse can give you a diverse range of commodity pricing and shipping rates from the region so you can start having that conversation today rather than next week.
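At its core, that capability is a join across sources the warehouse keeps current at all times. Here’s a minimal pandas sketch of the paper-company example; every supplier name and figure is hypothetical.

```python
# Join supplier pricing to freight costs and rank by landed cost.
import pandas as pd

pulp_prices = pd.DataFrame({  # hypothetical supplier quotes per ton
    "supplier": ["BrasilPulp", "ChileFiber", "MexPulp"],
    "price_per_ton": [580, 605, 615],
})
shipping = pd.DataFrame({  # hypothetical freight rates to California per ton
    "supplier": ["BrasilPulp", "ChileFiber", "MexPulp"],
    "freight_per_ton": [95, 70, 40],
})

landed = pulp_prices.merge(shipping, on="supplier")
landed["landed_cost"] = landed["price_per_ton"] + landed["freight_per_ton"]
print(landed.sort_values("landed_cost"))  # cheapest total option first
```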

Structured Data and Being Prepared

You’ll note that the example depends heavily on having large amounts of structured data available right away. A big part of being prepared for unforeseen events is investing early in the necessary infrastructure and data sources. You’re going to want to make serious investments in:

  • Servers
  • Cloud computing
  • Databases and redundant storage media
  • High-speed networking, including fiber optic cabling
  • Computational resources, such as GPUs and machine learning programs
  • Appropriate sources of structured data, including subscriptions to service providers that cater to your industry

It also means having highly failure-tolerant code constantly churning the inputs to ensure analysis can be run at a moment’s notice. Likewise, actionable data is likely to require security, as it has the potential to become a trade secret you’ll want to defend. Finally, you’ll need people who can understand the intricacies of data science and decision-makers who’ve been fully onboarded with using analytics insights on a daily basis.

Conclusion

Becoming a highly prepared, data-centric business is a bit like quitting smoking: there is no better time than today. Regardless of where you’re at in the process, it’s a good idea to think about how prepared your operation is on the data side to help you respond to unexpected crises. Even if you feel thoroughly prepared at this very moment, a culture of constant improvement and preparedness will have your data warehouse ready to answer the next big round of questions you’ve never asked before.

Top 5 Big Data Myths Debunked

Thanks to all the buzz around the term Big Data, there are numerous outlooks and subsequent myths on the topic. While Big Data is an amazing tool that has numerous applications, it’s far from the magical fix for every analytics need. Let’s explore five of the top Big Data myths and the truth behind them.

Myth #1: Machines Are Always Better Than People

To be clear, computers have two core advantages compared to individuals. First, a computer has the advantage of raw speed, being able to sort through more numbers in a few minutes than a person could in their entire lifetime. Second, a computer can quickly detect patterns and execute formulas across seas of data.

People tend to adapt to situations with limited information much better than machines do. For example, all self-driving car systems on the market still require significant human intervention in a wide range of situations. This is partly due to the myriad situations where unusual or downright novel things occur during any vehicle trip. Put simply, Big Data has to be paired with human decision-making in order to ultimately be effective.

Myth #2: Big Data Is the Solution to Everything

It’s common for trends to be pushed far outside their proven applications. After all, everybody wants to be the Uber of something.

The same issue applies in the world of Big Data. Some problems, however, just don’t lend themselves to mass computation efforts. This can arise due to things that machines struggle to identify, such as:

  • Limited available data
  • Biases built into a dataset
  • Inapplicable information
  • Flawed data

When a Big Data system is asked to analyze a problem, it doesn’t stop to ask whether the data is suitable. If the job is machinable, the computer will accept it.

Myth #3: Big Data Is Too Expensive

The word “big” creates an unfair perception that nothing less than a cluster of supercomputers, each packed with 8 high-end GPUs, is worth the bother. Nothing could be further from the truth.

Analytics packages have become very accessible, and many can be run right on a typical multicore desktop, laptop or even phone. In fact, many companies have been working hard to deploy cost-effective data processing software which can be used for a variety of data projects. This means that Big Data systems can be deployed in small settings, providing access to IoT devices, machine learning, AI, and dashboards nearly anywhere and at a fair per-unit price.

Myth #4: Big Data Is Only for IT

The notion of Big Data being about computers is a bit like thinking that carpentry is about nails and hammers. With Big Data, the goal is to create insights that can be applied in a variety of fields. Much as you would use a hammer and nail to build a house, you can use Big Data to build insights that will drive actions and informed business decisions.

The retail sector was an early adopter of Big Data, driven by the recognition that retail had missed the train on the dot-com boom. Companies in retail use Big Data to collect and process information like:

  • Social media sentiments
  • Inventory numbers & forecasting
  • Trends in customer tastes
  • Buying processes
  • Global supply chains

If Big Data can revolutionize the way people buy apparel and shoes, it can do a lot of good in many other sectors as well.

Myth #5: Big Data Isn’t for the Little Guy

Small operations have some major advantages when it comes to Big Data. Major players face the same challenge as turning an ocean liner: it’s difficult to ignite change within any massive corporation. Conversely, the agility afforded to many small companies allows them to gather insights and react to them in a matter of weeks. Paired with the cost-effectiveness of modern Big Data systems, this presents an advantage that small businesses can leverage rapidly.

Big Data vs. Small Data: Does Size Matter?

After years of talk about big data, hearing about small data can feel like a major pivot by the manufacturers of buzzwords. Small data, however, represents its own revolution in how information is collected, analyzed and used. It can be helpful, though, to get a handle on the similarities and differences between big and small data. Likewise, you should consider how the size of the data in a project impacts the project as a whole and what other aspects are worth looking at.

How Small and Big Data Are Similar and Different

Both are typically the products of systems that extract information from available sources to conduct analysis and derive insights. At the big end of the scale, the goal is to filter through massive amounts of information to identify things like trends, undiscovered patterns and other bits of knowledge that may occur at scales that are hard for an individual analyst to easily identify. Moving to the small end of the scale, you tend to get into more granular data that will often be more digestible for one person.

Macro trends tend to be big data. If you’re trying to figure out how the bond spread relates to shifts in banking stocks, for example, you’re probably working on the big end of the scale.

Small data is granular, and it may or may not be the product of drilling down into big data. For example, a company trying to target social media influencers likely isn’t looking to just turn up numbers. Instead, they want to have a list of names that they can connect with to put a marketing campaign into action.

Another feature of small data is that it’s often most prominent at either end of the analysis cycle. When individual user information goes into a database, for example, that’s all small data. Similarly, targeted insights, such as the previously mentioned social media marketing plan, represent potential applications.

Small data is also frequently more accessible to individual customers. It’s hard to tell an e-commerce customer why they should care about macro trends, even if you’re looking at what’s going to be cool next season. Conversely, if you can identify an interest and send a coupon code, they can put small data to use right away.

Does Size Matter?

The best way to think about this question is to consider the importance of using the right tool for the job. When sending coupon codes, for example, small data is a great tool because you can tailor each offer to an individual, a peer group or an identifiable demographic. As was noted, this can get very granular, such as providing hyper-localization of a push notification that only sends out a targeted offer when the customer is near a physical store location.

Small data can be the wrong tool for many jobs, too. An NHL goaltender may need to see 2,000 shots before they can be fully assessed, for example. Thinking too much about a single good or bad season can skew the assessment significantly. A seemingly small data issue, player evaluation, calls for a big data mentality.

Other Factors to Look At

A good way to think about the other factors in assessing big versus small is to use the three V’s. These are:

  • Volume
  • Variety
  • Velocity

Volume speaks to the question of how much data there is. While there’s a temptation to always want to feed a model more data, there’s an argument on the small data end of things that consumable metrics are better. In other words, if you feel like a problem demands volume, it’s likely a big data task. Otherwise, it’s probably a small data issue.

Variety also indicates whether big or small data is the right way to go. If you need to drill down to a handful of metrics, small data is invaluable. If you need to look at many different data points, it may be a job for big data.

Velocity matters because data tends to come in waves. This gets a little trickier because both small and big data needs can require constant refreshes. Generally, if you’re looking to accumulate, it’s big data. If you’re trying to stay up to date, it’s small data.

5 Top Strategies for Building a Data-Driven Culture

For many businesses and organizations, moving toward a data-driven culture is essential to their survival. These sorts of vague exhortations, though, don’t do a great job of setting an organization on the path to becoming a data-driven culture. If you want to make data a focus of your operation, follow these 5 strategies.

1. Understand Why You Want to be Data-Centric

Before you can execute other strategies, it’s critical to make sense of where data fits into your organization’s goals and why you’re heading in this direction. For example, there’s a huge difference in mentality between trying to catch up with competitors and helping your business take advantage of opportunities. You can’t develop a data-driven culture just because it’s the most recent trend.

Nail down the basic opportunities in your industry and at your organization. Think about how customers might benefit from dealing with a more data-centric business. A clothing retailer, for example, might determine that it wants to be more data-centric because it needs to:

  • More accurately track and predict trends
  • Streamline its inventory and purchasing processes
  • Identify non-recurring customers and how to increase retention rates

Use the “why” behind each of these items to guide later strategic efforts.

2. Determine Who Must Be Onboarded

As much as companies talk about bringing everyone onboard with a data-centric mentality, the reality is that the cleaning staff probably doesn’t need training sessions to get them on board. Think about who would be the target of a serious quip like, “Don’t you know we’re a data-centric business?” If a person is on that list, they need to be brought on board.

Be aware that executives are especially important in forming the new culture of data in a company. If the people at the top don’t understand why they’re suddenly being overwhelmed with dashboards, charts, and analysis, you’re going to have a hard time getting others to participate.

Also, be prepared to sever ties with people who won’t or can’t get on board. It needs to be clear to employees that the organization’s future success depends on continuously strengthening the data culture. That applies even if it means absorbing short-term losses and lengthy hiring processes; losing people who aren’t on board and retaining those who are is critical to cultivating a data culture.

3. Form a Democratic Attitude Toward Data Access

Departments often hold onto data pools for a variety of reasons, including:

  • Unawareness of the value of their data to others
  • Interdepartmental rivalries
  • Poor organizational practices
  • Lack of social and computer networking to other departments

A data lake that every authorized party in the company has access to can foster innovation. Someone in marketing, for example, might be able to discover trends by looking at data from the inventory side of the operation.

To be clear, there’s a difference between being democratic and anarchistic. Access control is essential, especially for data that is sensitive for compliance, trade secrecy and privacy reasons. Good admins will help you ensure that all parties have appropriate levels of access.

4. Know What Infrastructure Must Be Built Out

A data-driven culture marches on a road paved with cabling, servers and analytics software. If your company hasn’t upgraded networking in over a decade, for example, you may want to look into having the work done to speed up access. Similarly, you’ll have to make decisions about building servers onsite versus using cloud-based solutions, adopting specific software stacks and choosing particular team processes.

5. Learn How to Measure Performance

Lots of great insights come from projects that don’t necessarily put money in the company’s bank accounts on day one. On the other hand, it’s easy to let employees foster pet projects in their own fiefdoms without much supervision if you turn them loose with resources.

The solution is to implement meaningful measures of performance. Promotions and raises need to be tied to turning projects into successes for the whole company. While people need room to be able to learn, they also need encouragement to work efficiently and quickly move on to exploring additional ideas.

Establish the metrics that matter for your data-driven cultural revolution. As the effort moves forward, look at the data to see how well the push is succeeding. Be prepared to revise metrics as conditions change, too. By following the data to its logical conclusions, you’ll find a host of new opportunities waiting to be capitalized on.
