According to research published by IBM Marketing Cloud in 2017, 90% of the world's data had been generated in the preceding two years. Just imagine how much data has been created since then!
The use of Big Data may be in its early stage, but the act of creating, gathering, and storing large piles of data is ages old. The concept gained momentum in the early 2000s when Doug Laney, an industry analyst, defined Big Data in terms of the three Vs.
Volume: This refers to the amount of data that has accumulated over the years and is growing rapidly. Experts and industry big-wigs predicted that over 40 Zettabytes (40,000 Exabytes) of data would be produced by 2020, which is 300 times more than in 2005.
Velocity: This is the pace at which different sources and media generate data. Over 1.03 billion Facebook users are active every day, generating a massive and continuous stream of data. This gives a clear idea of how much more data social media generates than any other medium. Experts have noted that keeping up with data velocity makes it possible to generate insights from real-time data.
Variety: The type of data varies because of the many different sources that contribute to Big Data. The data can come in any form, from structured and semi-structured to unstructured. Gone are the days when data existed only in Excel sheets and databases. Now, data is generated and gathered as audio, images, sensor readings, videos, and web logs.
The Introduction of Two More V’s: Veracity and Value
These two additional Vs are not part of the original 3Vs, but they have become increasingly significant to the Big Data concept. Veracity refers to the accuracy of data. With the ever-growing piles of data, it has become very difficult to filter out the data that genuinely holds value. To make this clearer, consider social media content, which is often volatile in one way or another.
Value refers to the process of analyzing data and extracting value from Big Data content. Since Big Data systems do not analyze information the way a human brain does, raw data by itself brings little value. It is important to find methods that turn all this data into a set of factual, meaningful insights for our target audiences.
3 Different Types of Big Data
The influx of more media, storage options, and devices has not only increased the volume of data but also the different types of data.
Structured Data
Data that can be stored in a fixed, structured format is classified as Structured Data. An RDBMS (Relational Database Management System) table is a good example of structured data. Such data is very easy to process and utilize because it has a fixed schema, i.e. a defined way in which the data is organized. You can use SQL for this purpose.
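To make "fixed schema plus SQL" concrete, here is a minimal sketch using Python's standard-library sqlite3 module; the table, columns, and rows are hypothetical examples, not part of any real system:

```python
import sqlite3

# Structured data: every record fits the same fixed schema (id, name, city).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Mumbai"), ("Bob", "Delhi"), ("Carol", "Mumbai")],
)

# Because the schema is fixed, questions about the data are simple SQL queries.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Delhi', 1), ('Mumbai', 2)]
```

The same query would work unchanged on millions of rows, which is exactly why structured data is the easiest of the three types to process.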
Semi-Structured Data
This is data that doesn't have a fixed structure or data model, but does have some organizational features in the form of tags that segregate semantic elements within the data. XML files and emails are examples of semi-structured data.
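A small sketch of what those tags buy you, using Python's standard-library XML parser; the email fields shown are illustrative, and note that the second record carries an extra field the first one lacks, which is precisely what "no fixed schema" means:

```python
import xml.etree.ElementTree as ET

# Semi-structured data: no rigid schema, but tags mark out semantic elements.
raw = """
<messages>
  <email>
    <from>alice@example.com</from>
    <subject>Quarterly report</subject>
  </email>
  <email>
    <from>bob@example.com</from>
    <subject>Lunch?</subject>
    <priority>low</priority>
  </email>
</messages>
"""

# The tags let us pull out a specific semantic element (the sender)
# even though the two records don't share an identical structure.
root = ET.fromstring(raw)
senders = [e.findtext("from") for e in root.iter("email")]
print(senders)  # ['alice@example.com', 'bob@example.com']
```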
Unstructured Data
According to some experts, 80%-90% of all data is unstructured, which explains why we have been able to tag barely 3% of it. Unstructured data refers to data that is difficult for machines to identify and classify. Perhaps surprisingly, unstructured data is usually text-heavy. Text messages are one of the biggest examples, as they aren't arranged in any logical way.
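To see why unstructured text resists the query-style processing shown above, consider this toy sketch (the messages are invented examples): with no schema or tags, even a simple question falls back to crude pattern matching over raw strings.

```python
# Unstructured data: free-form text with no schema and no tags.
messages = [
    "hey r u free 2moro?",
    "Meeting moved to 3pm, don't be late!!",
    "lol ok see u then",
]

# "Which messages mention a meeting?" becomes a naive substring search,
# which misses paraphrases ("see u then") and matches only exact words.
mentions_meeting = [m for m in messages if "meeting" in m.lower()]
print(mentions_meeting)  # ["Meeting moved to 3pm, don't be late!!"]
```

Closing that gap between keyword matching and actual meaning is what makes unstructured data so hard, and it is where natural-language techniques come in.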
Examples of Where This Massive Data Comes From
It is mind-boggling that 90% of the data has been created in just the past two years. But where does this data come from? Let's find out:
Facebook analyzes, processes, and stores more than 30 Petabytes of user-generated data.
Walmart processes over 1 million customer transactions every hour.
More than 230 million tweets are created every day.
Every minute YouTube users upload 48 hours of video and approximately 3 billion videos are watched every day.
Over 5 billion people text, call, tweet, share, and browse on mobile phones every day.
Amazon processes clickstream data from more than 15 million customers every day to promote its products.
More than 294 billion emails are sent every single day.
Wow, this is humongous and surely overwhelming!
A Few Challenges to Reckon With
As they say, 'Not everything in the garden is rosy.' Big Data has indeed made it easier for many industries to analyze data and improve customer relations, but several challenges come along with its benefits.
1- Data Quality: This sums up the 4th V, i.e. Veracity. The data generated every day is messy and uncategorized, which makes Big Data processing complex. According to a report by Fathom, the US spends over $600 billion annually dealing with dirty data.
2- Storage: Huge sets of data are created every day, so managing that data has become a problem. Organizations are looking for methods to store the data while keeping it accessible and recoverable when needed.
3- Security: Because the data is so large, keeping it safe and secure is another hurdle. This includes user authentication, restricting access by unknown users, encrypting data, and auditing data-access histories.
4- Discovery: Finding new insights or knowledge in Big Data is like finding a needle in a haystack. It is also very challenging to analyze petabytes of data to categorize it and find the right patterns algorithmically.
There is a reason behind the exponential growth of Big Data: it has brought customer-centricity to the forefront. The concept is helping many sectors, including telecommunications, finance, healthcare, education, advertising, and IT. There are still gaps in the process, but emerging technologies like Machine Learning (ML) and AI are working to add more value and trustworthiness to Big Data.