Seriously, what is Big Data?

Big Data Defined

The easiest answer: Big Data is your data, structured and unstructured.

There’s nothing new about the notion of big data, which has been around since at least 2001. It’s the information your company owns, obtained and processed through new techniques to produce value. Big data may prove as important to business, and to society, as the Internet has become. Why? Because more data can lead to more accurate analyses.

Ask any Big Data expert to define the subject and they’ll quite likely start talking about “the three V’s.” As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as follows:

  • Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
  • Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
  • Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.

[Figure: A visualization, created by IBM, of all editing activity by the Wikipedia bot “Pearle.”] At multiple terabytes in size, the text and images of Wikipedia are a classic example of big data. To find out more about this project, see Wattenberg, M., Viégas, F. B., and Hollenbach, K. (2007). “Visualizing Activity on Wikipedia with Chromograms.” Proceedings of INTERACT.

An August 2013 blog post by Mark van Rijmenam, “Why The 3V’s Are Not Sufficient To Describe Big Data,” added veracity, variability, visualization, and value to the definition, broadening the realm even further. Van Rijmenam stated: “90% of all data ever created was created in the past two years. From now on, the amount of data in the world will double every two years.”

How should we make sense of Big Data?

When dealing with larger datasets, organizations face difficulties in creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets. Why?

Because interpreting Big Data means finding hidden threads, trends, or patterns that remain invisible to traditional methods. Sounds easy, right? In practice, it requires new technologies and skills to analyze the flow of material and draw conclusions.

This data, once captured, formatted, manipulated, stored, and analyzed, can help a company gain useful insight to increase revenue, acquire or retain customers, and improve operations.

Apache Hadoop is one such solution. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
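To make “simple programming models” concrete, here is a minimal sketch of Hadoop’s MapReduce model, adapted from the well-known WordCount example in the Hadoop MapReduce tutorial: the mapper emits a (word, 1) pair for every token in its slice of the input, and the reducer sums those counts per word across the cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on each split of the input,
  // emitting (word, 1) for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: Hadoop groups all values by key, so each call
  // receives every count emitted for one word across the cluster.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-sum locally to cut network traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a jar, a job like this would typically be launched with something along the lines of `hadoop jar wordcount.jar WordCount /input /output`, with the input and output directories living on HDFS. The same two-function pattern scales from a laptop to thousands of machines, which is the point of the “simple programming models” claim.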

Case Study

Netflix turned to Big Data following an outage in 2008 that left some customers without service for three days. As the company prepared to offer more streaming, Netflix moved its storage from internal data centers to Amazon’s cloud. The main advantage of the cloud is that its architecture, which includes the highly scalable open-source data processing platform known as Hadoop, allows the company to quickly provision computing resources as its needs grow.

Hadoop processing power allows the company to run massive data analyses, such as graphing traffic patterns for every type of device across multiple markets.
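As a hedged illustration of what such an analysis might look like, the sketch below sums total bytes streamed per device type and market as a MapReduce job. The log format, field order, and class names here are hypothetical, invented for the example; Netflix’s actual schema and pipeline are not described in this article.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical log line: "2013-08-01T12:00:00,ps3,us-east,482113"
// (timestamp, device type, market, bytes streamed).
public class TrafficByDevice {

  public static class ParseMapper
      extends Mapper<Object, Text, Text, LongWritable> {
    private final Text deviceMarket = new Text();
    private final LongWritable bytes = new LongWritable();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] f = value.toString().split(",");
      if (f.length < 4) return;                // skip malformed lines
      try {
        bytes.set(Long.parseLong(f[3].trim()));
      } catch (NumberFormatException e) {
        return;                                // skip unparseable byte counts
      }
      deviceMarket.set(f[1] + "|" + f[2]);     // composite key: "device|market"
      context.write(deviceMarket, bytes);
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable total = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get(); // total bytes per device/market
      total.set(sum);
      context.write(key, total);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "traffic by device");
    job.setJarByClass(TrafficByDevice.class);
    job.setMapperClass(ParseMapper.class);
    job.setCombinerClass(SumReducer.class);    // sums are associative, so combining is safe
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```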

That effort helps Netflix improve the reliability of video feeds on different platforms and plan for the future growth of streaming movies and shows. For example, the greater processing capability lets engineers see where traffic on the network is running slower, so they can plan for additional network capacity. The technology, which can manipulate larger data sets, also helps Netflix better analyze customer preferences so that it can make improved recommendations.
