Protect Your Assets

Big Data Security Analytics Part 3: Data science & Putting Structure to the Problem

‎05-08-2014 09:30 AM - edited ‎06-09-2015 11:28 AM

If you go back and read Part 1 and Part 2 of this series, you’ll see that we’re discussing the possibilities and realities of big data security analytics. With discussion come questions, so how do we answer them? Different types of security questions can be answered with different disciplines of data science:


  • Classification: Allows events to be grouped into like sets for context.
  • Correlation: Recognizes real-time (HP ArcSight) and historical associations, providing context and relational understanding.
  • Clustering: Detects similarity between data points across large collections, a straightforward yet reliable way to make sense of many events at once.
  • Affinity Grouping: Similar to clustering, but takes into account the context of each data point as it pertains to users, systems, attacks and their interactions. Provides excellent context between multiple, seemingly disparate data points.
  • Aggregation: Gives a high-level view of large amounts of data, distilling often complex sets into simple numerical quantities, e.g. did this bad event happen often enough in an hour to be of concern?
  • Statistical Analysis: Provides methods for dealing with uncertainty within the data sets, yielding a measure of confidence in the conclusions drawn.
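
To make the aggregation question above concrete, here is a minimal Python sketch that counts events per clock hour and flags the hours that reach a threshold. The function names, the sample timestamps and the threshold of 3 are all hypothetical and chosen only for illustration; this is not how HP ArcSight or any particular product implements aggregation.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Aggregate events into a count per clock hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def hours_of_concern(timestamps, threshold):
    """Return, in order, the hours whose event count reaches the threshold."""
    counts = hourly_counts(timestamps)
    return sorted(hour for hour, n in counts.items() if n >= threshold)


# Hypothetical timestamps for one kind of "bad event" (e.g. failed logins).
events = [
    datetime(2014, 5, 8, 9, 5),
    datetime(2014, 5, 8, 9, 17),
    datetime(2014, 5, 8, 9, 48),
    datetime(2014, 5, 8, 10, 2),
]

# The threshold is an assumption; in practice it would come from the use case.
flagged = hours_of_concern(events, threshold=3)
```

With these sample values, only the 09:00 hour contains three events, so it is the only hour flagged. The point is that a large set of raw events is reduced to one number per hour that can be compared against a limit.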


A “Why” versus “What” mapping will help organize the approach to security analytics.  The “why” half of the mapping lays out the purpose of the inquiry. These typically fall under detection, operations & analytics and compliance. The “what” half of the mapping describes the data source used in the analytics. These can include business systems, applications & databases, servers & desktops, network security appliances and various other sources.
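
One possible way to hold such a mapping in code is a simple grid keyed first by purpose and then by data source. The Python sketch below assumes the purposes read as three categories (detection, operations & analytics, compliance) and uses the four named data sources; the sample questions and the `build_mapping` helper are hypothetical and only illustrate the idea.

```python
# Purposes ("why") and data sources ("what"), taken from the paragraph above.
# Treating the purposes as exactly three categories is an assumption.
WHY = ("detection", "operations & analytics", "compliance")
WHAT = (
    "business systems",
    "applications & databases",
    "servers & desktops",
    "network security appliances",
)


def build_mapping(questions):
    """Place each (why, what, question) triple into a why -> what grid."""
    grid = {why: {what: [] for what in WHAT} for why in WHY}
    for why, what, question in questions:
        if why not in grid or what not in grid[why]:
            raise ValueError(f"unknown cell: {why!r} / {what!r}")
        grid[why][what].append(question)
    return grid


# Hypothetical security questions, one per cell, for illustration only.
questions = [
    ("detection", "servers & desktops",
     "Which hosts show repeated failed logins?"),
    ("compliance", "applications & databases",
     "Who accessed cardholder records this quarter?"),
]

mapping = build_mapping(questions)
```

Empty cells in the grid are useful too: they show which combinations of purpose and data source have no question assigned yet.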




This “Why” vs. “What” mapping is then turned into a use case taxonomy.  Below is a sample taxonomy for real-time correlation within a SIEM. It flows from purpose to deployment method, incorporating event context and an event threshold. The result is a defined action to be taken by the security analyst or an automated system.
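
As a rough illustration of that flow (purpose, deployment method, event context, event threshold, resulting action), a single use case could be modeled as a small record. The Python sketch below is not the taxonomy from the original figure; the `UseCase` fields mirror the elements named in the paragraph, and every value in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseCase:
    purpose: str      # the "why", e.g. detection or compliance
    deployment: str   # how the use case is deployed in the SIEM
    context: str      # the event context the rule looks at
    threshold: int    # number of matching events that triggers the action
    action: str       # what the analyst or automated system should do


def decide(use_case: UseCase, matching_events: int) -> Optional[str]:
    """Return the defined action once the threshold is reached, else None."""
    if matching_events >= use_case.threshold:
        return use_case.action
    return None


# Hypothetical use case; all values are assumptions for illustration.
brute_force = UseCase(
    purpose="detection",
    deployment="real-time correlation rule",
    context="failed logins against one account",
    threshold=5,
    action="notify analyst",
)

result = decide(brute_force, matching_events=5)
```

Keeping the action as part of the use case definition reflects the idea that every rule should end in something a person or system actually does.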



This method of breaking questions down into categories, then mapping the “Why” vs. “What”, and finally determining use cases helps ensure that the results produced by the security analytics solution are fully used by the business and by its existing processes and procedures.


See how HP HAVEn can help answer your data security questions.


Check out part 4 of this series: Big Data Analytics Part 4: Visualization is Key
