Protect Your Assets

Big Data Security Analytics Part 3: Data Science & Putting Structure to the Problem

‎05-08-2014 09:30 AM - edited ‎06-09-2015 11:28 AM

Part 1 and Part 2 of this series discussed the possibilities and realities of big data security analytics. With discussion come questions, so how do we answer them? Different types of security questions can be answered with different disciplines of data science:


  • Classification: Groups events into like sets to provide context.
  • Correlation: Recognizes real-time (HP ArcSight) and historical associations between events, providing context and relational understanding.
  • Clustering: Detects similarity between data points across large collections, a straightforward yet reliable way to make sense of many events at once.
  • Affinity Grouping: Similar to clustering, but it also takes into account the context of each data point as it pertains to users, systems, attacks and their interactions. It provides excellent context across multiple, seemingly disparate data points.
  • Aggregation: Gives a high-level view of large amounts of data by distilling often complex sets into simple numerical quantities, e.g. did this bad event happen often enough in an hour to be of concern?
  • Statistical Analysis: Provides methods for dealing with uncertainty in the data sets, yielding a confidence level for the conclusions drawn.
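
The aggregation question in the list above ("did this bad event happen often enough in an hour to be of concern?") can be sketched in a few lines of Python. This is only an illustration, not how HP ArcSight or any other product implements it: the sample events, the `failed_login` event type and the threshold of 3 are all made-up assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample events: (timestamp, event type). Not taken from any real log.
EVENTS = [
    (datetime(2014, 5, 8, 9, 5), "failed_login"),
    (datetime(2014, 5, 8, 9, 17), "failed_login"),
    (datetime(2014, 5, 8, 9, 42), "failed_login"),
    (datetime(2014, 5, 8, 10, 3), "failed_login"),
    (datetime(2014, 5, 8, 10, 30), "port_scan"),
]


def hourly_counts(events):
    """Aggregate events into counts keyed by (hour bucket, event type)."""
    counts = Counter()
    for timestamp, event_type in events:
        # Truncate the timestamp to the start of its hour.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, event_type)] += 1
    return counts


def hours_of_concern(events, event_type, threshold):
    """Return the hours in which event_type occurred at least `threshold` times."""
    counts = hourly_counts(events)
    return sorted(
        bucket
        for (bucket, kind), n in counts.items()
        if kind == event_type and n >= threshold
    )


# Assumed threshold: 3 or more failed logins within one hour is "of concern".
flagged = hours_of_concern(EVENTS, "failed_login", threshold=3)
print(flagged)
```

The many raw events collapse into one number per hour and event type, and the question of concern becomes a simple comparison against a threshold.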


A “Why” versus “What” mapping helps organize the approach to security analytics. The “why” half of the mapping lays out the purpose of the inquiry, which typically falls under detection, operations & analytics, or compliance. The “what” half of the mapping describes the data sources used in the analytics. These can include business systems, applications & databases, servers & desktops, network security appliances and various other sources.
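
One possible way to hold such a mapping is a small grid keyed by (why, what) pairs. The sketch below is an assumption about how it could be represented, not the layout of the original diagram: the purposes and data sources come from the paragraph above, while the example questions and the function names are hypothetical.

```python
# Purposes ("why") and data sources ("what") named in the text.
PURPOSES = ["detection", "operations & analytics", "compliance"]
SOURCES = [
    "business systems",
    "applications & databases",
    "servers & desktops",
    "network security appliances",
]

# Hypothetical security questions, keyed by (why, what).
# Cells that are missing here have no question assigned yet.
MAPPING = {
    ("detection", "servers & desktops"): ["Are there repeated failed logins?"],
    ("compliance", "applications & databases"): ["Who accessed sensitive records?"],
}


def questions_for(purpose, source):
    """Return the questions recorded for one cell of the grid (possibly none)."""
    return MAPPING.get((purpose, source), [])


def uncovered_cells():
    """List every (why, what) combination that has no question yet."""
    return [(p, s) for p in PURPOSES for s in SOURCES if (p, s) not in MAPPING]


print(questions_for("detection", "servers & desktops"))
print(len(uncovered_cells()), "cells still have no question assigned")
```

Listing the empty cells is one simple way to see which purposes are not yet backed by any data source.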




This “Why” vs. “What” mapping is then turned into a use case taxonomy.  Below is a sample taxonomy for real-time correlation within a SIEM. It flows from purpose to deployment method, incorporating event context and an event threshold. The result is a defined action to be taken by the security analyst or an automated system.
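
The flow described above (purpose, deployment method, event context, event threshold, resulting action) could be captured as a simple record. This is a minimal sketch under stated assumptions: the field names, the `UseCase` class and the brute-force example are illustrative and are not taken from the sample taxonomy or from any SIEM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseCase:
    """One entry in a hypothetical use case taxonomy."""

    purpose: str            # the "why", e.g. detection
    deployment_method: str  # e.g. a real-time correlation rule
    event_context: str      # what the matching events have in common
    event_threshold: int    # number of matching events needed to trigger
    action: str             # what the analyst or automated system does

    def evaluate(self, matching_event_count: int) -> Optional[str]:
        """Return the defined action if the threshold is met, otherwise None."""
        if matching_event_count >= self.event_threshold:
            return self.action
        return None


# Hypothetical example entry; every value here is an assumption.
brute_force = UseCase(
    purpose="detection",
    deployment_method="real-time correlation rule",
    event_context="failed logins against the same account",
    event_threshold=5,
    action="notify analyst",
)

print(brute_force.evaluate(4))  # below the threshold, so no action
print(brute_force.evaluate(5))  # threshold met, so the action is returned
```

Writing each use case down in this form makes the threshold and the resulting action explicit, so they can be reviewed against existing processes.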



This method of breaking down questions into categories, then mapping the “Why” vs. “What”, and finally determining use cases helps ensure that the results produced by the security analytics solution are fully utilized by the business and its existing processes and procedures.


See how HP HAVEn can help answer your data security questions.


Check out part 4 of this series: Big Data Analytics Part 4: Visualization is Key
