
Important Questions for Big Security Data

ChrisCalvert ‎02-11-2014 09:50 AM - edited ‎07-07-2015 11:17 AM

The HAVEn platform is a great toolbox for big data, but the real value from big data is only realized when you actually answer a question that could never be answered before or detect a subtle behavior that was not visible before. HAVEn stands for Hadoop, Autonomy, Vertica, Enterprise Security on n-scale. Big data systems are important technology plumbing, but the business outcome is what actually matters.


There are two sides to asking questions of big data platforms: "What needle in the haystack can I find that I couldn’t before?" and, conversely, "What can analysis of the entire haystack tell me that isn’t obvious or intuitive about the security of my business?"


When you apply big data to information security, just what are you trying to accomplish? Fundamentally, it all comes down to making more informed and timely decisions. You have to approach this with some specific questions in mind, but you also have to be open to exploring what the data says as you come across specific incidents or issues. In this blog post, I want to brainstorm on the types of questions we need to answer and group them into categories to give some structure to the problem.


The categories I managed to come up with as I racked my brain were: business (financial and IT strategy), operational security (detection & response), and governance, risk and compliance (and I HATE talking about GRC, so that one was hard for me; it's just too squishy!). There are also some specific disciplines within data science that can be used to create an effective matrix by type of decision. These are correlation, affinity grouping, aggregation, clustering and classification, and some of them are more effective than others at answering specific types of questions.


Correlation is the core technology for the ArcSight SIEM, and a trusty, if old, example is vulnerability scan and IDS correlation. For affinity grouping I like to use the “beer and diapers” analogy: as the story goes, Walmart discovered through market basket analysis (a form of affinity grouping) that when a man buys diapers he also buys beer, so you have to walk past beer to get to diapers. Just a funny anecdote. Clustering finds related data along any parameter you care to define, while aggregation does the same thing across multiple parameters. Classification is the addition of relevant context to data. There are many more ways to analyze big data, but these are some of the core approaches.
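To make the vulnerability scan and IDS correlation example concrete, here is a minimal sketch in Python. It is not how ArcSight implements correlation; the `IdsAlert` record, its fields, and the `correlate_alerts` function are hypothetical names chosen for illustration, and it assumes each IDS signature can be mapped to a vulnerability identifier that the scanner also reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdsAlert:
    """One IDS alert (hypothetical record layout, for illustration only)."""
    target_host: str
    vuln_id: str      # vulnerability the signature is assumed to map to
    signature: str


def correlate_alerts(alerts, vulnerable_pairs):
    """Split alerts by whether the targeted host is known to be vulnerable.

    vulnerable_pairs is a set of (host, vuln_id) tuples taken from the most
    recent vulnerability scan. An alert whose (target_host, vuln_id) appears
    in that set is an attack against a host that is actually exposed.
    """
    confirmed, unconfirmed = [], []
    for alert in alerts:
        if (alert.target_host, alert.vuln_id) in vulnerable_pairs:
            confirmed.append(alert)
        else:
            unconfirmed.append(alert)
    return confirmed, unconfirmed


if __name__ == "__main__":
    # Placeholder identifiers, not real vulnerabilities.
    scan = {("web01", "VULN-A")}
    alerts = [
        IdsAlert("web01", "VULN-A", "example-signature-1"),
        IdsAlert("db01", "VULN-B", "example-signature-2"),
    ]
    confirmed, unconfirmed = correlate_alerts(alerts, scan)
    print("escalate:", [a.target_host for a in confirmed])
    print("lower priority:", [a.target_host for a in unconfirmed])
```

The design choice here is a plain set lookup on the (host, vulnerability) pair, which keeps the join cheap even when the alert stream is large.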


Ad hoc and post hoc (or post-hack) data exploration is also a critical component. Sometimes you just don't know the important questions until it is too late to build an entire system to answer them. Putting your data into a big data solution lets you rapidly evolve your security capability as you discover more meaningful questions to answer.


List of Questions


Operational Security:

  • How often do I see specific categories of attacks?
  • Who is conducting recon on me?
  • Where should I allow looser controls to attract attackers?
  • What are my privileged users doing with their access?
  • Where inside my enterprise should I put tripwires to detect malicious users?
  • How do I detect never-before-seen malicious behaviors or attacks?
  • Can I rapidly define an M.O. of an attack and then look across a large timeframe and dispersed enterprise to find other instances of similar activity? Basically, how compromised am I?
  • What types of logging and context provide the most value for detection?
  • How might I reduce SIEM event volume, or tune the funnel to dramatically increase the fidelity of detection?
  • Can I cluster false positives and nuisance events to remove them from active monitoring?
  • Who is my most suspicious privileged user or senior business person?
  • Where do most of my detections come from (users, data sources, enterprise segments, etc.)?
  • Of the attacks that get past my detection teams, what would have detected them (root cause analysis)?
  • What is the relative skill set demonstrated during this attack?
  • How many accounts are involved in this compromise?


Business:

  • How well is my operational security infrastructure managed?
  • Can I quantify the risk reduction impact of specific spending initiatives and track their value over time?
  • How can I most effectively communicate situational awareness to business leaders?
  • What types of metrics are truly actionable, and what is the best way to present them to decision makers?
  • What are the most important parameters to focus on for efficiency improvements?
  • How can I increase the efficiency of my IT security spending without impacting effectiveness?
  • What is the event volume to detection value ratio for all my enterprise security systems?
  • In root cause analysis of events, where could I have broken the attack lifecycle before it impacted me?
  • How long does it take for a business process change until I see the metrics respond? How responsive are my metrics?
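
As a rough illustration of the event volume to detection value question above, the ratio can be computed per event source. This is only a sketch: it assumes that a count of confirmed detections is an acceptable stand-in for "detection value", and the function name and input dictionaries are hypothetical.

```python
import math


def volume_to_detection_ratio(events_by_source, detections_by_source):
    """Return {source: events per confirmed detection}, noisiest source first.

    Both arguments are plain dicts keyed by source name (assumed inputs).
    A source with events but no confirmed detections gets an infinite ratio,
    which flags it as pure volume with no demonstrated value.
    """
    ratios = {}
    for source, events in events_by_source.items():
        detections = detections_by_source.get(source, 0)
        ratios[source] = events / detections if detections else math.inf
    # Sort so the worst ratio (most events per detection) comes first.
    return dict(sorted(ratios.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Made-up numbers purely to show the shape of the output.
    events = {"firewall": 1_000_000, "ids": 50_000, "proxy": 20_000}
    detections = {"firewall": 10, "ids": 50}
    for source, ratio in volume_to_detection_ratio(events, detections).items():
        print(f"{source}: {ratio} events per detection")
```

Sources at the top of this list are candidates for tuning or filtering before they reach the SIEM.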

Governance, Risk and Compliance:

  • What are the norms across any parameter you can define at a moment's notice?
  • Is there one technology choice in my business that aggregates too much risk and should be re-engineered?
  • How effectively are my compliance controls preventing bad things from happening?
  • Where should I focus my controls?
  • Which core business processes have inadequate separation of duties?
  • How accurate is the IT business context used for correlation?
  • Which security domain am I quantitatively weakest at?
  • In terms of systems controls, which are best for prevention and which for detection?
  • Which core business process is being attacked when a specific piece of infrastructure is involved? Are disparate pieces of infrastructure that all support a single core business process being attacked?
  • What is the standard behavior deviation for every class of user in my environment? Can I produce a report for each with a list of outliers?
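
The last question in the list above, behavior deviation per class of user with a list of outliers, can be sketched with a simple z-score. This assumes behavior has already been reduced to one number per user (for example, a daily count of some activity); the function name, the record layout, and the threshold of 2.0 are illustrative choices, not a recommendation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def outliers_by_class(records, threshold=2.0):
    """Report users whose value deviates strongly from their class norm.

    records is an iterable of (user, user_class, value) tuples (assumed
    layout). For each class, compute the mean and population standard
    deviation, then list users whose absolute z-score exceeds threshold.
    """
    by_class = defaultdict(list)
    for user, user_class, value in records:
        by_class[user_class].append((user, value))

    report = {}
    for user_class, rows in by_class.items():
        values = [value for _, value in rows]
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # Everyone in the class behaves identically: no outliers.
            report[user_class] = []
            continue
        report[user_class] = [
            (user, round((value - mu) / sigma, 2))
            for user, value in rows
            if abs(value - mu) / sigma > threshold
        ]
    return report


if __name__ == "__main__":
    # Made-up activity counts for illustration.
    sample = [
        ("a", "admin", 10), ("b", "admin", 11), ("c", "admin", 9),
        ("d", "admin", 10), ("e", "admin", 10), ("f", "admin", 100),
        ("g", "clerk", 5), ("h", "clerk", 5),
    ]
    print(outliers_by_class(sample))
```

A plain z-score is sensitive to the very outliers it is trying to find, so on small classes a median-based measure may behave better; the sketch only shows the shape of the per-class report.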

... And of course, what the #@*^% just happened?


It was really difficult to group these questions, and plenty of argument could be had about where each one belongs. Of course, any list of questions is going to be incomplete. This is a lot to ask of any system, but these questions need to be answered by our industry, and Big Data as delivered by HAVEn is a strong step in the right direction. To learn more about the current state of security operations see the groundbreaking paper published here:
