Managed Services and Risk: Mitigation or Inherent Acceptance? by Randy Watkins, Chief Technology Officer at Critical Start

With the evolution of cybersecurity over the last decade, it’s easy to forget what security is: the art of dealing with risk. The flood of funding into the space has created a host of marketing buzzwords that pollute the boardroom and pull attention away from the “why?” of security. What is the reason cybersecurity exists? What is the problem we’re trying to solve?

Control-based vs risk-based

The conversation around security has shifted, and not for the better. Historically, security teams built programs around assessing risk and deciding how best to deal with it. However, today’s world of endless frameworks focuses more on technologies and less on the risks they’re implemented to address. This controls-oriented program development has led to the emergence of security leaders who pause at the mention of a “risk register”. This isn’t to say that risk isn’t considered, but rather that it isn’t enumerated at a level that gives the security team flexibility in addressing it.

Security frameworks from NIST, SANS, ISO, and others are great lists of controls to consider for a security program, but they are built with a one-size-fits-all approach. By starting with a comprehensive audit and developing controls that mitigate specific threats, many organizations can move to an acceptable risk posture without many of the “checkbox” controls contained in most frameworks.

Risky decisions

Common risks exist across different organizations, but how those risks are addressed is a business decision that the security team develops its strategy around. When handling risk, there are three options:

  • Accept – The risk does not present a threat worth investing resources to lessen. Accepted risks should be entered into a risk register that names the business owner who accepted the risk and notes why they accepted it, usually because of low probability or low impact.
  • Mitigate – The risk is not accepted and poses enough of a threat to the business that investing resources is warranted to prevent it from coming to fruition, or at least to lessen its probability or impact to an acceptable level.
  • Transfer – The risk is not accepted, but the business will not mitigate it on its own. Through third parties, the risk is contractually moved from the business to the provider. Common forms of cybersecurity risk transference include cybersecurity insurance and managed security services.
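
The three options above, and the register entry that records who chose which, can be sketched as a small data structure. This is a minimal illustration in Python rather than a prescribed format: the names `Treatment`, `RiskEntry`, and `accepted_risks` are assumptions made for the example, and the entries themselves are made up.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Treatment(Enum):
    ACCEPT = "accept"
    MITIGATE = "mitigate"
    TRANSFER = "transfer"


@dataclass(frozen=True)
class RiskEntry:
    risk: str
    treatment: Treatment
    owner: str                         # business owner who made the decision
    rationale: str                     # e.g. "low probability" or "low impact"
    decided_on: date
    third_party: Optional[str] = None  # insurer or service provider, for TRANSFER

    def __post_init__(self) -> None:
        # A register entry is only useful if it says who decided and why.
        if not self.owner or not self.rationale:
            raise ValueError("every entry needs an owner and a rationale")
        if self.treatment is Treatment.TRANSFER and not self.third_party:
            raise ValueError("a transferred risk must name the third party")


def accepted_risks(register: List[RiskEntry]) -> List[RiskEntry]:
    """Return the entries a business owner has explicitly accepted."""
    return [e for e in register if e.treatment is Treatment.ACCEPT]


# Illustrative entries only; the risks, owners, and provider are hypothetical.
REGISTER = [
    RiskEntry("Legacy kiosk cannot run an endpoint agent", Treatment.ACCEPT,
              "VP Retail Operations", "isolated network segment, low impact",
              date(2024, 1, 15)),
    RiskEntry("Alerts go untriaged outside business hours", Treatment.TRANSFER,
              "CISO", "no in-house 24x7 coverage", date(2024, 2, 1),
              third_party="Example MDR provider"),
]

for entry in accepted_risks(REGISTER):
    print(f"{entry.risk}: accepted by {entry.owner} ({entry.rationale})")
```

Requiring an owner and a rationale at construction time is one way to keep an acceptance from being recorded anonymously.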

Risks worth transferring

There’s an existential problem in security right now. The problem isn’t new attacker tactics, techniques, and procedures (TTPs), new malware, or the speed at which malware gets to market. Products exist to identify these threats, but there is not enough skilled headcount to properly implement those products and to investigate and respond to their alerts! This headcount shortage is an industry epidemic that leaves security teams scrambling just to perform basic tasks and forces most organizations to ignore alerts generated by the security products they have deployed, assuming those products were properly implemented and configured in the first place.

Alert triage and response

Looking at the tasks security teams perform to achieve risk equilibrium, many require deep knowledge of the organization, continuous communication, and participation in meetings like change control. However, the task of identifying a false-positive, such as a wrongly flagged graphics card driver, requires little knowledge of the organization.

Transferring the risk of alert triage and response can free the organization’s resources to focus on security responsibilities that are best kept in-house, like GRC, vulnerability management, and policy creation. This transference also lessens the probability that the departure of a single person becomes a significant detriment to the security team, and the impact if it does.

The most common cause of shelf-ware (technology that is being paid for but is no longer used, or never was) is the sole owner or user of that technology leaving the organization. Regarding incident detection, triage, and response, employee churn presents a much larger threat than underutilized budget. This risk is magnified by the litany of false-positives generated by security products, which makes the headcount needed to triage every security alert unattainable.

Leveraging a service provider for certain functions will provide the level of expertise necessary to implement, maintain, and utilize the technology. The shift also transfers the burden of hiring and retaining the staff necessary to perform these functions to the service provider, ideally removing the shelf-ware dilemma.

Transferring risk to a service provider

Ignoring alerts and forgoing security expertise is not a risk most organizations choose to accept, and handling it in-house is often difficult or cost-prohibitive, so it makes sense that managed security service providers (MSSPs), including managed detection and response (MDR) providers, are gaining in popularity. The difficulty comes in choosing the right MDR provider and ensuring it is mitigating risk rather than accepting it.

The false-positive dilemma

As mentioned earlier, false-positives have a significant impact on security teams, but why does this problem exist?

Defining the terms:

  • False-positive – An alert that was generated based on an event that was not malicious.
  • False-negative – An event that was malicious but did not generate an alert.
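
The two definitions can be made concrete by counting each kind of error over a handful of events. The sketch below is in Python and assumes a hypothetical product that assigns every event a detection score and raises an alert at or above a threshold. The `Event` type, the scores, and the thresholds are invented for illustration and do not describe any real product.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    score: float     # hypothetical detection score between 0.0 and 1.0
    malicious: bool  # ground truth, usually known only in hindsight


def count_errors(events: List[Event], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when alerting at or above threshold."""
    false_positives = sum(
        1 for e in events if e.score >= threshold and not e.malicious
    )
    false_negatives = sum(
        1 for e in events if e.score < threshold and e.malicious
    )
    return false_positives, false_negatives


# Made-up sample data: three malicious events, four benign ones.
EVENTS = [
    Event(0.95, True),
    Event(0.80, False),
    Event(0.65, True),
    Event(0.60, False),
    Event(0.40, False),
    Event(0.35, True),
    Event(0.10, False),
]

strict = count_errors(EVENTS, threshold=0.9)
loose = count_errors(EVENTS, threshold=0.3)
print("strict rule:", strict)  # (0, 2): quiet, but two malicious events missed
print("loose rule: ", loose)   # (3, 0): nothing missed, three benign alerts
```

On this sample, lowering the threshold removes the false-negatives and replaces them with false-positives.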

From a product-manufacturer perspective, a false-negative is brand damaging, but a false-positive is just assumed. Endpoint and network detection technologies are attempting to identify everything an attacker could do to perform malicious activity in an environment. With the skill of attackers improving, products have had to create looser detection rules that allow them to be effective at detecting potentially malicious activity, thus avoiding false-negatives. For an effective, detection-oriented security product, false-positives are almost a necessity. With this understanding, how do service providers, who are providing services for potentially millions of endpoints, profitably scale a service?

The Techniques

  • Build a Bigger Army – This is not scalable or profitable, but it is pursued by some service providers. This approach typically results in sub-par service that provides little value and leads to a frustrated customer that has essentially purchased a different source of alert fatigue.
  • Attack the Source of Alerts – Is a particular detection rule too noisy? Shut it off! The alert fatigue problem is solved, but the effectiveness of the product is diminished along with it.
  • Set an Arbitrary Investigation Threshold – Too many Critical, High, and Medium alerts to investigate? Just look at the Critical and High. Still too many? Critical-only should be fine (if we forget the retail breach was a medium alert).
  • Turn Alerts into Incidents – Rolling up multiple alerts into a single incident is a great way to make what looks like a high-fidelity alert, but the incident could also be nothing more than a group of false-positives. The danger here is creating incidents that take much longer to investigate.
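
Two of the techniques above, the investigation threshold and the roll-up into incidents, are simple enough to sketch in a few lines of Python. The `Alert` type, the host names, the severity labels, and the data are all assumptions made for the example. Grouping by host is just one possible roll-up key, not a description of how any provider does it.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


class Alert(NamedTuple):
    host: str
    severity: str
    malicious: bool  # ground truth, included only to show what gets missed


def above_threshold(alerts: List[Alert], minimum: str) -> List[Alert]:
    """Keep only alerts at or above the chosen severity."""
    floor = SEVERITY_RANK[minimum]
    return [a for a in alerts if SEVERITY_RANK[a.severity] >= floor]


def roll_up_by_host(alerts: List[Alert]) -> Dict[str, List[Alert]]:
    """Group alerts into one 'incident' per host."""
    incidents: Dict[str, List[Alert]] = defaultdict(list)
    for alert in alerts:
        incidents[alert.host].append(alert)
    return dict(incidents)


# Made-up alerts: the only truly malicious one is rated medium.
ALERTS = [
    Alert("pos-01", "medium", True),
    Alert("ws-17", "critical", False),
    Alert("ws-17", "high", False),
    Alert("ws-17", "high", False),
    Alert("db-02", "low", False),
]

investigated = above_threshold(ALERTS, "high")
missed = [a for a in ALERTS if a.malicious and a not in investigated]
incidents = roll_up_by_host(ALERTS)

print("investigated:", len(investigated))         # three alerts, all benign
print("missed:", missed)                          # the medium alert on pos-01
print("ws-17 incident size:", len(incidents["ws-17"]))
```

In this sample the threshold sends analysts to three benign alerts while the one malicious alert is never looked at, and the largest "incident" is made up entirely of false-positives.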

Machine Learning!

Another technique that’s becoming increasingly popular is the use of machine learning to weed through false-positives. Moving past the animosity towards marketing teams for taking real technology and turning it into a glorified way to describe statistics, machine learning can be broken into two main concepts:

  • Supervised – Using a set of labeled training data, an algorithm can be created to determine how closely a new piece of data matches the data used for training. This methodology is commonly leveraged in security to identify malware. While useful in scenarios where properly labeled training data is available, those prerequisites somewhat limit its usefulness in identifying malicious behavior.
  • Unsupervised – Developing a baseline of “normal”, unsupervised machine learning identifies deviations from the baseline. Unsupervised machine learning technically doesn’t generate false-positives, because it is alerting on anomalies. However, given that not all anomalies are malicious, this technique is usually paired with cumulative risk scoring to drive anomalous activity past a threshold, where it will generate an alert that is hopefully more relevant to security.
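
The pairing of a baseline with cumulative risk scoring can be illustrated with plain statistics. The Python sketch below stands in for an unsupervised model with a simple z-score against historical samples, which is an assumption of this example and not how any particular product works. The metric names, cutoffs, and data are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def anomaly_score(value: float, baseline: List[float]) -> float:
    """Distance of value from the baseline mean, in sample standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        return 0.0
    return abs(value - mean(baseline)) / spread


def entities_over_threshold(
    observations: List[Tuple[str, str, float]],  # (entity, metric, value)
    baselines: Dict[str, List[float]],
    anomaly_cutoff: float = 3.0,    # below this, the deviation is ignored
    alert_threshold: float = 10.0,  # cumulative score needed to raise an alert
) -> Dict[str, float]:
    """Sum anomaly scores per entity and return those past the alert threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for entity, metric, value in observations:
        score = anomaly_score(value, baselines[metric])
        if score >= anomaly_cutoff:
            totals[entity] += score
    return {e: t for e, t in totals.items() if t >= alert_threshold}


# Made-up history of "normal" for two metrics.
BASELINES = {
    "logins_per_hour": [4, 5, 6, 5, 4, 6],
    "mb_uploaded": [10, 12, 11, 9, 10, 8],
}

# Made-up observations for three users.
OBSERVATIONS = [
    ("alice", "logins_per_hour", 10),
    ("alice", "mb_uploaded", 20),
    ("bob", "logins_per_hour", 9),
    ("carol", "logins_per_hour", 6),
]

alerts = entities_over_threshold(OBSERVATIONS, BASELINES)
print(sorted(alerts))  # only alice accumulates enough anomalous activity
```

Here a single anomaly (bob) does not raise an alert on its own. Only the entity with several anomalies crosses the cumulative threshold, which is the trade the bullet describes: fewer alerts, at the cost of staying silent on isolated deviations.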

Inherent risk

Given the available approaches to dealing with false-positives, it’s clear that there is some necessary risk-acceptance that must happen to get the alert count to a level that allows security teams to efficiently deal with the “high-priority” alerts. This acceptance is not based on the organization’s risk tolerance, but instead on the limitation of resources to mitigate, which places an inflated cost on the risk.

The balancing act between the effectiveness of security products and the efficiency of handling the alerts is a constant struggle for security teams, which is why it often leads to outsourcing to an MSSP/MDR. However, with most MSSPs and MDRs working with the same limitations, is an organization getting the mitigation it desires, or the acceptance it wanted to avoid?

Choosing a provider

To avoid the predicament of a risk posture that is contrary to the needs of the business, careful consideration should be taken when choosing a service provider. Knowing the most common methods of detection and dealing with false-positives, prospective service providers should be questioned on how the problem is approached and solved:

  • How does the solution reduce false-positives without introducing the possibility of false-negatives?
  • How are the products feeding into the solution configured?
    • Does the service leverage all detection capabilities?
    • Are threat-intelligence sources enabled?
    • Are there additional correlations (threat hunt, watchlist, indicators) used with the default set?
    • Are the configuration, playbooks, and content visible to the end user?
  • What value is added to the detection pre-escalation?
  • Which alert severities are investigated?
    • What determines severity?
  • Are remediated alerts visible to the end user?

Not all organizations require the investigation of every security alert, but each business decision should be made without cost as a determining factor. Qualify and quantify the risk of not investigating every security alert, and choose a service provider that aligns with the mitigation strategy chosen by the organization. If the decision is made to investigate only Critical or High-priority alerts, ensure the risk register is properly updated with who accepted the risk, so that if a breach does occur, questions are directed towards the correct executive.