The Cyber Attack Perception Problem
It’s here! The 2024 update to our research on the perception of the causes of cyber attacks that’s helped hundreds of companies better calibrate their cybersecurity strategy. So what’s new this year, what was our methodology, and what was surprising? Keep reading below!
Looking at the perceptions of cyber attacks in 2024
For the past few years (2020, 2022, 2023), we’ve shared our research on the cyber attack perception problem - visualizing how cybersecurity attacks occur versus how industry sources and people think they may occur. With another year’s worth of data we’re once again going to end the year by looking at how this data appears to vary widely! We’ll walk you through our methodology, our updates, and some surprising findings.
Got a question or comment? Leave a comment below, or message us on social media (X/Twitter, LinkedIn, Instagram, Facebook, Reddit)
“Where do we start?”
Verizon’s 2024 Data Breach Investigations Report (DBIR)[1], an industry publication that analyzes cybersecurity incident and data breach event data from around the world, found that over 99% of all events fall into one of eight major categories. The following two visuals from the DBIR show how the numbers of incidents and breaches in those categories have evolved over time.
“What do these categories mean?”
The DBIR defines a cybersecurity incident as any event that impacts the confidentiality, integrity, and/or availability of data. Cybersecurity breaches are defined as cybersecurity incidents that resulted in the confirmed compromise of data confidentiality specifically, i.e. the unauthorized viewing and/or copying of data. Since DDoS attacks and ransomware don’t include unauthorized copying of data, they are not considered breaches. The odd exception is when attackers do copy data out of the environment as part of the DDoS or ransomware attack.
Collectively though, we’re going to refer to cybersecurity incidents and cybersecurity breaches as “cyber attacks” in this research, unless otherwise explicitly noted.
The report includes cybersecurity attack categories that may seem a bit confusing. That’s because they weren’t based on how the cybersecurity industry tends to group cyber events. Here are some examples of the types of cybersecurity attacks that fall within each group:
System Intrusion: Hacking, ransomware, malware, exploiting workstation software (e.g. Plex for the LastPass breach), stolen credentials, system memory scrapers. These events typically involve lots of steps. “Your files are locked and only we have the key. Send $300 worth of bitcoin to unlock.”
Basic Web Application Attacks: Exploiting vulnerable software, using stolen passwords. Password stuffing, cracking, guessing, and spraying (our Password Table!?). Usually relatively few steps are required to execute these so they are called basic. “The secret code is … 1 2 3 4 5 … 6!”
Social Engineering: Phishing with emails, texts, and phone calls. These vary in sophistication and may involve years of steps or one clever email. “Hey it’s your boss! Can you pick up some gift cards? It’s urgent.”
Miscellaneous Errors: Cloud storage misconfigured to be publicly exposed, email sent to the wrong person, etc., directly resulting in compromise of confidentiality, integrity, and/or availability. “Oops!”
Privilege Misuse: For example, a disgruntled employee posts sensitive data publicly.
Lost and Stolen Assets: Laptop or phone left in a cafe or pickpocketed.
Denial of Service: DDoS and DoS based attacks.
Everything Else: Things like ATM card skimmers.
What influences our perception of the causes of a successful cybersecurity attack? Do those perceptions reflect reality? If the news media and the internet are leading us astray, maybe we can pinpoint what those influences are and systematically reduce our bias on those topics.
One way to address the question of what we perceive as cybersecurity attack causes was inspired by Hannah Ritchie and Max Roser’s 2018 article[2] comparing causes of death data to what The New York Times, The Guardian, and Google Trends covered as causes of death. We adapted their approach to investigate the perceptions of cybersecurity attack causes. We looked at five sources of data - The New York Times, The Guardian, Google Trends, Google Search, and scite.ai - to compare how frequently their cybersecurity attack coverage fell into each one of the DBIR’s classification patterns.
📆 Note that the 2024 DBIR is based on data characterizing events that took place between November 1, 2022, and October 31, 2023, so all of our time frames for the other sources align to this period.
“Ok got it. So what are the different perceptions of what caused cyber attacks?”
We were interested in comparing what the DBIR, Google, news outlets, and academia suggested were the causes of cybersecurity attacks. We refreshed our keywords and search terms related to cybersecurity incidents and data breaches for each of the eight DBIR categories. We searched for those terms across Google Trends, Google Search, The New York Times, The Guardian, and scite.ai.
Here’s what we found:
People create and engage with cybersecurity content for many reasons. Since our approach here was exploratory, we made the assumption that people tend to turn their attention and subsequently dedicate their budgets to the solutions that they feel will solve their most pressing cybersecurity problems (likely based on what they feel is the biggest risk).
Here are some insights we can draw from our analysis:
“Well then what actually caused cyber attacks?”
In last year’s DBIR[3], Social Engineering and Basic Web Application Attacks accounted for over 50% of all cybersecurity data breach events, with Denial of Service being the leading cause of cybersecurity incidents, responsible for an unprecedented 60% of all events.
Looking at cybersecurity breaches in 2024
In this year’s report, the DBIR listed 🥇System Intrusion (which covers hackers and malware, including ransomware attacks, but not web application attacks) as, once again, the top cause of cybersecurity breaches in the available data. Second place goes to 🥈Social Engineering, overtaking last year’s runner-up, Basic Web Application Attacks. This year Basic Web Application Attacks didn’t even make the top 3! Instead, the bronze went to 🥉Miscellaneous Errors. Oops. 🤦
Looking at cybersecurity incidents in 2024
For cybersecurity incidents, the DBIR listed 🥇Denial of Service (DoS) as the top cause once again, and by a long shot! That means 60%🔥 of incidents included DoS in 2023. If you do make it past that chunk of 60% you’ll see 🥈System Intrusion and then 🥉Social Engineering. At least when it comes to incidents, our screw-ups (Miscellaneous Errors) don’t make the top 3.
Once again we’re using these as our chart’s baseline:
“What did academic research say about cyber attacks?”
Academic publications aligned most closely with DBIR incidents in terms of coverage proportion across topics. Social Engineering was again dwarfed by Denial of Service in the case of research papers, while System Intrusion and Denial of Service were tied as the focus of citation statements. Citation statements refer to how often researchers cited other research publications that were about these topics. Those can include previous years’ publications.
“What did the news report as having caused cyber attacks?”
For the past two years we reported that The New York Times covered Denial of Service attacks more than the Guardian (12% vs 1%). That was due to an error on our part in the way we were searching these sites (our own kind of Miscellaneous Error). They were in fact tied at 12%, and this year both are around 5%.
The New York Times focused heavily (~65%) on Lost and Stolen Assets. Fortunately a lot of that is advice on what to do when you lose your phone or laptop. On the flip side/other side of the Atlantic, the Guardian, equally heavily (~65%), covered System Intrusion. Both covered Social Engineering equally. The Guardian had a whopping two articles on Basic Web Application Attacks but we couldn’t find any from The New York Times. If you do, let us know in the comments so we can update our data!
“What did ‘the internet’ think caused cyber attacks?”
Googlers ended their two-year spree of focusing on System Intrusion events (ransomware, malware, hackers) and instead preoccupied themselves with Social Engineering. System Intrusion still hung around to take second place, followed by Denial of Service.
“What content was available to us this time around if we did a Google search?”
For the past four years half of the Google-able internet’s share of these topics was content mentioning System Intrusion, followed by Lost and Stolen Assets, particularly smartphones, and Social Engineering.
This year, System Intrusion still made up half of cybersecurity search results. The other half was evenly split between Lost and Stolen Assets and Social Engineering.
“Did popular Google searches align with cyber attack causes or did they miss the mark?”
What people Googled (search trends) was equal parts System Intrusion and Social Engineering, with a side of Denial of Service.
What people got for results exceeded their System Intrusion needs, almost met their Social Engineering needs, but barely delivered on Denial of Service content and, for the third year in a row, gave us a lot of uninvited “How to locate your stolen iPhone!” and other lost/stolen device content.
“So how did you gather all of this data?”
In the past we leveraged each outlet’s search feature or Application Programming Interface (API) and then sampled the results to ensure the results were actually cybersecurity related. Like last year though, we used Google’s search feature to search the non-Google sources. We found Google did a much better job filtering out spurious results and catching big keywords that were semantically unrelated to the themes but socially/topically very much related e.g. supply chain.
Using Google’s search operators we screened the search results for semantic context (e.g. phish email, not Phish the band or a 🐠), and tallied the number of hits in each category by outlet. Since we were interested in relative proportion, we normalized the remaining counts by dividing them by the total count for the year.
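As a rough illustration of the screening step above, here is a minimal sketch of how a query string could be assembled from common Google search operators (quoted phrases, "-" exclusions, "site:", and the "after:"/"before:" date filters). The function name, the exclusion words, and the example dates are assumptions made for illustration; this is not the exact query set used in the research.

```python
def build_query(phrase, site=None, exclude=(), after=None, before=None):
    """Assemble a Google search string from a phrase and optional operators.

    All parameters are illustrative; the exclusion words in particular are
    hypothetical examples of how off-topic senses might be screened out.
    """
    parts = [f'"{phrase}"']                       # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # drop off-topic senses
    if site:
        parts.append(f"site:{site}")              # restrict to one outlet
    if after:
        parts.append(f"after:{after}")            # date filter (YYYY-MM-DD)
    if before:
        parts.append(f"before:{before}")
    return " ".join(parts)


# Hypothetical example: phishing coverage on one outlet, excluding the band.
query = build_query(
    "phishing email",
    site="theguardian.com",
    exclude=("band", "concert"),
    after="2022-11-01",
    before="2023-10-31",
)
print(query)
```

The resulting string can be pasted into Google as-is. Whether the date filters treat their boundary days as inclusive is worth checking by hand before relying on the counts.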
To perform the searches, we compiled a large list of keywords and search terms (like “phishing” or “social engineering”) for each of the eight DBIR cybersecurity incident and data breach categories. We collected the initial keywords and terms from the 2024 DBIR report, the National Institute of Standards and Technology (NIST) Glossary of Key Information Security Terms [4], and from the cybersecurity professionals at Hive Systems. We entered the lists into Google Trends and then added the resulting new “related queries” suggested by Google to the respective lists when they made sense. We repeated the process until the Google Trends related query results were no longer related to the respective DBIR categories. This was particularly helpful because it ensured we included terms like the names given to ransomware strains (like Hive Ransomware) in our searches for “ransomware,” which are the terms people actually use in their search queries.
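The repeat-until-nothing-new process described above can be sketched as a simple loop. This is only an illustration: in the research the "related queries" came from the Google Trends interface and the relevance call was a human judgment, so both `get_related` and `is_relevant` below are hypothetical stand-ins, and the sample data is made up.

```python
def expand_keywords(seed_terms, get_related, is_relevant, max_rounds=10):
    """Grow a keyword list until a round adds no new relevant terms.

    get_related(term) -> iterable of suggested terms (stand-in for the
    Google Trends "related queries" panel).
    is_relevant(term) -> bool (stand-in for a human deciding whether the
    suggestion still belongs to the DBIR category).
    """
    known = set(seed_terms)
    frontier = set(seed_terms)
    for _ in range(max_rounds):
        new_terms = set()
        for term in frontier:
            for candidate in get_related(term):
                if candidate not in known and is_relevant(candidate):
                    new_terms.add(candidate)
        if not new_terms:
            break  # suggestions no longer relate to the category
        known |= new_terms
        frontier = new_terms  # only look up suggestions for the new terms
    return known


# Made-up suggestion data, for illustration only.
fake_related = {
    "ransomware": ["hive ransomware", "ransomware attack"],
    "hive ransomware": ["bee hive"],
}
terms = expand_keywords(
    ["ransomware"],
    get_related=lambda t: fake_related.get(t, []),
    is_relevant=lambda t: "ransomware" in t,
)
print(sorted(terms))
```

The `max_rounds` cap is a safety net so the loop always ends, even if the suggestions keep producing loosely related terms.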
Counting Google Trends results
We first entered the keywords and terms into Google Trends as-is, but got more relevant results by using Google Trends’ Topics feature. We went with Topics because the feature includes synonyms and works across multiple languages. Topics also include concepts based on your keywords: for example, if you use the topic “London,” Google silently includes results for searches of “Capital of the UK” and “Capital of England,” as well as the words for London in other languages. Trends Categories are useful because they narrow down concepts to your subject of interest: for example, searching for the term “jaguar” yields results for both the animal and the car manufacturer of the same name unless you specify one category or the other. We tested all combinations of Google Trends Topics and Google Categories to find which settings yielded the most accurate results and the largest number of results. What yielded the most results and the fewest errors was using the category “Computers & Electronics” and the Trends Topics “Web application security,” “phishing,” “security hacker,” “insider threat,” and “denial-of-service attack.”
Counting Google search results
We sampled Google search results for relevance by reviewing the last pages in each result set, which are the pages that Google finds least relevant. We counted results only on pages with results that were related to the topic being queried. We based relevance on the Google preview of the content.
Normalizing the counts
Since we were interested in relative shares of cybersecurity incident and data breach causes by cause group, we normalized each data source's values by dividing the number of events in each category by the sum of all events for the time period.
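The normalization described above is simple arithmetic. Here is a minimal sketch, assuming each source's tallies are held in a dictionary keyed by DBIR category. The function name and the numbers are hypothetical and are not our actual data.

```python
def normalize_counts(counts):
    """Convert raw hit counts per category into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # No hits at all for this source: report a zero share everywhere.
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


# Hypothetical tallies for one source (illustrative only).
example_counts = {
    "System Intrusion": 50,
    "Social Engineering": 30,
    "Denial of Service": 20,
}
shares = normalize_counts(example_counts)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

Because every source is reduced to shares of its own total, a source with thousands of hits and a source with a few dozen can be placed side by side in the same chart.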
“Did you have any limitations?”
Our results are based on a convenience sample and we cannot claim that our findings are representative or generalizable.
We did our best to put together an extensive list of cybersecurity attack cause synonyms, including colloquialisms, but don’t claim to have used an exhaustive list.
None of the data sources we used provide raw output (even with their APIs), so the algorithms and biases of the search engines and sources will have influenced these findings.
Google Trends allowed us to pool results from multiple languages, but other sources presumably did not. Google Trends results may be skewed toward results in languages that happen to talk more about one of the topics.
We evaluated Google Search results using the Google summary blurb, but it is possible the blurbs were misleading and that reviewing the whole article is necessary. We were unable to do so due to volume, but could use sampling in the future.
Cybersecurity professionals and people under cyber attack may be more discreet in their searches and opt for search engines like DuckDuckGo instead of Google.
The search engines and news sites we chose are obviously not the only options. We chose them for their convenience (how many search engines have something like Google Trends?) and because of their prominence, coverage, and familiarity to our readers.
If you find a good informative result or article for your search, do you keep Googling it? Any topics people found quickly, they presumably searched for less. This means that good writers are throwing off our stats!
What do you think? Tell us!
Join the discussion below or message us on social media and keep us honest!
If people seem interested in this sort of thing, we’d like to add more stacks to the graphic, such as Wikipedia, Reddit, LinkedIn, Google patent search, security conferences, the Wayback Machine, or other search engines.
References
[1] Widup, S., Pinto, A., Langlois, P., & Highlander, D. C. (2024). 2024 Data Breach Investigations Report. Retrieved from: https://www.verizon.com/business/resources/reports/2024/dbir/2024-dbir-data-breach-investigations-report.pdf
[2] Ritchie, H., & Roser, M. (2018). Causes of Death. Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/causes-of-death
[3] Widup, S., Pinto, A., Langlois, P., & Highlander, D. C. (2023). 2023 Verizon Data Breach Investigations Report. Retrieved from: https://www.verizon.com/business/resources/reports/2023/dbir/2023-data-breach-investigations-report-dbir.pdf
[4] National Institute of Standards and Technology. (2019). Glossary of Key Information Security Terms. Retrieved from: https://csrc.nist.gov/glossary