Article
Comparative Analysis of Open-Source Tools for Conducting
Static Code Analysis
Kajetan Kuszczyński and Michał Walkowski *
Abstract: The increasing complexity of web applications and systems, driven by ongoing digitalization, has made software security testing a necessary and critical activity in the software development lifecycle. This article compares the performance of open-source tools for conducting static code analysis for security purposes. Eleven different tools were evaluated in this study, scanning 16 vulnerable web applications. The vulnerable web applications were selected for having the best available documentation of their security vulnerabilities, so that reliable results could be obtained. The static code analysis tools examined in this paper can also be applied to other types of applications, such as embedded systems. Based on the results obtained and the conducted analysis, recommendations for the use of these types of solutions were proposed, to achieve the best possible results. The analysis of the tested tools revealed that there is no perfect tool. For example, Semgrep performed better for applications developed using JavaScript technology but worse for applications developed using PHP technology.
1. Introduction
The digital transformation continues to accelerate. More and more businesses and regular users are utilizing various types of software, whether for work or entertainment. Access to public services is also undergoing digitization processes. For instance, in Poland, the development of the mObywatel application enables the storage of identification documents in mobile applications [1], electronic circulation of medical prescriptions (E-recepta) [2], and electronic tax filing (e-PIT) [3]. Newer and larger systems are being developed, comprising thousands of lines of code, and numerous libraries and technologies. A steadily growing number of programmers collaborate on a single project, needing to work closely together to deliver a finished product on time. An increasing number of services are available over the Internet, and they also have extended functionality. In summary, the impact of software security also affects the end user, regardless of the device they use, such as the Internet of things (IoT), sensor-equipped devices, embedded systems, or a mobile phone [4]. The importance of securing such software applications, which frequently involve complex codebases, cannot be overstated. Vulnerabilities in these applications can lead to serious security breaches, data leaks, and even physical harm, if the devices controlled by the software are critical to safety.
Cybercriminals tirelessly devise new ways to exploit vulnerabilities in application functioning, to cause harm and extract data. Analyzing program code for security purposes is challenging, time-consuming, and expensive. Thus, there is a necessity to support these processes. Examples of such solutions include tools for conducting static code analysis for security (SAST), and dynamic code analysis for security (DAST). Both of these processes have been discussed in the literature [5,6]. However, it can be observed that authors tend to focus on only one technology, such as C [7,8] or Java [6]. The literature also includes
comparisons of solutions for related technologies, such as C and C++ [9]. Additionally,
the authors in [6] used enterprise-type tools that are not available to every user due to
their high cost. According to the current state of the authors’ knowledge, there is a lack of
a broader comparison of open-source tools available to everyone, supporting current trends
in software development. Furthermore, the literature lacks a perspective on solutions that
can perform static code analysis for more than one technology, such as analyzing code
written in both Java and JavaScript. Such solutions can potentially reduce the number
of tools used in the software development process, thereby simplifying the continuous
integration/continuous delivery process and reducing the amount of data processed in the
big data process [10].
The novel contribution of this paper relates to the comparison of the results of open-
source tools supporting various technologies used in software development for conducting
static analysis and detecting potential errors affecting the security of applications, thereby
enhancing the security of organizations and end users. Based on the analysis of the
obtained results, a recommendation was formulated regarding the utilization of such
solutions, which could significantly enhance the quality of applications developed, even
at the code-writing stage. The analysis was carried out based on the list of vulnerabilities
reported by all tools. Vulnerable web applications were scanned using these tools for the
most popular programming languages [11].
The scope of this work encompassed a review of the available literature; an analysis
of methods for scanning code for vulnerabilities; a determination of the pros and cons
of SAST tools; an overview of existing tools, their comparison, research, and analysis of
the obtained results; as well as effectiveness verification. Within the conducted research,
vulnerable web applications were configured and launched, SAST tools were configured
and launched, the extraction, transformation, and loading (ETL) process [12] was utilized to consolidate results from the tools (a minimal sketch of this step follows the subsection outline below), and the acquired data were processed. Given that web
applications are the most popular type of application enabling access to services through
personal computers, this work specifically focused on them. This paper is divided into the
following subsections:
• Background—describes the research basics and presents the problems, processes,
and compromises that occur in static code analysis research;
• Environment—describes the hardware and software used for conducting the research.
This section also provides an overview of the analyzed tools and the utilized vulnerable
web applications;
• Research Methodology—covers the design of the experiment for the research conducted;
• Results—provides a discussion of the results, and highlights the strengths and weak-
nesses of the examined tools for static code analysis;
• Conclusions—summarizes the obtained results.
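As noted above, an ETL step was used to consolidate the reports produced by the different tools into a single data set. The sketch below illustrates one way such a transform-and-load step could look in Python; the report field names, tool name, and file paths are hypothetical placeholders, since every tool uses its own report schema, and this is not the exact pipeline used in the study.

```python
import csv
from typing import Iterable

# Unified schema produced by the transform step; one row per reported finding.
FIELDS = ["tool", "application", "path", "line", "vulnerability"]

def transform(tool: str, application: str, raw_findings: Iterable[dict],
              path_key: str, line_key: str, type_key: str) -> list[dict]:
    """Map one tool's report entries onto the unified schema.

    Every tool names its report fields differently, so the key names are
    passed in explicitly instead of being hard-coded.
    """
    rows = []
    for entry in raw_findings:
        rows.append({
            "tool": tool,
            "application": application,
            "path": entry.get(path_key, ""),
            "line": entry.get(line_key, -1),
            "vulnerability": entry.get(type_key, "unknown"),
        })
    return rows

def load(rows: list[dict], out_path: str) -> None:
    """Load step: append unified rows to a single CSV used for later analysis."""
    with open(out_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if fh.tell() == 0:          # empty file: write the header first
            writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    # Hypothetical entries, as if already parsed from one tool's JSON report.
    example = [{"file": "index.php", "lineno": 42, "issue": "sql-injection"}]
    load(transform("example-tool", "example-app", example, "file", "lineno", "issue"),
         "findings.csv")
```

Normalizing every report into one flat table of this kind is what allows the indicators in Section 4 to be computed uniformly across tools and applications.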
2. Background
A web service is a component of an information technology system that can be com-
municated with via the Internet. This could be a web application, an API server, and so on.
With the ever-improving global access to fast and affordable internet connections, these
services have become more popular than ever before—they are used for tasks such as
online banking, information searches, doctor appointments, watching movies, listening to
music, and playing computer games. However, not only has their popularity increased,
but their functionality has also expanded. Bank websites offer more and more features,
government websites enable administrative procedures—all through the use of computers
or smartphones.
Nevertheless, despite the many advantages of web services, as their complexity grows,
so does the likelihood of vulnerabilities that could lead to security threats. In the context of
cybersecurity, a vulnerability (or flaw or weakness) is a defect in an information technology
system that weakens its overall security. Exploiting such a flaw, a cybercriminal could
cause harm to the system owner (or its users) or pave the way for further attacks on the
system. Each vulnerability can also be described as an attack vector—a method or means
that enables the exploitation of a system flaw, thereby jeopardizing security. A set of attack
vectors is termed the attack surface [13]. The utilization of a vulnerability is referred to
as exploitation.
The common weakness enumeration (CWE) is a community-developed list of software
and hardware weakness types [14]. At the time of writing, the CWE database contained 933 entries [15].
Each CWE entry possesses attributes that describe the specific weakness, including an
identification number, name, description, relationships to other entries, consequences,
and examples of exploitation.
The common vulnerability scoring system (CVSS) is a standard for rating the severity
of vulnerabilities [16]. It aims to assign a threat rating to each identified vulnerability (in
the case of a successful exploit). Such a rating allows prioritizing the response efforts of the
reacting team based on the threat severity. The latest version is CVSSv3.1, released in June
2019. The specification is available on the FIRST website, and CVSSv4 is currently in the public consultation phase [17].
The common vulnerabilities and exposures (CVE) database contains entries about
known security vulnerabilities [18]. It differentiates vulnerabilities that can directly lead to
exploitation from exposures that may indirectly lead to exploitation. Each entry is assigned
a unique identifier and includes a description, name, software version, manufacturer,
cross-references to other resources about the entry, and creation date.
Due to the increasing complexity of web systems, software testing for security vul-
nerabilities has become an essential and critical activity in the software development life
cycle (SDLC), especially for web applications [19]. Secure software development life cycle
(SSDLC) is an extension of the SDLC, with additional security measures [20]. It aims to
assist developers in creating software in a way that reduces future security threats. It
includes, among other things, defining and implementing security requirements alongside
the functional requirements of the application being developed, as well as periodically
assessing the security level, for instance, through conducting penetration tests.
Numerous SSDLC models have been proposed and are successfully employed in
contemporary processes [20]. Some of these include the National Institute of Standards
and Technology (NIST) guidelines 800-64 [21], Microsoft’s Security Development Lifecy-
cle (MSSDL) [22], and the Comprehensive Lightweight Application Security Process by
OWASP (OWASP CLASP) [23].
Penetration testing is one of the most popular methods for assessing the security level
of web applications [24]. It constitutes a part of the testing phase within the SSDLC process.
It involves attacking the application to discover the existence or extent of vulnerabilities
within the application’s attack surface. In contemporary cybersecurity solutions, automa-
tion plays a pivotal role. As the complexity of developed solutions continues to grow,
the need for more efficient methods of testing the security of web services arises. In today’s
fast-paced environment, where software updates are released daily, penetration tests must
be conducted swiftly and effectively.
It is impossible to fully automate the entire process of conducting penetration
tests—certain aspects must be carried out by a human. However, many tasks, such as
fuzzing (a technique involving the supply of incorrect, unexpected, or random data),
can be easily automated. Although no automation tool can fully replace the intuition and
abstract thinking of a human tester, it can expedite their work by identifying well-known
and documented vulnerabilities.
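As a simple illustration of how such a task can be automated, the sketch below implements a naive fuzzing loop in Python; the function under test and all parameters are hypothetical and chosen only for illustration, and real fuzzers are far more sophisticated about input generation and crash triage.

```python
import random
import string

def naive_fuzz(target, attempts: int = 1000) -> list[str]:
    """Feed random strings to `target` and collect inputs that raise unhandled exceptions."""
    crashing_inputs = []
    for _ in range(attempts):
        payload = "".join(random.choices(string.printable, k=random.randint(0, 64)))
        try:
            target(payload)
        except Exception:
            crashing_inputs.append(payload)
    return crashing_inputs

def parse_pair(text: str) -> tuple[str, int]:
    """Hypothetical function under test: parses a single 'key=value' pair."""
    key, value = text.split("=")
    return key, int(value)

if __name__ == "__main__":
    print(f"{len(naive_fuzz(parse_pair))} crashing inputs found")
```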
Static code analysis tools analyze the source code of a program, utilizing a white-box
testing approach. There are various approaches to conducting this analysis, such as string
pattern matching, lexical analysis, and abstract syntax tree (AST) analysis [25]. The earlier
a software error is detected, the lower the cost of its resolution [26]. SAST tools can scan
code not only during the software testing phase within the SDLC but also during the writing
of the program code by the developer, providing real-time error feedback. They examine
the entire code, ensuring 100% coverage of the software. Besides detecting vulnerabilities
in the code, these scanners often analyze the libraries used by the application, highlighting
those with known vulnerabilities. Despite developers’ positive evaluation of using SAST
tools for error reduction, integrating such tools into SDLC processes encounters certain
challenges, such as low analysis performance, the need for multiple tools, and technical
debt (a phenomenon where choosing a seemingly easier and cheaper option in the short
term becomes less cost-effective in the long run [27]) when implemented late [28]. Scanning
extensive lines of code can result in hundreds or even thousands of vulnerability alerts for
a single application. This generates numerous false positives, prolongs investigation time,
and diminishes trust in SAST tool results [28].
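To make the AST-based approach concrete, the following minimal sketch parses Python source code and flags calls to eval() and exec(), two common code injection sinks. It is an illustrative example of the technique only, not the implementation of any of the tools examined later; production SAST tools combine many such rules with lexical and data-flow analysis, which is where both their coverage and their false positives come from.

```python
import ast

# Calls that commonly indicate code injection sinks (CWE-94); illustrative only.
SUSPICIOUS_CALLS = {"eval", "exec"}

def find_suspicious_calls(source: str, filename: str = "<input>") -> list[tuple[str, int, str]]:
    """Walk the abstract syntax tree and report calls to names in SUSPICIOUS_CALLS."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((filename, node.lineno, node.func.id))
    return findings

if __name__ == "__main__":
    code = "user_input = input()\nresult = eval(user_input)\n"
    for path, line, name in find_suspicious_calls(code, "example.py"):
        print(f"{path}:{line}: call to {name}() - possible code injection (CWE-94)")
```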
Unlike SAST tools, dynamic application security testing (DAST) tools assess the
behavior of running program code through the user interface and APIs. This is a black-
box approach to penetration testing, as they lack access to the source code of the tested
application. DAST tools are language-independent and can identify issues not present in
the application code, such as misconfigured environments, manipulation of cookies (text
snippets stored by a web browser in a computer’s memory—can be transmitted by a web
service or created locally), and errors in integration with third-party services [5]. As DAST
tools do not have access to the source code of the tested application, there is a significant
likelihood that they may overlook certain parts of the scanned application. They will not
pinpoint the location of vulnerabilities in the code; rather, they will indicate the detected
issues, leaving it up to the programmer to identify the line(s) of code responsible for
the error.
Since DAST requires a functioning application, vulnerabilities are detected towards
the end of the SDLC process, increasing the cost of their remediation. Additionally,
a separate environment is needed for conducting tests, further amplifying the financial
investment—the entire infrastructure of the tested service must be provided, typically
encompassing (but not limited to) the client application, API server, and database. Similarly
to SAST tools, they also generate numerous false alarms.
Static and dynamic code analysis for security are not the only types of code analysis.
Additionally, we can distinguish the following: interactive application security testing
(IAST), a method that combines SAST and DAST. It utilizes monitoring mechanisms to
observe the behavior of the web application server’s code, while simultaneously attacking
the application through its graphical interface or API [5]; runtime application self-protection
(RASP), a solution that involves using tools to monitor the behavior of a web application
during runtime to detect and block attacks. Unlike IAST, RASP does not attempt to identify
vulnerabilities but rather protects against attacks that might exploit those vulnerabilities.
3. Environment
To ensure accurate research results, an appropriate testing environment was prepared.
It consisted of computer hardware and software that enable a thorough analysis process.
In order to facilitate seamless reproduction of the experiment results, open-source software
and licenses allowing their use for research purposes were utilized. Table 1 presents a list
of the hardware and software used to conduct the research.
• Tiredful API—This vulnerable web application, created using Python, aims to educate pro-
grammers, testers, and cybersecurity specialists about web application vulnerabilities [46].
Table 3. Used SAST tools.
Name Version
Bearer CLI 1.5.1
Floe phpcs-security-audit 2.0.1
Graudit 3.5
InsiderSec Insider CLI 3.0.0
OWASP FindSecBugs 1.12.0
Progpilot 1.0.2
PyCQA Bandit 1.7.5
Semgrep 1.20.0
ShiftLeft Inc. Scan 2.1.2
SourceCode.AI Aura 2.1
Zup Horusec 2.8.0
• Bearer CLI—This is a SAST tool. Currently, it supports JavaScript, TypeScript, and Ruby
languages. It does not support adding new rules (one would need to download the
source code, add rules, and compile the program), but it allows disabling existing
ones and excluding a specific file (entirely or partially) from the scanning process.
The report can be generated in three formats: JSON, YAML, and SARIF. Each entry
in the report includes the file path, line number of the code, and a description of the
detected vulnerability (along with the CWE identifier(s)) [47];
• phpcs-security-audit—Prepared by Floe, phpcs-security-audit is a set of rules for the
PHP_CodeSniffer tool. It extends its capabilities for detecting vulnerabilities and
weaknesses in PHP code, turning it into a SAST tool. The scan report is displayed
in the system shell console window. It includes file paths, line numbers of the code,
and descriptions of detected vulnerabilities, though without any links to, for example,
the CWE database [48];
• Graudit—Developed by a single programmer, Eldar Marcussen, Graudit is a SAST tool
that searches for potential vulnerabilities in application source code, using another tool,
GNU grep, for text filtering. It supports multiple programming languages (including
all those tested within this work—a complete list can be found on the project’s website).
It is essentially a Bash script. Adding rules involves entering specific rules into files
provided with the tool. Similarly, you can “disable” certain rules by removing them.
The tool’s output is text displayed in the system shell console. It contains only file
paths, the line numbers of the code, and highlighted code snippets that triggered the
rule [49];
• Insider CLI—This is a SAST tool prepared by InsiderSec. It supports Java, Kotlin,
Swift, C#, and JavaScript languages. It does not allow extending the set of rules,
but specific files or folders can be excluded from scanning. The report, available in
HTML or JSON format, includes a list of detected vulnerabilities. Each vulnerability
is accompanied by a CVSS score, file path, line number of the code, description,
and removal recommendation [50];
• Find Security Bugs—This is a plugin for the SpotBugs tool, supporting web applica-
tions written in the Java language. Similarly to phpcs-security-audit, it extends its
capabilities to identifying security vulnerabilities, effectively turning it into a SAST
tool. The report is generated in HTML or XML format. Each entry in the report
includes the file path, identifier of the detected vulnerability (which, when entered
on the tool’s website, provides a detailed description of the vulnerability), and line
number of the code [51];
• Progpilot—This is a SAST tool for scanning code written in the PHP language for
security vulnerabilities. It allows adding custom rules, disabling existing ones, and ex-
cluding files and folders from scanning. The JSON format report contains a list of
vulnerabilities, each having various attributes, such as the file path, line number of
the code, type, and a link to CWE [52];
• Bandit—This SAST tool, developed by PyCQA, is designed for scanning code written
in the Python language for security vulnerabilities. It allows adding custom rules,
disabling existing ones, and excluding files and folders from scanning. The report
can be generated in various formats such as JSON, CSV, or HTML. For each detected
vulnerability, details are included, such as the file path, line number of the code,
vulnerability type and description, along with the CWE identifier [53];
• Semgrep—Semgrep, a versatile SAST tool supporting multiple programming lan-
guages, allows you to add custom rules, disable existing ones, and exclude specific
files and folders from scanning. Reports can be generated in various formats, such as
JSON or SARIF. For each vulnerability, details are provided, including the file path,
line number of the code, vulnerability type and description, along with references to
resources such as the CWE and OWASP Top 10 [54];
• Scan—The SAST tool developed by ShiftLeft supports multiple programming lan-
guages (full list available on the project’s website). It is not a standalone solution;
Scan is a combined set of static analysis tools. The scan report is a combination of
reports from the tools used by Scan. It does not allow for rule disabling or excluding
specific files or folders. The report is distributed across multiple files in formats such
as HTML, JSON, and SARIF. Each entry in the report contains information about the
vulnerability’s location and type [55];
• Aura—The SAST tool developed by SourceCode.AI is used for scanning code written
in the Python programming language. The tool does not allow adding custom rules,
nor can individual rules be blocked. The report can be generated in text, JSON,
or SQLite database file formats. Each detected vulnerability is associated with a file
path, line number, and vulnerability type [56];
• Horusec—A SAST tool for scanning code written in multiple programming languages.
It uses its own scanners, as well as other open-source ones (a full list is available in the
documentation). It allows adding custom rules and excluding specific files and folders
from the scan. The report can be saved in text, JSON, or SonarQube formats. Each
entry in the report contains a description of the detected vulnerability (not always
with a CWE identifier) and its location [57].
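For readers who wish to reproduce this kind of setup, the following sketch shows one way the scans could be automated: it invokes Bandit on a project directory and reads back the JSON report. The directory and file names are placeholders, and the command-line flags and report fields accessed here should be verified against the current Bandit documentation; the same pattern applies to the other command-line tools listed above.

```python
import json
import subprocess

def run_bandit(project_dir: str, report_path: str) -> list[dict]:
    """Run Bandit recursively over project_dir and return the findings from its JSON report."""
    subprocess.run(
        ["bandit", "-r", project_dir, "-f", "json", "-o", report_path],
        check=False,  # Bandit exits with a non-zero code when it finds issues; that is expected here
    )
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    # Individual findings are expected under the "results" key of the JSON report.
    return report.get("results", [])

if __name__ == "__main__":
    # Placeholder paths; adjust to the application being scanned.
    for item in run_bandit("vulnerable-app/", "bandit-report.json"):
        print(item.get("filename"), item.get("line_number"), item.get("test_name"))
```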
4. Research Methodology
Table 5 presents a structured summary of experiments conducted in this paper. Each
row represents a distinct software application and each column corresponds to a specific
SAST tool. The letter “Y” indicates that the experiment was conducted using the given tool
and application. A total of 95 experiments were conducted as part of the research.
Table 5. Summary of the conducted experiments ("Y" = the application was scanned with the tool, "N" = it was not).

Application | Aura | Bandit | Bearer CLI | FindSecBugs | Graudit | Horusec | Insider CLI | phpcs-security-audit | Progpilot | ShiftLeft Scan | Semgrep
EasyBuggy | N | N | N | Y | Y | Y | Y | N | N | Y | Y
Java Vulnerable Lab | N | N | N | Y | Y | Y | Y | N | N | Y | Y
Security Shepherd | N | N | N | Y | Y | Y | Y | N | N | Y | Y
Vulnerable App | N | N | N | Y | Y | Y | Y | N | N | Y | Y
Broken Crystals | N | N | Y | N | Y | Y | Y | N | N | N | Y
Damn Vulnerable Web Services | N | N | Y | N | Y | Y | Y | N | N | N | Y
Juice Shop | N | N | Y | N | Y | Y | Y | N | N | N | Y
NodeGoat | N | N | Y | N | Y | Y | Y | N | N | N | Y
Conviso Vulnerable Web Application | N | N | N | N | Y | Y | N | Y | Y | Y | Y
Damn Vulnerable Web Application | N | N | N | N | Y | Y | N | Y | Y | Y | Y
WackoPicko | N | N | N | N | Y | Y | N | Y | Y | Y | Y
Xtreme Vulnerable Web Application | N | N | N | N | Y | Y | N | Y | Y | Y | Y
Damn Small Vulnerable Web | Y | Y | N | N | Y | Y | N | N | N | Y | Y
Damn Vulnerable GraphQL Application | Y | Y | N | N | Y | Y | N | N | N | Y | Y
Damn Vulnerable Python Web Application | Y | Y | N | N | Y | Y | N | N | N | Y | Y
Tiredful API | Y | Y | N | N | Y | Y | N | N | N | Y | Y
• Positive (P)—Represents the number of vulnerabilities reported by all tools that exist in the application. The P indicator is calculated using Equation (1). It is the same for all tools within a given application.
P = TP + FN (1)
• Negative (N)—Represents the number of vulnerabilities reported by all tools that do
not exist in the application. The N indicator is calculated using Equation (2). It is the
same for all tools within a given application.
N = FP + TN (2)
• TOTAL—This indicator represents the total number of vulnerabilities reported by all
tools. It is the same for all tools within a given application.
• Accuracy (ACC)—Accuracy evaluates the portion of all vulnerabilities reported by all tools (TOTAL) that a specific tool classified correctly, i.e., vulnerabilities it correctly reported (TP) together with vulnerabilities it correctly did not report (TN). The ACC indicator is determined using Formula (3).
ACC = (TP + TN)/(P + N) (3)
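For clarity, the sketch below computes these indicators from a single set of counts. The counts are hypothetical, and sensitivity (SEN) and precision (PRE), used throughout Section 5, are assumed here to follow their standard definitions, SEN = TP/P and PRE = TP/(TP + FP), expressed as percentages.

```python
def indicators(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Compute the indicators reported in Section 5 from one tool's counts on one application."""
    p = tp + fn        # Equation (1): existing vulnerabilities
    n = fp + tn        # Equation (2): reported vulnerabilities that do not exist
    total = p + n      # TOTAL
    return {
        "ACC%": 100 * (tp + tn) / total,                     # Formula (3)
        "SEN%": 100 * tp / p if p else 0.0,                  # assumed standard sensitivity
        "PRE%": 100 * tp / (tp + fp) if (tp + fp) else 0.0,  # assumed standard precision
        "TP%": 100 * tp / total,
        "FN%": 100 * fn / total,
        "FP%": 100 * fp / total,
        "TN%": 100 * tn / total,
    }

if __name__ == "__main__":
    # Hypothetical counts for a single tool on a single application.
    print(indicators(tp=25, fn=8, fp=11, tn=56))
```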
5. Results
The study conducted in this work examined 11 SAST tools. These tools were used to
scan 16 vulnerable web applications written in four different technologies. The tools and
applications are presented in Section 3.
Table 6. The values of SAST tool indicators for the EasyBuggy application.
• FP—Horusec had the highest FP count, with 38, which could lead to resource-intensive investigations of false alarms;
• TN—The tools also correctly identified TNs, cases where no vulnerabilities were present;
• ACC%—The overall accuracy of the tools ranged from 22.02% (Graudit) to 94.50%
(FindSecBugs), indicating their effectiveness in correctly classifying vulnerabilities;
• SEN%—FindSecBugs achieved the highest SEN at 96.97%, indicating its strong capa-
bility to identify true positives relative to the other tools;
• PRE%—ShiftLeftScan had the highest PRE, at 87.50%, suggesting that when it reported
a positive result, it was often a true positive;
• TP%—The proportion of actual vulnerabilities detected by each tool varied, with FindSecBugs achieving 29.36%;
• FN%—The proportion of missed vulnerabilities ranged from 0.92% (Graudit) to
29.36% (Horusec);
• FP%—The proportion of false alarms varied, with Graudit having a high FP rate
of 64.22%;
• TN%—The proportion of correctly identified non-vulnerable instances varied but was
generally high for all tools.
Table 7. The values of SAST tool indicators for the Java Vulnerable Lab application.
Table 8 presents a comprehensive analysis of the various SAST tools applied for the
Security Shepherd application. The results provide valuable insights into the performance
of each tool:
• FindSecBugs detected the highest number of TP, with 196 vulnerabilities identified.
This indicates a strong ability to uncover actual security issues within the application;
• Graudit had a considerably lower number of TPs (19), which suggests it may have
missed several vulnerabilities within the application;
• Horusec detected 177 TP, indicating a good capability to identify security issues;
• Insider identified only six TP, signifying limited effectiveness in detecting vulnerabilities;
• ShiftLeftScan found 173 TP, demonstrating a robust ability to identify security problems;
• Semgrep detected 28 TP, indicating some effectiveness in identifying vulnerabilities;
• FP was highest for FindSecBugs with 767, followed by Horusec with 801. These high
FP counts could lead to resource-intensive investigations of false alarms;
• The tools correctly identified TN where no vulnerabilities were present. The TN count
was highest for Insider (1841), indicating its ability to avoid false alarms;
• The overall ACC of the tools varied, ranging from 59.51% (Horusec) to 90.32% (Insider),
showing differences in their effectiveness in correctly classifying vulnerabilities;
• FindSecBugs achieved the highest SEN at 96.08%, indicating its strong capability to
identify true positives relative to other tools;
• PRE was the highest for Insider at 100.00%, suggesting that when it reported a positive
result, it was almost always a true positive;
• The proportion of actual vulnerabilities detected by each tool varied, with FindSecBugs
achieving 9.58%. However, some tools had much lower proportions of detected
vulnerabilities, such as Graudit with 0.93%;
• The proportion of missed vulnerabilities also varied, with Graudit having the highest
percentage at 9.05%;
• The proportion of FP varied, with FindSecBugs having a high FP rate of 37.51%;
• The proportion of correctly identified non-vulnerable instances (TN) was generally
high for all tools, with Insider achieving the highest TN percentage at 90.02%.
Table 8. The values of SAST tool indicators for the Security Shepherd application.
Table 9 provides an analysis of the various SAST tool indicators for the Vulnerable
App application. The results reveal important insights into each tool’s performance:
• FindSecBugs identified 24 TP, indicating its ability to uncover actual security issues
within the application;
• Graudit detected 12 TP, suggesting a moderate capability to find vulnerabilities;
• Horusec found only four TPs, indicating limited effectiveness in identifying security issues;
• Insider identified just one TP, signifying a low capability for detecting vulnerabilities;
• ShiftLeftScan detected 10 TP, showing a moderate ability to identify security problems;
• Semgrep achieved the highest number of TPs, with 25, indicating a strong capability
for uncovering vulnerabilities;
• FPs were observed, with the highest count for Graudit (31) and the lowest for Insider (3). High FP counts can lead to resource-intensive investigations of false alarms;
• The tools correctly identified TNs, where no vulnerabilities were present. TN counts
ranged from 26 (Graudit) to 55 (Insider), indicating the ability to avoid false alarms;
• The overall ACC of the tools varied, ranging from 39.18% (Graudit) to 76.29% (Semgrep), showing differences in their effectiveness in correctly classifying vulnerabilities;
• Semgrep achieved the highest SEN at 62.50%, indicating its strong capability to identify
true positives relative to the other tools;
• PRE was the highest for Semgrep, at 75.76%, suggesting that when it reported a posi-
tive result, it was often a true positive;
• The proportion of actual vulnerabilities detected by each tool varied, with Semgrep
achieving 25.77%. However, some tools had lower proportions of detected vulnerabil-
ities, such as Horusec with 4.12%;
• The proportion of missed vulnerabilities FN varied, with Horusec having the highest
percentage at 37.11%;
• The proportion of FP also varied, with Graudit having a relatively high FP rate
of 31.96%;
• The proportion of correctly identified non-vulnerable instances (TN) was generally
high for all tools, with Insider achieving the highest TN percentage, at 55.67%.
Table 9. The values of SAST tool indicators for the Vulnerable App application.
Table 10 provides average values for the selected SAST tool indicators for applica-
tions developed using Java technology. The results provide valuable insights into the
performance of each tool:
• FindSecBugs had the highest average ACC, at 76.60%, indicating that it was generally
effective in correctly classifying vulnerabilities for Java applications;
• Graudit had the lowest average ACC at 49.66%, suggesting that it had a lower overall
accuracy compared to the other tools;
• FindSecBugs achieved the highest average SEN, at 85.23%, indicating its strong capa-
bility to identify true positives relative to the other tools;
• Insider had the lowest average SEN, at 2.88%, suggesting that it had difficulty in
identifying actual security issues;
• PRE was the highest for FindSecBugs, at 59.85%, indicating that when it reported
a positive result, it was often a true positive;
• Graudit had the lowest average PRE, at 13.43%, indicating a higher rate of false
positives when it reported security issues;
• The proportion of TP among all vulnerabilities detected varied among the tools,
with FindSecBugs achieving the highest average, at 24.07%;
• ShiftLeftScan had the lowest average TP percentage, at 13.16%, indicating a lower
capability to identify true positives relative to the other tools;
• The proportion of FN varied among the tools, with Graudit having the highest average
FN percentage, at 22.19%, indicating missed vulnerabilities;
• Insider had the lowest average FN percentage at 2.88%, suggesting that it had a lower
tendency to miss vulnerabilities;
• FPs were observed, with FindSecBugs having the highest average FP percentage, at 17.83%;
• Insider had the lowest average FP percentage, at 2.02%, indicating a lower rate of
false alarms;
• The proportion of TN among all non-vulnerable instances varied among the tools,
with Insider achieving the highest average TN percentage, at 68.34%;
• FindSecBugs had the lowest average TN percentage, at 52.53%, suggesting a higher
rate of false alarms among non-vulnerable instances.
Table 10. Average values of the selected SAST tool indicators for applications developed using
Java technology.
Based on the comprehensive analysis of various SAST tools applied to multiple Java
applications, we can draw the following conclusions:
• FindSecBugs consistently outperformed the other tools in terms of ACC, SEN, and PRE
across different Java applications. Graudit consistently performed the worst, with the
lowest ACC, SEN, and PRE among the tools;
• ACC—FindSecBugs achieved the highest average ACC (76.60%), indicating its overall
effectiveness in correctly classifying vulnerabilities for Java applications. Graudit had
the lowest average ACC (49.66%), indicating its lower overall accuracy compared to
the other tools;
• SEN—FindSecBugs had the highest average SEN (85.23%), demonstrating its
strong capability to identify true positives relative to the other tools. Insider had
the lowest average SEN (2.88%), indicating that it struggled to identify actual
security issues effectively;
• PRE—FindSecBugs achieved the highest average PRE (59.85%), suggesting that when
it reported a positive result, it was often a true positive. Graudit had the lowest
average PRE (13.43%), indicating a higher rate of false positives when it reported
security issues;
• TP and FN—FindSecBugs consistently identified a higher proportion of TP among all
vulnerabilities detected, indicating its strong capability to find actual security issues.
Graudit had the highest average FN percentage (22.19%), suggesting that it frequently
missed vulnerabilities;
• FP and TN—Insider had the lowest average FP percentage (2.02%), indicating a lower
rate of false alarms among non-vulnerable instances. FindSecBugs had the highest
average FP percentage (17.83%), suggesting a higher rate of false alarms among non-
vulnerable instances.
In summary, FindSecBugs was consistently the top-performing SAST tool, excelling
in accuracy, sensitivity, and precision. It consistently achieved the highest proportion
of true positives, while maintaining a reasonable accuracy. On the other hand, Graudit
consistently performed the worst, with lower accuracy, sensitivity, and precision, and a
higher rate of false negatives. The choice of SAST tool should consider the specific needs
of the application and the importance of minimizing false alarms, while maximizing the
detection of true security vulnerabilities.
Table 11. The values of SAST tool indicators for the Broken Crystals application.
Table 12 presents the results of SAST tool evaluations for the Damn Vulnerable Web
Services application. The results reveal important insights into each tool’s performance:
• Semgrep performed the best in this category, with seven true positives, followed by
Bearer with four and Horusec with three;
• Insider had the highest number of FNs (10), indicating it missed a significant portion of
vulnerabilities. Other tools, such as Horusec and Graudit, also missed vulnerabilities;
• Bearer and Horusec both reported four false positives;
• Semgrep achieved the highest ACC, at 90.70%, indicating that it made fewer misclassi-
fications. Graudit had the lowest accuracy, at 34.88%;
• Semgrep demonstrated the highest sensitivity, at 63.64%, indicating its effectiveness in
identifying true positives. Insider had the lowest sensitivity at 9.09%, implying that it
missed many vulnerabilities;
• Semgrep and Insider achieved the highest precision, at 100.00%. However, Insider
reported a low number of vulnerabilities overall.
Table 12. Values of the SAST tool indicators for the Damn Vulnerable Web Services application.
Table 13 presents the results of the assessment of SAST tools applied to the Juice Shop
application. The results reveal important insights into each tool’s performance:
• Among the tools, Semgrep stands out with the highest number of TPs (20), indicating
its effectiveness in identifying real vulnerabilities in the Juice Shop application. It was
followed by Graudit, with 12 true positives;
• Insider had the highest number of false negatives (29), indicating that it failed to detect
a significant number of vulnerabilities;
• Graudit reported the most FPs (76), followed by Horusec with 45.
• Bearer achieved the highest accuracy, at 84.21%, indicating that it made fewer misclas-
sifications. Graudit had the lowest accuracy, at 36.84%, suggesting a higher likelihood
of misidentifying vulnerabilities;
• Semgrep demonstrated the highest sensitivity at 62.50%, indicating its effectiveness in
identifying true positives. Insider had the lowest sensitivity, at 9.38%, implying that it
missed many vulnerabilities;
• Semgrep achieved the highest precision, at 60.61%. However, Graudit reported a high
number of false positives, resulting in a low precision of 0.00%.
Table 13. The values of SAST tool indicators for the Juice Shop application.
Table 14 presents the results of SAST tool evaluations for the NodeGoat application.
The results reveal important insights into each tool’s performance:
• Semgrep and Bearer both achieved the highest TP count, with seven each;
• Insider and Horusec shared the highest FN count, with nine each;
• Graudit reported the most false positives, with 22;
• Semgrep achieved the highest accuracy, at 93.33%, indicating that it made fewer
misclassifications;
• Semgrep had the highest sensitivity, at 63.64%;
• Semgrep also achieved the highest precision, at 87.50%.
Table 14. The values of SAST tool indicators for the NodeGoat application.
Table 15 presents the average values for the selected indicators for SAST tools applied
to applications developed using JavaScript technology. These averages provide an overview
of the overall performance of each SAST tool across multiple applications:
• Bearer—On average, this SAST tool achieved an ACC% of 83.32%, SEN% of 33.10%,
PRE% of 80.00%, TP% of 6.30%, FN% of 13.36%, and FP% of 3.33%. The TN% averaged
at 77.02%;
• Graudit—The average performance of this tool included an accuracy of 51.68%, sensi-
tivity of 31.34%, precision of 0.00%, true positive rate of 6.15%, false negative rate of
13.49%, and false positive rate of 34.83%. The true negative rate averaged at 45.52%;
• Horusec—On average, Horusec achieved an accuracy of 47.00%, sensitivity of 33.85%,
precision of 15.78%, true positive rate of 6.62%, false negative rate of 13.03%, and false
positive rate of 39.97%. The true negative rate averaged at 40.38%;
• Insider—The average performance of this tool resulted in an accuracy of 78.77%,
sensitivity of 11.01%, precision of 51.39%, true positive rate of 2.06%, false negative
rate of 17.59%, and false positive rate of 3.64%. The true negative rate averaged
at 76.71%;
• Semgrep—On average, Semgrep achieved the highest accuracy of 88.21%, a sensitivity
of 53.00%, precision of 80.78%, true positive rate of 10.65%, false negative rate of 9.00%,
and false positive rate of 2.79%. The true negative rate averaged 77.56%.
These average values provide an overall picture of how each SAST tool performed
when applied to JavaScript-based applications. Semgrep stands out, with a high accuracy,
sensitivity, and precision, making it a strong choice for securing JavaScript applications.
However, the selection of the most suitable tool should consider project-specific requirements and constraints.
Table 15. Average values of selected SAST tool indicators for applications developed using
JavaScript technology.
In conclusion, the choice of a SAST tool for JavaScript applications should be made
based on a careful evaluation of the specific requirements and constraints of the project.
While Semgrep consistently exhibited a strong overall performance, other tools may excel in
particular areas or be better suited for specific use cases. A comprehensive security strategy
should involve the selection of the right tools, continuous monitoring, and expert analysis,
to ensure robust protection against vulnerabilities in JavaScript-based applications.
Table 16. The values of the SAST tool indicators for the Conviso Vulnerable Web application.
Table 17 presents the results of an assessment of SAST tools applied to the Damn Vul-
nerable Web application. The results reveal important insights into each tool’s performance:
• Horusec stands out, with a high sensitivity (90.40%), indicating that it successfully
identified a substantial portion of true positives. Conversely, Progpilot showed a sen-
sitivity of only 12.80%, suggesting it missed many true positives;
• Progpilot demonstrated a 100% precision, implying that all reported vulnerabilities
were true positives. However, ShiftLeft Scan had a relatively low precision, at 55.26%,
indicating a higher likelihood of false positives;
• Horusec had a high true positive rate (25.17%), while Progpilot and Semgrep had
lower rates, implying they missed a significant number of true positives;
• Horusec and PHP_CS had relatively high FP rates, indicating they reported some
issues that were not actual vulnerabilities in the application. Semgrep had the lowest
FP rate among the tools;
• Some tools, such as Graudit, PHP_CS, and ShiftLeft Scan, reported TNs, indicating
that they correctly identified non-vulnerable portions of the application;
• Graudit, Progpilot, and ShiftLeft Scan exhibited reasonably high accuracy rates. How-
ever, it is essential to consider accuracy in conjunction with other metrics, to assess the
overall performance of each tool.
Table 17. The values of the SAST tool indicators for the Damn Vulnerable Web application.
Table 18 presents the results of the assessment of SAST tools applied to the WackoPicko
application. The results reveal important insights into each tool’s performance:
• Horusec stands out with a high sensitivity (93.40%), indicating that it successfully
identified a substantial portion of true positives. Conversely, ShiftLeft Scan and
Semgrep had much lower sensitivities, implying they missed many true positives;
• Progpilot demonstrated the highest precision, at 92.00%, implying that the vulnerabili-
ties it reported were highly likely to be true positives. Other tools, such as Graudit
and Horusec, had a lower precision;
• Horusec and Graudit exhibited reasonably high true positive rates. In contrast, Sem-
grep and ShiftLeft Scan had much lower rates, indicating they missed a significant
number of true positives;
• Horusec and Progpilot had relatively low false positive rates, indicating that they
reported fewer false alarms. ShiftLeft Scan and Semgrep had slightly higher false
positive rates;
• Most tools reported true negatives, indicating that they correctly identified non-
vulnerable portions of the application;
• Progpilot demonstrated the highest accuracy, at 71.56%, followed closely by Horusec, at 68.00%. Semgrep had the lowest accuracy among the tools.
Table 18. The values of the SAST tool indicators for the WackoPicko application.
Table 19 presents the results of the assessment of SAST tools applied to the Xtreme Vul-
nerable Web application. The results reveal important insights into each tool’s performance:
• Horusec exhibited a high sensitivity (90.20%), indicating its ability to detect a substan-
tial number of true positives. Progpilot also showed good sensitivity (39.22%). On the
other hand, ShiftLeft Scan and Semgrep had lower sensitivity values;
• Progpilot demonstrated the highest precision at 86.96%, indicating that the vulner-
abilities it reported were highly likely to be true positives. Horusec had a notably
lower precision;
• Horusec and Progpilot exhibited reasonable true positive rates. In contrast, Semgrep
and ShiftLeft Scan had lower rates, implying that they missed a significant number of
true positives;
• Horusec and ShiftLeft Scan reported a high number of false positives, while Progpilot
and Semgrep had lower false positive rates;
• Most tools correctly identified true negatives, which are non-vulnerable portions of
the application;
• Graudit and Progpilot demonstrated a high accuracy, with Progpilot being the most
accurate, at 91.15%. Horusec had a notably lower accuracy score.
Table 20 presents the average values of the selected indicators for SAST tools applied
to applications developed using PHP technology. These averages provide an overview of
the overall performance of each SAST tool across multiple applications:
Table 19. The values of SAST tool indicators for the Xtreme Vulnerable Web application.
• Among the SAST tools, Progpilot stands out with the highest average accuracy, at
72.11%, indicating its ability to correctly classify vulnerabilities and non-vulnerable
code. Graudit and ShiftLeft Scan also exhibited relatively high accuracies, while
Horusec, PHP_CS, and Semgrep had lower average accuracy scores;
• Horusec demonstrated the highest average sensitivity, at 93.50%, suggesting that it excelled in identifying true positives, although this came at the cost of a high false positive rate. Graudit also had a decent sensitivity score. On the other hand, Semgrep had a notably lower average sensitivity;
• Progpilot stands out with the highest average precision score, at 94.74%, indicating
that the vulnerabilities it reported were highly likely to be true positives. ShiftLeft
Scan and Graudit also showed a good average precision. Semgrep had the lowest
average precision;
• Progpilot and Horusec exhibited reasonable average true positive rates, which indicated their effectiveness in identifying actual vulnerabilities. Semgrep had the lowest average TP rate;
• Semgrep and ShiftLeft Scan had the highest average false negative rates, suggesting
that they missed a substantial number of vulnerabilities in PHP applications. Horusec
had the lowest average FN rate;
• Horusec reported a high average false positive rate, indicating that it identified vul-
nerabilities that were not actually present. In contrast, Progpilot and Semgrep had the
lowest average FP rates;
• Progpilot achieved the highest average true negative rate, suggesting that it effectively
identified non-vulnerable portions of the code. Semgrep and ShiftLeft Scan also
exhibited good average TN rates.
In conclusion, the choice of SAST tool for PHP applications should consider a balance
between accuracy, sensitivity, and precision. Progpilot excels in precision but may miss
some vulnerabilities. Horusec has high sensitivity but reports more false positives. Graudit
and ShiftLeft Scan offer a good trade-off between these metrics. Semgrep demonstrated
a lower overall performance, particularly in sensitivity and precision. The selection should
align with the specific requirements and constraints of the project, and fine-tuning may be
necessary for comprehensive security testing.
Table 20. Average values of the selected SAST tool indicators for applications developed using
PHP technology.
Table 21. The values of the SAST tool indicators for the Damn Small Vulnerable Web application.
In summary, the SAST tools exhibited varying levels of performance when analyzing
the Damn Vulnerable GraphQL Application. ShiftLeft Scan stood out, with the highest accu-
racy and perfect precision, indicating a low rate of false positives. However, it also reported
a relatively higher number of false negatives. Semgrep achieved a balanced performance,
with good precision and sensitivity. Other tools, such as Aura, Horusec, and Bandit, showed
moderate performance with different trade-offs between accuracy, precision, and sensitivity.
Graudit had a limited performance, with no true positives reported.
Table 22. The values of the SAST tool indicators for the Damn Vulnerable GraphQL Application.
Table 23 presents the results of the assessment of SAST tools applied to the Damn
Vulnerable Python Web application. The results reveal important insights into each
tool’s performance:
Table 23. The values of SAST tool indicators for the Damn Vulnerable Python Web application.
Table 24 presents the results of an assessment of SAST tools applied to the Tiredful
API application. The results reveal important insights into each tool’s performance:
• Aura reported two TPs and six FNs. It achieved an ACC of 37.50%, a SEN of 25.00%,
and a PRE of 33.33%;
• Graudit reported no TPs and eight FNs. It had an ACC of 50.00%, a SEN of 0.00%,
and a PRE of 0.00%;
• Horusec achieved two TPs and six FNs. It had an ACC of 62.50%, a SEN of 25.00%,
and a perfect PRE of 100.00%;
• Bandit reported two TPs and six FNs. It had an ACC of 62.50%, a SEN of 25.00%, and a perfect PRE of 100.00%;
• ShiftLeft Scan reported six TPs and two FNs. It had an ACC of 62.50%, a SEN of 75.00%, and a PRE of 60.00%;
• Semgrep reported three TPs and five FNs. It achieved an ACC of 68.75%, a SEN of
37.50%, and a perfect PRE of 100.00%.
In summary, the SAST tools provided varying results when analyzing the Tiredful API application. Semgrep demonstrated the highest accuracy and a perfect precision, while ShiftLeft Scan achieved the highest sensitivity. Horusec and Bandit showed moderate performance, with a balanced accuracy and precision. Aura had the lowest
accuracy and precision among the tools. The choice of a specific tool should consider
the trade-offs between accuracy and precision, depending on the specific application’s
security requirements.
Table 24. The values of the SAST tool indicators for the Tiredful API application.
Table 25 presents the average values of selected SAST tool indicators for applications
developed using Python technology. These averages provide an overview of the overall
performance of each SAST tool across multiple applications:
• On average, the SAST tools achieved accuracy scores ranging from approximately
34.90% to 63.07%. Semgrep had the highest average accuracy, indicating that it
provided the most correct results on average. This suggests that Semgrep can be relied
upon to accurately identify vulnerabilities in Python code;
• The average SEN scores ranged from around 0.00% to 47.92%. Semgrep and Horusec
exhibited relatively better sensitivity, making them suitable for detecting a broad range
of vulnerabilities;
• The PRE scores varied widely, with Semgrep achieving the highest average precision
(97.50%). This means that when Semgrep flagged a vulnerability, it was highly likely
to be a true positive;
• Semgrep had the highest average TP rate (29.36%), suggesting that it had a reasonably
good ability to find vulnerabilities within Python applications;
• The average FN rates ranged from approximately 12.50% to 65.10%. Graudit had
the highest FN rate, implying that it missed a substantial number of vulnerabilities.
Semgrep and ShiftLeft Scan demonstrated relatively lower FN rates;
• The average FP rates ranged from around 0.00% to 20.54%. ShiftLeft Scan had the
highest FP rate, followed by Aura. Semgrep produced the fewest false alarms;
• Graudit achieved the highest TN rate, followed by Semgrep.
Table 25. Average values of selected SAST tool indicators for applications developed using
Python technology.
Table 26. Comparison of scan duration times. All results presented in the table are in seconds (s).
Application | Aura | Bandit | Bearer CLI | FindSecBugs | Graudit | Horusec | Insider CLI | phpcs-security-audit | Progpilot | ShiftLeft Scan | Semgrep
EasyBuggy | - | - | - | 93 | 1 | 60 | 44 | - | - | 144 | 1
Java Vulnerable Lab | - | - | - | 1 | 1 | 11 | 7 | - | - | 85 | 1
Security Shepherd | - | - | - | 7 | 2 | 56 | 66 | - | - | 160 | 46
Vulnerable App | - | - | - | 3 | 1 | 82 | 66 | - | - | 155 | 1
Broken Crystals | - | - | 510 | - | 2 | 17 | 29 | - | - | - | 4
Damn Vulnerable Web Services | - | - | 40 | - | 1 | 9 | 2 | - | - | - | 12
Juice Shop | - | - | 148 | - | 1 | 28 | 31 | - | - | - | 92
NodeGoat | - | - | 26 | - | 1 | 10 | 3 | - | - | - | 1
Conviso Vulnerable Web Application | - | - | - | - | 1 | 10 | - | 1 | 1 | 89 | 1
Damn Vulnerable Web Application | - | - | - | - | 1 | 12 | - | 1 | 3 | 72 | 1
WackoPicko | - | - | - | - | 1 | 9 | - | 1 | 1 | 5 | 1
Xtreme Vulnerable Web Application | - | - | - | - | 1 | 11 | - | 1 | 1 | 9 | 33
Damn Small Vulnerable Web | 1 | 1 | - | - | 1 | 16 | - | - | - | 1 | 1
Damn Vulnerable GraphQL Application | 12 | 1 | - | - | 1 | 14 | - | - | - | 85 | 52
Damn Vulnerable Python Web Application | 5 | 1 | - | - | 1 | 12 | - | - | - | 108 | 63
Tiredful API | 3 | 1 | - | - | 1 | 11 | - | - | - | 98 | 6
Average | 5 | 1 | 181 | 26 | 1 | 23 | 31 | 1 | 2 | 84 | 20
6. Conclusions
The primary objective of this study was to conduct a comprehensive comparative
analysis of open-source static code analysis tools, with a specific focus on their efficacy in
identifying vulnerabilities. The investigation hinged on the examination of the vulnerabili-
ties cataloged by these tools and their subsequent application in scrutinizing vulnerable
web applications crafted in selected programming languages.
Author Contributions: Conceptualization, K.K. and M.W.; methodology, K.K.; software, K.K.; val-
idation, M.W.; formal analysis, K.K.; investigation, K.K.; resources, K.K.; data curation, K.K.; writ-
ing—original draft preparation, K.K.; writing—review and editing, M.W.; visualization, K.K.; super-
vision, M.W.; project administration, M.W. All authors have read and agreed to the published version
of the manuscript.
Funding: This research was funded by Wrocław University of Science and Technology.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors wish to thank Wroclaw University of Science and Technology
(statutory activity) for the financial support.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
ACC Accuracy
API Application Programming Interface
AST Abstract Syntax Tree
CD Continuous Delivery
CI Continuous Integration
CVE Common Vulnerabilities and Exposures
CVSS Common Vulnerability Scoring System
CWE Common Weakness Enumeration
DAST Dynamic Application Security Testing
ETL Extraction, Transformation, Loading process
FN False Negative
FP False Positive
IAST Interactive Application Security Testing
MSSDL Microsoft’s Security Development Lifecycle
N Negative
NIST National Institute of Standards and Technology
P Positive
PRE Precision
RASP Runtime Application Self-Protection
SAST Static Application Security Testing
SDLC Software Development Life Cycle
SEN Sensitivity
SSDLC Secure Software Development Life Cycle
TN True Negative
TP True Positive
References
1. mObywatel. Government Technology Website. Available online: https://info.mobywatel.gov.pl (accessed on 12 August 2023).
2. Pacjent. Government Technology Website. Available online: https://pacjent.gov.pl/internetowe-konto-pacjenta/erecepta
(accessed on 12 August 2023).
3. E-PIT. Government Technology Website. Available online: https://www.podatki.gov.pl/pit/twoj-e-pit (accessed on
12 August 2023).
4. Li, H.; Ota, K.; Dong, M.; Guo, M. Mobile crowdsensing in software defined opportunistic networks. IEEE Commun. Mag. 2017,
55, 140–145. [CrossRef]
5. Sast vs. Dast: What They Are and When to Use Them. CircleCI. Available online: https://circleci.com/blog/sast-vs-dast-when-
to-use-them/ (accessed on 12 August 2023).
6. Lenarduzzi, V.; Lujan, S.; Saarimaki, N.; Palomba, F. A critical comparison on six static analysis tools: Detection, agreement, and
precision. arXiv 2021, arXiv:2101.08832.
7. Desai, V.V.; Jariwala, V.J. Comprehensive Empirical Study of Static Code Analysis Tools for C Language. Int. J. Intell. Syst. Appl.
Eng. 2022, 10, 695–700.
8. Miele, P.; Alquwaisem, M.; Kim, D.K. Comparative Assessment of Static Analysis Tools for Software Vulnerability. J. Comput.
2018, 13, 1136–1144. [CrossRef]
9. Arusoaie, A.; Ciobâca, S.; Craciun, V.; Gavrilut, D.; Lucanu, D. A comparison of open-source static analysis tools for vulnerability
detection in c/c++ code. In Proceedings of the 2017 19th International Symposium on Symbolic and Numeric Algorithms for
Scientific Computing (SYNASC), Timisoara, Romania, 21–24 September 2017; pp. 161–168.
10. Wang, J.; Yang, Y.; Wang, T.; Sherratt, R.S.; Zhang, J. Big data service architecture: A survey. J. Internet Technol. 2020, 21, 393–405.
11. 15 Top Backend Technologies to Learn in 2022. HubSpot. Available online: https://blog.hubspot.com/website/backend-
technologies (accessed on 12 August 2023).
12. Vassiliadis, P.; Simitsis, A. Extraction, Transformation, and Loading. Encycl. Database Syst. 2009, 10, 1095–1101.
13. Manadhata, P.K.; Wing, J.M. An attack surface metric. IEEE Trans. Softw. Eng. 2010, 37, 371–386. [CrossRef]
14. Martin, B.; Brown, M.; Paller, A.; Kirby, D.; Christey, S. 2011 CWE/SANS Top 25 Most Dangerous Software Errors. Common
Weakness Enumeration. Mitre. 2011. Available online: https://cwe.mitre.org/top25/archive/2011/2011_cwe_sans_top25.pdf
(accessed on 12 August 2023).
15. Mitre. Common Weakness and Enumeration. Available online: https://cwe.mitre.org/index.html (accessed on 12 August 2023).
16. Nowak, M.R.; Walkowski, M.; Sujecki, S. Support for the Vulnerability Management Process Using Conversion CVSS Base Score
2.0 to 3.x. Sensors 2023, 23, 1802. [CrossRef] [PubMed]
17. FIRST. Common Vulnerability Scoring System: Specification Document. Available online: http://www.first.org/cvss (accessed
on 12 August 2023).
18. Walkowski, M.; Oko, J.; Sujecki, S. Vulnerability management models using a common vulnerability scoring system. Appl. Sci.
2021, 11, 8735. [CrossRef]
19. Jaiswal, A.; Raj, G.; Singh, D. Security testing of web applications: Issues and challenges. Int. J. Comput. Appl. 2014, 88, 26–32.
[CrossRef]
20. de Vicente Mohino, J.; Bermejo Higuera, J.; Bermejo Higuera, J.R.; Sicilia Montalvo, J.A. The application of a new secure software
development life cycle (S-SDLC) with agile methodologies. Electronics 2019, 8, 1218. [CrossRef]
21. NIST. Security Considerations in the Information System Development Life Cycle; NIST Special Publication 800-64. Available online: http://csrc.nist.gov/publications/nistpubs/800-64/NIST-SP800-64.pdf (accessed on 12 August 2023).
22. Howard, M.; Lipner, S. The Security Development Lifecycle; Microsoft Press: Redmond, WA, USA, 2006; Volume 8.
23. Gregoire, J.; Buyens, K.; De Win, B.; Scandariato, R.; Joosen, W. On the secure software development process: CLASP and SDL
compared. In Proceedings of the Third International Workshop on Software Engineering for Secure Systems (SESS’07: ICSE
Workshops 2007), Minneapolis, MN, USA, 20–26 May 2007; p. 1.
24. Sajdak, M.; Bentkowski, M.; Piosek, M.; Coldwind, G. Bezpieczeństwo Aplikacji Webowych; Securitum Szkolenia: Kraków,
Polska, 2021.
25. Chess, B.; McGraw, G. Static analysis for security. IEEE Secur. Priv. 2004, 2, 76–79. [CrossRef]
26. Hossain, S. Rework and reuse effects in software economy. Glob. J. Comput. Sci. Technol. C Softw. Data Eng. 2018, 18, 35–50.
27. Li, Z.; Avgeriou, P.; Liang, P. A systematic mapping study on technical debt and its management. J. Syst. Softw. 2015, 101, 193–220.
[CrossRef]
28. Johnson, B.; Song, Y.; Murphy-Hill, E.; Bowdidge, R. Why don’t software developers use static analysis tools to find bugs? In
Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, 18–26 May 2013;
pp. 672–681.
29. K-Tamura/Easybuggy: Too Buggy Web Application. GitHub. Available online: https://github.com/k-tamura/easybuggy
(accessed on 12 August 2023).
30. CSPF-Founder/JavaVulnerableLab: Vulnerable Java Based Web Application. GitHub. Available online: https://github.com/
CSPF-Founder/JavaVulnerableLab (accessed on 12 August 2023).
31. SasanLabs. SasanLabs/VulnerableApp: OWASP VULNERABLEAPP Project: For Security Enthusiasts by Security Enthusiasts.
GitHub. Available online: https://github.com/SasanLabs/VulnerableApp (accessed on 12 August 2023).
32. Owasp. Owasp/SecurityShepherd: Web and Mobile Application Security Training Platform. GitHub. Available online:
https://github.com/OWASP/SecurityShepherd (accessed on 12 August 2023).
33. NeuraLegion. Neuralegion/Brokencrystals: A Broken Application—Very Vulnerable! GitHub. Available online: https:
//github.com/NeuraLegion/brokencrystals (accessed on 12 August 2023).
34. Snoopysecurity. Snoopysecurity/DVWS-Node. GitHub. Available online: https://github.com/snoopysecurity/dvws-node
(accessed on 12 August 2023).
35. Owasp. Juice-Shop/Juice-Shop: Owasp Juice Shop: Probably the Most Modern and Sophisticated Insecure Web Application.
GitHub. Available online: https://github.com/juice-shop/juice-shop (accessed on 12 August 2023).
36. OWASP. OWASP Juice Shop|OWASP Foundation. Available online: https://owasp.org/www-project-juice-shop/ (accessed on
12 August 2023).
37. OWASP. Owasp/NodeGoat. Available online: https://github.com/OWASP/NodeGoat (accessed on 12 August 2023).
38. Convisolabs. Convisolabs/CVWA. Github. Available online: https://github.com/convisolabs/CVWA (accessed on
12 August 2023).
39. Digininja. Digininja/DVWA: Damn Vulnerable Web Application (DVWA). Github. Available online: https://github.com/
digininja/DVWA (accessed on 12 August 2023).
40. Doupé, A.; Cova, M.; Vigna, G. Why Johnny can’t pentest: An analysis of black-box web vulnerability scanners. In Proceedings
of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 8–9 July
2010 ; pp. 111–131.
41. Adamdoupe. Adamdoupe/Wackopicko. Github. Available online: https://github.com/adamdoupe/WackoPicko (accessed on
12 August 2023).
42. s4n7h0. S4N7H0/xvwa. Github. Available online: https://github.com/s4n7h0/xvwa (accessed on 12 August 2023).
43. Stamparm. Stamparm/DSVW: Damn Small Vulnerable Web. Github. Available online: https://github.com/stamparm/DSVW
(accessed on 12 August 2023).
44. Dolevf. Damn Vulnerable Graphql Application. Github. Available online: https://github.com/dolevf/Damn-Vulnerable-
GraphQL-Application (accessed on 12 August 2023).
45. Anxolerd. Damn Vulnerable Python Web App. Github. Available online: https://github.com/anxolerd/dvpwa (accessed on 12 August 2023).
46. Payatu. Tiredful-API. Github. Available online: https://github.com/payatu/Tiredful-API (accessed on 12 August 2023).
47. Bearer. Bearer CLI Documentation. Available online: https://docs.bearer.com/ (accessed on 12 August 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.