A Deep Learning Based Static Taint Analysis Approach for IoT Software Vulnerability Location
Weina Niu, Xiaosong Zhang, Xiaojiang Du, Lingyuan Zhao, Rong Cao, Mohsen Guizani
PII: S0263-2241(19)31005-X
DOI: https://doi.org/10.1016/j.measurement.2019.107139
Reference: MEASUR 107139
Please cite this article as: W. Niu, X. Zhang, X. Du, L. Zhao, R. Cao, M. Guizani, A Deep Learning Based Static
Taint Analysis Approach for IoT Software Vulnerability Location, Measurement (2019), doi: https://doi.org/
10.1016/j.measurement.2019.107139
College of Engineering, Qatar University, Doha, 2713, Qatar
Abstract
Computer system vulnerabilities, computer viruses, and cyber attacks are rooted in software vulnerabilities. Reducing software defects and improving software reliability and security are urgent problems in software development. The core task is the discovery and location of software vulnerabilities. However, traditional approaches based on human experts are labor-intensive and time-consuming. Some automatic detection approaches have therefore been proposed, but they suffer from high false negative rates. In this paper, a deep learning based static taint analysis approach is proposed to automatically locate Internet of Things (IoT) software vulnerabilities, which relieves tedious manual analysis and improves detection accuracy. Deep learning is used to detect vulnerabilities because it considers the program context. Firstly, rules are designed to select taints from the difference file between the source program and its patched program. Secondly, taint propagation paths are obtained using static taint analysis. Finally, a detection model based on a two-stage Bidirectional Long Short Term Memory (BLSTM) network is applied to discover and locate software vulnerabilities. The Code Gadget Database is used to evaluate the proposed approach.
1. Introduction
Over the next four years, 10 billion Internet of Things (IoT) and connected devices will be deployed worldwide, according to a report from Strategy Analytics [1]. The popularity and rapid development of IoT technology have also brought new security and privacy risks, such as IoT botnets, cryptocurrency mining, and ransomware attacks. Several papers (e.g., [2, 3, 4, 5]) have studied related security issues. IoT devices are closer to the user than traditional PC or server devices, and they are associated with larger amounts of privacy or property data, which makes them favored targets of attackers. Attacks may shorten the battery lifetime in an IoT network or ruin the energy supply system, which would influence the basic functions of the devices and even cause huge economic losses [6]. For example, the notorious Mirai botnet [7], which exploits login vulnerabilities in unsecured IoT devices such as webcams and home routers, launched the largest DDoS attack known to date. Moreover, the autonomous working mechanism and limited energy resources of IoT devices make them vulnerable to energy resources exhaustion (ERE) attacks [8, 9]. On the basis of attack method analysis, many research studies have been carried out on this topic. For example, Boubiche et al. [10] analyzed Sleep Deprivation, Barrage, Collision, and Synchronization attacks at the intersection of the physical and data link layers. Goudar et al. [11] discussed Denial-of-Sleep attacks in WSNs, which are caused by manipulating network packets.
Applications, network protocols, operating systems, and cryptographic algorithms ultimately exist in the form of software. However, it is difficult to guarantee reliable and safe software development, since the design and development of computer software require high-intensity mental work and rich experience. Moreover, the number of software vulnerabilities registered in the Common Vulnerabilities and Exposures (CVE) database [12] has continued to grow since 1999, reaching 14,714 in 2017, as shown in Figure 1. In addition, the US Defense Advanced Research Projects Agency (DARPA) has hosted the Cyber Grand Challenge (CGC) [13] since 2015 to improve the capabilities of a new generation of fully automated cyberspace defense systems. The five aspects of the CGC emphasize that all competitions are automated [13]. Therefore, the fully automated method is the solution to future cyber warfare.
[Figure 1: The number of CVE entries registered per year, 1999-2017.]
based [15] and the other is code similarity-based [16, 17]. The first group of methods relies on human experts to build a vulnerability feature database. Therefore, they are labor-intensive and sometimes error-prone. Moreover, they cannot discover the precise locations of vulnerabilities because their program representation is too coarse. The second group of methods can solve this problem by representing each program at an abstract level, and these methods consider contextual information as well. However, they have high false negative and false positive rates.
Deep learning has dramatically changed the way that computing devices handle human-centric content such as images, video, and audio [18]. The widespread use of IoT and Cyber-Physical Systems (CPS) in industry can benefit from the introduction of deep learning models. For example, images of production vehicles on the assembly line and their annotations are input into a deep learning system such as AlexNet or GoogLeNet to achieve visual inspection. Deep learning can also produce next-generation applications on IoT devices, which can perform complex sensing and recognition tasks [19, 20].
However, to the best of our knowledge, it is not common to use deep learning to detect software vulnerabilities. Deep learning is mainly used for software defect prediction, such as software language modeling [21], code clone detection [22], API learning [23], binary function boundary recognition [24], and detection of malicious URLs, file paths, and registry keys [25], which is different from software vulnerability detection. Li et al. [26] used the deep learning model BLSTM to detect software vulnerabilities and achieved an accuracy of 0.949. Their work is also used as a comparative experiment for our proposed method.
In response to the above reality, and to reduce false positive and false negative rates, a deep learning based static taint analysis approach is proposed in this paper. The designed IoT software vulnerability location system can enhance the automation level and accuracy of software vulnerability discovery and location; it uses a patch-based taint propagation path method and a deep learning-based vulnerability discovery and location method.
Our contributions. This paper introduces a deep learning based static taint analysis approach for IoT software vulnerability location. Our three main contributions are as follows.
First, we propose three taint selection principles to determine the original taints. The first is to select the variables shared by the deleted and added lines in the diff file, the second is to select the parameters of known vulnerability functions or ordinary functions, and the third is to select the restricted variables in if conditional statements.
Second, we propose a taint weight calculation method to select taints with high weight. A large number of initial taints are generated using the three taint selection strategies, but many initial taints do not actually trigger vulnerabilities. To further improve the accuracy of the selected taints, further taint screening is performed in combination with the frequency of real taints.
Third, we develop the deep learning-based IoT software vulnerability location system and evaluate its effectiveness using the Code Gadget Database.
Paper organization. The rest of the paper is organized as follows. Section 2 introduces related work on software vulnerability detection. Section 3 presents some preliminaries. Section 4 elaborates the proposed deep learning-based vulnerability location approach in detail. Section 5 describes our experimental evaluation and results. Section 6 concludes the paper.
2. Related work
In this section, we discuss how traditional software complexity and quality metrics (such as entropy) contribute to software vulnerability analysis, and review studies related to software vulnerability detection and location. Some of the latest developments in related fields are also tracked.
There are three parameters that determine software complexity: overall complexity, input and output complexity, and rectification complexity. The indicators of overall complexity include the number of code lines, the number of functions, the number of code lines declaring functions and variables, the complexity of some key algorithms, the cyclomatic complexity, and the number of recursive call layers. The input and output complexity contains metrics such as global variables used by functions, parameters of functions, and heaps and stacks of function calls. Intuitively, the rectification complexity is the number of code lines that are annotated.
The quality indicators of software are as follows:
1) The number of bugs in each code segment/module/time period. Coverity and Checkmarx use this indicator to judge their ability to detect vulnerabilities.
2) Code coverage (the proportion and extent to which the source code was tested). HP Fortify uses this indicator to judge its ability to detect vulnerabilities.
3) Design/development constraints (the number of methods/properties in a class).
4) Software complexity.
In summary, software complexity and quality indicators contribute to vulnerability analysis in three main ways: pre-judging and controlling the cost and limitations of vulnerability reproduction, adjusting and improving the vulnerability tracking solution, and evaluating and repairing the vulnerability repair strategy.
Flawfinder [27] is an open-source code analysis tool that primarily performs simple text pattern matching against a built-in database of C/C++ functions to discover well-known problems. However, Flawfinder has high false negative rates since it does not perform control flow or data flow analysis. To improve universality, the Rough Auditing Tool for Security (RATS) [28] provides a list of potential trouble spots in C, C++, Perl, PHP, and Python source code. It checks for risky built-in/library function calls using the rules of RATS. Unfortunately, RATS has high false negative and false positive rates since it performs only a rough analysis of the source code. Moreover, manual inspection is still necessary even with the aid of RATS. To support interactive programming environments in real time, ITS4 [29] uses a parse tree generated by a context-free parser to represent the program. It breaks a non-preprocessed file into a series of lexical tokens and then matches them against the vulnerability database. However, ITS4 has high false positive rates since it cannot understand the program context.
CxSAST from Checkmarx [30] is an accurate and flexible source code analysis solution that is fluent in all major languages. Checkmarx uses a unique lexical analysis technique and the patented CxQL query technology to perform static analysis. However, Checkmarx has high false negative rates. To improve detection accuracy and reduce cost, Coverity [31] offers integrations with key development tools and CI/CD systems. Moreover, Coverity supports multiple programming languages and frameworks. Compared to other static analysis tools, Coverity has the following characteristics: it provides deep, full-path coverage accuracy and uses interprocedural analysis. However, it also has high false positive rates. For example, Coverity may report a risk when the pointer pNext performs the pNext++ operation without being assigned, or after being assigned the value NULL. However, if the pNext pointer is assigned in a while loop below the pNext = NULL statement, this report can be ignored. To reduce time and effort, HP Fortify [32] statically analyzes source code through five built-in analysis engines: data flow, semantics, structure, control flow, and configuration flow. Unfortunately, it cannot effectively locate the position of the vulnerability. A comparison of mainstream commercial static analysis tools is shown in Table 1.
Neuhaus et al. [33] developed Vulture, which mines a vulnerability database, a version archive, and a code base, and maps past vulnerabilities to components. Vulture [33] is able to predict vulnerabilities of new components based on their imports and function calls [34]. However, such relationships in Vulture are still at the component level. Based on an empirical study of 3,241 Red Hat packages, Neuhaus et al. [35] used support vector machines on Red Hat dependency data to predict vulnerable packages [36]. To further reduce the vulnerability analysis granularity, Yamaguchi et al. [37] embedded code in a vector space and automatically determined API usage patterns using machine learning. However, false negatives still exist. Yamaguchi et al. [38] extracted abstract syntax trees from the source code and searched for vulnerabilities based on the idea of vulnerability extrapolation, but
Table 1: The comparison of mainstream commercial static analysis tools.

  Tool name            Checkmarx [30]      Coverity [31]      HP Fortify [32]
  Platform             Windows             multi-platform     multi-platform
  Program language     multi-language      C/C++, Java        multi-language
  Vulnerability types  multiple            multiple           multiple
  Development agency   Checkmarx           Coverity           HP
  Release time         2003                2002               2012
  Key technology       lexical analysis    SAT engine and     data flow, semantics,
                       and CxQL patent     software DNA map   structure, control flow,
                       query                                  configuration flow, etc.
they cannot identify vulnerabilities automatically. Grieco et al. [15] proposed a machine-learning-based approach to discover software vulnerabilities through lightweight static and dynamic features. Unfortunately, the test prediction error is high as well. Li et al. [39] presented an automatic software vulnerability detection system, Vulnerability Pecker (VulPecker). VulPecker [39] generates the signature of the target program and then detects vulnerabilities using code-similarity algorithms. However, the effectiveness of the approach needs to be further improved due to some of its heuristics. Kim et al. [16] proposed a scalable approach for vulnerable code clone discovery, VUDDY, which leverages function-level granularity and a length-filtering technique to reduce the number of signature comparisons. However, VUDDY [16] focuses only on discovering vulnerabilities in code clones.
The academic community also has many articles on vulnerability detection and location, which promote the development of this field. Huang et al. [40] proposed a new automatic vulnerability classification model (TFI-DNN). The proposed TFI-DNN model outperformed others in accuracy, precision, and F1-score, and performed well in recall. It was also superior to SVM, Naive Bayes, and KNN on comprehensive evaluation indexes. Jurn et al. [41] proposed a hybrid fuzzing method based on binary complexity analysis and introduced an automatic patch technique that modifies the PLT/GOT table to translate vulnerable functions into safe functions. The experimental results showed that the proposed model performs well on open-source binaries. Spanos et al. [42] proposed a model combining text analysis and multi-target classification techniques to estimate vulnerability characteristics. They considered the vulnerability characteristics as a vector of six targets and estimated these characteristics using multi-target classification. Experimental results showed that the proposed methodology could achieve comparable results. Aakanshi et al. [43] proposed a mathematical model to predict bad smells using information theory. Bad smells were collected using a detection tool from sub-components of the Apache Abdera project, and different measures of entropy (Shannon, Rényi, and Tsallis entropy) were used to identify bad smells. The experimental results showed that all three entropy approaches are sufficient to predict bad smells in software. Madhu et al. [44] proposed bug dependency-based mathematical models by considering the summary description of bugs and comments submitted by users in terms of entropy-based measures. The models mainly followed exponential, S-shaped, or mixed curves. However, some improvement could be made to the summary entropy and comment entropy metric-based models by using other project data to make them general.
In addition to the above scholars, some researchers have made their own contributions in this field or similar fields. Ali Hassan et al. [45] proposed a Hybrid Adaptive Bandwidth and Power Algorithm and a Delay-tolerant Streaming Algorithm to significantly optimize power drain, battery lifetime, and standard deviation. Ali et al. [46] proposed an optimization scheme aiming at achieving customer experience quality for the Internet of Vehicles. Abdul et al. [47] evaluated quality-of-service computing in health care applications, proposed the AQCA algorithm, which is more suitable for quality-of-service computing, and analyzed the impact of each QoE parameter on medical data processing by estimating QoE perception. Sandeep et al. [48] improved and managed M-QoS by prioritizing telemedicine services using a decisive and intelligent tool called the Analytic Hierarchy Process (AHP). Hina et al. [49] presented a detailed survey about how 5G has revolutionized medical healthcare with the help of IoT for enhancing the quality and efficiency of wearable devices. A state-of-the-art 5G-based sensor node architecture was also proposed for monitoring the health of patients with ease and comfort. Ali Hassan et al. [50] proposed a novel framework based on joint transmission power control (TPC) and duty-cycle adaptation, an adaptive energy-efficient transmission power control (AETPC) algorithm, a feedback-control-based duty-cycle algorithm, and system-level battery and energy harvesting models to minimize charge and energy depletion of wearable devices. Ali Hassan et al. [51] proposed a forward central dynamic and available approach (FCDAA), a system-level battery model, and a data reliability model for edge AI-based IoT devices over a hybrid TPC and duty-cycle network to use resources appropriately. Ali Hassan et al. [52] proposed a novel energy-efficient adaptive power control (APC) algorithm to overcome the problem that constant transmission power and typical conventional transmission power control (TPC) methods are not suitable choices for WBANs due to the large temporal variations in the wireless channel. Muhammad et al. [53] put forward Wireless Body Sensor Networks, which provide ways to monitor individual activity in a variety of scenarios. Yezhi Lin et al. [54] developed an efficient, simple, and unified way to increase the potential speed of multicore systems. Chandio et al. [55] proposed a system named integration of inter-connectivity of information system (i3), based on service-oriented architecture (SOA) with web services, to monitor and exchange students' information. Lodro et al. [56] proposed channel modeling of 5G mmWave cellular communication for an urban microcell, simulated under LOS conditions at an operating frequency of 28 GHz with multiple antenna elements at the transmitter and receiver. Different parameters affecting the channel were considered in simulation using the NYUSIM software.
3. Preliminaries
to different program characteristics. Common explicit-flow taint propagation methods include direct assignment propagation, propagation through function (procedure) calls, and propagation through aliases (pointers).
In recent years, researchers have developed a number of tools to conduct taint analysis on other languages like Java, but there are only a few tools available for C/C++. Some famous open-source tools, like Saint [58], proposed in 2015, and Tanalysis [59], built as a plugin for the Frama-C platform, are no longer available. Some tools are still available, such as Marcelo [60], which modifies the Clang static analyzer to perform static taint analysis; however, Clang has the disadvantage of not being able to analyze multiple source files, and it does not have access to LLVM, which could help with the analysis. The lack of an extensible and configurable static taint analysis tool is an open opportunity overlooked by academia.
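As a minimal illustration of explicit-flow propagation through direct assignments, the following toy sketch propagates taint over hypothetical (lhs, rhs-variables) statements; it is not a real C/C++ analyzer, and the statement format is an assumption made purely for illustration:

```python
def propagate(statements, seeds):
    """Forward propagation of taint through direct assignments.
    Each statement is (lhs, [rhs variables]); seeds are the initial taints."""
    tainted = set(seeds)
    path = []
    for lineno, (lhs, rhs) in enumerate(statements, start=1):
        if tainted & set(rhs):       # any tainted operand taints the lhs
            tainted.add(lhs)
            path.append((lineno, lhs))
    return tainted, path

stmts = [
    ("a", ["input"]),    # a = input
    ("b", ["a"]),        # b = a      <- taint flows via assignment
    ("c", ["d"]),        # c = d      <- untainted
    ("e", ["b", "c"]),   # e = b + c  <- tainted through b
]
tainted, path = propagate(stmts, seeds={"input"})
print(sorted(tainted))   # ['a', 'b', 'e', 'input']
print(path)              # [(1, 'a'), (2, 'b'), (4, 'e')]
```

The (line number, variable) pairs returned here correspond to the taint propagation paths that the approach later feeds to the detection model.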
3.2. CNN-BLSTM
Neural networks [61] have achieved great success in image processing [62], speech recognition, and NLP [63], but they are rarely used in vulnerability detection. This suggests that many neural network models may not be suitable for vulnerability detection, so some principles are needed to guide the selection of neural network models for this task. Whether a vulnerability is present in code is determined by the context, so a neural network that is able to handle context can be used for vulnerability detection [64]. Neural networks used for NLP also need to consider context. It is therefore feasible to use a deep learning model to conduct software vulnerability discovery and location [26]. The structure of a convolutional neural network includes a convolution layer and a pooling layer.
Convolution layer: the input to each node in the convolutional layer is just a small piece of the previous layer of the neural network, which is usually called the kernel. The convolutional layer attempts to further analyze each small block of the neural network to obtain more abstract features. Assuming that w^i_{x,y,z} denotes the weight of the filter for input node (x,y,z) corresponding to the i-th output node, a_{x,y,z} denotes the value of input node (x,y,z), and b_i denotes the bias term of the i-th output node, the value g(i) of the i-th node in the output matrix is defined in formula (1):

g(i) = f( Σ_{x=1}^{a} Σ_{y=1}^{b} Σ_{z=1}^{c} a_{x,y,z} × w^i_{x,y,z} + b_i )    (1)
Pooling layer: it can effectively reduce the size of the matrix, thus reducing the number of parameters in the final fully connected layer. Using a pooling layer can both speed up the calculation and prevent over-fitting. The calculation in a pooling layer filter is not a weighted sum of nodes, but a simpler maximum or average operation.
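As a minimal sketch of formula (1) and the pooling operation (the block shapes and the choice of ReLU as the activation f are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def conv_node(a, w, b):
    """Value of one output node per formula (1):
    g(i) = f(sum over x,y,z of a[x,y,z] * w[x,y,z] + b), with f = ReLU."""
    return max(0.0, float(np.sum(a * w) + b))

def max_pool_1d(seq, pool_size=3):
    """Non-overlapping max pooling over a 1-D sequence."""
    n = len(seq) // pool_size
    return [max(seq[i * pool_size:(i + 1) * pool_size]) for i in range(n)]

patch = np.ones((3, 3, 1))          # a small input block of the previous layer
weights = np.full((3, 3, 1), 0.5)   # filter weights for one output node
print(conv_node(patch, weights, b=-1.0))   # 3*3*0.5 - 1 = 3.5
print(max_pool_1d([1, 5, 2, 4, 3, 0]))     # [5, 4]
```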
A Recurrent Neural Network (RNN) [65] is used to mine time-series information in data and deep representations of semantic information. It is often used in speech recognition, language modeling, machine translation, and timing analysis. An RNN differs from ordinary fully connected neural networks in that the nodes between the hidden layers of the RNN are connected: the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step.
Long Short Term Memory (LSTM) [66] is a special type of RNN that can learn long-term dependencies. Unlike a standard RNN, the repeating module of an LSTM has four different structures that interact in a very special way; LSTM is a special network structure with three "gate" structures. Bidirectional Long Short Term Memory (BLSTM) [67] uses a two-way structured LSTM model, taking into account the impact of both preceding and following context.
[Figure 2 here: convolution layer, BLSTM layers, FC layer, and softmax; the output indicates whether the IoT software is vulnerable and the vulnerability type.]
In addition, the RNN model has the vanishing gradient problem [68], which may make model training ineffective. The vanishing gradient problem is solved by introducing memory cells into RNNs (including LSTM and GRU), but LSTM is one-way, which is not enough to detect software vulnerabilities (function parameters may be affected both by preceding statements and by following statements). Therefore, it is feasible to use the BLSTM model to conduct software vulnerability discovery and location. To further improve detection accuracy, a CNN-BLSTM neural network is applied in this paper. The input data size is 100 × 150. After the convolution layer and the pooling layer, the data size is 9 × 128. After the LSTM layer, the data size is 64. Finally, the data is classified by the fully connected layer. Figure 2 shows the structure of the CNN-BLSTM neural network, which has a convolution layer, a max pooling layer, a number of BLSTM layers, a fully connected (FC) layer, and a softmax layer.
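A minimal Keras sketch of such a CNN-BLSTM network, assuming the sizes quoted above (100 × 150 input, 128 convolution filters, 64-dim BLSTM output) and the hyperparameters reported later in Table 3; the paper's exact layer counts and intermediate shapes may differ:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Bidirectional,
                                     LSTM, Dense, Dropout)

# Input: 100 tokens per taint propagation path, each a 150-dim word vector.
model = Sequential([
    Conv1D(filters=128, kernel_size=3, activation="relu",
           input_shape=(100, 150)),
    MaxPooling1D(pool_size=3),
    Bidirectional(LSTM(32, dropout=0.2)),  # forward + backward = 64-dim
    Dense(64, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),        # vulnerable or not
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```

Training would then call model.fit on the encoded taint propagation paths with batch_size=32 and 5 epochs, as in Table 3.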
[Figure here: system architecture. Component I (patching comparison) uses difflib to obtain the diff file between the source program (*.cpp/*.c) and the patched program; its output feeds the static taint analysis component, which applies the taint selection principles.]
The diff file is input to the static taint analysis module. Existing source code comparison tools include DiffMerge [69], Textdiff [70], Meld [71], Git diff [72], and so on. Most of these open-source tools have a graphical interface; calling them directly from source code would require a lot of manual operation time. In addition, the source code of most tools has runtime errors, which makes them hard to use. As the difflib package of Python can achieve the same functions as these tools, this package is used directly in this paper to obtain the diff file with difference marks.
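As a minimal sketch of this step (the two three-line programs are hypothetical stand-ins for the source and patched files), difflib's unified_diff marks deleted lines with '-' and added lines with '+':

```python
import difflib

source = ["int n = read_input();", "char buf[10];", "strcpy(buf, s);"]
patched = ["int n = read_input();", "char buf[10];", "strncpy(buf, s, 9);"]

diff = list(difflib.unified_diff(source, patched,
                                 fromfile="a.c", tofile="b.c", lineterm=""))
for line in diff:
    print(line)
# Deleted lines start with '-', added lines with '+';
# variables shared by these lines are taint candidates.
```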
1. Selecting the common variables in the deleted and added lines;
2. Selecting the parameters of the known vulnerability function or the ordinary function;
3. Selecting the restricted variable in the if conditional statement.
According to the principles of taint selection, the appropriate taints are initially selected, but many initial taints do not actually trigger vulnerabilities. To further improve the accuracy of the selected taints, we then rank taints based on the taint weight calculation method, which is as follows: 1) if a taint is a parameter of a CWE-119 or CWE-399 vulnerability correlation function, the taint weight is 1; 2) if a taint is a parameter of an ordinary function, the taint weight is 2; 3) if a taint is bound by an if statement, the taint weight is 3; 4) otherwise, the taint weight is 4. Finally, we generate taint propagation paths based on static taint analysis and the line number at which the taint first appears.
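A minimal sketch of this ranking step; the function list and taint records below are hypothetical placeholders, not the paper's actual vulnerability-function tables:

```python
# Hypothetical stand-in for the real CWE-119/CWE-399 function table.
VULN_FUNCS = {"strcpy", "memcpy", "malloc"}

def taint_weight(taint):
    """Rank a taint: 1 is the highest priority, 4 the lowest."""
    if taint.get("func") in VULN_FUNCS:
        return 1          # parameter of a vulnerability correlation function
    if taint.get("func"):
        return 2          # parameter of an ordinary function
    if taint.get("in_if"):
        return 3          # restricted by an if conditional statement
    return 4              # anything else

taints = [
    {"var": "buf", "func": "strcpy"},
    {"var": "n", "func": None, "in_if": True},
    {"var": "tmp", "func": "helper"},
]
ranked = sorted(taints, key=taint_weight)
print([t["var"] for t in ranked])   # ['buf', 'tmp', 'n']
```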
[Figure here: taints are selected from the diff between the source program and the patched program and ranked by the taint weight calculation method.]
4.3. Taint propagation paths transforming
There are two steps in this module. The first step is transforming taint propagation paths into a symbolic representation, and the second step is encoding the symbolic representation into vectors. The output of this module is the input of the IoT software vulnerability location module.
the words are numbered by frequency of occurrence in descending order: the most frequent word is assigned 1, and so on recursively. A numerical representation of the dataset samples is obtained by using keras.preprocessing.text.text_to_word_sequence for tokenization. Special symbols should be ignored during tokenization, such as !"#$%&()*+,-./:;<=>?@[\]^_`{|}~, as well as \t and \n.
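A minimal sketch of this tokenization step (the sample statement is a hypothetical taint propagation line, not taken from the dataset); text_to_word_sequence filters the special symbols listed above by default:

```python
from tensorflow.keras.preprocessing.text import text_to_word_sequence

line = "strcpy ( buf , data ) ;"
# Default filters drop punctuation such as ( ) , ; and words are lower-cased.
tokens = text_to_word_sequence(line)
print(tokens)   # ['strcpy', 'buf', 'data']
```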
[Figure here: Word2Vec skip-gram illustration (input word: "ants", output word: "car"). Given the 300-feature word vector for "ants", a softmax layer outputs the probability that a randomly picked word near "ants" is "car".]
This module has two parts: the first part is the training phase and the second part is the testing phase.
1) The training phase: generating the diff file between the source program and the patched program; selecting taints according to the taint selection principles and taint weight calculation method; getting taint propagation paths using static taint analysis; transforming taint propagation paths into certain symbolic representations; encoding the taint propagation paths in the symbolic representation into vectors; and training a CNN-BLSTM neural network. The trained CNN-BLSTM neural network is shown in Figure 7.
2) The detection phase: given one or multiple target programs, diff files are generated between the source programs and the patched programs. Taints are labeled using the taint selection principles and taint weight calculation method, and taint propagation paths are obtained based on the taints labeled in the previous step. Each taint propagation path is transformed into the symbolic representation and encoded by Word2Vec. At last, the lines of code where the vulnerability exists are located by applying the trained CNN-BLSTM model.
The pseudo code of our proposed method is as follows:
Figure 7: The trained CNN-BLSTM neural network.
Pseudo code of our proposed method:

Get the diff file between the source file and the patched file through difflib
for diff in diff_file:
    Remove comments (// & /*...*/) and header file imports (lines beginning with #)
    Select variables that are shared by the difference lines
    Remove keywords
    if there are taints in the function line:
        if taint in sensitive function:
            for taint in sensitive functions from VulDeePecker [26]:
                The vulnerability exists in this line
                Extract the taint propagation path of the variable
                break
        else:
            for taint in normal function:
                Extract the taint propagation path of the variable
                break
    elif taint in if statement:
        for taint in taints in the if statement:
            Extract the taint propagation path of the variable
            break
    else:
        for taint in taints:
            Extract the taint propagation path of the variable
            break
if training:
    Establish a CNN-BLSTM network, and use the extracted taint propagation paths to train the network to obtain a model
elif testing:
    Use the obtained model to detect whether a taint propagation path is a vulnerability and its related type
5. Experimental evaluation and results
In this section, the dataset used to verify the validity of the proposed approach is described. The experimental settings and evaluation metrics are given, and then the experimental results are analyzed.
Table 3: Experimental parameter settings.

  Parameter                      Description                                   Value
  max_num_len                    size of the program word dictionary           20000
  max_sequence_len               maximum length of taint propagation           100
                                 path fragment
  Word2Vec:size                  word vector dimension                         150
  fit:batch_size                 the size of a batch                           32
  LSTM:dropout & Dense:dropout   parameter used to prevent overfitting         0.2
  pool_size                      size of pooling window                        3
  kernel_size                    size of convolution window                    3
  fit:nb_epoch                   the number of training epochs                 5
  LSTM:return_sequences          whether data is returned at each time step    True
  loss                           loss function                                 binary_crossentropy
  optimizer                      optimization function                         adam
The minimum length of taint propagation paths in the training set is 2, the largest length is 2,698, and the average length is 48. A reasonable value of the parameter max_sequence_len is therefore 100. For English text, the embedding length is usually 150; moreover, the larger the embedding length, the larger the computational overhead. Therefore, the value of the parameter embedding_dim is 150. The value of dropout is normally set to 0.5; however, because the number of training epochs is 5, the results show that there is no overfitting, so the value of the parameter dropout is set to 0.2. Regarding the parameter return_sequences, by setting it to True, the result is output at each time step in the LSTM and, finally, all the outputs are stitched together. The final result contains the information of every time step, and the test results also indicate that it is better to set this parameter to True. Since the task is a two-category task, the output of the final model uses the sigmoid activation function, and the corresponding loss function is binary_crossentropy. What's more, commonly used optimization algorithms such as SGD are prone to getting stuck in local minima, so the adam optimization algorithm is used; it is faster, and its gradient descent process is smoother.
The following evaluation metrics are chosen to evaluate the proposed IoT software vulnerability location system based on patching comparison. TP is the number of normal programs correctly labeled as normal, FP is the number of programs with vulnerabilities labeled as normal programs, FN is the number of normal programs labeled as programs with vulnerabilities, and TN is the number of programs with vulnerabilities correctly detected. FP_O and FN_O denote the FP and FN of the other deep learning models (RNN, LSTM, and BLSTM), while FP_CB and FN_CB denote the FP and FN of the CNN-BLSTM model, respectively.

Accuracy = (TP + TN)/(TP + TN + FP + FN)
FPI = (FP_O − FP_CB)/FP_CB
FNI = (FN_O − FN_CB)/FN_CB
In general, the higher the Accuracy, the better the recognition effect. Positive values of FPI and FNI indicate that the CNN-BLSTM model is better; negative values indicate that the other model is better.
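The three metrics above can be written directly as code; the following sketch checks them against the CWE-399 numbers that Table 4 reports for the RNN and LSTM models.

```python
# The evaluation metrics as code, checked against Table 4 (CWE-399).
def accuracy(tp, tn, fp, fn):
    # Fraction of all programs that are labeled correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def fp_improvement(fp_other, fp_cb):
    # FPI: relative false positive difference vs. CNN-BLSTM.
    return (fp_other - fp_cb) / fp_cb

def fn_improvement(fn_other, fn_cb):
    # FNI: relative false negative difference vs. CNN-BLSTM.
    return (fn_other - fn_cb) / fn_cb

# RNN column of Table 4: TP=2769, TN=1239, FP=210, FN=159.
print(round(accuracy(2769, 1239, 210, 159), 4))  # -> 0.9157
# LSTM vs. CNN-BLSTM (FP: 87 vs. 33, FN: 76 vs. 89).
print(round(fp_improvement(87, 33), 3))          # -> 1.636
print(round(fn_improvement(76, 89), 3))          # -> -0.146
```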
Several experiments are performed to evaluate the performance of the proposed approach for detecting IoT software CWE-399/CWE-119 vulnerabilities. The experiments mainly compare CNN-BLSTM with other deep learning models: RNN, LSTM, and BLSTM.
Table 4: Experimental results of different models for CWE-399 identification.

Model      RNN      LSTM     BLSTM    CNN-BLSTM
TN         1239     1362     1419     1416
FP         210      87       30       33
FN         159      76       107      89
TP         2769     2582     2821     2839
Accuracy   0.9157   0.9628   0.9687   0.9721
FPI        5.364    1.636    -0.09    1
FNI        0.786    -0.146   0.2      1
For CWE-119 identification, the accuracy of the RNN is 0.9098. The RNN has the worst recognition effect because of its vanishing gradient problem. LSTM performs worse than BLSTM, mainly because BLSTM considers both preceding and following context. As can be seen from Table 5, the CNN-BLSTM model has a lower false positive rate than the other deep learning models, but its false negative rate is indeed higher than those of the LSTM and BLSTM models. What's more, as seen from FPI and FNI, CNN-BLSTM is a great improvement over the RNN. Although FNI is negative for LSTM and for BLSTM, the improvement in FPI is greater, and the CNN-BLSTM model is still better overall.
Finally, training CNN-BLSTM on a total of 31,802 samples took 3097.2 seconds, and testing on a total of 7950 samples took 34.2 seconds.
Table 5: Experimental results of different models for CWE-119 identification.

Model      RNN      LSTM     BLSTM    CNN-BLSTM
TN         5170     5763     5833     5954
FP         1183     482      454      306
FN         969      314      274      334
TP         16529    17292    17290    17257
Accuracy   0.9098   0.9666   0.9695   0.9732
FPI        2.866    0.575    0.484    1
FNI        1.9      -0.06    -0.18    1

6. Conclusion

In recent years, various kinds of commercial software have frequently exposed vulnerabilities, which seriously affect enterprise security. Thus, the security of third-party applications has received much attention. Existing dynamic detection methods consume a lot of CPU resources, and their level of automation is low. This work uses static analysis and deep learning algorithms to automatically locate vulnerabilities. The proposed approach generates the Diff file between the source code and the patched program, labels taint sources according to the designed taint selection principles, obtains the lines where taints first appear and the taint propagation paths using static taint analysis, transforms the taint propagation paths into symbolic representations, encodes the symbolic representations into vectors, discovers CWE-119/CWE-399 vulnerabilities based on the trained CNN-BLSTM model, and finds their lines. The deep learning based vulnerability locator is evaluated on a dataset consisting of 17,725 programs with vulnerabilities and 43,913 benign programs. Experimental results show that the proposed approach can achieve an accuracy of 0.9732 for CWE-119 and 0.9721 for CWE-399, which is higher than that of the other three models (the accuracies of RNN, LSTM, and BLSTM are all below 0.97).
In the future, our work can be applied to industrial control security, smart car security, smart home security, and other fields to ensure the safety of Internet of Things equipment. It can serve as a detection system that checks devices before they leave the factory, or be embedded as a chip in IoT devices for on-device detection.
7. Acknowledgment
We thank the anonymous reviewers for their comments that helped us improve the paper. This work was supported in part by the National Key R&D Plan under Grant CNS 2016QY06X1205, in part by the Basic Research Business Fees of Central Colleges under Grant CNS 20826041B4252, in part by the National Natural Science Foundation of China (NSFC) under Grant CNS 61572115, and in part by the Science and Technology Project of State Grid Corporation of China under Grant CNS 522722180007. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not reflect the views of the funding agencies.
[Figure: Component I, patching comparison — difflib is used to obtain the Diff file between the source program and the patched program from the training programs (source and patched programs).]