
Data Mining Approach For Cyber Security

This document discusses using data mining techniques for cyber security purposes, specifically malware detection. It outlines that data mining can be used to analyze patterns in large datasets to detect cyber threats like malware, denial of service attacks, and ransomware. The document then focuses on how classification techniques in data mining can be used to detect malware by analyzing the behavioral patterns of computer programs and flagging those with abnormal or malicious behavior. Specific statistics on malware prevalence and distribution methods are also presented.



Article  in  International Journal of Computer Applications Technology and Research · January 2021


DOI: 10.7753/IJCATR1001.1007



International Journal of Computer Applications Technology and Research
Volume 10–Issue 01, 35-40, 2021, ISSN:-2319–8656

Data Mining Approach for Cyber Security


Varsha P. Desai, Assistant Professor, Department of Computer Studies, VPIMSR, Sangli, India
Dr. K. S. Oza, Assistant Professor, Department of Computer Science, Shivaji University, Kolhapur, India
Dr. P. G. Naik, Professor, Department of Computer Studies, CSIBER, Kolhapur, India

Abstract:

Internet and communication technologies play a significant role in day-to-day life. Data mining capability is leveraged by cybercriminals as well as security experts. Data mining applications can be used to detect future cyber-attacks by analyzing program behavior, browsing habits and so on. The number of internet users is steadily increasing, so security is a major challenge when working in the cyber world. Malware, denial of service, sniffing, spoofing and cyberstalking are the major cyber threats. Data mining techniques provide an intelligent approach to threat detection by monitoring abnormal system activity and behavioral and signature patterns. This paper highlights data mining applications for threat analysis and detection, with a particular focus on detecting malware with high precision and in less time.

Keywords: Malware, Data Mining, Cyber-attack, Cyber Threat, Ransomware.

1] Introduction:

Data mining techniques are used to extract hidden patterns from data. Data mining is a scientific research method for analysis, prediction and determination of complex relationships between hidden patterns in large volumes of data. The knowledge discovery in databases (KDD) process consists of data preprocessing, data cleaning, transformation, mining and pattern evaluation. In data mining, classifying data into predefined labeled classes is called supervised learning. Grouping similar behavioral patterns from a huge dataset into different clusters is called unsupervised learning. The game-like technique in which a machine learning model is trained to take a sequence of complex decisions in an uncertain environment, according to rewards or punishments for specific moves, is called reinforcement learning. Association, classification, clustering, regression, decision trees, Naïve Bayes, support vector machines, sequence mining and time series analysis are the basic techniques of data mining. The appropriate selection and implementation of a data mining technique depends on the type of data, the size of the data, its complexity, the outcome to be predicted and so on. Artificial intelligence based methods like neural networks, fuzzy logic, genetic algorithms and deep learning are used for complex data analysis and for predicting hidden, interesting patterns in complex real-time databases.

Data mining techniques provide a systematic approach for discovering vulnerabilities and system loopholes, detecting threats, and monitoring an intruder's behavior and patterns. Passive attack signatures such as scanning of open network ports, eavesdropping, phishing and sniffing can be identified using data mining algorithms, whereas detection of active attacks such as denial of service, malware and ransomware is possible through data mining and artificial intelligence techniques. Machine learning techniques can potentially be implemented in an intrusion prevention system to identify the tricks and methods used by an intruder, to find vulnerabilities and to record the footprints of an attack on a specific network.

In the supervised approach of data mining, target variables can be determined according to IP address location, frequency of web requests and time of requests. A machine learning model is used to predict which attack signature a particular IP address is part of. Linear and logistic regression, decision tree and support vector machine algorithms are used in supervised learning.

In the unsupervised approach of machine learning, there is no prediction of target variables; instead, associations between different patterns in the dataset are found. Computer programs such as malware that have similar operating behavioral patterns can be grouped using clustering and association algorithms.

2] Research Design:

2.1 Type of the research: Against the backdrop of the above discussion, the present research is an attempt to explore certain key aspects of cyber security. Hence the type of research adopted in this endeavor is descriptive research.

2.2 Objective of study:

To study data mining techniques for malware detection.

2.3 Scope of the study: The research work focuses on the study of cyber security, types of attacks, network vulnerability, cyber threats and mechanisms for malware detection using data mining techniques.

3] Result and Discussion:

3.1 Malware detection using Data Mining:

A malicious computer program that causes abnormal behavior of computer applications, such as a virus, Trojan or worm, is called malware. Using classification techniques in data mining, malware can be detected and reported to the system administrator. Malware attacks a system through surfing infected websites, downloading games or free apps, downloading infected music files, installing software application extensions, plugins or toolbars, and so on. It is important to read warning messages before downloading any application, especially permission requests for access to email or personal data.

3.2 Malware Statistics: According to the research, 80% of damage to systems is due to malware attacks [3]. 92% of malware is delivered through email attachments. Mobile malware infections increased 54% from the year 2018. Overall, 98% of malware targeted Android devices. 99% of malware entered through third-party app downloads. 7 out of 10 payloads are ransomware. Overall, 18 million websites are infected by malware in one week. 90% of financial institutions have been targeted by malware from 2018. 40% of ransomware victims paid the ransom. More than 50% of ransomware attacks demand payment in bitcoin [17].

Fig.1 Total Malware Infection Growth Rate (In Millions)[17]
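
As a rough illustration of the classification idea in Section 3.1, the sketch below trains a small Naïve Bayes classifier (one of the techniques named in the Introduction) on categorical behavior features and labels a program so that it could be reported to the administrator. The feature names, the toy training rows and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count label frequencies and per-label feature value frequencies."""
    label_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return label_counts, value_counts, feature_values

def predict(model, row):
    """Return the label with the highest log posterior (Laplace smoothed)."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for feature, value in row.items():
            seen = value_counts[label][feature][value]
            distinct = len(feature_values[feature])
            score += math.log((seen + 1) / (count + distinct))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: each row describes one observed program.
rows = [
    {"source": "email_attachment", "writes_registry": "yes", "encrypts_files": "yes"},
    {"source": "third_party_app", "writes_registry": "yes", "encrypts_files": "no"},
    {"source": "email_attachment", "writes_registry": "no", "encrypts_files": "yes"},
    {"source": "official_store", "writes_registry": "no", "encrypts_files": "no"},
    {"source": "official_store", "writes_registry": "yes", "encrypts_files": "no"},
    {"source": "official_store", "writes_registry": "no", "encrypts_files": "no"},
]
labels = ["malware", "malware", "malware", "benign", "benign", "benign"]

model = train_naive_bayes(rows, labels)
suspect = {"source": "email_attachment", "writes_registry": "yes", "encrypts_files": "yes"}
print("Report to administrator:", predict(model, suspect))
```

With real data the features would come from static or dynamic analysis of the program rather than from a hand-written table, and a held-out test set would be needed to judge accuracy.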


Nowadays malware detection is an important challenge in maintaining the integrity, confidentiality, authentication and non-repudiation of data communicated over the internet. Data mining algorithms help in the early detection of malware according to the behavior and signatures stored in a database.

3.3 Malware Detection:

In behavior-based malware detection, both static and dynamic analysis techniques are used to classify a program as malware. Static analysis for malware detection works on binary code, which is complex to analyze. Dynamic analysis consists of runtime code execution for testing infected files in a virtual machine [1]. Malware is malicious software code that enters a system through spam mail, email attachments, vulnerable services on the internet, downloads and browser extensions. It can compromise the computer system, allow unauthorized access to personal data, cripple critical infrastructure, bring down servers, steal system and network configuration information, and so on. Feature extraction and the classification/clustering techniques of data mining are significant methods for malware detection [2]. The following diagram shows the process of malware detection using data mining.
malware. Dynamic analysis consist of runtime code

Fig.2. Malware Detection using Data Mining
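
One concrete form of the feature extraction step in Fig. 2 is to strip operands from disassembled instructions and mine the instruction sets that occur frequently across samples, in the spirit of the Apriori algorithm. The sketch below is a simplified version under stated assumptions: the assembly listings are made-up toy strings, `strip_operands` and `frequent_opcode_sets` are hypothetical helper names, and each sample is treated as an unordered set of opcodes.

```python
from itertools import combinations

def strip_operands(listing):
    """Keep only the opcode (first token) of each assembly line."""
    return {line.split()[0].lower() for line in listing.splitlines() if line.strip()}

def frequent_opcode_sets(transactions, min_support):
    """Apriori-style search: grow itemsets level by level, keeping those
    that appear in at least min_support transactions."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({op for t in transactions for op in t})
    current = [frozenset([op]) for op in items if support(frozenset([op])) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Candidate generation: unions of frequent sets that have the next size
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset must itself be frequent
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
            and support(c) >= min_support
        ]
    return frequent

# Hypothetical disassembly of three samples (not real malware).
samples = [
    "push ebp\nmov ebp, esp\ncall 0x401000\nxor eax, eax",
    "push eax\nmov eax, ebx\nxor eax, eax\njmp 0x402000",
    "mov ecx, 5\ncall 0x401000\nxor ecx, ecx\nret",
]
transactions = [strip_operands(s) for s in samples]
frequent = frequent_opcode_sets(transactions, min_support=3)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

The frequent opcode sets found this way could serve as attributes for a downstream classifier; choosing `min_support` is a tuning decision that this sketch does not address.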


When a client machine is connected to the internet, its data is fetched by the anti-malware program during the scanning process. This program starts the feature extraction process by extracting attributes from different files to create a dataset. Virus files from a corpus dataset are used to store virus definitions. Static analysis, dynamic analysis and hybrid analysis techniques are used for extracting features or patterns from the data. The IDA Pro disassembler is used to generate assembly files. Abstract assembly files are generated by eliminating operands from the assembly code for better results. Frequent instruction associations are then extracted from the training dataset. Data mining techniques like classification and association rule mining using the Apriori algorithm are applied to the training dataset, based on behavior or signature, to generate frequent instructions from the assembly [5]. The malware detection performance of the algorithm is checked using statistical tools. The algorithm is trained until the expected performance is reached, and the final model is built. This trained model is applied to the testing dataset to detect and report malware type and status information.

In the static analysis technique of feature extraction, PE files are analyzed without actual execution. The detection patterns used in this analysis take the form of Windows API calls, N-grams, strings, opcodes or Control Flow Graphs (CFG). It is a useful technique for exploring all possible execution paths in malware samples [2]. Artificial neural network techniques are used to detect boot sector viruses using the N-gram method. Hidden dependencies between code sequences in malware can be detected using the API call method.

In dynamic analysis, the code is debugged or profiled by actually executing it at runtime. This process depends on variable values, program input and system configuration. This analysis mechanism is used for detecting new malware definitions. The analysis is carried out in a debugger, simulator, emulator or virtual-server-based environment [2].

Hybrid analysis techniques combine the benefits of static and dynamic analysis: packed malware is first analyzed using the dynamic method, the hidden code of the packed malware is extracted by comparing the runtime execution of the malware, and its instance is then analyzed through the static model. Hidden files are detected by the dynamic analyzer, while the unpacked file is monitored by the static model [2].

3.4 Techniques of Malware Detections:

3.4.1 Signature based malware detection:

The signature database stores malware footprints from previous attacks. When suspicious code is found, it is tested by extracting a unique byte sequence of the code as a malware signature. If it matches an existing signature, it is reported as malware and the anti-malware program packs the malicious code file. Here the anti-malware program has to wait for a signature until some device has become a victim of the attack [4]. Data mining techniques like classification and regression, implemented to categorize a threat as malware using a supervised learning approach, save time and improve the accuracy of prediction compared with the traditional method. This method is easy to run, provides comprehensive malware information, is easy to search and is broadly accepted [5]. However, a threat may bypass the signature database using obfuscation or cryptographic methods [4]. The method also fails to detect polymorphic malware and replicates information in a huge database [5].

3.4.2 Behavior based malware detection:

Program behavior, speed of execution, response time, browsing habits, cookie information and kinds of attachments, as well as statistical properties, help to detect abnormal behavior or malicious code. In behavior-based detection, assembly features and API call methods are applied using data mining algorithms. Unsupervised techniques like clustering, as well as SVM and nearest-neighbor algorithms, can be implemented for behavior analysis and detection of hidden malware. This method helps to detect polymorphic malware as well as data flow dependencies in the malicious software program. More time and storage space are required to detect complex behavioral patterns. The following table depicts data mining techniques for malware detection:


Type of Malware | Data Mining Techniques | Data Analysis Method
Polymorphic Malware Detection [6] | K-means | Dynamic
Android Malware Detection [7][14] | SVM, J48, Naïve Bayes | Dynamic
API Malware Detection [8] | Naïve Bayes, SVM, Decision Tree, Random Forest | Dynamic
N-gram Malware Detection [9] | SVM, ANN | Dynamic
Service Oriented Mobile Malware Detection [10] | Naïve Bayes, Decision Tree | Hybrid
Sequential Pattern Malware Detection [11] | All-Nearest-Neighbor, KNN, SVM | Hybrid
Multi-objective Evolutionary Malware Detection [12] | Genetic Algorithm | Static
Frequent Pattern Malware Detection [13] | Graph Mining | Static
Behavioral Malware Detection | Regression, SVM, J48 | Dynamic

Table 1: Data Mining Techniques for Malware Detection
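
Table 1 lists K-means for polymorphic malware detection, and Section 3.4.2 mentions clustering for behavior analysis. The following is a minimal k-means sketch on made-up dynamic-analysis feature vectors. The two features (API calls per second, files written per minute), the data points and the choice of the first k points as initial centroids are all assumptions made for illustration, not details from the cited work.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical behavior vectors: (API calls per second, files written per minute)
behavior = [
    (2.0, 1.0),
    (40.0, 30.0),
    (3.0, 2.0),
    (45.0, 28.0),
    (1.0, 0.5),
    (42.0, 35.0),
]
assignment, centroids = kmeans(behavior, k=2)
print(assignment)
print(centroids)
```

Programs that land in the same cluster share similar runtime behavior, so a cluster that contains known malware could be used to mark its other members for closer inspection. A production system would need feature scaling and a more careful choice of k and initial centroids.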

Table 1 shows the different data mining techniques used for malware detection according to signature and behavioral aspects. To extract hidden patterns from the data, static, dynamic and hybrid data analysis techniques are used to improve the accuracy of malware detection. It is a challenge for cyber security experts to select the best algorithm and data analysis technique for finding hidden threats and providing alerts to protect data from further attacks.

Conclusion:

Due to globalization, the use of internet and communication technology is increasing drastically. Data leakage, insecure Wi-Fi connections, lack of security awareness, and hardware, software and network vulnerabilities are the major reasons for cybercrime. To mitigate the major risks of cyber-attacks such as data breaches, ransomware attacks and DDoS attacks, it is necessary to implement efficient as well as intelligent techniques for early detection of cyber threats as a proper security solution. Malware detection is one of the challenges for security experts. Data mining techniques like classification, SVM, regression, decision tree, graph mining and KNN algorithms can be integrated with an anti-threat system to help detect malware before it enters the system, which protects IT infrastructure from further attack. Artificial neural networks, genetic algorithms and deep learning mechanisms provide intelligent malware detection from behavior and signature databases.

References:

[1] Monire Norouzi, Alireza Souri, and Majid Samad Zamini (2016), "A Data Mining Classification Approach for Behavioral Malware Detection", Journal of Computer Networks and Communications, Volume 2016.

[2] Yanfang Ye, Donald Adjeroh, et al. (2017), "A Survey on Malware Detection Using Data Mining Techniques", ACM Computing Surveys, Vol. 50, No. 3, Article 41.

[3] Rieck K, Willems T, et al. (2008), "Learning and Classification of Malware Behavior", 5th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Berlin, Heidelberg: Springer-Verlag, pp. 108–12.

[4] Sara Najari, Iman Lotfi (2014), "Malware Detection Using Data Mining Techniques", International Journal of Intelligent Information Systems, Special Issue: Research and Practices in Information Systems and Technologies in Developing Countries, Vol. 3, No. 6-1, pp. 33-37.


[5] Raviraj Choudhary, Ravi Saharan (2012), "Malware Detection Using Data Mining Techniques", International Journal of Information Technology and Knowledge Management, Volume 5, No. 1, pp. 85-88.

[6] Fraley JB, Figueroa M (2016) Polymorphic malware detection using topological feature extraction with data mining. In: SoutheastCon 2016, pp 1–7.

[7] Sun L, Li Z, Yan Q, Srisa-an W, Pan Y (2016) SigPID: significant permission identification for android malware detection. In: 2016 11th international conference on malicious and unwanted software (MALWARE), pp 1–8.

[8] Fan CI, Hsiao HW, Chou CH, Tseng YF (2015) Malware detection systems based on API log data mining. In: 2015 IEEE 39th annual computer software and applications conference, pp 255–260.

[9] Boujnouni ME, Jedra M, Zahid N (2015) New malware detection framework based on N-grams and support vector domain description. In: 2015 11th international conference on information assurance and security (IAS), pp 123–128.

[10] Cui B, Jin H, Carullo G, Liu Z (2015) Service-oriented mobile malware detection system based on mining strategies. Pervasive Mob Comput 24:101–116.

[11] Fan Y, Ye Y, Chen L (2016) Malicious sequential pattern mining for automatic malware detection. Expert System Application 52:16–25.

[12] Martín A, Menéndez HD, Camacho D (2016) MOCDroid: multi-objective evolutionary classifier for Android malware detection. Soft Comput 21:7405–7415.

[13] Hellal A, Romdhane LB (2016) Minimal contrast frequent pattern mining for malware detection. Comput Secur 62:19–32.

[14] Bhattacharya A, Goswami RT (2017) DMDAM: data mining based detection of android malware. In: Mandal JK, Satapathy SC, Sanyal MK, Bhateja V (eds) Proceedings of the first international conference on intelligent computing and communication. Springer Singapore, Singapore, pp 187–194.

[15] Norouzi M, Souri A, Samad Zamini M (2016) A data mining classification approach for behavioral malware detection. J Comput Netw Commun 2016:9.

[16] Galal HS, Mahdy YB, Atiea MA (2016) Behavior-based features model for malware detection. J Comput Virol Hacking Tech 12:59–67. https://doi.org/10.1007/s11416-015-0244-0.

[17] Retrieved from: https://purplesec.us/resources/cyber-security-statistics/ 22 Dec 2020, 1.30pm.
