An Enhanced Graph-Based Semi-Supervised Learning Algorithm To Detect Fake Users On Twitter
Abstract
Over the past decade, social networking services (SNS) have proliferated on
the web. The nature of such sites makes identity deception easy, providing a fast
means for creating and managing identities, and then connecting with and deceiv-
ing others. Fake users are those accounts specifically created for purposes such as
stalking or abuse of another user, for slander, or for marketing. The current system
for detecting deception depends on behavioral, non-behavioral and user-generated
content (UGC) information gathered from users. Although these methods have high
detection accuracy, they cannot be implemented in databases with massive volumes
of data. To address this issue, this paper proposes an enhanced graph-based semi-
supervised learning algorithm (EGSLA) to detect fake users from a large volume
of Twitter data. The proposed method encompasses four modules: data collection,
feature extraction, classification and decision making. Data collected from Twitter
using Scrapy is utilized for the evaluation. The performance of the proposed algo-
rithm is tested with existing game theory, k-nearest neighbor (KNN), support vector
machine (SVM) and decision tree techniques. The results show that the proposed
EGSLA algorithm achieves 90.3% accuracy in spotting fake users.
1 Introduction
Online social networks (OSNs) have become one of the most important ways for
people to interact with their friends, to grow a business or to obtain information. By
some estimates, up to 67% of people aged 18–29 use social networking at some
level, and around 22% of the population overall uses social network websites for
* M. BalaAnand
[email protected]
Extended author information available on the last page of the article
some reason. These days, social networks such as Twitter and Facebook are driving
different sorts of interaction, collaboration, exchange and dialogue [1].
Typically, individuals tend to use several OSNs concurrently. For example, an
individual may use Facebook to connect with friends, Twitter to post news, and
Foursquare for location-centered social activities. As a result, his social activities
and associations are scattered across several networks, and his online information on
any solitary website is frequently incomplete. If these sites are precisely integrated,
an individual's information can be made richer and more complete, enhancing online
services such as community discovery and recommendation [2].
Commonly, in a social network, users post their personal traits on their personal
pages, for example, their colleges, majors and interests. This profile data can be used
as a reason for grouping users, sharing content, and suggesting friends. However, in
reality, not every user on a social network is real [3].
Fake users are defined as any account that consistently involves automation over
an observed period, such as the use of the Twitter API or other third-party tools,
or performing actions such as automated likes, tweets, and retweets. Fake users
include marketing bots, social bots, stalker bots and malicious bots [4]. Every bot
has a unique purpose. Marketing bots tweet about a specific product or company in a
predefined pattern. A social bot retweets, likes, and follows based on certain key-
words (e.g., "smile"). A stalker bot revolves around one or two particular users. Malicious bots spread
spam links [5]. A tweet is an original status and not a retweet, a retweet is a tweet
with “RT” in the text, and status is either a tweet or a retweet. Content on Twitter is
limited to whatever is contained in a tweet: text, URL, picture, and video [6].
The fundamental difficulty in detecting fake users is that they are rapidly refining
their spam strategies to stay ahead of advances in detection measures [7]. For meth-
ods that use essential features focused upon user profiles and message content, fake
users can evade detection by purchasing followers or using tools to post messages
with a similar meaning but different words [8, 9]. These bots tend to be intercon-
nected, forming account groups. Supporter accounts help fake users evade
detection by inflating their follower counts. In any case, a small percentage of real
users, perceived as social entrepreneurs, will follow back any person who follows
them, with the specific goal of building their reputation. There are only a limited
number of edges maintaining reciprocal social connections between fake users and
real users, but it has been found that real users follow fake users more often than
anticipated [10].
Fake users registered on social networks are able to perform a variety of actions
(depending on the specific social network), as follows: log onto the social network
home page and access a personal page, modify personal settings using the graphical
interface of the social network by providing personal information (full name, virtual
communities of interest, educational institutions, list of employers, vacation destina-
tions, areas of interest, recreational activities, and musical and other interests); change
(create, modify or delete) current “status” bar values on personal page (typically con-
tains current activity, disposition or geolocation indicated by the user or the user’s
device); post text, photo or audio materials on the personal page on social network
home page; frame or cut off friendly relations (both one-way and two-way—usually
confirmed by the user); send personal messages to a specific social network user, which
2 Related works
Boshmaf et al. [14] introduced Integro, a scalable defense system that helps OSN
administrators to detect fake accounts by means of a large-scale user ranking sys-
tem. OSNs today are confronted with the problem of identifying fake accounts under
1. Data collection
• Python web-scraping framework
2. Feature extraction
• Fraction of retweets
• Standard tweet length
• Fraction of URLs
• Average time between tweets
3. Classification and decision making
• EGSLA algorithm
Graphs are mathematical structures utilized for modeling pairwise relation-
ships between objects. Graph models can be classified into directed graphs and undi-
rected graphs. An undirected graph implies that there is no distinction between the two
vertices associated with each edge. For instance, if each vertex signifies a person at a
social event, and an edge between two individuals implies that they shook hands, then
the graph is undirected. In contrast, a directed graph incorporates directional edges,
with the direction shown by arrows. For instance, if each vertex signifies a web page,
and edges signify hyperlinks from one page to another, then the graph is directed.
The most commonly utilized graphs in computer science are weighted graphs. The
fundamental components of weighted graphs are vertices, edges and weights. In its
mathematical representation, a graph is an ordered triple G = (V, E, W), where V is a
set of vertices (points or nodes), E is a collection of edges (lines or arcs) that are
two-element subsets of V, and W is a set of weights of the vertices or the edges.
A graph-based model is utilized for modeling user behavior. Typically, the nodes
of such a graph are associated with individuals, and the edges connecting the nodes
are associated with the type of communication or connection between those indi-
viduals. All fake users display the same patterns. These patterns, described below, can be
conditionally combined to detect the likeness between users.
• Fake users post the same content from multiple accounts. The set of instant mes-
sages posted by such a user can consist entirely of messages taken from other
users. Accordingly, for any message of the ith user, an identical message appears
in the social database that was published earlier by some jth user. In this way, a
membership function for the social-bot subset, based on weighted factors, can be
assembled.
• Fake users perform activities at regular intervals. Different fake users perform the
same activities at regular, predefined time intervals.
• Fake user accounts have similar patterns. When a typical pattern fits the indi-
vidual data of both the ith and jth users, it is reasonable to assume that the
registration of the ith and jth virtual users was mostly administered by
sophisticated software. A typical example is a sequential ordering of account
names ("kumar_2000", "kumar_2001", "kumar_2002", etc.), as though they
were generated by a single algorithm. Such regular naming patterns, as a rule,
help the analyst gather account statistics.
• Fake users have either links or media content in all their posts. Likewise, they uti-
lize trending hashtags. This is done to expand their tweet reach; however, the
content of the post and the hashtags or multimedia are not really related.
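As a rough sketch of how the first pattern (duplicated content across accounts) could be operationalized, token-set Jaccard similarity between posts can flag near-identical messages. The 0.8 threshold below is our illustrative choice, not a value from the paper:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercased token sets of two posts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def looks_duplicated(post, earlier_posts, threshold=0.8):
    """True if `post` is near-identical to any earlier post in the database."""
    return any(jaccard(post, p) >= threshold for p in earlier_posts)

earlier = ["Buy cheap followers now", "Totally unrelated message"]
print(looks_duplicated("buy CHEAP followers now", earlier))  # True
print(looks_duplicated("good morning everyone", earlier))    # False
```

A production system would likely use semantic similarity (synonyms, embeddings) as the paper's feature-extraction section suggests, rather than exact token overlap.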
3.1 Data collection
The initial step is gathering the necessary data for the analysis. Here, 2 months of
Twitter data were collected, drawing as many tweets as possible from the Twitter
handles that tweet with links and trending topics. It is assumed that fake
users tweet with links in every tweet. In addition, fake users like to tweet with trend-
ing hashtags to increase their visibility. Trending topics and external links are the
fundamental data sources. In this approach, the data are collected using the Python
web-scraping framework. Web scraping is an automated method utilized to extract
large quantities of data from websites. The data on the websites are unstructured.
Web scraping aids in collecting this unstructured data and storing it in a structured
form. A crawler is implemented in the Python Scrapy framework to download the
above-mentioned parameters from Twitter. The raw data gathered by the Scrapy system
is stored in a Neo4j instance. Neo4j is a graph database that stores nodes and the
relationships between nodes. Neo4j is the most popular graph database according to
the DB-Engines Ranking, and the 22nd most popular database overall. Neo4j supports
several types of graphs, one of which is the directed graph, whose nodes and relation-
ships are not bidirectional by default. Twitter relationships are of this type: a user
can follow certain profiles in this social network without them following him. Neo4j
scales well and can efficiently perform graph calculations (for example, "dis-
cover the nodes that are linked with a node that is linked with another node through
a specific relationship"). Finally, the Twitter data collection consists of user id, status,
followers list, following information, posting time, retweet status, tweet length
and time between tweets.
The data are preprocessed utilizing Apache Spark to format and standardize the raw
data to create data sets. Apache Spark is a fast computing technology; it increases the
processing speed of the application and also supports multiple languages. The
preprocessor comprises two classes: one for connecting with the Neo4j instance and
gathering the data based on a Cypher query, and a second class that performs all the
formatting and normalizing. The preprocessed data contain the graph of user id, status,
follower list and following information for the users. The data were gathered
periodically at 1-h intervals to extract all trending data from Twitter. The crawler
utilized the breadth-first search algorithm to crawl Twitter. A total of 21,473 users
and 2,915,147 tweets were ultimately acquired.
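The breadth-first crawl described above can be sketched as follows; `fetch_followers` is a stand-in for the Scrapy-based crawler, stubbed here with an in-memory follower graph:

```python
from collections import deque

def bfs_crawl(seed, fetch_followers, max_users=1000):
    """Visit accounts in breadth-first order, starting from `seed`."""
    seen = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_users:
        user = queue.popleft()
        order.append(user)          # the user's tweets would be scraped here
        for follower in fetch_followers(user):
            if follower not in seen:
                seen.add(follower)
                queue.append(follower)
    return order

# Stubbed follower graph instead of live Twitter data
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_crawl("a", lambda u: graph.get(u, [])))  # ['a', 'b', 'c', 'd']
```

The `max_users` cap mirrors the practical need to bound a crawl of a graph as large as Twitter's.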
3.2 Feature extraction
This section exhibits the formulation for all of the features employed for fake user
detection. A user typically posts his UGC on different accounts at similar time periods.
Such temporal distribution reveals his online action pattern. For instance, some users
like to post UGC toward the start of the day, while other users conduct their activity at
night. Such online patterns are very helpful for user identification. For a given user,
one can extract the posting times from his public UGC and acquire a sequence of
posting times. Similar content found on various accounts demonstrates that fake users
generally publish their data on numerous different accounts. This UGC contains a con-
siderable number of similar words or synonyms. The similarity of UGC content can be
indicated by the semantic similarity of words or by the number of identical words.
3.2.1 Fraction of retweets
This measures the number of times the user has posted a retweet divided by the number
of tweets the user has posted, calculated as:

Retweet(u) = |{x | x ∈ tweets_u, x = retweet}| / |tweets_u|    (1)
Whether a tweet is a retweet is decided by looking at the "retweeted status" field of
the tweet's data returned through the Scrapy system. If the field contains tweet
information, the tweet is viewed as a retweet.
3.2.2 Average tweet length
A tweet's text is limited to 280 characters, but it is possible that bots will post less than
that, as they could simply be promoting a URL or a hashtag. To take this into account,
the average length of the user's tweets is checked. It is the total number of characters
in all the tweets the user has posted divided by the number of tweets the user has
posted, formally:

Length(u) = ( Σ_{i=1}^{|tweets_u|} |tweets_{u,i}| ) / |tweets_u|    (2)
where |tweets_{u,i}| is the length of user u's ith tweet, measured by the number of
characters in the tweet.
3.2.3 Fraction of URLs
This measures the number of times the user published a tweet containing a Uniform
Resource Locator (URL) divided by the total number of tweets the user has published:

URL(u) = |{y | y ∈ tweets_u, y = URL}| / |tweets_u|    (3)
Whether a tweet contains a URL is determined by looking at the "entities" field of
the tweet returned by the application programming interface. Bots attempt to per-
suade genuine users to visit external sites operated by their controller. This
feature allows one to distinguish bots that are attempting to promote URLs in
their tweets. The entry point of the URL is found for each tweet, and the fre-
quency with which every URL occurs in the tweets is gauged.
3.2.4 Average time between tweets
It was discovered that various bots tweet in a "bursty" manner, posting a large
proportion of their tweets within a short period. This behavior is measured as:

Time(u) = ( 1 / (|tweets_u| − 1) ) Σ_{i=2}^{N} (t_i − t_{i−1})    (4)

where t_i is the time stamp of the ith tweet in user u's timeline and t_{i−1} is the time
stamp of the previous tweet, with the timeline sorted chronologically in ascending
order (i.e., t_i > t_{i−1}).
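The four features above can be sketched as plain functions over a user's timeline. The tweet record fields (`text`, `is_retweet`, `has_url`, `ts`) are our assumed schema, not the exact structure returned by the crawler:

```python
def retweet_fraction(tweets):
    """Fraction of the user's tweets that are retweets."""
    return sum(t["is_retweet"] for t in tweets) / len(tweets)

def avg_tweet_length(tweets):
    """Average number of characters per tweet."""
    return sum(len(t["text"]) for t in tweets) / len(tweets)

def url_fraction(tweets):
    """Fraction of the user's tweets that contain a URL."""
    return sum(t["has_url"] for t in tweets) / len(tweets)

def avg_time_between_tweets(tweets):
    """Mean gap between consecutive tweets; timestamps in seconds."""
    ts = sorted(t["ts"] for t in tweets)
    return sum(b - a for a, b in zip(ts, ts[1:])) / (len(ts) - 1)

timeline = [
    {"text": "hello world", "is_retweet": False, "has_url": False, "ts": 0},
    {"text": "RT check this out http://x", "is_retweet": True, "has_url": True, "ts": 60},
    {"text": "another link http://y", "is_retweet": False, "has_url": True, "ts": 180},
]
print(retweet_fraction(timeline))         # 0.333...
print(url_fraction(timeline))             # 0.666...
print(avg_time_between_tweets(timeline))  # 90.0
```

A "bursty" bot shows up as a small average inter-tweet gap combined with a high URL fraction.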
The similarity of posting times can be expressed as the difference in time. A weight
w^1_{im} is also assigned to posting time t^1_{im}. For two users (u^1_i, u^2_k), the
difference between two sequences of times can be calculated to obtain a time
similarity matrix

M^t_{ik} = {M_{mn} × w^1_{im} × w^2_{kn}}    (5)

where M_{mn} indicates the difference between the mth posting time of user u^1_i and
the nth posting time of user u^2_k. To account for differences in users' online behavior
patterns, the time similarity is computed at two distinct granularities, date and time,
yielding two related matrices M^{t1}_{ik} and M^{t2}_{ik}. To represent the temporal
features, the time difference distributions are extracted from M^{t1}_{ik} and M^{t2}_{ik}
and combined into the vector V^t_{ik}. Likewise, for two users (u^1_i, u^2_k), the
similarity between any two sets of UGC can be calculated to obtain a content
similarity matrix M^p_{ik} = {M_{mn}}.
The task of distinguishing true from false stories is frequently seen as a binary clas-
sification task, although, depending upon the application, it may be perceived as a
task evaluating the probability of trickiness. The graph-based approach depends on
user activity and their account details. This graph-based learning provides a helpful
approach for modeling data on classification problems. The reason for choosing a
graph-based approach is that some data sets are naturally represented by a graph,
which represents heterogeneous data uniformly, and is easily parallelizable, scalable
to large data and effective in practice. One can classify each account as fake or real
by recognizing the unique features of the account. A test data set is created with
plausible positive and negative sentiment scores. With regard to the sentiment score
range, for example, if it ranges from −2 to 2, then −2 and −1 are negative scores and
1 is a positive score. With the end goal of determining similarity, the tweets of 800
arbitrary users about a trending hashtag are utilized. The extracted features are
trained using the EGSLA. It uses both labeled and unlabeled samples to train itself,
which avoids the cost of labeling large groups of training samples and significantly
increases the number of usable training samples.
Assuming a set of l labeled samples and u unlabeled samples is provided:

{x_i}_{i=l+1}^{l+u}    Unlabeled samples    (6)

X_test = {x_i}_{i=1}^{l}    Test samples    (7)
by fitting a function appropriately on the given training set while avoiding overfitting.
Regularization penalizes functions that vary in high-density regions, which is given as

⟨f, Lf⟩ = ⟨f, (D − K)f⟩ = Σ_{i,j=1}^{n} w_ij (f_i − f_j)^2    (11)

where L = D − K is the graph Laplacian and w_ij denotes the weight function.
Various techniques can be employed for the selection of S and χ. For instance, the
graph Laplacian matrix (the Laplacian matrix is used to compute the number of con-
nected components of a graph and can represent the graph structure) S = D − K can be
chosen for S, where K is the kernel matrix of the graph, which is expressed as

K = Σ_i η_i ψ_i ψ_i^T    (12)

where the ψ_i are the eigenvectors of the graph Laplacian, η_i ≥ 0 are the eigen-
values of the kernel matrix, T denotes the transpose of the eigenvector obtained
from the related kernel function, and D is the corresponding degree matrix.
Instead, the normalized graph Laplacian matrix can be employed:

S = I − D^{−1/2} K D^{−1/2}    (13)

where I is the identity matrix. Once S is determined, χ(f) can be deduced by utilizing

χ(f) = ‖F_u‖    (14)
The local and global consistency error function is

E(F) = (1/2) Σ_{i,j} w_ij Σ_c ( F_ic/√(D_ii) − F_jc/√(D_jj) )^2 + (1/α − 1) Σ_i Σ_c (F_ic − B_ic)^2    (15)
where c indexes the label classes, F_ic and F_jc represent the label scores of nodes i
and j for class c, and B_ic represents the initial label assignments. Label scores are
assigned to each node in the graph and differ smoothly over the graph structure. The
parameter 0 ≤ α ≤ 1 controls the rate at which the label information is propagated
around the network. To limit computational and memory costs, the operation uses
the power method to iteratively update F:

F_{t+1} = Z^{−1}((1 − α)B + αLF_t)    (16)

where F_t denotes the current label score, F_{t+1} denotes the updated label score, Z^{−1}
signifies a diagonal matrix used to normalize the rows of F, and B denotes the initial
label assignments. The proposed EGSLA pseudo code is exhibited in Fig. 2.
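A minimal numerical sketch of the update in Eq. (16), using the standard local-and-global-consistency form on a 3-node toy similarity graph (the graph, α, and iteration count are illustrative; the actual algorithm operates on the user similarity graph built earlier):

```python
import numpy as np

def propagate_labels(K, B, alpha=0.5, iters=200):
    """Iterate F <- alpha*S*F + (1-alpha)*B, then row-normalize (the Z^-1 step)."""
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))       # normalized similarity D^-1/2 K D^-1/2
    F = B.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * B
    return F / F.sum(axis=1, keepdims=True)

# 3 users: user 0 labeled real, user 2 labeled fake, user 1 unlabeled.
# User 1 is tied more strongly to user 0 (weight 1.0) than to user 2 (0.2).
K = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
F = propagate_labels(K, B)
print(F.argmax(axis=1))  # labeled nodes keep their class; node 1 leans "real"
```

Because the update uses both labeled rows of B and unlabeled nodes' graph neighborhoods, the unlabeled account inherits the class of the users it is most similar to, which is the semi-supervised behavior the section describes.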
4 Results and analysis
The proposed approach is assessed based on real posts from Twitter. Using the pro-
posed method, 2,915,147 tweets from 21,473 users during the period from Decem-
ber 2017 to February 2018 were recovered. A review of the acquired data set is
shown in Table 1.
4.1 Performance analysis
To gauge the applicability of each rule to the records in the benchmark data set, the
following standard evaluation metrics were considered:
• True positive (TP): the number of fake users perceived by the rule as fake users
• True negative (TN): the number of real users perceived by the rule as real users
• False positive (FP): the number of real users perceived by the rule as fake users
• False negative (FN): the number of fake users perceived by the rule as real users
The following metrics, accuracy, precision, sensitivity, specificity and the Matthews
correlation coefficient (MCC), are used to determine the model's efficiency.
(a) Accuracy
Accuracy is a measure of the proportion of accounts that are correctly classified:
Accuracy = (TN + TP) / (TN + FP + FN + TP)    (17)
(b) Precision
Precision is calculated as the number of correct positive predictions divided
by the total number of positive predictions. The precision calculation is given as
Precision = TP / (TP + FP)    (18)
(c) Sensitivity
Sensitivity is calculated as the number of correct positive predictions divided
by the total number of positives. The sensitivity measure is mathematically
expressed as
Sensitivity = TP / (TP + FN)    (19)
(d) Specificity
Specificity is calculated as the number of correct negative predictions divided
by the total number of negatives. The specificity calculation is given as
Specificity = TN / (TN + FP)    (20)
(e) Matthews correlation coefficient (MCC)
The estimator of the correlation between the predicted class and the real class
of the samples is characterized as:

MCC = (TP · TN − FP · FN) / √((TP + FN)(TP + FP)(TN + FP)(TN + FN))    (21)

(Confusion matrix layout: real users: TN, FP; fake users: FN, TP.)
Each of the above measures captures a different aspect of the prediction quality for
the samples belonging to the majority class. Accuracy measures how many samples
are correctly recognized in both classes; however, it does not express whether the
significant class is recognized better than the other one. Moreover, there are situa-
tions where some prediction models perform better than others despite having
lower accuracy. MCC attempts to convey in one single value the quality of a prediction,
uniting the other metrics. Additionally, MCC uses all four quadrants of the
confusion matrix.
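The five metrics in Eqs. (17)–(21) can be computed directly from the confusion-matrix counts; the counts below are illustrative, not the paper's results:

```python
import math

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity, specificity and MCC from TP/TN/FP/FN."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return accuracy, precision, sensitivity, specificity, mcc

acc, prec, sens, spec, mcc = metrics(tp=45, tn=40, fp=5, fn=10)
print(round(acc, 2), round(prec, 2), round(sens, 2), round(spec, 2), round(mcc, 3))
# 0.85 0.9 0.82 0.89 0.704
```

Note how MCC (0.704) is noticeably lower than accuracy (0.85) here: it is pulled down by the false negatives, illustrating why the paper reports it alongside accuracy.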
The experiment was conducted for every rule considering two classes of accounts,
fake users and real users. To summarize the outcome of every experiment, assess-
ment metrics based on two standard indicators, accuracy and MCC, were considered.
The estimation results are shown in Tables 3 and 7, and the estimation results for the
precision, specificity and sensitivity are shown in Tables 4, 5 and 6.
The accuracy of the proposed EGSLA based on positive, negative and neutral
labels in the train and test sentiment is given in Table 3. The neutral value is
87.2 in both the train and test sentiment. In the positive label, the positive value is
90.3 and the negative value is 84.1. Figure 3 shows the proposed EGSLA accuracy
on negative, positive and neutral labels.
Table 4 shows the precision of the proposed EGSLA on negative, positive, and
neutral labels for 800 users. The precision is 92.3, which is obtained in the test
sentiment of the positive label. The graphical representation of the precision value
of the proposed EGSLA is shown in Fig. 4.
Table 5 shows the specificity of the proposed EGSLA on negative, positive and
neutral labels for 800 users. The performance is shown in train sentiment and test
sentiment phases. A pictorial representation of the specificity value is shown in
Fig. 5.
Table 6 shows the sensitivity of the proposed EGSLA on negative, positive, and
neutral labels for 800 users. The sensitivity value is 90.8 in the positive phase. In the
train sentiment positive label, the positive value is 90.8 and the negative value is
84.3. In the train sentiment negative label, the negative value is 87.9. The sensitivity
values are shown in Fig. 6.
Table 7 shows the MCC performance of the proposed EGSLA on negative, posi-
tive and neutral labels. In the train sentiment and test sentiment, the neutral value is
85.5. From this observation, the positive value is better than the negative value. Fig-
ure 7 shows MCC of the proposed EGSLA on negative, positive and neutral labels.
The tweets are organized into real user and fake user groups, and based on
these two groups, the tweets are labeled for examining the transition sequences.
The tweets are categorized into five labels utilizing EGSLA. A group of 800
users is considered for the assessment of #budget2018. The average values of each
label for the 800 users are shown in Table 8.
The labels are given in increasing order of the sum of the values of each dimen-
sion of the feature vector of the label's centroid. Label 1 is the group whose sum
over the dimensions of the centroid's feature vector is the lowest; Label 5 is the
group whose sum is the greatest.
4.2 Comparative analysis
achieving 89.4% MCC, 88.2% true F-measure and 89.7% false F-measure. Table 9
shows the performance comparison between different ML algorithms used to detect
the fake users on OSNs.
Figure 8 shows the accuracy of various ML algorithms compared with the
EGSLA algorithm.
In this experiment, it was found that when the percentage of labeled data for the
predicted attribute is very large, the prediction accuracy can be increased by
gradually increasing the percentage of labeled data. If the propensity value of
an attribute is greater than 10, users having a similar value on this attribute
become friends more readily. The time and retweet similarities were found to be
the best similarity measures for inferring this attribute.
Therefore, increasing the weight of retweet and time closeness can increase predic-
tion accuracy.
5 Conclusion
References
1. Hanna R, Rohm A, Crittenden VL (2011) We’re all connected: the power of the social media eco-
system. Bus Horiz 54(3):265–273
2. Doan A, Ramakrishnan R, Halevy AY (2011) Crowdsourcing systems on the World-Wide Web.
Commun ACM 54(4):86–96
3. Ding Y, Yan S, Zhang Y, Dai W, Dong L (2016) Predicting the attributes of social network users
using a graph-based machine learning method. Comput Commun 73:3–11
4. Krombholz K, Merkl D, Weippl E (2012) Fake identities in social media: a case study on the sus-
tainability of the Facebook business model. J Serv Sci Res 4(2):175–212
5. Chu Z, Gianvecchio S, Wang H, Jajodia S (2012) Detecting automation of Twitter accounts: are you
a human, bot, or cyborg? IEEE Trans Dependable Secure Comput 9(6):811–824
6. Gilani Z, Farahbakhsh R, Tyson G, Wang L, Crowcroft J (2017) Of bots and humans (on Twitter),
In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining 2017, pp 349–354. ACM
7. Qiang F, Feng B, Guo D, Li Q (2018) Combating the evolving spammers in online social networks.
Comput Secur 72:60–73
8. Yang C, Harkreader R, Guofei G (2013) Empirical evaluation and new design for fighting evolving
Twitter spammers. IEEE Trans Inf Forensics Secur 8(8):1280–1293
9. Aggarwal A, Rajadesingan A, Kumaraguru P (2012) PhishAri: Automatic Realtime Phishing Detec-
tion on Twitter, eCrime Researchers Summit (eCrime), 2012, pp 1–12, IEEE
10. Yan G (2013) Peri-Watchdog: hunting for hidden botnets in the periphery of online social networks.
Comput Netw 57(2):540–555
11. Drevs Y, Svodtsev A (2016) Formalization of criteria for social bots detection systems. Procedia-
Soc Behav Sci 236:9–13
12. Farasat A, Nikolaev A, Srihari NS, Blair RH (2015) Probabilistic graphical models in modern social
network analysis. Soc Netw Anal Min 5(1):5–62
13. Ramalingam D, Chinnaiah V (2017) Fake profile detection techniques in large-scale online social
networks: a comprehensive review. Comput Electr Eng 65:165–177
14. Boshmaf Y, Logothetis D, Siganos G, Lería J, Lorenzo J, Ripeanu M, Beznosov K, Halawa H
(2016) Íntegro: leveraging victim prediction for robust fake account detection in large scale OSNs.
Comput Secur 61:142–168
15. Escalante HJ, Villatoro-Tello E, Garza SE, López-Monroy AP, Montes-y-Gómez M, Villaseñor-
Pineda L (2017) Early detection of deception and aggressiveness using profile-based representa-
tions. Expert Syst Appl 89:99–111
16. Tsikerdekis M (2017) Real-time identity deception detection techniques for social media: optimiza-
tions and challenges. IEEE Internet Comput 99:1–11
17. Kuruvilla AM, Varghese S (2015) A detection system to counter identity deception in social media
applications, In: International Conference Circuit, Power and Computing Technologies (ICCPCT),
2015, pp 1–5, IEEE
18. Gera T, Singh J (2015) A parameterized approach to deal with sock puppets, In: 2015 Third Interna-
tional Conference Computer, Communication, Control and Information Technology (C3IT), pp 1–6,
IEEE
19. Jiang X, Li Q, Ma Z, Dong M, Wu J, Guo D (2018) QuickSquad: a new single-machine graph
computing framework for detecting fake accounts in large-scale social networks. Peer-to-Peer Netw
Appl 1–18
20. Yuan W, Yang M, Li H, Wang C, Wang B (2018) End-to-end learning for high-precision lane keep-
ing via multi-state model. CAAI Trans Intell Technol 3:185–190
21. Shi Q, Lam HK, Xiao B, Tsai SH (2018) Adaptive PID controller based on Q-learning algorithm.
CAAI Trans Intell Technol 3(4):235–244
22. Wang K, Zhu N, Cheng Y, Li R, Zhou T, Long X (2018) Fast feature matching based on r-nearest
k-means searching. CAAI Trans Intell Technol 3(4):198–207
23. BalaAnand M, Karthikeyan N, Karthik S (2018) Designing a framework for communal software:
based on the assessment using relation modelling. Int J Parallel Prog. https://doi.org/10.1007/s10766-018-0598-2
24. Solomon Z, Sivaparthipan CB, Punitha P, BalaAnand M, Karthikeyan N (2018) Certain investiga-
tion on power preservation in sensor networks. In: 2018 International Conference on Soft-Comput-
ing and Network Security (ICSNS), Coimbatore, India, https://doi.org/10.1109/icsns.2018.8573688
25. Sivaparthipan CB, Karthikeyan N, Karthik S (2018) Designing statistical assessment healthcare
information system for diabetics analysis using big data. Multimed Tools Appl
Affiliations
1. Department of Computer Science & Engineering, V.R.S. College of Engineering & Technology, Viluppuram, India
2. Department of MCA, SNS College of Engineering, Coimbatore, India
3. Department of Computer Science & Engineering, SNS College of Technology, Coimbatore, India
4. Department of Computer Science & Engineering, Anna University, Chennai, India
5. Department of Computer Science, University of California, Davis, USA