WSMA Lab Manual 2
Course Objectives: Exposure to various web and social media analytic techniques.
Course Outcomes:
1. Knowledge of decision support systems.
2. Apply natural language processing concepts on text analytics.
3. Understand sentiment analysis.
4. Knowledge of search engine optimization and web analytics.
List of Experiments
1. Preprocessing a text document using NLTK in Python
a. Stopword elimination
b. Stemming
c. Lemmatization
d. POS tagging
e. Lexical analysis
2. Sentiment analysis on customer reviews of products
3. Web analytics
a. Web usage data (web server log data, clickstream analysis)
b. Hyperlink data
4. Search engine optimization: implement spamdexing
5. Use Google Analytics tools to implement the following
a. Conversion Statistics
b. Visitor Profiles
6. Use Google Analytics tools to analyze Traffic Sources.
Resources:
1. Stanford CoreNLP package
2. google.com/analytics
TEXT BOOKS:
1. Ramesh Sharda, Dursun Delen, Efraim Turban, "Business Intelligence and Analytics: Systems for Decision Support", Pearson Education.
REFERENCE BOOKS:
1. Rajiv Sabherwal, Irma Becerra-Fernandez, "Business Intelligence: Practices, Technologies, and Management", John Wiley, 2011.
2. Larissa T. Moss, Shaku Atre, "Business Intelligence Roadmap", Addison-Wesley Information Technology Series.
3. Yuli Vasiliev, "Oracle Business Intelligence: The Condensed Guide to Analysis and Reporting", SPD/Shroff, 2012.
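The NLTK experiments below depend on corpora and models that are installed separately from the library itself. A one-time setup sketch (the identifiers are NLTK's standard resource names):

import nltk

# One-time downloads for the experiments in this manual
for resource in ['stopwords',                  # stopword elimination (1a)
                 'punkt',                      # word_tokenize, used throughout
                 'wordnet', 'omw-1.4',         # lemmatization (1c)
                 'averaged_perceptron_tagger', # POS tagging (1d, 1e)
                 'vader_lexicon']:             # sentiment analysis (2)
    nltk.download(resource)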
1. Preprocessing a text document using NLTK in Python
a. Stopword elimination
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stopword_elimination(text):
    # Tokenize first: iterating over a raw string would yield single characters
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    filtered_words = [word for word in words if word not in stop_words]
    return filtered_words

if __name__ == '__main__':
    text = "This is a sample text with stopwords."
    filtered_words = stopword_elimination(text)
    print(filtered_words)
Output
['This', 'sample', 'text', 'stopwords', '.']
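Note that 'This' survives because NLTK's stopword list is all lowercase. A minimal case-insensitive variant lowercases each token before the membership test:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))
words = word_tokenize("This is a sample text with stopwords.")
# Compare the lowercased token, so 'This' is dropped along with 'is', 'a' and 'with'
filtered = [w for w in words if w.lower() not in stop_words]
print(filtered)  # ['sample', 'text', 'stopwords', '.']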
b. Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

def stemming(text):
    # Stem each token with the Porter algorithm (it lowercases by default)
    stemmer = PorterStemmer()
    stemmed_words = []
    for word in word_tokenize(text):
        stemmed_words.append(stemmer.stem(word))
    return stemmed_words

if __name__ == '__main__':
    text = "This is a sample text with stemming."
    stemmed_words = stemming(text)
    print(stemmed_words)
Output
python stemming.py
['thi', 'is', 'a', 'sampl', 'text', 'with', 'stem', '.']
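NLTK ships several stemmers, and their aggressiveness differs. A short comparison sketch of Porter against the newer Snowball ("Porter2") stemmer, which usually produces cleaner stems (e.g. 'fairly' becomes 'fair' rather than 'fairli'):

from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer('english')
for word in ['running', 'generously', 'fairly']:
    # Print both stems side by side to compare the two algorithms
    print(word, '->', porter.stem(word), '/', snowball.stem(word))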
c. Lemmatization
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

def lemmatization(text):
    # Look up each token's lemma in WordNet (tokens are treated as nouns by default)
    lemmatizer = WordNetLemmatizer()
    lemmatized_words = []
    for word in word_tokenize(text):
        lemmatized_words.append(lemmatizer.lemmatize(word))
    return lemmatized_words

if __name__ == '__main__':
    text = "This is a sample text with lemmatization."
    lemmatized_words = lemmatization(text)
    print(lemmatized_words)
Output
python lemmatization.py
['This', 'is', 'a', 'sample', 'text', 'with', 'lemmatization', '.'] (unchanged, because every token is already in its base form when looked up as a noun)
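The WordNetLemmatizer treats every token as a noun unless told otherwise, which is why the sample sentence comes back unchanged. Passing a part of speech makes the difference visible (requires the 'wordnet' and 'omw-1.4' downloads):

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('running'))           # 'running' (looked up as a noun)
print(lemmatizer.lemmatize('running', pos='v'))  # 'run'
print(lemmatizer.lemmatize('better', pos='a'))   # 'good'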
d. POS tagging
import nltk
from nltk.tokenize import word_tokenize

def pos_tagging(text):
    # pos_tag expects a list of tokens, not a raw string
    tokens = word_tokenize(text)
    tagged_words = nltk.pos_tag(tokens)
    return tagged_words

if __name__ == '__main__':
    text = "This is a sample text with POS tagging."
    tagged_words = pos_tagging(text)
    print(tagged_words)
Output
python pos_tagging.py
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sample', 'NN'), ('text', 'NN'), ('with', 'IN'), ('POS', 'NN'), ('tagging', 'VBG'), ('.', '.')] (exact tags can vary with the tagger model)
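The Penn Treebank tags above can be hard to read; NLTK can map them onto the coarser universal tagset (NOUN, VERB, DET, ...). A minimal sketch, which additionally requires nltk.download('universal_tagset'):

import nltk

tokens = nltk.word_tokenize("This is a sample text with POS tagging.")
# Same tagger, but the fine-grained tags are mapped to ~12 universal categories
print(nltk.pos_tag(tokens, tagset='universal'))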
e. Lexical analysis
import nltk

def lexical_analysis(text):
    # Lexical analysis here = breaking text into tokens, then tagging each one
    tokens = nltk.word_tokenize(text)
    tagged_tokens = nltk.pos_tag(tokens)
    return tagged_tokens

if __name__ == '__main__':
    text = "This is a sample text with lexical analysis."
    tagged_tokens = lexical_analysis(text)
    print(tagged_tokens)
Output
python lexical_analysis.py
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sample', 'NN'), ('text', 'NN'), ('with', 'IN'), ('lexical', 'JJ'), ('analysis', 'NN'), ('.', '.')] (exact tags can vary with the tagger model)
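A common next step in lexical analysis is a frequency profile of the tokens. A minimal sketch using NLTK's FreqDist:

import nltk

text = "This is a sample text. This text is a sample."
tokens = nltk.word_tokenize(text.lower())
# Count every token and show the three most frequent ones
fdist = nltk.FreqDist(tokens)
print(fdist.most_common(3))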
2. Sentiment analysis on customer reviews of products
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_analysis(text):
    # VADER returns negative, neutral, positive and compound scores in one dict
    analyzer = SentimentIntensityAnalyzer()
    sentiment = analyzer.polarity_scores(text)
    return sentiment

if __name__ == '__main__':
    text = "This is a sample text with positive sentiment."
    sentiment = sentiment_analysis(text)
    print(sentiment)
Output
python sentiment_analysis.py
A score dictionary of the form {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}; 'compound' lies in [-1, 1], and the exact values depend on the VADER lexicon version.
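For the experiment proper, the same analyzer can label a batch of customer reviews. A minimal sketch using the conventional VADER cutoffs (compound >= 0.05 positive, <= -0.05 negative); the review strings are made-up sample data:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reviews = [
    "The product quality is excellent and delivery was fast.",
    "Terrible battery life, very disappointed.",
    "It arrived on Tuesday.",
]
for review in reviews:
    compound = analyzer.polarity_scores(review)['compound']
    # Conventional thresholds for turning the compound score into a label
    label = ('positive' if compound >= 0.05
             else 'negative' if compound <= -0.05
             else 'neutral')
    print(f'{label:8s} {compound:+.3f}  {review}')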
3. Web analytics
a. Web usage data (web server log data, clickstream analysis)
import pandas as pd

def web_usage_analysis(log_file):
    # Assumes the log has been exported to CSV with columns named
    # 'ip', 'timestamp' and 'url' (adjust the names to match your file)
    log_data = pd.read_csv(log_file)
    # Analyze the data: overall traffic volume and unique visitors
    print('Total requests :', len(log_data))
    print('Unique visitors:', log_data['ip'].nunique())
    # Print the results of a simple clickstream summary: most requested pages
    print(log_data['url'].value_counts().head(10))

if __name__ == '__main__':
    log_file = 'web_log.csv'
    web_usage_analysis(log_file)
Output
python web_usage_analysis.py
The output depends on the data in the log file: the total request count, the number of distinct visitor IPs, and the ten most requested URLs.
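Raw web server logs are usually plain text rather than CSV. A hedged sketch that parses Apache Common Log Format lines into a DataFrame before running the same analysis (the file name 'access.log' is a placeholder):

import re
import pandas as pd

# Common Log Format: host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def parse_log(path):
    rows = []
    with open(path) as f:
        for line in f:
            match = LOG_PATTERN.match(line)
            if match:
                rows.append(match.groupdict())
    return pd.DataFrame(rows)

df = parse_log('access.log')
print(df['url'].value_counts().head(10))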
b. Hyperlink data
import requests
import bs4

def hyperlink_analysis(url):
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.content, 'html.parser')
    links = soup.find_all('a')
    # Analyze the links: count how often each target URL appears
    link_counts = {}
    for link in links:
        href = link.get('href')  # some anchors carry no href attribute
        if href is None:
            continue
        if href not in link_counts:
            link_counts[href] = 0
        link_counts[href] += 1
    # Print the results
    for href, count in link_counts.items():
        print(f'{href}: {count}')

if __name__ == '__main__':
    url = 'https://www.google.com/'
    hyperlink_analysis(url)
Output
The output depends on the page at the URL you specify: one line per distinct link target, followed by the number of anchors pointing at it.
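Hyperlink data is usually analyzed as a directed graph, the classic example being PageRank. A minimal power-iteration sketch over a hypothetical three-page link graph (damping factor 0.85, no dangling pages):

# Each page maps to the pages it links out to (hypothetical site)
graph = {
    'home':     ['products', 'about'],
    'products': ['home'],
    'about':    ['home', 'products'],
}
damping = 0.85
n = len(graph)
rank = {page: 1.0 / n for page in graph}

for _ in range(50):  # power iteration until practically converged
    new_rank = {page: (1 - damping) / n for page in graph}
    for page, outlinks in graph.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f'{page}: {score:.3f}')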
4. Search engine optimization: implement spamdexing
import nltk

def spamdexing(text):
    # Keyword stuffing: drop stopwords, then pad the page text with repeated keywords
    stopwords = nltk.corpus.stopwords.words('english')
    keywords = ['keyword1', 'keyword2', 'keyword3']
    filtered_text = [word for word in nltk.word_tokenize(text) if word not in stopwords]
    for keyword in keywords:
        # extend with ten separate copies; keyword * 10 would build one long string
        filtered_text.extend([keyword] * 10)
    return filtered_text

if __name__ == '__main__':
    text = "This is a sample text with stopwords."
    filtered_text = spamdexing(text)
    print(filtered_text)
Output
['This', 'sample', 'text', 'stopwords', '.',
'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1', 'keyword1',
'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2', 'keyword2',
'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3', 'keyword3']
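Search engines detect this kind of stuffing through, among other signals, keyword density. A minimal detector sketch (the 20% threshold is an arbitrary illustration, not a published cutoff):

from collections import Counter

def keyword_density(tokens):
    # Share of all tokens taken up by each distinct term
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

tokens = ['This', 'sample', 'text'] + ['keyword1'] * 10
for term, share in sorted(keyword_density(tokens).items(), key=lambda kv: -kv[1]):
    flag = '  <-- suspiciously dense' if share > 0.20 else ''
    print(f'{term}: {share:.0%}{flag}')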
5. Use Google Analytics tools to implement the following
a. Conversion Statistics
import requests

def get_conversion_data(view_id, access_token):
    # Core Reporting API (v3) endpoint; every request needs an OAuth 2.0 token
    url = 'https://www.googleapis.com/analytics/v3/data/ga'
    params = {
        'ids': f'ga:{view_id}',              # f-string, so the view ID is substituted
        'start-date': '2023-01-01',
        'end-date': '2023-08-01',
        'metrics': 'ga:goalCompletionsAll',  # goal completions stand in for conversions in v3
        'dimensions': 'ga:date',
        'samplingLevel': 'HIGHER_PRECISION'
    }
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(url, params=params, headers=headers)
    return response.json()

if __name__ == '__main__':
    view_id = '1234567890'
    access_token = '<OAuth 2.0 access token>'
    conversion_data = get_conversion_data(view_id, access_token)
    print(conversion_data)
Output
python conversion_tracking.py
The output depends on the data in the Analytics view: a JSON report whose rows pair each date in the range with its goal completions.
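To turn that report into a single statistic, the rows can be totaled client-side. A sketch assuming the v3 response layout, where each row is a list of strings ordered as dimensions followed by metrics:

def total_conversions(report):
    # Row layout assumed: [date, goal completions]
    total = 0
    for row in report.get('rows', []):
        total += int(row[1])
    return total

# Hand-made sample in the v3 row format, just to exercise the function
sample = {'rows': [['20230101', '4'], ['20230102', '7']]}
print(total_conversions(sample))  # 11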
b. Visitor Profiles
import requests

def get_visitor_profile(profile_id, access_token):
    # The v3 'ids' field accepts a single view (profile) ID per request,
    # so multiple profiles are fetched one at a time
    url = 'https://www.googleapis.com/analytics/v3/data/ga'
    params = {
        'ids': f'ga:{profile_id}',
        'start-date': '2023-01-01',
        'end-date': '2023-08-01',
        'metrics': 'ga:sessions,ga:bounceRate,ga:pageviews',
        'dimensions': 'ga:source,ga:medium,ga:deviceCategory',
        'samplingLevel': 'HIGHER_PRECISION'
    }
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(url, params=params, headers=headers)
    return response.json()

if __name__ == '__main__':
    access_token = '<OAuth 2.0 access token>'
    for profile_id in ['1234567890', '1234567891']:
        visitor_profile = get_visitor_profile(profile_id, access_token)
        print(visitor_profile)
Output
python visitor_profiles.py
One JSON report per view: rows keyed by source, medium and device category, with sessions, bounce rate and pageviews for each combination.
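The raw v3 rows are bare lists, so pairing them with the report's columnHeaders makes them readable. A sketch over a hand-made sample response:

def print_report(report):
    # Label each row's values with the corresponding column names
    headers = [h['name'] for h in report.get('columnHeaders', [])]
    for row in report.get('rows', []):
        print(dict(zip(headers, row)))

sample = {
    'columnHeaders': [{'name': 'ga:source'}, {'name': 'ga:medium'},
                      {'name': 'ga:deviceCategory'}, {'name': 'ga:sessions'},
                      {'name': 'ga:bounceRate'}, {'name': 'ga:pageviews'}],
    'rows': [['google', 'organic', 'mobile', '120', '41.5', '300']],
}
print_report(sample)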
6. Use Google Analytics tools to analyze Traffic Sources
import requests

def get_traffic_sources(profile_id, access_token):
    # Sessions broken down by traffic source and medium
    url = 'https://www.googleapis.com/analytics/v3/data/ga'
    params = {
        'ids': f'ga:{profile_id}',
        'start-date': '2023-01-01',
        'end-date': '2023-08-01',
        'metrics': 'ga:sessions',
        'dimensions': 'ga:source,ga:medium',
        'samplingLevel': 'HIGHER_PRECISION'
    }
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(url, params=params, headers=headers)
    return response.json()

if __name__ == '__main__':
    profile_id = '1234567890'
    access_token = '<OAuth 2.0 access token>'
    traffic_sources = get_traffic_sources(profile_id, access_token)
    print(traffic_sources)
Output
python traffic_sources.py
A report of the following shape (values illustrative, simplified from the raw API response):
{
  "rows": [
    {"ga:sessions": 100, "ga:source": "google", "ga:medium": "organic"},
    {"ga:sessions": 50, "ga:source": "facebook", "ga:medium": "social"},
    {"ga:sessions": 20, "ga:source": "twitter", "ga:medium": "social"},
    {"ga:sessions": 10, "ga:source": "direct", "ga:medium": "none"}
  ]
}
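A natural follow-up is to aggregate the report by medium. A sketch that works directly on the sample structure shown above:

report = {
    "rows": [
        {"ga:sessions": 100, "ga:source": "google", "ga:medium": "organic"},
        {"ga:sessions": 50, "ga:source": "facebook", "ga:medium": "social"},
        {"ga:sessions": 20, "ga:source": "twitter", "ga:medium": "social"},
        {"ga:sessions": 10, "ga:source": "direct", "ga:medium": "none"},
    ]
}

# Total sessions per medium: organic 100, social 70, none 10
totals = {}
for row in report['rows']:
    medium = row['ga:medium']
    totals[medium] = totals.get(medium, 0) + row['ga:sessions']

for medium, sessions in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f'{medium}: {sessions}')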