
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

II SEMESTER 2004-2005 CS C415/IS C415 – DATA MINING


Comprehensive Exam
Part-A (Closed book) Max Marks: 12 Date: 13th May 2005
Each question carries 0.5 marks except 19 and 20. More than one option may be correct.
1. Data Mining
(a) tasks can be descriptive or predictive
(b) refers to current business trends in collecting and cleaning the transactional
data and making them available for analysis and decision support
(c) focuses on overall processes of knowledge discovery
(d) is an interdisciplinary field with a general goal of predicting outcomes and
uncovering relationships in data
2. A data warehouse is an ideal source of data for data mining because data in a data
warehouse is
(a) integrated (c) clean
(b) historical (d) all
3. Which kind of association rule would you prefer for finding rare itemsets (wherever
they occur, they occur together)?
(a) high support high confidence (c) low support high confidence
(b) high support low confidence (d) low support low confidence
4. According to the principle of frequent pattern growth, 123456 is a frequent
pattern iff 12345 is frequent in the DB and
(a) 6 is frequent patterns in DB
(b) 23456 is frequent pattern in DB
(c) 6 is frequent in the transactions containing 12345
(d) 12345=>6 has confidence more than the min_sup
5. k-fold cross validation
(a) divides the dataset into k subsamples
(b) randomly selects subsample k times
(c) uses k-1 subsamples for training data and one for testing data
(d) performs training and testing k times
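As a side note, the k-fold procedure that options (a), (c) and (d) describe can be sketched in a few lines of Python (an illustrative sketch, not part of the exam; the function name is ours):

```python
# Minimal k-fold split (illustrative sketch; function name is ours).
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_items))
    fold_size = n_items // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

# 10 items, k = 5: each round trains on 8 items and tests on 2,
# and training/testing is performed k = 5 times in total.
for train, test in k_fold_splits(10, 5):
    assert len(train) == 8 and len(test) == 2
```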
6. The best solution for a classification problem will have entries in the confusion
matrix
(a) All zero values at diagonal
(b) All nonzero values on the diagonal and outside the diagonal
(c) Only zero values are outside the diagonal
(d) None of the above

7. The Abundance problem is that


(a) Thousands of irrelevant documents are to be searched
(b) Hundreds of irrelevant documents are returned in response to a search
query
(c) 99% of information of no interest to 99% of the people
(d) Internet sources are hidden behind search interfaces
8. The following classifier(s) are eager learners
(a) Case-based reasoning
(b) Decision tree
(c) K-nearest neighbors
(d) Bayesian classifier
9. The quality of clusters will depend on the
(a) low intra-class similarity
(b) high inter-class similarity
(c) similarity measures used by the clustering algorithm
(d) choice of the predicates used
10. Maximum distance between clusters {2,4,10,12,32} and {20,30,3,11}
(a) 28 (b) 29 (c) 18 (d) 21
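For checking Q10, the maximum (complete-link) inter-cluster distance can be computed directly; this is a worked verification, not part of the exam paper:

```python
# Q10 worked out: maximum (complete-link) distance between the clusters.
c1 = {2, 4, 10, 12, 32}
c2 = {20, 30, 3, 11}
max_dist = max(abs(a - b) for a in c1 for b in c2)
print(max_dist)  # -> 29 (achieved by the pair 32 and 3)
```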

Pick the odd one out


(a) CLARA
(b) DBSCAN
(c) DENCLUE
(d) CLIQUE
12. Pick the incorrect statement(s) about STING
(a) It is a query independent technique
(b) Incremental updates are possible
(c) Cluster boundaries are either horizontal, vertical or diagonal
(d) O(k), k is number of data points
13. _____________ and _________________ are two techniques to improve the
classifier accuracy

14. ________________ selectively searches the web and incrementally modifies the
index.
15. Bayesian classifier that allows dependencies among sets of attributes

_______________________
16. OPTICS is ______________________________________________________
17. In DBSCAN, a cluster is defined as

_______________________________________________________________
18. Name data sources for web usage mining

19. Give two important factors for the evaluation of a classification algorithm

20. Classify the following three attributes as binary, discrete or continuous. Further
classify the attributes as qualitative (nominal or ordinal) or quantitative (interval
or ratio)
(a) seasons of a year _________________, __________________
(b) color of the eyes _________________, __________________
(c) decay in a radioactive element __________________, _______________
(d) pair-wise distances between cities _________________, _______________

21. Write the data mining task against each of the following examples of mined patterns
(a) People with age less than 25 and salary>40 K drive sports cars _________
(b) 80% of images containing a car as an object also contain a blue sky______
(c) Set of images that contain a car as an object ________________________
(d) Stocks of companies A and B perform similarly ______________________
(e) Sale of furniture increases with the improvement of real estate business ___
(f) Predicting water level at a potential point in the river based on the data
collected by the sensors _________________________________________
Comprehensive Exam
Part-A (Closed book) continued Max Marks: 8
Use supplementary sheet to answer the following questions.

1. Identify the best algorithm to mine long patterns. Justify your answer.

2. How can redundant multilevel rules be removed? Explain through an example.

3. Differentiate noise from outliers through an example.

4. What are the problems with k-means? How can you deal with these problems?

5. Find the best hub and the most authoritative page from the following mini web.

[Figure: mini web graph; only the node labels Z and Y survive extraction]

(1.5+1.5+1+2+2)
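Since the mini-web figure did not survive extraction, here is a hedged illustration of the hub/authority (HITS) computation Q5 asks for, run on a hypothetical three-page graph; the adjacency below is an assumption, not the exam's graph:

```python
# HITS hubs/authorities iteration on a HYPOTHETICAL three-page graph;
# the exam's figure did not survive extraction, so this adjacency is
# an assumption for illustration only.
links = {"X": ["Y", "Z"], "Y": ["Z"], "Z": []}

hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}
for _ in range(20):
    # Authority: sum of hub scores of the pages linking in.
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    # Hub: sum of authority scores of the pages linked to.
    hub = {p: sum(auth[t] for t in links[p]) for p in links}
    # Normalise so the scores stay bounded.
    a_norm = sum(auth.values()) or 1.0
    h_norm = sum(hub.values()) or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

best_hub = max(hub, key=hub.get)     # "X": it links to both other pages
best_auth = max(auth, key=auth.get)  # "Z": pointed to by both X and Y
```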
Comprehensive Exam
Part-B(Open book) Max Marks: 20 Date: 13th May 2005
1. (a) Find all frequent itemsets using Frequent Pattern Growth for the data
given. Also generate all interesting association rules for minconf = 70%.
(b) Use Apriori to find all frequent itemsets for the data given. Also give
non-frequent itemsets, itemsets not generated, and itemsets pruned at each
step.
(4+3)
Trans No. Items
1 K,S,T
2 J,T
3 J,S,T
4 J,S
5 K,S
6 J,K
7 J,K,S,T
8 J,S,T
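As a cross-check for part (b), Apriori can be run mechanically over these transactions. The sketch below assumes min_sup = 3, since the exam does not state a support threshold; rule generation for minconf is left out for brevity:

```python
# Apriori over the Q1 transactions. min_sup = 3 is an ASSUMED support
# threshold (the exam leaves it to the solver).
transactions = [
    {"K", "S", "T"}, {"J", "T"}, {"J", "S", "T"}, {"J", "S"},
    {"K", "S"}, {"J", "K"}, {"J", "K", "S", "T"}, {"J", "S", "T"},
]
min_sup = 3

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

frequent = {}
level = [frozenset([i]) for i in {x for t in transactions for x in t}]
while level:
    level = [c for c in level if support(c) >= min_sup]   # prune by support
    frequent.update({c: support(c) for c in level})
    # Join step: combine frequent k-itemsets into (k+1)-candidates.
    level = list({a | b for a in level for b in level
                  if len(a | b) == len(a) + 1})

# With min_sup = 3 the frequent itemsets come out as
# J, K, S, T, JS, JT, KS, ST and JST.
```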
2. Use the following dataset.
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
(a) How would the naïve Bayes classifier classify the data instance X = (sunny,
hot, high, false)?
(b) Does this agree with the classification given in the table for the data
instance X = (sunny, hot, high, false)?
(c) Consider a new data instance X’ = (overcast, cool, high, true). How would
the naïve Bayes classifier classify X’?
(d) Construct a decision tree from the above table.
(e) Compare the decision tree model with the Bayesian classifier for (a), (b) and
(c).
(1.5+0.5+1+5+1)
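The naïve Bayes computations in parts (a) and (c) can be verified mechanically. The sketch below is a hand-rolled check (no smoothing, as in the usual classroom treatment), not a model answer:

```python
# Naive Bayes on the Q2 weather table. Hand-rolled sketch, no smoothing.
data = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def classify(x):
    scores = {}
    for label in ("yes", "no"):
        rows = [r for r in data if r[-1] == label]
        p = len(rows) / len(data)          # prior P(label)
        for i, value in enumerate(x):      # likelihoods P(attr = value | label)
            p *= sum(1 for r in rows if r[i] == value) / len(rows)
        scores[label] = p
    return max(scores, key=scores.get)

print(classify(("sunny", "hot", "high", False)))    # -> no
print(classify(("overcast", "cool", "high", True))) # -> yes
```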
3. Perform average link and DBSCAN clustering to find the clusters in the given data.
Use MinPts = 2 and ε = 2 for calculations.
(4)
Item A B C D E
A 0 1 4 5 7
B 1 0 2 6 8
C 4 2 0 3 4
D 5 6 3 0 4
E 7 8 4 4 0
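As a cross-check for the DBSCAN part, the computation with ε = 2 and MinPts = 2 can be run directly on the distance matrix. This sketch assumes MinPts counts the point itself, a common textbook convention:

```python
# DBSCAN on the Q3 distance matrix with eps = 2, min_pts = 2
# (min_pts counts the point itself, a common textbook convention).
points = ["A", "B", "C", "D", "E"]
dist = {
    ("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5, ("A", "E"): 7,
    ("B", "C"): 2, ("B", "D"): 6, ("B", "E"): 8,
    ("C", "D"): 3, ("C", "E"): 4, ("D", "E"): 4,
}
def d(p, q):
    return 0 if p == q else dist.get((p, q), dist.get((q, p)))

eps, min_pts = 2, 2
neigh = {p: {q for q in points if d(p, q) <= eps} for p in points}
cores = {p for p in points if len(neigh[p]) >= min_pts}

# Expand clusters from core points (density-reachability).
clusters, seen = [], set()
for p in cores:
    if p in seen:
        continue
    cluster, frontier = set(), [p]
    while frontier:
        q = frontier.pop()
        if q in cluster:
            continue
        cluster.add(q)
        if q in cores:
            frontier.extend(neigh[q] - cluster)
    clusters.append(cluster)
    seen |= cluster
noise = set(points) - seen

print(clusters, noise)  # one cluster {A, B, C}; D and E are noise
```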
