
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

II SEMESTER 2004-2005 CS C415/IS C415 – DATA MINING


Comprehensive Exam
Part-A (Closed book) Max Marks: 12 Date: 13th May 2005
Each question carries 0.5 marks except 19 and 20. More than one option may be correct.
1. Data Mining
(a) tasks can be descriptive or predictive
(b) refers to current business trends in collecting and cleaning the transactional
data and making them available for analysis and decision support
(c) focuses on overall processes of knowledge discovery
(d) is an interdisciplinary field with a general goal of predicting outcomes and
uncovering relationships in data
2. A data warehouse is an ideal source of data for data mining because data in a data
warehouse is
(a) integrated (c) clean
(b) historical (d) all
3. Which kind of association rule would you prefer for finding rare itemsets (wherever
they occur, they occur together)?
(a) high support high confidence (c) low support high confidence
(b) high support low confidence (d) low support low confidence
4. According to the principle of frequent pattern growth, 123456 is a frequent
pattern iff 12345 is frequent in the DB and
(a) 6 is frequent patterns in DB
(b) 23456 is frequent pattern in DB
(c) 6 is frequent in the transactions containing 12345
(d) 12345=>6 has confidence more than the min_sup
5. k-fold cross validation
(a) divides the dataset into k subsamples
(b) randomly selects subsample k times
(c) uses k-1 subsamples for training data and one for testing data
(d) performs training and testing k times
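As a side note, the k-fold procedure that options (a), (c) and (d) describe can be sketched in a few lines of Python (an illustrative sketch, not part of the exam; the function name is ours):

```python
# Minimal k-fold split (illustrative sketch; function name is ours).
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_items))
    fold_size = n_items // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

# 10 items, k = 5: each round trains on 8 items and tests on 2,
# and training/testing is performed k = 5 times in total.
for train, test in k_fold_splits(10, 5):
    assert len(train) == 8 and len(test) == 2
```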
6. The best solution for a classification problem will have entries in the confusion
matrix
(a) All zero values at diagonal
(b) All nonzero values on the diagonal and outside the diagonal
(c) Only zero values are outside the diagonal
(d) None of the above

7. The Abundance problem is that


(a) Thousands of irrelevant documents are to be searched
(b) Hundreds of irrelevant documents are returned in response to a search
query
(c) 99% of information of no interest to 99% of the people
(d) Internet sources are hidden behind search interfaces
8. The following classifier(s) are eager learners
(a) Case-based reasoning
(b) Decision tree
(c) K-nearest neighbors
(d) Bayesian classifier
9. The quality of clusters will depend on the
(a) low intra-class similarity
(b) high inter-class similarity
(c) similarity measures used by the clustering algorithm
(d) choice of the predicates used
10. Maximum distance between clusters {2,4,10,12,32} and {20,30,3,11}
(a) 28 (b) 29 (c) 18 (d) 21
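For checking Q10, the maximum (complete-link) inter-cluster distance can be computed directly; this is a worked verification, not part of the exam paper:

```python
# Q10 worked out: maximum (complete-link) distance between the clusters.
c1 = {2, 4, 10, 12, 32}
c2 = {20, 30, 3, 11}
max_dist = max(abs(a - b) for a in c1 for b in c2)
print(max_dist)  # -> 29 (achieved by the pair 32 and 3)
```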

Pick the odd one out


(a) CLARA
(b) DBSCAN
(c) DENCLUE
(d) CLIQUE
12. Pick the incorrect statement(s) about STING
(a) It is a query independent technique
(b) Incremental updates are possible
(c) Cluster boundaries are either horizontal, vertical or diagonal
(d) O(k), k is number of data points
13. _____________ and _________________ are two techniques to improve the
classifier accuracy

14. ________________ selectively searches the web and incrementally modifies the
index.
15. Bayesian classifier that allows dependencies among sets of attributes

_______________________
16. OPTICS is ______________________________________________________
17. In DBSCAN, a cluster is defined as

_______________________________________________________________
18. Name data sources for web usage mining

19. Give two important factors for the evaluation of a classification algorithm

20. Classify the following three attributes as binary, discrete or continuous. Further
classify the attributes as qualitative (nominal or ordinal) or quantitative (interval
or ratio)
(a) seasons of a year _________________, __________________
(b) color of the eyes _________________, __________________
(c) decay in a radioactive element __________________, _______________
(d) pair-wise distances between cities _________________, _______________

21. Write the data mining task against each of the following examples of mined patterns
(a) People with age less than 25 and salary>40 K drive sports cars _________
(b) 80% of images containing a car as an object also contain a blue sky______
(c) Set of images that contain a car as an object ________________________
(d) Stocks of companies A and B perform similarly ______________________
(e) Sale of furniture increases with the improvement of real estate business ___
(f) Predicting water level at a potential point in the river based on the data
collected by the sensors _________________________________________
Comprehensive Exam
Part-A (Closed book) continued Max Marks: 8
Use supplementary sheet to answer the following questions.

1. Identify the best algorithm to mine long patterns. Justify your answer.

2. How can redundant multilevel rules be removed? Explain through an example.

3. Differentiate noise from outliers through an example.

4. What are the problems with k-means? How can you deal with these problems?

5. Find the best hub and the most authoritative page from the following mini web.

[Figure: mini web graph; only the node labels Z and Y survive extraction]

(1.5+1.5+1+2+2)
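Since the mini-web figure did not survive extraction, here is a hedged illustration of the hub/authority (HITS) computation Q5 asks for, run on a hypothetical three-page graph; the adjacency below is an assumption, not the exam's graph:

```python
# HITS hubs/authorities iteration on a HYPOTHETICAL three-page graph;
# the exam's figure did not survive extraction, so this adjacency is
# an assumption for illustration only.
links = {"X": ["Y", "Z"], "Y": ["Z"], "Z": []}

hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}
for _ in range(20):
    # Authority: sum of hub scores of the pages linking in.
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    # Hub: sum of authority scores of the pages linked to.
    hub = {p: sum(auth[t] for t in links[p]) for p in links}
    # Normalise so the scores stay bounded.
    a_norm = sum(auth.values()) or 1.0
    h_norm = sum(hub.values()) or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

best_hub = max(hub, key=hub.get)     # "X": it links to both other pages
best_auth = max(auth, key=auth.get)  # "Z": pointed to by both X and Y
```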
Comprehensive Exam
Part-B(Open book) Max Marks: 20 Date: 13th May 2005
1. (a) Find all frequent itemsets using Frequent Pattern Growth for the data
given. Also generate all interesting association rules for minconf = 70%.
(b) Use Apriori to find all frequent itemsets for the data given. Also give
non-frequent itemsets, itemsets not generated, and itemsets pruned at each
step.
(4+3)
Trans No. Items
1 K,S,T
2 J,T
3 J,S,T
4 J,S
5 K,S
6 J,K
7 J,K,S,T
8 J,S,T
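As a cross-check for part (b), Apriori can be run mechanically over these transactions. The sketch below assumes min_sup = 3, since the exam does not state a support threshold; rule generation for minconf is left out for brevity:

```python
# Apriori over the Q1 transactions. min_sup = 3 is an ASSUMED support
# threshold (the exam leaves it to the solver).
transactions = [
    {"K", "S", "T"}, {"J", "T"}, {"J", "S", "T"}, {"J", "S"},
    {"K", "S"}, {"J", "K"}, {"J", "K", "S", "T"}, {"J", "S", "T"},
]
min_sup = 3

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

frequent = {}
level = [frozenset([i]) for i in {x for t in transactions for x in t}]
while level:
    level = [c for c in level if support(c) >= min_sup]   # prune by support
    frequent.update({c: support(c) for c in level})
    # Join step: combine frequent k-itemsets into (k+1)-candidates.
    level = list({a | b for a in level for b in level
                  if len(a | b) == len(a) + 1})

# With min_sup = 3 the frequent itemsets come out as
# J, K, S, T, JS, JT, KS, ST and JST.
```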
2. Use the following dataset.
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
(a) How would the naïve Bayes classifier classify the data instance X = (sunny,
hot, high, false)?
(b) Does this agree with the classification given in the table for the data
instance X = (sunny, hot, high, false)?
(c) Consider a new data instance X’ = (overcast, cool, high, true). How would
the naïve Bayes classifier classify X’?
(d) Construct a decision tree from the above table.
(e) Compare the decision tree model with the Bayesian classifier for (a), (b) and
(c).
(1.5+0.5+1+5+1)
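The naïve Bayes computations in parts (a) and (c) can be verified mechanically. The sketch below is a hand-rolled check (no smoothing, as in the usual classroom treatment), not a model answer:

```python
# Naive Bayes on the Q2 weather table. Hand-rolled sketch, no smoothing.
data = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def classify(x):
    scores = {}
    for label in ("yes", "no"):
        rows = [r for r in data if r[-1] == label]
        p = len(rows) / len(data)          # prior P(label)
        for i, value in enumerate(x):      # likelihoods P(attr = value | label)
            p *= sum(1 for r in rows if r[i] == value) / len(rows)
        scores[label] = p
    return max(scores, key=scores.get)

print(classify(("sunny", "hot", "high", False)))    # -> no
print(classify(("overcast", "cool", "high", True))) # -> yes
```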
3. Perform average link and DBSCAN clustering to find the clusters in the given data.
Use MinPts = 2 and ε = 2 for calculations.
(4)
Item A B C D E
A 0 1 4 5 7
B 1 0 2 6 8
C 4 2 0 3 4
D 5 6 3 0 4
E 7 8 4 4 0
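As a cross-check for the DBSCAN part, the computation with ε = 2 and MinPts = 2 can be run directly on the distance matrix. This sketch assumes MinPts counts the point itself, a common textbook convention:

```python
# DBSCAN on the Q3 distance matrix with eps = 2, min_pts = 2
# (min_pts counts the point itself, a common textbook convention).
points = ["A", "B", "C", "D", "E"]
dist = {
    ("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5, ("A", "E"): 7,
    ("B", "C"): 2, ("B", "D"): 6, ("B", "E"): 8,
    ("C", "D"): 3, ("C", "E"): 4, ("D", "E"): 4,
}
def d(p, q):
    return 0 if p == q else dist.get((p, q), dist.get((q, p)))

eps, min_pts = 2, 2
neigh = {p: {q for q in points if d(p, q) <= eps} for p in points}
cores = {p for p in points if len(neigh[p]) >= min_pts}

# Expand clusters from core points (density-reachability).
clusters, seen = [], set()
for p in cores:
    if p in seen:
        continue
    cluster, frontier = set(), [p]
    while frontier:
        q = frontier.pop()
        if q in cluster:
            continue
        cluster.add(q)
        if q in cores:
            frontier.extend(neigh[q] - cluster)
    clusters.append(cluster)
    seen |= cluster
noise = set(points) - seen

print(clusters, noise)  # one cluster {A, B, C}; D and E are noise
```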
