II Semester 2004-2005 CS C415/IS C415 - Data Mining
14. ________________ selectively searches the web and incrementally modifies the
index.
15. The Bayesian classifier that allows dependencies among subsets of attributes is
_______________________
16. OPTICS is ______________________________________________________
17. In DBSCAN, a cluster is defined as
_______________________________________________________________
18. Name the data sources for web usage mining.
20. Classify each of the following attributes as binary, discrete or continuous. Further
classify each attribute as qualitative (nominal or ordinal) or quantitative (interval
or ratio).
(a) seasons of a year _________________, __________________
(b) color of the eyes _________________, __________________
(c) decay in a radioactive element __________________, _______________
(d) pair-wise distances between cities _________________, _______________
21. Name the data mining task that corresponds to each of the following examples of mined patterns.
(a) People with age less than 25 and salary > 40K drive sports cars _________
(b) 80% of images containing a car as an object also contain a blue sky ______
(c) Set of images that contain a car as an object ________________________
(d) Stocks of companies A and B perform similarly ______________________
(e) Sale of furniture increases with the improvement of real estate business ___
(f) Predicting water level at a potential point in the river based on the data
collected by the sensors _________________________________________
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
II SEMESTER 2004-2005 CS C415/IS C415 – DATA MINING
Comprehensive Exam
Part-A (Closed book) continued Max Marks: 8
Use supplementary sheet to answer the following questions.
1. Identify the best algorithm to mine long patterns. Justify your answer.
4. What are the problems with k-means? How can you deal with these problems?
5. Find the best hub and the most authoritative page from the following mini web.
[Figure: mini web graph; only the node labels Z and Y survive in this copy]
(1.5+1.5+1+2+2)
Part-B (Open book) Max Marks: 20 Date: 13th May 2005
1. (a) Find all frequent itemsets using Frequent Pattern Growth (FP-growth) for the
data given. Also generate all interesting association rules for minconf = 70%.
(b) Use Apriori to find all frequent itemsets for the data given. Also list the
infrequent itemsets, the itemsets not generated and the itemsets pruned at
each step.
(4+3)
Trans No. Items
1 K,S,T
2 J,T
3 J,S,T
4 J,S
5 K,S
6 J,K
7 J,K,S,T
8 J,S,T
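For checking hand-worked answers to Question 1, the sketch below counts itemset support by brute-force enumeration over the eight transactions above. It is neither Apriori nor FP-growth, only a reference count that either algorithm should reproduce. The question as printed does not state a minimum support, so the threshold of 3 transactions used in the usage lines is an assumption, and the function names are illustrative.

```python
from itertools import combinations

# Transactions copied from the table above.
TRANSACTIONS = [
    {"K", "S", "T"},
    {"J", "T"},
    {"J", "S", "T"},
    {"J", "S"},
    {"K", "S"},
    {"J", "K"},
    {"J", "K", "S", "T"},
    {"J", "S", "T"},
]


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(min_count, transactions=TRANSACTIONS):
    """Brute force: every itemset whose support count is >= min_count."""
    universe = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            count = support_count(combo, transactions)
            if count >= min_count:
                result[combo] = count
    return result


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """conf(A -> B) = support(A union B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Assumed threshold: an itemset is frequent if it occurs in at least 3 transactions.
frequent = frequent_itemsets(min_count=3)
print(frequent[("J", "S", "T")])        # support count of {J, S, T}
print(confidence({"S", "T"}, {"J"}))    # compare against minconf = 0.70
```

Under the assumed threshold, a rule such as {S, T} -> {J} is kept only if its confidence is at least 0.70; a different minimum support changes which itemsets qualify.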
2. Use the following dataset.
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
(a) How would the naïve Bayes classifier classify the data instance X = (sunny,
hot, high, false)?
(b) Does this agree with the classification given in the table for the data instance X =
(sunny, hot, high, false)?
(c) Consider a new data instance X’ = (overcast, cool, high, true). How would
the naïve Bayes classifier classify X’?
(d) Construct a decision tree from the above table.
(e) Compare the decision tree model with the Bayesian classifier for (a), (b) and
(c).
(1.5+0.5+1+5+1)
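The naïve Bayes parts of Question 2 can be checked with a short sketch. It computes the unnormalised score P(c) * P(x1|c) * ... * P(x4|c) for each class from raw frequency counts in the table above. No Laplace smoothing is applied; that is an assumption, since the question does not say whether smoothing is expected, and it means a value never seen with a class forces that class's score to zero. The function names are illustrative.

```python
from collections import Counter

# Rows copied from the table above: (Outlook, Temperature, Humidity, Windy, Play).
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def nb_scores(x, rows=ROWS):
    """Unnormalised naive Bayes score per class, using raw counts (no smoothing)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(rows)  # prior P(c)
        for i, value in enumerate(x):
            matches = sum(1 for r in rows if r[-1] == c and r[i] == value)
            score *= matches / n_c  # conditional P(x_i | c)
        scores[c] = score
    return scores


def classify(x, rows=ROWS):
    """Class with the highest naive Bayes score."""
    scores = nb_scores(x, rows)
    return max(scores, key=scores.get)


print(nb_scores(("Sunny", "Hot", "High", "False")))
print(classify(("Overcast", "Cool", "High", "True")))
```

Dividing each score by the sum of the scores gives posterior probabilities if they are needed for the comparison in part (e).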
3. Perform average-link clustering and DBSCAN to find the clusters in the given data. Use
MinPts = 2 and ε = 2 for calculations.
(4)
Item A B C D E
A 0 1 4 5 7
B 1 0 2 6 8
C 4 2 0 3 4
D 5 6 3 0 4
E 7 8 4 4 0
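The DBSCAN part of Question 3 can be checked with a minimal sketch that works directly on the distance matrix above. It rests on two assumptions the question does not spell out: a point counts as its own neighbour, and the neighbourhood test is distance <= ε. Under other conventions the core points, and hence the clusters, may differ. The average-link part is not covered here.

```python
LABELS = ["A", "B", "C", "D", "E"]

# Pairwise distances copied from the matrix above.
DIST = [
    [0, 1, 4, 5, 7],
    [1, 0, 2, 6, 8],
    [4, 2, 0, 3, 4],
    [5, 6, 3, 0, 4],
    [7, 8, 4, 4, 0],
]


def dbscan(dist, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    n = len(dist)
    # Assumption: the eps-neighbourhood includes the point itself (dist[i][i] == 0).
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = [len(neighbours[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue  # clusters are seeded only from unassigned core points
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()  # p is always a core point
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if core[q]:
                        frontier.append(q)  # only core points expand the cluster
        cluster_id += 1
    return labels


result = dbscan(DIST, eps=2, min_pts=2)
print(dict(zip(LABELS, result)))
```

Points that keep the label -1 are neither core points nor within ε of a core point, so they are reported as noise.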