ISR Question Bank
P273 [Total No. of Pages : 2
B.E/INSEM/APR-600
B.E. (Information Technology) (Semester - II)
414464B : INFORMATION STORAGE AND RETRIEVAL
(2015 Pattern) (Elective - III)
Time : 1 Hour] [Max. Marks : 30
Instructions to the candidates:
Q1) a) You are developing a text processing system for use in an automatic
retrieval system. What are the different steps of the conflation algorithm? [6]
b) Draw and explain IR system block diagram. [4]
OR
OR
Q5) a) Define and explain following terms - Precision & Recall. [6]
b) Explain the terms Harmonic mean, E measure. [4]
OR
Q6) a) Explain user-oriented measures for evaluating the performance of an IR system.
[6]
b) Define and explain the following terms: [4]
i) MRR.
ii) NDCG.
Total No. of Questions : 6] SEAT No. :
P1423 [Total No. of Pages : 2
BE/Insem./APR-259
B.E. (Information Technology)
INFORMATION STORAGE AND RETRIEVAL
(2015 Course) (Semester -II) (Elective - III) (414464B)
Time : 1 Hour] [Max. Marks : 30
2) Figures to the right indicate full marks.
Q1) a) What is a document representative? Explain with a suitable example. [5]
b) Draw and explain IR system block diagram. [5]
OR
Q3) a) Find the similarity of the following query with documents - D1, D2, D3
Query Keywords
Q mouse, dog
D3 cat gnu dog eel fox Cat dog eel fox gnu
b) Show how single link clusters may be derived from the dissimilarity
OR
Q4) a) Explain Latent Semantic Indexing with a suitable example. [5]
b) Compare neural network-based retrieval and fuzzy set retrieval methods.
[5]
Q5) a) Define and explain following terms - Precision & Recall. [5]
b) Define and explain the following terms: [5]
i) MRR
ii) NDCG
OR
Q6) a) Define and explain following concepts. [5]
i) Cross fold validation.
ii) Interface support for search process.
Total No. of Questions : 10] SEAT No. :
P4009 [5561]-716
[Total No. of Pages : 2
B.E. (Information Technology)
INFORMATION STORAGE & RETRIEVAL
(2015 Course) (414464B) (Semester - II) (Elective - III)
Time : 2½ Hours] [Max. Marks : 70
Instructions to the candidates:
1) Neat diagrams must be drawn wherever necessary.
Q1) a) Differentiate between data retrieval and information retrieval. [6]
b) List and define the different measures of association. [4]
OR
Q2) a) Compare Boolean and vector model. [6]
b) List and explain steps of conflation algorithm. [4]
histogram. [5]
diagram. [5]
OR
Q4) a) Dissimilarity matrix is given as follows. [5]
1
2  0.6
3  0.6  0.8
   1  2  3  4  5  6
Apply the single link algorithm and calculate the clusters for the above 6 objects.
OR
Q6) a) Describe multimedia data support in commercial DBMS. [9]
b) Describe the architecture of distributed IR. [9]
Q7) a) What is web crawling? Explain techniques used by web crawlers to crawl
the web. [8]
b) Write short note on web data mining. [8]
OR
Q8) a) Discuss the challenges involved in web search engines. [8]
b) Explain semantic web in detail. [8]
OR
Total No. of Questions : 10] SEAT No. :
P3260 [Total No. of Pages :2
[5461] - 298
B.E. (Information Technology)
INFORMATION STORAGE AND RETRIEVAL
(2012 Course) (End Sem.)(Semester - II) (414463 C) (Elective - III)
Time : 2½ Hours] [Max. Marks :70
Instructions to the candidates:
Q1) a) Write the difference between data retrieval and information retrieval. Define
index term. [5]
b) Let [5]
Printer}
Compiler}.
v) Overlap coefficient.
OR
Q3) a) Explain Vector model in detail. [5]
b) Explain ontology and ontology life cycle. [5]
OR
Q4) Discuss the cluster-based retrieval strategy. Also explain how to define a cluster
representative. [10]
Q5) a) How are queries processed in distributed IR? [8]
b) Explain GEMINI approach for multimedia IR. [8]
OR
Q7) a) What are meta crawlers? Explain with example. [8]
b) Explain centralized crawler indexer and harvest distributed architecture
OR
Q8) a) What is meant by web crawling? Explain processing steps in web crawling.
[6]
OR
Total No. of Questions : 10] SEAT No. :
[5354]-708
B.E. (Information Technology)
INFORMATION STORAGE AND RETRIEVAL
(Elective - III) (2012 Pattern) (End-Sem)
Time : 2½ Hours] [Max. Marks :70
Instructions to the candidates:
Q1) What is term weighting? Explain the TF-IDF scheme to calculate the weight
of an index term. Find the weights of the following terms. [10]
D1    SNMP, SNMP, FTP                           SNMP, FTP
D2    HTTP, FTP, HTTP, ARP, HTTP, SNMP, HTTP    SNMP, FTP, HTTP, ARP
D3    NIC, INTERNET, HTTP, PROTOCOL, HUB        NIC, HTTP, PROTOCOL, HUB, INTERNET
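The weighting asked for in Q1 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the first column of the table is taken as the document contents, term frequency is the raw count, and the inverse document frequency is log10(N / df). Other TF-IDF variants (normalised tf, natural logarithm) give different numbers. The names `docs` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter

# Assumption: the first column of the Q1 table holds the document contents.
docs = {
    "D1": ["SNMP", "SNMP", "FTP"],
    "D2": ["HTTP", "FTP", "HTTP", "ARP", "HTTP", "SNMP", "HTTP"],
    "D3": ["NIC", "INTERNET", "HTTP", "PROTOCOL", "HUB"],
}


def tf_idf(docs):
    """Return {document: {term: raw_tf * log10(N / df)}}."""
    n_docs = len(docs)
    counts = {name: Counter(terms) for name, terms in docs.items()}
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    return {
        name: {t: tf * math.log10(n_docs / df[t]) for t, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }


weights = tf_idf(docs)
for name, term_weights in weights.items():
    for term, value in sorted(term_weights.items()):
        print(f"{name} {term}: {value:.4f}")
```

Under this variant a term that occurs in every document would get weight 0, which is the usual motivation for the IDF factor.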
OR
Q3) a) Explain inverted File structure with the help of diagram. State how it
[5]
OR
Q4) What is a signature file? Describe false drop and search optimization using
signature files. Justify with an example. [10]
Q5) a) What is the need for distributed IR? Draw and explain the architecture of a
distributed IR system. [8]
b) What do you mean by collection partitioning & source selection in
Distributed IR? [8]
OR
Q6) a) Explain with example some of the predicates used in multimedia query
language. [8]
b) What is indexing? How to index multimedia objects? [8]
Q7) a) Compare any two search engines with respect to features they support.
[6]
b) What is page ranking? Explain with example how to calculate a page
OR
Q9) a) How to collect and integrate specialized information on the web? [8]
OR
Total No. of Questions :6] SEAT No. :
OR
Q2) a) Why is the single pass algorithm better than Rocchio's algorithm? [10]
Form the document clusters of the following document-term matrix using the
single pass clustering algorithm.
Consider
Membership function : Sum of product
centroid calculation function : Average
Threshold = 11
D1 D2 D3 D4 D5
T1 1 1 0 1 1
T2 2 1 2 3 0
T3 3 0 1 0 1
T4 2 2 0 3 0
T5 2 2 1 2 1
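The single pass procedure for the matrix above can be sketched as follows. This is a minimal sketch, not a model answer, and it rests on assumptions the question does not fix: documents are processed in the order D1 to D5, a similarity equal to the threshold counts as a match (`>=`), and a document joins only the best-matching cluster. With a strict `>` comparison the result can differ. The function names are hypothetical.

```python
# Columns of the document-term matrix from the question (rows T1..T5).
docs = {
    "D1": [1, 2, 3, 2, 2],
    "D2": [1, 1, 0, 2, 2],
    "D3": [0, 2, 1, 0, 1],
    "D4": [1, 3, 0, 3, 2],
    "D5": [1, 0, 1, 0, 1],
}
THRESHOLD = 11


def sum_of_products(a, b):
    """Membership function: sum of products of matching term weights."""
    return sum(x * y for x, y in zip(a, b))


def average(vectors):
    """Centroid function: component-wise average of the member vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def single_pass(docs, threshold):
    clusters = []  # each cluster: {"members": [...], "centroid": [...]}
    for name, vec in docs.items():
        best, best_sim = None, None
        for cluster in clusters:
            sim = sum_of_products(vec, cluster["centroid"])
            # Assumption: similarity equal to the threshold is accepted.
            if sim >= threshold and (best_sim is None or sim > best_sim):
                best, best_sim = cluster, sim
        if best is None:
            # No existing cluster is similar enough: start a new one.
            clusters.append({"members": [name], "centroid": list(vec)})
        else:
            best["members"].append(name)
            best["centroid"] = average([docs[m] for m in best["members"]])
    return clusters


clusters = single_pass(docs, THRESHOLD)
for cluster in clusters:
    print(cluster["members"], cluster["centroid"])
```

Because each document is compared only with the centroids that exist when it arrives, the outcome depends on the processing order.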
Q3) a) Compare boolean model and vector model. Explain how vector model
can be used to retrieve partial matching documents. [6]
b) What are inverted files? Explain how these files can be used to answer
Boolean queries. [4]
OR
Q4) a) Explain the working of a suffix tree. Construct a suffix tree for the following example:
"This is a text. A text has many words. Words are made from letters." [6]
Q5) a) Write a note on user-oriented measures used for evaluating the performance
of any retrieval system. Also explain their significance. [6]
OR
Q6) a) Write a note on "Ontology languages for semantic web". [5]
BE/Insem-73 2
Total No. of Questions : 10] SEAT No :
P3142 [5154]-708
[Total No. of Pages :2
B.E.(I.T.)
INFORMATION STORAGE AND RETRIEVAL
(2012 Course) (Elective-III) (414463 C) (Semester-II)
Time : 2½ Hours] [Max. Marks : 70
Instructions to the candidates:
4) Assume suitable data, if necessary.
Q1) a) Show how single link clusters may be derived from the dissimilarity
coefficient by thresholding it. [5]
retrieval System. Explain the following parts: [5]
Removal of high frequency words.
Suffix stripping.
OR
Q2) a) Find the similarity of the following query with D1, D2, D3 using the vector model.
[6]
Query keywords
q ant, dog
D2 dog bee dog hog dog ant dog ant bee dog hog
D3 cat gnu dog eel fox cat dog eel fox gnu
OR
Q4) Consider a reference collection and its set of example information requests.
Let q be an information request and Rq = (d3, d5, d9, d25, d39, d44, d50, d70, d80, d120)
its set of relevant documents. Now consider that a new retrieval algorithm has been
designed and evaluated for information request q, and that it returns the following
ranking of the documents in the answer set. [10]
1. d120   6. d9     11. d38
2. d84    7. d58    12. d48
3. d50    8. d129   13. d230
4. d6     9. d143   14. d113
5. d8     10. d25   15. d3
The documents that are relevant to the query q are underlined. Calculate
precision and recall for the documents that are relevant to the query q.
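The calculation asked for in Q4 can be sketched as follows. The underlining did not survive in this copy, so the sketch assumes that a ranked document is relevant exactly when it appears in Rq. Precision and recall are computed at each rank where a relevant document is retrieved. The function name `precision_recall_points` is hypothetical.

```python
# Rq from the question, and the ranking read column by column (ranks 1..15).
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d50", "d70", "d80", "d120"}
ranking = ["d120", "d84", "d50", "d6", "d8", "d9", "d58", "d129", "d143",
           "d25", "d38", "d48", "d230", "d113", "d3"]


def precision_recall_points(ranking, relevant):
    """Return (rank, doc, precision, recall) at every relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            # Precision: relevant retrieved so far / documents seen so far.
            # Recall: relevant retrieved so far / all relevant documents.
            points.append((rank, doc, hits / rank, hits / len(relevant)))
    return points


points = precision_recall_points(ranking, relevant)
for rank, doc, precision, recall in points:
    print(f"rank {rank:2d}  {doc:5s} precision={precision:.2f} recall={recall:.2f}")
```

Recall rises by 1/|Rq| at each relevant document, while precision depends on how many non-relevant documents were ranked above it.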
Q5) a) Describe the architecture of distributed IR. [8]
query predicates. [8]
OR
Q6) a) What are the issues in distributed IR computing? Write the techniques
used to address these issues. [8]
b) Write a note on MULTOS. [8]
OR
OR
Total No. of Questions : 10] SEAT No. :
P3647 [4959]-1138
[Total No. of Pages :2
B.E.(Information Technology)
c: INFORMATION STORAGE AND RETRIEVAL
(2012 Course) (Semester-II)(Elective-III) (414463)
Time : 2½ Hours] [Max. Marks : 70
Instructions to the candidates:
1) All questions are compulsory.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume suitable data if necessary.
Q4) a) Explain the terms precision and recall, and calculate them for the
following example. [5]
A set of relevant documents for query
q = {d3,d7,d8,d11,d14,d19,d23,d25}
A new retrieval algorithm returns following answer set
= {d1,d2,d3,d7,d9,d10,d14,d20,d23,d24,d25}
b) Explain the terms Harmonic mean, E measure, R precision, Precision
histogram [5]
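For the sets given in Q4 a), set-based precision and recall can be sketched as follows. This is a minimal sketch that treats the answer set as unranked; the variable names are illustrative only.

```python
# Relevant documents for query q and the answer set returned by the algorithm.
relevant = {"d3", "d7", "d8", "d11", "d14", "d19", "d23", "d25"}
answer = {"d1", "d2", "d3", "d7", "d9", "d10", "d14", "d20", "d23", "d24", "d25"}

# Relevant documents that were actually retrieved.
hits = relevant & answer

precision = len(hits) / len(answer)    # fraction of the answer set that is relevant
recall = len(hits) / len(relevant)     # fraction of the relevant set that was retrieved

print(sorted(hits))
print(f"precision = {len(hits)}/{len(answer)} = {precision:.3f}")
print(f"recall    = {len(hits)}/{len(relevant)} = {recall:.3f}")
```

The harmonic mean named in part b) combines the two values as 2PR / (P + R).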
OR
OR
Q10) a) Explain the method for extracting data from text. [8]
b) Explain Collecting and Integrating Specialized Information on the web.
[8]
Total No. of Questions : 10] SEAT No. :
[5059] - 678
B.E. (Information Technology)
INFORMATION STORAGE AND RETRIEVAL
(2012 Pattern) (Elective - III)
Time : 2½ Hours] [Max. Marks :70
Instructions to the candidates:
1) All questions are compulsory.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume Suitable data, if necessary.
Q1) Why is the single pass algorithm better than Rocchio’s algorithm? Form the
document clusters of the following document-term matrix using the single pass
clustering algorithm. Consider [10]
Membership Function: Sum of product
Centroid calculation Function: Average
Threshold = 11
D1 D2 D3 D4 D5
T1 1 1 0 1 1
T2 2 1 2 3 0
T3 3 0 1 0 1
T4 2 2 0 3 0
T5 2 2 1 2 1
OR
Q2) a) Explain the working of a suffix tree. Construct a suffix tree for the following
example. [6]
“This is a text. A text has many words. Words are made from letters.”
b) Write a short note on matching coefficients. [4]
Q3) a) Write a note on “Ontology languages for semantic web”. [5]
b) Write a note on “cluster based retrieval”. [5]
OR
Q4) Consider a reference collection and its set of example information requests.
Let q be an information request and Rq = (d3, d5, d9, d25, d39, d44, d50, d70, d80, d120)
its set of relevant documents. Now consider that a new retrieval algorithm has been
designed and evaluated for information request q, and that it returns the following
ranking of the documents in the answer set. [10]
1. d120 6. d9 11. d38
2. d84 7. d58 12. d48
3. d50 8. d129 13. d230
4. d6 9. d143 14. d113
5. d8 10. d25 15. d3
The documents that are relevant to the query q are underlined. Calculate
precision and recall for the documents that are relevant to the query q
OR
OR
Q8) a) What are the challenges while searching the web? [12]
b) What is the role of crawler in web searching? Explain the strategies
used by web crawler. [6]
OR
Q10) a) Explain the concept of semantic web. How is it useful in web searching?
[8]
b) Explain in detail content based recommendation of documents. [8]