IIRS Quiz-1 Bits

In the information retrieval model [D,Q,F,R(qi,dj)],
Copyright
© All Rights Reserved

Question

________ is the activity of obtaining material which can be documents of an unstructured nature
The three major components of an information retrieval system are the document subsystem, the retrieval subsystem, and _____
__________ of a retrieval system has to translate his information need into a query in the language provided by the sy
In the language of the World Wide Web, retrieval and browsing are called ____
provided by most modern information retrieval systems.

Overhead from a user's perspective is the time required to find the information needed,
Search composition, search execution, and ___________ are all aspects of information retrieval overhead.
Once the logical view of the documents is defined, the database manager (using the DB Manager Module) builds ___
text.
The full text is clearly the most __________________ of a document but its usage usually implies higher
The task whose main objectives are not clearly defined in the beginning and whose purpose might change during the
interaction
Which operations transform the original documents and generate a logical view of them?
Which of the following is the correct search statement for "Select all items that discuss software and not storage or hardw
the item"?

In the information model quadruple [D,Q,F,R(qi,dj)], what does D represent?

The major drawback of the Boolean model is due to the fact …


Which of the following is not a classical model of information retrieval?

Which of the following is false with respect to the Boolean model?


The resultant effect retrieved by the _______ model is the document answer set
The probabilistic model was introduced by…
In which model does the query term define a fuzzy set?
Document surrogates are the _______ of the full document

What is the reason for having an information retrieval model?


In Boolean retrieval, each item in the list which records that the term appeared in the document is called …
An instance of a sequence of characters in the particular document that are grouped together as a useful semantic unit
processing is called as…
Relevance is determined by
__________ provide a formal definition or mathematical framework for querying semi-structured textual databases
Given a document containing the sentence “I left my left bag at my home” the number of tokens in the sentence is
Which of the following items is not a component of a complete search system?
Weighted zone scoring is referred to as:
A group of related documents against which information retrieval is employed is called:
The standard approach to information retrieval system evaluation revolves around the notion of:

What is the advantage of the Boolean model?


Which of the following statements is false with regard to the Boolean retrieval model?
Context queries search for words ___
_______ is a sequence of single-word queries
With a proximity query, the query "enhance retrieval" is searched such that the two words should appear within four words. C
match will be
In a Boolean query syntax tree, leaves correspond to
Which Boolean query selects all documents that satisfy e1 and not e2?
With a Boolean querying system, _______ of documents is not provided
If the pattern "ter" matches the word "tester" in a document, it is called
If the pattern "any flow" matches "many flowers" in a document, it is called
Which of the following operator can be used to build the statement, "if e1 and e2 are regular expressions, then (e1 |
e2) matches what e1 or e2 matches"?
__________ is a query protocol based on the classical Boolean model.
The user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant
strategy is called as
The application of relevance feedback to the _______ considers that the term-weight vectors of the documents identif
relevant (to a given query) have similarities among themselves

Simplicity of query expansion and term reweighting of vector model is because of:

In user relevance feedback, which of the following is a disadvantage of the probabilistic model?


The classic probabilistic model was introduced by
_________ is composed of classes which group correlated terms in the context of the whole collection.

The similarity between two clusters is defined as

A pattern is a set of

In pattern matching, the more powerful the set of patterns allowed,


This is the most basic pattern in pattern matching
Which of the following is NOT a benefit of index compression?
A web server communicates with a client (browser) using which protocol:
A model of information retrieval in which we can pose any query in which search terms are combined with the operat
AND, OR, and NOT:
An approach to computing scores in an IR system that orders documents in the posting list of a term by decreasing or
term frequency is called:
Which of the following items is not a component of a complete search system?
Which of the following is a technique for context sensitive spelling correction?
Which of the following software text search algorithm is fastest?
The bitmap image format proposed for the Internet is ....
______ provides a set of evolving features and architecture extensions to HTML and Web browsers that include casc
style sheets and document object model

A Thesaurus stands for...


A metric derived by taking the log of N divided by the document frequency where N is the total number of document
collection is called:
__________ is designed to transform and style highly structured, data rich documents written in XML.
without losing control over the content, structure, and layout of that document
A good compression algorithm is able to reduce the text to ______ of its original
The motivation for building thesaurus is based on the functional idea of using a ______ for the indexing and searchin
Lemmatization is a technique for:
The gain obtained from compressing text is that

One important consequence of using _______________ is the possibility of performing direct searching on compress
In which compression method, it is first necessary to establish the parameter b for each term?
Which compression technique has slow compression speed?
Compression models can be adaptive, static, or ________

Pick up the correct statement

The third step of the GEMINI methodology is to___


_________ generation of Web query languages were aimed at combining content with structure
_______ search in the Web is equivalent to sequential text searching
The _____ can also be compressed independently of the index.
________ is the Activity of obtaining material which can be documents of unstructured nature
An inverted index arranges data in a sorted order as per
The vocabulary size (unique words) of a text can be estimated using
A metric used to measure the importance of a term in a text document collection is called
Information is composed of
In an Information Retrieval system, the ___________aspect has been the only data type that lent itself to full functional proce
Example of overhead in IRS can be
_____ process considers the query as a document.
The users generate the document and query in _________
SGML stands for ______
The gross structure of a document affects the way the document is __
The implicit structure of a document is used for ______
An _______ is a set of processes and procedures that transform data into information and knowledge
_____ ensures that systems are developed on time, within budget and with acceptable quality.
Information is __________
For taking decisions, data must be:
The organized set of documents is called _____
The _________ modifies the calculation on which relevance decisions are made
_____ are qualitative and quantitative fuzzy descriptors
The word proximity depends on
Stemming a large document is _______
Stemming strips off______
Each item in the list is called as______________.
SEO stands for _____________ .
Dictionary performed by _________________ pair
URL stands for ______________________
NLTK stands for ______________
The primary storage medium for storing archival data
Spelling correction only depends on ___________ factor.
Boolean query operator?
Data by itself is not useful unless
Strategic information is required by
The number of times that a word or term occurs in a document is called the:
Best implementation approach for dynamic indexing is
Data is represented in _________________ format in an IR system
Term document incidence matrix is
Structured data allows for
The formula used to estimate the vocabulary size of a collection is known as:
To search documents by _______________ in IR
The first large information retrieval research group was formed by
_____ allows the user to issue a query on a particular web site
_______________ is the activity of obtaining material which can be documents of an unstructured nature
The three major components of an information retrieval system are the document subsystem, the retrieval subsystem, and _______
____________ is the unit of information that we want to return as a result of
_________ is the topic about which the user desires to know more
______ is the smallest unit of information in a query
A permuterm index is a form of _____ index
A query like mon is called a ________________ wildcard query
___________ algorithm is the algorithm for phonetic hashing
Edit distance is a _____ type of spelling correction
In a permuterm index, the ________ symbol is used to mark the end of a term
____________ is the fraction of returned results relevant to the information need
Within a document collection, each document has a unique serial number known as _____
__ is the process of selecting how to organize the work of answering a query so that the least amount of work needs to be done by
______ is a model for information retrieval in which we can pose any query which is in the form of a Boolean expression of term
______ is the fraction of relevant documents in the collection that were returned by the system

The idea to use computers for searching information was published in the article As We May Think by _______ in 194
_____ is the data structure for faster information retrieval which is a collection of selected words and associated pointers
_______ were the first to adopt information retrieval systems for retrieving information
In the document subsystem, abstracting contains
In Document Subsystem file organization contains term by term list of records under each term is called as ___
etr term is called _________k-grams wildcard query.

Variable size postings lists is used when


String of symbols associated with
What is information?
IR can be defined by
The term which is used to search
keywords are used to _____the description of the information.
IR systems contain three components
A single error object means in dbms
Number of functional processes in IRS
Schema of databases include
DDL is
DML is
DDL and DML are
A major strength of the Data Definition Language of a DBMS is the capability to define
IRS query language is nearer to
In DBMS, one normally looks for
Two other systems frequently described in the context of information retrieval are
All three systems are repositories of information and their primary goal is to satisfy
There is significant overlap between these two systems
Full form of KDD
Option1 Option2
Information Retrieval Boolean Retrieval
User Subsystem Query Subsystem
The manager The user
accessing actions. modifying actions.
Information and knowledge retrieval Information or knowledge retrieval
excluding the time for actually reading the including the time for actually reading the
relevant data. relevant data.
reading relevant items reading non-relevant items

an Appendix the Bibliography


complete logical view incomplete logical view

retrieval searching
The text operations The query operations
SOFTWARE OR (STORAGE NOT SOFTWARE OR STORAGE NOT
HARDWARE) HARDWARE
is a set composed of logical review for the is a set composed of logical review for the
user information need document in the collection
Model based on set theory and Boolean
algebra Queries are specified as Boolean expressions
Boolean Cluster
It does not capture information regarding term
It does not perform query spell checking position in the document
Boolean Vector
John Butler Robertson and Sparck Jone
Boolean Vector
Limited representation Predefined structure
as it is giving entire information about
As it helps to store the data securely information
Ranking Posting

Type Term
structure of document structure of query
Set-theoretic model Structured text retrieval model
2 4
Document cache Indexes
ranked Boolean retrieval Zipf retrieval
Corpus Text Database
Quantity of documents in the collection Relevant and non relevant documents.
retrieval criteria is based on binary
decision criteria In reality, it is more of a data retrieval model
It answers query based on boolean
expression It views document as a set of terms
Which appear close to each other Which appear far away from each other
Phrase Pattern

Enhance power of retrieval Enhance the power of retrival


A operands result of operation
e1 OR e2 e1 AND e2
Highlights occurrence of the word Ranking
Prefix matching String matching
Prefix matching Sub-String matching

Union Intersection
WAIS CCL

User relevance feedback System relevant feedback

Vector model Boolean model


modified term weights are computed
directly from the set of retrieved
documents. optimality criterion is adopted:
the feedback process is directly related to
the derivation of new weights for query document term weights are taken in to account
terms during the feed backloop
Robertson and Sparck Jones Croft
Similarity Thesaurus Global clustering
the maximum of the similarities between The minimum of the similarities between all
all pairs of outer-cluster documentsthe pairs of inter-cluster documents the minimum
minimum of the similarities between all of the similarities between all pairs of inter-
pairs of inter-cluster documents cluster documents
Systematic and meaning full ranking of
documents Combination of boolean feature and ranking
Implementation of search function
becomes more complex Search function can be implemented easily
Prefix Suffix
Simplified algorithm design Reduction of disk space
HTML HTTP

Ad Hoc Retrieval Ranked Retrieval Model

Champion list Impact ordering


Document cache Indexers
Jaccard Coefficient Soundex algorithms
Brute Force Approach Knuth-Morris-Pratt
JBIG WSQ

DOM VRML
Objective of removing affixes and
allowing the retrieval of document
containing syntactical variations of query Objective of filtering out words with very low
term discrimination value for retrieval purpose

Document frequency tf-idf weight


XLL XSL
ODA SDML
20-25% 30-35%
Controlled Vocabulary Posting List
Ranking documents Case folding
it requires less computation time it requires less access time

byte Huffman coding Huffman coding


Arithmetic Ziv-Lempel
Arithmetic Character Huffman
semi-static static
Inverted files are quite amenable to Inverted files are quite intractable to
compression compression
Finds one or more numerical feature-
extraction functions, to provide a 'quick- Shows that the distance in feature space lower-
and-dirty' test. bounds the actual distance.
the third generation the second generation
Dynamic search Semi-dynamic search
image audio
Information Retrieval Boolean Retrieval
documents the frequency of each document
Zipf’s law Scientific law
 Inverse Document Frequency Term Frequency
Text . Images,
Images Audio & Video
query generation query execution
. Matching Mapping
Ectosystem Endosystem
System generalized markup language Standard generalized markup language
Matched and mapped Stored and processed
Matching Encoding
Information system Knowledge system
Systems designer . Project manager
Data Processed data
Very accurate Massive
Vector space Document space
The weighting of terms Probability of terms
Small and fairly important Tall and highly relevant
A number of intermediate words Total words in the same document
Very easy Time consuming
Suffixes Suffixes and prefixes
Items Posting
Search English Optimization Search Engine Operator
Key and Value Value and Number
Uniform Ravar Location Uniform Resource Locator
Natural Language Toolkit Natural Lang Tool
floppy disk magnetic disk
Query term
"+" "-"
It is massive It is processed to obtain information
Middle managers Line managers
Proximity Operator Vocabulary Lexicon
Periodic re indexing Using Invalidation bit vector for deleted docs
audio video
Sparse Depends upon the data
Does not depend on data complexity Less complex queries
Zipf's law Power law
id docID
Gerard Salton Ratan Tata
VIBE InfoCrystal
Information Retrieval Boolean Retrieval
User Subsystem Query Subsystem
Documents    Collection

Query Information Need


Term Document
Inverted Real
    Trailing      Heading

Soundex        Phonetic

     Isolated term correction Context Sensitive correction


$ %
     Precision Recall
      

Document Identifier Document Number


Query optimization Query minimization
Boolean Retrieval Dictionary retrieval
Precision Recall
Vannevar Bush Holmstrom
Index Library
      Libraries     Google

       Summarizing Bibliographic description


       Inverted       Sequential

3 4
More seek time is desired and the corpus Less seek time is desired and the corpus is
is dynamic dynamic
objects people
processed data manipulated data
storage retrieval
keyword term
search summarize
System people
total failure semi failure
1 2
external conceptual
database def language data definition language
database man language data manipulation language
database languages irs language
Data integrity constraints data definition language
natural language toolkit
exact match relevant match
Digital Libraries Data Warehouses
queries user needs
Information Storage Retrieval System.
Knowledge data discovery Knowledge data database
Option3 Option4 Answer
Dictionary Retrieval Tolerant Retrieval 1
Feedback Problem Subsystem
The designer The administrator 2
pulling actions. pushing actions 3
Information and data retrieval Information or data retrieval 3
excluding the time for actually reading the including the time for actually reading the
non- relevant data. non- relevant data. 1
writing relevant items writing non-relevant 2
an index
the Glossary 4
complete physical view incomplete physical view 1
sorting 3
browsing
Indexing operations Searching Operations 1
SOFTWARE AND NOT STORAGE OR SOFTWARE OR NOT STORAGE AND 4
HARDWARE HARDWARE
Ranking function 2
Framework for representing documet
Retrieval strategy is based on binary decision Model predict that each document is either 3
criteria relevant or non-relevant
Probabilistic Vector space 2
It uses term frequency information to rank
It does not consider the document structure results 4
Probabilistic Set theoretic 1
B. Fourozon Jones Smith 2
Probabilistic Set theoretic 4
Classification Detail representatio 1
as the models can serve as a blueprint to as it used for information security 3
implement an actual retrieval system
Indexing Grepping 2

Token Index 3
the similarity matching function structure of query and document both 3
Latent semantic index model extended boolean model 2
6 8 4
Spell correction Horizontal 4
Ad Hoc query retrieval Jaccard retrieva 1
Index Collection Repository 1
Accuracy user happiness 2
Provide a framework which is easy to grasp Boolean expression has precise semantics
by common users 3
It is very precise as it meets a very specific It cannot combine two operators "AND
condition NOT" and "OR-NOT 4
Which appear continuously Which appear only once 1
Sentence Keywords 1
Emhance power of all the retrieval 2
Enhance retrive process
an operator a basic query 4
e1 BUT e2 e1 NOT e2 3
Sorting Feedback 2
Phrase matching Suffix matching 4
Phrase matching Suffix matching 2
1
Concatenation Repetition
CD-RDx Z39.50 2
Retrieve and examine strategy
Initiation strategy 1

Probabilistic model Fuzzy set model 1

modified query vector does reflect a a portion


of the intended query semantics a given query q is known in advance 1

weights of terms in the previous query No query expansion 4


formulations are regarded
Claude shannon Donald kluth and george 1
Local clustering Statistical Thesaurus 4
the maximum of the similarities between all the minimum of the similarities between all
pairs of inter-cluster documentsthe minimum pairs of inter-cluster documentsthe
of the similarities between all pairs of outer- minimum of the similarities between all
cluster documents pairs of outer-cluster documents 2
Syntactic feature 4
Symmetric feature
Probability of pattern matching decreases 1
Large no of document will be retrieved
Word Substring 3
Faster transfer of data from disk to memory Increased Use of caching 1
FTP Telnet 2
3
Boolean Information Model Proximity Query Model
2
Cluster pruning Tiered indexes
Spell correction Horizontal index 4
k-gram overlap Levenshtein 2
Boyer-Moore Rabin-Karp 3
PNG TGA 3
SDML
SGML 2

Method of allowing the expansion of


Objective of treating digits, hyphens, original query with related term
punctuation marks, and the case of letters 4

collection frequency inverse document frequency 4


MathML SMIL 2
VML STEP 1
40-45% 25-30% 2
Dictionary Tokenisation 1
Normalization Tokenization 4
it requires less storage space it requires less formatting 3

Adaptive dictionary methods p Arithmetic coding 1


Golomb Word Huffman 3
Word Huffman Ziv-Lemp 1
semi-dynamic dynamic 1
Inverted files are quite uncooperative to Inverted files are quite noncompliant to
compression compression 1

Uses a SAM (e.g., an R-tree), to store and


Determines the distance measure between two retrieve the ƒ-D feature vectors.
time series. 2
the first generation the fourth generation 3
Static search Semi-Static search 1
text video 3
Dictionary Retrieval Tolerant Retrieval 1
the frequency of each term the terms 4
 Heaps’ law  Inverted index rule 3
Inverse Term Frequency  Document Frequency 1
Audio & Video All of the above 4
Text None of the above 3
scanning results of query to select items to read All of the above 3
. Both a and b None of the above 2
Both a and b None of the above 3
Static generalized markup language None of the above 2
Stored and accessed Organized and processed 3
Information retrieval Error checking 1
Database system Computer system 1
Systems owner External system user 2
Manipulated input Computer output 2
Processed correctly Collected from diverse sources 4
Both a and b None of the above 2
Both a and b None of the above 3
Colorful and partially relevant All of the above 4
Vocabulary list Indexing language 1
Not possible Changes the original document 2
Prefixes Sentences 2
Query Information 1
Search Engine Operation Search Engine Optimization 4
id and Number Name and code 1
Uni Resource Locate Uniform Reverse Locator 2
Natural Long Tooltip Nature Language Toolkit 4
magnetic tape CD-ROM 1
indexpowerd Postings 2
AND OR NOT / 3
It is collected from diverse sources None of the above 2
Top managers workers 3
Term Frequency Indexing Granularity 3
Using logarithmic merge None of the above 1
image a,b,c 4
Dense Cannot predict 1
No relationship More complex queries 4
Heap's law Compression ratio 3
number #digits 1
Ramesh Bush Think Roy 1
Mapuccino Seesoft 3
Dictionary Retrieval Tolerant Retrieval 1
Feedback Problem Subsystem 2
Posting List    Index 1
Term Document 2
word index 2
Term Dictionary 3
new real 1
k-gram
        Edit distance 1
k-gram correction     both and b 4
& @ 1
corpus Relevance 1
   Document Id        Both b and c 1
Query Answering   Both a and c 4
Tolerant retrieval Both a and b 1
Corpus Relevance
2
      Gerard Salton      Both a and c 1
    Metadata Term 1
     NIST    Both a and c 1
     Acquisition Both a and c 2
       Combination   Both a and b 4
1 2 1
Less seek time is desired and the corpus is More seek time is desired and the corpus is
static dynamic 1
events All of the above 4
unprocessed data None of the above 1
maintenance all the above 4
word none of the above 1
dynamic none of the above 2
documents All of the above 4
success None of the above 1
4 3 3
physical All of the above 4
data def language none of the above 2
data man language All of the above 2
both a and b None of the above 1
data manipulation language None of the above 1
framework None of the above 1
both None of the above 1
Both a and b None of the above 3
user information needs all the above 3
both none of the above 3
Knowledge data data Knowledge discovery database 4
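
Two of the questions above can be checked with a few lines of code: the number of tokens in "I left my left bag at my home", and the metric defined as the log of N divided by the document frequency (inverse document frequency). The following is a minimal Python sketch and is not part of the original quiz. The whitespace tokenizer, the helper names tokenize and idf, the base-10 logarithm, and the example values N = 1000 and df = 10 are all assumptions chosen for illustration.

```python
import math


def tokenize(text):
    # Assumed tokenizer: lowercase the text and split on whitespace.
    # Real IR systems usually apply more elaborate rules.
    return text.lower().split()


def idf(n_docs, doc_freq):
    # Inverse document frequency: log of (N / df).
    # The base of the logarithm (10 here) is an assumption.
    return math.log10(n_docs / doc_freq)


sentence = "I left my left bag at my home"
tokens = tokenize(sentence)   # every occurrence counts as a token
types = set(tokens)           # distinct terms only

print(len(tokens))            # number of tokens
print(len(types))             # number of distinct types
print(idf(1000, 10))          # hypothetical N = 1000, df = 10
```

Tokens count every occurrence, so the repeated words "left" and "my" are counted twice, while the set of types keeps each of them once.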
