OS Unit IV
File System Storage- File Concept –File Access Methods- Directory Structure-File
Sharing- File Protection- File System Implementation- File System Structure-
Allocation Methods – Free Space Management- Mass Storage structure- Disk
Scheduling and Management- RAID Levels
File
A file is a named collection of related information that is recorded on secondary
storage such as magnetic disks, magnetic tapes and optical disks. In general, a file
is a sequence of bits, bytes, lines, or records whose meaning is defined by the
file's creator and user.
File Access Methods
Files store information. The information in the file can be accessed in
several ways.
1. Sequential Access
The simplest access method is sequential access. Information in the file is
processed in order, one record after the other.
A read operation-read next-reads the next portion of the file and automatically
advances a file pointer, which tracks the I/O location. Similarly, the write
operation-write next-appends to the end of the file and advances to the end of the
newly written material
2. Direct Access
Another method is direct access. A file is made up of fixed length logical
records that allow programs to read and write records rapidly in no particular
order. The direct-access method is based on a disk model of a file, since disks
allow random access to any file block. For direct access, the file is viewed as a
numbered sequence of blocks or records. Thus, we may read block 14, then read
block 53, and then write block 7. There are no restrictions on the order of reading
or writing for a direct-access file.
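A minimal Python sketch of direct access to fixed-length records (the record size and contents are made-up values): reading or writing record n is just a seek to offset n times the record size, with no sequential scan.

```python
import tempfile

RECORD_SIZE = 16  # assumed fixed logical-record length in bytes

def read_record(f, n):
    # direct access: seek straight to record n, no sequential scan needed
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)

def write_record(f, n, data):
    f.seek(n * RECORD_SIZE)
    f.write(data.ljust(RECORD_SIZE)[:RECORD_SIZE])

# build a small file of numbered records, then access them in no particular order
with tempfile.TemporaryFile() as f:
    for i in range(60):
        write_record(f, i, f"record {i:03d}".encode())
    print(read_record(f, 14).decode().strip())  # record 014
    print(read_record(f, 53).decode().strip())  # record 053
    write_record(f, 7, b"updated")              # rewrite record 7 in place
```

The out-of-order reads and the in-place rewrite mirror the "read block 14, read block 53, write block 7" example above.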
3. Other Access Methods
Other access methods can be built on top of a direct-access method. These
methods generally involve the construction of an index for the file. The index, like
an index in the back of a book, contains pointers to the various blocks. To find a
record in the file, we first search the index and then use the pointer to access the
file directly and to find the desired record.
From this search, we learn exactly which block contains the desired record and
access that block. This structure allows us to search a large file while doing little I/O.
With large files, the index file itself may become too large to be kept in memory.
One solution is to create an index for the index file. The primary index file would
contain pointers to secondary index files, which would point to the actual data
items.
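A toy sketch of such a two-level index, with in-memory dictionaries standing in for index blocks (all keys and block numbers are made-up values): the small primary index is searched first, and it points to the secondary index that holds the actual block pointer.

```python
import bisect

# each secondary index maps a record key to its disk block number (toy values)
secondary_1 = {"apple": 14, "berry": 53}
secondary_2 = {"mango": 7, "peach": 20}
# the primary index keeps only the highest key of each secondary index,
# so it is small enough to stay in memory
primary_keys = ["berry", "peach"]
secondaries = [secondary_1, secondary_2]

def find_block(key):
    i = bisect.bisect_left(primary_keys, key)  # binary-search the primary index
    return secondaries[i][key]                 # one more lookup in the secondary

print(find_block("mango"))  # 7
print(find_block("apple"))  # 14
```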
FILE DIRECTORIES
A file directory is a collection of files. The directory contains information
about the files, including attributes, location, and ownership. Much of this
information, especially that concerned with storage, is managed by the operating
system. The directory is itself a file, accessible by various file-management
routines.
1. SINGLE-LEVEL DIRECTORY
All files are contained in one directory. This is simple to support, but every
file name must be unique, which becomes a problem as the number of files or
users grows.
2. TWO-LEVEL DIRECTORY
In this structure, a separate directory is maintained for each user.
Path name: Because of the two levels, every file has a path name that is used
to locate it.
Now, different users can have files with the same name.
Searching is efficient in this method.
3. TREE-STRUCTURED DIRECTORY :
Directory is maintained in the form of a tree. Searching is efficient and also there is
grouping capability. We have absolute or relative path name for a file.
File Protection
When information is stored in a computer system, we want to keep it safe from
physical damage and improper access.
Reliability is generally provided by duplicate copies of files. Many computers
have systems programs that automatically copy disk files to tape at regular
intervals to maintain a copy should a file system be accidentally destroyed.
Protection can be provided in many ways. For a small single-user system, we
might provide protection by physically removing the floppy disks and locking
them in a desk drawer or file cabinet. In a multiuser system, however, other
mechanisms are needed.
Types of Access
The need to protect files is a direct result of the ability to access files. Systems
that do not permit access to the files of other users do not need protection.
Protection mechanisms provide controlled access by limiting the types of file
access that can be made. Several different types of operations may be controlled:
1. Read. Read from the file.
2.Write. Write or rewrite the file.
3. Execute. Load the file into memory and execute it.
4. Append. Write new information at the end of the file.
5. Delete. Delete the file and free its space for possible reuse.
6. List. List the name and attributes of the file.
Access Control
The most common approach to the protection problem is to make access
dependent on the identity of the user. Different users may need different types of
access to a file or directory. The most general scheme to implement identity
dependent access is to associate with each file and directory an access-control
list (ACL) specifying user names and the types of access allowed for each user.
When a user requests access to a particular file, the operating system checks
the access list associated with that file. If that user is listed for the requested
access, the access is allowed. Otherwise, a protection violation occurs, and the
user job is denied access to the file.
Three classifications of users in connection with each file:
1. Owner. The user who created the file is the owner.
2. Group. A set of users who are sharing the file and need similar access is a
group, or work group.
3. Universe. All other users in the system constitute the universe.
Allocation Methods
There are three major methods of allocating disk space to files:
1. Contiguous Allocation
2. Linked Allocation
3. Indexed Allocation
The main idea behind these methods is to use disk space effectively and to
allow files to be accessed quickly.
1. Contiguous Allocation: Each file occupies a set of contiguous blocks on the
disk. The directory entry for a file needs only the address of the starting
block (b) and the length of the file in blocks.
Advantages:
Both sequential and direct access are supported by this. For direct access,
the address of the kth block of a file which starts at block b can easily be
obtained as (b + k).
This is extremely fast, since the number of seeks is minimal because the file
blocks are allocated contiguously.
Disadvantages:
External fragmentation will occur, making it difficult to find contiguous
blocks of space of sufficient length. A compaction algorithm will be necessary
to free up additional space on the disk.
Also, with pre-allocation, it is necessary to declare the size of the file at the
time of creation.
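The (b + k) address computation for contiguous allocation can be sketched directly (the starting block 19 is a made-up value):

```python
def contiguous_block(b, k):
    # the k-th logical block of a file that starts at block b is simply b + k
    return b + k

# a file allocated starting at block 19: logical block 5 lives at block 24
print(contiguous_block(19, 5))  # 24
print(contiguous_block(19, 0))  # 19
```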
2. Linked Allocation(Non-contiguous allocation) : Allocation is on an
individual block basis. Each block contains a pointer to the next block in the
chain. Again the file table needs just a single entry for each file, showing the
starting block and the length of the file. Although pre-allocation is possible,
it is more common simply to allocate blocks as needed. Any free block can
be added to the chain. The blocks need not be contiguous. An increase in file
size is always possible if a free disk block is available. There is no external
fragmentation because only one block at a time is needed; internal
fragmentation can occur, but only in the last disk block of the file.
Advantages:
This is very flexible in terms of file size. File size can be increased easily
since the system does not have to look for a contiguous chunk of memory.
This method does not suffer from external fragmentation. This makes it
relatively better in terms of memory utilization.
Disadvantages:
Internal fragmentation exists in last disk block of file.
There is an overhead of maintaining the pointer in every disk block.
If the pointer of any disk block is lost, the file will be truncated.
It supports only sequential access of files.
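The pointer-chasing that makes linked allocation sequential-only can be sketched with an in-memory stand-in for the on-disk pointers (the block numbers are made up):

```python
# next_block[b] holds the pointer stored in disk block b; None ends the chain
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}

def file_blocks(start):
    blocks, b = [], start
    while b is not None:      # sequential access only: each block must be
        blocks.append(b)      # read to learn where the next one lives
        b = next_block[b]
    return blocks

print(file_blocks(9))  # [9, 16, 1, 10, 25]
```

Reaching logical block k requires k pointer reads, which is why direct access is not supported.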
3. Indexed Allocation:
It addresses many of the problems of contiguous and chained allocation. In this
case, the file allocation table contains a separate one-level index for each file: The
index has one entry for each block allocated to the file. Allocation may be on the
basis of fixed-size blocks or variable-sized blocks. Allocation by blocks eliminates
external fragmentation, whereas allocation by variable-size blocks improves
locality. This allocation technique supports both sequential and direct access to the
file and thus is the most popular form of file allocation.
Advantages:
This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
It overcomes the problem of external fragmentation.
Disadvantages:
The pointer overhead for indexed allocation is greater than linked allocation.
For very small files, say files that expand only 2-3 blocks, the indexed
allocation would keep one entire block (index block) for the pointers which is
inefficient in terms of memory utilization. However, in linked allocation we
lose the space of only 1 pointer per block.
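By contrast, indexed allocation reaches any block in one lookup; a tiny sketch with a made-up index block:

```python
# the index block of one file: entry k holds the disk block that stores
# logical block k (block numbers are made-up values)
index_block = [19, 28, 3, 45, 60]

def block_of(k):
    return index_block[k]  # direct access: a single index lookup

print(block_of(0))  # 19
print(block_of(3))  # 45
```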
Free-Space Management
Since disk space is limited, we need to reuse the space from deleted files for new
files, if possible. To keep track of free disk space, the system maintains a free-
space list. The free-space list records all free disk blocks-those not allocated to
some file or directory.
Bit Vector
Frequently, the free-space list is implemented as a bit map or bit vector. Each
block is represented by 1 bit. If the block is free, the bit is 1; if the block
is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17,
18,25,26, and 27 are free and the rest of the blocks are allocated. The free-space
bit map would be 001111001111110001100000011100000 ...
The main advantage of this approach is its relative simplicity and its efficiency in
finding the first free block or n consecutive free blocks on the disk.
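Using the bit map from the example above (bit i is '1' when block i is free), finding the first free block or the first run of n free blocks is a simple scan, sketched here with a string standing in for the bit vector:

```python
bitmap = "001111001111110001100000011100000"

def first_free(bm):
    return bm.find("1")            # first free block (-1 if the disk is full)

def first_free_run(bm, n):
    return bm.find("1" * n)        # start of the first n consecutive free blocks

print(first_free(bitmap))          # 2
print(first_free_run(bitmap, 6))   # 8
```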
Linked List
Another approach to free-space management is to link together all the free disk
blocks, keeping a pointer to the first free block in a special location on the disk and
caching it in memory. This first block contains a pointer to the next free disk block,
and so on. In our earlier example, we would keep a pointer to block 2 as the first
free block. Block 2 would contain a pointer to block 3, which would point to block
4, which would point to block 5, which would point to block 8, and so on.
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in
the first free block. The first n-1 of these blocks are actually free. The last block
contains the addresses of another n free blocks, and so on.
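Grouping can be sketched with an in-memory stand-in for the group blocks (all block numbers are made up): each group stores n addresses, the first n-1 actually free and the last pointing to the next group.

```python
# each entry: n-1 free block addresses, then the address of the next group
# block (None when there is no further group)
groups = {
    2: [3, 4, 5, 8],       # blocks 3, 4, 5 are free; block 8 holds the next group
    8: [9, 10, 11, None],  # blocks 9, 10, 11 are free; end of the list
}

def free_blocks(first_group):
    free, g = [], first_group
    while g is not None:
        *blocks, g = groups[g]   # unpack n-1 free blocks and the next pointer
        free.extend(blocks)
    return free

print(free_blocks(2))  # [3, 4, 5, 9, 10, 11]
```

The advantage over a plain linked list is that the addresses of many free blocks are found with a single block read.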
Mass storage structure
General overview of the physical structure of secondary and tertiary storage
devices.
Magnetic Disks
Magnetic disks provide the bulk of secondary storage for modern
computer systems. A large database may require hundreds of disks.
Physical Characteristics of Disks
Each disk platter has a flat circular shape. Its two surfaces are covered with a
magnetic material, and information is recorded on the surfaces. Platters are made
from rigid metal or glass.
There is a read–write head positioned just above the surface of the platter. The disk
surface is logically divided into tracks, which are subdivided into sectors.
A sector is the smallest unit of information that can be read from or written to
the disk. In currently available disks, sector sizes are typically 512 bytes; there are
over 16,000 tracks on each platter, and 2 to 4 platters per disk. The inner tracks
(closer to the spindle) are of smaller length, and in current-generation disks, the
outer tracks contain more sectors than the inner tracks; typical numbers are around
200 sectors per track in the inner tracks, and around 400 sectors per track in the
outer tracks.
Disk Scheduling
The purpose of disk scheduling algorithms is to reduce the total seek time.
Various disk scheduling algorithms are
1. FCFS Algorithm
2. SSTF Algorithm
3. SCAN Algorithm
4. C-SCAN Algorithm
5. LOOK Algorithm
6. C-LOOK Algorithm
1. FCFS Disk Scheduling Algorithm
FCFS stands for First Come First Serve.
As the name suggests, this algorithm entertains requests in the order they
arrive in the disk queue.
It is the simplest disk scheduling algorithm.
Advantages
It is simple, easy to implement, and fair: requests are serviced strictly in
order, so no request starves.
Disadvantages
It makes no attempt to minimize seek time, so it is inefficient.
Problem
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41,
122, 14, 124, 65, 67. The FCFS scheduling algorithm is used. The head is initially
at cylinder number 53. The cylinders are numbered from 0 to 199. The total head
movement (in number of cylinders) incurred while servicing these requests is
_______.
Solution
Total head movements incurred while servicing these requests
= (98 – 53) + (183 – 98) + (183 – 41) + (122 – 41) + (122 – 14) + (124 – 14) +
(124 – 65) + (67 – 65)
= 45 + 85 + 142 + 81 + 108 + 110 + 59 + 2
= 632
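The FCFS arithmetic can be reproduced with a short sketch (cylinder numbers taken from the problem):

```python
def fcfs_head_movement(requests, head):
    total = 0
    for c in requests:            # service requests strictly in arrival order
        total += abs(c - head)    # seek distance for this request
        head = c                  # head is now at the serviced cylinder
    return total

queue = [98, 183, 41, 122, 14, 124, 65, 67]
print(fcfs_head_movement(queue, 53))  # 632
```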
2. SSTF Disk Scheduling Algorithm
SSTF stands for Shortest Seek Time First.
This algorithm services the request closest to the current head position
first, so that the seek time of each step is minimized.
Advantages
It reduces the total seek time as compared to FCFS and gives better throughput.
Disadvantages
The requests which are far from the head might starve.
It provides high variance in response time and waiting time.
Switching the direction of the head frequently slows down the algorithm.
Problem:
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41,
122, 14, 124, 65, 67. The SSTF scheduling algorithm is used. The head is initially
at cylinder number 53 moving towards larger cylinder numbers on its servicing
pass. The cylinders are numbered from 0 to 199. The total head movement (in
number of cylinders) incurred while servicing these requests is _______.
Solution
= (65 – 53) + (67 – 65) + (67 – 41) + (41 – 14) + (98 – 14) + (122 – 98) + (124 –
122) + (183 – 124)
= 12 + 2 + 26 + 27 + 84 + 24 + 2 + 59
= 236
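The same result can be reproduced with a short SSTF sketch; ties are broken towards larger cylinders, since the problem states the head is moving in that direction:

```python
def sstf_head_movement(requests, head):
    pending, total = list(requests), 0
    while pending:
        # nearest request wins; on a tie in distance, prefer the larger cylinder
        c = min(pending, key=lambda r: (abs(r - head), -r))
        total += abs(c - head)
        head = c
        pending.remove(c)
    return total

print(sstf_head_movement([98, 183, 41, 122, 14, 124, 65, 67], 53))  # 236
```

With the head at 53, cylinders 41 and 65 are both 12 away; the tie-break sends the head to 65 first, matching the worked solution.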
3. SCAN Disk Scheduling Algorithm
As the name suggests, this algorithm scans all the cylinders of the disk back
and forth.
The head starts from one end of the disk and moves towards the other end,
servicing all the requests in between.
After reaching the other end, the head reverses its direction and moves towards
the starting end, servicing all the requests in between.
The same process repeats.
The SCAN Algorithm is also called the Elevator Algorithm, because its working
resembles the working of an elevator.
Advantages
It is simple and easy to implement.
No request starves, and it provides better throughput than FCFS.
Disadvantages
It causes long waiting time for the cylinders just visited by the head.
It causes the head to move till the end of the disk even if there are no
requests to be serviced.
Problem
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41,
122, 14, 124, 65, 67. The SCAN scheduling algorithm is used. The head is initially
at cylinder number 53 moving towards larger cylinder numbers on its servicing
pass. The cylinders are numbered from 0 to 199. The total head movement (in
number of cylinders) incurred while servicing these requests is _______.
Solution
Total head movements incurred while servicing these requests
= (65 – 53) + (67 – 65) + (98 – 67) + (122 – 98) + (124 – 122) + (183 – 124) +
(199 – 183) + (199 – 41) + (41 – 14)
= 12 + 2 + 31 + 24 + 2 + 59 + 16 + 158 + 27
= 331
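The SCAN pass above (up from 53 to the disk end at 199, then back down to 14) can be sketched as:

```python
def scan_head_movement(requests, head, max_cyl):
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    total = 0
    for c in up + [max_cyl] + down:  # sweep to the disk end, then reverse
        total += abs(c - head)
        head = c
    return total

print(scan_head_movement([98, 183, 41, 122, 14, 124, 65, 67], 53, 199))  # 331
```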
4. C-SCAN Disk Scheduling Algorithm
C-SCAN stands for Circular SCAN.
The head services requests only while moving in one direction; after reaching
the other end of the disk, it jumps back to the starting end without servicing
any requests and begins another sweep in the same direction.
Advantages
The waiting time for the cylinders just visited by the head is reduced as
compared to the SCAN Algorithm.
It provides uniform waiting time.
It provides better response time.
Disadvantages
It causes more total head movement than the SCAN Algorithm, since the head
must travel back across the whole disk.
It causes the head to move till the ends of the disk even if there are no
requests to be serviced.
Problem
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41,
122, 14, 124, 65, 67. The C-SCAN scheduling algorithm is used. The head is
initially at cylinder number 53 moving towards larger cylinder numbers on its
servicing pass. The cylinders are numbered from 0 to 199. The total head
movement (in number of cylinders) incurred while servicing these requests is
_______.
Solution
Total head movements incurred while servicing these requests
= (65 – 53) + (67 – 65) + (98 – 67) + (122 – 98) + (124 – 122) + (183 – 124) +
(199 – 183) + (199 – 0) + (14 – 0) + (41 – 14)
= 12 + 2 + 31 + 24 + 2 + 59 + 16 + 199 + 14 + 27
= 386
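The C-SCAN pass above (up to 199, a jump back to cylinder 0, then up again to 41) can be sketched as:

```python
def cscan_head_movement(requests, head, max_cyl):
    up = sorted(c for c in requests if c >= head)
    low = sorted(c for c in requests if c < head)
    total = 0
    # sweep up to the disk end, jump to cylinder 0, then sweep up again
    for c in up + [max_cyl, 0] + low:
        total += abs(c - head)
        head = c
    return total

print(cscan_head_movement([98, 183, 41, 122, 14, 124, 65, 67], 53, 199))  # 386
```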
5. LOOK Disk Scheduling Algorithm
LOOK Algorithm is an improved version of the SCAN Algorithm.
Head starts from the first request at one end of the disk and moves towards the
last request at the other end servicing all the requests in between.
After reaching the last request at the other end, head reverses its direction.
It then returns to the first request at the starting end servicing all the requests in
between.
The same process repeats.
The main difference between SCAN Algorithm and LOOK Algorithm is-
SCAN Algorithm scans all the cylinders of the disk starting from one end to the
other end even if there are no requests at the ends.
LOOK Algorithm scans all the cylinders of the disk starting from the first
request at one end to the last request at the other end.
Advantages
It does not cause the head to move till the ends of the disk when there are no
requests to be serviced.
It provides better performance as compared to SCAN Algorithm.
It does not lead to starvation.
It provides low variance in response time and waiting time.
Disadvantages
There is an overhead of finding the end requests.
It causes long waiting time for the cylinders just visited by the head.
Problem
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41,
122, 14, 124, 65, 67. The LOOK scheduling algorithm is used. The head is initially
at cylinder number 53 moving towards larger cylinder numbers on its servicing
pass. The cylinders are numbered from 0 to 199. The total head movement (in
number of cylinders) incurred while servicing these requests is _______.
Solution
Total head movements incurred while servicing these requests
= (65 – 53) + (67 – 65) + (98 – 67) + (122 – 98) + (124 – 122) + (183 – 124) +
(183 – 41) + (41 – 14)
= 12 + 2 + 31 + 24 + 2 + 59 + 142 + 27
= 299
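The LOOK pass above (up to the last request at 183, then back down to 14, never touching the disk ends) can be sketched as:

```python
def look_head_movement(requests, head):
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    total = 0
    for c in up + down:  # reverse at the last request, not at the disk end
        total += abs(c - head)
        head = c
    return total

print(look_head_movement([98, 183, 41, 122, 14, 124, 65, 67], 53))  # 299
```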
6. C-LOOK Disk Scheduling Algorithm
C-LOOK is to LOOK what C-SCAN is to SCAN: the head services requests only while
moving in one direction, and after the last request it jumps back to the first
request at the other end without servicing anything on the way.
Problem
Consider the same disk queue (98, 183, 41, 122, 14, 124, 65, 67) with the head
initially at cylinder number 53, moving towards larger cylinder numbers, using
the C-LOOK scheduling algorithm.
Solution
Total head movements incurred while servicing these requests
= (65 – 53) + (67 – 65) + (98 – 67) + (122 – 98) + (124 – 122) + (183 – 124) +
(183 – 14) + (41 – 14)
= 12 + 2 + 31 + 24 + 2 + 59 + 169 + 27
= 326
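The C-LOOK pass above (up from 53 to the last request at 183, then a jump to the lowest request at 14, then up to 41) can be sketched as:

```python
def clook_head_movement(requests, head):
    up = sorted(c for c in requests if c >= head)
    low = sorted(c for c in requests if c < head)
    total = 0
    for c in up + low:  # after the last request, jump to the lowest request
        total += abs(c - head)
        head = c
    return total

print(clook_head_movement([98, 183, 41, 122, 14, 124, 65, 67], 53))  # 326
```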
RAID
The data storage requirements of some applications (in particular Web,
database, and multimedia applications) have been growing so fast that a large
number of disks are needed to store their data.
Having a large number of disks in a system presents opportunities for
improving the rate at which data can be read or written, if the disks are
operated in parallel.
A variety of disk-organization techniques, collectively called redundant
arrays of independent disks (RAID), have been proposed to achieve improved
performance and reliability.
Improvement of Reliability via Redundancy
If we store only one copy of the data, then each disk failure will result in loss of
a significant amount of data. Such a high rate of data loss is unacceptable.
The solution to the problem of reliability is to introduce redundancy; that is, we
store extra information that is not needed normally, but that can be used in the
event of failure of a disk to rebuild the lost information.
The simplest (but most expensive) approach to introducing redundancy is to
duplicate every disk. This technique is called mirroring.
Improvement in Performance via Parallelism
With multiple disks, we can improve the transfer rate as well (or instead) by
striping data across multiple disks. In its simplest form, data striping consists of
splitting the bits of each byte across multiple disks; such striping is called bit-level
striping. For example, if we have an array of eight disks, we write bit i of each
byte to disk i. The array of eight disks can be treated as a single disk with sectors
that are eight times the normal size, and, more important, that has eight times the
transfer rate.
Bit-level striping can be generalized to a number of disks that either is a
multiple of 8 or a factor of 8. For example, if we use an array of four disks, bits i
and 4 + i of each byte go to disk i.
Block-level striping stripes blocks across multiple disks. It treats the array
of disks as a single large disk, and it gives blocks logical numbers; we assume
the block numbers start from 0. With an array of n disks, block-level striping
assigns logical block i of the disk array to disk (i mod n) + 1. For example,
with 8 disks, logical block 0 is stored in physical block 0 of disk 1, while
logical block 11 is stored in physical block 1 of disk 4.
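The block-to-disk mapping can be checked with a one-line sketch, following the disk (i mod n) + 1 numbering convention used in the text:

```python
def locate(i, n):
    # logical block i goes to disk (i mod n) + 1, at physical block i // n
    return (i % n) + 1, i // n

print(locate(0, 8))   # (1, 0): disk 1, physical block 0
print(locate(11, 8))  # (4, 1): disk 4, physical block 1
```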
RAID Levels
Mirroring provides high reliability, but it is expensive. Striping provides
high data-transfer rates, but does not improve reliability. The schemes are
classified into RAID levels, as in the following figure. (In the figure, P
indicates error-correcting bits, and C indicates a second copy of the data.)
• RAID level 0 refers to disk arrays with striping at the level of blocks, but without
any redundancy.
• RAID level 1 refers to disk mirroring with block striping.
• RAID level 2, known as memory-style error-correcting-code (ECC) organization,
employs parity bits. Memory systems have long used parity bits for error detection
and correction. Each byte in a memory system may have a parity bit associated
with it that records whether the number of bits in the byte that are set to 1 is even
(parity = 0) or odd (parity = 1). If one of the bits in the byte gets damaged (either a
1 becomes a 0, or a 0 becomes a 1), the parity of the byte changes and thus will not
match the stored parity. Similarly, if the stored parity bit gets damaged, it will not
match the computed parity. Thus, all 1-bit errors will be detected by the memory
system. Error-correcting schemes store 2 or more extra bits, and can reconstruct
the data if a single bit gets damaged.
• RAID level 3, bit-interleaved parity organization, improves on level 2 by
exploiting the fact that disk controllers, unlike memory systems, can detect
whether a sector has been read correctly, so a single parity bit can be used for error
correction, as well as for detection.
RAID level 3 is as good as level 2 but is less expensive in the number of
extra disks required.
• RAID level 4, block-interleaved parity organization, uses block level striping,
like RAID 0, and in addition keeps a parity block on a separate disk for
corresponding blocks from N other disks. If one of the disks fails, the parity block
can be used with the corresponding blocks from the other disks to restore the
blocks of the failed disk.
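The parity block in RAID 4 (and RAID 5) is the byte-wise XOR of the corresponding data blocks, so any single failed block can be rebuilt by XOR-ing the survivors with the parity. A toy sketch with made-up block contents:

```python
from functools import reduce

def xor_blocks(blocks):
    # byte-wise XOR of equal-sized blocks
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# toy data blocks on three disks (contents are made up)
data = [b"\x0f\xa0", b"\x33\x11", b"\x5a\xc4"]
parity = xor_blocks(data)                      # stored on the parity disk

# suppose disk 2 fails: XOR the surviving blocks with the parity block
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

This works because XOR is its own inverse: parity = d0 ^ d1 ^ d2 implies d1 = d0 ^ d2 ^ parity.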
• RAID level 5, block-interleaved distributed parity, improves on level 4 by
partitioning data and parity among all N + 1 disks, instead of storing data in N
disks and parity in one disk. In level 5, all disks can participate in satisfying read
requests, unlike RAID level 4, where the parity disk cannot participate, so level 5
increases the total number of requests that can be met in a given
amount of time.
• RAID level 6, the P + Q redundancy scheme, is much like RAID level 5, but
stores extra redundant information to guard against multiple disk failures. Instead
of using parity, level 6 uses error-correcting codes such as the Reed– Solomon
codes.