

WHITEBOARD SUMMARY: SS2 DATA PROCESSING, WEEKS 1-10


COMPUTER MAINTENANCE (1-2)
Computer maintenance is the practice of keeping computers in a good state of repair. It describes
various steps to keep your computer safe and functioning at an optimal performance level from a
software and hardware point of view. It is a set of maintenance tasks and procedures that help to keep
the computer software and hardware updated and operational. A computer containing accumulated
dust and debris may not run properly.
Dust and debris will accumulate as a result of air cooling. Any filters used to mitigate this need regular
service and changes. If the cooling system is not filtered then regular computer cleaning may prevent
short circuits and overheating.
Effective computer maintenance can be the difference between a reliable system and one that is
plagued with problems. Computer maintenance can take many different forms, but all are centered
around extending the lifespan of your IT equipment through careful use and taking sensible precautions.
By maintaining your computer effectively, you can help your business get the best possible return on its
IT investment.
Maintenance covers both hardware and software, and is a continuous process.
Hardware maintenance includes cleaning dust, maintaining constant voltage, etc.
Software maintenance includes reinstallation, upgrading and removal of different software.
What is a Computer Maintenance Plan?
A computer maintenance plan is a list of predefined procedures and tasks needed to keep a computer in
good condition. Thorough maintenance checks are needed on a computer to avoid unnecessary
downtime and failure.
The effective and efficient working of a system depends on the following four features:

 General cleaning and servicing of computer hardware
 Installation
 Troubleshooting
 Repairs
Servicing is mainly associated with hardware equipment. Servicing includes checkups, repairs and
updating of all physical components. Service is something that we have to do, operationally, if we wish
to achieve an item’s inherent reliability. It goes deeper than maintenance.
A service provider should have proper knowledge of the various components and their installation
procedures.
Troubleshooting is the detection and removal of faults in the computer system. If a problem is detected
in a system, it is to be sorted out immediately.
Troubleshooting is of two types:
o Hardware troubleshooting.
o Software troubleshooting.


Repairing means to rectify the problem in the hardware or software. Either the part has malfunctioned
or it has become worn to the point where the part needs to be replaced in order to maintain the
performance of your computer system. While finding or analyzing the faults, it can be decided which
hardware or software can be repaired.
Repairing may also include replacement of a component. It is an essential part of troubleshooting.
Repair of components may add cost and delay operations. Some failures occur because of the repairs
themselves; these are called repair-generated failures.
Repairs are termed corrective maintenance. Corrective maintenance is done when a fault occurs.
Preventive maintenance should be favoured over corrective maintenance: it may add to the cost but
saves operation time. In practice, however, preventive maintenance is often neglected and the
emphasis is placed on a repair (corrective) maintenance policy. Preventive maintenance is enforced
through regular servicing.
Repair-generated failures
These failures depend on the performance of the technician. During the repair process, the technician
may leave loose connections, wrong connections, broken pins or broken wires. These can be avoided if
the technician rechecks the work done.
MAINTENANCE
Maintenance is a process which starts with installation of the system and runs throughout the life of it.
It includes both
• Hardware maintenance and
• Software maintenance.
Hardware Maintenance
Computer hardware maintenance involves taking care of the computer's physical components, such as
its keyboard, hard drive and internal CD or DVD drives. Cleaning the computer, keeping its fans free
from dust, and defragmenting its hard drives regularly are all parts of a computer hardware
maintenance program. It includes proper cleaning, servicing, repairing or replacing components of the
computer.
Maintaining hardware helps to extend the computer's lifespan. It helps to prevent wear and tear, and
keeps the system functioning smoothly.
The following are the two types of maintenance methods used to keep the hardware intact:
• Preventive maintenance.
• Corrective maintenance.
Preventive maintenance means maintenance through preventions. Careful handling of the computer
enhances the life of the system and is called preventive maintenance. Preventive maintenance can be
done by taking some general precautions and some special precautions.
Corrective Maintenance


It refers to the maintenance procedures that are adopted when any error occurs in the system. It is
contrary to preventive maintenance and starts when a failure or crash occurs in the system. It includes
repair and troubleshooting techniques.
Corrective maintenance steps
• In case of failure, general troubleshooting should be performed first.
• If the problem remains, locate the fault using different tools or diagnostic software.
• Once the fault is determined, troubleshoot or replace the component, as required.
• Corrective maintenance also includes periodic enhancements.
Various tools that can be used during corrective maintenance: Data recovery tools from operating
system, third party data recovery tools, virus vaccines, etc.
Though preventive maintenance is better, there are times when corrective maintenance is needed
because unseen factors lead to sudden failures.

Hardware Maintenance Tips/Precautions


1. Ensure all peripherals are switched off before the main power is switched off.
2. Remove all CDs from the drives before switching off the system.
3. Do not switch off the system while the hard disk drive’s activity LED is glowing.
4. Store CDs in clean and cool place where electromagnetic interference is absent.
5. Do not obstruct air circulation to the computer site.
6. Do not eat or drink while working on the computer system.
7. When not in use, use dust covers for monitor, printer, etc.
8. Do not bend or scratch on CDs.
9. Do not apply force on key switches.
10. Do not rest hands on the keyboard.
11. Do not play with the keyboard after switching off the power.
12. Cable at keyboard end should not be subjected to high stress.
13. Do not use rough materials to clean the components of the system.
14. Use quality ribbon or ink to avoid damage to print head.
15. The internal parts of printer like stepper motor, print head, etc. should be cleaned properly
periodically.
16. Switch off power before plugging and removing a cable, or inserting and removing a PCB (Printed
circuit board).
17. The position where the system is kept should be dry and away from direct sunlight or rain.
18. Ensure the hard disk is backed up properly.
19. Remove dust from circuit boards using an air blower.
20. Run the diagnostic software periodically.
Cleaning tools


Although computer cleaning products are available, you can also use household items to clean your
computer and its peripherals. Below is a listing of items you may need or want to use while cleaning
your computer.
o Cloth - A cotton cloth is the best tool used when rubbing down computer components. Paper
towels can be used with most hardware, but we always recommend using a cloth whenever
possible. However, only use a cloth when cleaning components such as the case, a drive, mouse,
and keyboard. You should not use a cloth to clean any circuitry such as the RAM or motherboard.
o Water or rubbing alcohol - When moistening a cloth, it is best to use water or rubbing alcohol.
Other solvents may be bad for the plastics used with your computer.
o Portable Vacuum - Sucking the dust, dirt, hair, cigarette particles, and other particles out of a
computer can be one of the best methods of cleaning a computer. However, do not use a
vacuum that plugs into the wall since it creates lots of static electricity that can damage your
computer.
o Compressed air - Using compressed air to clean electronics helps protect the components in your
devices by keeping dust from causing overheating and short circuits, which also extends the
computer's life. Compressed air is one of the easiest and fastest ways of cleaning.
o Cotton swabs - Cotton swabs moistened with rubbing alcohol or water are excellent tools for
wiping hard-to-reach areas in your keyboard, mouse, and other locations.
o Foam swabs - Whenever possible, it is better to use lint-free swabs such as foam swabs.

(Images: a can of compressed air and a computer vacuum/blower.)

General Computer Cleaning Tips


• Never spray or squirt any liquid onto any computer component. If a spray is needed, spray the
liquid onto a cloth.
• You can use a vacuum to suck up dirt, dust, or hair around the computer. However, do not use a
vacuum inside your computer as it generates static electricity that can damage your computer. If
you need to use a vacuum inside your computer, use a portable battery powered vacuum or
try compressed air.
• When cleaning a component or the computer, turn it off before cleaning.
• Be cautious when using any cleaning solvents; some people have allergic reactions to chemicals
in cleaning solvents, and some solvents can even damage the case. Try always to use water or a
highly diluted solvent.


• When cleaning, be careful to not accidentally adjust any knobs or controls. Also, when cleaning
the back of the computer, if anything is connected make sure not to disconnect the plugs.
• When cleaning fans, especially smaller fans, hold the fan or place something in-between the fan
blades to prevent it from spinning. Spraying compressed air into a fan or cleaning a fan with a
vacuum may cause damage or generate a back voltage.
• Limit smoking around the computer.

 LCD/LED Monitor cleaning


Why? Dirt, dust, and fingerprints can cause the computer screen to be difficult to read.
Procedure: Unlike a CRT computer monitor, the LCD or LED monitor is not glass and requires special
cleaning procedures.
When cleaning the LCD or LED screen, it is important to remember to not spray any liquids onto the
screen directly. Press gently while cleaning and do not use a paper towel since it can scratch the screen.
To clean the LCD or LED screen, use a non-abrasive microfiber cloth, soft cotton cloth, or Swiffer duster.
If a dry cloth does not completely clean the screen, you can apply rubbing alcohol to the cloth and wipe
the screen with the damp cloth. Rubbing alcohol is used to clean LCD and LED monitors before they
leave the factory.

 Keyboard cleaning
Dust, dirt, and bacteria
The computer keyboard is usually one of the most germ-infested items in your home or office. A
keyboard may even contain more bacteria than your toilet seat. Cleaning it helps remove any dangerous
bacteria and keeps the keyboard working properly.
Procedure: Before cleaning the keyboard, first turn off the computer or if you are using a USB keyboard
unplug it from the computer. Not unplugging the keyboard can cause other computer problems as you
may press keys that cause the computer to perform a task you do not want it to perform.
Many people clean the keyboard by turning it upside down and shaking. A more efficient method is to
use compressed air. The crumbs, dust, and other particulate that fall between the keys and build up
underneath are loosened by spraying pressurized air into the keyboard, then removed with a low-
pressure vacuum cleaner.
After the dust, dirt, and hair have been removed, spray a disinfectant onto a cloth, or use disinfectant
cloths, and rub each of the keys on the keyboard. As mentioned in the general cleaning tips, never spray
any liquid onto the keyboard.
A plastic-cleaning agent applied to the surface of the keys with a cloth is used to remove the
accumulation of oil and dirt from repeated contact with a user's fingertips. If this is not sufficient for a
more severely dirty keyboard, keys are physically removed for more focused individual cleaning, or for
better access to the area beneath them.

 Computer mouse cleaning


Optical or laser mouse


Why? A dirty optical or laser mouse can cause the mouse cursor to be difficult to move or move
erratically.
Procedure: Use a can of compressed air that is designed for use with electronic equipment, spraying
around the optical sensor on the bottom of the mouse. Blowing air on the bottom of the mouse clears
away any dirt, dust, hair, or other obstructions that may be blocking the optical sensor.
Avoid using any cleaning chemicals or wiping a cloth directly on the optical sensor, as it could scratch or
damage the optical sensor.
All types of mice
Why? To help keep the mouse clean and germ-free.
Procedure: Use a cloth moistened with rubbing alcohol or warm water and rub the surface of the mouse
and each of its buttons.

 CD, DVD, and other discs cleaning


Why? Dirty CDs can cause read errors or cause CDs not to work at all.
Procedure:

 Use a cleaning kit or a damp, clean cotton cloth to clean CDs, DVDs, and other discs. A lens-cleaning
kit consists of a single disc that is designed to spin in the user’s drive and remove all dust from
the lens.
 Place the CD/DVD laser lens cleaning disc inside the DVD drive’s tray and close the tray.
 As it spins, it will clear most if not all the dust on the lens.
 As an extra precaution, use a can of air spray to gently spray into the open tray to remove any
residual dust
 Try to read or write a DVD once again to make sure everything is working well now.
You can also clean the face of a disc. When cleaning a disc, wipe against the tracks, starting from the
middle of the CD or DVD and wiping towards the outer side. Never wipe with the tracks; doing so may
put more scratches on the disc.
Tip: If the substance on a CD cannot be removed using water, pure alcohol can also be used.

LAPTOP BATTERIES – CHARGING AND REPLACEMENT


Below are some general guidelines that should be followed when charging your laptop computer
battery. Keep in mind these are general suggestions. Consult your laptop or battery documentation for
precise information. If your documentation states something different, those directions should be
followed.


New battery or first use


After purchasing a new laptop computer or a battery for your laptop, it is recommended that the battery
be charged for no less than 24 hours. A 24-hour charge makes sure the battery is fully charged and helps
with the battery's life expectancy. Once it is fully charged, you should not discharge it fully, if possible.
Lithium-ion batteries (the type used in modern laptop computers) are strained, and may be weakened,
when they are fully discharged. Doing so frequently can shorten the battery's lifespan.
Don't worry about overcharging the battery. Modern laptops will stop charging the battery when it is
fully charged and switch over to AC power while the laptop is plugged into an outlet.
If you are excited to use your new laptop, it can still be used while it is plugged into an outlet. However,
it’s better not to unplug it until it's been charged for that length of time.
Battery Memory Effect
This is an effect observed in nickel-cadmium and nickel-metal hydride rechargeable batteries (both are
older types of battery) that causes them to hold less charge. It describes the situation in which nickel-
cadmium batteries gradually lose their maximum energy capacity if they are repeatedly recharged after
being only partially discharged. The battery appears to "remember" the smaller capacity.
The battery memory effect is a reduction in the longevity of a rechargeable battery's charge, due to
incomplete discharge in previous uses. The effect can also be caused by poorly-designed chargers.
In lithium-based batteries the memory effect is in fact a myth; it applies only to older nickel-based
batteries. Fully discharging and recharging a lithium battery is therefore useless and even harmful. A
modern lithium battery can be charged regardless of its current percentage, as doing so has no negative
effect on its performance.
All other charges
After the computer battery has gone through its initial charge, every subsequent charge should
continue until the battery reaches full capacity. Often this will take a few hours.
Note: Even when the computer is off, as long as it's plugged in, it will continue to charge.
Did you know that heat and duration of use can affect the life and charging capacity of your battery?
One of the more typical questions raised by laptop users is whether the battery should be removed from
its socket when the A/C adapter is plugged in. The answer is both YES and NO; it depends on
the situation.


Having a battery fully charged and the laptop plugged in is not harmful, because as soon as the charge
level reaches 100% the battery stops receiving charging energy and this energy is bypassed directly to
the power supply system of the laptop.

However, there is a disadvantage to keeping the battery in its socket when the laptop is plugged in, but
only if the laptop hardware is currently causing excessive heating.

So:
- In normal usage, if the laptop doesn't get too hot (CPU and hard disk around 40ºC to 50ºC), the
battery should remain in the laptop socket;

- In intensive usage which produces a large amount of heat (e.g. games, temperatures above
60ºC), the battery should be removed from the socket in order to prevent unwanted heating.

Heat, together with being kept at 100% charge, is the great enemy of the lithium battery; the plug itself
is not, as many might think.

Charging tips:

• For regular usage or when the laptop doesn’t go above 40ºC to 50ºC, keep the battery attached to its
socket.
• When the laptop is new or when a replacement battery is initially installed, be sure to fully charge it
before usage.
• Do not keep the battery and the A/C adapter plugged in too frequently during intensive use. This
causes chemical reactions which reduce the battery’s capacity to hold charge. Worse, eventually it won’t
be able to hold any charge without the AC adapter plugged in.
• The battery should be at a low charge level before recharging. This significantly increases the likelihood
of a longer serviceable life.
• Do not leave it plugged in all the time.

Calibrating the Battery


So you’re using your laptop and, all of a sudden, it dies. There was no battery warning from Windows;
in fact, you recently checked and Windows said you had 30% battery power left. What’s going on?


Even if you treat your laptop’s battery properly, its capacity will decrease over time. Its built-in power
meter estimates how much juice is available and how much time on battery you have left, but it can
sometimes give you incorrect estimates.
This basic technique will work in Windows 10, 8, 7, Vista. Really, it will work for any device with a
battery, including older MacBooks. It may not be necessary on some newer devices, however.
If you’re taking proper care of your laptop’s battery, you should be allowing it to discharge somewhat
before plugging it back in and topping it off. You shouldn’t be allowing your laptop’s battery to die
completely each time you use it, or even get extremely low. Performing regular top-ups will extend your
battery’s life.
However, this sort of behavior can confuse the laptop’s battery meter. No matter how well you take
care of the battery, its capacity will still decrease as a result of unavoidable factors like typical usage,
age, and heat. If the battery isn’t allowed to run from 100% down to 0% occasionally, the battery’s
power meter won’t know how much juice is actually in the battery. That means your laptop may think
it’s at 30% capacity when it’s really at 1%—and then it shuts down unexpectedly.

Calibrating the battery won’t give you longer battery life, but it will give you more accurate estimates of
how much battery power your device has left.
Manufacturers that do recommend calibration often recommend calibrating the battery every two to
three months. This helps keep your battery readings accurate.
In reality, you likely don’t have to do this that often if you’re not too worried about your laptop’s battery
readings being completely precise.

How to perform a calibration (full discharge)


The most adequate method to do a full discharge (100% to a minimum of 3%) consists of the following
procedure:
 Fully charge the battery to its maximum capacity (100%);
 Let the battery "rest" fully charged for 2 hours or more in order to cool down from the charging
process. You may use the computer normally within this period;
 Unplug the power cord and set the computer to hibernate automatically at 5%. To find these
options, head to Control Panel > Hardware and Sound > Power Options > Change plan settings >
Change advanced power settings, and look under the “Battery” category for the “Critical battery
action” and “Critical battery level” options. If you cannot select 5%, use the minimum value
allowed, but never below 5%;


 Leave the computer discharging, non-stop, until it hibernates itself. You may use the computer
normally within this period;
 When the computer shuts down completely, let it stay in the hibernation state for 5 hours or
even more;
 Plug the computer to the A/C power to perform a full charge non-stop until its maximum
capacity (100%). You may use the computer normally within this period.
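If you want to watch the discharge step of the calibration as it happens, a short script can poll the charge level. This is an optional sketch, not part of the original procedure; it assumes the third-party psutil library is installed (pip install psutil).

    # Poll the battery during the calibration discharge (assumes psutil).
    import time
    import psutil

    def watch_battery(poll_seconds=60):
        while True:
            batt = psutil.sensors_battery()
            if batt is None:
                print("No battery detected.")
                return
            state = "plugged in" if batt.power_plugged else "on battery"
            print(f"Charge: {batt.percent:.0f}% ({state})")
            # Near the 5% critical level, hibernation should trigger soon.
            if not batt.power_plugged and batt.percent <= 5:
                return
            time.sleep(poll_seconds)

    watch_battery()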
After the calibration process, the reported wear level is usually higher than before. This is natural, since
the battery now reports the true capacity it can hold. Lithium-ion batteries have a limited number of
discharge cycles (generally 200 to 300 cycles) and they retain less capacity over time.

Many people tend to think, "If calibrating gives a higher wear level, then it's a bad thing." This is wrong:
as explained, calibration is meant to make your battery report the true capacity it can hold, and it is
meant to avoid surprises such as being in the middle of a presentation when the computer suddenly
shuts down at a reported 30% of charge.

Prolonged storage


To store a battery for long periods of time, its charge level should be around 40% and it should be
stored in a place as cool and dry as possible. A fridge can be used (0ºC - 10ºC), but only if the battery
stays isolated from any humidity.
It must be said again that the battery's worst enemy is heat, so leaving the laptop in the car on a hot
summer day is halfway to killing the battery.

Replacing a Laptop Battery


Typically, laptop batteries have a 1-2 year life span, equivalent to roughly 400 recharges. After this
period, the battery becomes defective and its run time starts to deteriorate. Once the battery no longer
serves its purpose, it is time to get a replacement. But what should you look for in a replacement battery?
Specs:
• Voltage. It is important that you get exactly the same voltage as the laptop battery you are disposing
of. A higher or lower voltage can damage your computer’s internal components or even burn them.
• Wattage. The watt-hour rating measures how much energy the battery stores. Expect a longer
battery life for your laptop with a higher watt-hour capacity.
• mAh (milliampere-hours). This expresses the charge capacity of a battery pack. One thousand
milliampere-hours equals 1 ampere-hour, so all else being equal you should go with the battery with
the higher mAh rating. On average, a 4000 mAh battery allows 3-4 hours of battery life.
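The relationship between these ratings can be checked with simple arithmetic. The figures below are invented purely for illustration (the 11.1 V pack voltage and 15 W average draw are assumptions, not specifications).

    # Rough battery-life arithmetic; all numbers are illustrative.
    capacity_mah = 4000        # capacity in milliampere-hours
    voltage = 11.1             # assumed pack voltage in volts
    draw_watts = 15.0          # assumed average power drawn by the laptop

    watt_hours = (capacity_mah / 1000) * voltage   # 1000 mAh = 1 Ah
    runtime_hours = watt_hours / draw_watts

    print(f"{watt_hours:.1f} Wh -> about {runtime_hours:.1f} hours")
    # 44.4 Wh / 15 W is roughly 3 hours, consistent with the estimate above.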

Software Maintenance
Software maintenance includes updates, enhancements, changes, repairs and replacements.
Altered environment or changed conditions may result in software maintenance.
It is of the following types:

 Corrective
 Adaptive
 Perfective
 Preventive

 Corrective maintenance is concerned with fixing errors that are observed when the software is in
use.

 Adaptive maintenance is concerned with the change in the software that takes place to make the
software adaptable to new environment such as to run the software on a new operating system.

 Perfective maintenance is concerned with the change in the software that occurs while adding
new functionalities in the software.

 Preventive maintenance involves implementing changes to prevent the occurrence of errors.
(Figure omitted: the distribution of maintenance by type and by percentage of time consumed.)


INDEXES (week 3)
Indexing in Databases
Indexing is a way to optimize performance of a database by minimizing the number of disk accesses
required when a query is processed.
An index or database index is a data structure which is used to quickly locate and access the data in a
database table (to speed up query). They are similar to textbook indexes. In textbooks, if you need to go
to a particular chapter, you go to the index, find the page number of the chapter and go directly to that
page. Without indexes, the process of finding your desired chapter would have been very slow.
The same applies to indexes in databases. Without indexes, a DBMS has to go through all the records in
the table in order to retrieve the desired results. This process is called table-scanning and is extremely
slow. On the other hand, if you create indexes, the database goes to that index first and then retrieves
the corresponding table records directly.
Indexes are created using some database columns.


 The first column is the Search key that contains a copy of the primary key or candidate key of the
table. These values are stored in sorted order so that the corresponding data can be accessed
quickly (Note that the data may or may not be stored in sorted order).
 The second column is the Data Reference which contains a set of pointers holding the address of
the disk block where that particular key value can be found.
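The two-column picture above can be sketched in a few lines of Python. This is only an illustration; the table, record ids (rids) and ages below are invented.

    # A minimal index: sorted (search key, data reference) pairs.
    import bisect

    table = {101: ("Ada", 25), 102: ("Bayo", 31), 103: ("Chi", 25)}  # rid -> record

    # Index on age: sorted (age, rid) pairs; the rid points back into the table.
    index = sorted((age, rid) for rid, (_, age) in table.items())
    keys = [k for k, _ in index]

    def lookup(age):
        i = bisect.bisect_left(keys, age)        # binary search on the sorted keys
        out = []
        while i < len(index) and index[i][0] == age:
            out.append(table[index[i][1]])       # follow the data reference
            i += 1
        return out

    print(lookup(25))   # finds both age-25 records without scanning every row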

Overview of Indexes
As we noted earlier, an index on a file is an auxiliary structure designed to speed up operations that are
not efficiently supported by the basic organization of records in that file.
An index can be viewed as a collection of data entries, with an efficient way to locate all data entries
with search key value k. Each such data entry, which we denote as k*, contains enough information to
enable us to retrieve (one or more) data records with search key value k. (Note that a data entry is, in
general, different from a data record!) The following figure shows an index with search key sal that
contains (sal, rid) pairs as data entries. The rid component of a data entry in this index is a pointer to a
record with search key value sal.

Two important questions to consider are:


1. How are data entries organized in order to support efficient retrieval of data entries with a given
search key value?
2. Exactly what is stored as a data entry?
One way to organize data entries is to hash data entries on the search key. In this approach, we
essentially treat the collection of data entries as a file of records, hashed on the search key. This is how
the index on sal in the example above is organized. The hash function h for this example is quite

simple; it converts the search key value to its binary representation and uses the two least significant
bits as the bucket identifier.
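That bucket function is easy to sketch in Python; the (sal, rid) entries below are invented for illustration.

    # Keep the two least significant bits of sal as the bucket identifier.
    def h(sal):
        return sal & 0b11          # same as sal % 4 -> bucket 0..3

    buckets = {b: [] for b in range(4)}
    for sal, rid in [(10, "r1"), (20, "r2"), (75, "r3"), (11, "r4")]:
        buckets[h(sal)].append((sal, rid))

    print(buckets[h(75)])   # an equality search on sal = 75 probes only one bucket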
Another way to organize data entries is to build a data structure that directs a search for data entries.
Several index data structures are known that allow us to efficiently find data entries with a given search
key value.
Based on this, there are two ways indexing can be done:
1. Ordered indices: Indices are based on a sorted ordering of the values. The indices are usually
sorted so that the searching is faster. The indices which are sorted are known as ordered indices.
2. Hash indices: Hashing is the transformation of a string of characters into a usually shorter fixed-
length value or key that represents the original string. Hashing is used to index and retrieve items
in a database because it is faster to find the item using the shorter hashed key than to find it using
the original value.
Hash indices are based on the shorter fixed-length values being distributed uniformly across a
range of buckets. The bucket to which a value is assigned is determined by a function called a hash
function (refer to the note on file organization).
There is no absolute comparison between the two techniques; which is better depends on the database
application to which indexing is applied. Indexes are usually evaluated on the following factors:
 Access Types: e.g. value based search, range access, etc.
 Access Time: Time to find particular data element or set of elements.
 Insertion Time: Time taken to find the appropriate space and insert a new data.
 Deletion Time: Time taken to find an item and delete it as well as update the index structure.
 Space Overhead: Additional space required by the index.

Alternatives to Data Entries in an Index


A data entry k* allows us to retrieve one or more data records with key value k. We need to consider
three main alternatives:
1. A data entry k∗ is an actual data record (with search key value k).
2. A data entry is a (k, rid) pair, where rid is the record id of a data record with search key value k.
3. A data entry is a (k, rid-list) pair, where rid-list is a list of record ids of data records with search
key value k.
Observe that if an index uses Alternative (1), there is no need to store the data records separately, in
addition to the contents of the index.
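As a quick illustration of these three alternatives, here is how a single data entry might look as plain Python values; the record, key and rids are invented.

    # The three alternatives for a data entry k*, sketched as Python values.
    record = {"name": "Ada", "age": 25, "sal": 10}

    entry_alt1 = record                      # (1) the entry IS the data record
    entry_alt2 = (25, "rid-7")               # (2) a (k, rid) pair
    entry_alt3 = (25, ["rid-7", "rid-9"])    # (3) a (k, rid-list) pair for duplicates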

INDEXING METHODS


 Clustered Vs. Non-Clustered Indexing


Clustered Indexes
A clustered index defines the order in which data is physically stored in a table. Table data can be sorted
in only one way; therefore, there can be only one clustered index per table.
A clustering index is defined on an ordered data file. The data file could be ordered on a non-key field (in
some cases, the index is created on non-primary-key columns, which may not be unique for each record).
In such cases, in order to identify the records faster, we will group two or more columns together to get
the unique values and create index out of them. This method is known as clustering index. Basically,
records with similar characteristics are grouped together and indexes are created for these groups.
A typical example of a clustered index is the traditional phone book. The actual document is the index,
entries are sorted/organized and each page has a key showing the range of names in it.

(Figure omitted: a multi-level clustered index sorted according to EID, the search key, with several levels
of pointers into the base table.)

 Non-Clustered Indexes

A non-clustered index does not sort the physical data inside the table. In fact, a non-clustered index is
stored at one place and table data is stored in another place and the index would have pointers to the
storage location of the data. A table can have multiple non-clustered indices because each non-clustered
index is stored separately from the table data. For example, a book can have more than one index,
one at the beginning which shows the contents of a book unit wise and another index at the end which
shows the index of terms in alphabetical order.

It is important to mention here that inside the table the data will be sorted by the clustered index.
However, inside the non-clustered index, entries are stored in the sorted order of the index key. The index contains
column values on which the index is created and the address of the record that the column value
belongs to.
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or
references to the locations where the data is actually stored. Data is not physically stored in the order of
the index; instead, data is present in the leaf nodes. Take, for example, the contents page of a book:
each entry gives us the page number or location of the information stored. The actual data (the
information on each page of the book) is not organized by the contents page, but we have an ordered
reference to where it actually lies.


When a query is issued against a column on which the index is created, the database will first go to the
index and look for the address of the corresponding row in the table. It will then go to that row address
and fetch other column values. It is due to this additional step that non-clustered indexes are slower
than clustered indexes.

It requires more time as compared to clustered index because some amount of extra work is done in
order to extract the data by further following the pointer. In case of clustered index, data is directly
present in front of the index.
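This behaviour can be observed with sqlite3 from the Python standard library. One caveat of the analogy: in SQLite an ordinary table is itself stored as a B-tree keyed on rowid (which plays roughly the role of a clustered layout), while CREATE INDEX builds the separate structure with pointers back to the table, i.e. the non-clustered case. The table and data below are invented.

    # Secondary (non-clustered) index in SQLite, using the standard library.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (eid INTEGER PRIMARY KEY, name TEXT, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                    [(1, "Ada", 10), (2, "Bayo", 80), (3, "Chi", 75)])
    con.execute("CREATE INDEX emp_sal ON emp (sal)")   # separate structure

    for row in con.execute("EXPLAIN QUERY PLAN SELECT name FROM emp WHERE sal = 75"):
        print(row)   # the plan should mention searching emp USING INDEX emp_sal

The extra hop the plan reveals (index first, then back to the table row) is exactly the additional step described above.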

Clustered Versus Non-clustered Index


Basis for comparison: Basic
- Clustered index: determines the storage order of the rows in a table as a whole.
- Non-clustered index: determines the order of the rows with the help of a separate physical structure.

Basis for comparison: Number of indexes allowed per table
- Clustered index: only one clustered index per table.
- Non-clustered index: multiple non-clustered indices per table.

Basis for comparison: Data accessing
- Clustered index: faster, as data is physically stored in index order.
- Non-clustered index: slower, because the non-clustered index has to refer back to the base table.

Basis for comparison: Additional disk space
- Clustered index: not needed; the clustered index stores the base table data in the same physical order
as the index's logical order, so it does not require additional storage space.
- Non-clustered index: required to store the indices separately; the index is stored in a separate
location, which requires additional storage space.

 Dense Indexes Vs. Sparse Indexes


Dense Index

In the dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store the index records themselves. Index records contain
the search key value and a pointer/reference to the actual record on the disk (the first data record with
that search key value).
Sparse Index
 The index record appears only for a few items in the data file. Each index record points to a block
(see the sketch after this list).
 To locate a record, we find the index record with the largest search key value less than or equal to
the search key value we are looking for.
 We start at that record pointed to by the index record, and proceed along the pointers in the file
(that is, sequentially) until we find the desired record.
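Here is the promised sketch of a sparse-index lookup; the blocks and keys are invented sample data.

    # Sparse index: one entry per block; find the last entry <= the search key,
    # then scan sequentially within that block.
    import bisect

    blocks = [[10, 12, 17], [20, 25, 28], [31, 40, 44]]   # sorted records per block
    sparse = [10, 20, 31]                                  # first key of each block

    def find(key):
        b = bisect.bisect_right(sparse, key) - 1   # largest index key <= search key
        if b < 0:
            return None
        return key if key in blocks[b] else None   # sequential scan in the block

    print(find(25), find(13))   # 25 is found in block 1; 13 is absent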


 Primary Indexes Vs. Secondary Indexes


Primary Indexes
A primary index is an index on a set of fields that includes the unique primary key of the file and is
guaranteed not to contain duplicates. In this case, the data is sorted according to the search key. It
induces sequential file organization.
In this case, the primary key of the database table is used to create the index. As primary keys are
unique and are stored in a sorted manner, the performance of the searching operation is quite efficient.
The index contains the key fields and pointers to the other non-key fields of the table. The primary index
is created automatically when a table is created and the primary key specified in a database. It contains
1:1 relation between the records. Searching data using the primary index is efficient because it stores
data in the sorted order.
Primary index could be sparse or dense.


Secondary Index

It is used to optimize query processing and access records in a database with some information other
than the usual search key (primary key).
It helps to reduce the size of the mapping by introducing another level of indexing. Here, two levels of
indexing are used in order to reduce the mapping size of the first level. At the initial stage, a range is
selected for the columns, so the mapping size of the first level becomes smaller. Then, this index method
reduces each range into smaller sub-ranges. Generally, the primary memory stores the first-level
mappings so that addresses can be fetched faster, while the secondary memory stores the second-level
mapping and the actual data. The actual physical location of the data is determined by the second
mapping level.
Initially, for the first level, a large range of numbers is selected so that the mapping size is small.
Further, each range is divided into smaller sub-ranges.


Conclusion


A clustered index is a way of storing the rows of a table in a particular physical order, so that when data
is searched, only the corresponding rows that contain the data are read and returned as output. On the
other hand, a non-clustered index resides in a physically separate structure that references the base
data when it is searched, and it can have a different sort order.

 Primary Index: index created on the primary key (primary key + ordered).
 Clustering Index: index created on a non-key column whose values are ordered.
 Secondary Index: index created on a non-key column whose values are not ordered.

Indexes Using Composite Search Keys


The search key for an index can contain several fields; such keys are called composite search keys or
concatenated keys.
A composite key, in the context of relational databases, is a combination of two or more columns in a
table that can be used to uniquely identify each row in the table. Uniqueness is only guaranteed when
the columns are combined; when taken individually the columns do not guarantee uniqueness.
A composite key can be defined as the primary key. This is done using SQL statements at the time of
table creation. It means that data in the entire table is defined and indexed on the set of columns
defined as the primary key.
As an example, consider a collection of employee records, with fields name, age, and sal, stored in
sorted order by name. On such a file one can build a composite index with key (age, sal), a composite
index with key (sal, age), an index with key age alone, or an index with key sal alone (the original figure
illustrating these is omitted).
All of these indexes use Alternative (2) above for data entries.
If the search key is composite, an equality query is one in which each field in the search key is bound to
a constant. For example, we can ask to retrieve all data entries with age = 20 and sal = 10. The hashed
file organization supports only equality queries, since a hash function identifies the bucket containing
desired records only if a value is specified for each field in the search key.


A range query is one in which not all fields in the search key are bound to constants. For example, we
can ask to retrieve all data entries with age = 20; this query implies that any value is acceptable for the
sal field. As another example of a range query, we can ask to retrieve all data entries with age < 30 and
sal > 40.
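A small Python sketch makes the equality/range distinction concrete on a composite (age, sal) key; the entries are invented sample data.

    # Composite search key (age, sal): equality vs. range queries.
    entries = sorted([(20, 10, "r1"), (20, 40, "r2"), (25, 50, "r3"), (30, 60, "r4")])

    # Equality query: every field bound to a constant (age = 20 AND sal = 10).
    eq = [r for a, s, r in entries if a == 20 and s == 10]

    # Range queries: only some fields bound, or inequalities used.
    rng = [r for a, s, r in entries if a == 20]               # any sal accepted
    rng2 = [r for a, s, r in entries if a < 30 and s > 40]

    print(eq, rng, rng2)   # ['r1'] ['r1', 'r2'] ['r3']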

CRASH RECOVERY (week 4)


THE RECOVERY MANAGER
Recovery to a consistent state is required after any kind of system failure. The recovery process restores
the database to the most recent consistent state before the time of failure.
Recovery Manager (RMAN) is a DBMS utility that can back up, restore, and recover database files. The
recovery manager of a DBMS is responsible for ensuring two important properties of transactions:
atomicity and durability. It ensures atomicity by undoing the actions of transactions that do not commit
and durability by making sure that all actions of committed transactions survive system crashes and
media failures (e.g., a disk is corrupted), i.e. redoing (all) the actions of committed transactions.
In the context of transaction processing in databases, four properties must be ensured in the face of
concurrent accesses and system failures (ACID properties):
 Atomicity: Either all actions of a transaction are carried out or none at all (all or none rule).
Atomicity is the Responsibility of the Recovery Manager.
 Consistency: Each transaction (run by itself with no concurrent execution) must preserve the
consistency of the database. In other words, the set of operations taken together should move
the system from one consistent state to another consistent state. Consistency is the Responsibility of
the User.
 Isolation: Execution of one transaction is isolated (or protected) from the effects of other
concurrently running transactions. Isolation is the Responsibility of the Transaction Manager.
 Durability: If a transaction commits, its effects persist and are permanent (even if the system crashes

before all its changes are reflected on disk). Durability is the Responsibility of the Recovery Manager.


The recovery manager is one of the hardest components of a DBMS to design and implement. It must
deal with a wide variety of database states because it is called on during system failures.

The Recovery Manager also interacts to a lesser degree with the Buffer Manager and the Transaction
Manager. It is invoked by the Transaction Manager for transaction rollback. It requests the Buffer
Manager for the Dirty Page list and the Transaction Manager for the Transaction table.
In a DBMS, it is also necessary to keep track of current and completed transactions. This is done using a
data structure called log. Each log entry typically describes the operation performed, the initial value of
any updated item and the final value of any updated item. The log must be written to stable storage
because it is needed by the recovery manager for recovery.


On restart after a failure, the basic recovery process is to use the log stored on stable storage to undo
the effects of aborted and incomplete transactions (in reverse order) and to redo the effects of
committed transactions (in forward order).
A complication is that both the database itself and the log use memory buffers, so data written to the
database or to the log are not necessarily recorded in stable storage immediately. Related to this is the
fact that both the database and the log are written to disk a page at a time, not an item at a time. A
transaction is regarded as committed when the "commit" entry written to the log is recorded on stable
storage.
Key choices for the recovery manager implementor include:

 Whether to require all changed database pages to be written to disk when a transaction commits
("force"). Forcing avoids the need to redo on restart.
 Whether to allow changed database pages to be written to disk before the transaction commits
("steal"). Stealing requires undo on restart.
Recovery managers may decide whether or not to use force and/or steal independently. Most recovery
managers use WAL (write-ahead logging) to allow STEAL/NO-FORCE without sacrificing correctness.

Checkpoints are used to periodically write the log and changed database pages to disk, recording the
fact on the disk, to reduce the work required on restart after failure.
It's important that restart be idempotent: if a failure occurs during restart, and a second restart is
performed, the effect should be the same as if the first restart had completed.
Some of the terms encountered above (in boldface) will later be discussed in detail.
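The write-ahead idea can be reduced to a toy sketch. Everything below is invented for illustration; a real log manager deals with pages, LSNs and buffering, none of which appear here.

    # Write-ahead logging in miniature: the log record describing a change is
    # made stable before the changed page is, and a transaction counts as
    # committed only once its commit record is on stable storage.
    stable_log, stable_pages, buffer = [], {}, {}

    def write(txn, page, value):
        old = stable_pages.get(page)
        stable_log.append(("update", txn, page, old, value))  # log first (flushed)
        buffer[page] = value                                  # then change in memory

    def commit(txn):
        stable_log.append(("commit", txn))   # the commit record reaches the log...
        # ...and only now may (or, under a force policy, must) pages be written.

    def flush(page):
        stable_pages[page] = buffer[page]    # legal: its log record is already stable

    write("T1", "A", 500); commit("T1"); flush("A")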

SOME CONCEPTS RELATED TO RECOVERY
Transaction Management
A transaction can be defined as a group of tasks: a series of reads and writes, followed by a commit or
abort. A single task is the minimum processing unit, which cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers ₦500 from A's
account to B's account. This very simple and small transaction involves several low-level tasks, sketched below.
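In Python the low-level tasks look roughly like this; the account balances are invented, and a real DBMS would wrap these steps with logging and locking.

    # The low-level tasks hidden inside one transfer transaction.
    accounts = {"A": 2000, "B": 700}

    def transfer(src, dst, amount):
        a = accounts[src]          # read_item(A)
        a -= amount                # compute the new value
        accounts[src] = a          # write_item(A)
        b = accounts[dst]          # read_item(B)
        b += amount                # compute the new value
        accounts[dst] = b          # write_item(B)
        # commit point: only now should the transfer be regarded as permanent

    transfer("A", "B", 500)
    print(accounts)   # {'A': 1500, 'B': 1200}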

States of Transactions
Active − In this state, the transaction is being executed. This is the initial state of every transaction.


Partially Committed − When a transaction executes its final operation, it is said to be in a partially
committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery
system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery
manager rolls back all its write operations on the database to bring the database back to its original
state where it was prior to the execution of the transaction. Transactions in this state are called aborted.
The database recovery module can select one of the two operations after a transaction aborts –
- Re-start the transaction
- Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be committed. All its
effects are now permanently established on the database system.
In a general sense, a commit is the updating of a record in a database. In the context of
a database transaction, a commit refers to the saving of data permanently after a set of tentative
changes. A commit ends a transaction within a relational database and allows all other users to see the
changes.

(Figure omitted: state diagram of a transaction.)
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can’t go any
further. This is called transaction failure, where only a few transactions or processes are affected.
DATABASE RECOVERY TECHNIQUES

There are both automatic and non-automatic ways of backing up data and recovering from failure
situations. The techniques used to recover data lost due to system crashes, transaction errors, viruses,
catastrophic failures, incorrect command execution, etc. are called database recovery techniques. To
prevent data loss, recovery techniques based on deferred update, immediate update, or backing up
data can be used.
Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains
information about the start and end of each transaction and any updates which occur in the transaction. The log
keeps track of all transaction operations that affect the values of database items. This information is needed to
recover from transaction failure.

A transaction T reaches its commit point when all its operations that access the database have been executed
successfully i.e. the transaction has reached the point at which it will not abort (terminate without completing).
Once committed, the transaction is permanently recorded in the database. Commitment always involves writing a
commit entry to the log and writing the log to disk. At the time of a system crash, the log is searched backwards
for all transactions T that have written a start_transaction(T) entry into the log but have not yet written a
commit(T) entry; these transactions may have to be rolled back to undo their effect on the database during the
recovery process.

 Undoing – If a transaction crashes, the recovery manager may undo transactions, i.e. reverse the
operations of a transaction. This involves examining the log for each write_item(T, x, old_value,
new_value) entry of the transaction and setting the value of item x in the database back to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures: deferred
updates and immediate updates (see the sketch after this list).
 Deferred update – This technique does not physically update the database on disk until a transaction has
reached its commit point. Before reaching commit, all transaction updates are recorded in the local
transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the
database in any way so UNDO is not needed. It may be necessary to REDO the effect of the operations
that are recorded in the local transaction workspace, because their effect may not yet have been written
in the database. Hence, a deferred update is also known as the No-undo/redo algorithm
 Immediate update – In the immediate update, the database may be updated by some operations of a
transaction before the transaction reaches its commit point. However, these operations are recorded in a
log on disk before they are applied to the database, making recovery still possible. If a transaction fails to
reach its commit point, the effect of its operation must be undone i.e. the transaction must be rolled back
hence we require both undo and redo. This technique is known as undo/redo algorithm.
 Caching/Buffering – In this one or more disk pages that include data items to be updated are cached into
main memory buffers and then updated in memory before being written back to disk. A collection of in-
memory buffers called the DBMS cache is kept under control of DBMS for holding these buffers. A
directory is used to keep track of which database items are in the buffer. A dirty bit is associated with
each buffer; it is 0 if the buffer has not been modified and 1 if it has.

 Shadow paging – The AFIM (after image) does not overwrite its BFIM (before image) but is recorded at
another place on the disk. Thus, at any time a data item has its AFIM and BFIM (the shadow copy of the
data item) at two different places on the disk.


- To recover, it is sufficient to free the modified pages and discard the current directory. The state of
the database before transaction execution is available through the shadow directory. Database can
be returned to its previous state.
- Committing a transaction corresponds to discarding the previous shadow directory.
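Here is the sketch promised above, contrasting the redo of committed work with the undo of uncommitted work using write_item-style log entries. The log contents and values are invented, and this mirrors the immediate-update (undo/redo) case where an uncommitted write already reached the disk.

    # Recovery from a log of (T, x, old_value, new_value) write entries.
    log = [("start", "T1"), ("write", "T1", "x", 5, 9),
           ("commit", "T1"),
           ("start", "T2"), ("write", "T2", "y", 3, 8)]   # T2 never committed
    db = {"x": 5, "y": 8}     # immediate update: T2's write already hit the disk

    committed = {rec[1] for rec in log if rec[0] == "commit"}

    for rec in log:                           # REDO committed work, forward order
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]               # reapply new_value
    for rec in reversed(log):                 # UNDO uncommitted work, reverse order
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]               # restore old_value

    print(db)   # {'x': 9, 'y': 3}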

Definition of Some other recovery concepts/Terms

 Page

In a computer's random access memory (RAM), a page is a group of memory cells that are accessed as
part of a single operation. That is, all the bits in the group of cells are changed at the same time. In some
kinds of RAM, a page is all the memory cells in the same row of cells. In other kinds of RAM, a page may
represent some other group of cells than all those in a row.

In computer systems that use virtual memory (also known as virtual storage), a page is a unit of data
storage that is brought into real storage (on a personal computer, RAM) from auxiliary storage (on a
personal computer, usually the hard disk) when a requested item of data is not already in real storage
(RAM). It is a fixed-length contiguous block of virtual memory.

Pages are the internal basic structure to organize the data in the database files.
 Dirty Page
When a page is read from disk into memory, it is considered a clean page because it is identical to its
equivalent on disk.
However, once the page has been modified in memory by a data modification (insert/update/delete), it
is marked as a dirty page: any page in the buffer pool that differs from its on-disk copy is a dirty page.
Put simply, a page that has been modified in the buffer cache is called a dirty page.
A dirty page is simply a page that has been changed in memory since it was loaded from disk and is now
different from the on-disk page. Dirty pages contain data that has been changed but has not yet been
written to disk.

 Cache
A cache is the part of the memory which transparently stores data so that future requests for that data
can be served faster.


We want to keep as much data as possible in memory, especially those data that we need to access
frequently. We call the technique of keeping frequently used disk data in main memory caching. A cache
is also something that has been "read" from the disk and stored for later use.

 Buffer
A buffer is a region of a physical memory storage used to temporarily hold data while it is being moved
from one place to another. Operating systems generally read and write entire blocks. Thus, reading a
single byte from disk can take as much time as reading the entire block. We call the part of main
memory where a block being read or written is stored a buffer.
The buffer keeps track of changes happening in a running program by temporarily storing them before
the changes are finally saved in the disk. A buffer is something that has yet to be "written" to disk.

A buffer pool is an area of main memory that has been allocated by the database manager for the
purpose of caching table and index data as it is read from disk.
When a row of data in a table is first accessed, the database manager places the page that contains that
data into a buffer pool. Pages stay in the buffer pool until the database is shut down or until the space
occupied by the page is required by another page.
Pages in the buffer pool can be either in-use or not, and they can be dirty or clean.
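A toy buffer pool with one dirty bit per page makes these states concrete; the page ids and contents are invented.

    # A buffer pool sketch: pages are clean on load, dirty after modification,
    # and clean again after being flushed back to disk.
    class BufferPool:
        def __init__(self):
            self.disk = {"p1": "old"}        # stands in for the database files
            self.frames = {}                 # page id -> (contents, dirty bit)

        def read(self, pid):
            if pid not in self.frames:
                self.frames[pid] = (self.disk[pid], False)   # clean on load
            return self.frames[pid][0]

        def update(self, pid, value):
            self.read(pid)
            self.frames[pid] = (value, True)                 # now a dirty page

        def flush(self, pid):
            value, dirty = self.frames[pid]
            if dirty:
                self.disk[pid] = value
                self.frames[pid] = (value, False)            # clean again

    pool = BufferPool()
    pool.update("p1", "new")
    print(pool.frames["p1"][1])   # True: changed in memory, differs from disk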

Buffer Management Policies (Buffer Pool Option)


Queries need to access data stored on disk. Some considerations with respect to data access are:
- Limited memory
- Not possible to keep all relations in memory
- Need policies to decide what pages to keep in memory
In multi-user environments, queries compete for resources such as CPU, Memory, Disk, etc. A number
of services work to ensure that data access and resources are well managed;
- Buffer pool management
- File system
- Scheduling, process management, and IPC (inter-process communication)
- Consistency control
The Buffer Manager is the software module of the DBMS whose responsibility is to serve to all the data
request and take decision about choosing a buffer and to manage page replacement.


Buffer Management Policies specify rules that govern when a page from the database cache can be
written to disk

 Steal versus No-Steal Buffer Management


A page with modifications by an uncommitted transaction is a dirty page until either commit or rollback
processing for that transaction has been completed. The buffer manager can either distinguish dirty
pages from clean pages when deciding which page to remove from the buffer pool, or it can ignore the
update status of a page.
In the latter case, the buffer manager uses a steal policy, which means pages can be written out to disk
even if the transaction having modified the pages is still active (this means writing an updated buffer
before the transaction commits or the possibility of a buffer being stolen by a new transaction).
Suppose a transaction T1 wants to read a data object X, but the working memory is full with all other
transactions' work. So T1 needs to clear some memory, which it does by kicking some other page in
working memory to stable storage. This can be dangerous, because we can't be sure that what T1 is
pushing to stable storage has been committed yet. This is known as stealing. Therefore, if a steal policy
is in effect, the changes made to an object in the buffer pool by a transaction can be written to disk
before the transaction commits.
The alternative is the no-steal policy, in which case all dirty pages are retained in the buffer pool until
the final outcome of the transaction has been determined (Buffer page updated by a transaction cannot
be written to disk before the transaction commits). This is useful for ensuring atomicity without UNDO
logging but can cause poor performance.
Advantages:
The steal policy implies that rollback of a transaction requires access to pages on disk in order to
reestablish their old state.
With the no-steal policy, no page on disk ever has to be touched when rolling back a transaction.
Consequently, no log information for UNDO procedure will be needed.
Rollback of a transaction during normal processing is also facilitated by the no-steal policy, since all
pages modified by such a transaction are simply marked "invalid" by the buffer manager. The problems
with this policy are the required size of the buffer pool and the necessity of page locking.
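As a rough illustration, the sketch below (Python; the structures and the modified_by field, recording which transaction dirtied a page, are invented for this example) shows how the eviction decision differs under the two policies.

# Eviction decision under steal vs. no-steal (toy structures).
def can_evict(page, policy, active_txns):
    if not page["dirty"]:
        return True                      # clean pages may always be evicted
    if policy == "steal":
        return True                      # dirty page may be written out early
    # no-steal: keep the page until its transaction commits or aborts
    return page["modified_by"] not in active_txns

pool = {
    "P1": {"dirty": True,  "modified_by": "T1"},
    "P2": {"dirty": False, "modified_by": None},
}
active_txns = {"T1"}                     # T1 has not committed yet
print(can_evict(pool["P1"], "steal", active_txns))     # True: the "steal"
print(can_evict(pool["P1"], "no-steal", active_txns))  # False: retained
print(can_evict(pool["P2"], "no-steal", active_txns))  # True: clean page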

 Force versus No-Force Buffer Management

Force versus no-force concerns the writing of modified pages from the buffer pool. The simple question here is:
who decides, and when, that a modified page is written out to disk? There are two basic approaches:
Force policy. At phase 1 of a transaction’s commit, the buffer manager locates all pages modified by that
transaction and writes the pages to disk. All pages updated by a transaction are immediately written to
disk before the transaction commits. It provides durability without REDO logging, but can cause poor
performance.
Forcing means that every time a transaction commits, all the affected pages will be pushed to stable
storage. This is inefficient, because each page may be written by many transactions and will slow the
system down.
No-force policy. This is the liberal counterpart. A page, whether modified or not, stays in the buffer as
long as it is still needed. Only if it becomes the replacement victim will it be written to disk. A no-force
policy is in effect if, when a transaction commits, we need not ensure that all the changes it has made to
objects in the buffer pool are immediately forced to disk.
Advantage of the force policy
It avoids any REDO recovery during restart. If a transaction is successfully committed, then, by definition,
all its modified pages must be on disk.
Why not use it as a standard buffer management policy? Because of “hotspot” pages.
The force policy simplifies restart, because no work needs to be done for transactions that committed
before the crash – it avoids REDO. The price for that is significantly more I/O for frequently modified
pages.
Another drawback is that a transaction will not be completed before the last write has been executed
successfully, and the response time may be increased significantly as a consequence. With no-force
policy, the only synchronous write operation goes to the log, and the volume of data to be written is
usually about two orders of magnitude less.
Most crash recovery uses a steal/no-force approach, accepting the risk of writing possibly uncommitted
data to disk in order to gain the speed of not forcing all commit effects to disk. This avoids the need for very
large buffer space and reduces disk I/O operations for heavily updated pages.
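To contrast the two commit-time behaviours, here is a toy sketch (Python; pool, disk and log are invented dictionaries and lists, and durability of the log itself is simplified away):

# Commit processing under force vs. no-force (toy structures).
def commit(txn_id, pool, disk, log, policy):
    log.append(("COMMIT", txn_id))       # the commit record goes to the log
    if policy == "force":
        # Write every page this transaction modified before declaring commit.
        for page_id, page in pool.items():
            if page["dirty"] and page["modified_by"] == txn_id:
                disk[page_id] = page["data"]
                page["dirty"] = False
    # Under no-force, dirty pages stay cached; REDO information in the log
    # guarantees durability if a crash occurs before they reach disk.

disk, log = {}, []
pool = {"P1": {"dirty": True, "modified_by": "T1", "data": "new"}}
commit("T1", pool, disk, log, policy="force")
print(disk)                              # {'P1': 'new'}: forced out at commit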

INTRODUCTION TO ARIES

ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics. It has the following
general characteristics:

 It uses a write-ahead log.


 It uses a steal/no-force approach (and hence requires both undo and redo).
o Steal policy - uncommitted writes may be output to disk (contrast with no-steal policy)
o No-force policy (updated pages need not be forced to disk before commit)

 It maintains various data structures to identify dirty pages in the memory buffers and the active
transactions. (Pages are dirty if they are changed but not written to disk.)
 On restart, it redoes the actions of all transactions to restore the state at the time of the failure.
 It then undoes the actions of all uncommitted transactions.
Phases of ARIES
When the recovery manager is invoked after a crash, restart proceeds in three phases:
Analysis: Identifies dirty pages in the buffer pool (i.e., changes that have not been written to disk) and
active transactions at the time of the crash by scanning through the log and other records. It determines
which transactions committed since the last checkpoint and which ones failed.
By the end of the Analysis phase, the Redo phase has the information it needs to do its job.
Redo: Repeats all actions, starting from an appropriate point in the log, and restores the database state
to what it was at the time of the crash. To REDO an action, the logged action is reapplied.
Undo: Undoes the actions of transactions that did not commit, so that the database reflects only the
actions of committed transactions.

Consider the simple execution history illustrated in the figure.

[Figure: an execution history in which T1 updates P5, T2 updates P3 and commits, and T3 updates P1 and then P3; the system then crashes, with T1 and T3 still active.]
When the system is restarted,

 The Analysis phase identifies T1 and T3 as transactions that were active (therefore not
committed) at the time of the crash, and therefore to be undone;
 T2 as a committed transaction, and all its actions, therefore, to be written to disk; and P1, P3,
and P5 as potentially dirty pages.
 All the updates (including those of T1 and T3) are reapplied in the order shown during the Redo
phase.
 Finally, the actions of T1 and T3 are undone in reverse order during the Undo phase; that is, T3’s
write of P3 is undone, T3’s write of P1 is undone, and then T1’s write of P5 is undone.
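The three phases can be made concrete with a small self-contained toy that mirrors this example (Python; the record layout is invented, and real ARIES drives each phase with LSNs, the recovery tables described below, and compensation log records):

# Toy ARIES restart over an in-memory log. Each update record carries both
# a before-image (undo information) and an after-image (redo information).
LOG = [
    # (lsn, txn, kind, page, before, after)
    (1, "T1", "update", "P5", None, "v1"),
    (2, "T2", "update", "P3", None, "v2"),
    (3, "T2", "commit", None, None, None),
    (4, "T3", "update", "P1", None, "v3"),
    (5, "T3", "update", "P3", "v2", "v4"),
]   # crash here: T2 committed, T1 and T3 still active

def restart(log):
    db = {}
    # Analysis: losers are transactions with updates but no commit record.
    committed = {t for (_, t, k, *_) in log if k == "commit"}
    losers = {t for (_, t, k, *_) in log if k == "update"} - committed
    # Redo: repeat history -- reapply every update, including the losers'.
    for (_, _, k, page, _, after) in log:
        if k == "update":
            db[page] = after
    # Undo: roll back the losers' updates in reverse log order.
    for (_, t, k, page, before, _) in reversed(log):
        if k == "update" and t in losers:
            if before is None:
                db.pop(page, None)       # page had no prior value
            else:
                db[page] = before
    return db

print(restart(LOG))   # {'P3': 'v2'}: only committed T2's effect survives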

ARIES PRINCIPLES

There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log (more on log shortly);
the record in the log must be written to stable storage before the change to the database object is
written to disk.
Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of the DBMS
before the crash and brings the system back to the exact state that it was in at the time of the crash.
Then, it undoes the actions of transactions that were still active at the time of the crash (effectively
aborting them).
Logging changes during Undo: Changes made to the database while undoing a transaction are logged in
order to ensure that such an action is not repeated in the event of repeated restarts (caused by repeated failures).
The second point distinguishes ARIES from other recovery algorithms and is the basis for much of its
simplicity and flexibility. In particular, ARIES can support concurrency control protocols that involve
locks of finer granularity than a page (e.g., record-level locks). The second and third points are also
important in dealing with operations such that redoing and undoing the operation are not exact inverses
of each other.

THE LOG

The log, sometimes called the trail or journal, is a history of actions executed by the DBMS. Physically,
the log is a file of records stored in stable storage, which is assumed to survive crashes; this durability
can be achieved by maintaining two or more copies of the log on different disks (perhaps in different
locations), so that the chance of all copies of the log being simultaneously lost is negligibly small.
The most recent portion of the log, called the log tail, is kept in main memory and is periodically forced
to stable storage. This way, log records and data records are written to disk at the same granularity
(pages or sets of pages).

Every log record is given a unique id called the log sequence number (LSN). As with any record id, we
can fetch a log record with one disk access given the LSN.
A log record is written for each of the following actions (the update action is sketched in code after the list):

 Updating a page: After modifying the page, an update type record is appended to the log tail.
The pageLSN of the page is then set to the LSN of the update log record. (The page must be
pinned in the bufferpool while these actions are carried out.)
 Commit
 Abort
 End
 Undoing an update
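Here is a toy sketch of the "Updating a page" action above (Python; the record layout and field names are invented, with before/after images standing in for the undo and redo information):

import itertools

lsn_counter = itertools.count(1)   # LSNs as monotonically increasing integers
log_tail = []                      # in-memory tail of the log

def log_update(txn_id, page, before, after):
    lsn = next(lsn_counter)
    log_tail.append({"lsn": lsn, "type": "update", "txn": txn_id,
                     "page_id": page["page_id"],
                     "before": before,      # undo information
                     "after": after})       # redo information
    page["data"] = after
    page["pageLSN"] = lsn          # the page records its most recent change
    page["dirty"] = True
    return lsn

page = {"page_id": "P5", "pageLSN": 0, "dirty": False, "data": "old"}
log_update("T1", page, before="old", after="new")
print(page["pageLSN"], log_tail[-1]["type"])   # 1 update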
OTHER RECOVERY-RELATED DATA STRUCTURES
In addition to the log, the following two tables contain important recovery-related information:
Transaction table: This table contains one entry for each active transaction. The entry contains
(among other things) the transaction id, the status, and a field called lastLSN, which is the LSN of the
most recent log record for this transaction.
The status of a transaction can be that it is in progress, is committed, or is aborted. (In the latter two
cases, the transaction will be removed from the table once certain ‘clean up’ steps are completed.)
Dirty page table: This table contains one entry for each dirty page in the buffer pool, that is, each
page with changes that are not yet reflected on disk. The entry contains a field recLSN, which is the
LSN of the first log record that caused the page to become dirty. Note that this LSN identifies the
earliest log record that might have to be redone for this page during restart from a crash.
During normal operation, these are maintained by the transaction manager and the buffer manager,
respectively, and during restart after a crash, these tables are reconstructed in the Analysis phase of
restart.
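These two tables can be pictured as plain dictionaries; the sketch below borrows the field names from the text (lastLSN, recLSN) but is otherwise invented:

transaction_table = {
    # txn id -> status and the LSN of the transaction's most recent log record
    "T1": {"status": "in progress", "lastLSN": 42},
}
dirty_page_table = {
    # page id -> recLSN: the first log record that dirtied the page
    "P5": {"recLSN": 37},
}

def on_page_dirtied(page_id, lsn):
    # Only the FIRST change sets recLSN; it marks the earliest log record
    # that might have to be redone for this page during restart.
    dirty_page_table.setdefault(page_id, {"recLSN": lsn})

on_page_dirtied("P5", 90)   # already dirty: recLSN stays 37
on_page_dirtied("P3", 91)   # newly dirty: recLSN becomes 91
print(dirty_page_table)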
THE WRITE-AHEAD LOG PROTOCOL
Before writing a page to disk, every update log record that describes a change to this page must be
forced to stable storage. This is accomplished by forcing all log records up to and including the one with
LSN equal to the pageLSN to stable storage before writing the page to disk.
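Expressed as code, the rule is a guard before every page write. The following toy sketch (invented structures; a real log manager forces records to stable storage rather than moving items between Python lists) shows the order of operations:

def flush_log_up_to(log_tail, stable_log, lsn):
    # "Force" every log record up to and including the given LSN.
    while log_tail and log_tail[0]["lsn"] <= lsn:
        stable_log.append(log_tail.pop(0))

def write_page_to_disk(page, log_tail, stable_log, disk):
    flush_log_up_to(log_tail, stable_log, page["pageLSN"])  # WAL: log first...
    disk[page["page_id"]] = page["data"]                    # ...then the page
    page["dirty"] = False

disk, stable_log = {}, []
log_tail = [{"lsn": 1, "type": "update", "page_id": "P5"}]
page = {"page_id": "P5", "pageLSN": 1, "dirty": True, "data": "new"}
write_page_to_disk(page, log_tail, stable_log, disk)
print(len(stable_log), disk)   # 1 {'P5': 'new'}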
The importance of the WAL protocol cannot be overemphasized—WAL is the fundamental rule that
ensures that a record of every change to the database is available while attempting to recover from a
crash. If a transaction made a change and committed, the no-force approach means that some of these
changes may not have been written to disk at the time of a subsequent crash. Without a record of these
changes, there would be no way to ensure that the changes of a committed transaction survive crashes.
Note that the definition of a committed transaction is effectively “a transaction whose log records,
including a commit record, have all been written to stable storage”!
When a transaction is committed, the log tail is forced to stable storage, even if a no-force approach is
being used. It is worth contrasting this operation with the actions taken under a force approach: If a
force approach is used, all the pages modified by the transaction, rather than a portion of the log that
includes all its records, must be forced to disk when the transaction commits. The set of all changed
pages is typically much larger than the log tail because the size of an update log record is close to (twice)
the size of the changed bytes, which is likely to be much smaller than the page size.
Further, the log is maintained as a sequential file, and thus all writes to the log are sequential writes.
Consequently, the cost of forcing the log tail is much smaller than the cost of writing all changed pages
to disk.
CHECKPOINTING
A checkpoint is like a snapshot of the DBMS state, and by taking checkpoints periodically, the DBMS can
reduce the amount of work to be done during restart in the event of a subsequent crash.
A checkpoint is a mechanism by which older log records are moved out of the active system and stored
permanently on disk. In its simplest form, a checkpoint declares a point before which the DBMS was in a
consistent state and all transactions had committed.
Checkpointing in ARIES has three steps (sketched in code after the list).

 First, a begin checkpoint record is written to indicate when the checkpoint starts.
 Second, an end checkpoint record is constructed, including in it the current contents of the
transaction table and the dirty page table, and appended to the log.
 The third step is carried out after the end checkpoint record is written to stable storage: A special
master record containing the LSN of the begin checkpoint log record is written to a known place
on stable storage.
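The three steps can be sketched as follows (a toy: stable_log and master_record are invented stand-ins for the log and for the known location on stable storage, and LSNs are simplified to log positions):

stable_log, master_record = [], {}

def take_checkpoint(transaction_table, dirty_page_table):
    # Step 1: a begin_checkpoint record marks when the checkpoint starts.
    begin_lsn = len(stable_log) + 1
    stable_log.append({"lsn": begin_lsn, "type": "begin_checkpoint"})
    # Step 2: an end_checkpoint record carries snapshots of the two tables.
    stable_log.append({"lsn": begin_lsn + 1, "type": "end_checkpoint",
                       "txn_table": dict(transaction_table),
                       "dirty_page_table": dict(dirty_page_table)})
    # Step 3: once the end record is stable, point the master record
    # (a known place on stable storage) at the begin record.
    master_record["begin_checkpoint_lsn"] = begin_lsn

take_checkpoint({"T1": {"status": "in progress", "lastLSN": 42}},
                {"P5": {"recLSN": 37}})
print(master_record)   # {'begin_checkpoint_lsn': 1}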
While the end checkpoint record is being constructed, the DBMS continues executing transactions and
writing other log records; the only guarantee we have is that the transaction table and dirty page table
are accurate as of the time of the begin checkpoint record.
This kind of checkpoint is called a fuzzy checkpoint and is inexpensive because it does not require
quiescing* the system or writing out pages in the buffer pool (unlike some other forms of
checkpointing). On the other hand, the effectiveness of this checkpointing technique is limited by the
earliest recLSN of pages in the dirty pages table, because during restart we must redo changes starting
from the log record whose LSN is equal to this recLSN. Having a background process that periodically
writes dirty pages to disk helps to limit this problem.
When the system comes back up after a crash, the restart process begins by locating the most recent
checkpoint record. For uniformity, the system always begins normal execution by taking a checkpoint, in
which the transaction table and dirty page table are both empty.
* To quiesce is to pause or alter a device or application to achieve a consistent state, usually in
preparation for a backup or other maintenance.

MEDIA RECOVERY

Media recovery is most often used to recover from media failure, such as the loss of a file or disk, or a
user error, such as the deletion of the contents of a table. Media recovery can be a complete recovery
or a point-in-time recovery.
Media recovery is based on periodically making a copy of the database. Because copying a large
database object such as a file can take a long time, and the DBMS must be allowed to continue with its
operations in the meantime, creating a copy is handled in a manner similar to taking a fuzzy checkpoint.
When a database object such as a file or a page is corrupted, the copy of that object is brought up-to-
date by using the log to identify and reapply the changes of committed transactions and undo the
changes of uncommitted transactions (as of the time of the media recovery operation).
What is the difference between media recovery & crash recovery?
Media recovery is a process to recover a database from backup when a physical disk failure occurs.
Crash recovery is an automated process taken care of by the DBMS when an instance failure occurs, i.e. when
there is a failure with an instance of the database.

DISTRIBUTED AND PARALLEL DATABASE SYSTEMS (week 5-6)


In recent years, distributed and parallel database systems have become important tools for data-intensive
applications. The prominence of these databases is rapidly growing due to organizational and
technical reasons, and there are many problems in centralized architectures.
In a centralized database:

 Data is located in one place (one server)


 All DBMS functionality is handled by that server; this includes enforcing the ACID properties of
transactions, concurrency control, recovery mechanisms, answering queries, etc.
Distributed databases have become a solution to those complications. Parallel databases are designed to
increase performance and availability; they enhance throughput, response time and flexibility.
DISTRIBUTED DBMS (DDBMS)
A Distributed Database Management System permits a user to access and manipulate data from
different databases that are distributed over several sites. In a distributed database system architecture,
sites are organized as specialized servers instead of general-purpose computers. In a distributed
environment, we use different servers for specific purposes, such as application servers and database servers.
A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database as if it were all stored in a single location.
For example, a bank may implement its database system on different computers. The computer systems
are located at different branches, but network links enable communication between them. The difference
between a DBMS and a DDBMS is that a local DBMS accesses a single site, whereas
a DDBMS accesses several sites.
Distributed DBMS should have at least the following components.

 Network software and hardware


 Computer workstations
 Communication media
 Transaction processor
 Data Manager
Factors Encouraging DDBMS

 Distributed Nature of Organizational Units


 Need for Sharing of Data
 Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical
Processing (OLAP)
 Database Recovery − Replication of data automatically helps in data recovery if database in any
site is damaged.
 Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.
Advantages of Distributed Databases
o Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed system,
with no interruption in current functions.
o More Reliable − In case of database failures, the total system of centralized databases comes to
a halt. However, in distributed systems, when a component fails, the functioning of the system
continues, possibly at reduced performance. Hence a DDBMS is more reliable.
o Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized systems,
all queries have to pass through the central computer for processing, which increases the
response time.
o Lower Communication Cost − In distributed database systems, if data is located locally where it
is mostly used, then the communication costs for data manipulation can be minimized. This is
not feasible in centralized systems.
o Management of distributed data with different levels of transparency: hardware, operating
system, network and location transparency.
Disadvantages of Distributed Database
o Complexity: DBAs may have to do extra work to ensure that the distributed nature of the system
is transparent. Extra work may be required to maintain multiple unrelated systems,
instead of one big one.
o Economics: Increased complexity and a more extensive infrastructure means extra labour costs.
o Security: Remote database fragments must be secured, and they are not centralized so the
remote sites must be secured as well.
o Difficult to Maintain Integrity: In a distributed database, enforcing integrity over a network may
require too much of the network's resources to be feasible.
o Lack of Standards: There are no tools or methodologies yet to help users convert a centralized
DBMS into a distributed DBMS.
o Additional software is required

Types of Distributed Database

o HOMOGENEOUS

In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests
(there is transparency).
• The database is accessed through a single interface as if it is a single database.
There are two types of homogeneous distributed database −
• Autonomous − Each database is independent that functions on its own. They are integrated by a
controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

o HETEROGENEOUS

In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in processing user
requests (no transparency).

Distributed DBMS Architectures


There are three alternative approaches to separating functionality across different DBMS-related
processes; these alternative distributed DBMS architectures are called

 Client-Server
 Collaborating Server
 Middleware.
Client-Server Systems
A Client-Server system has one or more client processes and one or more server processes, and a
client process can send a query to any one server process. Clients are responsible for user-
interface issues, and servers manage data and execute transactions.
Thus, a client process could run on a personal computer and send queries to a server running on
a mainframe.
Collaborating Server Systems
The Client-Server architecture does not allow a single query to span multiple servers because the
client process would have to be capable of breaking such a query into appropriate subqueries to
be executed at different sites and then piecing together the answers to the subqueries.
The client process would thus be quite complex, and its capabilities would begin to overlap with
the server; distinguishing between clients and servers becomes harder.
Eliminating this distinction leads us to an alternative to the Client-Server architecture: a
Collaborating Server system. We can have a collection of database servers, each capable of
running transactions against local data, which cooperatively execute transactions spanning
multiple servers.
When a server receives a query that requires access to data at other servers, it generates
appropriate subqueries to be executed by other servers and puts the results together to
compute answers to the original query. Ideally, the decomposition of the query should be done
using cost-based optimization, taking into account the costs of network communication as well
as local processing costs.
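As a toy illustration (all names invented; the "decomposition" here is simply running the same selection at every site over horizontally partitioned data, whereas real systems plan subqueries with a cost-based optimizer):

class RemoteServer:
    """Stand-in for another database site holding part of the data."""
    def __init__(self, rows):
        self.rows = rows
    def run_subquery(self, predicate):
        return [r for r in self.rows if predicate(r)]

def run_query(predicate, local_rows, remote_servers):
    # Run the local part, ship subqueries to the other sites, merge results.
    partials = [[r for r in local_rows if predicate(r)]]
    for server in remote_servers:
        partials.append(server.run_subquery(predicate))
    return [row for part in partials for row in part]

print(run_query(lambda r: r > 10, [5, 12], [RemoteServer([8, 20])]))  # [12, 20]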
Middleware Systems
The Middleware architecture is designed to allow a single query to span multiple servers,
without requiring all database servers to be capable of managing such multisite execution
strategies. It is especially attractive when trying to integrate several legacy systems, whose basic
capabilities cannot be extended.
The idea is that we need just one database server that is capable of managing queries and
transactions spanning multiple servers; the remaining servers only need to handle local queries
and transactions.
We can think of this special server as a layer of software that coordinates the execution of
queries and transactions across one or more independent database servers; such software is
often called middleware. The middleware layer is capable of executing joins and other relational
operations on data obtained from the other servers, but typically, does not itself maintain any
data.
PARALLEL DATABASE SYSTEMS
A parallel DBMS improves performance by parallelizing various operations: loading data, indexing, and
query evaluation. Data may be distributed, but purely for performance reasons. In a parallel database
system, operations are parallelized to enhance the performance of the architecture:
• divide a big problem into many smaller ones to be solved in parallel
• Increase bandwidth (in our case decrease queries’ response time)
In practice, there are situations where centralized systems are not flexible enough to handle some
applications.
The architectures related to Parallel DBMS are

 Shared memory: In this architecture, a common global memory is shared by all processors. Any
processor has access to any memory module.

 Shared disk: All processors have private memory (not accessible by others), but direct access to
all disks in the system. The number of disks does not necessarily match the number of
processors.

 Shared nothing: Each processor has exclusive access to its own main memory and disk unit. In
this architecture, each memory/disk pair owned by a processor acts as a server for its data.
It is the most common architecture nowadays.

Types of Parallelism:
1. Data-partitioned parallelism (Intra-operation): the input data is partitioned and we work on
each partition in parallel; a single task is divided over all machines to run in parallel (see the
sketch after this list).
2. Pipe-Lined Parallelism (Interoperation): Execution of different operations in pipe-lined
fashion, one operator consumes the output of another operator. For instance, if we need to
join three tables, one processor may join two tables and send the result set records as and
when they are produced to the other processor. In the other processor the third table can be
joined with the incoming records and the final result can be produced.
It involves ordered (or partially ordered) tasks and different machines are performing
different tasks.
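Here is a small self-contained sketch of data-partitioned parallelism using Python's standard library: the input is split into partitions, worker processes scan their partitions in parallel, and the partial counts are combined.

from concurrent.futures import ProcessPoolExecutor

def count_even(partition):
    # Each worker scans only its own partition.
    return sum(1 for row in partition if row % 2 == 0)

def parallel_count(rows, n_workers=4):
    size = (len(rows) + n_workers - 1) // n_workers
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(count_even, partitions))

if __name__ == "__main__":
    print(parallel_count(list(range(1_000_000))))   # 500000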

Advantages of Parallel Databases

1) Capacity: A parallel database allows a large online trader to have thousands of users accessing
information at the same time.
2) Speed: The server breaks up a user database request into parts and sends each part to a separate
computer. The computers work on the parts concurrently and combine the results, passing them back to
the user. This speeds up processing, allowing faster access to very complex databases.
3) Reliability: A parallel database, properly configured, can continue to work in spite of the failure
of any computer in the cluster.
Disadvantages of Parallel Database
1) Programming to target a parallel architecture is a bit difficult, but with proper understanding and
practice you are good to go.
2) Various code alterations have to be performed for different target architectures for improved
performance.
3) Communication of results might be a problem in certain cases.
4) Power utilization is huge with multi-core architectures.
5) Also, better cooling technologies are required in the case of clusters.

NB: Distributed processing usually implies parallel processing (not vice versa); you can have parallel
processing on a single machine.
Assumptions about architecture
Parallel Databases
• Machines are physically close to each other, e.g., same server room
• Machines connects with dedicated high-speed LANs and switches
• Communication cost is assumed to be small
• Can be shared-memory, shared-disk, or shared-nothing architecture

Distributed Databases
• Machines can be far from each other, e.g., on different continents
• Can be connected using a public network, e.g., the Internet
• Communication costs and problems cannot be ignored
• Usually shared-nothing architecture

COMPUTER VIRUS (week 7)

Meaning of computer virus

A computer virus is a piece of code that spreads from one computer to another by attaching itself to
other files through a process called self-replication. In other words, the computer virus spreads by itself
into other executable code or documents. The code in the virus usually executes when the file it is
attached to is opened.
The purpose of creating a computer virus is to infect vulnerable systems, gain admin control and steal
user sensitive data. Hackers design computer viruses with malicious intent and prey on online users by
tricking them.
Other forms of malicious software (malware) are worms, adware, spyware, Trojans, ransomware, logic
bombs, etc. If you own or use a computer, you are vulnerable to malware. Computer viruses are
deployed every day in an attempt to wreak havoc, whether it be by stealing your personal passwords, or
as weapons of international sabotage.
VIRUS is said to be an acronym meaning Vital Information Resource Under Siege.
How does a computer virus operate?
A computer virus operates in one of two ways. The first kind begins to replicate as soon as it lands on a
new computer. The second type lies dormant until a trigger kick-starts the malicious code; in other words,
the infected program needs to be run for the virus to execute. It is therefore very important to stay
protected by installing a robust antivirus program.

Types of computer virus

Any user who has ever been infected can tell you that computer viruses are very real. These programs
are typically distributed from host to host via email or a website that has been compromised. Some are
even attached to legitimate files and unknowingly executed by a user when they launch a particular
program. A virus is much more than the commonly perceived malicious code that functions with the
intent to destroy. They are classified by type, origin, location, files infected and degree of damage.
These common attributes apply to most viruses, and all can have an adverse effect on your operating
system.
Computer viruses come in different forms and infect the system in different ways. Some of the most
common viruses are:

 Boot Sector Virus


This type of virus infects the master boot record. Removing it is a challenging and complex task that
often requires the system to be formatted. It spreads mostly through removable media.

 Macro viruses (attack on document virus)

It is written in a macro language and infects Microsoft Word or similar applications (e.g., word
processors and spreadsheet applications), causing a sequence of actions to be performed
automatically when the application is started or something else triggers it. As the name suggests,
macro viruses particularly target macro language commands in applications like Microsoft Word;
the same applies to other programs too.

In MS Word, macros are keystrokes or saved sequences of commands that are embedded in documents.
Macro viruses are designed to add their malicious code to the genuine macro sequences in a Word file.
However, as the years went by, more recent versions of Microsoft Word disabled macros by default. Thus,
cybercriminals started to use social engineering schemes to target users: they trick the user into enabling
macros, which launches the virus.

 Executable virus (File infectors)

An executable virus is a non-resident computer virus that stores itself in an executable file and infects
other files each time the file is run. The majority of all computer viruses are spread when a file is
executed or opened. A non-resident virus is a computer virus that does not store or execute itself from
the computer memory. Executable viruses are an example of a non-resident virus.
A few file infector viruses come attached to program files, such as .com or .exe files. Some file infector
viruses infect any program for which execution is requested, including .sys, .ovl, .prg, and .mnu files.
Consequently, when the particular program is loaded, the virus is also loaded.
Besides these, the other file infector viruses come as a completely included program or script sent in
email attachments.

 Multipartite Virus
This type of virus spreads in multiple ways. It infects both the boot sector and executable files at
the same time.

 Polymorphic Virus
This type of virus is difficult to identify with a traditional anti-virus program, because a polymorphic
virus alters its signature pattern whenever it replicates.
More and more cybercriminals are depending on the polymorphic virus. It is a malware type which has
the ability to change or mutate its underlying code without changing its basic functions or features. This
helps the virus on a computer or network to evade detection from many antimalware and threat
detection products.
Since virus removal programs depend on identifying signatures of malware, these viruses are carefully
designed to escape detection and identification. When security software detects a polymorphic virus,
the virus modifies itself so that it is no longer detectable using the previous signature.

 Overwrite Virus
This type of virus deletes all the files that it infects. The only possible way to remove it is to delete
the infected files, and the end-user has to lose all the contents in them. Identifying an overwrite virus is
difficult, as it spreads through emails.
Virus design purposes vary, and overwrite viruses are predominantly designed to destroy a
file or application's data. As the name says it all, after attacking the computer the virus starts
overwriting files with its own code. Not to be taken lightly, these viruses can target
specific files or applications or systematically overwrite all files on an infected device.
On the flip side, the overwrite virus is capable of installing new code in the files or applications it infects,
programming them to spread the virus to additional files, applications, and systems.

 Spacefiller Virus

These are also called "cavity viruses", so named because they fill up the empty spaces between sections
of code and hence do not cause any damage to the file.

Example of viruses:

- Sleeper
- Alabama virus
- Christmas virus
- Friday the 13th
- ILOVEYOU (ILOVEYOU is one of the most well-known and destructive viruses of all time)
- MyDoom
- Storm Worm
- Melissa virus

Sources of viruses:
 Downloading Programs
Downloadable program files are among the commonest sources of malware, such as freeware,
worms, and other executable files. Whether you download an image-editing program, a music file or an
e-book, it is important to ensure the reliability of the source of the media. Unknown, new or less
popular sources should be avoided.

 Pirated or Cracked Software


Are you aware of software cracking? Every time you open cracked software, your antivirus
software might flag it as malware, as cracks often consist of malicious scripts. Always say "No" to cracks,
as they can inject malicious scripts into your PC.

 Email Attachments
Anyone can send you an email attachment whether you know them or not. Clicking on unknown links or
attachments can harm your device. Think twice before clicking anything and make sure that file type is
not ‘.exe’.

 Internet
One of the easiest ways to get a virus on your device is through the Internet. Make sure to check the URL
before accessing any website; for a secure URL, always look for 'https' in it. For example, when you
click videos published on social media websites, they may require you to install a particular type of plug-
in to watch that video. But in reality, these plug-ins might be malicious software that can steal your
sensitive information.

 Booting Data from Unknown CDs


Malicious software can get into your device through an unknown CD. A good practice to stay safe from
malicious infection is to remove the CD when it is not in use; your system could boot from the
CD if it is not removed before switching off the computer.

 Bluetooth
Bluetooth transfers can also infect your system, so it is crucial to know what type of media file is being
sent to your computer whenever a transfer takes place. An effective armor would be to allow Bluetooth
connectivity with only known devices and activate it only when required.

 Unpatched Software
Often overlooked, unpatched software is also a leading source of virus infection (unpatched software
means there are vulnerabilities in a program or code that a company is aware of and will not or cannot
fix). Security holes in software are exploited by attackers and remain unknown to software makers until
the attackers exploit them in the form of zero-day attacks. It is therefore recommended to install
software updates as soon as they become available on your PC.
 Infected diskettes; infected CD-ROMS;
 illegal duplication of Software, etc.

Apart from the above-mentioned sources, file-sharing networks can also be a source of computer virus
attacks. Therefore, use PC security software to keep your device safe and secure from malicious
attempts.

Virus Warning Signs

It is vital for any computer user to be aware of these warning signs –


• Slower system performance / slowing down of response time
• Pop-ups bombarding the screen / tiny dots wandering across the screen
• Programs running on their own
• Files multiplying/duplicating on their own
• New files or programs appearing on the computer
• Files, folders or programs getting deleted or corrupted
• Incomplete saving of files
• Corruption of the system set-up instructions
• Appearance of strange characters
• Unusual or constant sounds from the hard drive

If you come across any of these above-mentioned signs, then there is a chance that your computer is
infected by a virus or malware. Do not delay: immediately stop all the commands and download an
antivirus software. If you are unsure what to do, get the assistance of authorized computer
personnel.

How to help protect against computer viruses?

How can you help protect your devices against computer viruses? Here are some of the things you can
do to help keep your computer safe.

 Use a trusted antivirus product, such as Norton AntiVirus Basic, and keep it updated with the
latest virus definitions. Norton Security Premium offers additional protection for even more
devices, plus backup
 Avoid clicking on any pop-up advertisements.
 Always scan your email attachments before opening them.
 Always scan the files that you download using file sharing programs.

Virus detection (Antivirus)

You can take two approaches to removing a computer virus. One is the manual do-it-yourself approach.
The other is by enlisting the help of a reputable antivirus program.
Antivirus software was originally developed to detect and remove computer viruses, hence the name.
However, with the proliferation of other kinds of malware, antivirus software started to provide
protection from other computer threats.
Antivirus software is practically a requirement for anyone using the Windows operating system. While
it's true you can avoid computer viruses if you practice safe habits, the truth is that the people who
write computer viruses are always looking for new ways to infect machines. There are several different
antivirus programs on the market -- some are free and some you have to purchase. Keep in mind that
free versions often lack some of the nicer features you'll find in commercial products.
Some examples of antivirus;
- Norton Anti-virus
- McAfee VirusScan
- Dr. Solomon's Toolkit, etc.
- Kaspersky
- Avast Antivirus
- Panda Cloud Antivirus
- Microsoft Security Essentials
- Avira AntiVirus
- AVG Anti-Virus
- Comodo Antivirus
- Immunet Protect
- PC Tools AntiVirus
- Bitdefender Family Pack
- Trendmicro
- Norton 360
- Watchdog
- ESET

Assuming your antivirus software is up to date, it should detect malware on your machine. Most
antivirus programs have an alert page that will list each and every virus or other piece of malware it
finds. You should write down the names of each malware application your software discovers.
Many antivirus programs will attempt to remove or isolate malware for you. You may have to select an
option and confirm that you want the antivirus software to tackle the malware. For most users, this is
the best option -- it can be tricky removing malware on your own.
If the antivirus software says it has removed the malware successfully, you should shut down your
computer, reboot and run the antivirus software again. This time, if the software comes back with a
clean sweep, you're good to go. If the antivirus software finds different malware, you may need to
repeat the previous steps. If it finds the same malware as before, you might have to try something else.

CAREER OPTIONS IN DATA PROCESSING (week 8-10)

INTRODUCTION
• After having learnt the rudiments of data processing for several months, you should know the
career options available.
• As computers and technology continue to become the cornerstone for just about every business,
data processors will be in constant demand to help corporations, individuals, and government
offices adapt and more effectively use technology in the office and in the home.
• From creating computer networks within a company that allow offices to share files and data, to
working as a computer service administrator, data processing majors will be invested with a wide
array of computer and office skills that have real practical applications to the job market.
CAREER OPTIONS
The career options for computer graduates can be classified into different categories (Some of these
professionals have similar functions):
1. Programming & Software dev.
2. Information Systems Operation and Management
3. Telecoms and Networking
4. Computer Science Research
5. Web and Internet
6. Graphics & Multimedia
7. Training and Support
8. Computer Industry Specialists
Some careers require additional training or study and experience or working in the field.
1. Programming & Software development
Computer programmers of any kind write and test code that allows computer applications and software
programs to function properly. They turn the program designs created by software developers and
engineers into instructions that a computer can follow.
a) System Analyst
Computer systems analysts study an organization’s current computer systems and procedures and
design information systems solutions to help the organization operate more efficiently and effectively.
They bring business and information technology (IT) together by understanding the needs and
limitations of both.

b) System Consultant
The systems consultant reviews a firm's internal processes and aids the customer
network department and IT staff in providing initial technical support to end-users. They lead or
participate in projects that apply technology solutions to business problems.

c) Software Engineer
A software engineer is a person who applies the principles of software engineering to the design,
development, maintenance, testing, and evaluation of computer software.
Computer software engineers apply engineering principles and systematic methods to develop
programs and operating data for computers. They follow the SDLC (Software Development Life Cycle)
phases in developing software Systems.
d) Systems Programmer
A systems programmer engages in the activity of programming computer system software.
The primary distinguishing characteristic of systems programming when compared to application
programming is that application programming aims to produce software which provides services to the
user directly (e.g. word processor), whereas systems programming aims to produce software and
software platforms which provide services to other software, are performance constrained, or both (e.g.
operating systems, computational science applications, game engines, industrial automation, and
software as a service applications).
Most programmers are application programmers, in contrast with systems programmers.
e) Database Analyst
A person responsible for analyzing data requirements within an organization and modeling the data and
data flows from one department to another. Formerly called a "data administrator," the database
analyst may also perform "database administration" functions, which deal with the particular databases
employed.
f) Artificial intelligence (AI) Programmer
An artificial intelligence programmer helps develop operating software that can be used for robots,
artificial intelligence programs or other artificial intelligence applications. They may work closely with
electrical engineers or robotics engineers and others in order to produce systems that utilize artificial
intelligence.
This refers to the capability of a system to adapt or change as data is added. It may also mean
programming a system to look for or seek out specific conditions and respond based on those factors.
For example, their programming may enable robots to learn to interact with other robots or work
together collaboratively. Other systems they program may be designed to take specific actions only
under certain conditions.
g) Scientific Application Programmer
An individual who writes scientific application programs.
In computer programming, a scientific language is a programming language optimized for the use of
mathematical formulas and matrices. Although these functions can be performed using any language,
they are more easily expressed in scientific languages.
h) UI (User Interface) Designer

User Interface Design is a crucial subset of UX (User eXperience). User interface (UI) design is the
process of making interfaces in software or computerized devices with a focus on looks or style.
Designers aim to create designs users will find easy to use and pleasurable. UI design typically refers to
graphical user interfaces but also includes others, such as voice-controlled ones.
The role is one part Graphic Designer and one part behaviorist. UI Designers figure out the steps
consumers will use when accessing technology, then design models that shorten or streamline the steps
in the process to create a better user experience.
i) Embedded Systems Application Programmer
An embedded system is a controller with a dedicated function within a larger mechanical or electrical
system, often with real-time computing constraints. It is embedded as part of a complete device often
including hardware and mechanical parts.
2. Information Systems Operation and Management.

a) EDP (Electronic Data Processing) Auditor


A person who analyses system functions and operations to determine adequate security and controls.
An EDP analyst evaluates systems and operational procedures and reports findings to senior
management.
b) DBA
A database administrator is responsible for the performance, integrity and security of a database.
However, depending on the organization and the level of responsibility, the role can vary from inputting
information through to total management of data.
c) System Administrator
Computer systems administrators install, maintain, and support an organization's information
technology systems. They test system components to ensure that computers, software, and network
equipment function seamlessly together.
Systems administrators may be in charge of the company's LAN, WAN, intranet or Internet systems.
Some administrators focus on specialist roles such as network security, IT audit, or system upgrade
research.

d) Management/ IT consultants/ Computer Manager


Management consulting, generally, is the practice of helping organizations to improve their
performance.

Information technology (IT) management consultants analyze the technology needs of organizations and
then make computer systems recommendations. They are mostly involved in decision making.
3. Telecommunications and Networking
a) Network Engineer/Consultant
The Network Consultant is an experienced and educated professional who certifies network
functionality and performance. They are responsible for designing, setting up and maintaining computer
networks at either an organization or client location.
Consultants meet with the organization's managers and network engineers to discuss networking
requirements.
b) Network administrator
The same as a Systems Administrator. Network and computer systems administrators are responsible
for the day-to-day operation of these networks. They organize, install, and support an organization's
computer systems, including local area networks (LANs), wide area networks (WANs), network
segments, intranets, and other data communication systems
4. Computer Science Research
a) Computer Scientist/Researcher
A computer and information research scientist is an expert in the field of computer science, usually
holding a PhD or professional degree. These scientists use the collective knowledge of the field of
computer science to solve existing problems and devise solutions to complex situations.
They invent and design new approaches to computing technology and find innovative uses for existing
technology. They study and solve complex problems in computing for business, science, medicine, and
other fields.
Computer scientists are often hired by software publishing firms, scientific research and development
organizations where they develop the theories that allow new technologies to be developed. Computer
scientists are also employed by educational institutions such as universities.
b) Computer Science Professor
Computer Science Professors teach courses in computer science. They may specialize in a field of computer
science, such as the design and function of computers or operations and research analysis. The category
includes both teachers primarily engaged in teaching and those who do a combination of teaching and research.

c) AI Researcher
An AI researcher carries out research involving reasoning, knowledge representation, planning, learning,
natural language processing, perception and the ability to move and manipulate objects. General
intelligence is among the field's long-term goals.
d) Data Miner

Data mining involves exploring and analyzing large blocks of information to gather meaningful patterns
and trends; it involves discovering patterns in large data sets.
The Data Miner/Data Mining Specialist's role is to design data modeling/analysis services that are used
to mine enterprise systems and applications for knowledge and information that enhances business
processes.
e) Bioinformatics Specialist
Bioinformatics specialists are computer scientists who apply their knowledge to the management of
biological and genomic data. They build databases to contain the information, write scripts to analyze it,
and queries to retrieve it.
Bioinformatics scientists conduct research to study huge molecular datasets including DNA, microarray,
and proteomics data.
5. Web and Internet
a) Web/Internet Applications programmer
Internet/Web Application Programming focuses on systems that are used over the Internet or an
intranet. A web application is a computer program that utilizes web browsers and web technology to
perform tasks over the Internet.
Web/Internet Applications programmer creates these programs.
b) Internet Consultant
Internet consultants use their technological and computer skills to help people or businesses access and
utilize the Internet. Their work may include implementing or refining a networking system, creating a
Web site, establishing an online ordering or product support system, or training employees to maintain
and update their newly established Web site. Some consultants work independently, and others may be
employed by a consulting agency.
c) Web developer/Webmaster
Creates or maintains a Web site. Provides content and programming or supervises writers and
programmers. Monitors the performance and popularity of the site. Provides secure forms and
transactions for Internet-based businesses.
Web developers assess the needs of users for information-based resources. They create the technical
structure for websites and make sure that web pages are accessible and easily downloaded through a
variety of browsers and interfaces.
Web developers structure sites to maximize the number of page views and visitors through search
engine optimization. They must have the communication ability and creativity to make sure the website
meets its user's needs.
d) Digital/Internet Advertising Designer
This professional designs online ads for businesses and organizations using tools like cookies, search
engine marketing, email ads, banner ads, blogs, social network ads and more.

6. Graphics & Multimedia


a) Animation/Special Effects Developer
Special effects animation is a sub-field of the graphic arts industry that requires artistic skill and
technical proficiency.
Special effects animators/artists create realistic-looking imagery for movies, mobile apps, multimedia
presentations and video games, among other productions and products. They create mechanical, optical
and computer-generated effects that are used in video games or for television shows, music videos or
movies. They must be skilled with computer programs that generate effects and need the ability to work
under pressure and meet deadlines
b) Multimedia Developer
Multimedia covers a variety of communications delivered in a number of ways. A multimedia developer
designs, creates, manipulates and tailors graphics, images, sound, animation, video and text to create
integrated multimedia programs. A multimedia developer is able to combine design skills with technical
knowledge to create products such as CD ROMs, DVDs and websites
c) Computer Game Designer
A game designer is a person who designs gameplay, conceiving and designing the rules and structure of
a game. The skills of a game designer are drawn from the fields of computer science and programming,
creative writing and graphic design. Many designers start their career in testing departments, other
roles in game development or in classroom conditions, where mistakes by others can be seen first-hand.
d) Electronic Sound Producer
Uses electronic or digital instruments, computers, electronic effects units, software or digital audio
equipment to make, perform or record sound/music.
7. Training and Support
a) Technical Support Rep/Help Desk Rep
Technical support /Help Desk representatives answer incoming phone calls and provide support to
callers experiencing computer problems of all kinds. They listen to descriptions of customer issues and
determine how and if they can be fixed.

b) Trainer/Computer Instructor or Educator


A computer instructor/educator is an education professional responsible for teaching
computer programming or usage skills to students in schools (including basic schools and tertiary
institutions) and training centers. In this career, the duties include developing
classroom lesson plans, delivering lectures and information to a class, and working with students on a
one-on-one basis.
They teach a wide range of computer skills to the students/trainees.

c) Technical Writer
A technical writer is a professional writer who communicates complex information. They create
technical documentation that includes things like instruction manuals, user manuals, quick reference
guides, and white papers. They may also create more common types of content including social media
posts, press releases, and web pages.
Essentially, technical writers break down complex technical products into easy-to-understand guides
that help the end-user understand how to use the products and services.
d) Computer Operator
A Computer Operator is responsible for the technical operation of the computer system. They resolve
user problems by answering questions and requests.
8. Computer Industry Specialists
a) System Integrator
Abbreviated as SI, an individual or company that specializes in building complete computer systems by
putting together components from different vendors. Unlike software developers, systems integrators
typically do not produce any original code. Instead they enable a company to use off-the-shelf hardware
and software packages to meet the company's computing needs.
b) IT Recruitment Consultant
IT Recruitment consultants are responsible for attracting candidates for IT jobs and matching them to
temporary or permanent positions with client companies. They look for and discover talents.
c) IT Sales Professional
A sales professional is someone who sells products or services to potential customers. They seek to
solve prospects' challenges through the products they sell. Great sales professionals will have strong
selling and communication skills.
The role of an IT Sales Professional falls into three categories; pre-sales, sales and post-sales support.
d) Journalist, Computer-Related Publicist

Practices journalism in IT only. Publicists work as the bridge between their customers and the media.
They represent their clients by managing the media's perception of them.

QUALITIES OF A GOOD PROFESSIONAL

 Excellent Analytical Skills


Great computer professionals have excellent analytical skills that can be applied to solve problems or
develop new ideas.

 An Attention to Detail
The slightest mistake can affect how a web page looks or how a program runs. Computer personnel
must pay close attention to detail to ensure everything works correctly and efficiently.

 A Commitment to Learning
Technology is constantly changing, and those who keep abreast of the latest developments in
information technology are the ones who will be the most successful.

 Good Communication Skills


The soft skills of verbal and written communication are increasingly important as non-techies rely on
technological tools for their everyday business. Understanding a client's needs and the ability to meet
those needs depend heavily on a steady stream of open communication.

 An Aptitude or flair for Math


Strong math skills are necessary because math is used in many computer applications, such as when
dealing with circuits or programming.

 The Ability to Learn & Memorize Programming Languages


Computer professionals must know many programming languages and how to use a wide variety of
computer software programs. A great memory helps keep work efficient.

 An Ability to Handle Multitasking


People working with computers are often involved in many tasks at once and must be able to manage all
of their responsibilities simultaneously. Time management skills and an ability to prioritize are assets as
well.

 Solid Problem Solving/Troubleshooting Capabilities


Computer professionals are called upon to solve problems with networks, software and other
programs. They are expected to solve these problems very quickly, and sharp troubleshooting
skills are definitely a benefit.

 Technical Writing Skills


Technical writing skills help a computer-savvy person explain complex concepts to those who have
limited knowledge of the computer world.

 Versatility
The most successful computer professionals will be the ones who have skills that extend beyond
information technology, such as skills in business and finance.

COMPUTER PROFESSIONAL BODIES


Meaning of Professional bodies
The main role of a professional body is to promote and support the particular profession by protecting
the interests of the professionals themselves and also protecting the public interest. Some professional
bodies such as those that set standards for professional competencies may heavily focus on protecting
the public interest. Other professional bodies, such as trade unions, may choose to focus mainly on
members' rights.


In some cases, membership of a professional body is a prerequisite to working in that particular
profession. A professional body may also lobby authorities on behalf of its members, and most of them
provide the public with information regarding their fields. Professional bodies may also provide services
to their members.
COMPUTER PROFESSIONAL BODIES AND THEIR FUNCTIONS
a. The Nigerian Computer Society (NCS)
It is a professional body for computer professionals in Nigeria, encompassing practitioners in the
Information Technology industry, interest groups and other stakeholders in computing. It was
established in 1978 and was then known as the Computer Association of Nigeria (COAN). The name was
changed to Nigerian Computer Society in 2002, when the association was harmonized with other
interest groups and stakeholders. NCS, as a professional body, is a national
platform that helps in the advancement of the science and practice of information technology in Nigeria.

Functions of the NCS


• Promoting the education of Computer Engineers, Computer and Information Scientists,
Information Technology and Systems Professionals, and Information Architects in Nigeria.
• Actively encouraging and promoting research work, and helping to disseminate the results of
various scientific works.
• Promoting the exchange of information about the art and science of information processing and
its management among computer and information professionals and the public.
• Promoting the development of competence and integrity among its members.
• Protecting and promoting the professional interests of registered members.

b. Computer Professionals Registration Council of Nigeria (CPN)


The Computer Professionals Registration Council of Nigeria (CPN) was formed in 1993 by Decree No. 49
of 1993. The decree was promulgated on the 10th of June and published on the 9th of August the same
year. This corporate body is vested with the power to control and supervise the computing profession in
Nigeria.
Functions of the Computer Professionals Registration Council of Nigeria (CPN)
• Determining the standards of knowledge and skill that must be possessed by anyone
going into the computing profession, and reviewing and improving those standards from time to time.


• Establishing and maintaining a register of professionals entitled under the decree to practice the
profession of computing in Nigeria, and publishing a list of registered persons from time to time.
• Carrying out every other function granted to it by the provisions of the decree, which include
organizing and controlling the practice of computing in the country.
• Supervising the computing profession in Nigeria; screening all individuals who want to be
registered as computer professionals; screening and registering all corporate organizations that are
involved, or want to be involved, in selling or using computing facilities and in providing professional
computing services in Nigeria; and maintaining high standards of professional ethics, professionalism
and discipline.
• Determining the academic standards of computing programmes/degrees such as computer
engineering, computer science and information science; accrediting degree-awarding institutions and
their courses; evaluating certificates in computing; and conducting professional examinations in
conjunction with associations and bodies external to the council.
• Publicizing the activities of the council, and publishing professional computing works such as
books, journals, magazines and newsletters.
Note: The above are the two major professional bodies for professionals in the computing, information
technology and systems industry in Nigeria. Other associations include the following.

c. Information Technology (Industry) Association of Nigeria (ITAN)

ITAN is an association of over 350 Information Technology-driven companies in Nigeria. It was founded
in 1991 to promote IT literacy and penetration in Nigeria, and to promote members’ interests in the
areas of trade, public policy formulation and negotiations with government on IT policy matters.
ITAN keeps its members informed about ongoing trends and issues relevant to the industry.
Its services include:

 Public policy making
 Sensitization
 Exchange of views (specific to IT)
 Articulating common policy: ITAN assists both public and private organizations to concretize a
common policy thrust to accelerate IT development in Nigeria.
 Networking and contacts: ITAN provides networking opportunities for its members and key IT
experts through its numerous events all year round. Such opportunities include international
trade tours, which provide the networking contacts and contracts that are key to business
profitability.

d. Nigeria Internet Group (NIG)

The Nigeria Internet Group (NIG), founded in 1995, is a not-for-profit, non-governmental organization,
promoting the Internet in Nigeria.
To achieve its mandate, the Group engages in a number of activities, which include policy advocacy,
awareness creation and education.


e. The Institute for Management of Information Systems (IMIS)

The Institute for Management of Information Systems (IMIS) was founded in 1978 and is one of the
leading associations promoting excellence in the field of Information Systems Management through
education and professional association.
IMIS was previously known as the Institute of Data Processing Management (IDPM). The headquarters
of the institute is located in the United Kingdom. The institute has approximately 12,000 members, the
majority of whom reside outside the UK.
The institute has consistently played a prominent role in promoting understanding of the importance of
Information Systems Management. While most other professional associations concentrate primarily
on the technical side of information systems, IMIS focuses specifically on the practical application and
management of Information Systems within society. The institute makes great efforts towards the
recognition of Information Systems Management as one of the key professions influencing the future of
the world. IMIS and the British Computer Society (BCS) are regarded as the two main UK professional
institutes for computer professionals.
f. The Institute of Software Practitioners of Nigeria (ISPON)
The Institute of Software Practitioners of Nigeria (ISPON) is the umbrella body of the computer
software and related services industry in Nigeria. ISPON is concerned with the growth of a software-
driven Information Technology industry in Nigeria.
Others include:
g. The Internet Service Providers' Association of Nigeria (ISPAN) regulates and monitors ISPs.
h. Nigerian Information Technology Professionals in the Americas (NITPA)
i. Association of Telecom Companies of Nigeria (ATCN), a professional, non-profit, non-political
umbrella organization of telecommunications companies in Nigeria.
NB: Nigerian Communications Commission (NCC) is an independent regulatory authority for the
telecommunications industry in Nigeria. It is a commission and not a professional body.
