4 - Data Transformation Handout
OVERVIEW
and often the user does not have much understanding of what happens inside
it. In this chapter, we demystify the system unit by looking inside the box and closely
examining the functions of the parts. Consequently, the chapter gives you a feel for
what the CPU, memory, and other devices commonly found inside the system unit do
and how they work together to perform the tasks that the user requests.
To start, we discuss how a computer represents data and program instructions.
Specifically, we talk about the codes that computers use to translate data back and
forth from the symbols that computers can manipulate to the symbols that people
are accustomed to using. These topics lead to a discussion of how the CPU and
memory are arranged with other components inside the system unit and the charac-
teristics of those components. Next, we discuss how a CPU performs processing
tasks. Finally, we look at strategies that can be used today to speed up a computer,
plus some strategies that may be used to create faster and better computers in the
future.
Many of you will apply this chapter’s content to conventional personal
computers—such as desktop and portable computers. However, it is important to
realize that the principles and procedures discussed in this chapter apply to other
types of computers as well, such as those embedded in toys, consumer devices,
household appliances, cars, and other devices, and those used with mobile devices,
servers, mainframes, and supercomputers. ■
from the phrase binary digits. A bit is the smallest unit of data that a binary computer can
recognize. Therefore, the input you enter via a keyboard, the software program you use to
play your music collection, the term paper stored on your USB flash drive, and the digital
photos located on your mobile phone are all just groups of bits. Representing data in a
form that can be understood by a digital computer is called digital data representation.
Because most computers can only understand data and instructions in binary form,
binary can be thought of as the computer’s natural language. People, of course, do not
speak in binary. For example, you are not likely to go up to a friend and say,
0100100001001001
which translates into the word “HI” using one binary coding system. People communicate
with one another in their natural languages, such as English, Chinese, Spanish, and
French. For example, this book is written in English, which uses a 26-character alphabet.
In addition, most countries use a numbering system with 10 possible symbols, 0 through
9. As already mentioned, however, binary computers understand only 0s and 1s. For us to
interact with a computer, a translation process from our natural language to 0s and 1s and
then back again to our natural language is required. When we enter data into a computer
system, the computer translates the natural-language symbols we input into binary 0s and
1s. After processing the data, the computer translates and outputs the resulting information
in a form that we can understand.
A bit by itself typically represents only a fraction of a piece of data. Consequently,
large numbers of bits are needed to represent a written document, computer program, digital
photo, music file, or virtually any other type of data. Eight bits grouped together are
collectively referred to as a byte. It is important to be familiar with this concept because
byte terminology is frequently used in a variety of computer contexts, such as to indicate
the size of a document or digital photo, the amount of memory a computer has, or the
amount of room left on a storage medium. Because these quantities often involve thousands
or millions of bytes, prefixes are commonly used in conjunction with the term byte
to represent larger amounts of data (see Figure 2-2). For instance, a kilobyte (KB) is equal
to 1,024 bytes, but is usually thought of as approximately 1,000 bytes; a megabyte (MB)
is about 1 million bytes; a gigabyte (GB) is about 1 billion bytes; a terabyte (TB) is about
1 trillion bytes; a petabyte (PB) is about 1,000 terabytes (2^50 bytes); an exabyte (EB) is
about 1,000 petabytes (2^60 bytes); a zettabyte (ZB) is about 1,000 exabytes (2^70 bytes);
and a yottabyte (YB) is about 1,000 zettabytes (2^80 bytes). Using these definitions, 5 KB
is about 5,000 bytes, 10 MB is about 10 million bytes, and 2 TB is about 2 trillion bytes.

FIGURE 2-2
Bits and bytes. Document size, storage capacity, and memory capacity are all measured in bytes.
Bit: a single 0 or 1
Byte: 0 0 1 1 0 0 0 0
Byte Abbreviation    Approximate Size
KB                   1 thousand bytes
MB                   1 million bytes
GB                   1 billion bytes
TB                   1 trillion bytes
PB                   1,000 terabytes
EB                   1,000 petabytes
ZB                   1,000 exabytes
YB                   1,000 zettabytes
Copyright © 2015 Cengage Learning®

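The byte-prefix arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names UNITS, approx_bytes, and exact_bytes are hypothetical, and the block assumes each prefix is roughly 1,000 times (or, precisely, 2^10 times) the one before it, as the text describes.

```python
# Byte prefixes in increasing order, as listed in the text.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def approx_bytes(amount, unit):
    """Approximate byte count using the 'about 1,000 times' rule of thumb."""
    power = UNITS.index(unit) + 1          # KB -> 1, MB -> 2, ...
    return amount * 1000 ** power


def exact_bytes(amount, unit):
    """Byte count using binary multiples (1 KB = 2**10 = 1,024 bytes)."""
    power = UNITS.index(unit) + 1
    return amount * 2 ** (10 * power)


if __name__ == "__main__":
    print(approx_bytes(5, "KB"))   # the "5 KB" example from the text
    print(exact_bytes(1, "KB"))    # one kilobyte, to be precise
    print(exact_bytes(1, "PB"))    # one petabyte as a power of 2
```

The gap between the two functions grows with each prefix, which is why the larger units are best read as approximations.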
Computers represent programs and data through a variety of binary-based coding sys-
tems. The coding system used depends primarily on the type of data that needs to be repre-
sented; the most common coding systems are discussed in the next few sections.
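As one small illustration of a binary coding system, the “HI” example given earlier in this section can be reproduced in Python. The sketch assumes the coding system in question is 8-bit ASCII, which the text does not name at that point; the function names are hypothetical.

```python
def text_to_bits(text):
    """Encode each character as an 8-bit group (assumes 8-bit ASCII codes)."""
    return "".join(format(ord(ch), "08b") for ch in text)


def bits_to_text(bits):
    """Split a bit string into 8-bit groups and decode each group."""
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int(group, 2)) for group in groups)


if __name__ == "__main__":
    print(text_to_bits("HI"))
    print(bits_to_text("0100100001001001"))
```

Each character occupies exactly one byte here, which is why the 16-bit string splits cleanly into two letters.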
>Bit. The smallest unit of data a digital computer can recognize; represented by a 0 or a 1. >Byte. A group of 8 bits. >Kilobyte (KB). Approximately
1 thousand bytes (1,024 bytes to be precise). >Megabyte (MB). Approximately 1 million bytes. >Gigabyte (GB). Approximately 1 billion bytes. >Terabyte
(TB). Approximately 1 trillion bytes. >Petabyte (PB). Approximately 1,000 terabytes. >Exabyte (EB). Approximately 1,000 petabytes. >Zettabyte (ZB).
Approximately 1,000 exabytes. >Yottabyte (YB). Approximately 1,000 zettabytes. >Decimal numbering system. The numbering system that represents
all numbers using 10 symbols (0–9). >Binary numbering system. The numbering system that represents all numbers using just two symbols (0 and 1).
CHAPTER 2 THE SYSTEM UNIT: PROCESSING AND MEMORY 53
DECIMAL NUMBERING SYSTEM
Each place value in a decimal number represents 10 raised to the appropriate power.
The decimal number 7,216, with 10 raised to different powers:
Place value    10^3 (1,000)    10^2 (100)    10^1 (10)    10^0 (1)
Digit          7               2             1            6
7 means 7 x 1,000 = 7,000
2 means 2 x 100 = 200
1 means 1 x 10 = 10
6 means 6 x 1 = 6
Total: 7,216

BINARY NUMBERING SYSTEM
Each place value in a binary number represents 2 raised to the appropriate power.
The binary number 1001, with 2 raised to different powers:
Place value    2^3 (8)    2^2 (4)    2^1 (2)    2^0 (1)
Digit          1          0          0          1
Decimal equivalent: 9
FIGURE 2-3
Examples of using the decimal and binary numbering systems.

TIP
For more information about and examples of converting between numbering systems, see the
“A Look at Numbering Systems” section in the References and Resources Guide at the end of
this book.

possible numbers. Consequently, binary computers use the binary numbering system to
represent numbers and to perform math computations.
In both numbering systems, the position of each digit determines the power, or
exponent, to which the base number (10 for decimal or 2 for binary) is raised. In the
decimal numbering system, going from right to left, the first position or column (the
ones column) represents 10^0 or 1; the second column (the tens column) represents 10^1
or 10; the third column (the hundreds column) represents 10^2 or 100; and so forth.
Therefore, as Figure 2-3 shows, the decimal number 7,216 is understood as 7 × 10^3 + 2 ×
10^2 + 1 × 10^1 + 6 × 10^0 or 7,000 + 200 + 10 + 6 or 7,216. In binary, the concept is the
same but the columns have different place values. For example, the far-right column is
the ones column (for 2^0), the second column is the twos column (2^1), the third column
is the fours column (2^2), and so on. Therefore, although 1001 represents “one thousand
one” in decimal notation, 1001 represents “nine” (1 × 2^3 + 0 × 2^2 + 0 × 2^1 + 1 × 2^0 or
8 + 0 + 0 + 1 or 9) in the binary numbering system, as illustrated in the bottom half of
Figure 2-3.
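The place-value rule described above can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function names are hypothetical, and the code assumes the digits are 0 through 9, so it only covers bases up to 10.

```python
def place_values(digits, base):
    """Return (digit, place value, digit x place value) for each column, left to right."""
    columns = []
    for position, symbol in enumerate(digits):
        exponent = len(digits) - 1 - position   # leftmost column has the highest power
        value = base ** exponent
        columns.append((int(symbol), value, int(symbol) * value))
    return columns


def to_decimal(digits, base):
    """Add up every column's contribution to get the decimal equivalent."""
    return sum(product for _, _, product in place_values(digits, base))


if __name__ == "__main__":
    print(place_values("7216", 10))
    print(to_decimal("1001", 2))
```

Python's built-in int("1001", 2) performs the same conversion; the longer version is shown only to make each column's contribution visible.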
characters usually found on a keyboard, and many special characters not included on a
keyboard such as mathematical symbols, graphic symbols, and additional punctuation marks.

FIGURE 2-4
Some extended ASCII code examples.
D    01000100
E    01000101
F    01000110
+    00101011
!    00100001
#    00100011

Unicode
Unlike ASCII and EBCDIC, which are limited to only the Latin alphabet used with the
English language, Unicode is a universal international coding standard designed to
represent text-based data written in any ancient or modern language, including those with
different alphabets, such as Chinese, Greek, Hebrew, Amharic, Tibetan, and Russian (see
Figure 2-5). Unicode uniquely identifies each character using 0s and 1s, no matter which
language, program, or computer platform is being used. It is a longer code, consisting of
1 to 4 bytes (8 to 32 bits) per character, and can represent over one million characters,
which is more than enough unique combinations to represent the standard characters in
all the world’s written languages, as well as thousands of mathematical and
technical symbols, punctuation marks, and other symbols and signs. The
biggest advantage of Unicode is that it can be used worldwide with consistent
and unambiguous results.
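The “1 to 4 bytes per character” figure can be observed directly in Python. The sketch below assumes the UTF-8 encoding of Unicode, which the text does not name explicitly; the function name describe and the sample characters are chosen only for illustration.

```python
def describe(ch):
    """Return (code point label, number of UTF-8 bytes, the bytes as bit groups)."""
    encoded = ch.encode("utf-8")            # UTF-8 is an assumption, see lead-in
    bits = " ".join(format(byte, "08b") for byte in encoded)
    return f"U+{ord(ch):04X}", len(encoded), bits


# Sample characters written as escapes so this listing stays plain ASCII.
SAMPLES = {
    "Latin capital D": "D",
    "Greek capital omega": "\u03a9",
    "Chinese character": "\u4e2d",
    "Emoji": "\U0001F600",
}

if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(label, describe(ch))
```

Under this assumption, the letter D occupies a single byte with the same bit pattern as its ASCII code, while the other samples need two, three, and four bytes respectively.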
Unicode is quickly replacing ASCII as the primary text-coding system. In
fact, Unicode includes the ASCII character set so ASCII data can be converted
easily to Unicode when needed. Unicode is used by most Web browsers and is
widely used for Web pages and Web applications (Google data, for instance,
is stored exclusively in Unicode). Most recent software programs, including
the latest versions of Microsoft Windows, Mac OS, and Microsoft Office, also
use Unicode, as do modern programming languages, such as Java and Python.
Unicode is updated regularly to add new characters and new languages not
originally encoded; the most recent version is Unicode 6.2.

FIGURE 2-5
Unicode. Many characters, such as these, can be represented by Unicode but not by
ASCII or EBCDIC.
Languages shown: Chinese, Greek, Hebrew, Amharic, Tibetan, Russian.

Coding Systems for Other Types of Data
So far, our discussion of data coding schemes has focused on numeric and text-based data,
which consists of alphanumeric characters and special symbols, such as the comma and
dollar sign. Multimedia data, such as graphics, audio, and video data, must also be
represented in binary form in order to be used with a computer, as discussed next.
Graphics Data
Graphics data consists of still images, such as photographs or drawings. One of the
most common methods for storing graphics data is in the form of a bitmap image—an
image made up of a grid of small dots, called pixels (short for picture elements), that
>ASCII (American Standard Code for Information Interchange). A fixed-length, binary coding system used to represent text-based data for
computer processing on many types of computers. >Unicode. An international coding system that can be used to represent text-based data in
any written language.
should display as black). Images with more than two colors can use 4, 8, or 24 bits to
store the color data for each pixel; this allows for 16 (2^4), 256 (2^8), or 16,777,216
(2^24) colors respectively, as shown in Figure 2-6.
The number of bits used per pixel depends on the type of image being stored; for instance,
the JPEG images taken by most digital cameras today are 24-bit true color images. While
this can result in large file sizes, images can typically be compressed when needed, such
as to reduce the amount of storage space required to store that image or to send a
lower-resolution version of an image via e-mail.

FIGURE 2-6
Representing graphics data. With bitmapped images, the color of each pixel is represented
by bits; the more bits used, the better the image quality.
256-COLOR IMAGE: The color of each pixel is represented using one byte (8 bits).
PHOTOGRAPHIC-QUALITY (TRUE COLOR) IMAGE (16.8 million colors): The color of each pixel is
represented using three bytes (24 bits). One sample pixel: 101001100100110111001011

Audio Data
Like graphics data, audio data, such as a song or the sound of someone speaking, must
be in digital form in order to be stored on a storage medium or processed by a computer. To
convert analog sound to digital sound, several thousand samples (digital representations
of the sound at particular moments) are taken every second. When the samples are played
back in the proper order, they re-create the sound of the voice or music. For example, audio
CDs record sound using 2-byte samples, which are taken at a rate of 44,100 samples per
second. When these samples are played back at a rate of 44,100 samples per second, they
sound like continuous voice or music. With so many samples, however, sound files take
up a great deal of storage space: about 32 MB for a 3-minute stereo song (44,100 samples
per second × 2 bytes × 180 seconds × 2 channels).

TIP
For more examples of ASCII, EBCDIC, and Unicode, see the “Coding Charts” section in the
References and Resources Guide at the end of this book.

Because of its large size, audio data is usually compressed to reduce its file size when it
is transmitted over the Internet or stored on an iPod or other portable digital media
player. For example, files that are MP3-encoded, that is, compressed with the MP3
compression algo-

ASK THE EXPERT
Mark Davis, President, The Unicode Consortium
TIP
Graphics, audio, video, and file compression are discussed in detail in Chapter 10.

Video Data
Video data, such as home movies, feature films, video clips, and television shows, is
displayed using a collection of frames; each frame contains a still image. When the frames
are projected one after the other (typically at a rate of 24 frames per second (fps) for
film-based video and 30 or 60 fps for video taken with digital video cameras), the illusion
of movement is created. With so many frames, the amount of data involved in showing a
two-hour feature film can be substantial. Fortunately, like audio data, video data can be
compressed to reduce it to a manageable size. For example, a two-hour movie can be
compressed to fit on a single DVD disc; it can be compressed even further to be delivered
over the Web.
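The figures quoted in the graphics, audio, and video discussions above can be checked with some simple arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, the audio defaults are the audio CD values given in the text, and no compression is modeled.

```python
def color_count(bits_per_pixel):
    """Number of distinct colors that a given number of bits per pixel can represent."""
    return 2 ** bits_per_pixel


def uncompressed_audio_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Bytes needed for uncompressed audio; defaults are the audio CD values from the text."""
    return sample_rate * bytes_per_sample * seconds * channels


def frame_count(seconds, frames_per_second):
    """Number of still frames shown during a video of the given length."""
    return seconds * frames_per_second


if __name__ == "__main__":
    for bits in (4, 8, 24):
        print(bits, "bits per pixel:", color_count(bits), "colors")
    song = uncompressed_audio_bytes(180)             # 3-minute stereo song
    print(song, "bytes, or about", round(song / 1_000_000), "MB")
    print(frame_count(2 * 60 * 60, 24), "frames in a two-hour film at 24 fps")
```

The frame count shows why video needs compression: every one of those frames is itself a still image with its own per-pixel color data.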
The Motherboard
A circuit board is a thin board containing computer chips and other electronic components.
Computer chips are very small pieces of silicon or other semiconducting material that
contain integrated circuits (ICs), which are collections of electronic circuits containing
>Machine language. A binary-based language for representing computer programs that the computer can execute directly. >System unit. The
main box of a computer that houses the CPU, motherboard, memory, and other devices.