Computer Science An Overview Chapter 1 PDF Notes

The document discusses how computers store and represent data at the lowest level using bits and bit patterns. It covers topics like Boolean logic, gates, flip-flops, main memory, mass storage devices, and how different types of data like text, numbers, images and sound can be encoded using bits.


Introduction to Computer Science

Chapter 1: Data Storage


Erkki Pesonen
Lecturer
University of Eastern Finland
Chapter 1: Data Storage
• 1.1 Bits and Their Storage
• 1.2 Main Memory
• 1.3 Mass Storage
• 1.4 Representing Information as Bit Patterns
• 1.5 The Binary System
• 1.6 Storing Integers
• 1.7 Storing Fractions
• 1.8 Data and Programming
• 1.9 Data Compression
• 1.10 Communications Errors
1.1 Bits and Their Storage
• At the lowest level in a computer, all data is represented by bits
• Bit: Binary Digit (0 or 1)
• Bit Patterns are used to represent information (details later)
• Numbers
• Text characters
• Images
• Sound
• And others
Boolean Operations
• In Boolean operations the bits are interpreted as:
• 0 = False
• 1 = True
• Boolean Operation: An operation that manipulates one or
more true/false values (George Boole 1815-1864)
• Specific operations
– AND
– OR
– XOR (exclusive or)
– NOT
Boolean Operations
– Similar to the arithmetic operations TIMES and PLUS
– Combine a pair of values (inputs) into a third value (output)
– Evaluate the truth or falseness of a statement
– Example: Finland is a nation AND Australia is a continent
Figure 1.1 The possible input and output values
of Boolean operations AND, OR, and XOR
(exclusive or)
Boolean Operations
• Boolean operator NOT has only one input and one
output
• The output is the opposite of the input
• Example: “NOT(Finland is a country)” would be interpreted as “Finland is not a country”
• If the inputs of an operation are patterns of more than one bit, the operation is applied bitwise:

       0 1 0
AND    1 1 0
Result 0 1 0
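As an illustrative sketch (not part of the original slides), Python's integer bitwise operators apply AND, OR, and XOR bit by bit, matching the table above:

```python
# Python applies &, |, ^ to each bit position independently.
a, b = 0b010, 0b110
print(format(a & b, "03b"))  # 010
print(format(a | b, "03b"))  # 110
print(format(a ^ b, "03b"))  # 100
```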
Gates
• Gate or Logic gate: A device that computes a Boolean
operation
– Often implemented as small electronic circuits called
transistors
– Digits 0 and 1 are represented as voltage levels
– Can be constructed from a variety of other
technologies (gears, relays, optic devices)
– Provide the building blocks from which computers
are constructed
Figure 1.2 A pictorial representation of AND, OR,
XOR, and NOT gates as well as their input and output
values
Flip-flops
• Circuits built from gates that act as a fundamental unit of computer memory
– One input line is used to set its stored value to 1
– One input line is used to set its stored value to 0
– While both input lines are 0, the most recently stored value is preserved
• An example of:
– Digital circuit design: how to build circuits from gates
– Abstract tools: there are many ways to build a flip-flop, but their external properties are the same
– How a bit is stored within a modern computer
– A computer on a single chip can be built from flip-flops
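A hedged software sketch of a flip-flop's external behavior (this models only the input/output properties, not the gate-level circuit of Figure 1.3; the function name is my own):

```python
# One step of a set/reset flip-flop: set forces 1, reset forces 0,
# and while both lines are 0 the stored value is preserved.
def flip_flop_step(state, set_line, reset_line):
    if set_line and not reset_line:
        return 1          # the "set" line stores a 1
    if reset_line and not set_line:
        return 0          # the "reset" line stores a 0
    return state          # both lines 0: value is preserved

state = flip_flop_step(0, 1, 0)      # set to 1
state = flip_flop_step(state, 0, 0)  # both inputs 0
print(state)  # 1, the stored value persisted
```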
Figure 1.3 A simple flip-flop circuit
Figure 1.4 Setting the output of a
flip-flop to 1
Figure 1.4 Setting the output of a
flip-flop to 1 (continued)
Figure 1.4 Setting the output of a
flip-flop to 1 (continued)
Figure 1.5 Another way of constructing a
flip-flop
Hexadecimal Notation
• Computers deal with patterns of bits, which can be long
• Example: 101101010011
• Long patterns are hard for humans to comprehend
• Hexadecimal notation: A shorthand notation for long bit patterns
• Divides a pattern into groups of four bits each
• Machines use bit patterns that have lengths in multiples of
four (8, 16, 32, 64, 128, 256, etc.)
• Represents each group by a single symbol
• Example: 10110101 becomes 0xB5
• Here 0x means that the following number is hexadecimal
• 1011 0101 -> B 5 (see Table 1.6)
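As a quick sketch (not from the slides), Python converts between bit patterns and hexadecimal directly:

```python
# Parse the bit pattern as a base-2 integer, then print it in hex and back.
bits = "10110101"
print(hex(int(bits, 2)))    # 0xb5
print(format(0xB5, "08b"))  # 10110101
```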
Figure 1.6 The hexadecimal coding system
1.2 Main Memory
• Computers have a large collection of circuits capable of storing a
single bit. This is called main memory.
• Cell: A unit of main memory (typically 8 bits which is one byte)
• Small embedded systems can have a few hundred memory cells
• Desktop computers have billions of memory cells
• Most significant bit: the bit at the left (high-order) end
• Least significant bit: the bit at the right (low-order) end
Figure 1.7 The organization of a byte-size
memory cell
Main Memory Addresses
• Address: A “name” that uniquely identifies one cell in the computer’s
main memory
• The names are actually numbers.
• These numbers are assigned consecutively starting at zero.
• Numbering the cells in this manner associates an order with the
memory cells.
• Because the memory cells form one long ordered row, stored bit patterns may be longer than the length of one cell
• Example: A string of 16 bits can be stored in two consecutive memory cells
• Bit patterns can be read from or written to a particular
memory address
Figure 1.8 Memory cells arranged by address
Memory Terminology
• Random Access Memory (RAM): Memory in which individual cells
can be easily accessed in any order
• This is how a computer’s main memory is accessed
• Dynamic Memory (DRAM): RAM composed of volatile memory
• Smaller and cheaper circuitry than flip-flop-based memory
• Needs a refresh circuit
• Synchronous DRAM (SDRAM)
• An even faster DRAM technology
Measuring Memory Capacity
• Kilobyte: 2^10 bytes = 1024 bytes (“about 1000 bytes”)
• Example: 3 KB = 3 × 1024 bytes

• Megabyte: 2^20 bytes = 1,048,576 bytes
• Example: 3 MB = 3 × 1,048,576 bytes

• Gigabyte: 2^30 bytes = 1,073,741,824 bytes
• Example: 3 GB = 3 × 1,073,741,824 bytes

• Terabyte: 2^40 bytes = 1,099,511,627,776 bytes
• Example: 3 TB = 3 × 1,099,511,627,776 bytes
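The prefixes above are exact powers of two, which a short sketch (my own illustration) can confirm:

```python
# Each memory-capacity prefix is 2 raised to a multiple of 10.
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40
print(MB)      # 1048576
print(3 * KB)  # 3072, the number of bytes in 3 KB
```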
Measuring Memory Capacity
• Examples of main memory capacity:
• iPhone 1: 128 MB RAM
• iPhone 13: 4-6 GB RAM
• iPad mini: 64-256 GB storage
• iPad Pro: 128 GB - 2 TB storage
• Laptop: 4-16 GB RAM
1.3 Mass Storage
• Computers need additional memory that is non-volatile
• Additional devices:
– Mechanical (moving parts)
– Magnetic disks (hard disk drive, HDD) 16 GB – 15 TB
– CDs (optical) 600-700 MB
– DVDs (optical) several GB
– Magnetic tapes (magnetic-coated plastic tape on reels)
– Electronic (no moving parts)
– Flash drives (USB) 8 GB – 1 TB
– Solid-state drives 120 GB – 30 TB
1.3 Mass Storage

[Images of mass storage devices: magnetic tape, CD/DVD, hard disk drive, flash drive, solid-state drive (outside and inside), SD card]
1.3 Mass Storage
• Advantages over main memory
– Less volatility
– Larger storage capacities
– Low cost
– In many cases can be removed
Mass Storage Performance
• Bandwidth: The total number of bits that can be transferred in a unit of time (e.g. 3 megabits per second)
• Latency: The total time between the request for data transfer and its
arrival
Figure 1.9 A magnetic disk storage system
Figure 1.10 CD storage
Flash Memory
• Circuits that trap electrons in tiny silicon dioxide chambers
• No mechanical parts; only electronic signals move
• Data can be accessed in small units
• Circuits do not need refreshing
• Data is erased in large blocks
• Repeated erasing slowly damages the media
Flash Memory
• Flash Drives
• Removable memory for computers
• USB connection
• Hundreds of GBs of storage
• Secure Digital Cards (SD, SDHC, SDXC)
• Provide GBs to TBs of storage
• Different sizes (mini, micro)
• Used in electronic devices: cameras, navigation systems, etc.
Solid-State Drives (SSD)
• Large flash memory devices
• Replace magnetic hard disks
• Resilience to vibrations and physical shock
• Quiet operation
• Lower access times
• Still more expensive than magnetic disks
• Data erased in large blocks
• Repeated erasing slowly damages the media
1.4 Representing Information as Bit Patterns
• Many different kinds of information can be encoded as bit
patterns
• Systems for encoding information have been established for
– Text
– Numeric Data
– Images
– Sound
– Other data
Representing Text
• Each character (letter, punctuation, etc.) is assigned a
unique bit pattern.
– ASCII (1960s): Uses 7-bit patterns to represent most symbols used in written English text
– Was developed to standardize the representation of characters, digits, and control information in computer terminals
– ISO developed a number of 8-bit extensions to
ASCII, each designed to accommodate a major
language group
– E.g. Western European extension for å, ä, ö, ü
Representing Text
– Unicode: Uses patterns of up to 21 bits to represent the symbols used in languages worldwide
– 16 bits cover the world’s commonly used languages
– E.g. Chinese, Japanese, Hebrew, etc.
– Unicode Transformation Format 8-bit (UTF-8) encodes ASCII symbols exactly as 8-bit ASCII does
– A file consisting of ASCII or Unicode coded symbols is called a “text file”
– Can be manipulated with text editors (e.g. Notepad)
– Note that word processor files (e.g. Microsoft Word files) are not “text files”.
Figure 1.11 The message “Hello.” in ASCII
or UTF-8 encoding
Representing Numeric Values
• We can represent numbers with ASCII codes, but it is not efficient
• With 16 bits (two ASCII digit characters) we can represent only the numbers 0-99
• Binary notation: Uses bits to represent a number in base two
– All numeric values in a computer are stored in sequences of 0s
and 1s
– Counting from 0 to 8:
– 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000
– With 16 bits we can represent the numbers 0-65535
• Other binary formats also exist:
• Two’s complement, floating point, etc. (later)
• Calculations in all these formats still give the correct result
• 010 + 010 = 100 (binary); 2 + 2 = 4 (decimal); 0x2 + 0x2 = 0x4 (hexadecimal)
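A small sketch (my own illustration) confirms the 16-bit range and the base-independent arithmetic claimed above:

```python
# Largest value in 16 unsigned bits, and the same sum in three bases.
print(2**16 - 1)              # 65535
print(0b010 + 0b010 == 0b100) # True: 2 + 2 = 4 in binary
print(0x2 + 0x2 == 0x4)       # True: the same sum in hexadecimal
```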
Representing Images
• Bit map techniques
• Pixel: “picture element” represents one
dot of the image
• RGB: Red, Green, and Blue components
• Luminance and chrominance
• Problems with scaling up images
• E.g. Digital zoom in digital cameras
• Files:
• BMP files
• RAW files in digital cameras
Representing Images
• Vector techniques
• Represent images with geometric structures
• Point, line, polyline, arcs, curves, etc.
• Scalable
• TrueType and PostScript text symbols
• Used especially in CAD systems and graphic design
Representing Sound
• Audio is encoded by sampling the amplitude of the sound wave at frequent intervals
• Sampling techniques that record actual audio
– Long-distance telephone: 8000 samples/sec
– CD sound: 44,100 samples/sec
– High-fidelity music recordings
– Each sample represented with 16 bits (per audio channel)
Figure 1.12 The sound wave represented by the
sequence 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0
Representing Sound
• MIDI stores directions for making sound
– Used in music synthesizers
– Encodes which instrument, note, and duration
– E.g. clarinet playing note D for two seconds
– Encodes the “sheet music”
– Different sound with different synthesizers
– Used in digital audio workstations, e.g. Garageband
1.5 The Binary System
The traditional decimal system is based on powers of ten.
• Number 375 = (3 x 100) + (7 x 10) + (5 x 1)
             = (3 x 10^2) + (7 x 10^1) + (5 x 10^0)

The binary system is based on powers of two.

• Binary number 1011 = (1 x 2^3) + (0 x 2^2) + (1 x 2^1) + (1 x 2^0)
                     = 8 + 0 + 2 + 1 = 11 (in the decimal system)
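The decoding above can be sketched in a couple of lines (an illustration of my own, cross-checked against Python's built-in conversion):

```python
# Sum each digit times its power of two, least significant digit first.
digits = "1011"
value = sum(int(d) * 2**i for i, d in enumerate(reversed(digits)))
print(value)           # 11
print(int(digits, 2))  # 11, the built-in base-2 conversion agrees
```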
Figure 1.13 The base ten and binary systems
Figure 1.14 Decoding the binary
representation 100101
Figure 1.15 An algorithm for finding the
binary representation of a positive integer
Figure 1.16 Applying the algorithm in Figure
1.15 to obtain the binary representation of
thirteen
Figure 1.17 The binary addition facts
Binary addition

• Traditional addition in the decimal system:

   1   (carry)
   58
 + 27
   85

• Addition of binary numbers
• Addition facts: 0 + 0 = 0, 1 + 0 = 1, 0 + 1 = 1, 1 + 1 = 10
• Example with longer numbers:

 111 1   (carries)
  111010
 + 11011
 1010101
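The binary sum above can be verified by converting to integers (a sketch of my own, not from the slides):

```python
# 111010 is 58 and 11011 is 27; their sum should print as 0b1010101.
a = int("111010", 2)  # 58
b = int("11011", 2)   # 27
print(bin(a + b))     # 0b1010101, i.e. 85
```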
Fractions in Binary

• In the decimal system we have decimal points: 375.23
• 375.23 = (3 x 10^2) + (7 x 10^1) + (5 x 10^0) + (2 x 10^-1) + (3 x 10^-2)
• In the binary system we have radix points: 101.101
• The interpretation works the same way
• 101.101 = (1 x 2^2) + (0 x 2^1) + (1 x 2^0) + (1 x 2^-1) + (0 x 2^-2) + (1 x 2^-3)
          = 4 + 0 + 1 + 1/2 + 0 + 1/8 = 5 + 0.5 + 0.125 = 5.625
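A short sketch (my own illustration) decodes 101.101 by splitting at the radix point:

```python
# Whole part uses positive powers of two, fraction part negative powers.
whole, frac = "101.101".split(".")
value = int(whole, 2) + sum(int(d) * 2**-(i + 1) for i, d in enumerate(frac))
print(value)  # 5.625
```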
Figure 1.18 Decoding the binary
representation 101.101
1.6 Storing Integers
• Simple binary notation has some problems:
• How to present negative numbers and calculate with them?
• Two’s complement notation: The most popular means of
representing integer values
• Excess notation: Another means of representing integer values
Two’s complement notation
• The most popular system for representing integers
• A fixed number of bits represents each value in the system
• Usually used with 32 bits, but here the idea is shown with fewer bits
• Figure 1.19: tables with 3-bit and 4-bit patterns
• Positive numbers are the same as in simple binary
• Negative numbers
• Begin with 1
• Read from right to left, they are identical to the positive pattern up to and including the first 1; after that, the bits are reversed (complemented). See Figure 1.20
• Number 6 = 0110; -6 = 1010
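Masking a Python integer to a fixed width reproduces these patterns (a sketch of my own; the function name is not from the slides):

```python
# Keep only the low `bits` bits of n's binary form; for negative n this
# yields exactly the two's complement pattern.
def twos_complement(n, bits=4):
    return format(n & ((1 << bits) - 1), f"0{bits}b")

print(twos_complement(6))   # 0110
print(twos_complement(-6))  # 1010
```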
Figure 1.19 Two’s complement notation
systems
Figure 1.20 Coding the value -6 in two’s
complement notation using four bits
Two’s complement notation
• Addition with two’s complement notation
• Same algorithm as simple binary addition, but all bit patterns have the same length
• Pad with 0s at the front, if needed
• Truncate the result to the pattern length, if necessary
• Example: 0101 + 0010 = 0111; 0111 + 1011 = 0010 (not 10010!)
• See Figure 1.21 (all numbers represented in two’s complement)
• Also handles subtraction: just add the negative number
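The fixed-length addition with truncation can be sketched as follows (the helper name is my own):

```python
# Add two 4-bit patterns and discard any carry out of the top bit.
def add4(a, b):
    return format((int(a, 2) + int(b, 2)) & 0b1111, "04b")

print(add4("0101", "0010"))  # 0111
print(add4("0111", "1011"))  # 0010, the fifth bit is discarded
```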
Figure 1.21 Addition problems converted to
two’s complement notation
The Problem of Overflow
• There is a limit to the size of the values that can be represented in any system
• With 3 bits the range is 3 to -4; with 4 bits the range is 7 to -8
• With 16 bits the maximum is 32,767
• With 32 bits (common today) the maximum is 2,147,483,647
• 16-bit systems have been upgraded to 32-bit systems
• Overflow
– occurs when a computation produces a value that falls outside the range of values that can be represented in the machine
– In two’s complement additions:
– If the resulting sign bit is incorrect, an overflow has occurred
– In practice: if summing two positive numbers gives a negative result, or adding two negative numbers gives a positive result
– Example: 5 + 4 = 0101 + 0100 = 1001 = -7. Negative sign!
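The overflow example can be reproduced in a few lines (a sketch of my own): interpreting the truncated 4-bit result as signed turns 5 + 4 into -7.

```python
# Interpret a 4-bit two's complement pattern as a signed integer.
def to_signed4(bits):
    n = int(bits, 2)
    return n - 16 if n >= 8 else n  # patterns starting with 1 are negative

result = format((5 + 4) & 0b1111, "04b")
print(result, to_signed4(result))   # 1001 -7
```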
Excess notation
• Another way to represent positive and negative integer values
• Same bit patterns as in two’s complement but a different interpretation; see Figure 1.22
• Positive numbers (and 0): most significant bit is 1
• Negative numbers: most significant bit is 0
• Excess eight: eight is added to the value and the result is written in binary (“the stored pattern exceeds the value by eight”)
• Example: 1100 = 12; 12 - 8 = 4 in excess eight
• Example: 0000 = 0; 0 - 8 = -8 in excess eight

• There is also excess 16 notation, which uses 5 bits

• Excess eight is used for storing fractions in floating-point notation
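Decoding excess eight is a single subtraction (my own sketch; the function name is not from the slides):

```python
# An excess-eight pattern stores the value plus eight.
def excess8(bits):
    return int(bits, 2) - 8

print(excess8("1100"))  # 4
print(excess8("0000"))  # -8
```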
Figure 1.22 An excess eight conversion table
Figure 1.23 An excess notation system using
bit patterns of length three
1.7 Storing Fractions
• Storing numbers with fractional parts requires representing the radix point
• Floating-point Notation: Consists of a sign bit, a mantissa field, and an exponent field.
– Finite storage space limits the precision
– The basic idea using only one byte (8 bits) (Figure 1.24)
– First bit: sign (0 = non-negative, 1 = negative)
– Next 3 bits: exponent field
– Last 4 bits: mantissa field
– Normalized form: fill the mantissa starting with the leftmost 1
– Example: 01101011
– 0: positive
– 110: exponent -> interpreted in 3-bit excess notation = 2 (see Figure 1.23)
– 1011: mantissa -> .1011 (interpreted with the radix point at the left)
– => + 10.11 = 2 + 1/2 + 1/4 = 2 3/4
– Example in the other direction in Figure 1.25
1.7 Storing Fractions
– Example: 11101011
– 1: negative
– 110: exponent -> interpreted in 3-bit excess notation = 2 (see Figure 1.23)
– 1011: mantissa -> .1011 (interpreted with the radix point at the left)
– => - 10.11 = -(2 + 1/2 + 1/4) = - 2 3/4
– Example: 00111100 -> 0 011 1100
– 0: non-negative
– 011: exponent = -1
– 1100: mantissa -> shifted to .01100 = 1/4 + 1/8 = 3/8
– Reverse example: 1 1/8 -> 1.001 (in binary)
– In normalized form the first 1 sits just to the right of the radix point: .1001, so the exponent must be 1 = 101 (in excess notation)
– Sign is positive = 0
– => 0 101 1001
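A hedged sketch of a decoder for this one-byte format (sign bit, 3-bit excess-four exponent, 4-bit mantissa; the function name is my own):

```python
# Decode an 8-bit toy float: value = sign * (mantissa/16) * 2**exponent.
def decode_float8(bits):
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 4   # 3-bit excess notation
    mantissa = int(bits[4:], 2) / 16   # radix point left of the mantissa
    return sign * mantissa * 2**exponent

print(decode_float8("01101011"))  # 2.75
print(decode_float8("11101011"))  # -2.75
print(decode_float8("00111100"))  # 0.375
```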
Figure 1.24 Floating-point notation
components
Truncation (Round-off) Errors
• Figure 1.25 shows the truncation problem that occurs when there is not enough room for all the bits
• The encoding yields the value 2 1/2, which is not 2 5/8
• Occurs when part of the value being stored is lost because the
mantissa is not large enough
• Can be resolved by using longer mantissa field – today at least
32 bits
• Also: some values can’t be presented accurately (1/3 in decimal
system; 1/10 in binary system)
• Non-terminating expansions of fractions
– This happens more often with binary notation
– Often these values are converted to integers
Figure 1.25 Encoding the value 2 5⁄8
Numerical Analysis
• The study of the problems that arise when computing with large values that require significant accuracy
• The order in which values are added can lead to two different results
• Adding very small values to very large values can result in errors
• Errors can occur when the numbers differ by a factor of 10^16 or more
• Tip: first add the small numbers together, then add the result to the large number
• Today’s commercial programs usually handle this well
1.9 Data Compression
• To reduce the amount of data in storage or transfer, data is compressed while retaining its information
• Lossy versus lossless
• Lossless keeps all information
• Lossy loses some (unnecessary?) information but provides more compression (e.g. audio and video)
• Run-length encoding
• E.g. in images, a run of identical pixels is replaced by one pixel value plus a repetition count:
• 1111111111 vs. “ten 1’s”
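A minimal run-length encoder sketch (my own illustration, not from the slides):

```python
# Collapse each run of repeated symbols into a [symbol, count] pair.
def run_length_encode(s):
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return runs

print(run_length_encode("1111111111"))  # [['1', 10]]: "ten 1's"
```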
1.9 Data Compression
• Frequency-dependent encoding
• Example: Huffman codes, built from a frequency-sorted binary tree
• Data is coded so that the most frequent patterns get the shortest codes
• E.g. English text
• The most common letters are e, t, a, i -> shortest codes
• The least frequent letters are z, q, x -> longest codes
• Relative encoding
• Records the changes between consecutive data units
• Used in motion pictures: code only the changes between frames, not each whole picture
1.9 Data Compression
• Dictionary encoding: a dictionary is used, and only references to it are included in the message
• Encoding can be lossless or lossy
• E.g.: word processors contain a dictionary for the spelling checker
• An entire word can be coded as a single reference
• Needs less space than UTF-8
1.9 Data Compression
• Adaptive dictionary encoding:
• LZW encoding (Lempel-Ziv-Welch)
• Starts with basic building blocks (e.g. letters)
• Adds larger units to the dictionary as they are found
• Only the basic dictionary is passed with the message
• Example:
• xyx xyx xyx xyx -> 1 = x, 2 = y, “ ” = 3, xyx = 4 => 121343434
Compressing Images
• Images encoded using bit map need a lot space, usually
compressed
• GIF:
• Reduces the number of colors to 256
• The red-green-blue combinations are stored in a table (a dictionary)
• Each pixel is stored in one byte
• A lossy system (for images with more than 256 colors)
• Good for cartoons
Compressing Images
• JPEG:
• Used in most digital cameras
• Also has a lossless mode (less compression, rarely used)
• Takes advantage of the human eye’s limitations
• Averages the chrominance values of two-by-two pixel squares
• Divides the image into eight-by-eight pixel blocks -> compression within each block
• Uses run-length encoding, relative encoding, and variable-length encoding
• Result: compression by a factor of 10-30 with no noticeable loss
• Good for photographs
• TIFF:
• Often stores RGB without compression
• Optionally uses the same compression methods as GIF
• Good for image archiving
Compressing Audio and Video
• Video: MPEG
– A sequence of pictures
– I-frames are encoded in much the same way as JPEG images
– Other frames use relative encoding
– High definition television broadcast
– Video conferencing
– One hour of video within 128MB
– Video transported through transfer rates of 40Mbps (bits per
second)
Compressing Audio and Video
• Audio: MP3
– Developed with MPEG
– Takes advantage of the properties of human ear
– Temporal masking: quieter sounds are masked for a short period after a loud sound
– Frequency masking: some frequencies mask nearby frequencies
– Still near-CD-quality sound
– 400 popular songs in one gigabyte
– Audio transferred through 64 Kbps (bits per second)
1.10 Communication Errors
• Transferred data can be corrupted
• Dirt or grease on magnetic recording surface
• Malfunctioning circuit
• Corrupted transmission path
• Background radiation
• Goal: To reduce errors and increase the reliability of computing
equipment
• Encoding to detect and even correct errors
1.10 Communication Errors
• Parity bits (even versus odd)
• A bit pattern can be manipulated to contain an odd number of 1’s
• A parity bit (0/1) is added to keep (0) or make (1) the number of 1’s odd
• Example in Figure 1.26
• Used, for example, in a computer’s main memory
• Can detect only an odd number of bit errors; a two-bit error goes undetected
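Adding an odd parity bit in front of a 7-bit ASCII pattern, as in Figure 1.26, can be sketched like this (the function name is my own):

```python
# Prepend 0 if the pattern already has an odd number of 1's, else 1.
def add_odd_parity(bits):
    parity = "0" if bits.count("1") % 2 == 1 else "1"
    return parity + bits

print(add_odd_parity("1000001"))  # ASCII 'A' becomes 11000001
```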

• Check bytes
• Several related parity bits are collected into a byte -> a check byte
• A collection of errors can be detected
• A more general version is a checksum, where a function generates the reference value
Figure 1.26 The ASCII codes for the letters A
and F adjusted for odd parity
1.10 Communication Errors
• Parity bits only detect an error
• Error-correcting codes
• Can also correct errors
• Hamming distance: the number of bit positions in which two bit patterns differ
• Example: Figure 1.27
• The Hamming distance between the patterns representing A and B is four; between C and D it is three
• Note that in Figure 1.27 the distance between any two patterns is at least three
• If a single error occurs, the result is not a legal pattern, and we can figure out what the original pattern was
• To decode a message encoded with the code of Figure 1.27, find the legal pattern that is identical to the received pattern or differs from it in only one bit
• See Figure 1.28
• Longer codes can be made to correct more errors
• Used in magnetic disks and CDs
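Computing the Hamming distance is a one-line count of differing positions (a sketch of my own, not from the slides):

```python
# Count positions where the two equal-length bit patterns disagree.
def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("10101", "10011"))  # 2
```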
Figure 1.27 An error-correcting code
Figure 1.28 Decoding the pattern 010100
using the code in Figure 1.27
