0% found this document useful (0 votes)
14 views

Chp3 Strings

Uploaded by

Tayyab Fawad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

Chp3 Strings

Uploaded by

Tayyab Fawad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 40

STRING PROCESSING

CHAPTER 3
INTRODUCTION TO STRINGS

• A string is a sequence of characters typically used to represent text.


• In programming languages, strings are stored in various ways, primarily depending on the
language's characteristics and the underlying data structures it supports.
• Strings can include letters, digits, spaces, punctuation marks, or any combination thereof.
• A finite sequence S of zero or more characters is called a string.
• The number of characters in a string is called its length. The string with zero characters is
called the empty string or the null string.
• Specific strings will be denoted by enclosing their characters in single quotation marks.
The quotation marks will also serve as string delimiters.
• Hence 'THE END' 'TO BE OR NOT TO BE' ' ', ‘ ‘ are strings with lengths 7, 18, 0 and
2, respectively
STORING STRINGS

Strings are stored in three types of structures:


(1) fixed length structures,
(2) variable-length structures with fixed maximums and
(3) linked structures.
RECORD-ORIENTED, FIXED-LENGTH STORAGE

• In fixed-length storage each line of print • The main disadvantages are:


is viewed as a record, where all records (1) Time is wasted reading an entire record if most
have the same length, i.e., where each of the storage consists of inessential blank spaces.
record accommodates the same number (2) Certain records may require more space than
of characters. available.
(3) When the correction consists of more or fewer
characters than the original text, changing a
misspelled word requires the entire record to be
changed.
VARIABLE-LENGTH STORAGE WITH FIXED
MAXIMUM
• The storage of variable-length strings in
memory cells with fixed lengths can be
done in two general ways:
(1) One can use a marker, such as two dollar
signs ($$), to signal the end of the string.
(2) One can list the length of the string—as
an additional item in the
pointer array.
To store strings one after another by using
• some separation marker, such as the two dollar
signs ($$) in Fig(a), or by
• using a pointer array giving the location of the
strings.
These ways of storing strings will obviously save
space.
However, such methods of storage are usually
inefficient when the
strings and their lengths are frequently being changed.
• The fixed length memory cells doesn’t allow deleting, changing and inserting words,
phrases .
• Accordingly, for most extensive word processing applications, strings are stored by
means of linked lists.
• By a (one-way) linked list, we mean a linearly ordered sequence of memory cells, called
nodes, where each node contains an item, called a link, which points to the next node in
the list
• Figure (a) shows how the string would appear in memory with one character per node,
• Fig.(b) shows how It would appear with four characters per node.
CHARACTER DATA TYPE

• VARIABLES
• CONSTANTS
Character variables fall into three categories
• Many programming languages denote
• By a static character variable, we mean a variable
string constants by placing the string in
whose length is defined before the program is executed
either single or double quotation marks. and cannot change throughout the program.
For example, 'THE END' and ‘TO BE
• By a semistatic character variable, we mean a variable
OR NOT TO BE’ are string constants of whose length may vary during the execution of the
lengths 7 and 18 characters respectively. program as long as the length does not exceed a
• . maximum value determined by the program before the
program is executed.
• By a dynamic character variable, we mean a variable
whose length can change during the execution of the
program
STRING OPERATIONS

• A string may be viewed simply as a sequence or linear array of characters.


• Consider, for example, the string 'TO BE OR NOT TO BE'
• We may view the string as the 18-character sequence T, O, , B, …, E.
• However, the substrings TO, BE, OR, … have their own meaning.
• On the other hand, consider an 18-element linear array of 18 integers, 4, 8, 6, 15, 9, 5, 4,
13, 8, 5, 11, 9, 9, 13, 7, 10, 6, 1
1. SUBSTRING

• Accessing a substring from a given string requires three pieces of information:


(1) the name of the string or the string itself,
(2) (2) the position of the first character of the substring in the given string and
(3) the length of the substring or the position of the last character of the substring
• SUBSTRING(string, initial,length)
• to denote the substring of a string S beginning in a position K and having a length L.
2.INDEXING

• Indexing, also called pattern matching, refers to finding the position where a string
pattern P first appears in a given string text T.
• We call this operation INDEX and write INDEX(text, pattern).
• If the pattern P does not appear in the text T, then INDEX is assigned the value 0.
strcpy(s1, s2);
1
Copies string s2 into string s1.
strcat(s1, s2);
2
Concatenates string s2 onto the end of string s1.
strlen(s1);
3
Returns the length of string s1.
strcmp(s1, s2);
4 Returns 0 if s1 and s2 are the same; less than 0 if
s1<s2; greater than 0 if s1>s2.
strchr(s1, ch);
5 Returns a pointer to the first occurrence of
character ch in string s1.
strstr(s1, s2);
6 Returns a pointer to the first occurrence of string
s2 in string s1.
3.CONCATENATION

• Let S1 and S2 be strings.


• The concatenation of S1 and S2, which we denote by Sl//S2, is the string consisting of the
characters of S1 followed by the characters of S2.
STRCMP()

• The strcmp() function in C++ compares two null-terminating strings (C-strings). The
comparison is done lexicographically. It is defined in the cstring header file.
• char str1[] = "Megadeth";
• char str2[] = "Metallica";
• int result = strcmp(str1, str2);
SUBSTRING
4.LENGTH

• The number of characters in a string is called its length.


• LENGTH(string)
• LENGTH('COMPUTER') = 8 LENGTH('0') = 0 LENGTH(' ') = 0
• Some of the programming languages denote this function as follows:
• PL/1: LENGTH(string)
• BASIC: LEN(string)
• UCSD Pascal: LENGTH(string)
• SNOBOL: SIZE(string)
WORD PROCESSING

• The following operations can be executed by using the string operations discussed in the
preceding section
(a) Replacement. Replacing one string in the text by another.
(b) Insertion. Inserting a string in the middle of the text.
(c) Deletion. Deleting a string from the text.
INSERTION

• Suppose in a given text T we want to insert a string S so that S begins in position K. We


denote this operation by INSERT(text, position, string)
INSERT ('ABCDEFG', 3, 'XYZ') = 'ABXYZCDEFG'
INSERT ('ABCDEFG', 6, 'XYZ') = 'ABCDEXYZFG'
• This INSERT function can be implemented by using the string operation as follows:
INSERT(T, K, S) = SUBSTRING(T, 1, K − 1) //S// SUBSTRING(T, K, LENGTH(T) − K
+ 1)
DELETION

• Suppose in a given text T we want to delete the substring which begins in position K and
has length L. We denote this operation by DELETE(text, position, length)
• DELETE(' ABCDEFG ', 4, 2) = ' ABCFG ‘
• DELETE(' ABCDEFG ', 2, 4) = ' AFG ‘
• We assume that nothing is deleted if position K = 0. Thus DELETE(' ABCDEFG ', 0, 2)
= ' ABCDEFG '
REPLACEMENT

• Suppose in a given text T we want to replace the first occurrence of a pattern P1 by a


pattern P2 .
• We will denote this operation by REPLACE(text, pattern1 , pattern2 )
• For example REPLACE('XABYABZ', 'AB', 'C') = 'XCYABZ’
• REPLACE('XABYABZ' , 'BA', 'C') = 'XABYABZ’
• In the second case, the pattern BA does not occur, and hence there is no change.
• We note that this REPLACE function can be expressed as a deletion followed by an
insertion if we use the preceding DELETE and INSERT functions.
• Specifically, the REPLACE function can be executed by using the following three steps:
• K := INDEX(T, P1 )
• T := DELETE(T, K, LENGTH(P1 ))
• INSERT(T, K, P2 )
• The first two steps delete P1 from T, and the third step inserts P2 in the position K from
which P1 was deleted.
PATTERN MATCHING ALGORITHMS

• First Pattern Matching Algorithm


• The first pattern matching algorithm is the obvious one in which we compare a given
pattern P with each of the substrings of T, moving from left to right, until we get a match.
• let WK = SUBSTRING(T, K, LENGTH(P)) That is, let WK denote the substring of T having
the same length as P and beginning with the Kth character of T.
• First we compare P, character by character, with the first substring, W1 .
• If all the characters are the same, then P = W1 and so P appears in T and INDEX(T, P) = 1.
• On the other hand, suppose we find that some character of P is not the same as the
corresponding character of W1 .
• Then P ≠ W1 and we can immediately move on to the next substring, W2 .
• The maximum value MAX of the subscript K is equal to LENGTH(T) − LENGTH(P) + 1
• Let us assume, that P is a 4-character string and that T is a 20-character string, and that P
and T appear in memory as linear arrays with one character per element.
• That is, P = P[1]P[2]P[3]P[4] and T = T[1]T[2]T[3] ... T[19]T[20] Then P is compared
with each of the following 4-character substrings of T: W1 = T[1]T[2]T[3]T[4], W2 =
T[2]T[3]T[4]T[5], ..., W17 = T[17]T[18]T[19]T[20] Note that there are MAX = 20 − 4 +
1 = 17 such substrings of T.

You might also like