03 Strings

Manipulating Texts

Learning Outcomes
● Python built-in functions
● Work with numeric data and string data
● Objects and methods
● Formatting numbers and strings
Built-in functions
● Python provides many useful functions for common
programming tasks.
● A function is a group of statements that performs a specific
task.
● You have already used the functions eval, input, print, and
int, among others.
○ These are built-in functions and they are always available
in the Python interpreter.
○ You don’t have to import any modules to use these
functions.

Math module
import math

Mathematical functions and constants, e.g. math.pi

Math module
# import math module to use the math functions
import math

# Test algebraic functions
print("log(10, 10) =", math.log(10, 10))
print("sqrt(4.0) =", math.sqrt(4.0))

# Test trigonometric functions
print("sin(PI / 2) =", math.sin(math.pi / 2))
print("cos(PI / 2) =", math.cos(math.pi / 2))
print("tan(PI / 2) =", math.tan(math.pi / 2))
print("degrees(1.57) =", math.degrees(1.57))
print("radians(90) =", math.radians(90))


Manipulating text[1]
∙ Why is text so important in computing applications?
− In data analysis, we work with a lot of text
● e.g. we have to mine large texts such as news feeds or
social media posts to extract information of interest by
searching for specific patterns….
− Another reason: As developers and scientists, the programs
that we write often need to work as part of a pipeline,
alongside other programs that have been written by other
people.
∙ To do this, we'll often need to write code that can understand the
output from some other program (we call this parsing) or produce
output in a format that another program can operate on. Both of these
tasks require manipulating text.

String
∙ To represent text, we use the string type (str) in Python
− A string is a sequence of characters
− Python treats characters and strings the same way.
− A string is enclosed within double quotes (") or single quotes (').
− Example:
>>> message = "Hello World"
>>> print(message)
Hello World
∙ We can also use special characters to define text:
>>> message = "Hello\nWorld"
>>> print(message)
Hello
World

Special characters
● \n   newline
● \t   tab
● \\   backslash
● \'   single quote
● \"   double quote

>>> print("He said, \"John's program is easy to read\"")
He said, "John's program is easy to read"

Objects of type String
∙ Objects of type String (str) are used to represent strings of characters.
− E.g. 'abc' or "abc"
− E.g. '123' denotes a string of three characters, not the number one
hundred twenty-three.

∙ Exercise: Try typing the following expressions into the Python
interpreter:
>>> 'a'
>>> 3*4
>>> 3*'a'
>>> 3+4
>>> 'a'+'a'

Objects and Methods
● In Python, all data—including numbers and strings—are
actually objects.
● In Python, a number is an object, a string is an object, …
● Objects of the same kind have the same type.
● You can use the type() function to get the class/type of an
object.
● You can perform operations on an object. The operations are
defined using functions.
● The functions for the objects are called methods in Python.
Methods can only be invoked from a specific object.
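
A minimal sketch of the points above, using arbitrary example values (the variable names are illustrative only): type() reports the class of an object, and a method such as upper() is invoked from a specific string object using dot notation.

```python
# Every value is an object; type() reports the class it belongs to.
number = 42
text = "hello"
number_type = type(number)
text_type = type(text)
print(number_type)   # <class 'int'>
print(text_type)     # <class 'str'>

# A method is invoked from a specific object: object.method(arguments)
shouted = text.upper()
print(shouted)       # HELLO
```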

Input String [1]
∙ A string value can be read using the input() function
>>> firstName = input("Please enter your name: ")
∙ All values read through the input() function are strings.
∙ Strings containing digits can be converted to numbers using the
int() or float() functions, or the eval() function.
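
As a hedged illustration of converting input text to numbers, the sketch below uses int() and float(), which only accept numeric text, whereas eval() evaluates any expression it is given. The typed values are hard-coded stand-ins (an assumption) for what input() would return.

```python
# input() always returns a string; these literals stand in for typed input.
typed_age = "21"         # hypothetical text entered by a user
typed_height = "1.75"    # hypothetical text entered by a user

age = int(typed_age)            # string of digits -> integer
height = float(typed_height)    # string with a decimal point -> float

print(age + 1)
print(height * 2)
```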

Storing strings in variables
∙ We can take a string and assign a name to it using an
equals sign – we call this a variable:
>>> my_name = "Something"
>>> print(my_name)
Something

∙ Note: When we use the variable in a call to print, we don't
need any quotation marks. The quotes only mark where the string
starts and ends in the code, so the variable my_name already
refers to the string.

Storing strings in variables (cont.)
∙ We can change the value of a variable as many times as
we like once we've created it:
>>> my_name = "Something"
>>> print(my_name)
Something
# change the value of my_name
>>> my_name = "Another Thing"
>>> print(my_name)
Another Thing

String Indexing
∙ Indexing can be used to extract individual characters from a
string.
∙ In Python, all indexing is zero-based.
− Typing 'abc'[0] into the interpreter will cause it to display
the string 'a' .
− Typing 'abc'[3] will produce the error message
IndexError: string index out of range .
− Since Python uses 0 to indicate the first element of a
string, the last element of a string of length 3 is accessed
using the index 2.

String Indexing (cont.)

∙ Indexing is used in string expressions to access a specific
character position in the string
∙ The general form for indexing is:
<string>[<expr>]
∙ Note: <expr> can be an integer value, an integer variable or an
expression that gives an integer as its result; its value
determines which character is selected from the string

String Indexing

∙ Visually, "Hello Bob" can be represented as in the
diagram below (index 5 holds the space):
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b
∙ Notice that the string index starts with 0 and ends with
n - 1 (n being the string length)
∙ Negative numbers are used to index from the end of
a string.
○ E.g. the value of 'abc'[-1] is 'c'.
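
A short sketch of positive and negative indexing on the same string (variable names are illustrative): index n - 1 and index -1 both select the last character.

```python
greet = "Hello Bob"
n = len(greet)              # number of characters, including the space

first = greet[0]            # first character
last = greet[n - 1]         # last character via a positive index
also_last = greet[-1]       # last character via a negative index
second_last = greet[-2]     # one before the last

print(first, last, also_last, second_last)
```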
Indexing – Example

>>> greet = "Hello Bob"
>>> greet[0]
'H'
>>> print(greet[0], greet[2], greet[4])
H l o
>>> x = 8
>>> print(greet[x-2])
B
Exercise 3.1

∙ Write a program to input a text and to display
the characters at locations 3, 4 and 10. You
can assume that the input text is long enough.

Tools for manipulating strings
∙ So far we have shown that we can store and print strings
∙ But Python also provides the facilities for
manipulating strings.
∙ Python has many built-in functions for carrying out
common operations, and in the following slides we'll
take a look at them one-by-one.

Concatenation
∙ We can concatenate (stick together) two strings using the +
symbol.
∙ This symbol will join together the string on the left with the
string on the right:
>>> my_name = "John" + "Smith"
>>> print(my_name)
JohnSmith
∙ We can also concatenate variables that point to strings:
>>> firstname = "John"
>>> my_name = firstname + "Smith"
# my_name is now "JohnSmith"

Concatenation (cont.)
∙ We can even join multiple strings together in one go:
>>> upstream = "AAA"
>>> downstream = "GGG"
>>> my_dna = upstream + "ATGC" + downstream
# my_dna is now "AAAATGCGGG"
∙ Note: the result of concatenating two strings together is itself
a string. So it's perfectly OK to use a concatenation inside a
call to print:
>>> print("Hello" + " " + "world")
Hello world

Repetition

∙ * for repetition: builds a string by multiple concatenations
of a string with itself
∙ Example:
>>> 3 * "John"
'JohnJohnJohn'
∙ Note: The code 'a'*'a' produces the error message
TypeError: can't multiply sequence by non-int of type 'str'
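
A small sketch of repetition in practice (the values are arbitrary): the integer may appear on either side of *, and multiplying a string by another string raises a TypeError, which is caught here so the script keeps running.

```python
# Repetition works with the integer on either side of the * operator.
rule = "-" * 20
echo = 3 * "ab"
print(rule)
print(echo)

# Multiplying a string by a string is an error; catch it to inspect it.
try:
    "a" * "a"
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```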

Exercise 3.2

∙ Predict the output of the following Python program:
a = "Rosalind"
b = "Franklin"
c = "!"
print(a, b, 3*c)

Finding the length of a string
∙ The len built-in function takes a single argument (a string)
∙ len returns a value (a number) that can be stored; we call
this the return value.
○ If we write a program (a script file) that uses len to calculate
the length of a string, the program will run but we won't see
any output:
# this line doesn't produce any output when run as a program
len("SampleText")
● If we want to actually use the return value, we need to store
it in a variable, and then do something useful with it (like
printing it):
text_length = len("SampleText")
print(text_length)
Finding the length of a string (cont.)

∙ Consider this short program (calcTextLength.py), which
calculates the length of a text and then prints a message
telling us the length:

# store the text in a variable
my_text = "SampleText"
# calculate the length of the text and store it in a variable
text_length = len(my_text)
# print a message telling us the text length
print("The length of the text is " + text_length)

Finding the length of a string (cont.)
When we try to run the program we get the following error:
1 Traceback (most recent call last):
2 File "calcTextLength.py", line 6, in <module>
3 print("The length of the text is " + text_length)
4 TypeError: must be str, not int
● The error message (line 4) is short but informative: Python cannot
concatenate a str and an int. The exact wording depends on the Python
version; recent versions report
can only concatenate str (not "int") to str.
● Python is complaining that it doesn't know how to
concatenate a string (which it calls str for short) and a number
(which it calls int, short for integer).
● But Python has a built-in solution: a function called str
which turns a number into a string so that we can concatenate it.
Finding the length of a string (cont.)

∙ Here's how we can modify our program to use it:

# store the text in a variable
my_text = "SampleText"
# calculate the length of the text and store it in a variable
text_length = len(my_text)
# print a message telling us the text length
print("The length of the text is " + str(text_length))

Changing case
∙ We can convert a string to lower case by using a new type of syntax: a
method that belongs to strings.
∙ A method is like a function, but instead of being built in to the Python
language, it belongs to a particular type.
∙ The method we are talking about here is called lower, and we say that it
belongs to the string type. Here's how we use it:

my_text = "SampleText"
# print my_text in lower case
print(my_text.lower())

∙ Notice how using a method looks different to using a function.
− When we use a function like print or len, we write the function name first and the
arguments go in parentheses:
print("SampleText")
len(my_text)
− When we use a method, we write the object first, then a dot, then the method name
and its arguments in parentheses:
my_text.lower()
∙ We can also change the case to upper, using the upper() method
>>> print("SampleText".upper())
SAMPLETEXT
To test if a string is in upper/lower case

>>> uni = 'university of mauritius'
>>> uni
'university of mauritius'
>>> uni.islower()
True
>>> uni.isupper()
False

Replacement
∙ replace is another example of a useful method that
belongs to the string type
∙ It takes two arguments (both strings) and returns a copy
of the string in which all occurrences of the first argument
are replaced by the second.

Replacement (cont.)
∙ Example of replace :
str1 = "Java is a programming language"
# Calling the replace method
str2 = str1.replace("Java","Python")
# Displaying result
print("Old String: \t",str1)
print("New String: \t",str2)

Output
Old String: Java is a programming language
New String: Python is a programming language
Slicing a string
∙ Slicing is used to extract substrings of arbitrary length.
∙ If s is a string, the expression s[start:end] denotes the
substring of s that starts at index start and ends at index
end-1 .
− For example, 'abc'[1:3] = 'bc' .
∙ If the value before the colon is omitted, it defaults to 0.
∙ If the value after the colon is omitted, it defaults to the
length of the string.
∙ Consequently, the expression 'abc'[:] is semantically
equivalent to the more verbose 'abc'[0:len('abc')]
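
The slicing defaults described above can be checked with a short sketch (the variable names are illustrative):

```python
s = "abc"

middle_to_end = s[1:3]       # indices 1 and 2; index 3 is excluded
from_start = s[:2]           # start omitted, so it defaults to 0
to_end = s[1:]               # end omitted, so it defaults to len(s)
full_copy = s[:]             # both omitted
verbose_copy = s[0:len(s)]   # the same slice written out in full

print(middle_to_end, from_start, to_end, full_copy, verbose_copy)
```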

Extracting part of a string - Slicing
∙ Note that in Python, the positions in a string start from zero (0)
up to the position (length_of_string - 1)
∙ Slicing is an operation that allows us to access a contiguous
sequence of characters, or substring, from a string
∙ It can be thought of as a way of indexing a range of positions
in the string
∙ Syntax:
<string>[<start>:<end>]
− Note: Both start and end should be int-valued expressions

Extracting part of a string (cont.)
∙ Example of substring:
module = "Problem Solving Techniques"
# print positions three and four (the end position, five, is not included)
print(module[3:5])
# positions start at zero, not one
print(module[0:6])
# if we use a stop position beyond the end, it's the same as using the end
print(module[0:60])

Output:
bl
Proble
Problem Solving Techniques
Extracting part of a string (cont.)
∙ If we just give a single number in the square
brackets, we'll just get a single character:
food = "pizza"
first_char = food[0]
print(first_char)

Output:
p

Extracting part of a string (cont.)
s = "Hello"
print(s[0])      # H
print(s[4])      # o
print(s[-1])     # o
print(s[1:3])    # el     "Slices" can be taken with indices separated by a colon
print(s[2:])     # llo
print(s[:3])     # Hel
print(s[::2])    # Hlo    Third term in slice determines step size
print(s[::-1])   # olleH
print(len(s))    # 5
Exercise 3.3
∙ Write a program that allows the input of a
movie title, followed by 2 integer values x
and y and displays the substring between
positions x and y inclusive in the movie title.

The “in” and “not in” operators
∙ in: membership operator; True if the first string occurs inside
the second string
∙ not in: non-membership operator; True if the first string does not
occur in the second string
∙ Examples
>>> 'John' in 'Sir John Smith'
True
>>> 'x' in 'sample'
False
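
A brief sketch of both operators (the example strings are arbitrary). Note that the membership test is case-sensitive.

```python
title = "Sir John Smith"

has_john = "John" in title            # substring present
has_lower_john = "john" in title      # different case, so not found
has_x = "x" in "sample"               # no 'x' in 'sample'
lacks_x = "x" not in "sample"         # the negated test

print(has_john, has_lower_john, has_x, lacks_x)
```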

Counting and finding substrings
∙ A very common job in text analysis is to count the number
of times some pattern occurs in a text.
∙ In computer programming terms, what that problem
translates to is counting the number of times a substring
occurs in a string.
∙ The method that does the job is called count.
− It takes a single argument whose type is string, and
returns the number of (non-overlapping) times that the
argument is found in the string.
− The return type is a number (an integer)

Counting and finding substrings (cont.)

string = "Python is awesome, isn't it?"
substring = "is"
count = string.count(substring)
print("The count is:", count)

Output:
The count is: 2

Counting and finding substrings (cont.)
Count the number of occurrences of a given substring
using start and end positions

string = "Python is awesome, isn't it?"
substring = "is"
count = string.count(substring, 8, 25)
print("The count is:", count)

Output:
The count is: 1

Exercise 3.4
∙ Write a program that allows the input of a
sentence and displays the count of 'a' and
's' in the sentence.

Exercise 3.5
∙ Write a program that allows a user to input a
sentence and displays five (5) integers
(separated by spaces) counting the
respective number of times that each vowel
occurs in the sentence.

Exercise 3.6
∙ Write a program that allows a user to input a DNA sequence
(that can be made up of the letters 'A', 'C', 'G' and 'T' in
upper or lower case).
∙ The program will then calculate and display the GC content
(total percentage of G and C) of that sequence.
∙ To calculate the GC content of a DNA sequence (which is
simply a string):
− we must find the sum of the counts of "G" and "C"
− divide that sum by the length of the string
− then multiply by 100
[Hint: you can use normal mathematical symbols like add (+), subtract (-),
multiply (*), divide (/) and parentheses to carry out calculations on numbers in
Python.]
Counting and finding substrings (cont.)
∙ A closely-related problem to counting substrings is
finding their location.
∙ What if instead of counting the number of ‘a’ in
our text we want to know where they are?
∙ The find method will give us the answer, at least
for simple cases.
− find takes a single string argument, just like count, and
returns a number which is the position at which that
substring first appears in the string (in computing, we
call that the index of the substring).
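
Since find only reports the first position, one possible way to collect every position is to call it repeatedly with its optional start argument. This is a sketch under that approach, with an arbitrary example text and illustrative variable names.

```python
text = "banana bread"
target = "a"

positions = []
index = text.find(target)            # first occurrence, or -1 if absent
while index != -1:
    positions.append(index)
    # continue searching just after the occurrence we found
    index = text.find(target, index + 1)

print(positions)                     # [1, 3, 5, 10]
```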

Counting and finding substrings (cont.)
∙ Remember that in Python we start counting from
zero rather than one, so position 0 is the first
character, position 4 is the fifth character, etc.
∙ Examples:
word = "problem"
print(word.find('p'))
print(word.find('ob'))
print(word.find('w'))

Output
0
2
-1
Counting and finding substrings (cont.)
>>> dna = "aagtccgcgcgctttttaaggagccttttgacggc"
# search from position 0
>>> dna.find('ag')
1
# search from position 17, after the first occurrence
>>> dna.find('ag', 17)
18
>>> dna.find('ag', 19)
21
# same as find but search backwards
>>> dna.rfind('ag')
21
Output Formatting
>>> print("The DNA sequence's GC content is", gc_perc, "%")
The DNA sequence's GC content is 53.06122448979592 %

∙ The value of the gc_perc variable has many digits after the
decimal point that are not significant. You can limit the number
of digits displayed by applying a format to the printed string:
>>> print("The DNA sequence's GC content is %5.3f %%" % gc_perc)
The DNA sequence's GC content is 53.061 %
− "The DNA sequence's GC content is %5.3f %%" is the formatting string
− gc_perc is the value that is formatted
− the percent operator (%) separates the formatting string
from the value that replaces the format placeholder
− note the double %% to print a % symbol

Display Values Formatting

∙ A formatting specifier has this general form:
%<width>.<precision><type-char>
● The specifier starts with a % and ends with a character that
indicates the data type of the value being inserted
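
A minimal sketch of the general form with a few common type characters (f for floats, d for integers, s for strings); the values and variable names are arbitrary.

```python
value = 3.14159265

fixed = "%8.2f" % value                  # width 8, 2 digits after the point
integer = "%5d" % 42                     # integer right-aligned in width 5
sentence = "%s scored %d" % ("Ada", 95)  # several values go in a tuple

# Brackets make the padding visible.
print("[" + fixed + "]")
print("[" + integer + "]")
print(sentence)
```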

Formatting numbers

Formatting - Placeholders

>>> print("Hello %s %s, you may have won $%d!" % ("Mr.", "Smith", 10000))
Hello Mr. Smith, you may have won $10000!

>>> print('This int, %5d, was placed in a field of width 5' % (7))
This int,     7, was placed in a field of width 5

>>> print('This int, %10d, was placed in a field of width 10' % (7))
This int,          7, was placed in a field of width 10

Yet another Example

>>> print('This float, %10.5f, has width 10 and precision 5.' % (3.1415926))
This float,    3.14159, has width 10 and precision 5.

>>> print('This float, %0.5f, has width 0 and precision 5.' % (3.1415926))
This float, 3.14159, has width 0 and precision 5.

>>> import math
>>> print("Compare %f and %0.20f" % (math.pi, math.pi))
Compare 3.141593 and 3.14159265358979311600

Formatting strings (s)

● With the built-in format() function, the format specifier 20s
specifies that the string is formatted within a width of 20.
By default, a string is left justified.
● To right-justify it, put the symbol > in the format specifier
(e.g. >20s).
● If the string is longer than the specified width, the width is
automatically increased to fit the string.
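
The three points above can be illustrated with the built-in format() function; this sketch assumes that is the mechanism the slide has in mind, and the sample strings are arbitrary.

```python
left = format("Welcome", "20s")              # left-justified in width 20
right = format("Welcome", ">20s")            # right-justified in width 20
too_long = format("Welcome to Python", "5s") # longer than 5, so kept whole

# Brackets make the padding visible.
print("[" + left + "]")
print("[" + right + "]")
print("[" + too_long + "]")
```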

Print
● print() automatically prints a newline (\n) to cause the output to
advance to the next line.
● If you don't want this to happen after the print function is
finished, you can invoke the print function by passing a special
argument end

print("AAA", end=' ')
print("BBB", end='')
print("CCC", end='***')
print("DDD", end='***')

Output
AAA BBBCCC***DDD***

Exercise 3.7
∙ Calculating AT content
Here's a short DNA sequence:
ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT
Write a program that will print out the AT content of this
DNA sequence.
[Hint: you can use normal mathematical symbols like add
(+), subtract (-), multiply (*), divide (/) and parentheses to
carry out calculations on numbers in Python.]

The split() method
∙ This method is used to split a string into a list of
substrings
∙ By default, it will split the string wherever a space occurs
>>> S="Hello String Library"
>>> S.split()
['Hello', 'String', 'Library']

∙ However, it can also split on a chosen character
>>> S="32,24,25,57"
>>> S.split(',')
['32', '24', '25', '57']
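
The pieces returned by split are still strings. A hedged sketch of a common follow-up step, converting them to integers so that they can be summed (variable names are illustrative):

```python
s = "32,24,25,57"
parts = s.split(",")          # list of strings

numbers = []
for part in parts:
    numbers.append(int(part)) # convert each piece to an integer

total = sum(numbers)
print(parts)
print(numbers)
print(total)
```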

Exercise 3.8
An important process in Computational Biology consists of breaking
a sequence on a particular pattern.
Write a program that allows the input of a DNA sequence and splits
it on the pattern “ATG” into a number of subsequences. The program
should then display the list of subsequences.
Note: Ensure that your sequence contains a number of
occurrences of “ATG”

More String Operations
Function              Description
s.capitalize()        Copy of s with only the first character capitalised
string.capwords(s)    Copy of s with first character of each word capitalised
                      (a function in the string module; needs import string)
s.center(width)       Center s in a field of given width
s.count(sub)          Count the number of occurrences of sub in s
s.find(sub)           Find the first position where sub occurs in s
s.ljust(width)        Like center, but s is left-justified
s.lower()             Copy of s in all lowercase characters
s.lstrip()            Copy of s with leading whitespace removed

String Operations
Function                     Description
s.replace(oldsub, newsub)    Replace all occurrences of oldsub in s with newsub
s.rfind(sub)                 Like find, but returns the rightmost position
s.rjust(width)               Like center, but s is right-justified
s.rstrip()                   Copy of s with trailing whitespace removed
s.split()                    Split s into a list of substrings
s.upper()                    Copy of s with all characters converted to upper case

String Operations
Function                        Description
s.split(separator)              Splits a string into a list, using the provided
                                separator. By default, any whitespace is a separator.
s.rsplit(separator, maxsplit)   Splits a string into a list, starting from the right.
                                maxsplit specifies how many splits to do. The default
                                value is -1, which means "all occurrences".
s.strip(characters)             Removes any leading and trailing characters listed in
                                characters (whitespace is removed by default).
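
A short sketch combining strip and rsplit; the file name is a made-up example and the variable names are illustrative.

```python
raw = "   data.tar.gz  \n"

clean = raw.strip()                 # drop leading/trailing whitespace
name, ext = clean.rsplit(".", 1)    # split once, starting from the right
dashed = "--title--".strip("-")     # strip a chosen character instead

print(clean)
print(name, ext)
print(dashed)
```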

Exercise 3.9
Write a program to input a string and output a new string where all
occurrences of the first char of the original string have been changed
to '$', except the first char itself.

Sample output
Input String : 'restart'
New String : 'resta$t'

Exercise 3.10
Write a Python program to input two strings and create a single
string using the two given strings, separated by a space and
swapping the first two characters of each string.

Sample output
Input first String: abc
Input second String: xyz
New String : xyc abz

Exercise 3.11
Write a Python program to input a string and return a new string
made of 4 copies of the last two characters of the original string
(length must be at least 2).

Sample output
Input first String: Python
New String : onononon

Exercise 3.12
Write a Python program to input a string and output the part of the
string before the last occurrence of a specified character.

Sample output
Input a String: https://www.w3resource.com/python-exercises/string
Input char: /
Output string: https://www.w3resource.com/python-exercises

Exercise 3.13
Write a Python program to input a floating point number and display
the number with no decimal places. [Hint: Use str.format]

Sample outputs
Input a floating point number: 3.1415926
Formatted Number with no decimal places: 3

Input a floating point number: -12.9999
Formatted Number with no decimal places: -13

Exercise 3.14
Write a program to print the following integers padded with zeros on
the left to a specified width:

Original Number: 3
Formatted Number(left padding, width 2): 03

Original Number: 123
Formatted Number(left padding, width 6): 000123

[Hint: Use str.format]

Acknowledgments
● DGT1039Y lecture notes by Dr. Shakun Baichoo, FoICDT