DAP Module4
DAP Module4
Module-4
Web Scraping And Numerical Analysis
Prepared by
ROOPA H M
Assistant Professor
RNS INSTITUTE OF TECHNOLOGY
• Submitting a form
• CSS Selectors.
• The main purpose of web scraping is to collect and analyze data from
websites for various applications, such as research, business intelligence, or
creating datasets.
• Developers use tools and libraries like BeautifulSoup (for Python), Scrapy, or
Puppeteer to automate the process of fetching and parsing web data.
• requests
• Beautiful Soup
• Selenium
• Used to extract tables, lists, paragraph and you can also put filters to extract
information from web pages.
• BeautifulSoup does not fetch the web page for us. So we use requests pip
install beautifulsoup4
BeautifulSoup
print(type(soup))
Tag Object
• This object is usually used to extract a tag from the whole HTML document.
• Beautiful Soup is not an HTTP client which means to scrap online websites
you first have to download them using the requests module and then serve
them to Beautiful Soup for scraping.
• This object returns the first found tag if your document has multiple tags with the same name.
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b>RNSIT</b>
<b> Knowx Innovations</b>
</html>
''', "html.parser")
# Get the tag
tag = soup.b
print(tag)
# Print the output
print(type(tag))
• The tag contains many methods and attributes. And two important features of a tag are
its name and attributes.
• Name:The name of the tag can be accessed through ‘.name’ as suffix.
• Attributes: Anything that is NOT tag
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b>Knowx Innovations</b>
</html>
''', "html.parser")
# Get the tag
tag = soup.b
# Print the output
print(tag.name)
# changing the tag
tag.name = "Strong"
print(tag)
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b class=“RNSIT“ name=“knowx”>Knowx Innoavtions</b>
</html>
''', "html.parser")
# Get the tag
tag = soup.b
print(tag["class"])
# modifying class
tag["class"] = “ekant"
print(tag)
# delete the class attributes
del tag["class"]
print(tag)
• A document may contain multi-valued attributes and can be accessed using key-value pair.
• BeautifulSoup provides several methods for searching for tags based on their contents,
such as find(), find_all(), and select().
• The find_all() method returns a list of all tags that match a given filter, while the find()
method returns the first tag that matches the filter.
• You can use the text keyword argument to search for tags that contain specific text.
Select method
• The select method allows you to apply these selectors to navigate and
extract data from the parsed document easily.
CSS Selector
• Id selector (#)
• Class selector (.)
• Universal Selector (*)
• Element Selector (tag)
• Grouping Selector(,)
CSS Selector
• Id selector (#) :The ID selector targets a specific HTML element based on its unique
identifier attribute (id). An ID is intended to be unique within a webpage, so using the ID
selector allows you to style or apply CSS rules to a particular element with a specific ID.
#header {
color: blue;
font-size: 16px;
}
• Class selector (.) : The class selector is used to select and style HTML elements based on
their class attribute. Unlike IDs, multiple elements can share the same class, enabling
you to apply the same styles to multiple elements throughout the document.
.highlight {
background-color: yellow;
font-weight: bold;
}
CSS Selector
• Universal Selector (*) :The universal selector selects all HTML elements on the webpage.
It can be used to apply styles or rules globally, affecting every element. However, it is
important to use the universal selector judiciously to avoid unintended consequences.
*{
margin: 0;
padding: 0;
}
• Element Selector (tag) : The element selector targets all instances of a specific HTML
element on the page. It allows you to apply styles universally to elements of the same
type, regardless of their class or ID.
p{
color: green;
font-size: 14px;
}
• Grouping Selector(,) : The grouping selector allows you to apply the same styles to
multiple selectors at once. Selectors are separated by commas, and the styles specified
will be applied to all the listed selectors.
h1, h2, h3 {
font-family: 'Arial', sans-serif;
color: #333;
}
• These selectors are fundamental to CSS and provide a powerful way to target and style
different elements on a webpage.
• You can create the instance of webdriver by using class webdriver and a browser which
you want to use
• Ex: driver = webdriver.Chrome()
• Browsers:
– webdriver.Chrome()
– webdriver.Firefox()
– webdriver.Edge()
– webdriver.Safari()
– webdriver.Opera()
– webdriver.Ie()
Find the element
X = 10000
While Python’s array object provides efficient storage of array-based data, NumPy adds to
this efficient operations on that data.
NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will upcast if possible
If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:
NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value
to an integer array, the value will be silently truncated.
• The reshape method will use a no-copy view of the initial array, but with noncontiguous
memory buffers this is not always the case.
• Computation on NumPy arrays can be very fast, or it can be very slow. The key to making
it fast is to use vectorized operations, generally implemented through NumPy’s universal
functions (ufuncs).
• NumPy’s ufuncs can be used to make repeated calculations on array elements much
more efficient.
• This vectorized approach is designed to push the loop into the compiled layer that
underlies NumPy, leading to much faster execution.
If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of
a temporary array to hold the results of 2 ** x
• A reduce method repeatedly applies a given operation to the elements of an array until
only a single result remains.
• For example, calling reduce on the add ufunc returns the sum of all elements in the
array:
Note that for these particular cases, there are dedicated NumPy functions to compute the results
(np.sum, np.prod, np.cumsum, np.cumprod)
Roopa H M, Asst. Professor, MCA, RNSIT
Outer products
• Finally, any ufunc can compute the output of all pairs of two different inputs using the
outer method. This allows you, in one line, to do things like create a multiplication table: