Web scraping: How to extract a table using Python and Beautiful Soup

memudu alimatou sadia
3 min read · Jan 4, 2021

This article demonstrates how to extract tabular data from a website using Beautiful Soup, Requests and Pandas. To follow along, you should have basic knowledge of how to scrape a website with Beautiful Soup.

Data is published on websites in many formats. When the data is embedded in a table, you can scrape it with ease using Python and Beautiful Soup. This article focuses on how to do that. Let’s begin.

Web scraping is the automated process of extracting data from a website. The data can be in any format; this tutorial focuses on extracting a table.

Let’s talk about Beautiful Soup, Requests and Pandas.

Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
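
To make the idea of a parse tree concrete, here is a minimal sketch. The HTML snippet, the class name `standings` and the team names are made up for illustration; they do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for illustration.
html = """
<table class="standings">
  <tr><th>Team</th><th>Pts</th></tr>
  <tr><td>Team A</td><td>10</td></tr>
  <tr><td>Team B</td><td>8</td></tr>
</table>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree: find the table by class, then walk its rows.
table = soup.find("table", class_="standings")
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

Each inner list holds the text of one table row, header row included.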

Requests is a Python HTTP library, released under the Apache License 2.0. The goal of the project is to make HTTP requests simpler and more human-friendly.

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language, frequently used to manipulate data in a tabular format.

Let’s start!

As a football fan, I want to extract the Premier League table from Sky Sports. In the image below, the table we are going to extract is on the left. Right-clicking on the page and choosing Inspect opens the panel on the right, which reveals the structure of the page’s HTML elements. From there we can read the table’s class name.

Let’s do this!

Step 1: Import the libraries and make a request.

requests.Session() creates a Session object, which lets you persist certain parameters across requests. It also persists cookies across all requests made from the Session instance. Read more about it here. We then use soup, the BeautifulSoup object, to find the table on the page by its class name.

https://github.com/memudualimatou/web-scraping-Table/blob/main/images/8.PNG
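
Since the code for this step is only available as an image, here is a rough sketch of what it could look like. The URL, the class name `standing-table__table`, the User-Agent string and the three helper function names are assumptions for illustration; check the real values in the browser inspector before relying on them.

```python
import requests
from bs4 import BeautifulSoup

# Assumed values, not confirmed by the article text: verify them in the
# browser inspector before relying on them.
URL = "https://www.skysports.com/premier-league-table"
TABLE_CLASS = "standing-table__table"


def make_session():
    """Create a Session whose headers and cookies persist across requests."""
    session = requests.Session()
    session.headers.update({"User-Agent": "table-scraper-example"})
    return session


def find_table(html, class_name):
    """Parse HTML and return the first <table> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", class_=class_name)


def fetch_table(session, url, class_name):
    """Download a page through the session and locate the table in it."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return find_table(response.text, class_name)
```

Calling `fetch_table(make_session(), URL, TABLE_CLASS)` would perform the actual download. It returns `None` when no table with that class exists, which usually means the class name on the page has changed.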

Step 2: Extract the data.

The <table> tag defines a table in HTML. The Pandas read_html function parses HTML tables and returns them as a list of DataFrame objects, one per table. The prettify() method in Beautiful Soup lets us view how the tags are nested in the document.
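
A minimal sketch of this step, run on a made-up HTML snippet rather than the live page (the class name `standings` and the rows are hypothetical). It assumes `lxml`, or `html5lib` together with `bs4`, is installed, because `read_html` needs one of them to parse HTML.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<table class="standings">
  <thead>
    <tr><th>Team</th><th>Pl</th><th>Pts</th></tr>
  </thead>
  <tbody>
    <tr><td>Team A</td><td>5</td><td>10</td></tr>
    <tr><td>Team B</td><td>5</td><td>8</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="standings")

# prettify() returns the markup as an indented string, one tag per line.
print(table.prettify())

# read_html returns a list of DataFrames, one per <table> it finds.
# Wrapping the markup in StringIO avoids the deprecation warning that
# newer pandas versions raise for literal HTML strings.
tables = pd.read_html(StringIO(str(table)))
df = tables[0]
print(df)
```

Because `read_html` always returns a list, the single table has to be picked out with `tables[0]`.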

Step 3: Visualize the extracted table data.

https://github.com/memudualimatou/web-scraping-Table/blob/main/images/5.PNG

The last column has no values, so let’s delete it.

https://github.com/memudualimatou/web-scraping-Table/blob/main/images/6.PNG
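
Deleting the empty column could be done in either of the two ways sketched below. The DataFrame here is a hypothetical stand-in for the scraped table, with made-up column names and values.

```python
import pandas as pd

# Hypothetical frame with an empty trailing column, standing in for the
# scraped table.
df = pd.DataFrame({
    "Team": ["Team A", "Team B"],
    "Pts": [10, 8],
    "Last 6": [float("nan"), float("nan")],
})

# Option 1: drop the last column by position.
by_position = df.drop(columns=df.columns[-1])

# Option 2: drop every column whose values are all missing.
by_content = df.dropna(axis=1, how="all")

print(by_position.columns.tolist())
```

Dropping by position is the most direct choice when you know which column is empty. `dropna` is more robust if the page layout changes and the empty column moves.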

Finally!

Now our data looks better! You now know how to extract a table from a website using Python and Beautiful Soup in three steps.
