Web Scraping With Selenium Python: Delayed JavaScript Rendering
Want to learn to web scrape with Selenium? In this Web Scraping With Selenium Python tutorial, you'll learn how to handle dynamic content with delayed JavaScript rendering, and how to scrape in both headless and headful modes.

🚀 Try Decodo (formerly Smartproxy) proxies today: https://decodo.com/proxies/residential-proxies?utm_source=SP&utm_medium=youtube&utm_campaign=web_scraping

⚙️ You can find Selenium documentation here: https://www.selenium.dev/documentation/
⚙️ Beautiful Soup documentation: https://readthedocs.org/projects/beautiful-soup-4/
⚙️ Find the full code archive on our GitHub: https://github.com/Decodo/selenium-delayed-js

The requirements for the code:
webdriver-manager
selenium
bs4

Copy the code:

from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from extension import proxies
from bs4 import BeautifulSoup
import json

username = ' '
password = ' '
endpoint = 'gate.decodo.com'
port = '7000'

# Set up the Chrome WebDriver with the proxy extension
chrome_options = webdriver.ChromeOptions()
proxies_extension = proxies(username, password, endpoint, port)
chrome_options.add_extension(proxies_extension)
# chrome_options.add_argument("--headless=new")  # uncomment to run in headless mode
chrome = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)

# Open the desired webpage
url = "https://quotes.toscrape.com/js-delayed/"
chrome.get(url)

# Wait up to 30 seconds for the "quote" divs to load
wait = WebDriverWait(chrome, 30)
quote_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "quote")))

# Extract the HTML of each "quote" element, parse it with Beautiful Soup, and save to JSON
quote_data = []
for quote_element in quote_elements:
    print(quote_element.get_attribute("outerHTML"))
    soup = BeautifulSoup(quote_element.get_attribute("outerHTML"), 'html.parser')
    quote_text = soup.find('span', class_='text').text
    author = soup.find('small', class_='author').text
    tags = [tag.text for tag in soup.find_all('a', class_='tag')]
    quote_info = {
        "Quote": quote_text,
        "Author": author,
        "Tags": tags
    }
    quote_data.append(quote_info)

with open('quote_info.json', 'w') as json_file:
    json.dump(quote_data, json_file, indent=4)

# Close the WebDriver
chrome.quit()

💡 For more web scraping with Python tutorials, check out our playlist: https://youtube.com/playlist?list=PL7pslqhZ89OjfDEEBkUrLHYZezzW0vYZX

❓ Why use Python for web scraping?
Python is considered one of the most efficient programming languages for web scraping. It's general-purpose and offers a variety of web scraping frameworks and libraries, such as Selenium, Beautiful Soup, and Scrapy. What's more, web scraping with Python is easy to learn, even for beginners, thanks to its shallow learning curve.
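A note on `from extension import proxies`: this helper isn't a standard library module; the full version ships in the GitHub repo linked above. As a rough, hedged sketch of the common pattern such helpers follow, the function below packages a tiny Chrome extension (a Manifest V2 `manifest.json` plus a `background.js`) that points Chrome at the proxy host and answers the authentication prompt. The file names and the `path` parameter here are illustrative assumptions, not the repo's actual implementation:

```python
import zipfile

def proxies(username, password, endpoint, port, path="proxies_extension.zip"):
    """Sketch: build a Chrome extension zip that sets an authenticated proxy.
    Assumed shape of the helper -- see the linked repo for the real version."""
    manifest = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Proxy Auth Extension",
        "permissions": ["proxy", "webRequest", "webRequestBlocking", "<all_urls>"],
        "background": {"scripts": ["background.js"]}
    }
    """
    background = f"""
    // Route all traffic through the single proxy host
    chrome.proxy.settings.set({{
        value: {{
            mode: "fixed_servers",
            rules: {{singleProxy: {{scheme: "http", host: "{endpoint}", port: {port}}}}}
        }},
        scope: "regular"
    }}, function() {{}});

    // Answer the proxy's auth challenge with the stored credentials
    chrome.webRequest.onAuthRequired.addListener(
        function(details) {{
            return {{authCredentials: {{username: "{username}", password: "{password}"}}}};
        }},
        {{urls: ["<all_urls>"]}},
        ["blocking"]
    );
    """
    # Pack both files into a zip that chrome_options.add_extension() accepts
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", manifest)
        zp.writestr("background.js", background)
    return path
```

The return value is the zip's path, which is exactly what `chrome_options.add_extension()` expects in the main script.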
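The Beautiful Soup parsing step can be tried on its own, without launching a browser. The snippet below feeds a hand-written `outerHTML` sample (mimicking the markup of quotes.toscrape.com) through the same `find`/`find_all` calls used in the main script:

```python
from bs4 import BeautifulSoup

# Sample outerHTML of one "quote" div, in the style of quotes.toscrape.com
sample_html = """
<div class="quote">
    <span class="text">“The world as we have created it is a process of our thinking.”</span>
    <small class="author">Albert Einstein</small>
    <a class="tag" href="/tag/change/">change</a>
    <a class="tag" href="/tag/thinking/">thinking</a>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
quote_info = {
    "Quote": soup.find('span', class_='text').text,
    "Author": soup.find('small', class_='author').text,
    "Tags": [tag.text for tag in soup.find_all('a', class_='tag')],
}
print(quote_info["Author"])  # Albert Einstein
```

This is handy for debugging selectors: if the site's markup changes, you can paste one element's `outerHTML` here and adjust the `find` calls before touching the Selenium code.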