Web And API Scraping
The Underground Guide
Fundamental concepts of web scraping
Ethics and legality of web scraping
Common data scraping applications and ways to monetise them
Dealing with countermeasures on sites and apps that don't want to be scraped
Helpful tools for web and API scraper development: Chrome DevTools, curl, mitmproxy and others
Fetching and parsing HTML
Data extraction with XPath queries, CSS selectors and regular expressions
Reverse engineering private APIs of web and mobile apps for data scraping purposes (API scraping)
How to get serious with the Scrapy framework
What is Selenium and why you don't need it most of the time
For anyone who knows the basics of Python and would benefit from scraping some data...
Web developers
Tech entrepreneurs
Security professionals
Ecommerce merchants
Data analysts and scientists
OSINT investigators
Digital marketers and growth hackers
This course introduces tools and techniques for scraping data even from sites that use anti-bot technologies. Learn to gather data automatically with Python libraries such as requests, BeautifulSoup, lxml and Scrapy.
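As a rough illustration of what parsing HTML for data looks like, here is a minimal sketch that pulls links out of a page. It deliberately uses Python's built-in `html.parser` rather than BeautifulSoup or lxml so that it runs without extra installs; the `LinkExtractor` class, the `extract_links` helper and the sample markup are made up for this example and are not taken from the course material.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for a fetched page.
SAMPLE_HTML = """
<ul>
  <li><a href="/item/1">First item</a></li>
  <li><a href="/item/2">Second <b>item</b></a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

In a real scraper the HTML string would come from an HTTP response, and a library such as BeautifulSoup or lxml would typically replace the hand-written parser class.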
You don't always need to parse HTML. Sometimes the data is available in structured form through a private API, which makes extraction much easier. We will show you how.
We introduce tools and techniques for reverse engineering mobile app network traffic so that you can reproduce its requests for automation and data extraction.
Three real-world examples of scraping projects that you might take on as a freelancer.
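To make the contrast with HTML parsing concrete, the sketch below reads records straight out of a JSON response. Everything specific here is an assumption for illustration: the `products` key, the field names and the sample payload are invented, and `fetch_json` uses the standard library's `urllib` instead of requests so the example has no dependencies.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, headers=None):
    """Fetch a URL and decode the body as JSON (not called in the demo below)."""
    request = Request(url, headers=headers or {"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_products(payload):
    """Pick the fields of interest from a decoded API response.

    Assumes a hypothetical response shape: {"products": [{"name": ..., "price": ...}]}.
    """
    return [
        {"name": item["name"], "price": item["price"]}
        for item in payload.get("products", [])
    ]


# Hypothetical response body, standing in for what a private API might return.
SAMPLE_RESPONSE = (
    '{"products": ['
    '{"id": 1, "name": "Widget", "price": 9.99}, '
    '{"id": 2, "name": "Gadget", "price": 24.5}'
    "]}"
)

if __name__ == "__main__":
    for product in extract_products(json.loads(SAMPLE_RESPONSE)):
        print(product["name"], product["price"])
```

Because the response is already structured, there are no selectors to maintain; the work shifts to finding the endpoint and reproducing the request the app sends.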
01. HTTP
02. HTML & CSS
03. JavaScript
04. REST, JSON, XML
Resources
01. Chrome developer tools
02. curl
03. curl.trillworks.com
04. wget
05. mitmproxy
06. youtube-dl
07. tmux
08. crontab-ui
09. Vagrant
Resources
01. Fetching HTML with Python
02. Parsing HTML with BeautifulSoup
03. Parsing HTML with lxml
04. XPath
05. Traversing the pages
06. Regular expressions
07. Using pandas to parse HTML tables
08. Using js2xml to get data from JavaScript
09. Leveraging JSON inside HTML pages
10. Using CSS selectors for scraping
11. Programmatic browser control with Selenium
Resources
01. Introduction to Scrapy
02. Scrapy shell
03. Selectors
04. Spiders
05. Items
06. Pipelines
07. Middlewares
08. Extensions
09. Scrapy CLI
10. Scrapy Cloud
Resources
01. Scraping public APIs
02. Discovering hidden web APIs
03. Scraping hidden web APIs
04. Setting up mitmproxy with an iOS device
05. Setting up mitmproxy with an Android device
06. Scraping the private API of a mobile app
Resources
01. Getting your headers and cookies right
02. Doing it slower
03. Proxy servers and proxy pools
04. Countering third-party countermeasures
05. Captcha solving services
06. undetected-chromedriver
Resources
01. Introducing email harvesting through Google
02. Harvesting emails from Google SERP
03. Harvesting emails from LinkedIn profiles
01. Introducing Zillow FSBO scraping
02. Implementing the Scrapy project
01. Introducing GOAT API scraping
02. Scraping sneaker data from GOAT API
01. Summary
02. Homework
03. Further learning
Resources