Every time you post a new article or upload a new video, you create a new object on the web. Google will index its URL, and Facebook will add it to its Open Graph.

Keeping an inventory of your web objects helps you size your website (the more content, the better) and check how it is indexed by Google, Facebook, or any other social network.

Scrapy is a pretty useful Python library for crawling domains. I’ve adapted Kevin Jacobs’ implementation to extract only the links and titles from the domain you want to scan.

Here are the installation steps:

pip3 install scrapy
scrapy startproject links_mapper
cd links_mapper
scrapy genspider leocelis leocelis.com

Then you need to modify items.py and leocelis.py (or whatever you named your spider).

Once you are done, you can run:

scrapy crawl leocelis -o links.csv -t csv

This last command will generate a links.csv file with a list of all the URLs and titles from the domain you’ve specified. (On recent Scrapy versions, the -t csv flag is optional: the output format is inferred from the file extension passed to -o.)
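To actually keep track of the inventory afterwards, the file can be loaded with Python’s standard csv module. A minimal sketch, assuming the two CSV columns are named link and title as in the item above:

```python
import csv


def load_inventory(path):
    """Read the links.csv produced by the crawl into a list of dicts,
    one per crawled page (keys come from the CSV header row)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example usage:
#   pages = load_inventory("links.csv")
#   print(len(pages), "pages in the inventory")
#   for page in pages[:5]:
#       print(page["link"], "-", page["title"])
```

From there, you can diff the list between runs to see which objects are new, or feed the URLs to an indexing checker.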



Hi! My name is Leo Celis. I’m an entrepreneur and Python developer specializing in Ad Tech and MarTech.
