Every time you post a new article or upload a new video, you are creating a new object on the web. Google will index its URL, and Facebook will add it to the Open Graph.
Keeping an inventory of your web objects helps you gauge how large your website is (the more content, the better) and check how it is indexed by Google, Facebook, or any other social network.
Scrapy is a pretty useful Python library for crawling domains. I’ve modified Kevin Jacobs’ implementation to extract only the links and titles from the domain you want to scan.
Here are the installation steps:
pip3 install scrapy
scrapy startproject links_mapper
cd links_mapper
scrapy genspider leocelis leocelis.com
Then you need to modify items.py and leocelis.py (or whatever you named your spider).
Once you are done, you can run:
scrapy crawl leocelis -o links.csv -t csv
This last command generates a links.csv file listing all the URLs and titles from the domain you’ve specified. (In recent Scrapy versions the output format is inferred from the .csv extension, so the -t csv flag is optional.)
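To size your inventory from that export, a few lines of stdlib Python are enough. Here I parse a CSV with the same url,title shape the spider emits — the sample rows are made up, and in practice you would pass open("links.csv") to the reader instead:

```python
import csv
from io import StringIO

# Stand-in for open("links.csv"): two hypothetical rows in url,title form
sample = StringIO(
    "url,title\n"
    "https://leocelis.com/,Home\n"
    "https://leocelis.com/about/,About\n"
)

rows = list(csv.DictReader(sample))
print(f"{len(rows)} web objects found")  # → 2 web objects found
for row in rows:
    print(row["url"], "-", row["title"])
```

The row count is your web-object total; you can then spot-check individual URLs in Google or Facebook’s debugger to see how each one is indexed.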