raena-crawler-engine/tokopedia_crawler_engine
Shariar Imtiaz 3154eec5ab first commit 2024-01-24 17:05:07 +04:00
.gitignore first commit 2024-01-24 17:05:07 +04:00
Readme.md first commit 2024-01-24 17:05:07 +04:00
conf.json.sample first commit 2024-01-24 17:05:07 +04:00
tokopedia_api.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_config.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_crawler.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_db_migrations.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_db_writer.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_logger.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_product_list.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_products.py first commit 2024-01-24 17:05:07 +04:00
tokopedia_sub_categories.py first commit 2024-01-24 17:05:07 +04:00
zyte-proxy-ca.crt first commit 2024-01-24 17:05:07 +04:00

Readme.md

Run:

  • Run "python tokopedia_crawler.py".

Configuration:

Notes:

  • A cron job can be set up for the 'Master' to run every minute.
  • It is expected to capture all product URLs in ~107 minutes.
  • It makes only 2 API calls per minute (3 in the first minute) to prevent IP blocking.
  • Any number of slaves can be added.
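
The call budget in the notes above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from this repository: the names calls_allowed, total_calls and run_master_tick are hypothetical, fetch stands in for whatever API call the Master actually makes, and the run is assumed to last exactly 107 one-minute ticks.

```python
def calls_allowed(minute_index: int) -> int:
    """API calls budgeted for one minute (0-based): 3 in the first minute, 2 afterwards."""
    return 3 if minute_index == 0 else 2


def total_calls(minutes: int) -> int:
    """Total API calls made over the given number of one-minute ticks."""
    return sum(calls_allowed(m) for m in range(minutes))


def run_master_tick(minute_index: int, fetch) -> list:
    """One cron-triggered tick: make exactly the budgeted calls and collect the results.

    `fetch` is a hypothetical callable taking (minute_index, call_index).
    """
    return [fetch(minute_index, i) for i in range(calls_allowed(minute_index))]


if __name__ == "__main__":
    # Assumption: the run takes exactly 107 minutes (the notes say "~107").
    print(total_calls(107))
```

Under that assumption a full run makes 3 + 2 × 106 = 215 API calls, which is the number to keep in mind when adjusting the per-minute limit.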