Hi @Insutanto

You're doing nice work in this repo. I have the same desire: different message queues should be supported in Scrapy.

Old implementations of this idea, and the one you have here, share a common disadvantage: for every type of queue you need to implement a separate scheduler. Besides the amount of work required, such implementations can't reuse the work done on improving scheduling. I am talking mostly about scrapy/scrapy#3520. The reason for going distributed (at least for me) is having a lot of domains in a single crawl, and not using DownloaderAwarePriorityQueue makes crawling much slower (about 10 times slower, according to the benchmarks in that PR).

To overcome this, I developed and merged in scrapy/scrapy#3884 a separation between the scheduler logic and the external message queue.

It would be great for your project and the Scrapy community if you changed from a scheduler-based to a queue-based implementation.
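For reference, switching to DownloaderAwarePriorityQueue is a one-line settings change in Scrapy 1.7+ (the setting and class path below are real; the surrounding project layout is assumed). Note that, per the Scrapy docs, it does not work together with a non-zero `CONCURRENT_REQUESTS_PER_IP`:

```python
# settings.py (sketch)
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"
```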
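The queue-based split means a backend only has to satisfy a small queue protocol rather than reimplement scheduling. A minimal in-memory sketch of that idea, assuming the informal interface discussed in scrapy/scrapy#4783 (`push`, `pop`, `close`, `__len__`); the class name and in-memory backing are illustrative, not Scrapy's actual API:

```python
from collections import deque


class InMemoryQueue:
    """Stand-in for an external message queue backend (e.g. Redis, Kafka).

    The scheduler only ever calls push/pop/close/__len__, so swapping the
    backend does not require touching scheduling logic.
    """

    def __init__(self):
        self._items = deque()

    def push(self, request):
        # A real backend would serialize the request before sending it out.
        self._items.append(request)

    def pop(self):
        # Return None when empty, mirroring how Scrapy's queues behave.
        if not self._items:
            return None
        return self._items.popleft()

    def close(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)
```

With this shape, a Redis- or Kafka-backed class is a drop-in replacement: only the storage calls inside `push`/`pop` change.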
I agree with you. It looks great that we can implement different message queues without implementing different schedulers. I am tired of these DRY problems. 😫

I have read the issues and PRs that you mention; they are very valuable. I will try to use DownloaderAwarePriorityQueue and the queue-based implementation. It would be great to build some modules on top of this in the future. 😸

Finally, thank you for your contributions to the Scrapy project. 😸
More details and discussion can be found in scrapy/scrapy#4326. An example of such an implementation for Redis is at https://github.com/whalebot-helmsman/scrapy/blob/redis/scrapy/squeues.py#L101-L173.

There is also a PR for an external queue protocol: scrapy/scrapy#4783.