2025-04-13 03:11:14 [scrapy.utils.log] (PID: 57) INFO: Scrapy 2.12.0 started (bot: catalog_extraction)
2025-04-13 03:11:14 [scrapy.utils.log] (PID: 57) INFO: Versions: lxml 5.3.1.0, libxml2 2.12.9, cssselect 1.3.0, parsel 1.10.0, w3lib 2.3.1, Twisted 24.11.0, Python 3.11.12 (main, Apr 9 2025, 18:23:23) [GCC 12.2.0], pyOpenSSL 25.0.0 (OpenSSL 3.4.1 11 Feb 2025), cryptography 44.0.2, Platform Linux-6.9.12-x86_64-with-glibc2.36
2025-04-13 03:11:15 [on_time_supplies] (PID: 57) INFO: Starting extraction spider on_time_supplies...
2025-04-13 03:11:15 [scrapy.addons] (PID: 57) INFO: Enabled addons:
[]
2025-04-13 03:11:15 [py.warnings] (PID: 57) WARNING: /usr/local/lib/python3.11/site-packages/scrapy/utils/request.py:120: ScrapyDeprecationWarning: 'REQUEST_FINGERPRINTER_IMPLEMENTATION' is a deprecated setting. It will be removed in a future version of Scrapy.
  return cls(crawler)
2025-04-13 03:11:15 [scrapy.extensions.telnet] (PID: 57) INFO: Telnet Password: cbe2eb9b675639be
2025-04-13 03:11:15 [py.warnings] (PID: 57) WARNING: /var/lib/scrapyd/eggs/catalog_extraction/1744501546.egg/catalog_extraction/extensions/bq_feedstorage.py:33: ScrapyDeprecationWarning: scrapy.extensions.feedexport.build_storage() is deprecated, call the builder directly.
2025-04-13 03:11:15 [scrapy.middleware] (PID: 57) INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.closespider.CloseSpider',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats',
 'scrapy_playwright.memusage.ScrapyPlaywrightMemoryUsageExtension',
 'spidermon.contrib.scrapy.extensions.Spidermon']
2025-04-13 03:11:15 [scrapy.crawler] (PID: 57) INFO: Overridden settings:
{'BOT_NAME': 'catalog_extraction',
 'CONCURRENT_ITEMS': 250,
 'CONCURRENT_REQUESTS': 24,
 'FEED_EXPORT_ENCODING': 'utf-8',
 'HTTPPROXY_ENABLED': False,
 'LOG_FILE': '/var/lib/scrapyd/logs/catalog_extraction/on_time_supplies/f168f4c4181411f0ada74200a9fe0102.log',
 'LOG_FORMAT': '%(asctime)s [%(name)s] (PID: %(process)d) %(levelname)s: '
               '%(message)s',
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'catalog_extraction.spiders',
 'REQUEST_FINGERPRINTER_CLASS': 'scrapy_poet.ScrapyPoetRequestFingerprinter',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['catalog_extraction.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
 'USER_AGENT': None}
2025-04-13 03:11:15 [scrapy_poet.injection] (PID: 57) INFO: Loading providers: [, , , , , , ]
2025-04-13 03:11:15 [scrapy.middleware] (PID: 57) INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scraping_utils.middlewares.downloaders.HeadersSpooferDownloaderMiddleware',
 'scrapy_poet.InjectionMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy_poet.DownloaderStatsMiddleware']
2025-04-13 03:11:15 [NotFoundHandlerSpiderMiddleware] (PID: 57) INFO: NotFoundHandlerSpiderMiddleware running on PRODUCTION environment.
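The two ScrapyDeprecationWarnings above are configuration debt rather than the cause of this failed run. The first fires because the project still sets REQUEST_FINGERPRINTER_IMPLEMENTATION; since REQUEST_FINGERPRINTER_CLASS already points at scrapy_poet.ScrapyPoetRequestFingerprinter (see the overridden settings), the deprecated key can simply be deleted. A minimal sketch of the relevant settings.py lines, reconstructed from the "Overridden settings" dump above (the file itself is not part of this log, so treat the excerpt as an assumption):

    # catalog_extraction/settings.py -- hypothetical excerpt, values taken from the log
    BOT_NAME = "catalog_extraction"
    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
    # Deprecated on Scrapy 2.12; deleting this line silences the first warning:
    # REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"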
2025-04-13 03:11:15 [scrapy.middleware] (PID: 57) INFO: Enabled spider middlewares:
['catalog_extraction.middlewares.NotFoundHandlerSpiderMiddleware',
 'catalog_extraction.middlewares.FixtureSavingMiddleware',
 'scrapy_poet.RetryMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2025-04-13 03:11:15 [scrapy.middleware] (PID: 57) INFO: Enabled item pipelines:
['catalog_extraction.pipelines.DuplicatedSKUsFilterPipeline',
 'catalog_extraction.pipelines.DiscontinuedProductsAdjustmentPipeline',
 'catalog_extraction.pipelines.PriceRoundingPipeline',
 'scraping_utils.pipelines.AttachSupplierPipeline',
 'spidermon.contrib.scrapy.pipelines.ItemValidationPipeline']
2025-04-13 03:11:15 [scrapy.core.engine] (PID: 57) INFO: Spider opened
2025-04-13 03:11:15 [scrapy.extensions.closespider] (PID: 57) INFO: Spider will stop when no items are produced after 7200 seconds.
2025-04-13 03:11:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:11:15 [scrapy.extensions.telnet] (PID: 57) INFO: Telnet console listening on 127.0.0.1:6024
2025-04-13 03:11:15 [scrapy-playwright] (PID: 57) INFO: Starting download handler
2025-04-13 03:11:15 [scrapy-playwright] (PID: 57) INFO: Starting download handler
2025-04-13 03:11:25 [scrapy-playwright] (PID: 57) INFO: Launching browser chromium
2025-04-13 03:11:25 [scrapy-playwright] (PID: 57) INFO: Launching browser chromium
2025-04-13 03:11:25 [scrapy-playwright] (PID: 57) INFO: Launching browser chromium
2025-04-13 03:11:25 [scrapy.core.scraper] (PID: 57) ERROR: Error downloading
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 2013, in _inlineCallbacks
    result = context.run(
             ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_request
    return (yield download_func(request, spider))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 1253, in adapt
    extracted: _SelfResultT | Failure = result.result()
                                        ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 380, in _download_request
    return await self._download_request_with_retry(request=request, spider=spider)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 399, in _download_request_with_retry
    page = await self._create_page(request=request, spider=spider)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy_playwright_stealth/handler.py", line 38, in _create_page
    page = await super()._create_page(request, spider)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 297, in _create_page
    ctx_wrapper = await self._create_browser_context(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 258, in _create_browser_context
    await self._maybe_launch_browser()
"/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 206, in _maybe_launch_browser self.browser = await self.browser_type.launch(**self.config.launch_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 14450, in launch await self._impl_obj.launch( File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_browser_type.py", line 96, in launch Browser, from_channel(await self._channel.send("launch", params)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 61, in send return await self._connection.wrap_api_call( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None playwright._impl._errors.Error: BrowserType.launch: Chromium distribution 'chrome' is not found at /opt/google/chrome/chrome Run "playwright install chrome" 2025-04-13 03:11:25 [scrapy.core.scraper] (PID: 57) ERROR: Error downloading Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 2013, in _inlineCallbacks result = context.run( ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator return g.throw(self.value.with_traceback(self.tb)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_request return (yield download_func(request, spider)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 1253, in adapt extracted: _SelfResultT | Failure = result.result() ^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 380, in _download_request return await self._download_request_with_retry(request=request, spider=spider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 399, in _download_request_with_retry page = await self._create_page(request=request, spider=spider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy_playwright_stealth/handler.py", line 38, in _create_page page = await super()._create_page(request, spider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 297, in _create_page ctx_wrapper = await self._create_browser_context( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 258, in _create_browser_context await self._maybe_launch_browser() File "/usr/local/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 206, in _maybe_launch_browser self.browser = await self.browser_type.launch(**self.config.launch_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 14450, in launch await self._impl_obj.launch( File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_browser_type.py", line 96, in launch Browser, from_channel(await self._channel.send("launch", params)) 
2025-04-13 03:11:25 [scrapy.core.scraper] (PID: 57) ERROR: Error downloading
[traceback identical to the first "Error downloading" traceback above]
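All three "Launching browser chromium" attempts fail the same way: Playwright is asked for the branded Chrome build (channel "chrome", expected at /opt/google/chrome/chrome), which is not installed in this container, so no page is ever fetched. A likely fix, assuming the channel is set through scrapy-playwright's PLAYWRIGHT_LAUNCH_OPTIONS setting (the setting itself is not visible in this log):

    # Option 1: install the branded Chrome build when building the image,
    # as the error message itself suggests:
    #   RUN playwright install chrome
    #
    # Option 2: drop the channel so Playwright falls back to the Chromium
    # bundle installed by "playwright install chromium":
    PLAYWRIGHT_LAUNCH_OPTIONS = {
        "headless": True,        # assumed; not visible in the log
        # "channel": "chrome",   # remove this to use the bundled Chromium
    }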
2025-04-13 03:12:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:13:03 [scrapy.crawler] (PID: 57) INFO: Received SIGINT, shutting down gracefully. Send again to force
2025-04-13 03:13:03 [scrapy.core.engine] (PID: 57) INFO: Closing spider (shutdown)
2025-04-13 03:13:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:14:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:15:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:16:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:17:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:18:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:19:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:20:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:21:15 [scrapy.extensions.logstats] (PID: 57) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-13 03:21:24 [scrapy.crawler] (PID: 57) INFO: Received SIGINT twice, forcing unclean shutdown
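The graceful shutdown requested at 03:13:03 never completes: with the browser unavailable the engine sits idle, logstats keeps reporting zeroes for another eight minutes, and a second SIGINT is needed to kill the job. The CloseSpider extension would eventually have stopped the crawl on its own, but only after the 7200-second no-items window announced at startup. Assuming that window comes from Scrapy's CLOSESPIDER_TIMEOUT_NO_ITEM setting, a tighter value would start the close much sooner on wedged runs like this one (though, as the hung shutdown above shows, a stuck Playwright handler may still need a hard stop):

    # Hypothetical settings.py tweak; 7200 matches the window logged at startup.
    # Close the spider after 10 minutes without a single scraped item instead:
    CLOSESPIDER_TIMEOUT_NO_ITEM = 600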