2025-04-12 22:56:06 [scrapy.utils.log] (PID: 40) INFO: Scrapy 2.12.0 started (bot: catalog_extraction)
2025-04-12 22:56:06 [scrapy.utils.log] (PID: 40) INFO: Versions: lxml 5.3.1.0, libxml2 2.12.9, cssselect 1.3.0, parsel 1.10.0, w3lib 2.3.1, Twisted 24.11.0, Python 3.11.12 (main, Apr 9 2025, 18:23:23) [GCC 12.2.0], pyOpenSSL 25.0.0 (OpenSSL 3.4.1 11 Feb 2025), cryptography 44.0.2, Platform Linux-6.9.12-x86_64-with-glibc2.36
2025-04-12 22:56:06 [grainger] (PID: 40) INFO: Starting extraction spider grainger...
2025-04-12 22:56:06 [scrapy.addons] (PID: 40) INFO: Enabled addons: []
2025-04-12 22:56:06 [py.warnings] (PID: 40) WARNING: /usr/local/lib/python3.11/site-packages/scrapy/utils/request.py:120: ScrapyDeprecationWarning: 'REQUEST_FINGERPRINTER_IMPLEMENTATION' is a deprecated setting. It will be removed in a future version of Scrapy.
  return cls(crawler)
2025-04-12 22:56:06 [scrapy.extensions.telnet] (PID: 40) INFO: Telnet Password: 2c8b3a759619001a
2025-04-12 22:56:06 [py.warnings] (PID: 40) WARNING: /var/lib/scrapyd/eggs/catalog_extraction/1744498345.egg/catalog_extraction/extensions/bq_feedstorage.py:33: ScrapyDeprecationWarning: scrapy.extensions.feedexport.build_storage() is deprecated, call the builder directly.
2025-04-12 22:56:07 [scrapy.middleware] (PID: 40) INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.closespider.CloseSpider',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats',
 'scrapy_playwright.memusage.ScrapyPlaywrightMemoryUsageExtension',
 'spidermon.contrib.scrapy.extensions.Spidermon']
2025-04-12 22:56:07 [scrapy.crawler] (PID: 40) INFO: Overridden settings:
{'BOT_NAME': 'catalog_extraction',
 'CONCURRENT_ITEMS': 250,
 'CONCURRENT_REQUESTS': 3,
 'FEED_EXPORT_ENCODING': 'utf-8',
 'HTTPPROXY_ENABLED': False,
 'LOG_FILE': '/var/lib/scrapyd/logs/catalog_extraction/grainger/4d4075e817f111f083b84200a9fe0102.log',
 'LOG_FORMAT': '%(asctime)s [%(name)s] (PID: %(process)d) %(levelname)s: %(message)s',
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'catalog_extraction.spiders',
 'REQUEST_FINGERPRINTER_CLASS': 'scrapy_poet.ScrapyPoetRequestFingerprinter',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['catalog_extraction.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
 'USER_AGENT': None}
2025-04-12 22:56:08 [scrapy-playwright] (PID: 40) WARNING: Connecting to remote browser, ignoring PLAYWRIGHT_LAUNCH_OPTIONS
2025-04-12 22:56:08 [scrapy-playwright] (PID: 40) WARNING: Connecting to remote browser, ignoring PLAYWRIGHT_LAUNCH_OPTIONS
2025-04-12 22:56:08 [scrapy-playwright] (PID: 40) WARNING: Connecting to remote browser, ignoring PLAYWRIGHT_LAUNCH_OPTIONS
2025-04-12 22:56:08 [scrapy-playwright] (PID: 40) WARNING: Connecting to remote browser, ignoring PLAYWRIGHT_LAUNCH_OPTIONS
2025-04-12 22:56:09 [scrapy_poet.injection] (PID: 40) INFO: Loading providers: [, , , , , , ]
2025-04-12 22:56:09 [scrapy.middleware] (PID: 40) INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scraping_utils.middlewares.downloaders.HeadersSpooferDownloaderMiddleware',
 'scrapy_poet.InjectionMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy_poet.DownloaderStatsMiddleware']
2025-04-12 22:56:09 [NotFoundHandlerSpiderMiddleware] (PID: 40) INFO: NotFoundHandlerSpiderMiddleware running on PRODUCTION environment.
2025-04-12 22:56:09 [scrapy.middleware] (PID: 40) INFO: Enabled spider middlewares:
['catalog_extraction.middlewares.NotFoundHandlerSpiderMiddleware',
 'catalog_extraction.middlewares.FixtureSavingMiddleware',
 'scrapy_poet.RetryMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2025-04-12 22:56:09 [scrapy.middleware] (PID: 40) INFO: Enabled item pipelines:
['catalog_extraction.pipelines.DuplicatedSKUsFilterPipeline',
 'catalog_extraction.pipelines.DiscontinuedProductsAdjustmentPipeline',
 'catalog_extraction.pipelines.PriceRoundingPipeline',
 'scraping_utils.pipelines.AttachSupplierPipeline',
 'spidermon.contrib.scrapy.pipelines.ItemValidationPipeline']
2025-04-12 22:56:09 [scrapy.core.engine] (PID: 40) INFO: Spider opened
2025-04-12 22:56:09 [scrapy.extensions.closespider] (PID: 40) INFO: Spider will stop when no items are produced after 7200 seconds.
2025-04-12 22:56:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 22:56:09 [scrapy.extensions.telnet] (PID: 40) INFO: Telnet console listening on 127.0.0.1:6023
2025-04-12 22:56:09 [scrapy-playwright] (PID: 40) INFO: Starting download handler
2025-04-12 22:56:09 [scrapy-playwright] (PID: 40) INFO: Starting download handler
2025-04-12 22:56:47 [scrapy-playwright] (PID: 40) INFO: Connecting using CDP: wss://brd-customer-hl_13cda1e4-zone-main_scraping_browser:l9p73ctebkrc@brd.superproxy.io:9222
2025-04-12 22:56:48 [scrapy-playwright] (PID: 40) INFO: Connected using CDP: wss://brd-customer-hl_13cda1e4-zone-main_scraping_browser:l9p73ctebkrc@brd.superproxy.io:9222
2025-04-12 22:56:49 [grainger] (PID: 40) WARNING: Page from CDP session 1 closed in the errback
2025-04-12 22:56:49 [grainger] (PID: 40) WARNING: Page from CDP session 0 closed in the errback
2025-04-12 22:56:51 [grainger] (PID: 40) WARNING: Page from CDP session 2 closed in the errback
2025-04-12 22:56:56 [grainger] (PID: 40) INFO: CDP session 5 parsing
2025-04-12 22:56:57 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW6: Challenge response
2025-04-12 22:57:00 [grainger] (PID: 40) INFO: CDP session 6 parsing
2025-04-12 22:57:00 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW1: Challenge response
2025-04-12 22:57:04 [grainger] (PID: 40) INFO: CDP session 5 parsing
2025-04-12 22:57:04 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW6: Challenge response
2025-04-12 22:57:07 [grainger] (PID: 40) INFO: CDP session 6 parsing
2025-04-12 22:57:08 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW1: Challenge response
2025-04-12 22:57:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 4 pages (at 4 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 22:57:11 [grainger] (PID: 40) INFO: CDP session 5 parsing
2025-04-12 22:57:11 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW6: Challenge response
2025-04-12 22:57:11 [scrapy.downloadermiddlewares.retry] (PID: 40) ERROR: Gave up retrying (failed 3 times): Invalid session
2025-04-12 22:57:15 [grainger] (PID: 40) INFO: CDP session 6 parsing
2025-04-12 22:57:15 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/KEYSTONE-FABRICS-Motorized-Remote-Cellular-827LW1: Challenge response
2025-04-12 22:57:15 [scrapy.downloadermiddlewares.retry] (PID: 40) ERROR: Gave up retrying (failed 3 times): Invalid session
2025-04-12 22:58:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 22:59:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:00:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:01:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:02:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:03:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:04:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:05:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:06:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:07:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:08:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:09:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:10:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:11:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:12:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:13:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:14:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:15:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:16:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:17:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:18:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:19:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:20:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:21:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:21:56 [scrapy-playwright] (PID: 40) INFO: Connecting using CDP: wss://brd-customer-hl_13cda1e4-zone-main_scraping_browser:l9p73ctebkrc@brd.superproxy.io:9222
2025-04-12 23:21:57 [scrapy-playwright] (PID: 40) INFO: Connected using CDP: wss://brd-customer-hl_13cda1e4-zone-main_scraping_browser:l9p73ctebkrc@brd.superproxy.io:9222
2025-04-12 23:22:01 [grainger] (PID: 40) WARNING: Page from CDP session 7 closed in the errback
2025-04-12 23:22:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 6 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-04-12 23:22:36 [grainger] (PID: 40) INFO: CDP session 8 parsing
2025-04-12 23:22:37 [grainger] (PID: 40) WARNING: URL https://www.grainger.com/product/Socket-Head-Cap-Screw-M8-1-53GG96 not found in the scheduled URLs
2025-04-12 23:22:41 [scrapy.core.scraper] (PID: 40) ERROR: Spider error processing (referer: http://www.sogou.com/)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/defer.py", line 346, in aiter_errback
    yield await it.__anext__()
          ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/python.py", line 394, in __anext__
    return await self.data.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/python.py", line 375, in _async_chain
    async for o in as_async_generator(it):
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/asyncgen.py", line 21, in as_async_generator
    async for r in it:
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/python.py", line 394, in __anext__
    return await self.data.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/python.py", line 375, in _async_chain
    async for o in as_async_generator(it):
  File "/usr/local/lib/python3.11/site-packages/scrapy/utils/asyncgen.py", line 21, in as_async_generator
    async for r in it:
  File "/usr/local/lib/python3.11/site-packages/scrapy/core/spidermw.py", line 121, in process_async
    async for r in iterable:
  File "/usr/local/lib/python3.11/site-packages/scrapy/spidermiddlewares/referer.py", line 384, in process_spider_output_async
    async for r in result:
  File "/usr/local/lib/python3.11/site-packages/scrapy/core/spidermw.py", line 121, in process_async
    async for r in iterable:
  File "/usr/local/lib/python3.11/site-packages/scrapy/spidermiddlewares/urllength.py", line 62, in process_spider_output_async
    async for r in result:
  File "/usr/local/lib/python3.11/site-packages/scrapy/core/spidermw.py", line 121, in process_async
    async for r in iterable:
  File "/usr/local/lib/python3.11/site-packages/scrapy/spidermiddlewares/depth.py", line 60, in process_spider_output_async
    async for r in result:
  File "/usr/local/lib/python3.11/site-packages/scrapy/core/spidermw.py", line 121, in process_async
    async for r in iterable:
  File "/var/lib/scrapyd/eggs/catalog_extraction/1744498345.egg/catalog_extraction/spiders/grainger.py", line 185, in parse
    async for item in new_page.get_items():
  File "/var/lib/scrapyd/eggs/catalog_extraction/1744498345.egg/catalog_extraction/pages/grainger.py", line 54, in get_items
    async for item in GraingerMultiProductPageObject(self.response).get_items():
  File "/var/lib/scrapyd/eggs/catalog_extraction/1744498345.egg/catalog_extraction/pages/grainger.py", line 86, in get_items
    yield await page.to_item()
          ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/web_poet/pages.py", line 81, in _to_item
    validation_item = self._validate_input()
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/web_poet/utils.py", line 205, in inner
    return cached_meth(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/web_poet/pages.py", line 140, in _validate_input
    validation_item = self.validate_input()  # type: ignore[attr-defined]
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/scrapyd/eggs/catalog_extraction/1744498345.egg/catalog_extraction/pages/__init__.py", line 33, in validate_input
    raise NotProductPage(f"URL {self.url} landed on page that is not a product page.")
scraping_utils.common.exceptions.NotProductPage: URL https://www.grainger.com/product/info?productArray=1LRK7,497V41,5RA80,60KM79,497V34,55PW62,42JJ36,34NN36,60ZL09,1LRF5 landed on page that is not a product page.
2025-04-12 23:22:44 [grainger] (PID: 40) INFO: CDP session 9 parsing
2025-04-12 23:22:44 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/Aluminum-Round-Tube-6061-Aluminum-795CF4: Challenge response
2025-04-12 23:22:49 [grainger] (PID: 40) INFO: CDP session 9 parsing
2025-04-12 23:22:49 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/APPROVED-VENDOR-Aluminum-Round-Tube-6061-Aluminum-795CF4: Challenge response
2025-04-12 23:22:55 [grainger] (PID: 40) INFO: CDP session 9 parsing
2025-04-12 23:22:55 [grainger] (PID: 40) WARNING: Invalid session for https://www.grainger.com/product/APPROVED-VENDOR-Aluminum-Round-Tube-6061-Aluminum-795CF4: Challenge response
2025-04-12 23:22:55 [scrapy.downloadermiddlewares.retry] (PID: 40) ERROR: Gave up retrying (failed 3 times): Invalid session
2025-04-12 23:23:08 [scrapy.crawler] (PID: 40) INFO: Received SIGINT, shutting down gracefully. Send again to force
2025-04-12 23:23:08 [scrapy.core.engine] (PID: 40) INFO: Closing spider (shutdown)
2025-04-12 23:23:09 [scrapy.extensions.logstats] (PID: 40) INFO: Crawled 10 pages (at 4 pages/min), scraped 11 items (at 11 items/min)
2025-04-12 23:23:18 [scrapy.crawler] (PID: 40) INFO: Received SIGINT twice, forcing unclean shutdown
2025-04-12 23:23:19 [asyncio] (PID: 40) ERROR: Task was destroyed but it is pending! task: wait_for= cb=[Deferred.fromFuture..adapt() at /usr/local/lib/python3.11/site-packages/twisted/internet/defer.py:1251]>