If you’re still running Scrapy 1.x in production, migrating to Scrapy 2.x is overdue — and the breaking changes are real enough to warrant a methodical walkthrough before you touch your spiders. Scrapy 2.0 dropped in March 2020, but plenty of teams are still on legacy 1.8 pipelines in 2026, especially those who bolted Scrapy onto older data infrastructure and never had time to revisit. This guide covers every breaking change that will actually bite you, with concrete fixes.
Why teams are still on 1.x in 2026
Legacy Scrapy 1.x installs survive mostly because they “work.” A spider scraping 50k pages a day doesn’t scream for upgrades. The common blockers are Python 2 compatibility (Scrapy 2.x dropped it entirely), in-house middleware that relied on undocumented internals, and pinned dependencies like Twisted 18.x that conflict with newer Scrapy.
The real cost is compounding: Scrapy 1.8 no longer receives security patches, asyncio integration is not available, and third-party extensions like scrapy-playwright and scrapy-impersonate require 2.x. If you are building modern pipelines with async rendering, the upgrade is not optional.
Python version and dependency floor
Scrapy 2.x requires Python 3.6 minimum; 2.11+ (current stable as of 2026) requires 3.8+. If you are on Python 2 anywhere in the stack, that migration comes first.
Key dependency changes:
| Dependency | Scrapy 1.x | Scrapy 2.x |
|---|---|---|
| Python | 2.7, 3.5+ | 3.6+ (3.8+ for 2.11) |
| Twisted | 14.1+ | 18.7+ |
| w3lib | 1.17+ | 1.21+ |
| queuelib | 1.4.2+ | 1.6.2+ |
| parsel | 1.5+ | 1.6.2+ |
Run this before anything else:
```shell
pip install "scrapy>=2.11" --dry-run 2>&1 | grep -E "ERROR|conflict"
```

Resolve conflicts iteratively. The most common collision is Twisted pinned by another library (Celery 4.x, for example). Upgrading to Celery 5 usually unblocks it.
Breaking API changes in spiders and middleware
Request fingerprinting
Scrapy 2.7 introduced a new request fingerprinter that changed how duplicate URLs are detected. If you have a custom DUPEFILTER_CLASS or override request_fingerprint(), your spider may start re-crawling pages it already visited, or drop requests it shouldn’t.
The fix: explicitly set the fingerprint class in settings to match your old behavior until you can audit the logic.
```python
# settings.py
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"  # opt in to the new behavior
# or pin to legacy until you have audited your dupefilter logic:
# REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.6"
```

Feed exports
The FEED_URI and FEED_FORMAT settings were deprecated in 2.1 and removed in 2.6. A spider still using them will start without error, because unknown settings are simply ignored, and silently produce no output.
Replace with:
```python
# Old (1.x)
FEED_URI = "output.json"
FEED_FORMAT = "json"

# New (2.x)
FEEDS = {
    "output.json": {"format": "json"},
}
```

The FEEDS dict supports multiple simultaneous outputs, per-feed item classes, and S3/GCS URIs natively, so it's a net improvement.
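As a sketch of the multi-output support (the bucket name and field list below are placeholders):

```python
# settings.py -- two feeds written in parallel during one crawl
FEEDS = {
    "backup.jsonl": {"format": "jsonlines"},
    "s3://my-bucket/items-%(time)s.csv": {  # placeholder bucket; %(time)s is expanded by Scrapy
        "format": "csv",
        "fields": ["title", "price"],       # placeholder field names
    },
}
```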
crawl command and CloseSpider exception
The CloseSpider exception behavior changed. In 1.x, raising it inside a callback closed the spider cleanly. In 2.x it still does, but a bare raise CloseSpider now logs only a generic close reason; pass an explicit reason argument if monitoring scripts parse close reasons from your logs.
Middleware and extension compatibility
This is where most migrations stall. Common patterns that break:
- Accessing `spider.crawler.engine` directly in middleware (internals moved)
- Using `request.meta['dont_redirect']` while overriding `process_response` in a custom redirect middleware (priority ordering changed)
- Monkey-patching `scrapy.http.Request` attributes that are now slots
The safe audit path:
- Grep your middleware for `crawler.engine`, `slot.scheduler`, and `_next_request`
- Check every `process_spider_exception` handler (exception propagation order changed in 2.3)
- Run your spider with `SCRAPY_SETTINGS_MODULE` pointing to a test settings file that sets `CLOSESPIDER_PAGECOUNT = 5` to catch crashes fast
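The crash-hunt settings file from the last step can be a tiny module layered over your production settings (`myproject` is a placeholder for your package name):

```python
# myproject/test_settings.py
# Run with: SCRAPY_SETTINGS_MODULE=myproject.test_settings scrapy crawl <spider>
from myproject.settings import *  # noqa: F401,F403  (reuse production settings)

CLOSESPIDER_PAGECOUNT = 5  # surface crashes within a handful of requests
LOG_LEVEL = "DEBUG"        # keep tracebacks verbose during the audit
```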
If you have browser-rendering middleware using Splash, migrate to scrapy-playwright instead. Splash is effectively unmaintained in 2026. The migration path is similar to what engineers face when migrating from Puppeteer to Playwright in 2026 — the mental model maps across cleanly once you understand async context handling.
Asyncio and async spider support
Scrapy 2.0 added native asyncio support, enabled through the TWISTED_REACTOR setting (with ASYNCIO_EVENT_LOOP available to select a specific loop class), and newer project templates enable the asyncio reactor by default. If you are mixing Scrapy with other async libraries (httpx, aiohttp), you can now run them in the same loop.
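Enabling the asyncio reactor is a settings change; the second line is optional and assumes uvloop is installed:

```python
# settings.py
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
# Optionally pin a specific event loop implementation:
ASYNCIO_EVENT_LOOP = "uvloop.Loop"
```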
What breaks:
- `@defer.inlineCallbacks` decorators on spider methods still work but are not composable with `async def` callbacks in the same spider. Pick one pattern per spider.
- If you use `CrawlerRunner` inside an existing asyncio app (FastAPI, for example), you must install Twisted's AsyncioSelectorReactor on the already-running loop before any crawl starts.
```python
from scrapy.crawler import CrawlerRunner
from scrapy.utils.defer import deferred_to_future
from scrapy.utils.project import get_project_settings


async def run_spider(spider_cls):
    settings = get_project_settings()
    runner = CrawlerRunner(settings)
    # crawl() returns a Twisted Deferred; deferred_to_future bridges it into
    # asyncio. This requires the asyncio reactor (TWISTED_REACTOR set to
    # "twisted.internet.asyncioreactor.AsyncioSelectorReactor").
    await deferred_to_future(runner.crawl(spider_cls))
```

This pattern is stable in 2.11 and works cleanly with uvloop.
Testing and CI changes
A short checklist before calling the migration done:
- Replace
scrapy.testsimports withscrapy.utils.test(moved in 2.0) - Update any
ScrapyCommandsubclasses —short_desc()is now a property, not a method - Pin
testcontainersor your mock HTTP server to a version compatible with your new Twisted version - Run
scrapy checkagainst all spiders — it validates contracts and catches callback signature mismatches
If your CI pipeline runs on Docker, update the base image from python:3.7 to at minimum python:3.10-slim. Scrapy 2.11 on 3.10 is the most stable combination tested against in 2026.
One test pattern worth adding after migration:
```python
from scrapy.http import HtmlResponse, Request

from myproject.spiders import MySpider  # adjust to your project layout


def test_parse_returns_items():
    spider = MySpider()
    response = HtmlResponse(
        url="https://example.com",
        body=b"<html><body><p>test</p></body></html>",
        request=Request("https://example.com"),
    )
    items = list(spider.parse(response))
    assert len(items) > 0
```

This test pattern is Scrapy 2.x compatible and does not require a running Twisted reactor.
Bottom line
Migrate to Scrapy 2.x now — the asyncio support, maintained security patches, and ecosystem compatibility with modern scraping tools are worth the two to four hours the migration takes for most projects. Pin to 2.11, audit your middleware with the grep checklist above, and convert feed settings before anything else. DRT will keep covering practical migration paths like this as the scraping stack continues to evolve in 2026.