fix(scrapers): harden CSES and CF submit edge cases (#295)
Problem: CSES `_web_login` did bare dict indexing on the API response, causing an opaque `KeyError` if fields were absent. `_check_token` swallowed all exceptions as `False`, treating transient network errors as invalid tokens. CF wrote cookies unconditionally (login and submit), and swallowed `_solve_turnstile` failures in `submit_action`. Solution: Replace direct indexing with `.get()` + `RuntimeError` for missing CSES API fields. Re-raise `httpx` network/timeout exceptions from `_check_token`. Guard CF cookie writes behind an `X-User-Handle` check (the CF auth cookie). Propagate `_solve_turnstile` errors so failures surface rather than silently proceeding.
This commit is contained in:
parent
6c036a7b2e
commit
cc48c901c0
2 changed files with 12 additions and 10 deletions
|
|
@ -352,8 +352,12 @@ class CSESScraper(BaseScraper):
|
|||
f"{API_URL}/login", headers=HEADERS, timeout=HTTP_TIMEOUT
|
||||
)
|
||||
api_data = api_resp.json()
|
||||
token: str = api_data["X-Auth-Token"]
|
||||
auth_url: str = api_data["authentication_url"]
|
||||
token: str | None = api_data.get("X-Auth-Token")
|
||||
auth_url: str | None = api_data.get("authentication_url")
|
||||
if not token:
|
||||
raise RuntimeError("CSES API login response missing 'X-Auth-Token'")
|
||||
if not auth_url:
|
||||
raise RuntimeError("CSES API login response missing 'authentication_url'")
|
||||
|
||||
auth_page = await client.get(auth_url, headers=HEADERS, timeout=HTTP_TIMEOUT)
|
||||
auth_csrf = re.search(r'name="csrf_token" value="([^"]+)"', auth_page.text)
|
||||
|
|
@ -388,8 +392,8 @@ class CSESScraper(BaseScraper):
|
|||
timeout=HTTP_TIMEOUT,
|
||||
)
|
||||
return r.status_code == 200
|
||||
except Exception:
|
||||
return False
|
||||
except (httpx.ConnectError, httpx.TimeoutException, httpx.NetworkError):
|
||||
raise
|
||||
|
||||
async def submit(
|
||||
self,
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue