fix(scrapers): cookie fast paths, centralized storage, and reauth hardening (#363)

## Problem

Scraper cookie handling was fragmented across per-platform files with no
shared access, httpx scrapers lacked `checking_login` fast paths on
login, and several re-auth edge cases (CodeChef submit, CF cookie guard,
AtCoder cookie persistence) caused unnecessary full re-logins or silent
failures.

## Solution

Centralize all cookie storage into a single `cookies.json` via helpers
in `base.py`. Add `checking_login` fast paths to `kattis.py` (using the
`x-username` response header as a session probe), `usaco.py`, and
`cses.py` login flows. Fix `kattis.py` submit to emit `checking_login`
only after loading cookies. Remove AtCoder cookie persistence from login
entirely — always do a fresh session. Harden CodeChef and CF reauth
with consistent status logging and cookie guard checks.
Barrett Ruth 2026-03-07 16:10:51 -05:00 committed by GitHub
parent 564f9286da
commit b7ddf4c253
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 813 additions and 194 deletions

1
.gitignore vendored

@ -17,3 +17,4 @@ node_modules/
.envrc
.direnv/
AI_DEBUG.md

529
AI_DEBUG.md Normal file

@ -0,0 +1,529 @@
# Browser Scraper Login Debugging Guide
## Goal
Make CF, AtCoder, and CodeChef login/submit behavior IDENTICAL to Kattis.
Every log message, every pathway, zero unnecessary logins.
---
## Current Branch
`fix/scraper-browser-v2`
---
## Architecture Crash Course
### Lua side
- `credentials.lua`: `:CP <platform> login/logout`
  - `M.login`: if credentials cached → calls `scraper.login(platform, cached_creds, on_status, cb)`
    - `on_status(ev)`: logs `"<Platform>: <STATUS_MESSAGES[ev.status]>"`
    - `cb(result)`: on success logs `"<Platform> login successful"`, on failure calls `prompt_and_login`
  - `prompt_and_login`: prompts username+password, then same flow
  - `M.logout`: clears credentials from cache + clears platform key from `~/.cache/cp-nvim/cookies.json`
  - STATUS_MESSAGES: `checking_login="Checking existing session..."`, `logging_in="Logging in..."`, `installing_browser="Installing browser..."`
- `submit.lua`: `:CP submit`
  - Gets saved creds (or prompts), calls `scraper.submit(..., on_status, cb)`
  - `on_status(ev)`: logs `STATUS_MSGS[ev.status]` (no platform prefix)
  - STATUS_MSGS: `checking_login="Checking login..."`, `logging_in="Logging in..."`, `submitting="Submitting..."`, `installing_browser="Installing browser (first time setup)..."`
- `scraper.lua`: `run_scraper(platform, subcommand, args, opts)`
  - `needs_browser = subcommand == 'submit' or subcommand == 'login' or (platform == 'codeforces' and subcommand in {'metadata','tests'})`
  - browser path: FHS env (`utils.get_python_submit_cmd`), 120s timeout, `UV_PROJECT_ENVIRONMENT=~/.cache/nvim/cp-nvim/submit-env`
  - ndjson mode: reads stdout line by line, calls `opts.on_event(ev)` per line
  - login event routing: `ev.credentials` → `cache.set_credentials`; `ev.status` → `on_status`; `ev.success` → callback
### Python side
- `base.py`: `BaseScraper.run_cli()` / `_run_cli_async()`
  - `login` mode: reads `CP_CREDENTIALS` env, calls `self.login(credentials)`, prints `result.model_dump_json()`
  - `submit` mode: reads `CP_CREDENTIALS` env, calls `self.submit(...)`, prints `result.model_dump_json()`
  - ndjson status events: `print(json.dumps({"status": "..."}), flush=True)` during login/submit
  - final result: `print(result.model_dump_json())`; this is what triggers `ev.success`
- `base.py`: cookie helpers
  - `load_platform_cookies(platform)` → reads `~/.cache/cp-nvim/cookies.json`, returns platform key
  - `save_platform_cookies(platform, data)` → writes to same file
  - `clear_platform_cookies(platform)` → removes platform key from same file
- `models.py`: `LoginResult(success, error, credentials={})`, `SubmitResult(success, error, submission_id="", verdict="")`
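The ndjson exchange described above can be sketched in a few lines. This is only an illustration under stated assumptions: `emit_status`, `run_login`, and `route_events` are hypothetical names, and `LoginResult` here is a plain dataclass standing in for the real model in `models.py`. The routing function imitates in Python what the Lua side is described as doing.

```python
import io
import json
from contextlib import redirect_stdout
from dataclasses import asdict, dataclass, field


@dataclass
class LoginResult:
    # Stand-in for the model in models.py (hypothetical simplification).
    success: bool
    error: str
    credentials: dict = field(default_factory=dict)


def emit_status(status: str) -> None:
    # One JSON object per line, flushed so the reader sees it immediately.
    print(json.dumps({"status": status}), flush=True)


def run_login(credentials: dict) -> None:
    emit_status("logging_in")
    ok = bool(credentials.get("username")) and bool(credentials.get("password"))
    result = LoginResult(
        success=ok,
        error="" if ok else "missing credentials",
        credentials=credentials if ok else {},
    )
    # The final line carries `success`, which ends the exchange.
    print(json.dumps(asdict(result)), flush=True)


def route_events(stream: str) -> list[str]:
    # Imitates the described Lua routing: credentials, status, success.
    seen = []
    for line in stream.splitlines():
        ev = json.loads(line)
        if ev.get("credentials"):
            seen.append("set_credentials")
        if "status" in ev:
            seen.append(f"status:{ev['status']}")
        if "success" in ev:
            seen.append(f"done:{ev['success']}")
    return seen


buf = io.StringIO()
with redirect_stdout(buf):
    run_login({"username": "USER", "password": "PASS"})
events = route_events(buf.getvalue())
```

Keeping the status lines and the final result on the same stdout stream means the reader needs only one line-by-line loop; the presence of a `success` key is what distinguishes the terminating event.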
---
## Kattis: The Reference Implementation
Kattis is the gold standard. Everything else must match it exactly.
### Kattis login flow (`kattis.py:login`)
1. Always emits `{"status": "logging_in"}`
2. POSTs to `/login` with credentials
3. If fail → `LoginResult(success=False, ...)`
4. If success → saves cookies, returns `LoginResult(success=True, ..., credentials={username, password})`
Lua sees: `ev.credentials` (non-empty) → `cache.set_credentials`. Then `ev.success=True` → `"<Platform> login successful"`.
### Kattis submit flow (`kattis.py:submit`)
```
emit checking_login
load_cookies
if no cookies:
    emit logging_in
    do_login → save_cookies
emit submitting
POST /submit
if 400/403 or "Request validation failed":
    clear_cookies
    emit logging_in
    do_login → save_cookies
    POST /submit (retry)
return SubmitResult
```
```
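As a rough, runnable sketch of the reactive flow above: the function below takes the login and submit steps as callables so that the event ordering can be checked without a network. `submit_reactive`, `fake_post_submit`, and the `session` cookie name are hypothetical, and cookie persistence (`save_cookies` / `clear_cookies`) is left out for brevity.

```python
def submit_reactive(cookies, do_login, post_submit, emit):
    """Reactive re-login: trust stored cookies until the server rejects them."""
    emit("checking_login")
    if not cookies:
        emit("logging_in")
        cookies = do_login()
    emit("submitting")
    status, body = post_submit(cookies)
    if status in (400, 403) or "Request validation failed" in body:
        emit("logging_in")
        cookies = do_login()
        # Silent retry: no second "submitting" event is emitted.
        status, body = post_submit(cookies)
    return status == 200


def fake_post_submit(cookies):
    # Hypothetical server: rejects any cookie jar whose session value is "bogus".
    if cookies.get("session") == "bogus":
        return 403, "Forbidden"
    return 200, "ok"


def run(initial_cookies):
    events = []
    ok = submit_reactive(
        initial_cookies,
        do_login=lambda: {"session": "fresh"},
        post_submit=fake_post_submit,
        emit=events.append,
    )
    return ok, events


happy = run({"session": "valid"})
bad_cookie = run({"session": "bogus"})
fresh_start = run({})
```

The three runs correspond to the happy-path, bad-cookie, and fresh-start submit scenarios listed in the next section.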
### Expected log sequences — CONFIRMED from Kattis live testing
**Scenario 1: login+logout+login**
```
Kattis: Logging in...
Kattis login successful
Kattis credentials cleared
Kattis: Logging in...
Kattis login successful
```
Note: after logout, login prompts for credentials again (cleared from cache).
**Scenario 2: login+login**
```
Kattis: Logging in...
Kattis login successful
Kattis: Logging in...
Kattis login successful
```
Note: second login uses cached credentials, no prompt.
**Scenario 3: submit happy path (valid cookies)**
```
Checking login...
Submitting...
Submitted successfully
```
Note: no `Logging in...` — cookies present, skip login.
**Scenario 4: bad cookie → submit** ← CONFIRMED
```
Checking login...
Submitting...
Logging in...
Submitted successfully
```
REACTIVE re-login: cookies exist so it assumes logged in, attempts submit, server rejects
(400/403), re-logins, retries submit silently (NO second `Submitting...`).
**Scenario 5: fresh start → submit (no cookies, credentials cached)**
```
Checking login...
Logging in...
Submitting...
Submitted successfully
```
Note: no cookies present → login before attempting submit.
---
### Browser scraper bad-cookie note
Browser scrapers (CF, AtCoder, CodeChef) can do a PROACTIVE check during `checking_login`
by loading cookies into the browser session and fetching the homepage to verify login state.
If proactive check works, bad cookie sequence becomes:
```
Checking login...
Logging in... ← detected bad cookie before submit attempt
Submitting...
Submitted successfully
```
This differs from Kattis (which can't proactively verify). Decide per-platform which is
correct once live testing reveals what the browser check returns on bad cookies.
The proactive sequence is PREFERRED — avoids a wasted submit attempt.
---
## Required Behavior for Browser Scrapers
Match Kattis exactly. The differences come from how login is validated:
- Kattis: cookie presence check (no real HTTP check — reactive on submit failure)
- CF/AtCoder/CodeChef: must use browser session to check login state
### Login subcommand
ALWAYS:
1. Emit `{"status": "logging_in"}`
2. Do full browser login
3. If success → save cookies, return `LoginResult(success=True, credentials={username, password})`
4. If fail → return `LoginResult(success=False, error="...")`
NO cookie fast path on login. Login always re-authenticates. (Matches Kattis.)
MUST return `credentials={username, password}` so Lua caches them.
### Submit subcommand
```
emit checking_login
load cookies
if cookies:
    check if still valid (browser or HTTP)
    if invalid → emit logging_in → login → save cookies
    else → logged_in = True
else:
    emit logging_in → login → save cookies
emit submitting
do submit
if auth failure (redirect to login):
    clear cookies
    emit logging_in → login → save cookies
    retry submit
return SubmitResult
```
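A minimal sketch of this proactive variant, assuming the validity check, login, and submit steps are injected as callables. `submit_proactive` and the `store` dict (a stand-in for the platform entry in `cookies.json`) are hypothetical. The one-shot retry uses a `_retried` flag, and in this sketch the retry path emits `logging_in` and `submitting` again.

```python
def submit_proactive(store, is_valid, login, do_submit, emit, _retried=False):
    """Proactive check: verify stored cookies before attempting the submit."""
    cookies = None if _retried else store.get("cookies")
    if not _retried:
        emit("checking_login")
    if not (cookies and is_valid(cookies)):
        emit("logging_in")
        cookies = login()
        store["cookies"] = cookies  # save cookies after a fresh login
    emit("submitting")
    outcome = do_submit(cookies)
    if outcome == "auth_failure" and not _retried:
        store.pop("cookies", None)  # clear cookies, then retry exactly once
        return submit_proactive(
            store, is_valid, login, do_submit, emit, _retried=True
        )
    return outcome == "ok"


def run(initial):
    store = dict(initial)
    events = []
    ok = submit_proactive(
        store,
        is_valid=lambda c: c == "good",
        login=lambda: "good",
        do_submit=lambda c: "ok" if c == "good" else "auth_failure",
        emit=events.append,
    )
    return ok, events, store


happy = run({"cookies": "good"})
bad_cookie = run({"cookies": "bogus"})
fresh_start = run({})
```

With a working validity check, the bad-cookie and fresh-start runs produce the same event order, which is the property the proactive sequence is meant to have.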
---
## Test Protocol
### Environment
Neovim: `nvim --clean -u ~/dev/cp.nvim/t/minimal_init.lua`
Clean state:
```bash
rm -f ~/.cache/cp-nvim/cookies.json
rm -f ~/.local/share/nvim/cp-nvim.json
```
## CRITICAL PROTOCOL RULES (do not skip)
1. **Bad cookie scenario is MANDATORY.** Never skip it. If user hasn't run it, stop and demand it.
Without it we cannot verify reactive re-login works. It is the hardest scenario.
2. **AI clears cookies between scenarios** using the commands below. Never ask the user to do it.
3. Do not move to the next platform until ALL 5 scenarios show correct logs.
4. Go one scenario at a time. Do not batch. Wait for user to paste logs before proceeding.
---
## Cookie File Structure
**Single unified file:** `~/.cache/cp-nvim/cookies.json`
Two formats depending on platform type:
**httpx platforms (kattis, usaco):** simple dict
```json
{"kattis": {"KattisSiteCookie": "abc123"}}
{"usaco": {"PHPSESSID": "abc123"}}
```
**Browser/playwright platforms (codeforces, atcoder, codechef):** list of playwright cookie dicts
```json
{"codeforces": [
{"domain": ".codeforces.com", "name": "X-User-Handle", "value": "dalet",
"httpOnly": false, "sameSite": "Lax", "expires": 1234567890, "secure": false, "path": "/"}
]}
```
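To illustrate how both formats coexist in one file, here is a small sketch of load/save helpers modeled on the ones in `base.py`. The explicit `cookie_file` parameter and the names `load_cookies` / `save_cookies` are assumptions made so the example can run against a temporary directory instead of `~/.cache/cp-nvim/cookies.json`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_cookies(cookie_file: Path, platform: str) -> Any:
    # Returns the entry stored under the platform key, or None if unavailable.
    try:
        return json.loads(cookie_file.read_text()).get(platform)
    except Exception:
        return None


def save_cookies(cookie_file: Path, platform: str, data: Any) -> None:
    # Read-modify-write so other platforms' entries are preserved.
    cookie_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        existing = json.loads(cookie_file.read_text())
    except Exception:
        existing = {}
    existing[platform] = data
    cookie_file.write_text(json.dumps(existing))


with tempfile.TemporaryDirectory() as tmp:
    cookie_file = Path(tmp) / "cp-nvim" / "cookies.json"
    # httpx-style entry: plain name -> value dict.
    save_cookies(cookie_file, "kattis", {"KattisSiteCookie": "abc123"})
    # playwright-style entry: list of cookie dicts.
    save_cookies(
        cookie_file,
        "codeforces",
        [{"name": "X-User-Sha1", "value": "abc", "domain": ".codeforces.com", "path": "/"}],
    )
    kattis = load_cookies(cookie_file, "kattis")
    codeforces = load_cookies(cookie_file, "codeforces")
    atcoder = load_cookies(cookie_file, "atcoder")
    platforms_on_disk = sorted(json.loads(cookie_file.read_text()))
```

Callers need to know which shape to expect for a given platform, since the file itself does not record it.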
### Cookie manipulation commands
**Inject bad cookies — httpx platforms (kattis, usaco):**
```bash
python3 -c "
import json
d = json.load(open('/home/barrett/.cache/cp-nvim/cookies.json'))
d['kattis'] = {k: 'bogus' for k in d['kattis']}
json.dump(d, open('/home/barrett/.cache/cp-nvim/cookies.json','w'))
"
```
**Inject bad cookies — playwright platforms (codeforces, atcoder, codechef):**
```bash
python3 -c "
import json
d = json.load(open('/home/barrett/.cache/cp-nvim/cookies.json'))
for c in d['codeforces']:
    c['value'] = 'bogus'
json.dump(d, open('/home/barrett/.cache/cp-nvim/cookies.json','w'))
"
```
**Remove platform cookies only (keep credentials in cp-nvim.json):**
```bash
python3 -c "
import json
d = json.load(open('/home/barrett/.cache/cp-nvim/cookies.json'))
d.pop('codeforces', None)
json.dump(d, open('/home/barrett/.cache/cp-nvim/cookies.json','w'))
"
```
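The two injection one-liners above could be folded into a single helper that handles either cookie format. The sketch below is an assumption-laden illustration: `corrupt_platform_cookies` is a hypothetical name, not part of the repository, and the demo runs against a throwaway file rather than the real cookie file.

```python
import json
import tempfile
from pathlib import Path


def corrupt_platform_cookies(cookie_file: Path, platform: str) -> bool:
    """Overwrite every cookie value for `platform` with 'bogus'.

    Returns False when the platform has no stored cookies.
    """
    data = json.loads(cookie_file.read_text())
    entry = data.get(platform)
    if not entry:
        return False
    if isinstance(entry, dict):
        # httpx format: {name: value}
        data[platform] = {name: "bogus" for name in entry}
    else:
        # playwright format: [{"name": ..., "value": ..., ...}, ...]
        for cookie in entry:
            cookie["value"] = "bogus"
    cookie_file.write_text(json.dumps(data))
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cookies.json"
    path.write_text(json.dumps({
        "kattis": {"KattisSiteCookie": "abc123"},
        "codeforces": [
            {"name": "X-User-Sha1", "value": "abc", "domain": ".codeforces.com", "path": "/"}
        ],
    }))
    corrupt_platform_cookies(path, "kattis")
    corrupt_platform_cookies(path, "codeforces")
    result = json.loads(path.read_text())
    missing = corrupt_platform_cookies(path, "atcoder")
```

Cookie names and all other attributes are kept, so the scraper still finds cookies to load and only the server-side validation fails.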
### Test scenarios (run in order for each platform)
Run ONE at a time. Wait for user logs. AI clears state between scenarios.
1. **login+logout+login**
   - `:CP <p> login` (prompts for creds)
   - `:CP <p> logout`
   - `:CP <p> login` (should prompt again; creds cleared by logout)
2. **login+login**
   - `:CP <p> login` (uses cached creds from step 1, no prompt)
   - `:CP <p> login` (again, no prompt)
3. **submit happy path**
   - AI ensures valid cookies exist (left over from login)
   - `:CP submit`
   - Expected: `Checking login...` → `Submitting...` → `Submitted successfully`
4. **bad cookie → submit** ← MANDATORY, never skip
   - AI runs bad-cookie injection command
   - `:CP submit`
   - Expected: `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
5. **fresh start → submit**
   - AI removes platform cookies only (credentials remain in cp-nvim.json)
   - `:CP submit`
   - Expected: `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
For each scenario: user pastes exact notification text, AI compares to Kattis reference.
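The comparison step can be made mechanical. The following sketch encodes the submit sequences quoted in this guide and reports the first line where a pasted log diverges. The scenario keys and the `first_mismatch` helper are hypothetical names chosen for illustration.

```python
EXPECTED = {
    "submit_happy": [
        "Checking login...", "Submitting...", "Submitted successfully",
    ],
    "bad_cookie_reactive": [  # Kattis reference behavior
        "Checking login...", "Submitting...", "Logging in...", "Submitted successfully",
    ],
    "bad_cookie_proactive": [  # browser scrapers with a working session check
        "Checking login...", "Logging in...", "Submitting...", "Submitted successfully",
    ],
    "submit_fresh": [
        "Checking login...", "Logging in...", "Submitting...", "Submitted successfully",
    ],
}


def first_mismatch(observed, expected):
    """Return (index, got, want) for the first differing line, or None if identical."""
    for i in range(max(len(observed), len(expected))):
        got = observed[i] if i < len(observed) else None
        want = expected[i] if i < len(expected) else None
        if got != want:
            return i, got, want
    return None


pasted = ["Checking login...", "Submitting...", "Logging in...", "Submitted successfully"]
diff = first_mismatch(pasted, EXPECTED["bad_cookie_proactive"])
same = first_mismatch(pasted, EXPECTED["bad_cookie_reactive"])
```

Here the pasted log matches the reactive reference but diverges from the proactive one at the second line, which is the distinction the bad-cookie scenario is designed to expose.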
### Debugging tool: headless=False
To see the browser, change `headless=True` → `headless=False` in the scraper.
This lets you watch exactly what the page shows when `page_action` fires.
Remember to revert after debugging.
### ABSOLUTE RULE: no waits, no timeout increases — EVER
Never add `page.wait_for_timeout()`, `time.sleep()`, or increase any timeout value to fix
a bug. If something times out, the root cause is wrong logic or wrong selector — fix that.
Increasing timeouts masks bugs and makes the UX slower. Find the real fix.
### Debugging tool: direct Python invocation
```bash
SUBMIT_CMD=$(cat ~/.cache/nvim/cp-nvim/nix-submit)
UV_PROJECT_ENVIRONMENT=~/.cache/nvim/cp-nvim/submit-env
# Login:
CP_CREDENTIALS='{"username":"USER","password":"PASS"}' \
$SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces login
# Submit:
CP_CREDENTIALS='{"username":"USER","password":"PASS"}' \
$SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces submit \
<contest_id> <problem_id> <language_id> <file_path>
```
For passwords with special chars, use a temp file:
```bash
cat > /tmp/creds.json << 'EOF'
{"username":"user","password":"p@ss!word\"with\"quotes"}
EOF
CREDS=$(cat /tmp/creds.json)
CP_CREDENTIALS="$CREDS" $SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces login
```
---
## Platform-Specific Notes
### Codeforces
**Credentials:** username=`dalet`, password=`y)o#oW83JlhmQ3P`
**Cookie file key:** `codeforces` (list of cookie dicts with playwright format)
**Cookie guard on save:** only saves if `X-User-Sha1` cookie present (NOT `X-User-Handle` — that cookie no longer exists). Verified 2026-03-07.
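A cookie guard of this kind can be sketched as a predicate over the playwright-format cookie list. `has_cookie` and `maybe_save` are hypothetical names; the real scraper may structure the check differently.

```python
def has_cookie(cookies, name):
    # True only if a cookie with this name exists and has a non-empty value.
    return any(c.get("name") == name and bool(c.get("value")) for c in cookies)


def maybe_save(cookies, save):
    # Persist the session only when the login-marker cookie is present.
    if has_cookie(cookies, "X-User-Sha1"):
        save("codeforces", cookies)
        return True
    return False


saved = {}
logged_in_jar = [
    {"name": "JSESSIONID", "value": "abc"},
    {"name": "X-User-Sha1", "value": "deadbeef"},
]
anonymous_jar = [{"name": "JSESSIONID", "value": "abc"}]

wrote_anonymous = maybe_save(anonymous_jar, saved.__setitem__)
wrote_logged_in = maybe_save(logged_in_jar, saved.__setitem__)
```

The guard keeps an anonymous session from overwriting a previously saved authenticated one.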
**Known issues:**
- CF has a custom Turnstile gate on `/enter`. It's a FULL PAGE redirect ("Verification"), not
an embedded widget. It POSTs to `/data/turnstile` then reloads to show the actual login form.
`page_action` is called by scrapling at page load, which may fire BEFORE the reload completes.
Fix: add `page.wait_for_selector('input[name="handleOrEmail"]', timeout=60000)` as the FIRST
line of every `login_action` that fills the CF login form.
- The same issue exists in BOTH `_login_headless_cf.login_action` and `_submit_headless.login_action`.
- The `check_login` on homepage uses `solve_cloudflare=True` (current diff). Verify this works.
- `needs_relogin` triggers if submit page redirects to `/enter` or `/login`.
**Submit page Turnstile:** The submit page (`/contest/{id}/submit`) has an EMBEDDED Turnstile
(not the full-page gate). `submit_action` correctly calls `_solve_turnstile(page)` for this.
**Cookie fast path for submit:**
- Load cookies → `StealthySession(cookies=saved_cookies)`
- If `_retried=False`: emit `checking_login`, fetch `/` with `solve_cloudflare=True`, check for "Logout"
- If not logged in: emit `logging_in`, fetch `/enter` with `solve_cloudflare=True` and `login_action`
**Test problem:** `:CP codeforces 2060` (recent educational round, has problems A-G)
**submit_action source injection:** uses `page.evaluate` to set CodeMirror + textarea directly.
This is correct — CF does not use file upload.
---
### AtCoder
**Credentials:** username=`barrettruth`, password=`vG\`kD)m31A8_`
**Cookie file key:** `atcoder` — BUT currently AtCoder NEVER saves cookies. Submit always
does a fresh full login. This is WRONG vs. Kattis model. Needs cookie fast path added.
**Current login flow:**
- `_login_headless`: Emits `logging_in`, does browser login, checks `/home` for "Sign Out".
Does NOT save cookies. This means `:CP submit` always does full login (slow, wastes Turnstile solve).
**Current submit flow:**
- `_submit_headless`: Emits `logging_in` FIRST (no `checking_login`). Always does full browser login.
No cookie fast path. This must change.
**Required submit flow (to match Kattis):**
```
emit checking_login
load_platform_cookies("atcoder")
if cookies:
    StealthySession(cookies=saved_cookies)
    check /home for "Sign Out"
    if not logged in: emit logging_in, do browser login
else:
    emit logging_in, do browser login (fresh StealthySession)
save cookies after login
emit submitting
do submit_action
if submit redirects to /login: clear cookies, retry once with full login
```
**Login flow must save cookies** so submit can use fast path.
**AtCoder Turnstile:** embedded in the login form itself (not a separate gate page).
`_solve_turnstile(page)` is called in `login_action` before filling fields. This is correct.
No `wait_for_selector` needed — the Turnstile is on the same page.
**Submit file upload:** uses `page.set_input_files("#input-open-file", {...buffer...})`.
In-memory buffer approach. Correct — no temp file needed.
**Submit nav timeout:** `BROWSER_SUBMIT_NAV_TIMEOUT["atcoder"]` currently = `BROWSER_NAV_TIMEOUT * 2` = 20s.
CLAUDE.md says it should be 40s (`* 4`). May need to increase if submit navigation is slow.
**Test problem:** `:CP atcoder abc394` (recent ABC, has problems A-G)
---
### CodeChef
**Credentials:** username=TBD, password=`pU5889'%c2IL`
**Cookie file key:** `codechef`
**Cookie guard on save:** saves any non-empty cookies — no meaningful guard. Should add one
(e.g., check for a session cookie name specific to CodeChef, or check logged_in state).
**Current login form selectors:** `input[name="name"]`, `input[name="pass"]`, `input.cc-login-btn`
These look like OLD Drupal-era selectors. Current CodeChef is React/Next.js. MUST VERIFY.
Use `headless=False` to see what the login page actually looks like.
**Current timeout:** 3000ms after clicking login button. Way too short for a React SPA navigation.
**No `solve_cloudflare`** on the login fetch. May or may not be needed. Verify with headless=False.
**`check_login` logic:** `"dashboard" in page.url or page.evaluate(_CC_CHECK_LOGIN_JS)`
where `_CC_CHECK_LOGIN_JS = "() => !!document.querySelector('a[href*=\"/users/\"]')"`.
Needs verification — does CC redirect to /dashboard after login? Does this selector exist?
**Submit flow:** has `PRACTICE_FALLBACK` logic — if contest says "not available for accepting
solutions", retries with `contest_id="PRACTICE"`. This is unique to CodeChef.
**Submit URL:** `/{contest_id}/submit/{problem_id}` or `/submit/{problem_id}` for PRACTICE.
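The URL rule and the practice fallback can be sketched together as follows. `BASE_URL` is assumed to be `https://www.codechef.com`, and `submit_url`, `submit_with_fallback`, and the problem id `EXAMPLE` are hypothetical; the real scraper drives a browser rather than calling a function like `try_submit`.

```python
BASE_URL = "https://www.codechef.com"  # assumption: value used by the scraper


def submit_url(contest_id: str, problem_id: str) -> str:
    # PRACTICE submissions drop the contest segment from the path.
    if contest_id == "PRACTICE":
        return f"{BASE_URL}/submit/{problem_id}"
    return f"{BASE_URL}/{contest_id}/submit/{problem_id}"


def submit_with_fallback(contest_id, problem_id, try_submit):
    """Retry once against PRACTICE when the contest no longer accepts solutions."""
    outcome = try_submit(submit_url(contest_id, problem_id))
    if outcome == "PRACTICE_FALLBACK" and contest_id != "PRACTICE":
        outcome = try_submit(submit_url("PRACTICE", problem_id))
    return outcome


visited = []


def fake_try_submit(url):
    # Hypothetical stand-in: contest URLs are closed, the practice URL accepts.
    visited.append(url)
    return "ok" if url == f"{BASE_URL}/submit/EXAMPLE" else "PRACTICE_FALLBACK"


outcome = submit_with_fallback("START209", "EXAMPLE", fake_try_submit)
```

Guarding on `contest_id != "PRACTICE"` keeps the fallback to a single extra attempt.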
**Submit selectors (need verification):**
- `[aria-haspopup="listbox"]` — language selector
- `[role="option"][data-value="{language_id}"]` — specific language option
- `.ace_editor` — code editor
- `#submit_btn` — submit button
**Test problem:** `:CP codechef START209` or similar recent Starters contest.
---
## Debugging Methodology
### Step-by-step for each issue
1. Identify the specific failure (wrong log, missing log, crash, wrong order)
2. Set `headless=False` to visually inspect what the browser shows
3. Run direct Python invocation to isolate from Neovim
4. Fix one thing at a time
5. Re-run ALL 5 test scenarios after each fix
6. Do NOT move to next platform until ALL 5 scenarios show correct logs
### When context runs low
Read this file first. Then read:
- `scrapers/kattis.py` — reference implementation
- `scrapers/<platform>.py` — current implementation being debugged
- `lua/cp/credentials.lua` — login Lua side
- `lua/cp/submit.lua` — submit Lua side
Current test status (update this section as work progresses):
| Scenario | Kattis | CF | AtCoder | CodeChef |
|---|---|---|---|---|
| login+logout+login | ✓ | ✓ | ? | ? |
| login+login | ✓ | ✓ | ? | ? |
| submit happy | ✓ | ✓ | ? | ? |
| bad cookie→submit | ✓ | ✓ | ? | ? |
| fresh→submit | ✓ | ✓ | ? | ? |
### CF confirmed log sequences
**login (no cookies):** `CodeForces: Logging in...` → `CodeForces login successful`
**login (valid cookies):** `CodeForces: Checking existing session...` → `CodeForces login successful`
**login (bad cookies):** `CodeForces: Checking existing session...` → `CodeForces: Logging in...` → `CodeForces login successful`
**submit happy:** `Checking login...` → `Submitting...` → `Submitted successfully`
**submit bad cookie:** `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
**submit fresh:** `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
Note: bad cookie and fresh start produce identical submit logs for CF (proactive check).
Kattis bad cookie is reactive (`Submitting...` before `Logging in...`). Issue #362 tracks alignment.
---
## Key Files
```
scrapers/base.py — BaseScraper, cookie helpers, run_cli
scrapers/kattis.py — REFERENCE IMPLEMENTATION
scrapers/codeforces.py — browser scraper (CF Turnstile gate issue)
scrapers/atcoder.py — browser scraper (_solve_turnstile, no cookie fast path)
scrapers/codechef.py — browser scraper (selectors unverified)
scrapers/timeouts.py — all timeout constants
lua/cp/scraper.lua — run_scraper, ndjson event routing
lua/cp/credentials.lua — login/logout commands
lua/cp/submit.lua — submit command
lua/cp/cache.lua — credential + cache storage
lua/cp/constants.lua — COOKIE_FILE, PLATFORM_DISPLAY_NAMES
t/minimal_init.lua — test Neovim config
```
---
## Open Questions (fill in as discovered)
- What are the actual CodeChef login form selectors on the current React site?
- Does CodeChef require `solve_cloudflare=True`?
- What is the correct CodeChef session cookie name to use as a guard?
- Does AtCoder cookie fast path work reliably (Cloudflare on /home without cookies)?
- What is the exact CodeChef username for credentials?
- Is `BROWSER_SUBMIT_NAV_TIMEOUT["atcoder"]` sufficient at 20s or does it need 40s?


@ -219,4 +219,6 @@ M.LANGUAGE_VERSIONS = {
M.DEFAULT_VERSIONS = { cpp = 'c++20', python = 'python3' }
M.COOKIE_FILE = vim.fn.expand('~/.cache/cp-nvim/cookies.json')
return M


@ -38,6 +38,7 @@ local function prompt_and_login(platform, display)
end, function(result)
vim.schedule(function()
if result.success then
cache.set_credentials(platform, credentials)
logger.log(
display .. ' login successful',
{ level = vim.log.levels.INFO, override = true }
@ -105,6 +106,14 @@ function M.logout(platform)
local display = constants.PLATFORM_DISPLAY_NAMES[platform] or platform
cache.load()
cache.clear_credentials(platform)
local cookie_file = constants.COOKIE_FILE
if vim.fn.filereadable(cookie_file) == 1 then
local ok, data = pcall(vim.fn.json_decode, vim.fn.readfile(cookie_file, 'b'))
if ok and type(data) == 'table' then
data[platform] = nil
vim.fn.writefile({ vim.fn.json_encode(data) }, cookie_file)
end
end
logger.log(display .. ' credentials cleared', { level = vim.log.levels.INFO, override = true })
end


@ -16,7 +16,7 @@ from bs4 import BeautifulSoup, Tag
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from .base import BaseScraper, extract_precision
from .base import BaseScraper, clear_platform_cookies, extract_precision, load_platform_cookies, save_platform_cookies
from .models import (
ContestListResult,
ContestSummary,
@ -379,26 +379,15 @@ def _ensure_browser() -> None:
break
def _login_headless(credentials: dict[str, str]) -> LoginResult:
try:
from scrapling.fetchers import StealthySession # type: ignore[import-untyped,unresolved-import]
except ImportError:
return LoginResult(
success=False,
error="scrapling is required for AtCoder login. Install it: uv add 'scrapling[fetchers]>=0.4'",
)
def _at_check_logged_in(page) -> bool:
return page.evaluate(
"() => Array.from(document.querySelectorAll('a')).some(a => a.textContent.trim() === 'Sign Out')"
)
_ensure_browser()
logged_in = False
def _at_login_action(credentials: dict[str, str]):
login_error: str | None = None
def check_login(page):
nonlocal logged_in
logged_in = page.evaluate(
"() => Array.from(document.querySelectorAll('a')).some(a => a.textContent.trim() === 'Sign Out')"
)
def login_action(page):
nonlocal login_error
try:
@ -412,6 +401,45 @@ def _login_headless(credentials: dict[str, str]) -> LoginResult:
except Exception as e:
login_error = str(e)
return login_action, lambda: login_error
def _login_headless(credentials: dict[str, str]) -> LoginResult:
try:
from scrapling.fetchers import StealthySession # type: ignore[import-untyped,unresolved-import]
except ImportError:
return LoginResult(
success=False,
error="scrapling is required for AtCoder login. Install it: uv add 'scrapling[fetchers]>=0.4'",
)
_ensure_browser()
saved_cookies = load_platform_cookies("atcoder") or []
if saved_cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
logged_in = False
def check_action(page):
nonlocal logged_in
logged_in = _at_check_logged_in(page)
try:
with StealthySession(
headless=True,
timeout=BROWSER_SESSION_TIMEOUT,
google_search=False,
cookies=saved_cookies,
) as session:
session.fetch(f"{BASE_URL}/home", page_action=check_action, network_idle=True)
if logged_in:
return LoginResult(success=True, error="")
except Exception:
pass
login_action, get_error = _at_login_action(credentials)
try:
with StealthySession(
headless=True,
@ -424,16 +452,26 @@ def _login_headless(credentials: dict[str, str]) -> LoginResult:
page_action=login_action,
solve_cloudflare=True,
)
login_error = get_error()
if login_error:
return LoginResult(success=False, error=f"Login failed: {login_error}")
session.fetch(
f"{BASE_URL}/home", page_action=check_login, network_idle=True
)
logged_in = False
def verify_action(page):
nonlocal logged_in
logged_in = _at_check_logged_in(page)
session.fetch(f"{BASE_URL}/home", page_action=verify_action, network_idle=True)
if not logged_in:
return LoginResult(
success=False, error="Login failed (bad credentials?)"
)
return LoginResult(success=False, error="Login failed (bad credentials?)")
try:
browser_cookies = session.context.cookies()
if browser_cookies:
save_platform_cookies("atcoder", browser_cookies)
except Exception:
pass
return LoginResult(success=True, error="")
except Exception as e:
@ -446,6 +484,7 @@ def _submit_headless(
file_path: str,
language_id: str,
credentials: dict[str, str],
_retried: bool = False,
) -> "SubmitResult":
try:
from scrapling.fetchers import StealthySession # type: ignore[import-untyped,unresolved-import]
@ -457,26 +496,24 @@ def _submit_headless(
_ensure_browser()
login_error: str | None = None
submit_error: str | None = None
saved_cookies: list[dict[str, Any]] = []
if not _retried:
saved_cookies = load_platform_cookies("atcoder") or []
def login_action(page):
nonlocal login_error
try:
_solve_turnstile(page)
page.fill('input[name="username"]', credentials.get("username", ""))
page.fill('input[name="password"]', credentials.get("password", ""))
page.click("#submit")
page.wait_for_url(
lambda url: "/login" not in url, timeout=BROWSER_NAV_TIMEOUT
)
except Exception as e:
login_error = str(e)
logged_in = bool(saved_cookies)
submit_error: str | None = None
needs_relogin = False
def check_login(page):
nonlocal logged_in
logged_in = _at_check_logged_in(page)
login_action, get_login_error = _at_login_action(credentials)
def submit_action(page):
nonlocal submit_error
nonlocal submit_error, needs_relogin
if "/login" in page.url:
submit_error = "Not logged in after login step"
needs_relogin = True
return
try:
_solve_turnstile(page)
@ -488,18 +525,12 @@ def _submit_headless(
f'select[name="data.LanguageId"] option[value="{language_id}"]'
).wait_for(state="attached", timeout=BROWSER_ELEMENT_WAIT)
page.select_option('select[name="data.LanguageId"]', language_id)
ext = _LANGUAGE_ID_EXTENSION.get(
language_id, Path(file_path).suffix.lstrip(".") or "txt"
page.set_input_files("#input-open-file", file_path)
page.wait_for_function(
"() => { const ta = document.getElementById('plain-textarea'); return ta && ta.value.length > 0; }",
timeout=BROWSER_ELEMENT_WAIT,
)
page.set_input_files(
"#input-open-file",
{
"name": f"solution.{ext}",
"mimeType": "text/plain",
"buffer": Path(file_path).read_bytes(),
},
)
page.locator('button[type="submit"]').click(no_wait_after=True)
page.evaluate("document.getElementById('submit').click()")
page.wait_for_url(
lambda url: "/submissions/me" in url,
timeout=BROWSER_SUBMIT_NAV_TIMEOUT["atcoder"],
@ -512,15 +543,29 @@ def _submit_headless(
headless=True,
timeout=BROWSER_SESSION_TIMEOUT,
google_search=False,
cookies=saved_cookies if saved_cookies else [],
) as session:
print(json.dumps({"status": "logging_in"}), flush=True)
session.fetch(
f"{BASE_URL}/login",
page_action=login_action,
solve_cloudflare=True,
)
if login_error:
return SubmitResult(success=False, error=f"Login failed: {login_error}")
if not _retried and saved_cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
session.fetch(f"{BASE_URL}/home", page_action=check_login, network_idle=True)
if not logged_in:
print(json.dumps({"status": "logging_in"}), flush=True)
session.fetch(
f"{BASE_URL}/login",
page_action=login_action,
solve_cloudflare=True,
)
login_error = get_login_error()
if login_error:
return SubmitResult(success=False, error=f"Login failed: {login_error}")
logged_in = True
try:
browser_cookies = session.context.cookies()
if browser_cookies:
save_platform_cookies("atcoder", browser_cookies)
except Exception:
pass
print(json.dumps({"status": "submitting"}), flush=True)
session.fetch(
@ -529,12 +574,16 @@ def _submit_headless(
solve_cloudflare=True,
)
if needs_relogin and not _retried:
clear_platform_cookies("atcoder")
return _submit_headless(
contest_id, problem_id, file_path, language_id, credentials, _retried=True
)
if submit_error:
return SubmitResult(success=False, error=submit_error)
return SubmitResult(
success=True, error="", submission_id="", verdict="submitted"
)
return SubmitResult(success=True, error="", submission_id="", verdict="submitted")
except Exception as e:
return SubmitResult(success=False, error=str(e))


@ -4,6 +4,38 @@ import os
import re
import sys
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any

_COOKIE_FILE = Path.home() / ".cache" / "cp-nvim" / "cookies.json"


def load_platform_cookies(platform: str) -> Any | None:
    try:
        data = json.loads(_COOKIE_FILE.read_text())
        return data.get(platform)
    except Exception:
        return None


def save_platform_cookies(platform: str, data: Any) -> None:
    _COOKIE_FILE.parent.mkdir(parents=True, exist_ok=True)
    try:
        existing = json.loads(_COOKIE_FILE.read_text())
    except Exception:
        existing = {}
    existing[platform] = data
    _COOKIE_FILE.write_text(json.dumps(existing))


def clear_platform_cookies(platform: str) -> None:
    try:
        existing = json.loads(_COOKIE_FILE.read_text())
        existing.pop(platform, None)
        _COOKIE_FILE.write_text(json.dumps(existing))
    except Exception:
        pass

from .language_ids import get_language_id
from .models import (


@ -9,7 +9,7 @@ from typing import Any
import httpx
from .base import BaseScraper
from .base import BaseScraper, clear_platform_cookies, load_platform_cookies, save_platform_cookies
from .timeouts import BROWSER_SESSION_TIMEOUT, HTTP_TIMEOUT
from .models import (
ContestListResult,
@ -31,7 +31,6 @@ HEADERS = {
}
CONNECTIONS = 8
_COOKIE_PATH = Path.home() / ".cache" / "cp-nvim" / "codechef-cookies.json"
_CC_CHECK_LOGIN_JS = "() => !!document.querySelector('a[href*=\"/users/\"]')"
@ -67,8 +66,6 @@ def _login_headless_codechef(credentials: dict[str, str]) -> LoginResult:
_ensure_browser()
_COOKIE_PATH.parent.mkdir(parents=True, exist_ok=True)
logged_in = False
login_error: str | None = None
@ -85,7 +82,7 @@ def _login_headless_codechef(credentials: dict[str, str]) -> LoginResult:
try:
page.wait_for_url(lambda url: "/login" not in url, timeout=3000)
except Exception:
login_error = "Login failed (bad credentials?)"
login_error = "bad credentials?"
return
except Exception as e:
login_error = str(e)
@ -99,7 +96,7 @@ def _login_headless_codechef(credentials: dict[str, str]) -> LoginResult:
print(json.dumps({"status": "logging_in"}), flush=True)
session.fetch(f"{BASE_URL}/login", page_action=login_action)
if login_error:
return LoginResult(success=False, error=f"Login failed: {login_error}")
return LoginResult(success=False, error=login_error)
session.fetch(f"{BASE_URL}/", page_action=check_login, network_idle=True)
if not logged_in:
@ -110,7 +107,7 @@ def _login_headless_codechef(credentials: dict[str, str]) -> LoginResult:
try:
browser_cookies = session.context.cookies()
if browser_cookies:
_COOKIE_PATH.write_text(json.dumps(browser_cookies))
save_platform_cookies("codechef", browser_cookies)
except Exception:
pass
@ -126,6 +123,7 @@ def _submit_headless_codechef(
language_id: str,
credentials: dict[str, str],
_retried: bool = False,
_practice: bool = False,
) -> SubmitResult:
source_code = Path(file_path).read_text()
@ -141,15 +139,11 @@ def _submit_headless_codechef(
_ensure_browser()
_COOKIE_PATH.parent.mkdir(parents=True, exist_ok=True)
saved_cookies: list[dict[str, Any]] = []
if _COOKIE_PATH.exists() and not _retried:
try:
saved_cookies = json.loads(_COOKIE_PATH.read_text())
except Exception:
pass
if not _retried:
saved_cookies = load_platform_cookies("codechef") or []
logged_in = bool(saved_cookies) and not _retried
logged_in = bool(saved_cookies)
login_error: str | None = None
submit_error: str | None = None
needs_relogin = False
@ -167,7 +161,7 @@ def _submit_headless_codechef(
try:
page.wait_for_url(lambda url: "/login" not in url, timeout=3000)
except Exception:
login_error = "Login failed (bad credentials?)"
login_error = "bad credentials?"
return
except Exception as e:
login_error = str(e)
@ -213,7 +207,9 @@ def _submit_headless_codechef(
const d = document.querySelector('[role="dialog"], .swal2-popup');
return d ? d.textContent.trim() : null;
}""")
if dialog_text and (
if dialog_text and "login" in dialog_text.lower():
needs_relogin = True
elif dialog_text and (
"not available for accepting solutions" in dialog_text
or "not available for submission" in dialog_text
):
@ -228,23 +224,23 @@ def _submit_headless_codechef(
headless=True,
timeout=BROWSER_SESSION_TIMEOUT,
google_search=False,
cookies=saved_cookies if (saved_cookies and not _retried) else [],
cookies=saved_cookies if saved_cookies else [],
) as session:
if not logged_in:
if not _retried and not _practice:
print(json.dumps({"status": "checking_login"}), flush=True)
session.fetch(
f"{BASE_URL}/", page_action=check_login, network_idle=True
)
session.fetch(f"{BASE_URL}/", page_action=check_login)
if not logged_in:
print(json.dumps({"status": "logging_in"}), flush=True)
session.fetch(f"{BASE_URL}/login", page_action=login_action)
if login_error:
return SubmitResult(
success=False, error=f"Login failed: {login_error}"
success=False, error=login_error
)
logged_in = True
print(json.dumps({"status": "submitting"}), flush=True)
if not _practice:
print(json.dumps({"status": "submitting"}), flush=True)
submit_url = (
f"{BASE_URL}/submit/{problem_id}"
if contest_id == "PRACTICE"
@ -255,12 +251,12 @@ def _submit_headless_codechef(
try:
browser_cookies = session.context.cookies()
if browser_cookies and logged_in:
_COOKIE_PATH.write_text(json.dumps(browser_cookies))
save_platform_cookies("codechef", browser_cookies)
except Exception:
pass
if needs_relogin and not _retried:
_COOKIE_PATH.unlink(missing_ok=True)
clear_platform_cookies("codechef")
return _submit_headless_codechef(
contest_id,
problem_id,
@ -270,14 +266,14 @@ def _submit_headless_codechef(
_retried=True,
)
if submit_error == "PRACTICE_FALLBACK" and not _retried:
if submit_error == "PRACTICE_FALLBACK" and not _practice:
return _submit_headless_codechef(
"PRACTICE",
problem_id,
file_path,
language_id,
credentials,
_retried=True,
_practice=True,
)
if submit_error:
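
The hunks above swap per-platform cookie files for `load_platform_cookies`, `save_platform_cookies`, and `clear_platform_cookies`, whose bodies live in `base.py` and are not shown in this part of the diff. As a rough illustration only, a single-file store keyed by platform could look like the sketch below. The `COOKIE_FILE` constant, the `_read_all` helper, and the extra `path` parameter are assumptions made for the example, not the committed implementation.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Assumed location (mentioned in the debugging guide); the real constant is in base.py.
COOKIE_FILE = Path.home() / ".cache" / "cp-nvim" / "cookies.json"


def _read_all(path: Path) -> Dict[str, Any]:
    """Hypothetical helper: return the whole store, or {} if missing or corrupt."""
    try:
        data = json.loads(path.read_text())
    except Exception:
        return {}
    return data if isinstance(data, dict) else {}


def load_platform_cookies(platform: str, path: Path = COOKIE_FILE) -> Optional[Any]:
    """Return the cookies stored under one platform key, or None."""
    return _read_all(path).get(platform)


def save_platform_cookies(platform: str, cookies: Any, path: Path = COOKIE_FILE) -> None:
    """Overwrite one platform key and leave the other platforms untouched."""
    data = _read_all(path)
    data[platform] = cookies
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))


def clear_platform_cookies(platform: str, path: Path = COOKIE_FILE) -> None:
    """Drop one platform key; do nothing if it was never stored."""
    data = _read_all(path)
    if data.pop(platform, None) is not None:
        path.write_text(json.dumps(data))
```

Keying by platform is what lets browser scrapers store a list of cookie dicts while the httpx scrapers store a flat name-to-value mapping in the same file, which is presumably why the httpx loaders check `isinstance(data, dict)` before use.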

@ -8,7 +8,7 @@ from typing import Any
import requests
from bs4 import BeautifulSoup, Tag
from .base import BaseScraper, extract_precision
from .base import BaseScraper, clear_platform_cookies, extract_precision, load_platform_cookies, save_platform_cookies
from .models import (
ContestListResult,
ContestSummary,
@ -331,9 +331,33 @@ class CodeforcesScraper(BaseScraper):
return await asyncio.to_thread(_login_headless_cf, credentials)
def _login_headless_cf(credentials: dict[str, str]) -> LoginResult:
from pathlib import Path
def _cf_check_logged_in(page) -> bool:
return page.evaluate(
"() => Array.from(document.querySelectorAll('a'))"
".some(a => a.textContent.includes('Logout'))"
)
def _cf_login_action(credentials: dict[str, str]):
login_error: str | None = None
def login_action(page):
nonlocal login_error
try:
page.wait_for_selector('input[name="handleOrEmail"]', timeout=60000)
page.fill('input[name="handleOrEmail"]', credentials.get("username", ""))
page.fill('input[name="password"]', credentials.get("password", ""))
page.locator('#enterForm input[type="submit"]').click()
page.wait_for_url(
lambda url: "/enter" not in url, timeout=BROWSER_NAV_TIMEOUT
)
except Exception as e:
login_error = str(e)
return login_action, lambda: login_error
def _login_headless_cf(credentials: dict[str, str]) -> LoginResult:
try:
from scrapling.fetchers import StealthySession # type: ignore[import-untyped,unresolved-import]
except ImportError:
@ -346,36 +370,30 @@ def _login_headless_cf(credentials: dict[str, str]) -> LoginResult:
_ensure_browser()
cookie_cache = Path.home() / ".cache" / "cp-nvim" / "codeforces-cookies.json"
cookie_cache.parent.mkdir(parents=True, exist_ok=True)
saved_cookies = load_platform_cookies("codeforces") or []
logged_in = False
login_error: str | None = None
if saved_cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
logged_in = False
def check_login(page):
nonlocal logged_in
logged_in = page.evaluate(
"() => Array.from(document.querySelectorAll('a'))"
".some(a => a.textContent.includes('Logout'))"
)
def check_action(page):
nonlocal logged_in
logged_in = _cf_check_logged_in(page)
def login_action(page):
nonlocal login_error
try:
page.fill(
'input[name="handleOrEmail"]',
credentials.get("username", ""),
)
page.fill(
'input[name="password"]',
credentials.get("password", ""),
)
page.locator('#enterForm input[type="submit"]').click()
page.wait_for_url(
lambda url: "/enter" not in url, timeout=BROWSER_NAV_TIMEOUT
)
except Exception as e:
login_error = str(e)
with StealthySession(
headless=True,
timeout=BROWSER_SESSION_TIMEOUT,
google_search=False,
cookies=saved_cookies,
) as session:
session.fetch(f"{BASE_URL}/", page_action=check_action, solve_cloudflare=True)
if logged_in:
return LoginResult(success=True, error="")
except Exception:
pass
login_action, get_error = _cf_login_action(credentials)
try:
with StealthySession(
@ -389,23 +407,24 @@ def _login_headless_cf(credentials: dict[str, str]) -> LoginResult:
page_action=login_action,
solve_cloudflare=True,
)
login_error = get_error()
if login_error:
return LoginResult(success=False, error=f"Login failed: {login_error}")
session.fetch(
f"{BASE_URL}/",
page_action=check_login,
network_idle=True,
)
logged_in = False
def verify_action(page):
nonlocal logged_in
logged_in = _cf_check_logged_in(page)
session.fetch(f"{BASE_URL}/", page_action=verify_action, network_idle=True)
if not logged_in:
return LoginResult(
success=False, error="Login failed (bad credentials?)"
)
return LoginResult(success=False, error="Login failed (bad credentials?)")
try:
browser_cookies = session.context.cookies()
if any(c.get("name") == "X-User-Handle" for c in browser_cookies):
cookie_cache.write_text(json.dumps(browser_cookies))
if any(c.get("name") == "X-User-Sha1" for c in browser_cookies):
save_platform_cookies("codeforces", browser_cookies)
except Exception:
pass
@ -426,6 +445,7 @@ def _submit_headless(
source_code = Path(file_path).read_text()
try:
from scrapling.fetchers import StealthySession # type: ignore[import-untyped,unresolved-import]
except ImportError:
@ -438,44 +458,19 @@ def _submit_headless(
_ensure_browser()
cookie_cache = Path.home() / ".cache" / "cp-nvim" / "codeforces-cookies.json"
cookie_cache.parent.mkdir(parents=True, exist_ok=True)
saved_cookies: list[dict[str, Any]] = []
if cookie_cache.exists():
try:
saved_cookies = json.loads(cookie_cache.read_text())
except Exception:
pass
if not _retried:
saved_cookies = load_platform_cookies("codeforces") or []
logged_in = cookie_cache.exists() and not _retried
login_error: str | None = None
logged_in = bool(saved_cookies)
submit_error: str | None = None
needs_relogin = False
def check_login(page):
nonlocal logged_in
logged_in = page.evaluate(
"() => Array.from(document.querySelectorAll('a'))"
".some(a => a.textContent.includes('Logout'))"
)
logged_in = _cf_check_logged_in(page)
def login_action(page):
nonlocal login_error
try:
page.fill(
'input[name="handleOrEmail"]',
credentials.get("username", ""),
)
page.fill(
'input[name="password"]',
credentials.get("password", ""),
)
page.locator('#enterForm input[type="submit"]').click()
page.wait_for_url(
lambda url: "/enter" not in url, timeout=BROWSER_NAV_TIMEOUT
)
except Exception as e:
login_error = str(e)
_login_action, _get_login_error = _cf_login_action(credentials)
def submit_action(page):
nonlocal submit_error, needs_relogin
@ -520,27 +515,25 @@ def _submit_headless(
headless=True,
timeout=BROWSER_SESSION_TIMEOUT,
google_search=False,
cookies=saved_cookies if (cookie_cache.exists() and not _retried) else [],
cookies=saved_cookies if saved_cookies else [],
) as session:
if not (cookie_cache.exists() and not _retried):
if not _retried and saved_cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
session.fetch(
f"{BASE_URL}/",
page_action=check_login,
network_idle=True,
)
session.fetch(f"{BASE_URL}/", page_action=check_login, solve_cloudflare=True)
if not logged_in:
print(json.dumps({"status": "logging_in"}), flush=True)
session.fetch(
f"{BASE_URL}/enter",
page_action=login_action,
page_action=_login_action,
solve_cloudflare=True,
)
login_error = _get_login_error()
if login_error:
return SubmitResult(
success=False, error=f"Login failed: {login_error}"
)
logged_in = True
print(json.dumps({"status": "submitting"}), flush=True)
session.fetch(
@ -551,13 +544,13 @@ def _submit_headless(
try:
browser_cookies = session.context.cookies()
if any(c.get("name") == "X-User-Handle" for c in browser_cookies):
cookie_cache.write_text(json.dumps(browser_cookies))
if any(c.get("name") == "X-User-Sha1" for c in browser_cookies):
save_platform_cookies("codeforces", browser_cookies)
except Exception:
pass
if needs_relogin and not _retried:
cookie_cache.unlink(missing_ok=True)
clear_platform_cookies("codeforces")
return _submit_headless(
contest_id,
problem_id,
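
`_cf_login_action` in the hunks above returns a pair: the page callback and a getter for any error captured while it ran. That lets login and submit share one callback without each declaring its own `nonlocal login_error`. The pattern can be sketched in isolation as follows. `make_action` and its `run` parameter are hypothetical names for this example, and no browser is involved.

```python
from typing import Callable, Optional, Tuple

PageAction = Callable[[object], None]


def make_action(run: PageAction) -> Tuple[PageAction, Callable[[], Optional[str]]]:
    """Wrap `run` so exceptions are recorded instead of raised.

    Returns (action, get_error): `action` is what would be handed to the
    session as a page callback, `get_error` reads the captured message.
    """
    error: Optional[str] = None

    def action(page: object) -> None:
        nonlocal error
        try:
            run(page)
        except Exception as e:
            # Keep the message so the caller can inspect it after the fetch.
            error = str(e)

    return action, lambda: error
```

The caller reads the getter only after the fetch that runs the callback has returned, which matches the `login_error = get_error()` line placed directly after `session.fetch(...)` in the diff.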

@ -10,7 +10,7 @@ from pathlib import Path
import httpx
from .base import BaseScraper, extract_precision
from .base import BaseScraper, clear_platform_cookies, extract_precision, load_platform_cookies, save_platform_cookies
from .timeouts import HTTP_TIMEOUT
from .models import (
ContestListResult,
@ -28,8 +28,6 @@ HEADERS = {
}
CONNECTIONS = 8
_COOKIE_PATH = Path.home() / ".cache" / "cp-nvim" / "kattis-cookies.json"
TIME_RE = re.compile(
r"CPU Time limit</span>\s*<span[^>]*>\s*(\d+)\s*seconds?\s*</span>",
re.DOTALL,
@ -209,20 +207,24 @@ async def _stream_single_problem(client: httpx.AsyncClient, slug: str) -> None:
async def _load_kattis_cookies(client: httpx.AsyncClient) -> None:
if not _COOKIE_PATH.exists():
return
try:
for k, v in json.loads(_COOKIE_PATH.read_text()).items():
data = load_platform_cookies("kattis")
if isinstance(data, dict):
for k, v in data.items():
client.cookies.set(k, v)
except Exception:
pass
async def _save_kattis_cookies(client: httpx.AsyncClient) -> None:
cookies = {k: v for k, v in client.cookies.items()}
cookies = dict(client.cookies.items())
if cookies:
_COOKIE_PATH.parent.mkdir(parents=True, exist_ok=True)
_COOKIE_PATH.write_text(json.dumps(cookies))
save_platform_cookies("kattis", cookies)
async def _check_kattis_login(client: httpx.AsyncClient) -> bool:
try:
r = await client.get(BASE_URL, headers=HEADERS, timeout=HTTP_TIMEOUT)
return bool(r.headers.get("x-username"))
except Exception:
return False
async def _do_kattis_login(
@ -329,9 +331,10 @@ class KattisScraper(BaseScraper):
return self._submit_error("Missing credentials. Use :CP kattis login")
async with httpx.AsyncClient(follow_redirects=True) as client:
print(json.dumps({"status": "checking_login"}), flush=True)
await _load_kattis_cookies(client)
if not client.cookies:
if client.cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
else:
print(json.dumps({"status": "logging_in"}), flush=True)
ok = await _do_kattis_login(client, username, password)
if not ok:
@ -368,7 +371,7 @@ class KattisScraper(BaseScraper):
return self._submit_error(f"Submit request failed: {e}")
if r.status_code in (400, 403) or r.text == "Request validation failed":
_COOKIE_PATH.unlink(missing_ok=True)
clear_platform_cookies("kattis")
print(json.dumps({"status": "logging_in"}), flush=True)
ok = await _do_kattis_login(client, username, password)
if not ok:
@ -399,6 +402,16 @@ class KattisScraper(BaseScraper):
return self._login_error("Missing username or password")
async with httpx.AsyncClient(follow_redirects=True) as client:
await _load_kattis_cookies(client)
if client.cookies:
print(json.dumps({"status": "checking_login"}), flush=True)
if await _check_kattis_login(client):
return LoginResult(
success=True,
error="",
credentials={"username": username, "password": password},
)
print(json.dumps({"status": "logging_in"}), flush=True)
ok = await _do_kattis_login(client, username, password)
if not ok:
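
The Kattis login hunk above adds a fast path: load saved cookies, emit `checking_login` only if there are any, probe the session, and fall back to a full login otherwise. A simplified, synchronous sketch of that control flow follows. The function names, the injected callables, and the `emit` hook are assumptions made so the example runs without a network; the real code is async and prints JSON status lines.

```python
from typing import Callable, Mapping


def session_is_live(headers: Mapping[str, str]) -> bool:
    """Treat a non-empty x-username response header as proof of a live session."""
    return bool(headers.get("x-username"))


def login_with_fast_path(
    cookies: Mapping[str, str],
    probe: Callable[[], bool],
    do_login: Callable[[], bool],
    emit: Callable[[str], None],
) -> bool:
    """Return True if the user ends up logged in.

    `probe` checks the saved session, `do_login` performs a full login,
    and `emit` receives the status names shown to the user.
    """
    if cookies:
        emit("checking_login")
        if probe():
            return True  # fast path: saved cookies still work
    emit("logging_in")
    return do_login()
```

With no saved cookies the user sees only `logging_in`; with stale cookies they see `checking_login` followed by `logging_in`; with valid cookies they see `checking_login` alone and no credentials are sent.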

@ -8,7 +8,7 @@ from typing import Any, cast
import httpx
from .base import BaseScraper, extract_precision
from .base import BaseScraper, extract_precision, load_platform_cookies, save_platform_cookies
from .timeouts import HTTP_TIMEOUT
from .models import (
ContestListResult,
@ -27,7 +27,6 @@ HEADERS = {
}
CONNECTIONS = 4
_COOKIE_PATH = Path.home() / ".cache" / "cp-nvim" / "usaco-cookies.json"
_LOGIN_PATH = "/current/tpcm/login-session.php"
_SUBMIT_PATH = "/current/tpcm/submit-solution.php"
@ -202,20 +201,16 @@ def _parse_submit_form(
async def _load_usaco_cookies(client: httpx.AsyncClient) -> None:
if not _COOKIE_PATH.exists():
return
try:
for k, v in json.loads(_COOKIE_PATH.read_text()).items():
data = load_platform_cookies("usaco")
if isinstance(data, dict):
for k, v in data.items():
client.cookies.set(k, v)
except Exception:
pass
async def _save_usaco_cookies(client: httpx.AsyncClient) -> None:
cookies = {k: v for k, v in client.cookies.items()}
cookies = dict(client.cookies.items())
if cookies:
_COOKIE_PATH.parent.mkdir(parents=True, exist_ok=True)
_COOKIE_PATH.write_text(json.dumps(cookies))
save_platform_cookies("usaco", cookies)
async def _check_usaco_login(client: httpx.AsyncClient, username: str) -> bool: