Problem:
- CF and AtCoder always did a full browser login on every `login` invocation, even with valid cookies.
- AtCoder submit never persisted cookies, so it logged in again on every submit.
- CF's cookie guard checked `X-User-Handle`, which CF no longer sets (it now sets `X-User-Sha1`), so cookies were never saved.
- CF `login_action` was missing `wait_for_selector` for the login form that appears after the Cloudflare gate reloads the page.
- AtCoder submit injected source via CodeMirror, which AtCoder does not use (it uses the ACE editor).

Solution:
- Added cookie fast paths to CF and AtCoder login: emit `checking_login` and return early if the existing session is valid. `checking_login` is only emitted when cookies actually exist; fresh starts go straight to `logging_in`.
- Fixed the CF cookie guard to check `X-User-Sha1` and added `wait_for_selector` for the login form.
- Rewrote AtCoder submit to use `set_input_files` on the real source file path, with `wait_for_function` on `#plain-textarea` to confirm the ACE editor is populated before clicking submit.
Browser Scraper Login Debugging Guide
Goal
Make CF, AtCoder, and CodeChef login/submit behavior IDENTICAL to Kattis. Every log message, every pathway, zero unnecessary logins.
Current Branch
fix/scraper-browser-v2
Architecture Crash Course
Lua side
- `credentials.lua` (`:CP <platform> login/logout`)
  - `M.login`: if credentials are cached, calls `scraper.login(platform, cached_creds, on_status, cb)`
    - `on_status(ev)`: logs `"<Platform>: <STATUS_MESSAGES[ev.status]>"`
    - `cb(result)`: on success logs `"<Platform> login successful"`; on failure calls `prompt_and_login`
  - `prompt_and_login`: prompts for username+password, then follows the same flow
  - `M.logout`: clears credentials from the cache and clears the platform key from `~/.cache/cp-nvim/cookies.json`
  - `STATUS_MESSAGES`: `checking_login="Checking existing session..."`, `logging_in="Logging in..."`, `installing_browser="Installing browser..."`
- `submit.lua` (`:CP submit`)
  - Gets saved creds (or prompts), then calls `scraper.submit(..., on_status, cb)`
  - `on_status(ev)`: logs `STATUS_MSGS[ev.status]` (no platform prefix)
  - `STATUS_MSGS`: `checking_login="Checking login..."`, `logging_in="Logging in..."`, `submitting="Submitting..."`, `installing_browser="Installing browser (first time setup)..."`
- `scraper.lua` (`run_scraper(platform, subcommand, args, opts)`)
  - `needs_browser = subcommand == 'submit' or subcommand == 'login' or (platform == 'codeforces' and subcommand in {'metadata','tests'})`
  - Browser path: FHS env (`utils.get_python_submit_cmd`), 120s timeout, `UV_PROJECT_ENVIRONMENT=~/.cache/nvim/cp-nvim/submit-env`
  - ndjson mode: reads stdout line by line and calls `opts.on_event(ev)` per line
  - Login event routing: `ev.credentials` → `cache.set_credentials`; `ev.status` → `on_status`; `ev.success` → callback
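To reason about event order outside Neovim, the routing rules above can be mirrored in a few lines of Python. This is a minimal sketch, not the plugin's Lua code: `route_events` and its callback parameters are hypothetical stand-ins for `cache.set_credentials`, `on_status`, and the final callback.

```python
import json
from typing import Callable, Iterable


def route_events(
    lines: Iterable[str],
    set_credentials: Callable[[dict], None],
    on_status: Callable[[str], None],
    callback: Callable[[dict], None],
) -> None:
    """Dispatch ndjson lines following the routing rules listed for scraper.lua."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        ev = json.loads(line)
        if ev.get("credentials"):  # the empty {} on a failed login is skipped
            set_credentials(ev["credentials"])
        if "status" in ev:
            on_status(ev["status"])
        if "success" in ev:
            callback(ev)  # final LoginResult / SubmitResult line
```

Because the final `LoginResult` line carries both `credentials` and `success`, credentials are cached before the success callback fires, which matches the Kattis note that Lua sees `ev.credentials` and then `ev.success`.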
Python side
- `base.py`: `BaseScraper.run_cli()` / `_run_cli_async()`
  - `login` mode: reads the `CP_CREDENTIALS` env var, calls `self.login(credentials)`, prints `result.model_dump_json()`
  - `submit` mode: reads the `CP_CREDENTIALS` env var, calls `self.submit(...)`, prints `result.model_dump_json()`
  - ndjson status events: `print(json.dumps({"status": "..."}), flush=True)` during login/submit
  - Final result: `print(result.model_dump_json())`; this is what triggers `ev.success`
- `base.py` cookie helpers
  - `load_platform_cookies(platform)` → reads `~/.cache/cp-nvim/cookies.json` and returns the platform's key
  - `save_platform_cookies(platform, data)` → writes to the same file
  - `clear_platform_cookies(platform)` → removes the platform key from the same file
- `models.py`: `LoginResult(success, error, credentials={})`, `SubmitResult(success, error, submission_id="", verdict="")`
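A minimal sketch of how the three cookie helpers can share one JSON file keyed by platform, assuming each call reads and rewrites the whole file. The `path` parameter is hypothetical (added so the sketch can be tested against a temporary file); per the list above, the real helpers take only `platform` (and `data`).

```python
import json
from pathlib import Path
from typing import Any

DEFAULT_COOKIE_FILE = Path.home() / ".cache" / "cp-nvim" / "cookies.json"


def _read_all(path: Path) -> dict[str, Any]:
    """Whole file as {platform: cookies}; a missing or corrupt file reads as empty."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def load_platform_cookies(platform: str, path: Path = DEFAULT_COOKIE_FILE) -> Any:
    """Dict for httpx platforms, list of playwright cookies for browser ones, else None."""
    return _read_all(path).get(platform)


def save_platform_cookies(platform: str, data: Any, path: Path = DEFAULT_COOKIE_FILE) -> None:
    """Replace only this platform's entry; other platforms are preserved."""
    all_cookies = _read_all(path)
    all_cookies[platform] = data
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(all_cookies))


def clear_platform_cookies(platform: str, path: Path = DEFAULT_COOKIE_FILE) -> None:
    """Drop this platform's entry; a missing file or key is a no-op."""
    all_cookies = _read_all(path)
    if platform in all_cookies:
        del all_cookies[platform]
        path.write_text(json.dumps(all_cookies))
```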
Kattis: The Reference Implementation
Kattis is the gold standard. Everything else must match it exactly.
Kattis login flow (kattis.py:login)
- Always emits `{"status": "logging_in"}`
- POSTs to `/login` with credentials
- If it fails → `LoginResult(success=False, ...)`
- If it succeeds → saves cookies, returns `LoginResult(success=True, ..., credentials={username, password})`
Lua sees: ev.credentials (non-empty) → cache.set_credentials. Then ev.success=True → "<Platform> login successful".
Kattis submit flow (kattis.py:submit)
```text
emit checking_login
load_cookies
if no cookies:
    emit logging_in
    do_login → save_cookies
emit submitting
POST /submit
if 400/403 or "Request validation failed":
    clear_cookies
    emit logging_in
    do_login → save_cookies
    POST /submit (retry)
return SubmitResult
```
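The reactive flow above can be sketched as plain Python with every network step injected as a callable. The names (`kattis_style_submit`, `AuthRejected`, the callback parameters) are hypothetical, and the return value is simplified to a submission id rather than a `SubmitResult`.

```python
from typing import Callable, Optional


class AuthRejected(Exception):
    """Stand-in for a 400/403 or "Request validation failed" submit response."""


def kattis_style_submit(
    load_cookies: Callable[[], Optional[dict]],
    do_login: Callable[[], dict],
    save_cookies: Callable[[dict], None],
    clear_cookies: Callable[[], None],
    post_submit: Callable[[dict], str],
    emit: Callable[[str], None],
) -> str:
    emit("checking_login")
    cookies = load_cookies()
    if not cookies:
        # No cookies at all: log in before the first submit attempt.
        emit("logging_in")
        cookies = do_login()
        save_cookies(cookies)

    emit("submitting")
    try:
        return post_submit(cookies)
    except AuthRejected:
        # Reactive path: cookies existed but the server rejected them.
        clear_cookies()
        emit("logging_in")
        cookies = do_login()
        save_cookies(cookies)
        return post_submit(cookies)  # single retry, no second "submitting"
```

With bogus cookies this emits `checking_login`, `submitting`, `logging_in`, the same order as the confirmed Kattis bad-cookie scenario.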
Expected log sequences — CONFIRMED from Kattis live testing
Scenario 1: login+logout+login
Kattis: Logging in...
Kattis login successful
Kattis credentials cleared
Kattis: Logging in...
Kattis login successful
Note: after logout, login prompts for credentials again (cleared from cache).
Scenario 2: login+login
Kattis: Logging in...
Kattis login successful
Kattis: Logging in...
Kattis login successful
Note: second login uses cached credentials, no prompt.
Scenario 3: submit happy path (valid cookies)
Checking login...
Submitting...
Submitted successfully
Note: no Logging in... — cookies present, skip login.
Scenario 4: bad cookie → submit ← CONFIRMED
Checking login...
Submitting...
Logging in...
Submitted successfully
REACTIVE re-login: cookies exist so it assumes logged in, attempts submit, server rejects
(400/403), re-logins, retries submit silently (NO second Submitting...).
Scenario 5: fresh start → submit (no cookies, credentials cached)
Checking login...
Logging in...
Submitting...
Submitted successfully
Note: no cookies present → login before attempting submit.
Browser scraper bad-cookie note
Browser scrapers (CF, AtCoder, CodeChef) can do a PROACTIVE check during checking_login
by loading cookies into the browser session and fetching the homepage to verify login state.
If proactive check works, bad cookie sequence becomes:
Checking login...
Logging in... ← detected bad cookie before submit attempt
Submitting...
Submitted successfully
This differs from Kattis (which can't proactively verify). Decide per-platform which is correct once live testing reveals what the browser check returns on bad cookies. The proactive sequence is PREFERRED — avoids a wasted submit attempt.
Required Behavior for Browser Scrapers
Match Kattis exactly. The differences come from how login is validated:
- Kattis: cookie presence check (no real HTTP check — reactive on submit failure)
- CF/AtCoder/CodeChef: must use browser session to check login state
Login subcommand
ALWAYS:
- Emit `{"status": "logging_in"}`
- Do a full browser login
- If success → save cookies, return `LoginResult(success=True, credentials={username, password})`
- If fail → return `LoginResult(success=False, error="...")`
NO cookie fast path on login. Login always re-authenticates. (Matches Kattis.)
MUST return credentials={username, password} so Lua caches them.
Submit subcommand
```text
emit checking_login
load cookies
if cookies:
    check if still valid (browser or HTTP)
    if invalid → emit logging_in → login → save cookies
    else → logged_in = True
else:
    emit logging_in → login → save cookies
emit submitting
do submit
if auth failure (redirect to login):
    clear cookies
    emit logging_in → login → save cookies
    retry submit
return SubmitResult
```
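The browser variant adds a proactive validity check before the first submit while keeping the reactive retry. A sketch under stated assumptions: all names are hypothetical, steps are injected callables, and `Attempt.needs_relogin` stands in for detecting a redirect to the login page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    submission_id: str = ""
    needs_relogin: bool = False  # e.g. submit page redirected to /enter or /login


def browser_style_submit(
    load_cookies: Callable[[], Optional[list]],
    session_is_valid: Callable[[list], bool],
    login: Callable[[], list],
    save_cookies: Callable[[list], None],
    clear_cookies: Callable[[], None],
    submit: Callable[[list], Attempt],
    emit: Callable[[str], None],
) -> Attempt:
    def relogin() -> list:
        emit("logging_in")
        fresh = login()
        save_cookies(fresh)
        return fresh

    emit("checking_login")
    cookies = load_cookies()
    if not cookies or not session_is_valid(cookies):
        # Proactive: missing or bad cookies are caught before submitting.
        cookies = relogin()

    emit("submitting")
    attempt = submit(cookies)
    if attempt.needs_relogin:
        # Reactive fallback if the session died between check and submit.
        clear_cookies()
        cookies = relogin()
        attempt = submit(cookies)  # single retry, no second "submitting"
    return attempt
```

Folding "no cookies" and "invalid cookies" into one branch is what makes the bad-cookie and fresh-start submit logs identical.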
Test Protocol
Environment
Neovim: nvim --clean -u ~/dev/cp.nvim/t/minimal_init.lua
Clean state:
rm -f ~/.cache/cp-nvim/cookies.json
rm -f ~/.local/share/nvim/cp-nvim.json
CRITICAL PROTOCOL RULES (do not skip)
- Bad cookie scenario is MANDATORY. Never skip it. If the user hasn't run it, stop and demand it. Without it we cannot verify that reactive re-login works. It is the hardest scenario.
- AI clears cookies between scenarios using the commands below. Never ask the user to do it.
- Do not move to the next platform until ALL 5 scenarios show correct logs.
- Go one scenario at a time. Do not batch. Wait for the user to paste logs before proceeding.
Cookie File Structure
Single unified file: ~/.cache/cp-nvim/cookies.json
Two formats depending on platform type:
httpx platforms (kattis, usaco): simple dict
{"kattis": {"KattisSiteCookie": "abc123"}}
{"usaco": {"PHPSESSID": "abc123"}}
Browser/playwright platforms (codeforces, atcoder, codechef): list of playwright cookie dicts
{"codeforces": [
{"domain": ".codeforces.com", "name": "X-User-Sha1", "value": "abc123",
"httpOnly": false, "sameSite": "Lax", "expires": 1234567890, "secure": false, "path": "/"}
]}
Cookie manipulation commands
Inject bad cookies, httpx platforms (kattis, usaco):
```sh
PLATFORM=kattis python3 -c "
import json, os, pathlib
p = pathlib.Path.home() / '.cache/cp-nvim/cookies.json'
d = json.loads(p.read_text())
d[os.environ['PLATFORM']] = {k: 'bogus' for k in d[os.environ['PLATFORM']]}
p.write_text(json.dumps(d))
"
```
Inject bad cookies, playwright platforms (codeforces, atcoder, codechef):
```sh
PLATFORM=codeforces python3 -c "
import json, os, pathlib
p = pathlib.Path.home() / '.cache/cp-nvim/cookies.json'
d = json.loads(p.read_text())
for c in d[os.environ['PLATFORM']]:
    c['value'] = 'bogus'
p.write_text(json.dumps(d))
"
```
Remove platform cookies only (keep credentials in cp-nvim.json):
```sh
PLATFORM=codeforces python3 -c "
import json, os, pathlib
p = pathlib.Path.home() / '.cache/cp-nvim/cookies.json'
d = json.loads(p.read_text())
d.pop(os.environ['PLATFORM'], None)
p.write_text(json.dumps(d))
"
```
Test scenarios (run in order for each platform)
Run ONE at a time. Wait for user logs. AI clears state between scenarios.
- login+logout+login
  - `:CP <p> login` (prompts for creds)
  - `:CP <p> logout`
  - `:CP <p> login` (should prompt again; creds cleared by logout)
- login+login
  - `:CP <p> login` (uses cached creds from step 1, no prompt)
  - `:CP <p> login` (again, no prompt)
- submit happy path
  - AI ensures valid cookies exist (left over from login)
  - `:CP submit`
  - Expected: `Checking login...` → `Submitting...` → `Submitted successfully`
- bad cookie → submit (MANDATORY, never skip)
  - AI runs the bad-cookie injection command
  - `:CP submit`
  - Expected: `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
- fresh start → submit
  - AI removes platform cookies only (credentials remain in cp-nvim.json)
  - `:CP submit`
  - Expected: `Checking login...` → `Logging in...` → `Submitting...` → `Submitted successfully`
For each scenario: user pastes exact notification text, AI compares to Kattis reference.
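Comparing pasted notifications by eye is error-prone. A small helper can diff them line by line against the expected browser-scraper submit sequences from the scenario list above. The scenario keys and `compare_logs` are hypothetical; login scenarios are left out because their messages carry a platform prefix.

```python
from itertools import zip_longest

_SUBMIT_WITH_LOGIN = ["Checking login...", "Logging in...", "Submitting...", "Submitted successfully"]

# Expected browser-scraper submit notifications per scenario.
EXPECTED = {
    "submit_happy": ["Checking login...", "Submitting...", "Submitted successfully"],
    "bad_cookie_submit": _SUBMIT_WITH_LOGIN,
    "fresh_submit": _SUBMIT_WITH_LOGIN,
}


def compare_logs(scenario: str, pasted: str) -> list[str]:
    """Return one message per mismatched line; an empty list means the logs match."""
    got = [line.strip() for line in pasted.splitlines() if line.strip()]
    return [
        f"line {i}: expected {want!r}, got {have!r}"
        for i, (want, have) in enumerate(zip_longest(EXPECTED[scenario], got), start=1)
        if want != have
    ]
```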
Debugging tool: headless=False
To see the browser, change headless=True → headless=False in the scraper.
This lets you watch exactly what the page shows when page_action fires.
Remember to revert after debugging.
ABSOLUTE RULE: no waits, no timeout increases — EVER
Never add page.wait_for_timeout(), time.sleep(), or increase any timeout value to fix
a bug. If something times out, the root cause is wrong logic or wrong selector — fix that.
Increasing timeouts masks bugs and makes the UX slower. Find the real fix.
Debugging tool: direct Python invocation
```sh
SUBMIT_CMD=$(cat ~/.cache/nvim/cp-nvim/nix-submit)
export UV_PROJECT_ENVIRONMENT=~/.cache/nvim/cp-nvim/submit-env
# Login:
CP_CREDENTIALS='{"username":"USER","password":"PASS"}' \
  $SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces login
# Submit:
CP_CREDENTIALS='{"username":"USER","password":"PASS"}' \
  $SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces submit \
  <contest_id> <problem_id> <language_id> <file_path>
```
For passwords with special characters (a single quote breaks the single-quoted form above), use a temp file:
```sh
cat > /tmp/creds.json << 'EOF'
{"username":"user","password":"p@ss!word\"with\"quotes"}
EOF
CREDS=$(cat /tmp/creds.json)
CP_CREDENTIALS="$CREDS" $SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces login
```
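An alternative to hand-escaping the temp file is to let `json.dumps` do the escaping. This hypothetical `make_creds.py` only prompts when passed `--prompt`, and sends its prompts to stderr or the terminal so that only the JSON reaches stdout.

```python
import getpass
import json
import sys


def credentials_json(username: str, password: str) -> str:
    """Serialize credentials; json.dumps escapes quotes and backslashes."""
    return json.dumps({"username": username, "password": password})


def prompt_credentials() -> str:
    # Prompts go to stderr / the tty so command substitution captures only JSON.
    print("username: ", end="", file=sys.stderr, flush=True)
    username = input()
    password = getpass.getpass("password: ")
    return credentials_json(username, password)


if __name__ == "__main__" and "--prompt" in sys.argv[1:]:
    sys.stdout.write(prompt_credentials())
```

Usage: `CP_CREDENTIALS="$(python3 make_creds.py --prompt)" $SUBMIT_CMD run --directory ~/dev/cp.nvim -m scrapers.codeforces login`.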
Platform-Specific Notes
Codeforces
Credentials: username=dalet, password=<redacted>
Cookie file key: codeforces (list of cookie dicts with playwright format)
Cookie guard on save: only saves if X-User-Sha1 cookie present (NOT X-User-Handle — that cookie no longer exists). Verified 2026-03-07.
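A sketch of that guard over a playwright-format cookie list (the shape shown under Cookie File Structure). `should_save_cf_cookies` is a hypothetical name; only the cookie name comes from this guide.

```python
from typing import Any

CF_SESSION_COOKIE = "X-User-Sha1"  # X-User-Handle is no longer set by CF


def should_save_cf_cookies(cookies: list[dict[str, Any]]) -> bool:
    """True only if the playwright cookie list holds a non-empty CF session cookie."""
    return any(c.get("name") == CF_SESSION_COOKIE and bool(c.get("value")) for c in cookies)
```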
Known issues:
- CF has a custom Turnstile gate on `/enter`. It's a FULL PAGE redirect ("Verification"), not an embedded widget: it POSTs to `/data/turnstile`, then reloads to show the actual login form. scrapling calls `page_action` at page load, which may fire BEFORE the reload completes. Fix: add `page.wait_for_selector('input[name="handleOrEmail"]', timeout=60000)` as the FIRST line of every `login_action` that fills the CF login form.
- The same issue exists in BOTH `_login_headless_cf.login_action` and `_submit_headless.login_action`.
- The `check_login` on the homepage uses `solve_cloudflare=True` (current diff). Verify this works.
- `needs_relogin` triggers if the submit page redirects to `/enter` or `/login`.
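The `wait_for_selector` fix, sketched as a `login_action` factory. The `Page` protocol lists only the playwright sync-API methods used; the password and submit selectors are assumptions to verify with `headless=False`, and only the `handleOrEmail` selector and the 60000 ms timeout come from this guide.

```python
from typing import Any, Callable, Protocol


class Page(Protocol):
    """The subset of playwright's sync Page API used here."""

    def wait_for_selector(self, selector: str, *, timeout: float) -> Any: ...
    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...


HANDLE_INPUT = 'input[name="handleOrEmail"]'


def make_cf_login_action(username: str, password: str) -> Callable[[Page], None]:
    def login_action(page: Page) -> None:
        # FIRST line: the Turnstile "Verification" page reloads /enter, and
        # page_action can fire before that reload shows the real form.
        page.wait_for_selector(HANDLE_INPUT, timeout=60000)
        page.fill(HANDLE_INPUT, username)
        page.fill('input[name="password"]', password)  # assumed selector
        page.click('input[type="submit"]')  # assumed selector

    return login_action
```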
Submit page Turnstile: The submit page (/contest/{id}/submit) has an EMBEDDED Turnstile
(not the full-page gate). submit_action correctly calls _solve_turnstile(page) for this.
Cookie fast path for submit:
- Load cookies → `StealthySession(cookies=saved_cookies)`
- If `_retried=False`: emit `checking_login`, fetch `/` with `solve_cloudflare=True`, check for "Logout"
- If not logged in: emit `logging_in`, fetch `/enter` with `solve_cloudflare=True` and `login_action`
Test problem: :CP codeforces 2060 (recent educational round, has problems A-G)
submit_action source injection: uses page.evaluate to set CodeMirror + textarea directly.
This is correct — CF does not use file upload.
AtCoder
Credentials: username=barrettruth, password=<redacted>
Cookie file key: atcoder — BUT currently AtCoder NEVER saves cookies. Submit always
does a fresh full login. This is WRONG vs. Kattis model. Needs cookie fast path added.
Current login flow:
`_login_headless`: emits `logging_in`, does a browser login, checks `/home` for "Sign Out". Does NOT save cookies. This means `:CP submit` always does a full login (slow, wastes a Turnstile solve).
Current submit flow:
`_submit_headless`: emits `logging_in` FIRST (no `checking_login`). Always does a full browser login. No cookie fast path. This must change.
Required submit flow (to match Kattis):
```text
emit checking_login
load_platform_cookies("atcoder")
if cookies:
    StealthySession(cookies=saved_cookies)
    check /home for "Sign Out"
    if not logged in: emit logging_in, do browser login
else:
    emit logging_in, do browser login (fresh StealthySession)
save cookies after login
emit submitting
do submit_action
if submit redirects to /login: clear cookies, retry once with full login
```
Login flow must save cookies so submit can use fast path.
AtCoder Turnstile: embedded in the login form itself (not a separate gate page).
_solve_turnstile(page) is called in login_action before filling fields. This is correct.
No wait_for_selector needed — the Turnstile is on the same page.
Submit file upload: uses page.set_input_files("#input-open-file", {...buffer...}).
In-memory buffer approach. Correct — no temp file needed.
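The change summary at the top of this guide says AtCoder submit now passes the real source file path rather than an in-memory buffer, then waits on `#plain-textarea` before clicking submit. A sketch of that ordering: `upload_source` and `EDITOR_FILLED_JS` are hypothetical, the submit selector and timeout are caller-supplied (ideally a constant from `scrapers/timeouts.py`), and it assumes `#plain-textarea` holds the editor contents once the upload has been read.

```python
from pathlib import Path
from typing import Any, Protocol

# Assumption: #plain-textarea mirrors the ACE editor once the uploaded file is read.
EDITOR_FILLED_JS = "() => (document.querySelector('#plain-textarea')?.value || '').length > 0"


class Page(Protocol):
    """The subset of playwright's sync Page API used here."""

    def set_input_files(self, selector: str, files: Any) -> None: ...
    def wait_for_function(self, expression: str, *, timeout: float) -> Any: ...
    def click(self, selector: str) -> None: ...


def upload_source(page: Page, source: Path, submit_selector: str, timeout_ms: float) -> None:
    # Upload the real file; playwright also accepts a {name, mimeType, buffer} payload.
    page.set_input_files("#input-open-file", str(source))
    # Block on a condition (editor populated), not on a fixed delay.
    page.wait_for_function(EDITOR_FILLED_JS, timeout=timeout_ms)
    page.click(submit_selector)
```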
Submit nav timeout: BROWSER_SUBMIT_NAV_TIMEOUT["atcoder"] currently = BROWSER_NAV_TIMEOUT * 2 = 20s.
CLAUDE.md says it should be 40s (* 4). May need to increase if submit navigation is slow.
Test problem: :CP atcoder abc394 (recent ABC, has problems A-G)
CodeChef
Credentials: username=TBD, password=<redacted>
Cookie file key: codechef
Cookie guard on save: saves any non-empty cookies — no meaningful guard. Should add one (e.g., check for a session cookie name specific to CodeChef, or check logged_in state).
Current login form selectors: input[name="name"], input[name="pass"], input.cc-login-btn
These look like OLD Drupal-era selectors. Current CodeChef is React/Next.js. MUST VERIFY.
Use headless=False to see what the login page actually looks like.
Current timeout: 3000ms after clicking login button. Way too short for a React SPA navigation.
No solve_cloudflare on the login fetch. May or may not be needed. Verify with headless=False.
check_login logic: "dashboard" in page.url or page.evaluate(_CC_CHECK_LOGIN_JS)
where _CC_CHECK_LOGIN_JS = "() => !!document.querySelector('a[href*=\"/users/\"]')".
Needs verification — does CC redirect to /dashboard after login? Does this selector exist?
Submit flow: has PRACTICE_FALLBACK logic — if contest says "not available for accepting
solutions", retries with contest_id="PRACTICE". This is unique to CodeChef.
Submit URL: /{contest_id}/submit/{problem_id} or /submit/{problem_id} for PRACTICE.
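The URL rule and the PRACTICE fallback can be sketched as two small functions. `BASE_URL`, `cc_submit_url`, and `submit_with_practice_fallback` are hypothetical, and `try_submit` stands in for whatever navigation returns the page's message text.

```python
from typing import Callable

BASE_URL = "https://www.codechef.com"  # assumption: CodeChef site root
NOT_ACCEPTING = "not available for accepting solutions"


def cc_submit_url(contest_id: str, problem_id: str) -> str:
    """PRACTICE submissions have no contest segment in the path."""
    if contest_id == "PRACTICE":
        return f"{BASE_URL}/submit/{problem_id}"
    return f"{BASE_URL}/{contest_id}/submit/{problem_id}"


def submit_with_practice_fallback(
    contest_id: str, problem_id: str, try_submit: Callable[[str], str]
) -> str:
    """try_submit(url) is a hypothetical stand-in returning the page's message text."""
    message = try_submit(cc_submit_url(contest_id, problem_id))
    if NOT_ACCEPTING in message and contest_id != "PRACTICE":
        # Contest closed to submissions: retry once as a practice submission.
        message = try_submit(cc_submit_url("PRACTICE", problem_id))
    return message
```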
Submit selectors (need verification):
- `[aria-haspopup="listbox"]`: language selector
- `[role="option"][data-value="{language_id}"]`: specific language option
- `.ace_editor`: code editor
- `#submit_btn`: submit button
Test problem: :CP codechef START209 or similar recent Starters contest.
Debugging Methodology
Step-by-step for each issue
- Identify the specific failure (wrong log, missing log, crash, wrong order)
- Set `headless=False` to visually inspect what the browser shows
- Run the direct Python invocation to isolate the scraper from Neovim
- Fix one thing at a time
- Re-run ALL 5 test scenarios after each fix
- Do NOT move to the next platform until ALL 5 scenarios show correct logs
When context runs low
Read this file first. Then read:
- `scrapers/kattis.py`: reference implementation
- `scrapers/<platform>.py`: current implementation being debugged
- `lua/cp/credentials.lua`: login Lua side
- `lua/cp/submit.lua`: submit Lua side
Current test status (update this section as work progresses):
| Scenario | Kattis | CF | AtCoder | CodeChef |
|---|---|---|---|---|
| login+logout+login | ✓ | ✓ | ? | ? |
| login+login | ✓ | ✓ | ? | ? |
| submit happy | ✓ | ✓ | ? | ? |
| bad cookie→submit | ✓ | ✓ | ? | ? |
| fresh→submit | ✓ | ✓ | ? | ? |
CF confirmed log sequences
login (no cookies): CodeForces: Logging in... → CodeForces login successful
login (valid cookies): CodeForces: Checking existing session... → CodeForces login successful
login (bad cookies): CodeForces: Checking existing session... → CodeForces: Logging in... → CodeForces login successful
submit happy: Checking login... → Submitting... → Submitted successfully
submit bad cookie: Checking login... → Logging in... → Submitting... → Submitted successfully
submit fresh: Checking login... → Logging in... → Submitting... → Submitted successfully
Note: bad cookie and fresh start produce identical submit logs for CF (proactive check).
Kattis bad cookie is reactive (Submitting... before Logging in...). Issue #362 tracks alignment.
Key Files
scrapers/base.py — BaseScraper, cookie helpers, run_cli
scrapers/kattis.py — REFERENCE IMPLEMENTATION
scrapers/codeforces.py — browser scraper (CF Turnstile gate issue)
scrapers/atcoder.py — browser scraper (_solve_turnstile, no cookie fast path)
scrapers/codechef.py — browser scraper (selectors unverified)
scrapers/timeouts.py — all timeout constants
lua/cp/scraper.lua — run_scraper, ndjson event routing
lua/cp/credentials.lua — login/logout commands
lua/cp/submit.lua — submit command
lua/cp/cache.lua — credential + cache storage
lua/cp/constants.lua — COOKIE_FILE, PLATFORM_DISPLAY_NAMES
t/minimal_init.lua — test Neovim config
Open Questions (fill in as discovered)
- What are the actual CodeChef login form selectors on the current React site?
- Does CodeChef require `solve_cloudflare=True`?
- What is the correct CodeChef session cookie name to use as a guard?
- Does the AtCoder cookie fast path work reliably (Cloudflare on /home without cookies)?
- What is the exact CodeChef username for credentials?
- Is `BROWSER_SUBMIT_NAV_TIMEOUT["atcoder"]` sufficient at 20s, or does it need 40s?