Security Analysis of a Python Job Application Tracker
Overview
One of the best ways to sharpen your vulnerability analysis skills is to turn the lens on your own projects. I recently built a personal job application tracker in Python that automatically monitors my Gmail inbox, parses application status emails, stores records in a MySQL database, and visualizes everything in a Dash dashboard. It’s a practical tool — but it’s also a useful target for a real security assessment.
This post walks through applying the same vulnerability analysis methodology I use professionally (OWASP Top 10, threat modeling, CVSS-informed prioritization) to this project, covering real findings and concrete remediations.
Project Architecture
Before analyzing vulnerabilities, understanding the attack surface is essential.
Stack:
- Data layer: MySQL via SQLAlchemy + PyMySQL, seeded from CSV or live Gmail polling
- Email integration: Gmail API (OAuth 2.0) via
google-api-python-client - Analysis: Pandas, Scikit-learn, Jupyter Notebooks
- Visualization: Plotly Dash dashboard with Bootstrap components
- Config:
.envfile loaded viapython-dotenv
Data flow:
Gmail Inbox → email_monitor.py (OAuth) → email_parser.py (regex) → db.py (SQLAlchemy) → MySQL
CSV files → ingest.py (pandas) → db.py → MySQL
↓
dashboards/app.py (Dash)
Sensitive assets at a glance:
| Asset | Location | Sensitivity |
|—|—|—|
| DB credentials | .env | High |
| Gmail OAuth credentials | config/gmail_credentials.json | Critical |
| Gmail OAuth token | config/gmail_token.json | Critical |
| Application data (PII) | MySQL applications table | Medium |
| Raw CSV imports | data/raw/ | Medium |
Threat Model
Threat actors considered:
- Attacker with read access to the filesystem (e.g., misconfigured file share, stolen laptop)
- Attacker with network access to the MySQL host
- Attacker who obtains the repository (e.g., accidental public push)
Primary concerns:
- Credential / token exposure
- Unauthorized database access
- Vulnerable dependencies
- Dash dashboard without authentication
- Unsafe file ingestion
Finding 1 — Credential Exposure via Accidental Git Commit
OWASP: A02:2021 – Cryptographic Failures
Severity: Critical (CVSS 9.1)
The most dangerous finding has nothing to do with runtime behavior — it’s the risk of committing secrets to version control. The project stores three sensitive files that would be catastrophic if pushed to a public repository:
config/gmail_credentials.json ← Google OAuth client ID + secret
config/gmail_token.json ← Live access + refresh token for your Gmail account
.env ← DB_USER, DB_PASSWORD, DB_HOST
settings.py constructs the database URL directly from environment variables:
DATABASE_URL = (
f"mysql+pymysql://{DB_USER}:{DB_PASSWORD}"
f"@{DB_HOST}:{DB_PORT}/{DB_NAME}?charset=utf8mb4"
)
If DB_HOST has a hardcoded default pointing to a real host and that IP is reachable, an attacker with the .env file has full database access. If gmail_token.json leaks, they have read access to your entire Gmail inbox — including every job offer, salary negotiation, and personal email.
Remediations:
- Add all three to
.gitignoreimmediately:.env config/gmail_credentials.json config/gmail_token.json data/raw/ data/processed/ - Rotate the Google OAuth credentials in Google Cloud Console if there’s any chance they were ever committed.
- Remove the hardcoded
DB_HOSTdefault — fail loudly if the variable is absent rather than defaulting to a real IP:DB_HOST = os.getenv('DB_HOST') if not DB_HOST: raise EnvironmentError("DB_HOST is not set") - Consider using a secrets manager (Vault, AWS Secrets Manager, or even
keyring) rather than plaintext.envfiles for production use.
Finding 2 — Dash Dashboard Has No Authentication
OWASP: A01:2021 – Broken Access Control
Severity: High (CVSS 7.5)
The Dash application (dashboards/app.py) runs as a standard Flask/Dash server. By default, Dash exposes the app on 0.0.0.0 with no authentication layer. Anyone on the local network who knows the port can view all job application data, including companies, roles, salary stages, and notes.
On a home network this may seem low risk, but consider:
- The MySQL host is at a static LAN IP — suggesting it’s a reachable home server
- A guest device or a compromised device on the same network gets full dashboard access
- If the machine is ever port-forwarded or on a VPN, exposure widens significantly
Remediations:
- Bind the Dash server to
127.0.0.1only for local use:app.run(host="127.0.0.1", port=8050, debug=False) - For network access, add HTTP Basic Auth using
dash-author Flask’sflask-login:from dash_auth import BasicAuth VALID_USERNAME_PASSWORD_PAIRS = {'admin': os.getenv('DASH_PASSWORD')} auth = BasicAuth(app, VALID_USERNAME_PASSWORD_PAIRS) - Add
DASH_PASSWORDto.envrather than hardcoding credentials.
Finding 3 — Unpinned Dependencies with Known Vulnerability Surface
OWASP: A06:2021 – Vulnerable and Outdated Components
Severity: Medium (CVSS 6.5)
requirements.txt lists every dependency without version pins:
pandas
sqlalchemy
pymysql
plotly
dash
scikit-learn
google-api-python-client
...
This means pip install -r requirements.txt always installs the latest version — which sounds safe but introduces real risks:
- A dependency may release a breaking or malicious version (supply chain attack)
- You cannot reproduce the exact environment used when the project was last known-good
pip auditcannot check for CVEs without pinned versions to look up
In early 2025, the requests package ecosystem saw several typosquatting incidents. The dash and flask ecosystem has had multiple CVEs related to CORS misconfigurations and debug mode exposure.
Remediations:
- Pin all dependencies after a clean install:
pip install -r requirements.txt pip freeze > requirements.txt - Run
pip auditor integratesafety checkinto your workflow:pip install pip-audit pip-audit -r requirements.txt - Use a virtual environment with a lockfile (Poetry or
pip-compilefrompip-tools) for reproducible installs.
Finding 4 — CSV Ingestion Without Input Validation
OWASP: A08:2021 – Software and Data Integrity Failures
Severity: Medium (CVSS 5.4)
ingest.py reads arbitrary CSV files from the filesystem and inserts their contents into MySQL. While the code does truncate string fields (e.g., company[:100]) and validates the status field against a whitelist, there are gaps:
def ingest_csv(filepath: str) -> int:
df = pd.read_csv(filepath) # No file type verification
df.columns = df.columns.str.strip() # Column names from file are trusted
...
execute("""
INSERT IGNORE INTO applications (date_applied, company, role, status, notes)
VALUES (:date_applied, :company, :role, :status, :notes)
""", { ... }) # Parameterized — SQL injection not possible
The SQL itself is safe — parameterized queries via SQLAlchemy’s text() with bound parameters prevent SQL injection. However:
- A crafted CSV could supply a
notesfield with 500 characters of XSS payload that renders in the Dash dashboard - A malicious CSV could supply a
date_appliedvalue that causes unexpected behavior in date parsing - There is no verification that the file is actually a CSV (e.g., a zip bomb or a file with a
.csvextension containing binary data)
Remediations:
- Sanitize the
notesfield — strip HTML tags before storage:import html notes = html.escape(str(row.get('notes', '')))[:500] - Validate date format strictly:
from datetime import datetime try: date_val = datetime.strptime(str(row['date_applied']), '%Y-%m-%d').date() except ValueError: continue # Skip malformed rows - Add a file size check before reading with pandas — prevent processing unexpectedly large files.
Finding 5 — OAuth Token Stored as World-Readable File (Partially Mitigated)
OWASP: A02:2021 – Cryptographic Failures
Severity: Medium (CVSS 5.9)
email_monitor.py saves the Gmail OAuth token after the first auth flow:
TOKEN_FILE.write_text(creds.to_json())
TOKEN_FILE.chmod(0o600) # ← Good: owner-only read/write
log.info("Token saved to %s", TOKEN_FILE)
The chmod(0o600) is a positive security control — it restricts the token file to the owner only. However:
- The token is stored in plaintext JSON — any process running as the same user can read it
- If the refresh token leaks, it can generate new access tokens indefinitely until revoked
- The token file path (
config/gmail_token.json) is inside the project directory, adjacent to code that may be shared or synced
Remediations:
- Store the token outside the project directory (e.g.,
~/.config/job_tracker/gmail_token.json) - Consider encrypting the token at rest using the system keychain:
import keyring keyring.set_password("job_tracker", "gmail_token", creds.to_json()) - Implement token revocation on application exit for sensitive workflows.
What Was Done Well
Not every finding is a problem — good security practices deserve recognition:
| Practice | Code Location | Why It Matters |
|---|---|---|
| Parameterized SQL queries | db.py, ingest.py |
Prevents SQL injection completely |
| Status field allowlist | ingest.py |
Prevents unexpected values from entering the DB |
OAuth gmail.readonly scope only |
email_monitor.py |
Limits blast radius if token is compromised |
Token file chmod(0o600) |
email_monitor.py |
Restricts filesystem access to owner |
.env for credentials |
config/settings.py |
Avoids hardcoding secrets in source |
| Field length truncation | ingest.py |
Prevents oversized inserts |
Prioritized Remediation Plan
| Priority | Finding | Effort | Impact |
|---|---|---|---|
| P0 | Add secrets to .gitignore, rotate if committed |
5 min | Critical |
| P1 | Bind Dash to 127.0.0.1 or add basic auth |
15 min | High |
| P2 | Pin dependencies + add pip-audit |
20 min | Medium |
| P3 | Sanitize notes field (HTML escape) |
10 min | Medium |
| P4 | Move token file outside project dir | 10 min | Medium |
| P5 | Add hard failure on missing env vars | 5 min | Low |
Lessons Learned
Personal projects are easy to treat as “low stakes” from a security perspective — but they often have the most sensitive access of anything you run. This tracker has OAuth access to a Gmail inbox containing salary negotiations, offer letters, and background check confirmations. That warrants the same rigor as any professional system.
The key takeaways from this analysis:
- Your
.gitignoreis a security control. Treat it as one. - Parameterized queries matter — even in personal projects, they’re the right habit.
- Principle of least privilege applies to OAuth scopes, not just IAM roles.
- Dependency pinning is both a stability and a security practice.
- Local network ≠ trusted network — bind services explicitly.
The same methodology used here — threat modeling, OWASP mapping, CVSS-informed prioritization, and concrete remediation steps — applies directly to enterprise vulnerability programs. The scale changes; the framework doesn’t.