DataOps is the combined DataTalks.Club operations portal.
Version 1 focuses on operations docs and tasks:
- process docs, SOPs, templates, references, playbooks, prompts, and search
- task workflows, bundles, recurring work, required links, and execution state
- AWS Lambda deployment with GitHub Actions OIDC
The deployed V1 app is the DataTalks.Club operations portal for
ops.dtcdev.click, served from the DataTalksClub/dataops repository and the
dataops-v1 stack. The work engine lives under work-engine/ as an internal
DataOps task/workflow module, and the podcast assistant lives under
assistants/podcast/ as a DataOps assistant module for the podcast operations
workflow.
- content/: operational documentation (SOPs, templates, references, playbooks, prompts) and its image assets.
- work-engine/: DataOps task/workflow execution module.
- assistants/podcast/: DataOps podcast workflow assistant module, process docs, guest-intake template, knowledge-base builder, and tests.
- docs/: repo-meta docs (this README, STRUCTURE.md, sop-format*.md, archived materials).
- _docs/: DataOps merge plan, development process, and planning notes.
- frontend/: static vanilla-JS app and its Dockerfile.
- lambda-functions/: AWS Lambda backend (docs API + search index).
- scripts/: repo tooling (SOP parser/linter/normalizer, migration scripts, the dev server).
Use the root Makefile as the preferred command surface:
make help
make setup
For the full local Operations Home path, run the long-lived servers in separate terminals:
make dev-work-engine
WORK_ENGINE_DEV_URL=http://127.0.0.1:3000 make dev-docs
make dev-frontend
The Docker Compose stack remains available through the Makefile:
make dev-compose
# Frontend: http://127.0.0.1:5173
# Lambda upstream: http://127.0.0.1:8787
The frontend container proxies /docs, /search, /health, /images,
/folders, and /lint to the lambda container, and exposes its own
/git/status and /git/commit so the UI can commit + push using the host's
SSH key (mounted read-only).
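The routing split described above can be sketched as a small lookup. This is only an illustration of the rule, not the proxy configuration shipped in the frontend container: the path prefixes come from the paragraph above, while the function name and the target labels ("lambda", "frontend-git", "frontend-static") are made up for the example.

```python
# Path prefixes are taken from the description above.
# The target labels returned by route_request are illustrative assumptions.
LAMBDA_PREFIXES = ("/docs", "/search", "/health", "/images", "/folders", "/lint")
GIT_ENDPOINTS = ("/git/status", "/git/commit")


def _matches(path: str, prefix: str) -> bool:
    """True when path is the prefix itself or a sub-path of it."""
    return path == prefix or path.startswith(prefix + "/")


def route_request(path: str) -> str:
    """Decide which side of the Compose stack would handle a request path."""
    path = path.split("?", 1)[0]  # ignore any query string
    if any(_matches(path, p) for p in GIT_ENDPOINTS):
        return "frontend-git"  # handled by the frontend container itself
    if any(_matches(path, p) for p in LAMBDA_PREFIXES):
        return "lambda"  # proxied to the lambda container
    return "frontend-static"  # everything else: static assets
```

For example, `route_request("/search?q=invoice")` returns `"lambda"`, while `route_request("/git/commit")` stays on the frontend container.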
make help lists the current setup, dev, search, SOP lint, test, SAM
validation/build, and CI-parity targets. These targets are thin wrappers around
the package-local commands documented below so failures stay visible and package
commands remain useful for troubleshooting.
Common verification targets:
make search-index
make test-docs
make seed-work-engine
make test-work-engine
make typecheck-work-engine
make build-work-engine
make test-work-engine-e2e
make test-assistant
make smoke-docs
make sam-validate
make sam-build
make ci
Run make sop-lint FILES="content/path/to/sop.md" for marked SOP files.
make sam-validate is local template validation only: it uses empty AWS config
and credentials files under .tmp/aws-empty/, disables EC2 metadata lookup, and
does not require live AWS credentials or run sam deploy.
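A rough sketch of how such a credential-free environment can be prepared is shown below. It is not the Makefile's actual recipe. AWS_CONFIG_FILE, AWS_SHARED_CREDENTIALS_FILE, and AWS_EC2_METADATA_DISABLED are standard AWS CLI/SDK environment variables; the helper name and the file names `config` and `credentials` inside .tmp/aws-empty/ are assumptions.

```python
import os
from pathlib import Path


def offline_aws_env(base_dir=".tmp/aws-empty", environ=None):
    """Return an environment that points AWS tooling at empty config files.

    The file names "config" and "credentials" are assumptions for this sketch.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    config = base / "config"
    credentials = base / "credentials"
    config.touch()       # empty file: no profiles, no region
    credentials.touch()  # empty file: no access keys

    env = dict(os.environ if environ is None else environ)
    env.update(
        {
            "AWS_CONFIG_FILE": str(config),
            "AWS_SHARED_CREDENTIALS_FILE": str(credentials),
            # Stops the SDK from querying the EC2 instance metadata service.
            "AWS_EC2_METADATA_DISABLED": "true",
        }
    )
    return env
```

The returned mapping could then be passed to a subprocess call, for instance `subprocess.run(["sam", "validate"], env=offline_aws_env(), check=True)`; the real target may pass additional flags.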
DataOps uses npm workspaces from the repository root for Node tooling. The
current workspace is work-engine/; frontend/ remains the static vanilla-JS
portal shell served by the Python Lambda app and does not have a separate Node
package or build pipeline in this milestone.
Install Node dependencies from the repo root:
npm ci
Common work-engine lifecycle commands are available from the root:
npm run dev:work-engine
npm run test:work-engine
npm run test:e2e:work-engine
npm run typecheck:work-engine
npm run build:work-engine
npm run seed:work-engine
npm run export:templates:work-engine
npm run validate:export:work-engine -- <export-dir>
npm run dry-run:import:work-engine -- <export-dir>
npm run clean:work-engine
The Makefile uses these root workspace scripts where they are the daily entry
point, while work-engine test, typecheck, build, and E2E targets continue to
preserve the package-local npm --prefix work-engine ... commands required for
debugging and CI parity.
package-lock.json at the repo root is the npm lockfile for the workspace. The
work-engine Lambda packaging, CI cache, and Docker Lambda image all use that
root lockfile; there is no nested work-engine/package-lock.json.
The deployed v1 app is a single protected Lambda app. It serves the frontend,
the docs API, GitHub-backed content editing, and same-origin search from one
Lambda Function URL. GitHub is the source of truth for markdown content;
Lambda keeps a /tmp cache and rebuilds the minsearch index from that cache.
flowchart LR
Browser[Browser] -->|Basic auth| App[Full docs Lambda]
App --> Frontend[Static frontend]
App --> Api[Docs API]
App --> Search[Search handler]
Api --> Cache[/Lambda /tmp content cache/]
Search --> Index[/Lambda /tmp minsearch index/]
App -->|GitHub API| GitHub[(GitHub repo content/)]
App -->|runtime secrets| Secrets[AWS Secrets Manager]
Content changes made in the UI are committed directly to GitHub by Lambda through the GitHub Contents API. After a successful save, Lambda refreshes its GitHub tree cache and rebuilds the local search index. There is no separate SQLite service and no EFS filesystem in the current design.
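The save path described above (commit through the Contents API, then refresh the cache and rebuild the index) can be sketched roughly as follows. The request body fields (`message`, Base64 `content`, `sha`, `branch`) follow GitHub's documented "create or update file contents" endpoint. Everything else is an assumption for illustration: the function names, the `send` callable, the `main` default branch, and the plain word-to-path index, which stands in for minsearch and simply updates a local dict instead of refetching the GitHub tree.

```python
import base64
import re


def build_contents_payload(markdown, message, sha=None, branch="main"):
    """Body for PUT /repos/{owner}/{repo}/contents/{path}."""
    payload = {
        "message": message,
        "content": base64.b64encode(markdown.encode("utf-8")).decode("ascii"),
        "branch": branch,  # assumed default branch name
    }
    if sha is not None:
        payload["sha"] = sha  # GitHub requires the blob sha when updating a file
    return payload


def rebuild_index(cache):
    """Stand-in for the search index: map each word to the paths containing it."""
    index = {}
    for path, text in cache.items():
        for token in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(token, set()).add(path)
    return index


def save_document(cache, path, markdown, send, sha=None):
    """Commit first, then update the cache and rebuild the index.

    `send` is a hypothetical callable that performs the HTTP request and
    raises on failure, so the cache is only touched after a successful save.
    """
    send(path, build_contents_payload(markdown, f"Update {path}", sha=sha))
    cache[path] = markdown
    return rebuild_index(cache)
```

Ordering the commit before the cache update keeps GitHub as the source of truth: a failed save leaves both the cache and the index untouched.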
CI/CD is split by lifecycle:
- content/** changes run content validation and search-index smoke tests, then refresh the deployed Lambda cache without redeploying code.
- App, Lambda, infra, and test changes run the full test/build/deploy workflow.
- GitHub Actions deploys through an AWS OIDC role managed by CloudFormation.
- Runtime secrets live in AWS Secrets Manager, not in GitHub Actions secrets.
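The lifecycle split above amounts to a decision on the set of changed paths. The sketch below only illustrates that decision; the real rules live in the GitHub Actions workflow definitions. The function name and return labels are hypothetical, and treating a mixed change set as a full deploy is an assumption.

```python
def select_workflow(changed_paths):
    """Pick a pipeline for a change set (labels are illustrative only)."""
    paths = list(changed_paths)
    if not paths:
        return "none"
    if all(p.startswith("content/") for p in paths):
        # Content-only change: validate, smoke-test search, refresh the cache.
        return "content-refresh"
    # Assumption: any non-content file triggers the full test/build/deploy run.
    return "full-deploy"
```

Under these assumptions, `select_workflow(["content/sops/a.md"])` yields `"content-refresh"`, and adding any file outside content/ switches the result to `"full-deploy"`.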
For the inherited docs-app architecture, see docs/architecture.md.
- Opens every SOP in a block view — Section / Group / Step / Free-form / Screenshot / TODO. Click any text to edit inline; Cmd/Ctrl+Enter to commit.
- Add, delete, drag-reorder steps; cross-group moves; convert flat ↔ grouped procedures; renumber automatically. Roundtrips through sop_lint.py.
- Image upload via file picker, drag-and-drop onto a step, or clipboard paste.
- Frontmatter editor: doc_type, summary, tags, systems (chips).
- Pending-changes panel aggregates every local draft; Save all from the sidebar, then review lint and commit from the publish dialog.
- Search (server-side), tag/system/domain/type filters, quick-nav palette (Cmd/Ctrl+P), sidebar tree filter, "Recently edited" + "Pinned" sections.
- Diff view between draft and saved version. Lint dashboard for the whole corpus in the publish dialog. Loom + YouTube + Vimeo embeds. Lightbox for screenshots.
- Dark mode, resizable sidebar, mobile layout.
- / focuses sidebar search.
- Cmd/Ctrl + K focuses search from anywhere.
- Cmd/Ctrl + P opens quick navigation.
- Cmd/Ctrl + S saves the current doc.
- Cmd/Ctrl + Shift + S saves all drafts.
- Cmd/Ctrl + Enter commits an inline edit.
- Esc cancels inline edits and closes modals.
- ? shows keyboard shortcut help.
Every SOP is structured Markdown with HTML-comment markers (<!-- sop-section-start: ... -->
etc.) that are invisible on GitHub but machine-readable. See
docs/sop-format.md for the strict spec and
docs/sop-format-design.md for the design log.
Tooling:
- scripts/sop_parse.py: parse a marked-up SOP → structured JSON.
- scripts/sop_lint.py: validate against the spec.
- scripts/sop_normalize.py: convert a legacy SOP into the marker format.
All three share their implementation with lambda-functions/src/lambda_functions/sop_parse.py
and sop_lint.py, so the deployed Lambda enforces the same rules as the CLI.
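To show why HTML-comment markers are easy to read by machine, here is a minimal scanner. It is not the logic in scripts/sop_parse.py, and docs/sop-format.md remains the authority on the marker vocabulary. Only the sop-section-start name appears in this README; the `sop-section-end` marker and the function name are assumptions made for the example.

```python
import re

# Matches comments such as <!-- sop-section-start: Title --> and captures
# the marker name plus an optional value after the colon.
MARKER_RE = re.compile(r"<!--\s*(sop-[a-z-]+)(?::\s*(.*?))?\s*-->")


def find_markers(markdown):
    """Return (line_number, marker_name, value) for each sop-* comment marker."""
    markers = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            markers.append((line_no, match.group(1), match.group(2) or ""))
    return markers


# Hypothetical document: "sop-section-end" is assumed, not taken from the spec.
sample = (
    "<!-- sop-section-start: Purpose -->\n"
    "Explain why the process exists.\n"
    "<!-- sop-section-end -->\n"
)
print(find_markers(sample))
```

Because the markers are ordinary HTML comments, GitHub's rendered view hides them while a scanner like this can still recover the document structure.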