Infrastructure Dashboard

The Infrastructure Dashboard is an admin page at /admin/infrastructure that provides a live health view of the platform’s GCP resources. It auto-detects the current environment and polls GCP APIs every 60 seconds. For background on the underlying infrastructure, see Infrastructure Overview.

At a Glance

| Section | Data source | What it shows |
| --- | --- | --- |
| Cloud Run Services | Cloud Run Admin API v2 | Health status, instance count, traffic %, latest revision |
| GKE Clusters | GKE Container API | Cluster health, node count, ready nodes, master version |
| Tenant Workloads | Kubernetes API (GKE) | Per-tenant namespace, deployment, ready/desired pod count, HTTPRoute hostnames |
| IAM & Service Accounts | GCS manifest | Declared service accounts, their roles, and scope (shared / per-environment / per-tenant) |
| Cloud Tasks Queues | Cloud Tasks API | Queue health, pending task count, oldest task age |
All sections degrade gracefully — a failed GCP API call returns null for that section only; other sections continue to render.
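The per-section degradation described above can be sketched roughly as follows. This is an illustrative outline only: `safeFetch`, `collectSections`, and `SectionSource` are hypothetical names and are not taken from the actual route implementation.

```typescript
// Hypothetical sketch of per-section graceful degradation.
// None of these names come from the real route code.

interface SectionSource {
  name: string;
  fetch: () => Promise<unknown>;
}

// Runs one data-source fetch and converts any failure into null,
// so a single failing GCP API call cannot reject the whole response.
async function safeFetch<T>(fetcher: () => Promise<T>): Promise<T | null> {
  try {
    return await fetcher();
  } catch {
    return null;
  }
}

// Resolves every section independently and in parallel.
async function collectSections(
  sources: SectionSource[],
): Promise<Record<string, unknown>> {
  const values = await Promise.all(sources.map((s) => safeFetch(s.fetch)));
  const sections: Record<string, unknown> = {};
  sources.forEach((s, i) => {
    sections[s.name] = values[i];
  });
  return sections;
}
```

With this shape, a rejected promise for one source yields `null` under that section's key while the remaining keys still carry their data.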

Architecture

InfrastructurePage (Server Component)
  │  resolveEnvironment(process.env)
  │  → EnvironmentContext { id, label, projectId, region, manifestKey }

InfrastructureDashboard (Client Component)
  │  useInfrastructureStatus(env)
  │  → GET /api/infrastructure/status?env={id}  (every 60s)

/api/infrastructure/status (Next.js API Route)
  │  auth: GOOGLE_FIREBASE_ADMIN_KEY SA
  ├── GCS: gs://shokunin-480309-tfstate/infra-manifest/{env}.json  (declared state)
  ├── Cloud Run Admin API v2  (live service status)
  ├── GKE Container API       (live cluster status)
  └── Cloud Tasks API         (live queue stats)

Environment detection

The page detects which GCP environment it is serving by reading NEXT_PUBLIC_FIREBASE_DATABASE_ID at request time (Server Component):
| FIREBASE_DATABASE_ID value | Resolved environment | Manifest key |
| --- | --- | --- |
| "production" | Production | production |
| "staging" | Staging | staging |
| "dev-*" or unset | Dev-shared | dev-shared |
The environment label is displayed in the EnvironmentBanner component at the top of the page.

API route authentication

GET /api/infrastructure/status authenticates to GCP using the GOOGLE_FIREBASE_ADMIN_KEY service account JSON (also used for Firebase Admin SDK operations). It falls back to GOOGLE_SERVICE_ACCOUNT_KEY when the Firebase admin key is absent (local dev). The Firebase Admin SA requires three additional IAM roles beyond its Firebase permissions:
| Role | Used for |
| --- | --- |
| roles/run.viewer | List Cloud Run services and fetch revision/instance metadata |
| roles/container.viewer | Read GKE cluster status and node pool details |
| roles/cloudtasks.viewer | Query Cloud Tasks queue depth and state |
These roles are granted by terraform/platform/modules/iam/main.tf and applied as part of the dev-shared, staging, and production Terraform environments.
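The credential fallback described in this section could look roughly like the sketch below. `resolveServiceAccountKey` and `EnvVars` are hypothetical names, and treating an empty string as "absent" is an assumption; the real route may resolve credentials differently.

```typescript
// Hypothetical helper illustrating the primary/fallback key lookup.
type EnvVars = Record<string, string | undefined>;

// Subset of the fields found in a GCP service account JSON key file.
interface ServiceAccountKey {
  client_email: string;
  private_key: string;
  project_id?: string;
}

function resolveServiceAccountKey(env: EnvVars): ServiceAccountKey {
  // Prefer the Firebase admin key; fall back to the generic key (local dev).
  // Assumption: an empty string counts as absent.
  const raw =
    env["GOOGLE_FIREBASE_ADMIN_KEY"] || env["GOOGLE_SERVICE_ACCOUNT_KEY"];
  if (!raw) {
    throw new Error(
      "Set GOOGLE_FIREBASE_ADMIN_KEY or GOOGLE_SERVICE_ACCOUNT_KEY",
    );
  }
  const parsed = JSON.parse(raw) as Partial<ServiceAccountKey>;
  if (
    typeof parsed.client_email !== "string" ||
    typeof parsed.private_key !== "string"
  ) {
    throw new Error("Service account JSON is missing client_email or private_key");
  }
  return parsed as ServiceAccountKey;
}
```

Passing the environment in as a plain object, rather than reading `process.env` inside the function, keeps the lookup easy to unit-test.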

GCS Infrastructure Manifest

Every Terraform environment root writes an infra-manifest/{key}.json file to gs://shokunin-480309-tfstate/ on every apply. The dashboard reads this file to populate the “declared” side of each section — the intended resource configuration before live status is overlaid.
| Manifest key | Written by | Contents |
| --- | --- | --- |
| dev-shared | platform/environments/dev-shared/manifest.tf | GKE cluster, Gateway, Provisioner URL/queue, SA emails, secret IDs |
| staging | platform/environments/staging/manifest.tf | Dolt VM, Beads API URL, SA emails |
| production | platform/environments/production/manifest.tf | Dolt VM, Beads API URL, SA emails |
| tenant-{id} | terraform/tenants/manifest.tf | Namespace, HTTPRoute names, agent/opencode endpoint URLs, workshop GSA email |
The GcpInfraManifest TypeScript type in domains/infrastructure/types.ts defines an array-based schema (cloudRunServices[], gkeClusters[], queues[], serviceAccounts[]). The Terraform manifest files currently write a flat nested structure — aligning them to the typed schema is in progress. Until then, declared-resource sections fall back to empty arrays and the dashboard displays live GCP API data only.
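The empty-array fallback can be illustrated with a minimal sketch. `GcpInfraManifestSketch` and `normalizeManifest` are hypothetical names; only the four array field names come from this page, and the element shapes are left as `unknown` because they are not specified here.

```typescript
// Hypothetical stand-in for the real GcpInfraManifest type.
interface GcpInfraManifestSketch {
  cloudRunServices: unknown[];
  gkeClusters: unknown[];
  queues: unknown[];
  serviceAccounts: unknown[];
}

// Returns the value if it is an array, otherwise an empty array.
function asArray(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [];
}

// Accepts whatever JSON was read from GCS and always returns the typed shape.
// A manifest written in the current flat nested structure has none of the
// expected array fields, so every declared-resource list comes back empty.
function normalizeManifest(raw: unknown): GcpInfraManifestSketch {
  const obj =
    typeof raw === "object" && raw !== null
      ? (raw as Record<string, unknown>)
      : {};
  return {
    cloudRunServices: asArray(obj["cloudRunServices"]),
    gkeClusters: asArray(obj["gkeClusters"]),
    queues: asArray(obj["queues"]),
    serviceAccounts: asArray(obj["serviceAccounts"]),
  };
}
```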

Dashboard Components

All components live in app/(platform)/admin/infrastructure/components/.
| Component | Props | Description |
| --- | --- | --- |
| EnvironmentBanner | environment, fetchedAt, onRefresh, isRefreshing | Sticky banner showing environment label, last-fetch timestamp, and a manual refresh button |
| CloudRunCard | declaration, status | Animated arc gauge showing instance count; health pill; revision/traffic details |
| GkeClusterCard | declaration, status | Node count, ready node count, Kubernetes master version, raw status |
| TenantWorkloadsTable | workloads | Table of all tenant namespaces with deployment name, ready/desired pod count, and HTTPRoute hostname chips |
| IamSection | serviceAccounts | Collapsible table of declared service accounts, their email addresses, scope badge, and IAM roles |
| CloudTasksCard | declaration, status | Queue health, pending task count, oldest task age |
| ResourceSkeleton | | Loading skeleton variants matching each card’s dimensions |
| StatusPulse | status | Animated pulsing dot for health state (healthy / degraded / unreachable / loading) |

Health states

All live GCP resources report one of four health states:
| State | Meaning |
| --- | --- |
| healthy | Resource available and operating normally |
| degraded | Resource exists but is experiencing issues (e.g. Cloud Run CONDITION_FAILED) |
| unreachable | Resource could not be reached or the GCP API call failed |
| loading | Status not yet fetched (client-side initial render only) |
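As a rough illustration of how a Cloud Run condition might map onto these four states, consider the sketch below. The condition state names are those defined by the Cloud Run Admin API v2; the function name `cloudRunHealth` and the mapping itself are assumptions, in particular the choice to treat pending, reconciling, and unspecified conditions as degraded.

```typescript
type HealthState = "healthy" | "degraded" | "unreachable" | "loading";

// Condition states as defined by the Cloud Run Admin API v2.
type CloudRunConditionState =
  | "STATE_UNSPECIFIED"
  | "CONDITION_PENDING"
  | "CONDITION_RECONCILING"
  | "CONDITION_FAILED"
  | "CONDITION_SUCCEEDED";

// Hypothetical mapping. Input conventions assumed for this sketch:
//   undefined -> nothing fetched yet (initial client render)
//   null      -> the GCP API call failed for this resource
function cloudRunHealth(
  condition: CloudRunConditionState | null | undefined,
): HealthState {
  if (condition === undefined) return "loading";
  if (condition === null) return "unreachable";
  // Assumption: anything other than a succeeded condition counts as degraded.
  return condition === "CONDITION_SUCCEEDED" ? "healthy" : "degraded";
}
```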

Domain Layer

The Infrastructure Domain (domains/infrastructure/) provides the data layer for the dashboard.

GcpStatusRepository

Provider-agnostic interface (repositories/gcp-status-repository.ts):
interface GcpStatusRepository {
  getStatus(env: EnvironmentContext): Promise<InfrastructureStatus>;
}
The HTTP implementation (repositories/http/gcp-status-repository.ts) calls GET /api/infrastructure/status from the browser. Follow the factory function pattern established in domains/workshop/repositories/ when adding new implementations.
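A factory-style HTTP implementation might be shaped like the following sketch. The factory name `createHttpGcpStatusRepository`, the `FetchLike` type, the string types on `EnvironmentContext`, and the placeholder `InfrastructureStatus` are all assumptions made for illustration; the pattern in domains/workshop/repositories/ may differ in detail.

```typescript
// Field names come from the architecture diagram; the string types are assumed.
interface EnvironmentContext {
  id: string;
  label: string;
  projectId: string;
  region: string;
  manifestKey: string;
}

// Placeholder: the real InfrastructureStatus type lives in domains/infrastructure/types.ts.
type InfrastructureStatus = Record<string, unknown>;

interface GcpStatusRepository {
  getStatus(env: EnvironmentContext): Promise<InfrastructureStatus>;
}

// Minimal fetch-like signature so the factory can be tested without a browser.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical factory: the HTTP client is injected rather than imported.
function createHttpGcpStatusRepository(fetchImpl: FetchLike): GcpStatusRepository {
  return {
    async getStatus(env: EnvironmentContext): Promise<InfrastructureStatus> {
      const url = `/api/infrastructure/status?env=${encodeURIComponent(env.id)}`;
      const res = await fetchImpl(url);
      if (!res.ok) {
        throw new Error(`Status request failed with HTTP ${res.status}`);
      }
      return (await res.json()) as InfrastructureStatus;
    },
  };
}
```

In the browser, the factory could be called with `(url) => fetch(url)`; in tests, a stub that returns canned JSON is enough.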

useInfrastructureStatus

const { status, loading, error, refresh } = useInfrastructureStatus(env);
Fetches on mount, auto-refreshes every 60 seconds, clears the interval on unmount. Manages loading and error state separately so the dashboard can show stale data alongside an error banner.
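The separation of loading and error state can be modelled, framework-free, as a small reducer. This is a sketch of the idea rather than the hook's actual implementation: `StatusState`, `StatusEvent`, and `reduceStatus` are hypothetical names, and the initial `loading: true` value is an assumption.

```typescript
interface StatusState<T> {
  status: T | null; // last successfully fetched status, if any
  loading: boolean;
  error: string | null;
}

type StatusEvent<T> =
  | { type: "fetchStarted" }
  | { type: "fetchSucceeded"; status: T }
  | { type: "fetchFailed"; message: string };

// Assumption: the hook starts in a loading state because it fetches on mount.
function initialState<T>(): StatusState<T> {
  return { status: null, loading: true, error: null };
}

function reduceStatus<T>(
  state: StatusState<T>,
  event: StatusEvent<T>,
): StatusState<T> {
  switch (event.type) {
    case "fetchStarted":
      // Keep the previous status and error visible while refreshing.
      return { ...state, loading: true };
    case "fetchSucceeded":
      return { status: event.status, loading: false, error: null };
    case "fetchFailed":
      // Keep the stale status so it can render alongside the error banner.
      return { ...state, loading: false, error: event.message };
  }
}
```

Because a failed fetch leaves `status` untouched, a refresh that fails after an earlier success still leaves the last good data available to the UI.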

resolveEnvironment

Pure function (utils/environment-resolver.ts) — accepts a process.env snapshot and returns EnvironmentContext. Fully unit-tested via environment-resolver.test.ts.
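The mapping from database ID to environment, as listed under Environment detection, could be expressed along these lines. This is a simplified sketch: `resolveEnvironmentSketch` and `ResolvedEnvironment` are hypothetical names, the `id` values are assumed to match the manifest keys, `projectId` and `region` are omitted because this page does not say how they are derived, and treating every unrecognised value as dev-shared is an assumption.

```typescript
type EnvSnapshot = Record<string, string | undefined>;

// Reduced version of EnvironmentContext (projectId and region omitted).
interface ResolvedEnvironment {
  id: string;
  label: string;
  manifestKey: string;
}

// Pure function: reads only from the snapshot it is given.
function resolveEnvironmentSketch(env: EnvSnapshot): ResolvedEnvironment {
  const databaseId = env["NEXT_PUBLIC_FIREBASE_DATABASE_ID"];
  if (databaseId === "production") {
    return { id: "production", label: "Production", manifestKey: "production" };
  }
  if (databaseId === "staging") {
    return { id: "staging", label: "Staging", manifestKey: "staging" };
  }
  // "dev-*" or unset. Assumption: any other value also resolves to dev-shared.
  return { id: "dev-shared", label: "Dev-shared", manifestKey: "dev-shared" };
}
```

Taking a snapshot object instead of reading `process.env` directly is what allows the resolver to be tested with plain object literals.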

manifest-parser.ts

Pure stateless utility functions for querying a GcpInfraManifest: getCloudRunDeclarations, getServiceAccounts, etc. No I/O, no side effects, fully unit-testable.

Adding a New Dashboard Section

  1. Add a new declaration type to domains/infrastructure/types.ts
  2. Add the live status type to the same file
  3. Add the declaration to GcpInfraManifest and the status to InfrastructureStatus
  4. Add a fetch function in app/api/infrastructure/status/route.ts (follow the per-source try/catch pattern)
  5. Add the Terraform manifest field in all three environment manifest.tf files
  6. Create a new card component in app/(platform)/admin/infrastructure/components/
  7. Wire it into InfrastructureDashboard

For the GCP resource provisioning flow (how workshop containers are created), see Infrastructure Overview — Workshop provisioning flow. For IAM and secrets, see Secrets & Identity Management.