Infrastructure Playbook

End-to-end runbook for building, deploying, running, and verifying PQ Signer in a custodian's own AWS account.

Target topology

custodian client → Gateway EC2 → mTLS → Nitro host-agent → vsock → Nitro Enclave
  • The Gateway owns the public API and DynamoDB access.
  • The Nitro host owns the KMS relay role and has no DynamoDB access.
  • The enclave owns ML-DSA key generation, key unwrapping, signing, and birth-attestation production.

Required network shape, enforced by Terraform:

  • Gateway ingress is limited to custodian-controlled source CIDRs or private custodian workloads.
  • Gateway egress to the host-agent is limited to Nitro host TCP 8443.
  • Nitro host ingress on TCP 8443 is limited to the Gateway security group.
  • Nitro host has no public ingress.

1. Prerequisites

On the build / Nitro host (Linux, Nitro-capable):

  • Rust toolchain from rust-toolchain.toml.
  • cargo, docker, nitro-cli, jq, openssl, aws CLI.
  • The Nitro Enclaves allocator service.

On the operator workstation:

  • AWS credentials for the customer account.
  • Terraform.
  • jq, curl, base64, scp, ssh.
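A quick preflight can confirm the tools listed above are on PATH before starting. The following is a minimal sketch; the function name missing_tools is illustrative and not part of the PQ Signer repository.

```shell
# Print each required tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Build / Nitro host:
#   missing_tools cargo docker nitro-cli jq openssl aws
# Operator workstation:
#   missing_tools aws terraform jq curl base64 scp ssh
```

An empty output with exit status 0 means every listed tool was found. The check does not verify versions or that the Nitro Enclaves allocator service is installed.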

2. Build the binaries and the signed EIF

Run on a Nitro-capable Linux build host. This can be the same EC2 instance that will run the enclave.

cd /home/ec2-user/pq-signer
cargo build --release --workspace

Install or generate the EIF signing material:

mkdir -p infra/dev-cert
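# EIF signing uses an ECDSA key; the AWS-documented flow generates P-384 (secp384r1).
openssl ecparam -name secp384r1 -genkey -noout -out infra/dev-cert/eif-signing-key.pem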
openssl req -new -x509 \
-key infra/dev-cert/eif-signing-key.pem \
-out infra/dev-cert/eif-signing-cert.pem \
-days 30 \
-subj "/CN=PQ Signer EIF"

Keep the EIF signing key under custodian control. If the signing certificate changes, PCR8 changes and the KMS policy must be updated to match.

Build the signed EIF and the release measurement manifest:

export AWS_REGION=eu-central-1
make eif
make pcr
make manifest
jq . docs/release/measurement-manifest.dev.json

Confirm the EIF is signed and PCR8 exists:

jq '.IsSigned, .Measurements.PCR8' target/nitro/pcrs.json
# Expected:
# true
# "<PCR8 hex>"
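Beyond checking that PCR8 is present, it can help to confirm the value is well formed before copying it into policy inputs. Nitro PCRs are SHA-384 digests, so each is 96 hex characters. The sketch below assumes that format; the helper name is_valid_pcr is hypothetical.

```shell
# Succeed only if $1 looks like a usable Nitro PCR value:
# exactly 96 hex characters (SHA-384) and not all zeros.
is_valid_pcr() {
  pcr=$1
  # Reject anything that is not exactly 96 hex digits (this also rejects "null").
  printf '%s' "$pcr" | grep -Eq '^[0-9a-fA-F]{96}$' || return 1
  # Reject the all-zero value, which is what debug-mode attestation reports.
  case $pcr in
    *[!0]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (requires jq and the PCR capture at target/nitro/pcrs.json):
#   is_valid_pcr "$(jq -r '.Measurements.PCR8' target/nitro/pcrs.json)" \
#     || echo "PCR8 missing or malformed" >&2
```

An unsigned EIF has no PCR8, so jq prints null and the check fails, which is the intended outcome.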

Build outputs:

  • Gateway binary: target/release/pq-signer-gateway
  • Host-agent binary: target/release/pq-signer-host-agent
  • Enclave EIF: target/nitro/pq-signer-enclave.eif
  • Measurement manifest: docs/release/measurement-manifest.dev.json

3. Generate Gateway-to-host mTLS material

Generate one internal CA plus a Gateway client certificate and a host-agent server certificate. The host-agent certificate must include the private DNS name or private IP the Gateway uses as --host.

PQ_SIGNER_HOST_AGENT_DNS="pq-signer-host-agent.internal" \
PQ_SIGNER_HOST_AGENT_IP="<nitro-host-private-ip>" \
sh scripts/generate-dev-mtls-certs.sh infra/dev-cert
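To confirm the generated host-agent certificate really carries the name or IP the Gateway will use as --host, the subjectAltName extension can be inspected. The helper below is a sketch that only parses the text OpenSSL prints; san_contains is an illustrative name, and the -ext option assumes OpenSSL 1.1.1 or newer.

```shell
# Read the text form of a subjectAltName extension on stdin and succeed
# only if it lists $1 as an exact DNS or IP Address entry.
san_contains() {
  want=$1
  tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -Fxq -e "DNS:$want" -e "IP Address:$want"
}

# Example:
#   openssl x509 -in infra/dev-cert/host-agent-server.pem -noout -ext subjectAltName \
#     | san_contains pq-signer-host-agent.internal \
#     || echo "host-agent certificate SAN does not cover the Gateway --host value" >&2
```

Matching is exact per entry, so 10.0.1.2 does not match a certificate issued for 10.0.1.23.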

Install on Gateway EC2:

infra/dev-cert/internal-ca.pem
infra/dev-cert/gateway-client.pem
infra/dev-cert/gateway-client-key.pem

Install on Nitro host EC2:

infra/dev-cert/internal-ca.pem
infra/dev-cert/host-agent-server.pem
infra/dev-cert/host-agent-server-key.pem

Protect private keys:

chmod 600 infra/dev-cert/*-key.pem

Rotation is manual: generate a new CA and replacement certs, deploy the new trust bundle to both instances, restart the host-agent and Gateway, and remove old private keys from disk. Custodian CA integration and automated issuance/revocation are tracked separately.

4. Deploy AWS infrastructure with Terraform

Use the Terraform template under infra/. It creates:

  • the split Gateway / Nitro-host topology,
  • IAM roles for the Gateway and the Nitro host,
  • the DynamoDB key table with user_id as a string partition key,
  • the wrapping CMK with manifest-fed PCR conditions and Recipient requirements,
  • security groups and a CloudTrail evidence trail,
  • the Nitro host parent configuration: KMS vsock-proxy allowlisting kms.<region>.amazonaws.com:443 on fixed port 8000, plus the credential forwarder on fixed port 8001.

Populate infra/terraform.tfvars from the signed EIF manifest:

aws_region = "eu-central-1"
ami_id = "ami-REPLACE_ME"
operator_admin_arn = "arn:aws:iam::<account-id>:role/<operator-role>"
manifest_pcr0 = "<PCR0 from docs/release/measurement-manifest.dev.json>"
manifest_pcr1 = "<PCR1 from docs/release/measurement-manifest.dev.json>"
manifest_pcr2 = "<PCR2 from docs/release/measurement-manifest.dev.json>"
manifest_pcr8 = "<PCR8 from docs/release/measurement-manifest.dev.json>"
custodian_ingress_cidrs = ["<custodian egress cidr>/32"]
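Because the template above ships with placeholders, a simple guard before terraform plan can catch values that were never filled in. This is a sketch only; tfvars_unfilled is a hypothetical helper that looks for REPLACE_ME or any angle-bracketed token.

```shell
# Print every tfvars line (with its line number) that still contains a
# template placeholder. Succeeds only when the file is readable and clean.
tfvars_unfilled() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  if grep -nE 'REPLACE_ME|<[^>]*>' "$1"; then
    return 1
  fi
  return 0
}

# Example:
#   tfvars_unfilled infra/terraform.tfvars \
#     || echo "fill in the remaining placeholders before terraform plan" >&2
```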

Apply:

cd infra
terraform init
terraform plan
terraform apply

Record outputs:

terraform output dynamodb_table_name
terraform output wrapping_key_arn
terraform output gateway_role_arn
terraform output nitro_host_role_arn

Verify the DynamoDB table uses string user_id:

aws dynamodb describe-table \
--region eu-central-1 \
--table-name <table> \
--query 'Table.AttributeDefinitions'
# Expected:
# [{"AttributeName":"user_id","AttributeType":"S"}]

5. Launch the enclave

On the Nitro host, reserve enclave memory and CPU:

sudo sh -c 'printf "%s\n" "---" "memory_mib: 4096" "cpu_count: 2" > /etc/nitro_enclaves/allocator.yaml'
sudo systemctl restart nitro-enclaves-allocator.service
sudo systemctl status nitro-enclaves-allocator.service --no-pager

Launch the signed EIF without debug mode:

cd /home/ec2-user/pq-signer
nitro-cli run-enclave \
--eif-path target/nitro/pq-signer-enclave.eif \
--cpu-count 2 \
--memory 4096 \
--enclave-cid 16

Verify it is running:

nitro-cli describe-enclaves

If memory allocation fails (Nitro error E27), lower both the allocator memory_mib and --memory to an instance-appropriate value such as 2048.
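Since the enclave request has to fit inside the allocator reservation, the two values can be compared before launch. The sketch below reads the simple key: value layout written in the allocator step above; both function names are illustrative.

```shell
# Print the memory_mib value from an allocator.yaml-style file.
allocator_memory_mib() {
  awk -F': *' '$1 == "memory_mib" { print $2 + 0; exit }' "$1"
}

# Succeed only if the requested enclave memory ($2, in MiB) fits within
# the allocator reservation found in file $1.
enclave_memory_fits() {
  reserved=$(allocator_memory_mib "$1")
  [ -n "$reserved" ] && [ "$2" -le "$reserved" ]
}

# Example:
#   enclave_memory_fits /etc/nitro_enclaves/allocator.yaml 4096 \
#     || echo "--memory exceeds the allocator reservation" >&2
```

This only compares the configured numbers. It does not confirm that the instance has enough free memory for the allocator to honor the reservation.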

6. Start the host-agent

On the Nitro host:

cd /home/ec2-user/pq-signer
export AWS_REGION=eu-central-1

./target/release/pq-signer-host-agent serve-mtls \
--cid 16 \
--port 5005 \
--listen-port 8443 \
--client-ca infra/dev-cert/internal-ca.pem \
--server-cert infra/dev-cert/host-agent-server.pem \
--server-key infra/dev-cert/host-agent-server-key.pem

Keep this process running under the customer's process manager of choice (systemd, tmux, or equivalent).

serve-mtls must remain reachable only from the Gateway security group. It is not hardened against slow or stalled peers; production deployments must add concurrency and deadlines or replace this path.

7. Start the Gateway

On the Gateway EC2, ensure the host-agent name resolves to the Nitro host private IP if DNS is not already configured:

echo "<nitro-host-private-ip> pq-signer-host-agent.internal" | sudo tee -a /etc/hosts
getent hosts pq-signer-host-agent.internal

Start the Gateway:

cd /home/ec2-user/pq-signer
export AWS_REGION=eu-central-1
export DYNAMODB_TABLE="<terraform dynamodb_table_name>"
export KMS_KEY_ARN="<terraform wrapping_key_arn>"

./target/release/pq-signer-gateway serve-http \
--bind 0.0.0.0:8080 \
--host pq-signer-host-agent.internal \
--table "$DYNAMODB_TABLE" \
--ca infra/dev-cert/internal-ca.pem \
--client-cert infra/dev-cert/gateway-client.pem \
--client-key infra/dev-cert/gateway-client-key.pem \
--kms-key-arn "$KMS_KEY_ARN"

For production integration, place the HTTP endpoint behind the custodian's normal TLS and auth layer. Do not expose serve-http directly as a public or broadly reachable API.

8. Smoke test

8.1 Key generation

From an allowed client network:

export GATEWAY_URL="http://<gateway-public-ip>:8080/v1/pq-signer"
export USER_ID="customer-demo-$(date +%s)"

jq -n --arg user_id "$USER_ID" '{
request_type: "key_generation",
payload: { alg: "ML-DSA-44", user_id: $user_id }
}' > /tmp/pq-keygen.json

curl -sS -D /tmp/pq-keygen.headers -o /tmp/pq-keygen.json.out \
-H 'content-type: application/json' \
--data-binary @/tmp/pq-keygen.json \
"$GATEWAY_URL"

cat /tmp/pq-keygen.headers
jq . /tmp/pq-keygen.json.out

Expected:

{
"user_id": "customer-demo-...",
"mldsa_pubkey": "<base64>",
"birth_attestation": "<base64 COSE_Sign1>",
"enclave_version": "0.1.0"
}

8.2 Signing

MESSAGE_B64="$(printf 'PQ Signer smoke message' | base64)"
USER_ID="$(jq -r '.user_id' /tmp/pq-keygen.json.out)"

jq -n --arg user_id "$USER_ID" --arg message "$MESSAGE_B64" '{
request_type: "sign",
payload: {
alg: "ML-DSA-44",
user_id: $user_id,
message: $message,
request_attestation: false
}
}' > /tmp/pq-sign.json

curl -sS -D /tmp/pq-sign.headers -o /tmp/pq-sign.json.out \
-H 'content-type: application/json' \
--data-binary @/tmp/pq-sign.json \
"$GATEWAY_URL"

cat /tmp/pq-sign.headers
jq . /tmp/pq-sign.json.out

Expected:

{
"mldsa_signature": "<base64>",
"mldsa_public_key": "<base64>"
}
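The smoke message is short enough that base64 prints it on one line. GNU base64 wraps its output at 76 columns by default, so a longer message would pick up embedded newlines in the JSON string. A small wrapper, sketched here under the illustrative name b64_oneline, avoids that on any platform.

```shell
# Base64-encode stdin as a single line, regardless of the platform's
# default line wrapping.
b64_oneline() {
  base64 | tr -d '\n'
}

# Example:
#   MESSAGE_B64="$(printf 'PQ Signer smoke message' | b64_oneline)"
#   # MESSAGE_B64 is now UFEgU2lnbmVyIHNtb2tlIG1lc3NhZ2U=
```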

8.3 Verify the birth attestation

Birth-attestation verification proves the keygen response was created by an enclave whose PCRs match the custodian-approved release manifest and whose user_data matches the persisted key-record metadata.

Build pq-attest on the verifier machine:

cargo build --release -p pq-attest

Save the response attestation:

jq -r '.birth_attestation' /tmp/pq-keygen.json.out > /tmp/birth-attestation.b64

Export the matching DynamoDB record metadata. This includes wrapped_dk, which is not returned by the public keygen API:

USER_ID="$(jq -r '.user_id' /tmp/pq-keygen.json.out)"
aws dynamodb get-item \
--region eu-central-1 \
--table-name "$DYNAMODB_TABLE" \
--key "{\"user_id\":{\"S\":\"$USER_ID\"}}" \
--consistent-read \
--output json > /tmp/key-record.json

jq '{
mldsa_pubkey: .Item.mldsa_pubkey.B,
wrapped_dk: .Item.wrapped_dk.B,
user_id: .Item.user_id.S,
kms_key_arn: .Item.kms_key_arn.S,
alg: .Item.alg.S,
created_at: (.Item.created_at.N | tonumber)
}' /tmp/key-record.json > /tmp/birth-metadata.json

Verify with the exact manifest produced from the signed EIF:

./target/release/pq-attest verify \
--birth-attestation /tmp/birth-attestation.b64 \
--manifest docs/release/measurement-manifest.dev.json \
--metadata /tmp/birth-metadata.json

Output includes verified_pcrs, user_data_sha256, and nitro_root_der_sha256. Any PCR mismatch, certificate-chain failure, or metadata mismatch causes pq-attest to exit non-zero.
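Because pq-attest signals failure through its exit status, a scripted smoke test can gate on it rather than parsing output. The wrapper below is a generic sketch; gate is a hypothetical helper, not a PQ Signer command.

```shell
# Run a command, print PASS or FAIL for the given label, and preserve
# the command's exit status so callers can stop on failure.
gate() {
  label=$1
  shift
  if "$@"; then
    printf 'PASS %s\n' "$label"
  else
    rc=$?
    printf 'FAIL %s (exit %d)\n' "$label" "$rc" >&2
    return "$rc"
  fi
}

# Example:
#   gate birth-attestation ./target/release/pq-attest verify \
#     --birth-attestation /tmp/birth-attestation.b64 \
#     --manifest docs/release/measurement-manifest.dev.json \
#     --metadata /tmp/birth-metadata.json || exit 1
```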

9. KMS boundary checks

Run the integrated happy-path check from the Gateway EC2:

./target/release/pq-signer-gateway m2-live-happy-path \
--host pq-signer-host-agent.internal \
--table "$DYNAMODB_TABLE" \
--ca infra/dev-cert/internal-ca.pem \
--client-cert infra/dev-cert/gateway-client.pem \
--client-key infra/dev-cert/gateway-client-key.pem \
--kms-key-arn "$KMS_KEY_ARN"

Then run the three KMS-boundary checks to prove the policy enforces its invariants.

A. Gateway role cannot call the wrapping CMK directly:

./target/release/pq-signer-host-agent m3-kms-boundary-checks \
--mode gateway-iam \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64>
# Expected: AccessDenied / AccessDeniedException
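A denial check passes only when the call fails for the right reason, so it is worth distinguishing AccessDenied from an unrelated failure such as a timeout. The sketch below wraps any command that way. The name expect_access_denied and the file wrapped-dk.bin in the example are hypothetical, and the example uses a direct aws kms decrypt call because the exit behavior of m3-kms-boundary-checks is not documented here.

```shell
# Run a command that is expected to be refused. Succeed only if it exits
# non-zero AND its output mentions AccessDenied; an unexpected success or
# any other failure counts as a failed check.
expect_access_denied() {
  if output=$("$@" 2>&1); then
    echo "unexpected success: $*" >&2
    return 1
  fi
  case $output in
    *AccessDenied*) return 0 ;;
    *) printf 'failed for another reason: %s\n' "$output" >&2; return 1 ;;
  esac
}

# Example, run with the Gateway role's credentials:
#   expect_access_denied aws kms decrypt --region eu-central-1 \
#     --key-id "$KMS_KEY_ARN" \
#     --ciphertext-blob fileb://wrapped-dk.bin \
#     --encryption-context "user_id=$USER_ID"
```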

B. Nitro host parent role cannot get plaintext without Recipient:

./target/release/pq-signer-host-agent m3-kms-boundary-checks \
--mode nitro-host-parent \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64> \
--user-id <opaque-user-id>
# Expected: key-policy denial for missing Recipient

C. EncryptionContext is enforced (missing → policy denial, wrong → KMS InvalidCiphertextException, valid attested recipient → CiphertextForRecipient and never plaintext):

./target/release/pq-signer-host-agent m3-kms-boundary-checks \
--mode encryption-context \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64> \
--user-id <opaque-user-id> \
--recipient-attestation <file>

CMK policy summary. The CMK policy permits the Nitro host/enclave role only when all of the following are true:

  • kms:RecipientAttestation:ImageSha384 equals manifest PCR0.
  • kms:RecipientAttestation:PCR1 equals manifest PCR1.
  • kms:RecipientAttestation:PCR2 equals manifest PCR2.
  • kms:RecipientAttestation:PCR8 equals manifest PCR8.
  • kms:EncryptionContextKeys is exactly user_id.
  • kms:EncryptionContext:user_id is present.

The policy also explicitly denies the Nitro host/enclave role kms:Decrypt and kms:GenerateDataKey when Recipient is missing, blocking direct parent-role calls even though IAM allows the API.

10. Updating and rolling back

Update procedure:

  1. Rebuild a signed replacement EIF using the build steps in §2.
  2. Capture PCRs and write a new release measurement manifest from that exact EIF.
  3. Compare the new manifest against the currently deployed manifest.
  4. Update Terraform manifest PCR variables (manifest_pcr0/1/2/8).
  5. terraform apply with the operator/deploy principal.
  6. Deploy the replacement EIF to the Nitro host.
  7. Restart the enclave, host-agent, and Gateway processes.
  8. Re-run:
    cargo fmt --check
    cargo clippy -- -D warnings
    cargo test
    make m3-clean-release-evidence
  9. Re-run the live KMS boundary checks (§9).

Rollback procedure: restore both the previous EIF and the previous manifest PCR values together. Do not run a previous EIF under a CMK policy generated from a different manifest, and do not run a new EIF under an older PCR-bound policy.
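One way to guard against an EIF and policy drifting apart during an update or rollback is to compare the PCR values pinned in terraform.tfvars against the PCR capture of the EIF about to be launched. The following is a minimal sketch that assumes the tfvars layout shown in §4; both helper names are illustrative.

```shell
# Print the quoted value assigned to variable $2 in tfvars file $1.
tfvars_value() {
  awk -F'"' -v name="$2" '
    { key = $1; gsub(/[ \t=]/, "", key) }
    key == name { print $2; exit }
  ' "$1"
}

# Succeed only if manifest_pcr<$2> in tfvars file $1 equals the expected
# hex value $3, compared case-insensitively.
tfvars_pcr_matches() {
  have=$(tfvars_value "$1" "manifest_pcr$2" | tr 'A-F' 'a-f')
  want=$(printf '%s' "$3" | tr 'A-F' 'a-f')
  [ -n "$have" ] && [ "$have" = "$want" ]
}

# Example (requires jq):
#   for n in 0 1 2 8; do
#     tfvars_pcr_matches infra/terraform.tfvars "$n" \
#       "$(jq -r ".Measurements.PCR$n" target/nitro/pcrs.json)" \
#       || echo "PCR$n in tfvars does not match the EIF" >&2
#   done
```

This checks the Terraform inputs only. The deployed CMK policy still needs terraform apply to pick up any change.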

11. Troubleshooting

  • key_generation returns 500. Check Gateway logs, host-agent reachability on TCP 8443, KMS PCR policy, and DynamoDB table schema. The table key attribute must be S, not B.
  • pq-attest fails with PCR mismatch. Verify the manifest came from the exact signed EIF currently running and that the enclave was not launched in debug mode.
  • mTLS failures. Regenerate certs with the Gateway --host name/IP in the host-agent certificate SAN and deploy the same internal-ca.pem to both instances.
  • Nitro E27 insufficient memory. Adjust /etc/nitro_enclaves/allocator.yaml and match nitro-cli run-enclave --memory.
  • KMS InvalidCiphertextException on Decrypt. Confirm EncryptionContext.user_id matches the value bound at GenerateDataKey time and that the wrapped DK is the one from the DynamoDB row for that user_id.
  • AccessDenied on the Gateway role calling KMS directly. Expected. The Gateway has an explicit IAM deny on the wrapping CMK; only the enclave, via the host-agent relay with a valid Recipient, can use it.

12. Evidence artifacts

After the procedures above, the deployment produces:

  • Signed EIF: target/nitro/pq-signer-enclave.eif
  • PCR capture: target/nitro/pcrs.json
  • Release manifest: docs/release/measurement-manifest.dev.json
  • Clean-build comparison: docs/release/m3/clean-build-comparison.json
  • CloudTrail KMS trail: created by Terraform; contains every Decrypt and GenerateDataKey call against the wrapping CMK.
  • Birth attestations: one COSE_Sign1 document per keypair, persisted in the birth_attestation attribute of the DynamoDB key table.

These artifacts are the substrate for customer-side compliance attestations and for post-incident review.
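For review purposes it can be useful to record a digest of each file-based artifact at the time it is collected. The sketch below assumes sha256sum from GNU coreutils is available; evidence_digests is a hypothetical helper and not part of the release tooling.

```shell
# Print a SHA-256 digest line for each evidence file.
# Fails on the first file that is missing.
evidence_digests() {
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing evidence file: $f" >&2; return 1; }
    sha256sum "$f" || return 1
  done
}

# Example:
#   evidence_digests \
#     target/nitro/pq-signer-enclave.eif \
#     target/nitro/pcrs.json \
#     docs/release/measurement-manifest.dev.json \
#     docs/release/m3/clean-build-comparison.json
```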