Infrastructure Playbook
End-to-end runbook for building, deploying, running, and verifying PQ Signer in a custodian's own AWS account.
Target topology
custodian client → Gateway EC2 → mTLS → Nitro host-agent → vsock → Nitro Enclave
- The Gateway owns the public API and DynamoDB access.
- The Nitro host owns the KMS relay role and has no DynamoDB access.
- The enclave owns ML-DSA key generation, key unwrapping, signing, and birth-attestation production.
Required network shape, enforced by Terraform:
- Gateway ingress is limited to custodian-controlled source CIDRs or private custodian workloads.
- Gateway egress to the host-agent is limited to Nitro host TCP 8443.
- Nitro host ingress on TCP 8443 is limited to the Gateway security group.
- Nitro host has no public ingress.
1. Prerequisites
On the build / Nitro host (Linux, Nitro-capable):
- Rust toolchain from rust-toolchain.toml.
- cargo, docker, nitro-cli, jq, openssl, aws CLI.
- The Nitro Enclaves allocator service.
On the operator workstation:
- AWS credentials for the customer account.
- Terraform.
- jq, curl, base64, scp, ssh.
2. Build the binaries and the signed EIF
Run on a Nitro-capable Linux build host. This can be the same EC2 instance that will run the enclave.
cd /home/ec2-user/pq-signer
cargo build --release --workspace
Install or generate the EIF signing material:
mkdir -p infra/dev-cert
openssl genrsa -out infra/dev-cert/eif-signing-key.pem 4096
openssl req -new -x509 \
-key infra/dev-cert/eif-signing-key.pem \
-out infra/dev-cert/eif-signing-cert.pem \
-days 30 \
-subj "/CN=PQ Signer EIF"
Keep the EIF signing key under custodian control. If the signing certificate changes, PCR8 changes and the KMS policy must be updated to match.
Build the signed EIF and the release measurement manifest:
export AWS_REGION=eu-central-1
make eif
make pcr
make manifest
jq . docs/release/measurement-manifest.dev.json
Confirm the EIF is signed and PCR8 exists:
jq '.IsSigned, .Measurements.PCR8' target/nitro/pcrs.json
# Expected:
# true
# "<PCR8 hex>"
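Before copying PCR values into Terraform, a stricter shape check can catch truncated or placeholder values. The helper below is a minimal sketch, not part of the repository: is_valid_pcr is a hypothetical name, and it assumes PCRs are SHA-384 digests printed as 96 hex characters. It also rejects the all-zero value that a debug-mode enclave reports in attestation documents.

```shell
# Hypothetical helper (not shipped with PQ Signer): returns 0 if $1 looks
# like a usable Nitro PCR value, non-zero otherwise.
is_valid_pcr() {
  pcr="$1"
  # A SHA-384 digest is exactly 96 hex characters.
  [ "${#pcr}" -eq 96 ] || return 1
  case "$pcr" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  # Accept only if at least one character is not "0".
  case "$pcr" in
    *[!0]*) return 0 ;;
  esac
  return 1
}

# Possible usage against the capture produced by "make pcr":
# is_valid_pcr "$(jq -r '.Measurements.PCR8' target/nitro/pcrs.json)" \
#   || echo "PCR8 missing or malformed" >&2
```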
Build outputs:
- Gateway binary: target/release/pq-signer-gateway
- Host-agent binary: target/release/pq-signer-host-agent
- Enclave EIF: target/nitro/pq-signer-enclave.eif
- Measurement manifest: docs/release/measurement-manifest.dev.json
3. Generate Gateway-to-host mTLS material
Generate one internal CA plus a Gateway client certificate and a host-agent server certificate. The host-agent certificate must include the private DNS name or private IP the Gateway uses as --host.
PQ_SIGNER_HOST_AGENT_DNS="pq-signer-host-agent.internal" \
PQ_SIGNER_HOST_AGENT_IP="<nitro-host-private-ip>" \
sh scripts/generate-dev-mtls-certs.sh infra/dev-cert
Install on Gateway EC2:
infra/dev-cert/internal-ca.pem
infra/dev-cert/gateway-client.pem
infra/dev-cert/gateway-client-key.pem
Install on Nitro host EC2:
infra/dev-cert/internal-ca.pem
infra/dev-cert/host-agent-server.pem
infra/dev-cert/host-agent-server-key.pem
Protect private keys:
chmod 600 infra/dev-cert/*-key.pem
Rotation is manual: generate a new CA and replacement certs, deploy the new trust bundle to both instances, restart the host-agent and Gateway, and remove old private keys from disk. Custodian CA integration and automated issuance/revocation are tracked separately.
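After installing or rotating certificates, it can help to confirm that no private key was left with looser permissions than the chmod 600 step intends. The following is a sketch only; find_loose_keys is a hypothetical helper, and it assumes private keys follow the *-key.pem naming used above.

```shell
# Hypothetical helper: print every *-key.pem file under directory $1 whose
# mode is anything other than exactly 600 (so 644 and 400 are both reported).
find_loose_keys() {
  find "$1" -type f -name '*-key.pem' ! -perm 600
}

# Possible usage on either instance:
# loose="$(find_loose_keys infra/dev-cert)"
# [ -z "$loose" ] || { echo "fix permissions on:" >&2; echo "$loose" >&2; }
```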
4. Deploy AWS infrastructure with Terraform
Use the Terraform template under infra/. It creates:
- the split Gateway / Nitro-host topology,
- IAM roles for the Gateway and the Nitro host,
- the DynamoDB key table with user_id as a string partition key,
- the wrapping CMK with manifest-fed PCR conditions and Recipient requirements,
- security groups and a CloudTrail evidence trail,
- the Nitro host parent configuration: KMS vsock-proxy allowlisting kms.<region>.amazonaws.com:443 on fixed port 8000, plus the credential forwarder on fixed port 8001.
Populate infra/terraform.tfvars from the signed EIF manifest:
aws_region = "eu-central-1"
ami_id = "ami-REPLACE_ME"
operator_admin_arn = "arn:aws:iam::<account-id>:role/<operator-role>"
manifest_pcr0 = "<PCR0 from docs/release/measurement-manifest.dev.json>"
manifest_pcr1 = "<PCR1 from docs/release/measurement-manifest.dev.json>"
manifest_pcr2 = "<PCR2 from docs/release/measurement-manifest.dev.json>"
manifest_pcr8 = "<PCR8 from docs/release/measurement-manifest.dev.json>"
custodian_ingress_cidrs = ["<custodian egress cidr>/32"]
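To avoid hand-copying 96-character hex strings, the four PCR lines can be generated rather than typed. This is an illustrative sketch: render_pcr_tfvars is a hypothetical helper, and the usage comment reads from target/nitro/pcrs.json on the assumption that it carries the same PCR values as the release manifest built from the same EIF.

```shell
# Hypothetical helper: print the four manifest_pcr* tfvars lines.
# Arguments, in order: PCR0 PCR1 PCR2 PCR8 (hex strings).
render_pcr_tfvars() {
  if [ "$#" -ne 4 ]; then
    echo "usage: render_pcr_tfvars PCR0 PCR1 PCR2 PCR8" >&2
    return 2
  fi
  printf 'manifest_pcr0 = "%s"\n' "$1"
  printf 'manifest_pcr1 = "%s"\n' "$2"
  printf 'manifest_pcr2 = "%s"\n' "$3"
  printf 'manifest_pcr8 = "%s"\n' "$4"
}

# Possible usage (review the output before pasting it into terraform.tfvars):
# p() { jq -r ".Measurements.$1" target/nitro/pcrs.json; }
# render_pcr_tfvars "$(p PCR0)" "$(p PCR1)" "$(p PCR2)" "$(p PCR8)"
```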
Apply:
cd infra
terraform init
terraform plan
terraform apply
Record outputs:
terraform output dynamodb_table_name
terraform output wrapping_key_arn
terraform output gateway_role_arn
terraform output nitro_host_role_arn
Verify the DynamoDB table uses string user_id:
aws dynamodb describe-table \
--region eu-central-1 \
--table-name <table> \
--query 'Table.AttributeDefinitions'
# Expected:
# [{"AttributeName":"user_id","AttributeType":"S"}]
5. Launch the enclave
On the Nitro host, reserve enclave memory and CPU:
sudo sh -c 'printf "%s\n" "---" "memory_mib: 4096" "cpu_count: 2" > /etc/nitro_enclaves/allocator.yaml'
sudo systemctl restart nitro-enclaves-allocator.service
sudo systemctl status nitro-enclaves-allocator.service --no-pager
Launch the signed EIF without debug mode:
cd /home/ec2-user/pq-signer
nitro-cli run-enclave \
--eif-path target/nitro/pq-signer-enclave.eif \
--cpu-count 2 \
--memory 4096 \
--enclave-cid 16
Verify it is running:
nitro-cli describe-enclaves
If memory allocation fails (Nitro error E27), lower both the allocator memory_mib and --memory to an instance-appropriate value such as 2048.
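Because the allocator reservation and the --memory flag must agree, a small pre-flight check can catch a mismatch before nitro-cli does. The sketch below is an assumption-laden example: enclave_memory_fits is a hypothetical helper, and it assumes allocator.yaml contains a top-level memory_mib line in the form written above.

```shell
# Hypothetical helper: succeed only if the requested enclave memory ($2, MiB)
# fits within memory_mib from the allocator config file ($1).
# Returns 2 if memory_mib cannot be found in the file.
enclave_memory_fits() {
  reserved="$(sed -n 's/^memory_mib:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1)"
  [ -n "$reserved" ] || return 2
  [ "$2" -le "$reserved" ]
}

# Possible usage before nitro-cli run-enclave:
# enclave_memory_fits /etc/nitro_enclaves/allocator.yaml 4096 \
#   || echo "requested memory exceeds allocator reservation" >&2
```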
6. Start the host-agent
On the Nitro host:
cd /home/ec2-user/pq-signer
export AWS_REGION=eu-central-1
./target/release/pq-signer-host-agent serve-mtls \
--cid 16 \
--port 5005 \
--listen-port 8443 \
--client-ca infra/dev-cert/internal-ca.pem \
--server-cert infra/dev-cert/host-agent-server.pem \
--server-key infra/dev-cert/host-agent-server-key.pem
Keep this process running under the customer's process manager of choice (systemd, tmux, or equivalent).
serve-mtls must remain reachable only from the Gateway security group. It is not hardened against slow or stalled peers; production deployments must add concurrency and deadlines or replace this path.
7. Start the Gateway
On the Gateway EC2, ensure the host-agent name resolves to the Nitro host private IP if DNS is not already configured:
echo "<nitro-host-private-ip> pq-signer-host-agent.internal" | sudo tee -a /etc/hosts
getent hosts pq-signer-host-agent.internal
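The tee -a command above appends a new line on every run, so repeated runs leave duplicate entries. An idempotent variant might look like the following sketch; ensure_hosts_entry is a hypothetical helper, and it must run with enough privilege to write the target file (for /etc/hosts, as root).

```shell
# Hypothetical helper: append "IP NAME" to hosts file $1 only if no
# non-comment line already lists NAME ($3) as a hostname. $2 is the IP.
ensure_hosts_entry() {
  if awk -v n="$3" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }
     ' "$1"; then
    return 0
  fi
  printf '%s %s\n' "$2" "$3" >> "$1"
}

# Possible usage from a root shell:
# ensure_hosts_entry /etc/hosts "<nitro-host-private-ip>" pq-signer-host-agent.internal
```

The helper does not update an existing entry that points at a different IP; it only avoids duplicates.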
Start the Gateway:
cd /home/ec2-user/pq-signer
export AWS_REGION=eu-central-1
export DYNAMODB_TABLE="<terraform dynamodb_table_name>"
export KMS_KEY_ARN="<terraform wrapping_key_arn>"
./target/release/pq-signer-gateway serve-http \
--bind 0.0.0.0:8080 \
--host pq-signer-host-agent.internal \
--table "$DYNAMODB_TABLE" \
--ca infra/dev-cert/internal-ca.pem \
--client-cert infra/dev-cert/gateway-client.pem \
--client-key infra/dev-cert/gateway-client-key.pem \
--kms-key-arn "$KMS_KEY_ARN"
For production integration, place the HTTP endpoint behind the custodian's normal TLS and auth layer. Do not expose serve-http directly as a public or broadly reachable API.
8. Smoke test
8.1 Key generation
From an allowed client network:
export GATEWAY_URL="http://<gateway-public-ip>:8080/v1/pq-signer"
export USER_ID="customer-demo-$(date +%s)"
jq -n --arg user_id "$USER_ID" '{
request_type: "key_generation",
payload: { alg: "ML-DSA-44", user_id: $user_id }
}' > /tmp/pq-keygen.json
curl -sS -D /tmp/pq-keygen.headers -o /tmp/pq-keygen.json.out \
-H 'content-type: application/json' \
--data-binary @/tmp/pq-keygen.json \
"$GATEWAY_URL"
cat /tmp/pq-keygen.headers
jq . /tmp/pq-keygen.json.out
Expected:
{
"user_id": "customer-demo-...",
"mldsa_pubkey": "<base64>",
"birth_attestation": "<base64 COSE_Sign1>",
"enclave_version": "0.1.0"
}
8.2 Signing
MESSAGE_B64="$(printf 'PQ Signer smoke message' | base64)"
USER_ID="$(jq -r '.user_id' /tmp/pq-keygen.json.out)"
jq -n --arg user_id "$USER_ID" --arg message "$MESSAGE_B64" '{
request_type: "sign",
payload: {
alg: "ML-DSA-44",
user_id: $user_id,
message: $message,
request_attestation: false
}
}' > /tmp/pq-sign.json
curl -sS -D /tmp/pq-sign.headers -o /tmp/pq-sign.json.out \
-H 'content-type: application/json' \
--data-binary @/tmp/pq-sign.json \
"$GATEWAY_URL"
cat /tmp/pq-sign.headers
jq . /tmp/pq-sign.json.out
Expected:
{
"mldsa_signature": "<base64>",
"mldsa_public_key": "<base64>"
}
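A quick sanity check on the signing response is to compare decoded lengths against the ML-DSA-44 sizes in FIPS 204: 1312 bytes for a public key and 2420 bytes for a signature. The sketch below assumes the response fields carry the raw FIPS 204 encodings in standard base64, which this runbook does not state; b64_len is a hypothetical helper.

```shell
# Hypothetical helper: print the decoded byte length of base64 string $1.
b64_len() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# Possible usage against the smoke-test output (assumes raw encodings):
# sig_len="$(b64_len "$(jq -r '.mldsa_signature' /tmp/pq-sign.json.out)")"
# [ "$sig_len" -eq 2420 ] || echo "unexpected signature length: $sig_len" >&2
# pk_len="$(b64_len "$(jq -r '.mldsa_public_key' /tmp/pq-sign.json.out)")"
# [ "$pk_len" -eq 1312 ] || echo "unexpected public key length: $pk_len" >&2
```

A length match is not a signature verification; it only catches truncated or wrongly encoded fields early.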
8.3 Verify the birth attestation
Birth-attestation verification proves the keygen response was created by an enclave whose PCRs match the custodian-approved release manifest and whose user_data matches the persisted key-record metadata.
Build pq-attest on the verifier machine:
cargo build --release -p pq-attest
Save the response attestation:
jq -r '.birth_attestation' /tmp/pq-keygen.json.out > /tmp/birth-attestation.b64
Export the matching DynamoDB record metadata. This includes wrapped_dk, which is not returned by the public keygen API:
USER_ID="$(jq -r '.user_id' /tmp/pq-keygen.json.out)"
aws dynamodb get-item \
--region eu-central-1 \
--table-name "$DYNAMODB_TABLE" \
--key "{\"user_id\":{\"S\":\"$USER_ID\"}}" \
--consistent-read \
--output json > /tmp/key-record.json
jq '{
mldsa_pubkey: .Item.mldsa_pubkey.B,
wrapped_dk: .Item.wrapped_dk.B,
user_id: .Item.user_id.S,
kms_key_arn: .Item.kms_key_arn.S,
alg: .Item.alg.S,
created_at: (.Item.created_at.N | tonumber)
}' /tmp/key-record.json > /tmp/birth-metadata.json
Verify with the exact manifest produced from the signed EIF:
./target/release/pq-attest verify \
--birth-attestation /tmp/birth-attestation.b64 \
--manifest docs/release/measurement-manifest.dev.json \
--metadata /tmp/birth-metadata.json
Output includes verified_pcrs, user_data_sha256, and nitro_root_der_sha256. Any PCR mismatch, certificate-chain failure, or metadata mismatch causes pq-attest to exit non-zero.
9. KMS boundary checks
Run the integrated happy-path check from the Gateway EC2:
./target/release/pq-signer-gateway m2-live-happy-path \
--host pq-signer-host-agent.internal \
--table "$DYNAMODB_TABLE" \
--ca infra/dev-cert/internal-ca.pem \
--client-cert infra/dev-cert/gateway-client.pem \
--client-key infra/dev-cert/gateway-client-key.pem \
--kms-key-arn "$KMS_KEY_ARN"
Then run the three KMS-boundary checks to prove the policy enforces its invariants.
A. Gateway role cannot call the wrapping CMK directly:
pq-signer-host-agent m3-kms-boundary-checks \
--mode gateway-iam \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64>
# Expected: AccessDenied / AccessDeniedException
B. Nitro host parent role cannot get plaintext without Recipient:
pq-signer-host-agent m3-kms-boundary-checks \
--mode nitro-host-parent \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64> \
--user-id <opaque-user-id>
# Expected: key-policy denial for missing Recipient
C. EncryptionContext is enforced (missing → policy denial, wrong → KMS InvalidCiphertextException, valid attested recipient → CiphertextForRecipient and never plaintext):
pq-signer-host-agent m3-kms-boundary-checks \
--mode encryption-context \
--kms-key-arn "$KMS_KEY_ARN" \
--wrapped-dk-base64 <base64> \
--user-id <opaque-user-id> \
--recipient-attestation <file>
CMK policy summary. The CMK policy permits the Nitro host/enclave role only when all of the following are true:
- kms:RecipientAttestation:ImageSha384 equals manifest PCR0.
- kms:RecipientAttestation:PCR1 equals manifest PCR1.
- kms:RecipientAttestation:PCR2 equals manifest PCR2.
- kms:RecipientAttestation:PCR8 equals manifest PCR8.
- kms:EncryptionContextKeys is exactly user_id.
- kms:EncryptionContext:user_id is present.
The policy also explicitly denies the Nitro host/enclave role kms:Decrypt and kms:GenerateDataKey when Recipient is missing, blocking direct parent-role calls even though IAM allows the API.
10. Updating and rolling back
Update procedure:
- Rebuild a signed replacement EIF using the build steps in §2.
- Capture PCRs and write a new release measurement manifest from that exact EIF.
- Compare the new manifest against the currently deployed manifest.
- Update Terraform manifest PCR variables (manifest_pcr0/1/2/8).
- terraform apply with the operator/deploy principal.
- Deploy the replacement EIF to the Nitro host.
- Restart the enclave, host-agent, and Gateway processes.
- Re-run:
  - cargo fmt --check
  - cargo clippy -- -D warnings
  - cargo test
  - make m3-clean-release-evidence
- Re-run the live KMS boundary checks (§9).
Rollback procedure: restore both the previous EIF and the previous manifest PCR values together. Do not run a previous EIF under a CMK policy generated from a different manifest, and do not run a new EIF under an older PCR-bound policy.
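The pairing rule can be checked mechanically before an EIF is launched. The following sketch compares a PCR value captured from the EIF against the value currently in terraform.tfvars. Both helper names are hypothetical, and it assumes the tfvars file uses the manifest_pcrN = "<hex>" form shown in §4.

```shell
# Hypothetical helper: print the hex value of manifest_pcr<N> from tfvars
# file $1, where $2 is N (0, 1, 2, or 8). Prints nothing if the line is
# absent or still holds a non-hex placeholder.
tfvars_pcr() {
  sed -n "s/^[[:space:]]*manifest_pcr$2[[:space:]]*=[[:space:]]*\"\([0-9a-fA-F]*\)\".*\$/\1/p" "$1" | head -n 1
}

# Hypothetical helper: succeed only if tfvars file $1 binds PCR index $2
# to the hex value $3 (compared case-insensitively).
tfvars_pcr_matches() {
  want="$(printf '%s' "$3" | tr 'A-F' 'a-f')"
  have="$(tfvars_pcr "$1" "$2" | tr 'A-F' 'a-f')"
  [ -n "$have" ] && [ "$have" = "$want" ]
}

# Possible usage before launching a restored or replacement EIF:
# for n in 0 1 2 8; do
#   v="$(jq -r ".Measurements.PCR$n" target/nitro/pcrs.json)"
#   tfvars_pcr_matches infra/terraform.tfvars "$n" "$v" \
#     || echo "PCR$n in tfvars does not match this EIF" >&2
# done
```

This only compares local files. It does not prove that the applied CMK policy matches, so terraform plan should still show no pending PCR changes.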
11. Troubleshooting
- key_generation returns 500. Check Gateway logs, host-agent reachability on TCP 8443, KMS PCR policy, and DynamoDB table schema. The table key attribute must be S, not B.
- pq-attest fails with PCR mismatch. Verify the manifest came from the exact signed EIF currently running and that the enclave was not launched in debug mode.
- mTLS failures. Regenerate certs with the Gateway --host name/IP in the host-agent certificate SAN and deploy the same internal-ca.pem to both instances.
- Nitro E27 insufficient memory. Adjust /etc/nitro_enclaves/allocator.yaml and match nitro-cli run-enclave --memory.
- KMS InvalidCiphertextException on Decrypt. Confirm EncryptionContext.user_id matches the value bound at GenerateDataKey time and that the wrapped DK is the one from the DynamoDB row for that user_id.
- AccessDenied on the Gateway role calling KMS directly. Expected. The Gateway has an explicit IAM deny on the wrapping CMK; only the enclave, via the host-agent relay with a valid Recipient, can use it.
12. Evidence artifacts
After the procedures above, the deployment produces:
- Signed EIF: target/nitro/pq-signer-enclave.eif
- PCR capture: target/nitro/pcrs.json
- Release manifest: docs/release/measurement-manifest.dev.json
- Clean-build comparison: docs/release/m3/clean-build-comparison.json
- CloudTrail KMS trail: created by Terraform; contains every Decrypt and GenerateDataKey call against the wrapping CMK.
- Birth attestations: one COSE_Sign1 document per keypair, persisted in the birth_attestation column of the DynamoDB key table.
These artifacts are the substrate for customer-side compliance attestations and for post-incident review.