End-to-End Runtime Prompt Firewall Setup for Azure AI Foundry¶
Objective¶
Route all user prompts sent to Azure AI Foundry through AccuKnox LLM Defence first, using Azure API Management (APIM), with the following guarantees:
-
Client never talks to Foundry directly
-
Only the user question is sent to AccuKnox
-
Full payload is preserved for Foundry
-
Requests are blocked if AccuKnox returns BLOCK
-
Foundry is called only if allowed
-
Secrets are stored securely in APIM
-
Client code remains unchanged except endpoint + key
Final Runtime Flow¶
Client
|
| POST /foundry/models/chat/completions
| Authorization: Bearer <CLIENT_TOKEN>
| (full LLM payload)
|
Azure API Management (APIM)
|
|──► AUTHENTICATION (Inbound – fail fast)
| - Read Authorization header
| - Strip "Bearer "
| - Compare token with
| - If mismatch / missing → 401 Unauthorized
|
|──► PRESERVE REQUEST
| - Preserve full original request body
|
|──► PROMPT EXTRACTION
| - Extract messages[0].content only
|
|──► PROMPT SECURITY CHECK
| - POST to AccuKnox LLM Defence
| {
| "query_type": "prompt",
| "content": "<user prompt>"
| }
|
|──► PROMPT DECISION
| - If BLOCK → 403 (Foundry NOT called)
| - If ALLOW / MONITOR → continue
|
|──► BACKEND INVOCATION
| - Forward ORIGINAL payload (unchanged)
| - Replace Authorization header with
| Bearer
|
v
Azure AI Foundry
|
|──► MODEL INFERENCE
| - Full LLM payload processed
|
|──► RESPONSE RETURNED
|
v
Azure API Management (APIM)
|
|──► RESPONSE PRESERVATION
| - Preserve full Foundry response
|
|──► RESPONSE EXTRACTION
| - Extract choices[0].message.content
|
|──► RESPONSE SECURITY CHECK
| - POST to AccuKnox LLM Defence
| {
| "query_type": "response",
| "content": "<model output>",
| "session_id": "<prompt session id>"
| }
|
|──► RESPONSE DECISION
| - If BLOCK → 403 (response suppressed)
| - If ALLOW / MONITOR → return response
|
v
Client
Prerequisites¶
-
Azure API Management instance (Developer / Premium recommended)
-
Azure AI Foundry model deployed
-
Working Foundry inference curl:
POST https://ai-prompt-firewall-openai.services.ai.azure.com/models/chat/completions
-
AccuKnox LLM Defence API access + bearer token
-
APIM Product with subscription enabled
STEP 1 — Create Backends in APIM¶
1.1 Foundry Backend¶
APIM → Backends → Add
| Field | Value |
|---|---|
| Name | foundry-backend |
| Backend type | HTTP |
| URL | https://ai-prompt-firewall-openai.services.ai.azure.com |
| TLS | Enabled |
Note
Do not include /models or /chat
1.2 AccuKnox LLM Defence Backend¶
APIM → Backends → Add
| Field | Value |
|---|---|
| Name | llm-defence-backend |
| Backend type | HTTP |
| URL | https://cwpp.<domain>.accuknox.com |
| TLS | Enabled |
STEP 2 — Store Secrets Securely (Named Values)¶
APIM → Named values → Add
2.1 Foundry API Key¶
| Field | Value |
|---|---|
| Name | AI_FOUNDRY_API_KEY |
| Value | <foundry-api-key> |
| Secret | Enabled |
2.2 AccuKnox Defence Token¶
| Field | Value |
|---|---|
| Name | LLM_DEFENCE_TOKEN |
| Value | <Enter Application Token> |
| Secret | Enabled |
In order to get “LLM_DEFENCE_TOKEN“¶
1.Login to platform.
2.Go to AI/ML Security → Applications-> Prompt Firewall
3.Click on Add Application. Enter Application name and tags. Click on add.

4.Copy the generated LLM_DEFENCE_TOKEN.

STEP 3 — Create a New API (Separate)¶
APIM → APIs → Add API → HTTP
| Field | Value |
|---|---|
| Display name | Foundry Models Proxy |
| Name | foundry-models-proxy |
| URL scheme | HTTPS |
| API URL suffix | foundry |
| Web service URL | (leave empty) |
This exposes:
https://<apim-name>.azure-api.net/foundry
STEP 4 — Create Operation (Matches Foundry API)¶
APIM → APIs → Foundry Models Proxy → Add operation
| Field | Value |
|---|---|
| Display name | Chat Completions |
| Name | chat-completions |
| Method | POST |
| URL | /models/chat/completions |
Note
Do not include query params here
STEP 5 — Attach API to Product (Mandatory)¶
APIM → Products → Starter / Unlimited
-
Add Foundry Models Proxy
-
Ensure product has an active subscription
Note
Clients will use the APIM subscription key
STEP 6 — API Policy (CORE LOGIC)¶
Apply at:
APIM → APIs → Foundry Models Proxy → All operations → Policies
FINAL PRODUCTION POLICY (READY TO PASTE)¶
<policies>
<inbound>
<base />
<!-- 🔐 AUTHENTICATION -->
<set-variable name="clientBearer" value="@{
var auth = context.Request.Headers.GetValueOrDefault("Authorization", "");
return auth.StartsWith("Bearer ")
? auth.Substring(7)
: "";
}" />
<choose>
<when condition="@(
string.IsNullOrEmpty((string)context.Variables["clientBearer"]) ||
(string)context.Variables["clientBearer"] != ""
)">
<return-response>
<set-status code="401" reason="Unauthorized" />
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-body>{
"error": "Invalid or missing bearer token"
}</set-body>
</return-response>
</when>
</choose>
<!-- Preserve original request body -->
<set-variable name="originalBody" value="@(context.Request.Body.As<string>(preserveContent: true))" />
<!-- Extract user prompt -->
<set-variable name="userPrompt" value="@{
var body = context.Request.Body.As<JObject>(preserveContent: true);
return (string)body["messages"]?[0]?["content"];
}" />
<!-- Call AccuKnox LLM Defence (PROMPT scan) -->
<send-request mode="new" response-variable-name="llmDefenceResponse" timeout="10" ignore-error="false">
<set-url>https://cwpp.airindia.accuknox.com/llm-defence/application-query</set-url>
<set-method>POST</set-method>
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-header name="Authorization" exists-action="override">
<value>Bearer </value>
</set-header>
<set-body>@{
return new JObject(
new JProperty("query_type", "prompt"),
new JProperty("content", (string)context.Variables["userPrompt"])
).ToString();
}</set-body>
</send-request>
<!-- Parse defence response -->
<set-variable name="defenceResult" value="@(((IResponse)context.Variables["llmDefenceResponse"])
.Body.As<JObject>())" />
<!-- Store session_id for response correlation -->
<set-variable name="defenceSessionId" value="@(((JObject)context.Variables["defenceResult"])
["session_id"]?.ToString())" />
<!-- Block if prompt is unsafe -->
<choose>
<when condition="@(
((JObject)context.Variables["defenceResult"])
["query_status"]?.ToString() == "BLOCK"
)">
<return-response>
<set-status code="403" reason="Blocked by LLM Defence" />
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-body>{
"error": "Prompt blocked by LLM Defence",
"severity": "@(((JObject)context.Variables["defenceResult"])["overall_severity"])",
"reason": "@(((JObject)context.Variables["defenceResult"])["description"])"
}</set-body>
</return-response>
</when>
</choose>
<!-- Forward request to Foundry -->
<set-backend-service backend-id="foundry-backend" />
<set-header name="Authorization" exists-action="override">
<value>Bearer </value>
</set-header>
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
<!-- Preserve model response -->
<set-variable name="modelResponse" value="@(context.Response.Body.As<JObject>(preserveContent: true))" />
<!-- Extract assistant content -->
<set-variable name="assistantContent" value="@(
(string)((JObject)context.Variables["modelResponse"])
["choices"]?[0]?["message"]?["content"]
)" />
<!-- Call AccuKnox LLM Defence (RESPONSE scan) -->
<send-request mode="new" response-variable-name="llmDefenceResponseScan" timeout="10" ignore-error="false">
<set-url>https://cwpp.airindia.accuknox.com/llm-defence/application-query</set-url>
<set-method>POST</set-method>
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-header name="Authorization" exists-action="override">
<value>Bearer </value>
</set-header>
<set-body>@{
return new JObject(
new JProperty("query_type", "response"),
new JProperty("content", (string)context.Variables["assistantContent"]),
new JProperty("session_id", (string)context.Variables["defenceSessionId"])
).ToString();
}</set-body>
</send-request>
<!-- Parse response scan -->
<set-variable name="responseDefenceResult" value="@(((IResponse)context.Variables["llmDefenceResponseScan"])
.Body.As<JObject>())" />
<!-- Block if response is unsafe -->
<choose>
<when condition="@(
((JObject)context.Variables["responseDefenceResult"])
["query_status"]?.ToString() == "BLOCK"
)">
<return-response>
<set-status code="403" reason="Response blocked by LLM Defence" />
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-body>{
"error": "Model response blocked by LLM Defence",
"session_id": "@(context.Variables["defenceSessionId"])"
}</set-body>
</return-response>
</when>
</choose>
</outbound>
<on-error>
<base />
</on-error>
</policies>
STEP 7 — Client Usage¶
Client → APIM (NOT Foundry)
curl -X POST \
"https://<apim-name>.azure-api.net/foundry/models/chat/completions?api-version=2024-05-01-preview" \
-H "Content-Type: application/json" \
-H "Bearer: AI_FOUNDRY_API_KEY" \
-d '{
"messages": [
{ "role": "user", "content": "I am going to Paris, what should I see?" }
],
"model": "mistral-medium-2505",
"max_tokens": 2048,
"temperature": 0.8,
"top_p": 0.1
}'
Behavior Summary¶
| Scenario | Result |
|---|---|
AccuKnox returns BLOCK |
❌ 403, Foundry not called |
AccuKnox returns ALLOW |
✅ Request forwarded |
AccuKnox returns MONITOR |
✅ Request forwarded |
| Defence API down | ❌ Fail-closed (configurable) |
| Client sees Foundry key | ❌ Never |
Extensible By Design¶
This architecture supports:
-
Monitor-only mode
-
Severity thresholds
-
Multi-message extraction
-
Async / shadow scanning
-
Policy fragments
-
Tenant-aware routing
-
Prompt + response logging