API Reference
The GitHub Copilot API Gateway exposes OpenAI-compatible endpoints that proxy requests through your authenticated Copilot session. This reference documents all available endpoints.
Authentication
Authentication happens automatically through your VS Code GitHub Copilot session. For the API client, you have two options:
Option 1: No Authentication (Local)
When running locally (127.0.0.1), you can use any string as the API key:
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:3030/v1",
    api_key="anything"  # Ignored by the server for local requests
)
Option 2: Bearer Token (When API Key Configured)
If you've set server.apiKey in settings, include it in your requests:
Authorization: Bearer your-configured-api-key
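When using an SDK rather than raw HTTP, pass the configured key as the client's API key; the SDK sends it as the Bearer token for you. A minimal sketch with the OpenAI SDK (the key value is a placeholder for your own):

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:3030/v1",
    api_key="your-configured-api-key"  # Must match server.apiKey
)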
Base URL
http://127.0.0.1:3030
Or if running on LAN: http://YOUR-IP:3030
OpenAI API
Full compatibility with OpenAI's Chat Completions API.
The GET /v1/models endpoint returns a list of models available through your Copilot subscription.
Example Request
curl http://127.0.0.1:3030/v1/models
Example Response
{
"object": "list",
"data": [
{ "id": "gpt-4o", "object": "model" },
{ "id": "gpt-4", "object": "model" },
{ "id": "gpt-3.5-turbo", "object": "model" },
{ "id": "claude-3.5-sonnet", "object": "model" }
]
}
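Chat completions work with the standard OpenAI SDK unchanged: point base_url at the gateway and pick a model ID from the list above. A minimal sketch:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:3030/v1", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4o",  # Any ID returned by /v1/models
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)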
Anthropic API
Compatible with the Anthropic Claude SDK.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | e.g., claude-3-5-sonnet-20241022 |
| messages (required) | array | Array of message objects |
| max_tokens (required) | integer | Maximum tokens to generate |
| system | string | System prompt |
| stream | boolean | Enable streaming |
Example Request
import anthropic

client = anthropic.Anthropic(
    base_url="http://127.0.0.1:3030",
    api_key="copilot"
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(message.content[0].text)
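With stream enabled, the Anthropic SDK's streaming helper yields text deltas as they arrive. A minimal sketch, assuming the gateway supports streamed responses:

import anthropic

client = anthropic.Anthropic(base_url="http://127.0.0.1:3030", api_key="copilot")

# messages.stream() is the SDK's context-manager equivalent of stream=True
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)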
Google API
Compatible with the Google Generative AI SDK.
URL Parameters
| Parameter | Type | Description |
|---|---|---|
| :model (required) | string | Model name (e.g., gemini-pro) |
Example Request
import google.generativeai as genai

genai.configure(
    api_key="copilot",
    transport="rest",
    client_options={"api_endpoint": "http://127.0.0.1:3030"}
)

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Tell me a joke")
print(response.text)
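The same SDK streams by passing stream=True to generate_content. A minimal sketch, assuming the gateway supports Gemini-style streamed chunks:

import google.generativeai as genai

genai.configure(
    api_key="copilot",
    transport="rest",
    client_options={"api_endpoint": "http://127.0.0.1:3030"}
)

model = genai.GenerativeModel("gemini-pro")
# stream=True yields partial chunks instead of a single final response
for chunk in model.generate_content("Tell me a joke", stream=True):
    print(chunk.text, end="")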
Llama API
Compatible with Llama client libraries.
Same request/response format as OpenAI Chat Completions, but routed through the Llama endpoint.
Example Request
curl -X POST http://127.0.0.1:3030/llama/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-70b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
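Because the format matches OpenAI's, the OpenAI SDK can target this route as well; just point base_url at the Llama prefix. A minimal sketch:

from openai import OpenAI

# Same OpenAI-compatible payloads, routed through /llama/v1
client = OpenAI(base_url="http://127.0.0.1:3030/llama/v1", api_key="anything")

response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)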
Utilities
The server reports usage statistics, including request counts, token usage, and uptime.
Example Response
{
"totalRequests": 1234,
"successfulRequests": 1200,
"failedRequests": 34,
"totalInputTokens": 50000,
"totalOutputTokens": 75000,
"uptime": 3600
}
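To poll these counters programmatically, fetch the JSON and derive whatever metrics you need. A minimal sketch; the /stats path below is a hypothetical guess, so confirm the actual route in the Swagger UI at /docs:

import requests

# NOTE: "/stats" is a hypothetical path; verify the real route at /docs
stats = requests.get("http://127.0.0.1:3030/stats").json()

success_rate = stats["successfulRequests"] / max(stats["totalRequests"], 1)
print(f"Success rate: {success_rate:.1%}")
print(f"Tokens in/out: {stats['totalInputTokens']}/{stats['totalOutputTokens']}")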
Access interactive API documentation using Swagger UI. Test endpoints directly from your browser.
Open in browser: http://127.0.0.1:3030/docs
- Try It Out: Send real requests and see responses
- Schema Explorer: View detailed request/response schemas
- Offline: Works locally without external dependencies
Need Integration Guides?
Check out the Wiki for step-by-step guides on integrating with LangChain, Cursor, Aider, and more.