Repository: explainers-by-googlers/prompt-api
Branch: main
Commit: 8c88a6022af8
Files: 10
Total size: 104.5 KB

Directory structure:
gitextract_vvyfil9_/

├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       └── auto-publish.yml
├── .gitignore
├── .pr-preview.json
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── index.bs
├── security-privacy-questionnaire.md
└── w3c.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/dependabot.yml
================================================
# See the documentation at
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  # Update actions used by .github/workflows in this repository.
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions-org: # Groups all GitHub-authored actions into a single PR.
        patterns: ["actions/*"]


================================================
FILE: .github/workflows/auto-publish.yml
================================================
name: CI
on:
  pull_request: {}
  push:
    branches: [main]
jobs:
  main:
    name: Build, Validate and Deploy
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v6
      - uses: w3c/spec-prod@v2
        with:
          GH_PAGES_BRANCH: gh-pages
          BUILD_FAIL_ON: warning


================================================
FILE: .gitignore
================================================
index.html


================================================
FILE: .pr-preview.json
================================================
{
  "src_file": "index.bs",
  "type": "bikeshed",
  "params": {
    "force": 1
  }
}


================================================
FILE: CONTRIBUTING.md
================================================
# W3C Web Machine Learning Community Group

This repository is being used for work in the W3C Web Machine Learning Community Group, governed by the [W3C Community License
Agreement (CLA)](http://www.w3.org/community/about/agreements/cla/). To make substantive contributions,
you must join the CG.

If you are not the sole contributor to a contribution (pull request), please identify all
contributors in the pull request comment.

To add a contributor (other than yourself; that's automatic), mark them one per line as follows:

```
+@github_username
```

If you added a contributor by mistake, you can remove them in a comment with:

```
-@github_username
```

If you are making a pull request on behalf of someone else but you had no part in designing the
feature, you can remove yourself with the above syntax.


================================================
FILE: LICENSE.md
================================================
All Reports in this Repository are licensed by Contributors
under the
[W3C Software and Document License](https://www.w3.org/copyright/software-license/).

Contributions to Specifications are made under the
[W3C CLA](https://www.w3.org/community/about/agreements/cla/).

Contributions to Test Suites are made under the
[W3C 3-clause BSD License](https://www.w3.org/copyright/3-clause-bsd-license-2008/).


================================================
FILE: README.md
================================================
# Explainer for the Prompt API

_This explainer and the accompanying draft report are in active development by the Web Machine Learning Community Group. CG members are seeking feedback and support for this proposal to gain Working Group and implementer adoption. Implementations are experimentally available in [Google Chrome](https://developer.chrome.com/docs/ai/prompt-api) and [Microsoft Edge](https://learn.microsoft.com/en-us/microsoft-edge/web-platform/prompt-api)._

Browsers and operating systems are increasingly expected to gain access to language models. ([Example](https://developer.chrome.com/docs/ai/built-in), [example](https://learn.microsoft.com/windows/ai/apis/local-llms), [example](https://www.apple.com/apple-intelligence/).) Language models are known for their versatility. With enough creative [prompting](https://developers.google.com/machine-learning/resources/prompt-eng), they can help accomplish tasks as diverse as:

* Classification, tagging, and keyword extraction of arbitrary text
* Helping users compose text, such as blog posts, reviews, or biographies
* Summarizing, e.g. of articles, user reviews, or chat logs
* Generating titles or headlines from article contents
* Answering questions based on the unstructured contents of a web page
* Translation between languages
* Proofreading

The Google Chrome and Microsoft Edge teams, together with the Web Machine Learning Community Group, are exploring purpose-built APIs for some of these use cases (namely [translator / language detector](https://github.com/webmachinelearning/translation-api), [summarizer / writer / rewriter](https://github.com/webmachinelearning/writing-assistance-apis), and [proofreader](https://github.com/webmachinelearning/proofreader-api)). This proposal additionally explores a general-purpose "Prompt API" that allows web developers to prompt a language model directly. This gives web developers access to many more capabilities, at the cost of requiring them to do their own prompt engineering.

Currently, web developers wishing to use language models must either call out to cloud APIs, or bring their own and run them using technologies like [WASM](https://webassembly.org/) and [WebGPU](https://www.w3.org/TR/webgpu/), usually through JS runtime frameworks. By providing web platform API access to the browser or operating system's existing language model, we can provide the following benefits compared to cloud APIs:

* Local processing of sensitive data, e.g. allowing websites to combine AI features with end-to-end encryption
* Potentially faster results, since there is no server round-trip involved
* Offline usage
* Lower API costs for web developers
* Allowing hybrid approaches, e.g. free users of a website use on-device AI whereas paid users use a more powerful API-based model

Similarly, compared to developer-supplied model approaches, using a built-in language model can save the user's bandwidth, storage, and memory resources, while using a model that is optimized for the device. This pattern can also provide a lower barrier to entry for web developers by removing the need for developers to serve models and manage dependencies.

## Goals

Our goals are to:

* Provide web developers a uniform JavaScript API for accessing browser-provided language models of varying capabilities.
* Encapsulate model management and execution details as much as possible, e.g. downloads, updates, templating, parsing.
* Guide web developers to gracefully handle failure cases, e.g. no browser-provided model being available.
* Develop formal implementation guidelines and definitions, e.g. for initial on-device models and possible cloud services.

The following are explicit non-goals:

* We do not intend to force every browser to ship or expose a language model; in particular, not all devices will be capable of storing or running one. It would be conforming to implement this API by always signaling that no language model is available; it may also be viable to implement this API entirely by using cloud services instead of on-device models.
* We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers. In particular, we cannot guarantee that the models exposed by these APIs are particularly good at any given use case. These are left as quality-of-implementation issues, similar to the [shape detection API](https://wicg.github.io/shape-detection-api/). (See also a [discussion of interop](https://www.w3.org/reports/ai-web-impact/#interop) in the W3C "AI & the Web" document.)
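
Since a conforming implementation may always report that no model is available, pages need a fallback path. A minimal sketch, assuming `create()` rejects when no model can be provided; `createSessionOrNull` is a hypothetical helper, not part of the API:

```js
// Hypothetical helper (not part of the Prompt API): returns a session,
// or null when the API is missing or session creation fails.
async function createSessionOrNull(options = {}) {
  // The API may not be exposed at all in this browser or context.
  if (!("LanguageModel" in globalThis)) return null;
  try {
    return await globalThis.LanguageModel.create(options);
  } catch (err) {
    // Assumption: create() rejects when no model can be provided.
    console.warn("Prompt API unavailable:", err);
    return null;
  }
}
```

Callers can then branch on the result, e.g. falling back to a server-side model or hiding the AI feature when `await createSessionOrNull()` returns `null`.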

The following are potential goals we are not yet certain of:

* Allow web developers to know, or control, whether language model interactions are done on-device or using cloud services. This would allow them to guarantee that any user data they feed into this API does not leave the device, which can be important for privacy purposes. Similarly, we might want to allow developers to request on-device-only language models, in case a browser offers both varieties.
* Allow web developers to know some identifier for the language model in use, separate from the browser version. This would allow them to allowlist or blocklist specific models to maintain a desired level of quality, or restrict certain use cases to a specific model.

Both of these potential goals could pose challenges to interoperability, so we want to investigate more how important such functionality is to developers to find the right tradeoff.

## Experiments and Updates

### Sampling Parameters

Developers have expressed the value of tuning language model sampling parameters for testing and optimizing task-specific model behavior. At the same time, web standards engagements have highlighted the need for more interoperable API shapes for sampling parameters among different models.

The API was initially made available in extension contexts with the following sampling parameter options and attributes:

*   The static method `LanguageModel.params()`, which exposes default and maximum values for sampling parameters: `defaultTemperature`, `maxTemperature`, `defaultTopK`, `maxTopK`.
*   The `temperature` and `topK` options, which may be provided to `LanguageModel.create()` to control the sampling behavior of individual language model sessions.
*   The `temperature` and `topK` attributes on `LanguageModel` session instances, which expose the current values of the sampling parameters for that session.

Access to these features is limited to extension and experimental web contexts. Ongoing experimentation and community engagement will explore different API shapes that satisfy developer requirements and address interoperability concerns.

### Renamed Features

The following features have been recently renamed. The legacy aliases are deprecated, and clients should update their code to use the new names. The legacy aliases will be removed from extension contexts in a future release.

| Old Name (Deprecated in Extensions, Removed in Web) | New Name (Available in All Contexts)     |
| :-------------------------------------------------- | :----------------------------------------|
| `languageModel.inputUsage`                          | `languageModel.contextUsage`             |
| `languageModel.inputQuota`                          | `languageModel.contextWindow`            |
| `languageModel.measureInputUsage()`                 | `languageModel.measureContextUsage()`    |
| `languageModel.onquotaoverflow`                     | `languageModel.oncontextoverflow`        |
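
Until the deprecated aliases are removed, code that has to run in extension contexts on both older and newer releases can prefer the new names and fall back to the old ones. A sketch; `getContextInfo` and `measureUsage` are hypothetical helpers, and the fallback only matters where the legacy aliases still exist:

```js
// Hypothetical helpers (not part of the Prompt API) that prefer the new
// names from the table above and fall back to the deprecated aliases.
function getContextInfo(session) {
  const usage = session.contextUsage ?? session.inputUsage;
  const windowSize = session.contextWindow ?? session.inputQuota;
  return { usage, windowSize, remaining: windowSize - usage };
}

function measureUsage(session, input, options) {
  return typeof session.measureContextUsage === "function"
    ? session.measureContextUsage(input, options)
    : session.measureInputUsage(input, options);
}
```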


## Examples

### Zero-shot prompting

In this example, a single string, assumed to come from the user, is used to prompt the API. The returned response comes from the language model.

```js
const session = await LanguageModel.create();

// Prompt the model and wait for the whole result to come back.
const result = await session.prompt("Write me a poem.");
console.log(result);

// Prompt the model and stream the result:
const stream = session.promptStreaming("Write me an extra-long poem.");
for await (const chunk of stream) {
  console.log(chunk);
}
```
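
If the page wants both incremental display and the complete text at the end, the stream can be consumed with a small helper. A sketch, assuming each chunk is a newly generated piece of text (not the whole response so far); `streamToText` is hypothetical:

```js
// Hypothetical helper: forwards each chunk to a callback and returns the full text.
// Assumes chunks are incremental pieces of the response.
async function streamToText(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    fullText += chunk;
    onChunk(chunk, fullText);
  }
  return fullText;
}
```

For example, `await streamToText(session.promptStreaming("Write me an extra-long poem."), (_chunk, soFar) => { output.textContent = soFar; })`, where `output` is some element on the page.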

### System prompts

The language model can be configured with a special "system prompt" which gives it the context for future interactions. The system prompt must be the first message, whether passed via the `initialPrompts` option to `create()`, or as the first message to the first `prompt()` or `append()` method call. Role and content formatting aligns with the "chat completions API" `{ role, content }` format, which is expanded upon in [the following section](#n-shot-prompting).

```js
// Option 1: Create a new session with a system prompt as the first message.
const session1 = await LanguageModel.create({
  initialPrompts: [{ role: "system", content: "Pretend to be an eloquent hamster." }]
});
console.log(await session1.prompt("What is your favorite food?"));

// Option 2: Create a new session and append a system prompt as the first message.
const session2 = await LanguageModel.create();
await session2.append([{ role: "system", content: "Pretend to be an eloquent hamster." }]);
console.log(await session2.prompt("What is your favorite food?"));

// Option 3: Create a new session and prompt with a system prompt as the first message.
const session3 = await LanguageModel.create();
console.log(await session3.prompt([
  { role: "system", content: "Pretend to be an eloquent hamster." },
  { role: "user", content: "What is your favorite food?" }
]));
```

### N-shot prompting

If developers want to provide examples of the user/assistant interaction, they can add more entries to the `initialPrompts` array, using the `"user"` and `"assistant"` roles:

```js
const session = await LanguageModel.create({
  initialPrompts: [
    { role: "system", content: "Predict up to 5 emojis as a response to a comment. Output emojis, comma-separated." },
    { role: "user", content: "This is amazing!" },
    { role: "assistant", content: "❤️, ➕" },
    { role: "user", content: "LGTM" },
    { role: "assistant", content: "👍, 🚢" }
  ]
});

// Clone an existing session for efficiency, instead of recreating one each time.
async function predictEmoji(comment) {
  const freshSession = await session.clone();
  return await freshSession.prompt(comment);
}

const result1 = await predictEmoji("Back to the drawing board");

const result2 = await predictEmoji("This code is so good you should get promoted");
```

(Note that merely creating a session does not cause any new responses from the language model. We need to call `prompt()` or `promptStreaming()` to get a response.)
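
When the examples live in data rather than in code, the `initialPrompts` array can be assembled from input/output pairs. A minimal sketch; `buildNShotPrompts` is a hypothetical helper, not part of the API:

```js
// Hypothetical helper: turns [input, output] example pairs into an initialPrompts array.
function buildNShotPrompts(systemPrompt, examples) {
  return [
    { role: "system", content: systemPrompt },
    ...examples.flatMap(([input, output]) => [
      { role: "user", content: input },
      { role: "assistant", content: output },
    ]),
  ];
}
```

The result can be passed directly as `LanguageModel.create({ initialPrompts: buildNShotPrompts(systemPrompt, examples) })`.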

Some details on error cases:

* Placing the `{ role: "system" }` prompt anywhere besides at the 0th position of the first `LanguageModelMessage` sequence sent to any of `create()`, `append()`, or `prompt()` will reject with a `TypeError`.
* If the combined token length of all the initial prompts is too large, then the promise returned by `create()` will be rejected with a [`QuotaExceededError` exception](#tokenization-context-window-length-limits-and-overflow).
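
The placement rule in the first bullet can also be checked before calling the API, e.g. to report a clearer error. A sketch mirroring that rule; `checkSystemPlacement` and its `isFirstSequence` flag (whether these are the first messages the session receives) are hypothetical:

```js
// Hypothetical client-side check mirroring the placement rule above.
// `isFirstSequence` says whether these are the first messages the session receives
// (via initialPrompts, append(), or prompt()).
function checkSystemPlacement(messages, isFirstSequence) {
  messages.forEach((message, index) => {
    if (message.role === "system" && !(isFirstSequence && index === 0)) {
      throw new TypeError(`A "system" message is only allowed first; found one at index ${index}.`);
    }
  });
}
```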

### Customizing the role per prompt

Our examples so far have provided `prompt()` and `promptStreaming()` with a single string. Such cases assume messages will come from the user role. These methods can also take arrays of objects in the `{ role, content }` format, in case you want to provide multiple user or assistant messages before getting another assistant message:

```js
const multiUserSession = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a mediator in a discussion between two departments."
  }]
});

const result = await multiUserSession.prompt([
  { role: "user", content: "Marketing: We need more budget for advertising campaigns." },
  { role: "user", content: "Finance: We need to cut costs and advertising is on the list." },
  { role: "assistant", content: "Let's explore a compromise that satisfies both departments." }
]);

// `result` will contain a compromise proposal from the assistant.
```

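Because system prompts have the special behavior of being preserved on context window overflow, they cannot be mixed into such sequences: a `"system"`-role message is only allowed as the very first message of a session, as described in [System prompts](#system-prompts).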
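
The mediator example labels each department inside the message text. When the speakers come from data, those labeled `"user"`-role messages can be generated. A small sketch; the `speakerMessages` helper is hypothetical:

```js
// Hypothetical helper: one "user" message per [speaker, text] pair,
// using the "Speaker: text" labeling convention from the example above.
function speakerMessages(entries) {
  return entries.map(([speaker, text]) => ({ role: "user", content: `${speaker}: ${text}` }));
}
```

Its output can be passed to `multiUserSession.prompt()` as-is, or followed by an `"assistant"`-role message as in the example.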

### Tool use

The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model.

Here’s an example of how to use the `tools` option:

```js
const session = await LanguageModel.create({
  initialPrompts: [
    {
      role: "system",
      content: `You are a helpful assistant. You can use tools to help the user.`,
    },
  ],
  expectedInputs: [{ type: "text", languages: ["en"] }, { type: "tool-response" }],
  expectedOutputs: [{ type: "text", languages: ["en"] }, { type: "tool-call" }],
  tools: [
    {
      name: "getWeather",
      description: "Get the weather in a location.",
      inputSchema: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city to check for the weather condition.",
          },
        },
        required: ["location"],
      },
      async execute({ location }) {
        const res = await fetch(
          "https://weatherapi.example/?location=" + encodeURIComponent(location),
        );
        // Returns the result as a JSON string.
        return JSON.stringify(await res.json());
      },
    },
  ],
});

const result = await session.prompt("What is the weather in Seattle?");
```

In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, input schema, and `execute` implementation. When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
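
How the user agent reacts if an `execute()` function rejects is not covered here. One defensive option, sketched below as an assumption rather than required behavior, is to catch errors inside the tool and report them to the model as a JSON string, the same format the example tool uses for successful results. `withErrorReporting` is a hypothetical helper:

```js
// Hypothetical wrapper (not part of the Prompt API): failures are returned to the
// model as a JSON string instead of surfacing as a rejected execute() promise.
function withErrorReporting(tool) {
  return {
    ...tool,
    async execute(args) {
      try {
        return await tool.execute(args);
      } catch (err) {
        return JSON.stringify({ error: String(err?.message ?? err) });
      }
    },
  };
}
```

It would be used as `tools: [withErrorReporting({ name: "getWeather", /* ... */ })]`, leaving the tool's own definition unchanged.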

#### Concurrent tool use

Developers should be aware that the model might call their tool multiple times, concurrently. For example, code such as

```js
const result = await session.prompt("Which of these locations currently has the highest temperature? Seattle, Tokyo, Berlin");
```

might call the above `"getWeather"` tool's `execute()` function three times. The model would wait for all tool call results to return, using the equivalent of `Promise.all()` internally, before it composes its final response.

Similarly, the model might call multiple different tools, if it believes they all are relevant when responding to the given prompt.
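
If several concurrent calls are likely to share the same arguments (for example, the same city twice), the tool can share one in-flight request between them. A sketch under that assumption; `dedupeConcurrent` is a hypothetical helper that keys calls by their JSON-serialized arguments:

```js
// Hypothetical helper: concurrent calls with identical arguments share one promise.
function dedupeConcurrent(fn, keyOf = (args) => JSON.stringify(args)) {
  const inFlight = new Map();
  return (args) => {
    const key = keyOf(args);
    if (!inFlight.has(key)) {
      // Start the call asynchronously and forget it once it settles,
      // so later (non-concurrent) calls fetch fresh data.
      const pending = Promise.resolve()
        .then(() => fn(args))
        .finally(() => inFlight.delete(key));
      inFlight.set(key, pending);
    }
    return inFlight.get(key);
  };
}
```

In the weather example, this would be `execute: dedupeConcurrent(async ({ location }) => { /* fetch as before */ })`.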

### Multimodal inputs

All of the above examples have been of text prompts. Some language models also support other inputs. Our design initially includes the potential to support images and audio clips as inputs. This is done by using objects in the form `{ type: "image", value }` and `{ type: "audio", value }` instead of strings. The `value` can be one of the following:

* For image inputs: [`ImageBitmapSource`](https://html.spec.whatwg.org/#imagebitmapsource), i.e. `Blob`, `ImageData`, `ImageBitmap`, `VideoFrame`, `OffscreenCanvas`, `HTMLImageElement`, `SVGImageElement`, `HTMLCanvasElement`, or `HTMLVideoElement` (will get the current frame). Also raw bytes via `BufferSource` (i.e. `ArrayBuffer` or typed arrays).

* For audio inputs: for now, `Blob`, `AudioBuffer`, or raw bytes via `BufferSource`. Other possibilities we're investigating include `HTMLAudioElement`, `AudioData`, and `MediaStream`, but we're not yet sure if those are suitable to represent "clips": most other uses of them on the web platform are able to handle streaming data.

Sessions that will include these inputs need to be created using the `expectedInputs` option, to ensure that any necessary downloads are done as part of session creation, and that if the model is not capable of such multimodal prompts, the session creation fails. (See also the below discussion of [expected input languages](#multilingual-content-and-expected-input-languages), not just expected input types.)

A sample of using these APIs:

```js
const session = await LanguageModel.create({
  // { type: "text" } is not necessary to include explicitly, unless
  // you also want to include expected input languages for text.
  expectedInputs: [
    { type: "audio" },
    { type: "image" }
  ]
});

const referenceImage = await (await fetch("/reference-image.jpeg")).blob();
const userDrawnImage = document.querySelector("canvas");

const response1 = await session.prompt([{
  role: "user",
  content: [
    { type: "text", value: "Give a helpful artistic critique of how well the second image matches the first:" },
    { type: "image", value: referenceImage },
    { type: "image", value: userDrawnImage }
  ]
}]);

console.log(response1);

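// `captureMicrophoneInput()` is an app-defined helper (not part of this API)
// that records a short audio clip and returns it as a `Blob`.
const audioBlob = await captureMicrophoneInput({ seconds: 10 });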

const response2 = await session.prompt([{
  role: "user",
  content: [
    { type: "text", value: "My response to your critique:" },
    { type: "audio", value: audioBlob }
  ]
}]);
```

Note how once we move to multimodal prompting, the prompt format becomes more explicit:

* We must always pass an array of messages, instead of a single string value.
* Each message must have a `role` property: unlike with the string shorthand, `"user"` is no longer assumed.
* The `content` property must be an array of content, if it contains any multimodal content.

This extra ceremony is necessary to make it clear that we are sending a single message that contains multimodal content, versus sending multiple messages, one per piece of content. To avoid such confusion, the multimodal format has fewer defaults and shorthands than if you interact with the API using only text. (See some discussion in [pull request #89](https://github.com/webmachinelearning/prompt-api/pull/89).)
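
These stricter rules can be applied uniformly by normalizing shorthand input into the explicit form before adding multimodal parts. A sketch; `toExplicitMessages` is a hypothetical helper that preserves any other message properties, such as `prefix`:

```js
// Hypothetical helper: converts the text-only shorthand into the explicit
// message format that multimodal prompts require.
function toExplicitMessages(input) {
  // A bare string is shorthand for a single "user" message.
  const messages = typeof input === "string" ? [{ role: "user", content: input }] : input;
  return messages.map(({ role, content, ...rest }) => ({
    ...rest, // keep other properties, e.g. `prefix`
    role,
    content: typeof content === "string" ? [{ type: "text", value: content }] : content,
  }));
}
```

An image part can then be pushed onto `toExplicitMessages("Describe this image:")[0].content` before calling `session.prompt()`.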

To illustrate, the following extension of our above [multi-user example](#customizing-the-role-per-prompt) has a similar sequence of text + image + image values compared to our artistic critique example. However, it uses a multi-message structure instead of the artistic critique example's single-message structure, so the model will interpret it differently:

```js
const response = await session.prompt([
  {
    role: "user",
    content: "Your compromise just made the discussion more heated. The two departments drew up posters to illustrate their strategies' advantages:"
  },
  {
    role: "user",
    content: [{ type: "image", value: brochureFromTheMarketingDepartment }]
  },
  {
    role: "user",
    content: [{ type: "image", value: brochureFromTheFinanceDepartment }]
  }
]);
```

Details:

* Cross-origin data that has not been exposed using the `Access-Control-Allow-Origin` header cannot be used with the prompt API, and will reject with a `"SecurityError"` `DOMException`. This applies to `HTMLImageElement`, `SVGImageElement`, `HTMLVideoElement`, `HTMLCanvasElement`, and `OffscreenCanvas`. Note that this is stricter than `createImageBitmap()`, whose tainting mechanism allows creating opaque image bitmaps from unexposed cross-origin resources. For the prompt API, such resources will simply fail, and so will cross-origin-tainted canvases.

* Raw-bytes cases (`Blob` and `BufferSource`) will apply the appropriate sniffing rules ([for images](https://mimesniff.spec.whatwg.org/#rules-for-sniffing-images-specifically), [for audio](https://mimesniff.spec.whatwg.org/#rules-for-sniffing-audio-and-video-specifically)) and reject with an `"EncodingError"` `DOMException` if the format is not supported or there is some error decoding the data. This behavior is similar to that of `createImageBitmap()`.

* For animated images, only a snapshot of the first frame will be used (as with `createImageBitmap()`). In the future, animated image input may be supported via some separate opt-in, similar to video clip input. But for the initial version, we want to avoid interoperability problems from some implementations supporting animated images and others not.

* For `HTMLVideoElement`, even a single frame might not yet be downloaded when the prompt API is called. In such cases, calling into the prompt API will force at least a single frame's worth of video to download. (The intent is to behave the same as `createImageBitmap(videoEl)`.)

* Attempting to supply an invalid combination, e.g. `{ type: "audio", value: anImageBitmap }`, `{ type: "image", value: anAudioBuffer }`, or `{ type: "text", value: anArrayBuffer }`, will reject with a `TypeError`.

* For now, using the `"assistant"` role with an image or audio prompt will reject with a `"NotSupportedError"` `DOMException`. (As we explore multimodal outputs, this restriction might be lifted in the future.)
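
The error names listed above can be mapped to user-facing messages. A sketch; `describePromptError` is hypothetical and its wording is only illustrative:

```js
// Hypothetical helper: maps the rejection reasons described above to messages.
function describePromptError(err) {
  switch (err?.name) {
    case "SecurityError":
      return "A cross-origin image or video was not exposed via CORS.";
    case "EncodingError":
      return "The image or audio data could not be decoded.";
    case "NotSupportedError":
      return "This input is not supported here (e.g. media in an assistant message).";
    case "TypeError":
      return "A content part's type does not match its value.";
    default:
      return "The prompt failed for an unexpected reason.";
  }
}
```

A page might call it from the `catch` block around `session.prompt()` and display the message next to the upload control.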

Future extensions may include more ambitious multimodal inputs, such as video clips, or realtime audio or video. (Realtime might require a different API design, more based around events or streams instead of messages.)

### Structured output with JSON schema or RegExp constraints

To help with programmatic processing of language model responses, the prompt API supports constraining the response with either a JSON schema object or a `RegExp` passed as the `responseConstraint` option:

```js
const schema = {
  type: "object",
  required: ["rating"],
  additionalProperties: false,
  properties: {
    rating: {
      type: "number",
      minimum: 0,
      maximum: 5,
    },
  },
};

// Prompt the model and wait for the JSON response to come back.
const result = await session.prompt("Summarize this feedback into a rating between 0-5: " +
  "The food was delicious, service was excellent, will recommend.",
  { responseConstraint: schema }
);

const { rating } = JSON.parse(result);
console.log(rating);
```

If the `responseConstraint` value is a valid JSON schema object but uses JSON schema features not supported by the user agent, the method will error with a `"NotSupportedError"` `DOMException`.

The result value returned is a string that can be parsed with `JSON.parse()`. If the user agent is unable to produce a response that is compliant with the schema, the method will error with a `"SyntaxError"` `DOMException`.

```js
const emailRegExp = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

const emailAddress = await session.prompt(
  `Create a fictional email address for ${characterName}.`,
  { responseConstraint: emailRegExp }
);

console.log(emailAddress);
```

The returned value will be a string that matches the input `RegExp`. If the user agent is unable to produce a response that matches, the method will error with a `"SyntaxError"` `DOMException`.

If a value that is neither a `RegExp` object nor a valid JSON schema object is given, the method will error with a `TypeError`.
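
A common use of a `RegExp` constraint is restricting the answer to one of a fixed set of labels. A sketch; `choiceConstraint` is a hypothetical helper that escapes the labels and anchors the pattern:

```js
// Hypothetical helper: builds a RegExp that matches exactly one of the given labels.
function choiceConstraint(choices) {
  // Escape characters that have a special meaning in regular expressions.
  const escaped = choices.map((choice) => choice.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^(?:${escaped.join("|")})$`);
}
```

For example, `session.prompt("Classify this review's sentiment: " + review, { responseConstraint: choiceConstraint(["positive", "negative", "neutral"]) })`, where `review` holds the text to classify.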

By default, the implementation may include the schema or regular expression as part of the message sent to the underlying language model, which will use up some of the [context window](#tokenization-context-window-length-limits-and-overflow). You can measure how much it will use up by passing the `responseConstraint` option to `session.measureContextUsage()`. If you want to avoid this behavior, you can use the `omitResponseConstraintInput` option. In such cases, it's strongly recommended to include some guidance in the prompt string itself:

```js
const result = await session.prompt(`
  Summarize this feedback into a rating between 0-5, only outputting a JSON
  object { rating }, with a single property whose value is a number:
  The food was delicious, service was excellent, will recommend.
`, { responseConstraint: schema, omitResponseConstraintInput: true });
```

If `omitResponseConstraintInput` is set to `true` without `responseConstraint` set, then the method will error with a `TypeError`.
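
One possible policy, sketched under assumptions: keep the schema in the input when it fits in the remaining context window, and omit it otherwise (in which case the prompt text should describe the expected format, as recommended above). `constraintOptions` is hypothetical, and it assumes `measureContextUsage()` fulfills with a number comparable to `contextWindow - contextUsage`:

```js
// Hypothetical helper: chooses prompt() options depending on whether the
// constraint would fit into the remaining context window.
async function constraintOptions(session, input, schema) {
  const needed = await session.measureContextUsage(input, { responseConstraint: schema });
  const remaining = session.contextWindow - session.contextUsage;
  return needed <= remaining
    ? { responseConstraint: schema }
    : { responseConstraint: schema, omitResponseConstraintInput: true };
}
```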

### Constraining responses by providing a prefix

As discussed in [Customizing the role per prompt](#customizing-the-role-per-prompt), it is possible to prompt the language model to add a new `"assistant"`-role response in addition to a previous one. Usually it will elaborate on its previous messages. For example:

```js
const followup = await session.prompt([
  {
    role: "user",
    content: "I'm nervous about my presentation tomorrow"
  },
  {
    role: "assistant",
    content: "Presentations are tough!"
  }
]);

// `followup` might be something like "Here are some tips for staying calm.", or
// "I remember my first presentation, I was nervous too!" or...
```

In some cases, instead of asking for a new response message, you want to "prefill" part of the `"assistant"`-role response message. An example use case is to guide the language model toward specific response formats. To do this, add `prefix: true` to the trailing `"assistant"`-role message. For example:

```js
const characterSheet = await session.prompt([
  {
    role: "user",
    content: "Create a TOML character sheet for a gnome barbarian"
  },
  {
    role: "assistant",
    content: "```toml\n",
    prefix: true
  }
]);
```

(Such examples work best if we also support [stop sequences](https://github.com/webmachinelearning/prompt-api/issues/44); stay tuned for that!)

Without this prefix, the output might be something like "Sure! Here's a TOML character sheet...", whereas the prefix message sets the assistant on the right path immediately.

(Kudos to the [Standard Completions project](https://standardcompletions.org/) for [discussion](https://github.com/standardcompletions/rfcs/pull/8) of this functionality, as well as [the example](https://x.com/stdcompletions/status/1928565134080778414).)

If `prefix` is used in any message besides a final `"assistant"`-role one, the method will reject with a `"SyntaxError"` `DOMException`.
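
Until stop sequences are available, the model may keep writing after it closes the TOML block. A sketch that keeps only the text before the closing fence, assuming the returned string contains just the continuation after the prefix (not the prefix itself); `takeUntilFence` is hypothetical:

```js
// Three backticks, written as escapes to keep this snippet fence-safe.
const CLOSING_FENCE = "\u0060\u0060\u0060";

// Hypothetical helper: drops the closing fence and anything the model wrote after it.
function takeUntilFence(text) {
  const end = text.indexOf(CLOSING_FENCE);
  return end === -1 ? text : text.slice(0, end);
}
```

Applied to the example above, `takeUntilFence(characterSheet)` would yield just the TOML body.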

### Appending messages without prompting for a response

In some cases, you know which messages you'll want to use to populate the session, but not yet the final message before you prompt the model for a response. Because processing messages can take some time (especially for multimodal inputs), it's useful to be able to send such messages to the model ahead of time. This allows it to get a head-start on processing, while you wait for the right time to prompt for a response.

(The `initialPrompts` array serves this purpose at session creation time, but this can be useful after session creation as well, as we show in the example below.)

For such cases, in addition to the `prompt()` and `promptStreaming()` methods, the prompt API provides an `append()` method, which takes the same message format as `prompt()`. Here's an example of how that could be useful:

```js
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a skilled analyst who correlates patterns across multiple images."
  }],
  expectedInputs: [{ type: "image" }]
});

fileUpload.onchange = async (e) => {
  await session.append([{
    role: "user",
    content: [
      { type: "text", value: `Here's one image. Notes: ${fileNotesInput.value}` },
      { type: "image", value: fileUpload.files[0] }
    ]
  }]);
};

analyzeButton.onclick = async (e) => {
  analysisResult.textContent = await session.prompt(userQuestionInput.value);
};
```

The promise returned by `append()` will reject if the prompt cannot be appended (e.g., too big, invalid modalities for the session, etc.), or will fulfill once the prompt has been validated, processed, and appended to the session.

Note that `append()` can also cause [overflow](#tokenization-context-window-length-limits-and-overflow), in which case it will evict the oldest non-system prompts from the session and fire the `"contextoverflow"` event.
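
Because eviction happens silently apart from that event, a page may want to surface it. A sketch; `warnOnOverflow` is hypothetical and returns a function that stops listening:

```js
// Hypothetical helper: reports when the session starts evicting older messages.
function warnOnOverflow(session, warn = console.warn) {
  const handler = () =>
    warn("Context window overflowed; the oldest non-system messages were evicted.");
  session.addEventListener("contextoverflow", handler);
  return () => session.removeEventListener("contextoverflow", handler);
}
```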

### Configuration of per-session parameters

Tuning language model sampling parameters can be useful for both testing and adjusting task-specific model behavior. Common sampling parameters include [temperature](https://huggingface.co/blog/how-to-generate#sampling) and [topK](https://huggingface.co/blog/how-to-generate#top-k-sampling).

**Notice:** Sampling parameter features are currently only available within extension and experimental contexts. While they are useful for exploring model behavior, the current fields are not guaranteed to be supported or interpreted consistently across all models or user agents.

_The limited applicability and non-universal nature of these sampling hyperparameters are discussed further in [issue #42](https://github.com/webmachinelearning/prompt-api/issues/42): sampling hyperparameters are not universal among models._

In extension and experimental contexts:
* The `LanguageModel.params()` static method provides default and maximum values for temperature and topK parameters, once the user agent has ascertained or downloaded the specific underlying model.
* The `temperature` and `topK` instance attributes provide the current values for these parameters for a given session.
* Sampling parameters can also be configured at session creation time via the `temperature` and `topK` options for `LanguageModel.create()`.

```js
// Sampling parameter support is limited to extension and experimental web contexts.
// Accessors are undefined, and options are ignored, outside of those contexts.
const customSession = await LanguageModel.create({
  temperature: 0.8,
  topK: 10
});
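
// params() fulfills with null if no language model is available in this browser.
const params = await LanguageModel.params();
if (params) {
  // `isCreativeTask` and `isGeneratingIdeas` stand in for app-specific conditions.
  const conditionalSession = await LanguageModel.create({
    temperature: isCreativeTask ? params.defaultTemperature * 1.1 : params.defaultTemperature * 0.8,
    topK: isGeneratingIdeas ? params.maxTopK : params.defaultTopK
  });
}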
```

If the language model is not available at all in this browser, `params()` will fulfill with `null`.

Error-handling behavior:

* If values below 0 are passed for `temperature`, then `create()` will return a promise rejected with a `RangeError`.
* If values above `maxTemperature` are passed for `temperature`, then `create()` will clamp to `maxTemperature`. (`+Infinity` is specifically allowed, as a way of requesting maximum temperature.)
* If values below 1 are passed for `topK`, then `create()` will return a promise rejected with a `RangeError`.
* If values above `maxTopK` are passed for `topK`, then `create()` will clamp to `maxTopK`. (This includes `+Infinity` and numbers above `Number.MAX_SAFE_INTEGER`.)
* If fractional values are passed for `topK`, they are rounded down (using the usual [IntegerPart](https://webidl.spec.whatwg.org/#abstract-opdef-integerpart) algorithm for web specs).
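
The rules above can be sketched as a plain function. This is a hypothetical helper (`canonicalizeSamplingOptions` is not part of the API) that assumes a `params` object shaped like the result of `LanguageModel.params()`. Unlike `create()`, it throws synchronously instead of returning a rejected promise, and it does not model cases the rules do not mention, such as `NaN`.

```js
// Hypothetical helper (not part of the Prompt API): applies the documented
// out-of-range rules for the experimental `temperature` and `topK` options.
// `params` is assumed to have the shape of a LanguageModelParams object.
function canonicalizeSamplingOptions({ temperature, topK } = {}, params) {
  const result = {};

  if (temperature !== undefined) {
    if (temperature < 0) {
      // create() would reject its promise with a RangeError instead.
      throw new RangeError(`temperature must not be negative (got ${temperature})`);
    }
    // Values above the maximum, including +Infinity, are clamped.
    result.temperature = Math.min(temperature, params.maxTemperature);
  }

  if (topK !== undefined) {
    if (topK < 1) {
      throw new RangeError(`topK must be at least 1 (got ${topK})`);
    }
    // Clamp first (covers +Infinity and values above Number.MAX_SAFE_INTEGER),
    // then drop the fractional part, which for positive numbers matches IntegerPart.
    result.topK = Math.trunc(Math.min(topK, params.maxTopK));
  }

  return result;
}

// Example values only; real values come from `await LanguageModel.params()`.
const params = { defaultTopK: 3, maxTopK: 8, defaultTemperature: 1, maxTemperature: 2 };
console.log(canonicalizeSamplingOptions({ temperature: Infinity, topK: 4.7 }, params));
// → { temperature: 2, topK: 4 }
```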

### Session persistence and cloning

Each language model session consists of a persistent series of interactions with the model:

```js
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a friendly, helpful assistant specialized in clothing choices."
  }]
});

const result = await session.prompt(`
  What should I wear today? It's sunny and I'm unsure between a t-shirt and a polo.
`);

console.log(result);

const result2 = await session.prompt(`
  That sounds great, but oh no, it's actually going to rain! New advice??
`);
```

Multiple unrelated continuations of the same prompt can be set up by creating a session and then cloning it:

```js
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a friendly, helpful assistant specialized in clothing choices."
  }]
});

const session2 = await session.clone();
```

The clone operation can be aborted using an `AbortSignal`:

```js
const controller = new AbortController();
const session2 = await session.clone({ signal: controller.signal });
```
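
Cloning makes it possible to explore several continuations from the same starting point. The following is one possible sketch; `promptEachInClone` is a hypothetical helper, and it runs the clones one after another rather than in parallel, on the assumption that each clone may hold significant resources, destroying each clone as soon as it has answered.

```js
// Hypothetical helper: answer each prompt in its own clone of `baseSession`,
// so the prompts neither see each other nor change `baseSession`'s history.
async function promptEachInClone(baseSession, prompts, { signal } = {}) {
  const answers = [];
  for (const input of prompts) {
    const branch = await baseSession.clone({ signal });
    try {
      answers.push(await branch.prompt(input, { signal }));
    } finally {
      // Release the clone right away rather than waiting for garbage collection.
      branch.destroy();
    }
  }
  return answers;
}
```

For example, `const [formal, casual] = await promptEachInClone(session, ["Suggest a formal outfit.", "Suggest a casual outfit."]);` asks both questions against the same system prompt without either answer influencing the other.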

### Session destruction

A language model session can be destroyed, either by using an `AbortSignal` passed to the `create()` method call:

```js
const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const session = await LanguageModel.create({ signal: controller.signal });
```

or by calling `destroy()` on the session:

```js
stopButton.onclick = () => session.destroy();
```

Destroying a session will have the following effects:

* If done before the promise returned by `create()` is settled:

  * Stop signaling any ongoing download progress for the language model. (The browser may also abort the download, or may continue it. Either way, no further `downloadprogress` events will fire.)

  * Reject the `create()` promise.

* Otherwise:

  * Reject any ongoing calls to `prompt()`.

  * Error any `ReadableStream`s returned by `promptStreaming()`.

* Most importantly, destroying the session allows the user agent to unload the language model from memory, if no other APIs or sessions are using it.

In all cases the exception used for rejecting promises or erroring `ReadableStream`s will be an `"AbortError"` `DOMException`, or the given abort reason.

The ability to manually destroy a session allows applications to free up memory without waiting for garbage collection, which can be useful since language models can be quite large.

### Aborting a specific prompt

Specific calls to `prompt()` or `promptStreaming()` can be aborted by passing an `AbortSignal` to them:

```js
const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const result = await session.prompt("Write me a poem", { signal: controller.signal });
```

Note that because sessions are stateful, and prompts can be queued, aborting a specific prompt is slightly complicated:

* If the prompt is still queued behind other prompts in the session, then it will be removed from the queue, and the returned promise will be rejected with an `"AbortError"` `DOMException`.
* If the model is currently responding to the prompt, then the response will be aborted, the prompt/response pair will be removed from the session, and the returned promise will be rejected with an `"AbortError"` `DOMException`.
* If the prompt has already been fully responded to by the model, then attempting to abort the prompt will do nothing.

Similarly, the `append()` operation can also be aborted. In this case the behavior is:

* If the append is queued behind other appends in the session, then it will be removed from the queue, and the returned promise will be rejected with an `"AbortError"` `DOMException`.
* If the append operation is currently ongoing, then it will be aborted, any part of the prompt that was appended so far will be removed from the session, and the returned promise will be rejected with an `"AbortError"` `DOMException`.
* If the append operation is complete (i.e., the returned promise has resolved), then attempting to abort it will do nothing. This includes all the following states:
  * The append operation is complete, but a prompt generation step has not yet triggered.
  * The append operation is complete, and a prompt generation step is processing.
  * The append operation is complete, and a prompt generation step has used it to produce a result.

Finally, note that if either prompting or appending has caused an [overflow](#tokenization-context-window-length-limits-and-overflow), aborting the operation does not re-introduce the overflowed messages into the session.
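
One practical use of these rules is giving up on a prompt after a deadline. Because an aborted prompt that was still queued or in progress is removed from the session, a timed-out call does not leave a half-finished exchange behind (although, as noted above, any messages already evicted by overflow are not restored). The sketch below is a minimal illustration; `promptWithTimeout` is a hypothetical helper, and it treats any `"AbortError"` rejection, which destroying the session can also produce, as "no answer".

```js
// Hypothetical helper: prompt `session`, but abort the call after `ms` milliseconds.
// Resolves with the response, or with null if the prompt was aborted.
async function promptWithTimeout(session, input, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await session.prompt(input, { signal: controller.signal });
  } catch (e) {
    // abort() without a reason sets signal.reason to an "AbortError" DOMException.
    if (e?.name === "AbortError") {
      return null;
    }
    throw e;
  } finally {
    // Don't leave the timer running once the prompt has settled.
    clearTimeout(timer);
  }
}
```

For example, `await promptWithTimeout(session, "Write me a poem", 30_000)` resolves to `null` if the prompt has not been fully answered within 30 seconds.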

### Tokenization, context window length limits, and overflow

A given language model session will have a maximum number of tokens it can process. Developers can check their current context usage and progress toward that limit by using the following properties on the session object:

```js
console.log(`${session.contextUsage} tokens used, out of ${session.contextWindow} tokens available.`);
```

To know how many tokens a prompt will consume, without actually processing it, developers can use the `measureContextUsage()` method. This method accepts the same input types as `prompt()`, including strings and multimodal input arrays:

```js
const stringUsage = await session.measureContextUsage(promptString);

const audioUsage = await session.measureContextUsage([{
  role: "user",
  content: [
    { type: "text", value: "My response to your critique:" },
    { type: "audio", value: audioBlob }
  ]
}]);
```

Some notes on this API:

* We do not expose the actual tokenization to developers since that would make it too easy to depend on model-specific details.
* Implementations must include in their count any control tokens that will be necessary to process the prompt, e.g. ones indicating the start or end of the input.
* The counting process can be aborted by passing an `AbortSignal`, e.g. `session.measureContextUsage(promptString, { signal })`.
* We use the phrases "context usage" and "context window" in the API, to avoid being specific to the current language model tokenization paradigm. In the future, even if we change paradigms, we anticipate some concept of usage and context window still being applicable, even if it's just string length.
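
Combining `measureContextUsage()` with the `contextUsage` and `contextWindow` properties, an application can check whether a prompt fits in the remaining context before sending it. This is a minimal sketch; `fitsInContext` is a hypothetical helper, and it assumes nothing else is queued or running on the session between the measurement and the eventual `prompt()` call, since other prompts or appends would change `contextUsage`.

```js
// Hypothetical helper: resolves to true if `input` fits in the space left in
// `session`'s context window, i.e. prompting with it should not cause overflow.
// `options` is forwarded to measureContextUsage() (e.g. { signal }).
async function fitsInContext(session, input, options = {}) {
  const needed = await session.measureContextUsage(input, options);
  // contextWindow may be +Infinity, in which case everything fits.
  const remaining = session.contextWindow - session.contextUsage;
  return needed <= remaining;
}
```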

It's possible to send a prompt that causes the context window to overflow. For example, consider a case where `await session.measureContextUsage(promptString)` is greater than `session.contextWindow - session.contextUsage`, and the web developer calls `session.prompt(promptString)` anyway. In such cases, the earliest portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the `initialPrompts`, which are never removed.

Such overflows can be detected by listening for the `"contextoverflow"` event on the session:

```js
session.addEventListener("contextoverflow", () => {
  console.log("We've gone past the context window, and some inputs will be dropped!");
});
```

If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `QuotaExceededError` exception and nothing will be removed. This is a proposed new type of exception, which subclasses `DOMException`, and replaces the web platform's existing `"QuotaExceededError"` `DOMException`. See [whatwg/webidl#1465](https://github.com/whatwg/webidl/pull/1465) for this proposal. For our purposes, the important part is that it has the following properties:

* `requested`: how many tokens the input consists of
* `quota`: how many tokens were available (which will be less than `requested`, and equal to the value of `session.contextWindow - session.contextUsage` at the time of the call)
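
The overflow rules in this section can be modeled with plain numbers. The sketch below is a hypothetical, simplified simulation, not how any user agent works: token counts are known up front, `initialPrompts` are represented by a single `initialTokens` count that is never evicted, and `SimulatedQuotaExceededError` is a stand-in for the proposed `QuotaExceededError`, carrying the `requested` and `quota` properties that proposal defines.

```js
// Hypothetical stand-in for the proposed QuotaExceededError.
class SimulatedQuotaExceededError extends Error {
  constructor(requested, quota) {
    super(`Input needs ${requested} tokens but only ${quota} are available.`);
    this.name = "QuotaExceededError";
    this.requested = requested;
    this.quota = quota;
  }
}

// `initialTokens`: tokens used by initialPrompts (never evicted).
// `pairs`: token counts of earlier prompt/response pairs, oldest first.
// Returns the pairs that remain after making room, and whether overflow happened.
function makeRoom({ contextWindow, initialTokens, pairs }, requested) {
  const usage = initialTokens + pairs.reduce((sum, n) => sum + n, 0);
  const available = contextWindow - usage;
  if (requested <= available) {
    return { pairs, overflowed: false };
  }
  // Even evicting every pair cannot make enough room: fail, removing nothing.
  if (requested > contextWindow - initialTokens) {
    throw new SimulatedQuotaExceededError(requested, available);
  }
  const remaining = [...pairs];
  let room = available;
  while (requested > room) {
    room += remaining.shift(); // evict the oldest prompt/response pair
  }
  // A real session would fire "contextoverflow" at this point.
  return { pairs: remaining, overflowed: true };
}
```

For example, with a 100-token window, 20 tokens of initial prompts, and earlier pairs of 30, 25, and 15 tokens, a 40-token prompt evicts only the oldest pair, while a 90-token prompt fails with `requested` 90 and `quota` 10, and nothing is removed.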

### Multilingual content and expected input languages

The default behavior for a language model session assumes that the input languages are unknown. In this case, implementations will use whatever "base" capabilities they have available for the language model, and might throw `"NotSupportedError"` `DOMException`s if they encounter languages they don't support.

It's better practice, if possible, to supply the `create()` method with information about the expected input languages. This allows the implementation to download any necessary supporting material, such as fine-tunings or safety-checking models, and to immediately reject the promise returned by `create()` if the web developer needs to use languages that the browser is not capable of supporting:

```js
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: `
      You are a foreign-language tutor for Japanese. The user is Korean. If necessary, either you or
      the user might "break character" and ask for or give clarification in Korean. But by default,
      prefer speaking in Japanese, and return to the Japanese conversation once any sidebars are
      concluded.
    `
  }],
  expectedInputs: [{
    type: "text",
    languages: ["en" /* for the system prompt */, "ja", "ko"]
  }],
  // See below section
  expectedOutputs: [{
    type: "text",
    languages: ["ja", "ko"]
  }],
});
```

The expected input languages are supplied alongside the [expected input types](#multimodal-inputs), and can vary per type. Our example above uses only `type: "text"`, but more complicated combinations are possible, e.g.:

```js
const session = await LanguageModel.create({
  expectedInputs: [
    // Be sure to download any material necessary for English and Japanese text
    // prompts, or fail-fast if the model cannot support that.
    { type: "text", languages: ["en", "ja"] },

    // `languages` omitted: audio input processing will be best-effort based on
    // the base model's capability.
    { type: "audio" },

    // Be sure to download any material necessary for OCRing French text in
    // images, or fail-fast if the model cannot support that.
    { type: "image", languages: ["fr"] }
  ]
});
```

Note that the expected input languages do not affect the context or prompt the language model sees; they only impact the process of setting up the session, performing appropriate downloads, and failing creation if those input languages are unsupported.

If you want to check the availability of a given `expectedInputs` configuration before initiating session creation, you can use the `LanguageModel.availability()` method:

```js
const availability = await LanguageModel.availability({
  expectedInputs: [
    { type: "text", languages: ["en", "ja"] },
    { type: "audio", languages: ["en", "ja"] }
  ]
});

// `availability` will be one of "unavailable", "downloadable", "downloading", or "available".
```

### Expected output languages

In general, what output language the model responds in will be governed by the language model's own decisions. For example, a prompt such as "Please say something in French" could produce "Bonjour" or it could produce "I'm sorry, I don't know French".

However, if you know ahead of time what languages you are hoping for the language model to output, it's best practice to use the `expectedOutputs` option to `LanguageModel.create()` to indicate them. This allows the implementation to download any necessary supporting material for those output languages, and to immediately reject the returned promise if it's known that the model cannot support that language:

```js
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: `You are a helpful, harmless French chatbot.`
  }],
  expectedInputs: [
    { type: "text", languages: ["en" /* for the system prompt */, "fr"] }
  ],
  expectedOutputs: [
    { type: "text", languages: ["fr"] }
  ]
});
```

As with `expectedInputs`, specifying a given language in `expectedOutputs` does not actually influence the language model's output. It's only expressing an expectation that can help set up the session, perform downloads, and fail creation if necessary. And as with `expectedInputs`, you can use `LanguageModel.availability()` to check ahead of time, before creating a session.

(Note that presently, the prompt API does not support multimodal outputs, so including any array entries with `type`s other than `"text"` will always fail. However, we've chosen this general shape so that in the future, if multimodal output support is added, it fits into the API naturally.)

### Testing available options before creation

In the simple case, web developers should call `LanguageModel.create()`, and handle failures gracefully.

However, if the web developer wants to provide a differentiated user experience, which lets users know ahead of time that the feature will not be possible or might require a download, they can use the promise-returning `LanguageModel.availability()` method. This method lets developers know, before calling `create()`, what is possible with the implementation.

The method will return a promise that fulfills with one of the following availability values:

* "`unavailable`" means that the implementation does not support the requested options, or does not support prompting a language model at all.
* "`downloadable`" means that the implementation supports the requested options, but it will have to download something (e.g. the language model itself, or a fine-tuning) before it can create a session using those options.
* "`downloading`" means that the implementation supports the requested options, but will need to finish an ongoing download operation before it can create a session using those options.
* "`available`" means that the implementation supports the requested options without requiring any new downloads.

An example usage is the following:

```js
const options = {
  expectedInputs: [
    { type: "text", languages: ["en", "es"] },
    { type: "audio", languages: ["en", "es"] }
  ]
};

const availability = await LanguageModel.availability(options);

if (availability !== "unavailable") {
  if (availability !== "available") {
    console.log("Sit tight, we need to do some downloading...");
  }

  const session = await LanguageModel.create(options);
  // ... Use session ...
} else {
  // Either the API overall is unavailable, or it does not support the expected input languages or modalities.
  console.error("No language model for us :(");
}
```

### Download progress

For cases where using the API is only possible after a download, you can monitor the download progress (e.g. in order to show your users a progress bar) using code such as the following:

```js
const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener("downloadprogress", e => {
      console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
    });
  }
});
```

If the download fails, then `downloadprogress` events will stop being emitted, and the promise returned by `create()` will be rejected with a "`NetworkError`" `DOMException`.

Note that in the case that multiple entities are downloaded (e.g., a base model plus [LoRA fine-tunings](https://arxiv.org/abs/2106.09685) for the `expectedInputs`), web developers do not get the ability to monitor the individual downloads. All of them are bundled into the overall `downloadprogress` events, and the `create()` promise is not fulfilled until all downloads and loads are successful.

The event is a [`ProgressEvent`](https://developer.mozilla.org/en-US/docs/Web/API/ProgressEvent) whose `loaded` property is between 0 and 1, and whose `total` property is always 1. (The exact numbers of total and downloaded bytes are not exposed; see the discussion in [webmachinelearning/writing-assistance-apis issue #15](https://github.com/webmachinelearning/writing-assistance-apis/issues/15).)

At least two events, with `e.loaded === 0` and `e.loaded === 1`, will always be fired. This is true even if creating the model doesn't require any downloading.
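
Because `loaded` is always a fraction between 0 and 1, it converts directly into a percentage for display. The sketch below wraps this in a hypothetical `createWithProgress` helper; `onProgress` is a caller-supplied callback, the `languageModel` parameter exists only so that a stand-in object can be substituted for testing, and any `monitor` already present in `options` is replaced.

```js
// Hypothetical helper: create a session while reporting download progress as a
// whole-number percentage. Any `monitor` in `options` is overridden.
function createWithProgress(options, onProgress, languageModel = globalThis.LanguageModel) {
  return languageModel.create({
    ...options,
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // `loaded` is a fraction between 0 and 1, and `total` is always 1.
        onProgress(Math.round(e.loaded * 100));
      });
    }
  });
}
```

Since events with `loaded` equal to 0 and 1 are always fired, `onProgress` will be called with at least 0 and 100, even when no download is needed.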

<details>
<summary>What's up with this pattern?</summary>

This pattern is a little involved. Several alternatives have been considered. However, after asking around the web standards community, this one seemed best, as it allows using standard event handlers and `ProgressEvent`s, and also ensures that once the promise is settled, the session object is completely ready to use.

It is also nicely future-extensible by adding more events and properties to the `m` object.

Finally, note that there is a sort of precedent in the (never-shipped) [`FetchObserver` design](https://github.com/whatwg/fetch/issues/447#issuecomment-281731850).
</details>

## Detailed design

### Instruction-tuned versus base models

We intend for this API to expose instruction-tuned models. Although we cannot mandate any particular level of quality or instruction-following capability, we think setting this base expectation can help ensure that what browsers ship is aligned with what web developers expect.

To illustrate the difference and how it impacts web developer expectations:

* In a base model, a prompt like "Write a poem about trees." might get completed with "... Write about the animal you would like to be. Write about a conflict between a brother and a sister." (etc.) It is directly completing plausible next tokens in the text sequence.
* Whereas, in an instruction-tuned model, the model will generally _follow_ instructions like "Write a poem about trees.", and respond with a poem about trees.

To ensure the API can be used by web developers across multiple implementations, all browsers should be sure their models behave like instruction-tuned models.

### Permissions policy, iframes, and workers

By default, this API is only available to top-level `Window`s, and to their same-origin iframes. Access to the API can be delegated to cross-origin iframes using the [Permissions Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Permissions_Policy) `allow=""` attribute:

```html
<iframe src="https://example.com/" allow="language-model"></iframe>
```

This API is currently not available in workers, due to the complexity of establishing a responsible document for each worker in order to check the permissions policy status. See [this discussion](https://github.com/webmachinelearning/translation-api/issues/18#issuecomment-2705630392) for more. It may be possible to loosen this restriction over time, if use cases arise.

Note that although the API is not exposed to web platform workers, a browser could expose it to extension service workers, which are outside the scope of web platform specifications and have a different permissions model.

## Alternatives considered and under consideration

### How many stages to reach a response?

To actually get a response back from the model given a prompt, the following possible stages are involved:

1. Download the model, if necessary.
2. Establish a session, including configuring per-session options.
3. Add an initial prompt to establish context. (This will not generate a response.)
4. Execute a prompt and receive a response.

We've chosen to manifest these 3-4 stages into the API as two methods, `LanguageModel.create()` and `session.prompt()`/`session.promptStreaming()`, with some additional facilities for dealing with the fact that `LanguageModel.create()` can include a download step. Some APIs simplify this into a single method, and some split it up into three (usually not four).

### Stateless or session-based

Our design here uses [sessions](#session-persistence-and-cloning). An alternate design, seen in some APIs, is to require the developer to feed in the entire conversation history to the model each time, keeping track of the results.

This can be slightly more flexible; for example, it allows manually correcting the model's responses before feeding them back into the context window.

However, our understanding is that the session-based model can be more efficiently implemented, at least for browsers with on-device models. (Implementing it for a cloud-based model would likely be more work.) And, developers can always achieve a stateless model by using a new session for each interaction.
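
As a sketch of that last point, assuming a caller-managed `history` array of messages: each call replays the whole history into a fresh session via `initialPrompts`, so the caller can edit earlier turns (for example, to correct a model response) before the next call. `statelessPrompt` is a hypothetical helper, and the `languageModel` parameter exists only so that a stand-in object can be substituted for testing. This approach likely gives up the efficiency advantage of sessions, since the history has to be processed again on every call.

```js
// Hypothetical helper: emulate a stateless API by replaying the full conversation
// into a fresh session on every call. `history` is an array of LanguageModelMessage
// objects ({ role, content }) that the caller owns and may edit between calls.
async function statelessPrompt(history, input, languageModel = globalThis.LanguageModel) {
  const session = await languageModel.create({ initialPrompts: history });
  try {
    return await session.prompt(input);
  } finally {
    // Free the session right away instead of waiting for garbage collection.
    session.destroy();
  }
}
```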

## Privacy and security considerations

Please see [the _Writing Assistance APIs_ specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), where we have centralized the normative privacy and security considerations that apply to all APIs of this type.

## Stakeholder feedback

* W3C TAG: [w3ctag/design-reviews#1093](https://github.com/w3ctag/design-reviews/issues/1093)
* Browser engines and browsers:
  * Chromium: prototyping behind a flag
  * Mozilla: [mozilla/standards-positions#1213](https://github.com/mozilla/standards-positions/issues/1213)
  * WebKit: [WebKit/standards-positions#495](https://github.com/WebKit/standards-positions/issues/495)
* Web developers: positive
  * See [issue #74](https://github.com/webmachinelearning/prompt-api/issues/74) for some developer feedback
  * Examples of organic enthusiasm: [X post](https://x.com/mortenjust/status/1805190952358650251), [blog post](https://tyingshoelaces.com/blog/chrome-ai-prompt-api), [blog post](https://labs.thinktecture.com/local-small-language-models-in-the-browser-a-first-glance-at-chromes-built-in-ai-and-prompt-api-with-gemini-nano/)
  * [Feedback from developer surveys](https://docs.google.com/presentation/d/1DhFC2oB4PRrchavxUY3h9U4w4hrX5DAc5LoMqhn5hnk/edit#slide=id.g349a9ada368_1_6327)


================================================
FILE: index.bs
================================================
<pre class='metadata'>
Title: Prompt API
Shortname: prompt
Level: None
Status: CG-DRAFT
Group: webml
Repository: webmachinelearning/prompt-api
URL: https://webmachinelearning.github.io/prompt-api
Editor: Reilly Grant 83788, Google https://www.google.com, reillyg@google.com
Former editor: Domenic Denicola 52873, Google https://www.google.com/, d@domenic.me, https://domenic.me/
Abstract: The prompt API gives web pages the ability to directly prompt a language model.
Markup Shorthands: markdown yes, css no
Complain About: accidental-2119 yes, missing-example-ids yes
Assume Explicit For: yes
Default Biblio Status: current
Boilerplate: omit conformance
Indent: 2
Die On: warning
</pre>

<pre class="link-defaults">
spec:webidl; type:exception; text:TypeError
spec:webidl; type:exception; text:SyntaxError
</pre>

<h2 id="intro">Introduction</h2>

The Prompt API gives web pages the ability to directly prompt a browser-provided language model. It provides a uniform JavaScript API that abstracts away specific details of the underlying model (such as templating or tokenization). By leveraging built-in language models, it offers benefits such as local processing of sensitive data, offline usage, model sharing, and reduced cost compared to cloud-based or bring-your-own-model approaches.

<h2 id="dependencies">Dependencies</h2>

This specification depends on the Infra Standard. [[!INFRA]]

As with the rest of the web platform, human languages are identified in these APIs by BCP 47 language tags, such as "`ja`", "`en-US`", "`sr-Cyrl`", or "`de-CH-1901-x-phonebk-extended`". The specific algorithms used for validation, canonicalization, and language tag matching are those from the <cite>ECMAScript Internationalization API Specification</cite>, which in turn defers some of its processing to <cite>Unicode Locale Data Markup Language (LDML)</cite>. [[BCP47]] [[!ECMA-402]] [[UTS35]].

These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]], and the common privacy and security considerations are discussed in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Implementing these APIs requires implementing that shared infrastructure, and conforming to those privacy and security considerations. But it does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]

<h2 id="api">The API</h2>

<xmp class="idl">
[Exposed=Window, SecureContext]
interface LanguageModel : EventTarget {
  static Promise<LanguageModel> create(optional LanguageModelCreateOptions options = {});
  static Promise<Availability> availability(optional LanguageModelCreateCoreOptions options = {});
  // **EXPERIMENTAL**: Only available in extension and experimental contexts.
  static Promise<LanguageModelParams?> params();

  // These will throw "NotSupportedError" DOMExceptions if role = "system"
  Promise<DOMString> prompt(
    LanguageModelPrompt input,
    optional LanguageModelPromptOptions options = {}
  );
  ReadableStream promptStreaming(
    LanguageModelPrompt input,
    optional LanguageModelPromptOptions options = {}
  );
  Promise<undefined> append(
    LanguageModelPrompt input,
    optional LanguageModelAppendOptions options = {}
  );


  Promise<double> measureContextUsage(
    LanguageModelPrompt input,
    optional LanguageModelPromptOptions options = {}
  );
  readonly attribute double contextUsage;
  readonly attribute unrestricted double contextWindow;
  attribute EventHandler oncontextoverflow;

  // **DEPRECATED**: This method is only available in extension contexts.
  Promise<double> measureInputUsage(
    LanguageModelPrompt input,
    optional LanguageModelPromptOptions options = {}
  );
  // **DEPRECATED**: This attribute is only available in extension contexts.
  readonly attribute double inputUsage;
  // **DEPRECATED**: This attribute is only available in extension contexts.
  readonly attribute unrestricted double inputQuota;
  // **DEPRECATED**: This attribute is only available in extension contexts.
  attribute EventHandler onquotaoverflow;

  // **EXPERIMENTAL**: Only available in extension and experimental contexts.
  readonly attribute unsigned long topK;
  // **EXPERIMENTAL**: Only available in extension and experimental contexts.
  readonly attribute float temperature;

  Promise<LanguageModel> clone(optional LanguageModelCloneOptions options = {});
};
LanguageModel includes DestroyableModel;

// **EXPERIMENTAL**: Only available in extension and experimental contexts.
[Exposed=Window, SecureContext]
interface LanguageModelParams {
  readonly attribute unsigned long defaultTopK;
  readonly attribute unsigned long maxTopK;
  readonly attribute float defaultTemperature;
  readonly attribute float maxTemperature;
};


callback LanguageModelToolFunction = Promise<DOMString> (any... arguments);

// A description of a tool call that a language model can invoke.
dictionary LanguageModelTool {
  required DOMString name;
  required DOMString description;
  // JSON schema for the input parameters.
  required object inputSchema;
  // The function to be invoked by the user agent on behalf of the language model.
  required LanguageModelToolFunction execute;
};

dictionary LanguageModelCreateCoreOptions {
  // Note: these two have custom out-of-range handling behavior, not in the IDL layer.
  // They are unrestricted double so as to allow +Infinity without failing.
  // **EXPERIMENTAL**: Only available in extension and experimental contexts.
  unrestricted double topK;
  // **EXPERIMENTAL**: Only available in extension and experimental contexts.
  unrestricted double temperature;

  sequence<LanguageModelExpected> expectedInputs;
  sequence<LanguageModelExpected> expectedOutputs;
  sequence<LanguageModelTool> tools;
};

dictionary LanguageModelCreateOptions : LanguageModelCreateCoreOptions {
  AbortSignal signal;
  CreateMonitorCallback monitor;

  sequence<LanguageModelMessage> initialPrompts;
};

dictionary LanguageModelPromptOptions {
  object responseConstraint;
  boolean omitResponseConstraintInput = false;
  AbortSignal signal;
};

dictionary LanguageModelAppendOptions {
  AbortSignal signal;
};

dictionary LanguageModelCloneOptions {
  AbortSignal signal;
};

dictionary LanguageModelExpected {
  required LanguageModelMessageType type;
  sequence<DOMString> languages;
};

// The argument to the prompt() method and others like it

typedef (
  sequence<LanguageModelMessage>
  // Shorthand for `[{ role: "user", content: [{ type: "text", value: providedValue }] }]`
  or DOMString
) LanguageModelPrompt;

dictionary LanguageModelMessage {
  required LanguageModelMessageRole role;

  // The DOMString branch is shorthand for `[{ type: "text", value: providedValue }]`
  required (DOMString or sequence<LanguageModelMessageContent>) content;

  boolean prefix = false;
};

dictionary LanguageModelMessageContent {
  required LanguageModelMessageType type;
  required LanguageModelMessageValue value;
};

enum LanguageModelMessageRole { "system", "user", "assistant" };

enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };

typedef (
  ImageBitmapSource
  or AudioBuffer
  or BufferSource
  or DOMString
) LanguageModelMessageValue;
</xmp>

<h3 id="language-model-creation">Creation</h3>

<div algorithm>
  The static <dfn method for="LanguageModel">create(|options|)</dfn> method steps are:

  1. Return the result of [=creating an AI model object=] given |options|, "{{language-model}}", [=validate and canonicalize language model options=], [=compute language model options availability=], [=download the language model=], [=initialize the language model=], [=create a language model object=], and false.
</div>

<div algorithm>
  To <dfn>validate and canonicalize language model options</dfn> given a {{LanguageModelCreateCoreOptions}} |options|, perform the following steps. They mutate |options| in place to canonicalize and deduplicate language tags, and throw an exception if any are invalid.

  1. If |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] [=map/exists=], then [=list/for each=] |expected| of |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"]:
    1. If |expected|["{{LanguageModelExpected/languages}}"] [=map/exists=], then [=Validate and canonicalize language tags=] given |expected| and "{{LanguageModelExpected/languages}}".

  1. If |options|["{{LanguageModelCreateCoreOptions/expectedOutputs}}"] [=map/exists=], then [=list/for each=] |expected| of |options|["{{LanguageModelCreateCoreOptions/expectedOutputs}}"]:
    1. If |expected|["{{LanguageModelExpected/languages}}"] [=map/exists=], then [=validate and canonicalize language tags=] given |expected| and "{{LanguageModelExpected/languages}}".

  1. If |options|["{{LanguageModelCreateOptions/initialPrompts}}"] [=map/exists=], then:
    1. Let |expectedInputs| be |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] if it [=map/exists=]; otherwise an empty [=list=].
    1. Let |expectedInputTypes| be the result of [=get the expected content types=] given |expectedInputs|.
    1. Perform [=validating and canonicalizing a prompt=] given |options|["{{LanguageModelCreateOptions/initialPrompts}}"], |expectedInputTypes|, and false.
</div>
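For illustration only, the language tag steps above can be approximated with JavaScript's `Intl.getCanonicalLocales()`. That function canonicalizes and deduplicates BCP 47 tags and throws a `RangeError` for tags that are not well-formed. This non-normative sketch assumes that approximation, although the normative algorithm may differ in details such as which exception is thrown. It uses the hypothetical helpers `canonicalizeExpected` and `canonicalizeOptions` and does not cover the `initialPrompts` validation step.

```javascript
// Hypothetical helper: canonicalizes `languages` on each expected entry in place,
// approximating the normative steps with Intl.getCanonicalLocales().
function canonicalizeExpected(expectedList) {
  for (const expected of expectedList ?? []) {
    if (expected.languages !== undefined) {
      // Canonicalizes casing, removes duplicates, and throws RangeError on invalid tags.
      expected.languages = Intl.getCanonicalLocales(expected.languages);
    }
  }
}

// Hypothetical helper: applies the approximation to expected inputs and outputs.
function canonicalizeOptions(options) {
  canonicalizeExpected(options.expectedInputs);
  canonicalizeExpected(options.expectedOutputs);
  return options;
}

const opts = canonicalizeOptions({
  expectedInputs: [{ type: "text", languages: ["EN-us", "en-US", "ja"] }],
});
console.log(opts.expectedInputs[0].languages); // [ 'en-US', 'ja' ]
```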

<div algorithm>
  To <dfn>download the language model</dfn>, given a {{LanguageModelCreateCoreOptions}} |options|:

  1. [=Assert=]: these steps are running [=in parallel=].

  1. Initiate the download process for everything the user agent needs to prompt a language model according to |options|. This could include a base AI model, fine-tunings for specific languages or option values, or other resources.

  1. If the download process cannot be started for any reason, then return false.

  1. Return true.
</div>

<div algorithm>
  To <dfn>initialize the language model</dfn>, given a {{LanguageModelCreateOptions}} |options|:

  1. [=Assert=]: these steps are running [=in parallel=].

  1. Let |availability| be the result of [=compute language model options availability=] given |options|.

  1. If |availability| is null or "{{Availability/unavailable}}", then return a [=DOMException error information=] whose [=DOMException error information/name=] is "{{NotSupportedError}}" and whose [=DOMException error information/details=] contain appropriate detail.

  1. Perform any necessary initialization operations for the AI model backing the user agent's prompting capabilities.

    This could include loading the appropriate model and any fine-tunings necessary to support |options| into memory.

    1. If |options|["{{LanguageModelCreateOptions/initialPrompts}}"] [=map/exists=], then:
      1. Let |expectedInputs| be |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] if it [=map/exists=]; otherwise an empty [=list=].
      1. Let |expectedInputTypes| be the result of [=get the expected content types=] given |expectedInputs|.
      1. Let |initialMessages| be the result of [=validating and canonicalizing a prompt=] given |options|["{{LanguageModelCreateOptions/initialPrompts}}"], |expectedInputTypes|, and false.
      1. Load |initialMessages| into the model's context window.

    1. If |options|["{{LanguageModelCreateCoreOptions/tools}}"] [=map/exists=], then load |options|["{{LanguageModelCreateCoreOptions/tools}}"] into the model's context window.

  1. If initialization failed because the process of loading |options| resulted in using up all of the model's context window, then:

    1. Let |requested| be the amount of context window needed to encode |options|. The encoding of |options| as input is [=implementation-defined=].

    1. Let |maximum| be the maximum context window size that the user agent supports.

    1. [=Assert=]: |requested| is greater than |maximum|. (That is how we reached this error branch.)

    1. Return a [=quota exceeded error information=] whose [=QuotaExceededError/requested=] is |requested| and [=QuotaExceededError/quota=] is |maximum|.

  1. If initialization failed for any other reason, then return a [=DOMException error information=] whose [=DOMException error information/name=] is "{{OperationError}}" and whose [=DOMException error information/details=] contain appropriate detail.

  1. Return null.
</div>

<div algorithm>
  To <dfn>create a language model object</dfn>, given a [=ECMAScript/realm=] |realm| and a {{LanguageModelCreateOptions}} |options|:

  1. [=Assert=]: these steps are running on |realm|'s [=ECMAScript/surrounding agent=]'s [=agent/event loop=].

  1. Let |contextWindowSize| be the amount of context window that is available to the user agent for this model. (This value is [=implementation-defined=], and may be +∞ if there are no specific limits beyond, e.g., the user's memory, or the limits of JavaScript strings.)

  1. Let |initialMessages| be an empty [=list=] of {{LanguageModelMessage}}s.

  1. Let |initialMessagesUsage| be 0.

  1. If |options|["{{LanguageModelCreateOptions/initialPrompts}}"] [=map/exists=], then:
    1. Let |expectedInputs| be |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] if it [=map/exists=]; otherwise an empty [=list=].
    1. Let |expectedInputTypes| be the result of [=get the expected content types=] given |expectedInputs|.
    1. Set |initialMessages| to the result of [=validating and canonicalizing a prompt=] given |options|["{{LanguageModelCreateOptions/initialPrompts}}"], |expectedInputTypes|, and false.
    1. Set |initialMessagesUsage| to the result of [=measure language model context usage=] given |initialMessages| and |options|["{{LanguageModelCreateOptions/signal}}"].

  1. Return a new {{LanguageModel}} object, created in |realm|, with

    <dl class="props">
      : [=LanguageModel/initial messages=]
      :: |initialMessages|

      : [=LanguageModel/top K=]
      :: |options|["{{LanguageModelCreateCoreOptions/topK}}"] if it [=map/exists=]; otherwise an [=implementation-defined=] value

      : [=LanguageModel/temperature=]
      :: |options|["{{LanguageModelCreateCoreOptions/temperature}}"] if it [=map/exists=]; otherwise an [=implementation-defined=] value

      : [=LanguageModel/expected inputs=]
      :: |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] if it [=map/exists=]; otherwise an empty [=list=]

      : [=LanguageModel/expected outputs=]
      :: |options|["{{LanguageModelCreateCoreOptions/expectedOutputs}}"] if it [=map/exists=]; otherwise an empty [=list=]

      : [=LanguageModel/tools=]
      :: |options|["{{LanguageModelCreateCoreOptions/tools}}"] if it [=map/exists=]; otherwise an empty [=list=]

      : [=LanguageModel/context window size=]
      :: |contextWindowSize|

      : [=LanguageModel/current context usage=]
      :: |initialMessagesUsage|
    </dl>
</div>

<h3 id="language-model-availability">Availability</h3>

<div algorithm>
  The static <dfn method for="LanguageModel">availability(|options|)</dfn> method steps are:

  1. Return the result of [=computing AI model availability=] given |options|, "{{language-model}}", [=validate and canonicalize language model options=], and [=compute language model options availability=].
</div>

<div algorithm>
  To <dfn>compute language model options availability</dfn> given a {{LanguageModelCreateCoreOptions}} |options|, perform the following steps. They return either an {{Availability}} value or null, and they mutate |options| in place to update language tags to their best-fit matches.

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. Let |availability| be the [=language model non-options availability=].

  1. If |availability| is null, then return null.

  1. Let |availabilities| be a [=list=] containing |availability|.

  1. Let |inputPartition| be the result of [=getting the language availabilities partition=] given the purpose of prompting a language model with text in that language.

  1. Let |outputPartition| be the result of [=getting the language availabilities partition=] given the purpose of producing language model output in that language.

  1. If |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"] [=map/exists=], then [=list/for each=] |expected| of |options|["{{LanguageModelCreateCoreOptions/expectedInputs}}"]:
    1. If |expected|["{{LanguageModelExpected/languages}}"] [=map/exists=], then:
      1. Let |inputLanguageAvailability| be the result of [=computing language availability=] given |expected|["{{LanguageModelExpected/languages}}"] and |inputPartition|.
      1. [=list/Append=] |inputLanguageAvailability| to |availabilities|.
    1. Let |inputTypeAvailability| be the [=language model content type availability=] given |expected|["{{LanguageModelExpected/type}}"] and true.
    1. [=list/Append=] |inputTypeAvailability| to |availabilities|.

  1. If |options|["{{LanguageModelCreateCoreOptions/expectedOutputs}}"] [=map/exists=], then [=list/for each=] |expected| of |options|["{{LanguageModelCreateCoreOptions/expectedOutputs}}"]:
    1. If |expected|["{{LanguageModelExpected/languages}}"] [=map/exists=], then:
      1. Let |outputLanguageAvailability| be the result of [=computing language availability=] given |expected|["{{LanguageModelExpected/languages}}"] and |outputPartition|.
      1. [=list/Append=] |outputLanguageAvailability| to |availabilities|.
    1. Let |outputTypeAvailability| be the [=language model content type availability=] given |expected|["{{LanguageModelExpected/type}}"] and false.
    1. [=list/Append=] |outputTypeAvailability| to |availabilities|.

  1. Return the [=Availability/minimum availability=] given |availabilities|.
</div>
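The final step combines every component availability into one value. The non-normative sketch below illustrates that combination under an assumed ordering from least to most available (`"unavailable"`, `"downloadable"`, `"downloading"`, `"available"`), with null standing for a transient failure. The ordering is an assumption made for illustration; [=Availability/minimum availability=] remains the normative definition. `minimumAvailability` and `optionsAvailability` are hypothetical names.

```javascript
// Assumed ordering from least to most available (illustrative only).
const ORDER = ["unavailable", "downloadable", "downloading", "available"];

// Returns the least available entry; null (a transient error) wins outright.
function minimumAvailability(availabilities) {
  if (availabilities.includes(null)) return null;
  let min = "available";
  for (const availability of availabilities) {
    if (ORDER.indexOf(availability) < ORDER.indexOf(min)) min = availability;
  }
  return min;
}

// Mirrors the overall shape: the base availability first, then one entry per
// expected input/output language set and content type.
function optionsAvailability(base, componentAvailabilities) {
  if (base === null) return null;
  return minimumAvailability([base, ...componentAvailabilities]);
}

console.log(optionsAvailability("available", ["available", "downloadable"])); // downloadable
```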

<div algorithm>
  The <dfn>language model non-options availability</dfn> is given by the following steps. They return an {{Availability}} value or null.

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. If there is some error attempting to determine whether the user agent [=model availability/can support=] prompting a language model, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.

  1. If the user agent [=model availability/currently supports=] prompting a language model, then return "{{Availability/available}}".

  1. If the user agent believes it will be able to [=model availability/support=] prompting a language model, but only after finishing a download that is already ongoing, then return "{{Availability/downloading}}".

  1. If the user agent believes it will be able to [=model availability/support=] prompting a language model, but only after performing a not-currently-ongoing download, then return "{{Availability/downloadable}}".

  1. Otherwise, return "{{Availability/unavailable}}".
</div>

<div algorithm>
  The <dfn>language model content type availability</dfn> given a {{LanguageModelMessageType}} |type| and a boolean |isInput| is given by the following steps. They return an {{Availability}} value.

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. If the user agent [=model availability/currently supports=] |type| as an input (when |isInput| is true) or as an output (when |isInput| is false), then return "{{Availability/available}}".

  1. If the user agent believes it will be able to [=model availability/support=] |type| as such, but only after finishing a download that is already ongoing, then return "{{Availability/downloading}}".

  1. If the user agent believes it will be able to [=model availability/support=] |type| as such, but only after performing a not-currently-ongoing download, then return "{{Availability/downloadable}}".

  1. Otherwise, return "{{Availability/unavailable}}".
</div>

<h3 id="the-languagemodel-class">The {{LanguageModel}} class</h3>

Every {{LanguageModel}} has <dfn for="LanguageModel">initial messages</dfn>, a [=list=] of {{LanguageModelMessage}}s, set during creation.

Every {{LanguageModel}} has a <dfn for="LanguageModel">top K</dfn>, an unsigned long, set during creation.

Every {{LanguageModel}} has a <dfn for="LanguageModel">temperature</dfn>, a float, set during creation.

Every {{LanguageModel}} has <dfn for="LanguageModel">expected inputs</dfn>, a [=list=] of {{LanguageModelExpected}}s, set during creation.

Every {{LanguageModel}} has <dfn for="LanguageModel">expected outputs</dfn>, a [=list=] of {{LanguageModelExpected}}s, set during creation.

Every {{LanguageModel}} has <dfn for="LanguageModel">tools</dfn>, a [=list=] of {{LanguageModelTool}}s, set during creation.

Every {{LanguageModel}} has a <dfn for="LanguageModel">context window size</dfn>, an unrestricted double, set during creation.

Every {{LanguageModel}} has a <dfn for="LanguageModel">current context usage</dfn>, a double, initially 0.

<hr>

The <dfn attribute for="LanguageModel">contextUsage</dfn> getter steps are to return [=this=]'s [=LanguageModel/current context usage=].

The <dfn attribute for="LanguageModel">inputUsage</dfn> getter steps are to return [=this=]'s [=LanguageModel/current context usage=].

The <dfn attribute for="LanguageModel">contextWindow</dfn> getter steps are to return [=this=]'s [=LanguageModel/context window size=].

The <dfn attribute for="LanguageModel">inputQuota</dfn> getter steps are to return [=this=]'s [=LanguageModel/context window size=].

The <dfn attribute for="LanguageModel">topK</dfn> getter steps are to return [=this=]'s [=LanguageModel/top K=].

The <dfn attribute for="LanguageModel">temperature</dfn> getter steps are to return [=this=]'s [=LanguageModel/temperature=].

<hr>

The following are the [=event handlers=] (and their corresponding [=event handler event types=]) that must be supported, as [=event handler IDL attributes=], by all {{LanguageModel}} objects:

<table>
  <thead>
    <tr>
      <th>[=Event handler=]
      <th>[=Event handler event type=]
  <tbody>
    <tr>
      <td><dfn attribute for="LanguageModel">oncontextoverflow</dfn>
      <td><dfn event for="LanguageModel">contextoverflow</dfn>
    <tr>
      <td><dfn attribute for="LanguageModel">onquotaoverflow</dfn>
      <td><dfn event for="LanguageModel">quotaoverflow</dfn>
</table>

<hr>

<div algorithm>
  The <dfn method for="LanguageModel">prompt(|input|, |options|)</dfn> method steps are:

  1. Let |responseConstraint| be |options|["{{LanguageModelPromptOptions/responseConstraint}}"] if it [=map/exists=]; otherwise null.

  1. Let |omitResponseConstraintInput| be |options|["{{LanguageModelPromptOptions/omitResponseConstraintInput}}"].

  1. Let |operation| be an algorithm step which takes arguments |chunkProduced|, |done|, |error|, and |stopProducing|, and performs the following steps:

    1. Let |prefillSuccess| be the result of [=prefilling=] given [=this=], |input|, |omitResponseConstraintInput|, |responseConstraint|, |error|, and |stopProducing|.

    1. If |prefillSuccess| is true, then [=generate=] given [=this=], |responseConstraint|, |chunkProduced|, |done|, |error|, and |stopProducing|.

  1. Return the result of [=getting an aggregated AI model result=] given [=this=], |options|, and |operation|.
</div>

<div algorithm>
  The <dfn method for="LanguageModel">promptStreaming(|input|, |options|)</dfn> method steps are:

  1. Let |responseConstraint| be |options|["{{LanguageModelPromptOptions/responseConstraint}}"] if it [=map/exists=]; otherwise null.

  1. Let |omitResponseConstraintInput| be |options|["{{LanguageModelPromptOptions/omitResponseConstraintInput}}"].

  1. Let |operation| be an algorithm step which takes arguments |chunkProduced|, |done|, |error|, and |stopProducing|, and performs the following steps:

    1. Let |prefillSuccess| be the result of [=prefilling=] given [=this=], |input|, |omitResponseConstraintInput|, |responseConstraint|, |error|, and |stopProducing|.

    1. If |prefillSuccess| is true, then [=generate=] given [=this=], |responseConstraint|, |chunkProduced|, |done|, |error|, and |stopProducing|.

  1. Return the result of [=getting a streaming AI model result=] given [=this=], |options|, and |operation|.
</div>
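{{LanguageModel/prompt()}} and {{LanguageModel/promptStreaming()}} build the same operation. They differ only in whether the caller receives an aggregated result or a stream. The following non-normative sketch shows that composition using hypothetical synchronous stand-ins (`fakePrefill`, `fakeGenerate`). Generation runs only when prefilling succeeds. As a simplifying assumption, the hypothetical `runAggregated` yields the concatenation of all chunks; this approximates [=getting an aggregated AI model result=] but does not replace it.

```javascript
// Builds the operation shared by prompt() and promptStreaming().
// `prefill` returns a boolean; `generate` drives chunkProduced/done/error.
function makeOperation(prefill, generate, input) {
  return (chunkProduced, done, error, stopProducing) => {
    const prefillSuccess = prefill(input, error, stopProducing);
    if (prefillSuccess) generate(chunkProduced, done, error, stopProducing);
  };
}

// Simplified aggregation: collect chunks and report their concatenation on done.
function runAggregated(operation) {
  const chunks = [];
  let outcome = null;
  operation(
    (chunk) => chunks.push(chunk),
    () => { outcome = { ok: true, value: chunks.join("") }; },
    (info) => { outcome = { ok: false, error: info }; },
    () => false, // never asked to stop in this sketch
  );
  return outcome;
}

// Hypothetical stand-ins for the model.
const fakePrefill = (input, error) => {
  if (input.length > 20) {
    error({ name: "QuotaExceededError" });
    return false;
  }
  return true;
};
const fakeGenerate = (chunkProduced, done) => {
  for (const c of ["Hel", "lo", "!"]) chunkProduced(c);
  done();
};

console.log(runAggregated(makeOperation(fakePrefill, fakeGenerate, "Hi")).value); // Hello!
```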

<div algorithm>
  The <dfn method for="LanguageModel">append(|input|, |options|)</dfn> method steps are:

  1. Let |operation| be an algorithm step which takes arguments |chunkProduced|, |done|, |error|, and |stopProducing|, and performs the following steps:

    <p class="note">|chunkProduced| is never called because the [=prefilling=] algorithm does not generate chunks.</p>

    1. Let |prefillSuccess| be the result of [=prefilling=] given [=this=], |input|, false, null, |error|, and |stopProducing|.

    1. If |prefillSuccess| is true and |done| is not null, then perform |done|.

  1. Return the result of [=getting an aggregated AI model result=] given [=this=], |options|, and |operation|.
</div>

<div algorithm>
  The <dfn method for="LanguageModel">measureContextUsage(|input|, |options|)</dfn> method steps are:

  1. If |options|["{{LanguageModelPromptOptions/omitResponseConstraintInput}}"] is true and |options|["{{LanguageModelPromptOptions/responseConstraint}}"] does not [=map/exist=], then throw a {{TypeError}}.

  1. Let |expectedInputTypes| be the result of [=get the expected content types=] given [=this=]'s [=LanguageModel/expected inputs=].

  1. Let |messages| be the result of [=validating and canonicalizing a prompt=] given |input|, |expectedInputTypes|, and false.

  1. If |options|["{{LanguageModelPromptOptions/responseConstraint}}"] [=map/exists=] and is not null and |options|["{{LanguageModelPromptOptions/omitResponseConstraintInput}}"] is false, then implementations may insert an [=implementation-defined=] {{LanguageModelMessage}} into |messages| to guide the model's behavior.

  1. Let |measureUsage| be an algorithm step which takes an argument |stopMeasuring| and returns the result of [=measuring language model context usage=] given |messages| and |stopMeasuring|.

  1. Return the result of [=measuring AI model input usage=] given [=this=], |options|, and |measureUsage|.
</div>
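{{LanguageModel/measureContextUsage()}} runs the same [=measure language model context usage=] algorithm that [=prefill=] later relies on for its context window check. A page can therefore test ahead of time whether an input is likely to fit. The helper `fitsInContext` below is a hypothetical, non-normative sketch. Its answer reflects usage only at the time of measurement, because other operations on the same session can change {{LanguageModel/contextUsage}} before the input is actually sent.

```javascript
// Hypothetical helper: reports whether `input` would fit in the session's remaining
// context window, using the LanguageModel members measureContextUsage(),
// contextUsage, and contextWindow.
async function fitsInContext(session, input, options = {}) {
  const requested = await session.measureContextUsage(input, options);
  // Mirrors the prefill check, which fails only when the total exceeds the window.
  return session.contextUsage + requested <= session.contextWindow;
}

// Usage with a mock in place of a real session; the one-unit-per-character
// measurement is a hypothetical stand-in for the implementation-defined one.
const mockSession = {
  contextUsage: 1000,
  contextWindow: 1024,
  measureContextUsage: async (input) => input.length,
};
fitsInContext(mockSession, "short").then((ok) => console.log(ok)); // true
```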

<div algorithm>
  The <dfn method for="LanguageModel">measureInputUsage(|input|, |options|)</dfn> method steps are:

  1. Return the result of running the {{LanguageModel/measureContextUsage()}} method steps given |input| and |options|.
</div>

<div algorithm>
  The <dfn method for="LanguageModel">clone(|options|)</dfn> method steps are:

  1. Return the result of [=cloning a language model=] given [=this=] and |options|.
</div>

<h4 id="language-model-prompting">Prefilling and generating</h4>

<div algorithm>
  To <dfn>prefill</dfn> given:

  * a {{LanguageModel}} |model|,
  * a {{LanguageModelPrompt}} |input|,
  * a boolean |omitResponseConstraintInput|,
  * an object-or-null |responseConstraint|,
  * an algorithm-or-null |error| that takes [=error information=] and returns nothing, and
  * an algorithm-or-null |stopPrefilling| that takes no arguments and returns a boolean,

  perform the following steps:

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. Let |expectedInputTypes| be the result of [=get the expected content types=] given |model|'s [=LanguageModel/expected inputs=].

  1. Let |messages| be the result of [=validating and canonicalizing a prompt=] given |input|, |expectedInputTypes|, and true if |model|'s [=LanguageModel/current context usage=] is greater than 0, otherwise false.

    If this throws an exception |e|, then:
    1. If |error| is not null, perform |error| given a [=DOMException error information=] whose [=DOMException error information/name=] is |e|'s [=DOMException/name=] and whose [=DOMException error information/details=] contain appropriate detail.
    1. Return false.

  1. If |responseConstraint| is not null and |omitResponseConstraintInput| is false, then implementations may insert an [=implementation-defined=] {{LanguageModelMessage}} into |messages| to guide the model's behavior.

  1. Let |requested| be the result of [=measuring language model context usage=] given |messages| and |stopPrefilling|.

  1. If |requested| is null, then return false.

  1. If |requested| is an [=error information=], then:
    1. If |error| is not null, perform |error| given |requested|.
    1. Return false.

  1. [=Assert=]: |requested| is a number.

  1. If |model|'s [=LanguageModel/current context usage=] + |requested| is greater than |model|'s [=LanguageModel/context window size=], then:
    1. If |error| is not null, then:
      1. Let |errorInfo| be a [=quota exceeded error information=] with a [=QuotaExceededError/requested=] of |model|'s [=LanguageModel/current context usage=] + |requested| and a [=QuotaExceededError/quota=] of |model|'s [=LanguageModel/context window size=].
      1. Perform |error| given |errorInfo|.
    1. Return false.

  1. In an [=implementation-defined=] manner, update the underlying model's internal state to include |messages|.

     The process should use |model|'s [=LanguageModel/initial messages=], |model|'s [=LanguageModel/top K=], |model|'s [=LanguageModel/temperature=], |model|'s [=LanguageModel/expected inputs=], |model|'s [=LanguageModel/expected outputs=], and |model|'s [=LanguageModel/tools=] to guide how the state is updated.

     The process must conform to the guidance given in [[#privacy]] and [[#security]].

     If during this process |stopPrefilling| returns true, then return false.

     If an error occurred during prefilling:
     1. Let the error be represented as [=error information=] |errorInfo| according to the guidance in [[#language-model-errors]].
     1. If |error| is not null, perform |error| given |errorInfo|.
     1. Return false.

  1. Set |model|'s [=LanguageModel/current context usage=] to |model|'s [=LanguageModel/current context usage=] + |requested|.

  1. Return true.
</div>
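The context window check in [=prefill=] is simple arithmetic. Note that the reported requested amount is the total that would be in use (current usage plus the new input), not the size of the new input alone. The non-normative sketch below models only that check and the usage update. `checkAndCommit` and the object shape are hypothetical.

```javascript
// Models the quota check and usage update from the prefill steps.
// `model` holds currentContextUsage and contextWindowSize (hypothetical shape).
function checkAndCommit(model, requested) {
  const total = model.currentContextUsage + requested;
  if (total > model.contextWindowSize) {
    // Mirrors the quota exceeded error information fields; usage is unchanged.
    return { ok: false, error: { requested: total, quota: model.contextWindowSize } };
  }
  model.currentContextUsage = total; // committed only on success
  return { ok: true };
}

const model = { currentContextUsage: 900, contextWindowSize: 1024 };
console.log(checkAndCommit(model, 100).ok);    // true (900 + 100 <= 1024)
console.log(checkAndCommit(model, 100).error); // { requested: 1100, quota: 1024 }
```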

<div algorithm>
  To <dfn>generate</dfn> given:

  * a {{LanguageModel}} |model|,
  * an object-or-null |responseConstraint|,
  * an algorithm-or-null |chunkProduced| that takes a [=string=] and returns nothing,
  * an algorithm-or-null |done| that takes no arguments and returns nothing,
  * an algorithm-or-null |error| that takes [=error information=] and returns nothing, and
  * an algorithm-or-null |stopProducing| that takes no arguments and returns a boolean,

  perform the following steps:

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. In an [=implementation-defined=] manner, subject to the following guidelines, begin the process of producing a response from the language model based on its current internal state.

     The process should use |model|'s [=LanguageModel/initial messages=], |model|'s [=LanguageModel/top K=], |model|'s [=LanguageModel/temperature=], |model|'s [=LanguageModel/expected inputs=], |model|'s [=LanguageModel/expected outputs=], |model|'s [=LanguageModel/tools=], and |responseConstraint| to guide the model's behavior.

     The prompting process must conform to the guidance given in [[#privacy]] and [[#security]].

     If |model|'s [=LanguageModel/tools=] is not empty, the model may use the provided tools by calling their <var ignore>execute</var> functions.

  1. While true:

    1. Wait for the next chunk of response data to be produced, for the process to finish, or for the result of calling |stopProducing| to become true.

    1. If such a chunk is successfully produced:

      1. Let it be represented as a [=string=] |chunk|.

      1. If |chunkProduced| is not null, perform |chunkProduced| given |chunk|.

    1. Otherwise, if the process has finished:

      1. If |done| is not null, perform |done|.

      1. [=iteration/Break=].

    1. Otherwise, if |stopProducing| returns true, then [=iteration/break=].

    1. Otherwise, if an error occurred during prompting:

      1. Let the error be represented as [=error information=] |errorInfo| according to the guidance in [[#language-model-errors]].

      1. If |error| is not null, perform |error| given |errorInfo|.

      1. [=iteration/Break=].
</div>
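The loop in [=generate=] ends on completion, cancellation, or error, and only completion calls `done`. The non-normative sketch below models the implementation-defined response process as a hypothetical synchronous list of events (`{ chunk }`, `{ done: true }`, or `{ error }`). Real generation is asynchronous, and `generate` here is an illustrative name only.

```javascript
// Non-normative sketch of the generate loop. `events` is a hypothetical iterable
// of { chunk }, { done: true }, or { error } objects standing in for the model.
function generate(events, { chunkProduced = null, done = null, error = null, stopProducing = null } = {}) {
  for (const event of events) {
    // Cancellation is checked before handling each event; no callback fires.
    if (stopProducing && stopProducing()) break;
    if (event.chunk !== undefined) {
      if (chunkProduced) chunkProduced(event.chunk);
    } else if (event.done) {
      if (done) done();
      break;
    } else if (event.error) {
      if (error) error(event.error);
      break;
    }
  }
}

const out = [];
generate([{ chunk: "a" }, { chunk: "b" }, { done: true }], {
  chunkProduced: (c) => out.push(c),
  done: () => out.push("(done)"),
});
console.log(out.join(",")); // a,b,(done)
```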

<h4 id="language-model-usage">Usage</h4>

<div algorithm>
  To <dfn>measure language model context usage</dfn> given:

  * a [=list=] of {{LanguageModelMessage}}s |messages|, and
  * an algorithm-or-null |stopMeasuring| that takes no arguments and returns a boolean,

  perform the following steps:

  1. [=Assert=]: this algorithm is running [=in parallel=].

  1. Let |inputToModel| be the [=implementation-defined=] input that would be sent to the underlying model in order to [=prefill=] given |messages|.

    <p class="note">This will generally consist of the encoding of all of the inputs, possibly with prompt engineering or other implementation-defined wrappers.</p>

    If during this process |stopMeasuring| starts returning true, then return null.

    If an error occurs during this process, then return an appropriate [=DOMException error information=] according to the guidance in [[#language-model-errors]].

  1. Return the amount of context usage needed to represent |inputToModel| when given to the underlying model. The exact calculation procedure is [=implementation-defined=], subject to the following constraints.

    The returned context usage must be nonnegative and finite. It should be roughly proportional to the amount of data in |inputToModel|.

    <p class="note">This might be the number of tokens needed to represent the input in a <a href="https://arxiv.org/abs/2404.08335">language model tokenization scheme</a>, or it might be related to the size of the data in bytes.</p>

    If during this process |stopMeasuring| starts returning true, then instead return null.

    If an error occurs during this process, then instead return an appropriate [=DOMException error information=] according to the guidance in [[#language-model-errors]].
</div>
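This specification leaves the unit of context usage [=implementation-defined=] and suggests token counts or data size in bytes. Purely as an illustration of the byte-based option, the non-normative sketch below counts the UTF-8 bytes of text content with `TextEncoder`. It also adds a hypothetical fixed per-message overhead that stands in for implementation-defined wrappers. `measureUsageByBytes` is a hypothetical name. Non-text content is reported as an error rather than measured.

```javascript
// Hypothetical per-message overhead standing in for role markers or other wrappers.
const PER_MESSAGE_OVERHEAD = 4;

// Returns a nonnegative finite number, null if stopped, or an
// error-information-like object for content this sketch cannot measure.
function measureUsageByBytes(messages, stopMeasuring = null) {
  const encoder = new TextEncoder();
  let usage = 0;
  for (const message of messages) {
    if (stopMeasuring && stopMeasuring()) return null;
    usage += PER_MESSAGE_OVERHEAD;
    for (const content of message.content) {
      if (content.type !== "text") return { name: "NotSupportedError" };
      usage += encoder.encode(content.value).length; // UTF-8 byte count
    }
  }
  return usage;
}

const usage = measureUsageByBytes([
  { role: "user", content: [{ type: "text", value: "h\u00e9llo" }], prefix: false },
]);
console.log(usage); // 10 (4 overhead + 6 UTF-8 bytes)
```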

<h4 id="language-model-options">Options</h4>

<div algorithm>
  To <dfn>get the expected content types</dfn> given a [=list=] of {{LanguageModelExpected}}s |expectedContents|:

  1. Let |expectedTypes| be an empty [=list=] of {{LanguageModelMessageType}}s.

  1. [=list/For each=] |expected| of |expectedContents|:
    1. If |expectedTypes| does not [=list/contain=] |expected|["{{LanguageModelExpected/type}}"], then [=list/append=] |expected|["{{LanguageModelExpected/type}}"] to |expectedTypes|.

  1. If |expectedTypes| does not [=list/contain=] "{{LanguageModelMessageType/text}}", then [=list/append=] "{{LanguageModelMessageType/text}}" to |expectedTypes|.

  1. Return |expectedTypes|.
</div>
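A direct, non-normative JavaScript transcription of these steps follows. `getExpectedContentTypes` is a hypothetical name. Because `"text"` is appended whenever it is absent, text input is always accepted regardless of the declared expected inputs.

```javascript
// Collects the distinct expected types in order, always including "text".
function getExpectedContentTypes(expectedContents) {
  const expectedTypes = [];
  for (const expected of expectedContents) {
    if (!expectedTypes.includes(expected.type)) expectedTypes.push(expected.type);
  }
  if (!expectedTypes.includes("text")) expectedTypes.push("text");
  return expectedTypes;
}

console.log(getExpectedContentTypes([{ type: "image" }, { type: "audio" }, { type: "image" }]));
// [ 'image', 'audio', 'text' ]
```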

<div algorithm>
  To <dfn export lt="validate and canonicalize a prompt|validating and canonicalizing a prompt">validate and canonicalize a prompt</dfn> given a {{LanguageModelPrompt}} |input|, a [=list=] of {{LanguageModelMessageType}}s |expectedTypes|, and a boolean |hasAppendedInput|, perform the following steps. The return value will be a non-empty [=list=] of {{LanguageModelMessage}}s in their "longhand" form.

  1. [=Assert=]: |expectedTypes| [=list/contains=] "{{LanguageModelMessageType/text}}".

  1. If |input| is a [=string=], then return <span style="white-space: pre-wrap">«
      «[
        "{{LanguageModelMessage/role}}" → "{{LanguageModelMessageRole/user}}",
        "{{LanguageModelMessage/content}}" → «
          «[
            "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}",
            "{{LanguageModelMessageContent/value}}" → |input|
          ]»
        »,
        "{{LanguageModelMessage/prefix}}" → false
      ]»
    »</span>.

  1. [=Assert=]: |input| is a [=list=] of {{LanguageModelMessage}}s.

  1. If |input| is an empty [=list=], then return <span style="white-space: pre-wrap">«
      «[
        "{{LanguageModelMessage/role}}" → "{{LanguageModelMessageRole/user}}",
        "{{LanguageModelMessage/content}}" → «
          «[
            "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}",
            "{{LanguageModelMessageContent/value}}" → ""
          ]»
        »,
        "{{LanguageModelMessage/prefix}}" → false
      ]»
    »</span>.

  1. Let |messages| be an empty [=list=] of {{LanguageModelMessage}}s.

  1. [=list/For each=] |message| of |input|:

    1. If |message|["{{LanguageModelMessage/content}}"] is a [=string=], then set |message| to <span style="white-space: pre-wrap">«[
        "{{LanguageModelMessage/role}}" → |message|["{{LanguageModelMessage/role}}"],
        "{{LanguageModelMessage/content}}" → «
          «[
            "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}",
            "{{LanguageModelMessageContent/value}}" → |message|["{{LanguageModelMessage/content}}"]
          ]»
        »,
        "{{LanguageModelMessage/prefix}}" → |message|["{{LanguageModelMessage/prefix}}"]
      ]»</span>.

    1. If |message|["{{LanguageModelMessage/prefix}}"] is true, then:

      1. If |message|["{{LanguageModelMessage/role}}"] is not "{{LanguageModelMessageRole/assistant}}", then throw a "{{SyntaxError}}" {{DOMException}}.

      1. If |message| does not correspond to the last item of |input|, then throw a "{{SyntaxError}}" {{DOMException}}.

    1. If |message|["{{LanguageModelMessage/role}}"] is "{{LanguageModelMessageRole/system}}", then:

      1. If |hasAppendedInput| is true, then throw a {{TypeError}}.

    1. If |message|["{{LanguageModelMessage/content}}"] is an empty [=list=], then:

      1. Let |emptyContent| be a new {{LanguageModelMessageContent}} initialized with <span style="white-space: pre-wrap">«[
          "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}",
          "{{LanguageModelMessageContent/value}}" → ""
        ]»</span>.
      
      1. [=list/Append=] |emptyContent| to |message|["{{LanguageModelMessage/content}}"].
    
    1. [=list/For each=] |content| of |message|["{{LanguageModelMessage/content}}"]:

      1. If |message|["{{LanguageModelMessage/role}}"] is "{{LanguageModelMessageRole/assistant}}" and |content|["{{LanguageModelMessageContent/type}}"] is not "{{LanguageModelMessageType/text}}", then throw a "{{NotSupportedError}}" {{DOMException}}.

      1. If |content|["{{LanguageModelMessageContent/type}}"] is "{{LanguageModelMessageType/text}}" and |content|["{{LanguageModelMessageContent/value}}"] is not a [=string=], then throw a {{TypeError}}.

      1. If |content|["{{LanguageModelMessageContent/type}}"] is "{{LanguageModelMessageType/image}}", then:

        1. If |expectedTypes| does not [=list/contain=] "{{LanguageModelMessageType/image}}", then throw a "{{NotSupportedError}}" {{DOMException}}.

        1. If |content|["{{LanguageModelMessageContent/value}}"] is neither an {{ImageBitmapSource}} nor a {{BufferSource}}, then throw a {{TypeError}}.

      1. If |content|["{{LanguageModelMessageContent/type}}"] is "{{LanguageModelMessageType/audio}}", then:

        1. If |expectedTypes| does not [=list/contain=] "{{LanguageModelMessageType/audio}}", then throw a "{{NotSupportedError}}" {{DOMException}}.

        1. If |content|["{{LanguageModelMessageContent/value}}"] is not an {{AudioBuffer}}, a {{BufferSource}}, or a {{Blob}}, then throw a {{TypeError}}.

    1. Let |contentWithContiguousTextCollapsed| be an empty [=list=] of {{LanguageModelMessageContent}}s.

    1. Let |lastTextContent| be null.

    1. [=list/For each=] |content| of |message|["{{LanguageModelMessage/content}}"]:

      1. If |content|["{{LanguageModelMessageContent/type}}"] is "{{LanguageModelMessageType/text}}":

        1. If |lastTextContent| is null:

          1. [=list/Append=] |content| to |contentWithContiguousTextCollapsed|.

          1. Set |lastTextContent| to |content|.

        1. Otherwise, set |lastTextContent|["{{LanguageModelMessageContent/value}}"] to the concatenation of |lastTextContent|["{{LanguageModelMessageContent/value}}"] and |content|["{{LanguageModelMessageContent/value}}"].

          <p class="note">No space or other character is added. Thus, « «[ "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContent/value}}" → "`foo`" ]», «[ "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContent/value}}" → "`bar`" ]» » is canonicalized to « «[ "{{LanguageModelMessageContent/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContent/value}}" → "`foobar`" ]» ».</p>

      1. Otherwise:

        1. [=list/Append=] |content| to |contentWithContiguousTextCollapsed|.

        1. Set |lastTextContent| to null.

    1. Set |message|["{{LanguageModelMessage/content}}"] to |contentWithContiguousTextCollapsed|.

    1. [=list/Append=] |message| to |messages|.

    1. Set |hasAppendedInput| to true.

  1. If |messages| [=list/is empty=], then throw a "{{SyntaxError}}" {{DOMException}}.

  1. Return |messages|.
</div>
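
<div class="example">

The following non-normative sketch applies the expected-types check and the text-collapsing steps above to a plain JavaScript array, merging the two loops into one. The `collapseContiguousText()` helper and its `{ type, value }` part objects are hypothetical stand-ins for the {{LanguageModelMessageContent}} dictionaries the algorithm operates on. The sketch assumes text is always an expected type and omits the checks on each value's type.

```javascript
// Hypothetical helper, not part of the API. `content` is an array of
// { type, value } objects shaped like LanguageModelMessageContent, and
// `expectedTypes` lists the non-text types the session was created with.
function collapseContiguousText(content, expectedTypes) {
  const result = [];
  let lastText = null; // the text part currently being extended, if any

  for (const part of content) {
    if (part.type === "text") {
      if (lastText === null) {
        // Start a new run of text. Copying avoids mutating the caller's
        // object when later text is concatenated onto it.
        lastText = { type: "text", value: part.value };
        result.push(lastText);
      } else {
        // Concatenate with no separator, as the note above requires.
        lastText.value += part.value;
      }
      continue;
    }
    // Non-text parts must have been declared in expectedTypes.
    if (!expectedTypes.includes(part.type)) {
      throw new DOMException("Unexpected " + part.type + " input", "NotSupportedError");
    }
    result.push(part);
    lastText = null; // a non-text part ends the current run of text
  }
  return result;
}

const parts = collapseContiguousText(
  [
    { type: "text", value: "foo" },
    { type: "text", value: "bar" },
    { type: "image", value: new Uint8Array(4) },
    { type: "text", value: "baz" },
  ],
  ["image"],
);
console.log(parts.map((p) => p.type)); // [ 'text', 'image', 'text' ]
console.log(parts[0].value); // foobar
```

The normative steps mutate the first text part in place. The sketch copies it instead, so it never alters objects the caller still holds.

</div>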

<h4 id="language-model-errors">Errors</h4>

When prompting fails, the following possible reasons may be surfaced to the web developer. This table lists the possible {{DOMException}} [=DOMException/names=] and the cases in which an implementation should use them:

<table class="data">
  <thead>
    <tr>
      <th>{{DOMException}} [=DOMException/name=]
      <th>Scenarios
  <tbody>
    <tr>
      <td>"{{NotAllowedError}}"
      <td>
        <p>Prompting is disabled by user choice or user agent policy.
    <tr>
      <td>"{{NotReadableError}}"
      <td>
        <p>The model output was filtered by the user agent, e.g., because it was detected to be harmful, inaccurate, or nonsensical.
    <tr>
      <td>"{{NotSupportedError}}"
      <td>
        <p>The input to be processed was in a language that the user agent does not support, or was not provided properly in the call to {{LanguageModel/create()}}.

        <p>The model output ended up being in a language that the user agent does not support (e.g., because the user agent has not performed sufficient quality control tests on that output language).
    <tr>
      <td>"{{UnknownError}}"
      <td>
        <p>All other scenarios, including if the user agent believes it cannot prompt the model while also meeting the requirements given in [[#privacy]] or [[#security]], or if the user agent would prefer not to disclose the failure reason.
</table>

<p class="note">This table does not give the complete list of exceptions that can be surfaced by the prompt API. It only contains those which can come from certain [=implementation-defined=] steps.
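
<div class="example">

As a non-normative illustration, a page might branch on the exception's `name` when prompting fails. The `describePromptFailure()` helper below is hypothetical. The `session` in its comment is assumed to be a {{LanguageModel}} returned by {{LanguageModel/create()}}. Because the table above is not exhaustive, names it does not list fall through to a generic branch. One example is an "{{AbortError}}" from an aborted signal.

```javascript
// Hypothetical helper: maps the DOMException names from the errors table to
// short descriptions. Anything else (for example an AbortError, or a
// TypeError from invalid input) is reported as "other".
function describePromptFailure(error) {
  if (!(error instanceof DOMException)) return "other";
  switch (error.name) {
    case "NotAllowedError":
      return "prompting disabled by user choice or policy";
    case "NotReadableError":
      return "output filtered by the user agent";
    case "NotSupportedError":
      return "unsupported language or input";
    case "UnknownError":
      return "unspecified failure";
    default:
      return "other";
  }
}

// In a page, assuming `session` came from LanguageModel.create():
//   try { await session.prompt(input); }
//   catch (e) { console.warn(describePromptFailure(e)); }
console.log(describePromptFailure(new DOMException("filtered", "NotReadableError")));
// output filtered by the user agent
```

Treating "{{UnknownError}}" as opaque is deliberate: user agents may use it precisely to avoid disclosing the real cause.

</div>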


<div algorithm>
  To <dfn>clone a language model</dfn> given a {{LanguageModel}} |model| and a {{LanguageModelCloneOptions}} |options|:

  1. Let |global| be |model|'s [=relevant global object=].

  1. [=Assert=]: |global| is a {{Window}} object.

  1. If |global|'s [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.

  1. Let |signals| be « |model|'s [=DestroyableModel/destruction abort controller=]'s [=AbortController/signal=] ».

  1. If |options|["{{LanguageModelCloneOptions/signal}}"] [=map/exists=], then [=list/append=] it to |signals|.

  1. Let |compositeSignal| be the result of [=creating a dependent abort signal=] given |signals| using {{AbortSignal}} and |model|'s [=relevant realm=].

  1. If |compositeSignal| is [=AbortSignal/aborted=], then return [=a promise rejected with=] |compositeSignal|'s [=AbortSignal/abort reason=].

  1. Let |promise| be [=a new promise=] created in |model|'s [=relevant realm=].

  1. Let |abortedDuringOperation| be false.

    <p class="note">This variable is written to by the abort steps below, which run on the [=event loop=], and read by the task queued from the [=in parallel=] steps.

  1. [=AbortSignal/add|Add the following abort steps=] to |compositeSignal|:

    1. Set |abortedDuringOperation| to true.

    1. [=Reject=] |promise| with |compositeSignal|'s [=AbortSignal/abort reason=].

  1. [=In parallel=]:

    1. [=Queue a global task=] on the [=AI task source=] to perform the following steps:

      1. If |abortedDuringOperation| is true, then return.

      1. Let |clonedModel| be a new {{LanguageModel}} object with:
        - [=LanguageModel/initial messages=] set to |model|'s [=LanguageModel/initial messages=].
        - [=LanguageModel/top K=] set to |model|'s [=LanguageModel/top K=].
        - [=LanguageModel/temperature=] set to |model|'s [=LanguageModel/temperature=].
        - [=LanguageModel/expected inputs=] set to |model|'s [=LanguageModel/expected inputs=].
        - [=LanguageModel/expected outputs=] set to |model|'s [=LanguageModel/expected outputs=].
        - [=LanguageModel/tools=] set to |model|'s [=LanguageModel/tools=].
        - [=LanguageModel/context window size=] set to |model|'s [=LanguageModel/context window size=].
        - [=LanguageModel/current context usage=] set to |model|'s [=LanguageModel/current context usage=].

      1. In an [=implementation-defined=] manner, copy any other state from |model| to |clonedModel|.

      1. If the copy operation fails:
        1. [=Reject=] |promise| with a "{{OperationError}}" {{DOMException}}.
        1. Return.

      1. [=Resolve=] |promise| with |clonedModel|.

  1. Return |promise|.
</div>
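
<div class="example">

The abort handling in this algorithm can be approximated in plain JavaScript, as a non-normative sketch. `cloneSketch()` and the plain-object `model` are hypothetical. `destructionSignal` is a hypothetical stand-in for the signal of the model's [=DestroyableModel/destruction abort controller=]. `AbortSignal.any()` plays the role of creating a dependent abort signal, and a `setTimeout()` callback stands in for the queued global task. The copy is shallow, matching how the listed fields are set to the original model's values.

```javascript
// Hypothetical sketch of the clone algorithm's abort handling.
// `model` is a plain object of session state; `destructionSignal` aborts
// when the model is destroyed. Requires AbortSignal.any() (available in
// current browsers and Node.js 20.3+).
function cloneSketch(model, destructionSignal, options = {}) {
  const signals = [destructionSignal];
  if (options.signal) signals.push(options.signal);
  const compositeSignal = AbortSignal.any(signals);

  // Already aborted: reject without starting any work.
  if (compositeSignal.aborted) return Promise.reject(compositeSignal.reason);

  return new Promise((resolve, reject) => {
    let abortedDuringOperation = false;
    compositeSignal.addEventListener(
      "abort",
      () => {
        abortedDuringOperation = true;
        reject(compositeSignal.reason);
      },
      { once: true },
    );

    // Stands in for the in-parallel work and the task it queues.
    setTimeout(() => {
      if (abortedDuringOperation) return; // the promise was already rejected
      resolve({ ...model }); // shallow copy of the session state
    }, 0);
  });
}

// In a page, assuming `session` is a LanguageModel:
//   const controller = new AbortController();
//   const copy = await session.clone({ signal: controller.signal });
```

Checking `abortedDuringOperation` inside the timer mirrors the algorithm's early return. A settled promise would ignore a later `resolve()` anyway, but the flag also skips the copy itself.

</div>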

<h3 id="permissions-policy">Permissions policy integration</h3>

Access to the prompt API is gated behind the [=policy-controlled feature=] "<dfn permission>language-model</dfn>", which has a [=policy-controlled feature/default allowlist=] of <code>[=default allowlist/'self'=]</code>.
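
<div class="example">

The effect of the `'self'` default allowlist can be approximated by the simplified, non-normative model below. `isLanguageModelEnabled()` and its frame objects are hypothetical. Each frame records its origin and whether its embedding iframe delegated the feature through the `allow` attribute with the value `language-model`. The model ignores `Permissions-Policy` response headers and the other inputs to the full Permissions Policy algorithms.

```javascript
// Hypothetical, simplified model of a 'self' default allowlist.
// `chain` lists frames from the top-level document down to the frame being
// checked, as { origin, allow } objects, where `allow` is true if the
// embedding iframe delegated "language-model". Response headers are ignored.
function isLanguageModelEnabled(chain) {
  // The top-level document always matches 'self'. Every nested frame needs
  // explicit delegation or the same origin as its parent, and every()
  // stops at the first frame that lacks both.
  return chain.slice(1).every((frame, i) => {
    if (frame.allow === true) return true; // explicitly delegated
    return frame.origin === chain[i].origin; // same origin as its parent
  });
}

const topLevel = { origin: "https://news.example" };
console.log(isLanguageModelEnabled([topLevel, { origin: "https://ads.example", allow: false }])); // false
console.log(isLanguageModelEnabled([topLevel, { origin: "https://widget.example", allow: true }])); // true
```

</div>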

<h2 id="privacy">Privacy considerations</h2>

Please see [[WRITING-ASSISTANCE-APIS#privacy]] for a discussion of privacy considerations for the prompt API. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].

<h2 id="security">Security considerations</h2>

Please see [[WRITING-ASSISTANCE-APIS#security]] for a discussion of security considerations for the prompt API. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].


================================================
FILE: security-privacy-questionnaire.md
================================================
# [Self-Review Questionnaire: Security and Privacy](https://w3ctag.github.io/security-questionnaire/)

> 01.  What information does this feature expose,
>      and for what purposes?

This feature exposes two large categories of information:

- The implicit behavior of the underlying language model, in terms of what responses it provides to given inputs.

- The availability information for various capabilities of the API, so that web developers know what capabilities are available in the current browser, and whether using them will require a download or the capability can be used readily.

The privacy implications of both of these are discussed, in general terms, [in the _Writing Assistance APIs_ specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), which was written to cover all APIs with similar concerns.

> 02.  Do features in your specification expose the minimum amount of information
>      necessary to implement the intended functionality?

We believe so. It's possible that we could remove the exposure of the download status information. However, it would almost certainly be inferrable via timing side-channels. (I.e., if downloading a language model or fine-tuning is required, then the web developer can observe the creation of the `LanguageModel` object taking longer.)

> 03.  Do the features in your specification expose personal information,
>      personally-identifiable information (PII), or information derived from
>      either?

No. Although it's imaginable that the backing language model could be fine-tuned on PII to give more accurate-to-this-user outputs, we intend to disallow this in the specification.

> 04.  How do the features in your specification deal with sensitive information?

We do not deal with sensitive information.

> 05.  Does data exposed by your specification carry related but distinct
>     information that may not be obvious to users?

It is possible that the multimodal inputs support for this API could pass along metadata, such as image metadata, to the underlying language model. There are cases where this can be useful, so we're unsure whether requiring that the implementation strip such information is the right path.

> 06.  Do the features in your specification introduce state
>      that persists across browsing sessions?

Yes. Downloaded language models, and any collateral necessary to support various options, persist across browsing sessions.

> 07.  Do the features in your specification expose information about the
>      underlying platform to origins?

Possibly. If a browser does not bundle its own models, but instead uses the operating system's functionality, it is possible for a web developer to infer information about such operating system functionality.

> 08.  Does this specification allow an origin to send data to the underlying
>      platform?

Possibly. Again, in the scenario where the model comes from the operating system, such data would pass through OS libraries.

> 09.  Do features in this specification enable access to device sensors?

No.

> 10.  Do features in this specification enable new script execution/loading
>      mechanisms?

No.

> 11.  Do features in this specification allow an origin to access other devices?

No.

> 12.  Do features in this specification allow an origin some measure of control over
>      a user agent's native UI?

No.

> 13.  What temporary identifiers do the features in this specification create or
>      expose to the web?

None.

> 14.  How does this specification distinguish between behavior in first-party and
>      third-party contexts?

We intend to use permissions policy to disallow the usage of these features by default in third-party (cross-origin) contexts. However, the top-level site can delegate to cross-origin iframes.

Otherwise, some of the possible [anti-fingerprinting mitigations](https://webmachinelearning.github.io/writing-assistance-apis/#privacy-availability) involve partitioning information across sites, which is similar to distinguishing between first- and third-party contexts.

> 15.  How do the features in this specification work in the context of a browser’s
>      Private Browsing or Incognito mode?

One possible area of discussion here is whether backing these APIs with cloud-based models makes sense in such modes, or whether they should be disabled.

Otherwise, we do not anticipate any differences.

> 16.  Does this specification have both "Security Considerations" and "Privacy
>      Considerations" sections?

Yes. Both sections delegate to the corresponding sections in _Writing Assistance APIs_:

* [Privacy considerations](https://webmachinelearning.github.io/writing-assistance-apis/#privacy)
* [Security considerations](https://webmachinelearning.github.io/writing-assistance-apis/#security)

> 17.  Do features in your specification enable origins to downgrade default
>      security protections?

No.

> 18.  What happens when a document that uses your feature is kept alive in BFCache
>      (instead of getting destroyed) after navigation, and potentially gets reused
>      on future navigations back to the document?

Ideally, nothing special should happen. In particular, `LanguageModel` objects should still be usable without interruption after navigating back. We'll need to add web platform tests to confirm this, as it's easy to imagine implementation architectures in which keeping these objects alive while the `Document` is in the back/forward cache is difficult.

(For such implementations, failing to bfcache `Document`s with active `LanguageModel` objects would be a simple way of being spec-compliant.)

> 19.  What happens when a document that uses your feature gets disconnected?

The methods of the `LanguageModel` objects will start rejecting with `"InvalidStateError"` `DOMException`s.

> 20.  Does your spec define when and how new kinds of errors should be raised?

Yes. The specification follows the model in the [_Writing Assistance APIs_](https://webmachinelearning.github.io/writing-assistance-apis/). This includes some well-specified errors and some implementation-defined errors that reveal the limitations of the model. The user agent can instead use an `"UnknownError"` `DOMException` if necessary to protect privacy or security.

> 21.  Does your feature allow sites to learn about the user's use of assistive technology?

No.

> 22.  What should this questionnaire have asked?

Seems fine.


================================================
FILE: w3c.json
================================================
 {
    "group":      [110166]
,   "contacts":   ["cwilso"]
,   "repo-type":  "cg-report"
}