# Generate AD Audio

> Generate an audio-only track with Audio Descriptions.
> Supports input via `media_id`, `url`, or direct `file` upload.
> Returns a `job_id` for tracking progress.


<Note>
  This endpoint is available for **Standard AD only**. Descriptions are placed in existing dialogue gaps without changing the runtime. See [Standard AD vs Extended AD](/help/standard-vs-extended-ad) if you need pauses inserted for fuller descriptions. Extended AD is supported on the video and text endpoints.
</Note>

## Usage Examples

### 1. Using an existing Media ID

If you have already uploaded media and have a `media_id`, use this method.

```bash
curl -X POST https://api.viddyscribe.com/enterprise/api/generate_ad_audio \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "media_id",
      "media_id": "550e8400-e29b-41d4-a716-446655440000"
    },
    "generation_config": {
      "language": "en-US"
    }
  }'
```

### 2. Using a Public URL

Upload from a URL and generate in a single step.

```bash
curl -X POST https://api.viddyscribe.com/enterprise/api/generate_ad_audio \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "url",
      "url": "https://example.com/video.mp4"
    },
    "generation_config": {
      "language": "en-US"
    }
  }'
```
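
The two JSON-based requests above differ only in the `input` object. A minimal Python sketch of that shared shape, using only the standard library, might look like the following. The helper names (`build_request`, `media_id_input`, `url_input`) are hypothetical and not part of any ViddyScribe SDK; the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.viddyscribe.com/enterprise/api/generate_ad_audio"


def media_id_input(media_id):
    """Input object for media that was already uploaded."""
    return {"type": "media_id", "media_id": media_id}


def url_input(url):
    """Input object for a publicly reachable video URL."""
    return {"type": "url", "url": url}


def build_request(api_key, input_spec, generation_config=None):
    """Build (but do not send) the POST request. Hypothetical helper."""
    body = {"input": input_spec}
    if generation_config:
        body["generation_config"] = generation_config
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    url_input("https://example.com/video.mp4"),
    {"language": "en-US"},
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       job = json.load(resp)  # per the schema: job_id, status, media_id
```

Keeping the input builders separate from the request builder means switching from a URL to a `media_id` changes one argument rather than the whole payload.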

### 3. Uploading a Local File

Upload a file directly and generate in a single step.

<Note>
  Direct multipart upload supports local files up to 32 MB. For larger local files, see [Large Local File Upload](/large-local-file-upload).
</Note>

```bash
curl -X POST https://api.viddyscribe.com/enterprise/api/generate_ad_audio \
  -H "X-API-Key: YOUR_API_KEY" \
  -F 'input={"type": "file"}' \
  -F "file=@/path/to/video.mp4" \
  -F 'generation_config={"language": "en-US"}'
```
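
Because direct upload is capped at 32 MB, a client may want to check the file size before choosing this path. The sketch below is one way to do that in Python. It assumes "32 MB" means 32 × 1024 × 1024 bytes, which this page does not confirm, and the function names are hypothetical. It also shows that `input` and `generation_config` travel as JSON strings in the multipart form.

```python
import json
import os

# Assumption: 32 MB is interpreted as 32 * 1024 * 1024 bytes.
DIRECT_UPLOAD_LIMIT_BYTES = 32 * 1024 * 1024


def upload_route(size_bytes):
    """Return "direct" if the file fits the multipart limit, else "large"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "direct" if size_bytes <= DIRECT_UPLOAD_LIMIT_BYTES else "large"


def upload_route_for_file(path):
    """Same decision, based on the size of a file on disk."""
    return upload_route(os.path.getsize(path))


def multipart_text_fields(generation_config=None):
    """Text fields for the multipart form; the binary `file` part is separate."""
    fields = {"input": json.dumps({"type": "file"})}
    if generation_config:
        fields["generation_config"] = json.dumps(generation_config)
    return fields
```

A result of `"large"` means the file should go through the large local file upload flow instead of this endpoint's `file` field.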

## Generation Config Options

| Option                          | Type                 | Description                                                                                                                                                                                                                                  |
| ------------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto_fit`                      | boolean              | Fits descriptions into detected no-speech zones when possible.                                                                                                                                                                               |
| `allow_descriptions_over_music` | boolean              | Allows descriptions over detected music when no better timing is available.                                                                                                                                                                  |
| `custom_captions`               | object               | Uses provided captions for transcript/context instead of auto-generating them. Send `{ "content": "<VTT/SRT/plaintext>" }`; `format` and `filename` are optional hints. The server parses, validates, and normalizes the cues automatically. |
| `volume_option`                 | `auto` or `max`      | Use `max` to set a target max peak level for the AD track, or `auto` for the default.                                                                                                                                                        |
| `volume_level_db`               | number               | Max peak level in dBFS when `volume_option` is `max`. Supported range is `-10` to `0`.                                                                                                                                                       |
| `audio_track_type`              | `mixed` or `ad_only` | Which audio track is returned as `audio_signed_url`. `mixed` (default) is the source dialogue plus AD narration. `ad_only` is just the narration WAV.                                                                                        |
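
Several of these options constrain each other: `volume_level_db` only matters when `volume_option` is `max`, and it must lie between `-10` and `0`. A small client-side check can catch mistakes before a request is sent. The following Python sketch is illustrative only; `build_generation_config` is a hypothetical helper, and the server remains the authority on validation.

```python
def build_generation_config(language="en-US", audio_track_type="mixed",
                            max_peak_db=None, captions_text=None):
    """Assemble a generation_config dict with basic client-side checks."""
    if audio_track_type not in ("mixed", "ad_only"):
        raise ValueError("audio_track_type must be 'mixed' or 'ad_only'")

    config = {"language": language, "audio_track_type": audio_track_type}

    if max_peak_db is not None:
        # volume_level_db is only meaningful together with volume_option=max.
        if not -10 <= max_peak_db <= 0:
            raise ValueError("max_peak_db must be between -10 and 0 dBFS")
        config["volume_option"] = "max"
        config["volume_level_db"] = max_peak_db

    if captions_text is not None:
        # Only `content` is required; format and filename are optional hints.
        config["custom_captions"] = {"content": captions_text}

    return config


# Narration-only WAV with a peak capped at -3 dBFS.
narration_config = build_generation_config(audio_track_type="ad_only",
                                           max_peak_db=-3)
```

When `max_peak_db` is left as `None`, the sketch omits both volume fields and relies on the service default.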

## Troubleshooting

For the full list of API error codes, see [Error Codes](/api-reference/overview#error-codes).

| Error                        | HTTP Status | Description                                                                                                           |
| ---------------------------- | ----------- | --------------------------------------------------------------------------------------------------------------------- |
| `upload_not_ready`           | `409`       | Returned in the `code` field when the referenced media is still being verified after upload. Retry shortly.           |
| `upload_failed`              | `409`       | Returned in the `code` field when the referenced media failed upload verification. Re-upload the video and try again. |
| `concurrency_limit_exceeded` | `429`       | Too many concurrent submissions are being validated for the current team. Retry after a short delay.                  |
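
Of the errors above, `upload_not_ready` and `concurrency_limit_exceeded` are worth retrying, while `upload_failed` requires a re-upload. One way a client might encode that decision is sketched below in Python. The function names and the backoff values (2-second base, 30-second cap) are assumptions, since this page only says to retry "shortly" or "after a short delay".

```python
# Status code -> identifier that signals a transient, retryable condition.
RETRYABLE = {
    409: "upload_not_ready",
    429: "concurrency_limit_exceeded",
}


def error_identifier(body):
    """Prefer the `code` field; fall back to `error` when `code` is absent."""
    return body.get("code") or body.get("error")


def should_retry(status, body):
    """True only for the transient errors listed in the table above."""
    expected = RETRYABLE.get(status)
    return expected is not None and error_identifier(body) == expected


def retry_delay(attempt, base=2.0, cap=30.0):
    """Exponential backoff in seconds; base and cap are assumed values."""
    return min(cap, base * (2 ** attempt))
```

For example, a `409` response whose `code` is `upload_failed` returns `False` from `should_retry`, so the caller can surface the error instead of looping.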


## OpenAPI

````yaml post /enterprise/api/generate_ad_audio
openapi: 3.0.3
info:
  title: ViddyScribe Enterprise API
  version: 1.0.0
  description: >
    Enterprise API for ViddyScribe video processing and audio description
    generation.


    ## Authentication

    All endpoints require API key authentication via the `X-API-Key` header.


    ## Workflow

    1. Upload media using `/upload_media` - Returns `media_id`

    2. Generate text with `/generate_ad_text`, video with `/generate_ad_video`,
    or audio with `/generate_ad_audio` - Returns `job_id`

    3. Poll for results using `/get_results` - Returns status and outputs when
    done
  contact:
    name: ViddyScribe Support
    email: hello@viddyscribe.com
servers:
  - url: https://api.viddyscribe.com
    description: Production server
security:
  - ApiKeyAuth: []
tags:
  - name: Media
    description: Media upload operations
  - name: Processing
    description: Video processing operations
  - name: Results
    description: Results retrieval operations
paths:
  /enterprise/api/generate_ad_audio:
    post:
      tags:
        - Processing
      summary: Generate AD audio
      description: |
        Generate an audio-only track with Audio Descriptions.
        Supports input via `media_id`, `url`, or direct `file` upload.
        Returns a `job_id` for tracking progress.
      operationId: generateAdAudio
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - input
              properties:
                input:
                  oneOf:
                    - type: object
                      title: Using Media ID
                      required:
                        - type
                        - media_id
                      properties:
                        type:
                          type: string
                          enum:
                            - media_id
                          default: media_id
                        media_id:
                          type: string
                          format: uuid
                          description: Existing Media ID from upload_media
                        filename:
                          type: string
                          description: Optional filename override
                    - type: object
                      title: Upload from URL
                      required:
                        - type
                        - url
                      properties:
                        type:
                          type: string
                          enum:
                            - url
                          default: url
                        url:
                          type: string
                          format: uri
                          description: Public video URL
                        filename:
                          type: string
                          description: Optional filename override
                  discriminator:
                    propertyName: type
                generation_config:
                  type: object
                  description: Configuration for audio generation
                  properties:
                    format:
                      type: string
                      default: json
                      enum:
                        - vtt
                        - srt
                        - txt
                        - json
                        - edl
                      description: >-
                        Format used to serialize the text part of the response
                        (descriptions). Audio and video files are returned
                        separately by their respective endpoints.
                    language:
                      type: string
                      default: en-US
                      description: >-
                        Target language (BCP-47 code, e.g. `en-US`). See
                        [Languages and Voices](/api-reference/voices) for the
                        full list of 53 codes.
                    video_category:
                      type: string
                      default: Auto
                      enum:
                        - Auto
                        - Educational Lecture
                        - Educational Kids
                        - Government Meeting
                        - Documentary
                        - Narrative Story
                        - Social Media
                        - Tutorial/How-To
                        - Vlog
                        - Commercial/Advertisement
                        - News
                        - Entertainment
                        - Home Video
                        - Video Call
                      description: >-
                        Category of the video. Selecting the matching category
                        helps produce better audio descriptions.
                    voice:
                      type: string
                      default: Achernar
                      description: >-
                        Voice style to use for narration (e.g. `Achernar`). See
                        [Languages and Voices](/api-reference/voices) for the
                        full list of 31 voices and their language coverage.
                    custom_instructions:
                      type: string
                      default: ''
                      description: >-
                        Optional custom prompt to guide the AI (terminology,
                        style, focus areas). When omitted or empty, no custom
                        instructions are applied.
                    auto_fit:
                      type: boolean
                      default: true
                      description: >-
                        Only applies when `ad_type` is `standard_ad`. Fit
                        generated descriptions into detected no-speech zones
                        when possible.
                    allow_descriptions_over_music:
                      type: boolean
                      default: true
                      description: >-
                        Only applies when `ad_type` is `standard_ad`. Allow
                        descriptions to be placed over detected music when no
                        better timing is available.
                    custom_captions:
                      type: object
                      description: >
                        Optional captions to use for transcript/context instead
                        of auto-generating captions. Send only `content`
                        (required); `format` and `filename` are optional hints.
                        The server normalizes the payload and derives any
                        additional fields it needs.
                      required:
                        - content
                      properties:
                        content:
                          type: string
                          description: >-
                            Captions text (WebVTT, SubRip, or timestamped
                            plaintext).
                        format:
                          type: string
                          enum:
                            - vtt
                            - srt
                            - plaintext
                          description: >-
                            Optional format hint. Auto-detected from `content`
                            or `filename` if omitted.
                        filename:
                          type: string
                          description: >-
                            Optional filename hint (used for format
                            auto-detection when `format` is omitted).
                    volume_option:
                      type: string
                      enum:
                        - auto
                        - max
                      description: >-
                        Use `max` to set a custom max peak level (in dBFS, via
                        `volume_level_db`) for the audio description track, or
                        `auto` to use the default peak level.
                    volume_level_db:
                      type: number
                      minimum: -10
                      maximum: 0
                      description: >-
                        Custom max peak level in dBFS when `volume_option` is
                        `max`.
                    audio_track_type:
                      type: string
                      enum:
                        - mixed
                        - ad_only
                      default: mixed
                      description: >
                        Which audio track to return as `audio_signed_url`:

                        - `mixed` (default): source dialogue mixed with the AD
                        narration.

                        - `ad_only`: just the AD narration WAV (no source
                        audio).
                    description_content:
                      type: string
                      description: Custom EDL/Script content (skips AI generation)
            examples:
              media_id:
                summary: Using existing Media ID
                value:
                  input:
                    type: media_id
                    media_id: 550e8400-e29b-41d4-a716-446655440000
                  generation_config:
                    language: en-US
                    voice: Achernar
                    audio_track_type: mixed
              url:
                summary: Upload from URL
                value:
                  input:
                    type: url
                    url: https://example.com/video.mp4
                  generation_config:
                    language: en-US
                    voice: Achernar
                    audio_track_type: ad_only
          multipart/form-data:
            schema:
              type: object
              required:
                - input
                - file
              properties:
                input:
                  type: string
                  description: 'JSON string {"type": "file"}'
                  example: '{"type": "file"}'
                file:
                  type: string
                  format: binary
                  description: Video file to upload
                generation_config:
                  type: string
                  description: JSON string with configuration
                  example: '{"language": "en-US"}'
      responses:
        '200':
          description: Job created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  job_id:
                    type: string
                    description: Unique job identifier for tracking
                  status:
                    type: string
                    example: queued
                    description: >-
                      Initial job status. Always `queued` for a fresh
                      submission.
                  media_id:
                    type: string
                    format: uuid
                    description: Media being processed
              example:
                job_id: task_abc123xyz
                status: queued
                media_id: 550e8400-e29b-41d4-a716-446655440000
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '403':
          description: >-
            Forbidden - feature unavailable or uploaded input exceeds plan
            limits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '409':
          description: >-
            Conflict - referenced media is not ready for generation or failed
            upload verification
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '429':
          description: Too many requests - rate, queue, concurrency, or plan limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  schemas:
    Error:
      type: object
      properties:
        error:
          type: string
          description: >-
            Primary error field. Many endpoints return a machine-readable value
            here, while some 409 media-state responses return a human-readable
            string and place the machine-readable identifier in `code`.
        code:
          type: string
          description: >-
            Optional machine-readable identifier for responses that separate the
            human-readable error text from the stable error code, such as
            upload_not_ready or upload_failed
        upload_status:
          type: string
          description: >-
            Optional media upload status included on media-state conflict
            responses
        message:
          type: string
          description: Human-readable error message
        details:
          type: string
          description: Additional error details when available
      example:
        error: invalid_input
        message: media_id is required
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
      description: >
        API key for authentication. Obtain from your team admin.


        Example: `X-API-Key:
        vsk_abc123def456ghi789jkl012mno345pqr678stu901vwx234yz`

````