uform-gen2-qwen-500m Beta
Image-to-Text • unum

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
| Model Info | |
|---|---|
| More information | link ↗ | 
| Beta | Yes | 
Usage
Workers - TypeScript

    export interface Env {
      // Workers AI binding
      AI: Ai;
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        // Download a sample image and read it as raw bytes.
        const res = await fetch("https://cataas.com/cat");
        const blob = await res.arrayBuffer();

        // The model expects the image as an array of 8-bit unsigned integers.
        const input = {
          image: [...new Uint8Array(blob)],
          prompt: "Generate a caption for this image",
          max_tokens: 512,
        };

        const response = await env.AI.run(
          "@cf/unum/uform-gen2-qwen-500m",
          input
        );

        // The response is an object with a `description` string.
        return new Response(JSON.stringify(response));
      },
    } satisfies ExportedHandler<Env>;

Parameters
* indicates a required field
Input
The input is one of:

- 0 (string): Binary string representing the image contents.
- 1 (object):
    - prompt (string): The input text prompt for the model to generate a response.
    - raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
    - top_p (number): Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
    - top_k (number): Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
    - seed (number): Random seed for reproducibility of the generation.
    - repetition_penalty (number): Penalty for repeated tokens; higher values discourage repetition.
    - frequency_penalty (number): Decreases the likelihood of the model repeating the same lines verbatim.
    - presence_penalty (number): Increases the likelihood of the model introducing new topics.
    - image * (one of):
        - 0 (array): An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
            - items (number): A value between 0 and 255.
        - 1 (string): Binary string representing the image contents.
    - max_tokens (integer, default 512): The maximum number of tokens to generate in the response.
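
As a rough illustration of the object form of the input, the sketch below assembles an input object from raw image bytes and optional generation settings. The names `GenerationOptions`, `CaptionInput`, and `buildCaptionInput` are hypothetical helpers introduced here for illustration; they are not part of the Workers AI API. The field names and the `max_tokens` default of 512 come from the parameter list above.

```typescript
// Optional generation settings, mirroring the documented object fields.
interface GenerationOptions {
  prompt?: string;
  raw?: boolean;
  top_p?: number;
  top_k?: number;
  seed?: number;
  repetition_penalty?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  max_tokens?: number;
}

// Object form of the input: `image` is the only required field.
interface CaptionInput extends GenerationOptions {
  image: number[];
}

// Hypothetical helper (an assumption of this sketch, not a Workers AI function).
function buildCaptionInput(
  bytes: Uint8Array,
  options: GenerationOptions = {}
): CaptionInput {
  // Fall back to the documented default when max_tokens is not given.
  const maxTokens = options.max_tokens ?? 512;
  if (!Number.isInteger(maxTokens)) {
    throw new Error("max_tokens must be an integer");
  }
  return {
    ...options,
    // A Uint8Array already guarantees each value is between 0 and 255.
    image: Array.from(bytes),
    max_tokens: maxTokens,
  };
}

// Example: the four bytes here stand in for real image data.
const exampleInput = buildCaptionInput(new Uint8Array([137, 80, 78, 71]), {
  prompt: "Generate a caption for this image",
  seed: 42,
});
console.log(Object.keys(exampleInput));
```

The resulting object could then be passed as the second argument to `env.AI.run("@cf/unum/uform-gen2-qwen-500m", ...)`, as in the usage example.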
Output
- description (string)
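
Because the output is an object with a single `description` string, a caller may want to check that shape before using it. The following is a minimal sketch; `CaptionOutput`, `isCaptionOutput`, and `readDescription` are hypothetical names, and the sample response object is invented for illustration rather than taken from a real model run.

```typescript
// Shape of a response, per the Output section.
interface CaptionOutput {
  description: string;
}

// Hypothetical type guard: true only if `value` has a string `description`.
function isCaptionOutput(value: unknown): value is CaptionOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}

// Returns the caption text, or throws if the response has another shape.
function readDescription(response: unknown): string {
  if (!isCaptionOutput(response)) {
    throw new Error("Unexpected response: missing string `description`");
  }
  // Trimming is a choice made in this sketch, not required by the API.
  return response.description.trim();
}

// Illustrative response object (not real model output).
const exampleCaption = readDescription({
  description: " A cat sitting on a sofa. ",
});
console.log(exampleCaption); // prints "A cat sitting on a sofa."
```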
API Schemas
The following schemas are based on JSON Schema
Input

    {
        "oneOf": [
            {
                "type": "string",
                "format": "binary",
                "description": "Binary string representing the image contents."
            },
            {
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The input text prompt for the model to generate a response."
                    },
                    "raw": {
                        "type": "boolean",
                        "default": false,
                        "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
                    },
                    "top_p": {
                        "type": "number",
                        "description": "Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
                    },
                    "top_k": {
                        "type": "number",
                        "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."
                    },
                    "seed": {
                        "type": "number",
                        "description": "Random seed for reproducibility of the generation."
                    },
                    "repetition_penalty": {
                        "type": "number",
                        "description": "Penalty for repeated tokens; higher values discourage repetition."
                    },
                    "frequency_penalty": {
                        "type": "number",
                        "description": "Decreases the likelihood of the model repeating the same lines verbatim."
                    },
                    "presence_penalty": {
                        "type": "number",
                        "description": "Increases the likelihood of the model introducing new topics."
                    },
                    "image": {
                        "oneOf": [
                            {
                                "type": "array",
                                "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
                                "items": {
                                    "type": "number",
                                    "description": "A value between 0 and 255"
                                }
                            },
                            {
                                "type": "string",
                                "format": "binary",
                                "description": "Binary string representing the image contents."
                            }
                        ]
                    },
                    "max_tokens": {
                        "type": "integer",
                        "default": 512,
                        "description": "The maximum number of tokens to generate in the response."
                    }
                },
                "required": [
                    "image"
                ]
            }
        ]
    }

Output

    {
        "type": "object",
        "contentType": "application/json",
        "properties": {
            "description": {
                "type": "string"
            }
        }
    }
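
To make the constraints in the input schema concrete, here is a small hand-written check that covers a few of them: the two top-level forms, the required `image` field, and the types of `image`, `prompt`, and `max_tokens`. This is only a sketch; `validateCaptionInput` is a hypothetical helper, it does not cover every property, and it is not a substitute for a full JSON Schema validator. It also treats image array items as integers between 0 and 255, following the schema's descriptions, which is stricter than the declared `number` type.

```typescript
// Returns a list of problems found; an empty list means no problem was detected
// by this partial check.
function validateCaptionInput(value: unknown): string[] {
  // First form: a binary string with the image contents.
  if (typeof value === "string") {
    return [];
  }
  // Second form: an object with a required `image` field.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["input must be a string or an object"];
  }

  const errors: string[] = [];
  const obj = value as Record<string, unknown>;
  const image = obj.image;

  if (image === undefined) {
    errors.push("image is required");
  } else if (Array.isArray(image)) {
    const allBytes = image.every(
      (b) => typeof b === "number" && Number.isInteger(b) && b >= 0 && b <= 255
    );
    if (!allBytes) {
      errors.push("image array items must be integers between 0 and 255");
    }
  } else if (typeof image !== "string") {
    errors.push("image must be an array of numbers or a string");
  }

  if (obj.prompt !== undefined && typeof obj.prompt !== "string") {
    errors.push("prompt must be a string");
  }
  if (obj.max_tokens !== undefined && !Number.isInteger(obj.max_tokens)) {
    errors.push("max_tokens must be an integer");
  }
  return errors;
}

// Example: an object without `image` fails the `required` constraint.
console.log(validateCaptionInput({ prompt: "Describe this image" }));
// prints [ 'image is required' ]
```

Running a check like this before calling the model can surface a malformed request early, though the service's own validation remains authoritative.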