In the previous article, we talked about the basic AI vocabulary: LLMs, prompts, tokens, context windows, temperature, statelessness, hallucinations, tools, RAG, MCP, and guardrails. That was the “what are we dealing with?” conversation. Now we get to the fun part. We are going to make ColdFusion talk to an AI model.

Not build a full assistant. Not wire up tools. Not remember user preferences. Not search documents. Not connect to an enterprise MCP server guarded by three architects and a procurement process.

Just this:

  • Send a prompt.
  • Get a response.
  • Output the response.

This is the AI version of “Hello World.”

And just like every “Hello World,” it starts simple. Then, approximately twelve minutes later, someone asks if it can remember the user, query the database, call Jira, summarize a PDF, enforce security policy, and speak fluent pirate.

We will get there. For now, we are starting with the simplest useful layer in ColdFusion’s AI stack: ChatModel().

What we are building

In this article, we are going to create a basic ColdFusion page that sends a prompt to a large language model and displays the response.

The flow looks like this:

  1. Configure a chat model.
  2. Send a prompt using .chat().
  3. Read the response.
  4. Display the model’s message.
  5. Talk about where this approach works.
  6. Talk about where this approach is not enough.
  7. Clarify the difference between non-streaming and streaming responses.

This article is intentionally focused on ChatModel().

That matters because ChatModel() is the low-level, stateless way to interact with a model. It does not remember prior messages. It does not manage user sessions. It does not use tools. It does not magically know your database schema, your business rules, or why someone named a table tbl_final_final_v2_backup.

It accepts input and returns output. That makes it a great place to start.

What is ChatModel()?

ChatModel() creates a configured model object that ColdFusion can use to send prompts to an AI provider. At a high level, it is the simplest bridge between your CFML code and an LLM.

You give ColdFusion a configuration struct with details like:

  • provider
  • API key
  • model name
  • temperature
  • max tokens
  • timeout

ColdFusion gives you back a model object. Then you call .chat() with a plain string. The model returns a response. That is the basic loop.

It is important to understand what this is and what this is not.

ChatModel() is good for direct, one-off requests. It is not the layer you use when you need structured system messages, long-running conversational memory, tool orchestration, or user-specific assistant behavior. Those are later topics.

For now, think of ChatModel() as:

“Here is a prompt. Give me an answer.”

Nice. Simple. Dangerous if unsupervised. Basically every useful tool ever invented.

Before you start

You need a few things before this code will work.

  1. You need a ColdFusion version that includes the native AI services. This series is based on Adobe ColdFusion 2025 Update 8.
  2. You need the AI package installed if your ColdFusion installation requires it. Depending on your setup, this may be handled through the ColdFusion Administrator or Package Manager.
  3. You need access to an AI provider. That may mean an API key for a cloud provider or a local inference setup such as Ollama.
  4. You need to store secrets properly.

That last one deserves its own small lecture. Do not hardcode your API key directly into a .cfm file and commit it to source control. That is not “developer convenience.” That is a future incident report wearing a fake mustache.

Use environment variables, encrypted configuration, a secrets manager, ColdFusion Administrator settings, or another secure configuration mechanism appropriate for your environment.

For examples in this article, I will assume the key is already available as: application.aiApiKey

That does not mean this is the only right place to store it. It just keeps the examples readable.
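One way that assumption could be satisfied is to read the key from an environment variable when the application starts. This is a minimal sketch, not the only approach: the application name and the OPENAI_API_KEY variable name are placeholders I made up for illustration, and a secrets manager or encrypted configuration would slot into the same spot.

```cfml
// Application.cfc (sketch)
component {

	this.name = "aiHelloWorld"; // hypothetical application name

	public boolean function onApplicationStart() {
		// OPENAI_API_KEY is an assumed environment variable name.
		// getenv() returns null when the variable is not set.
		var apiKey = createObject( "java", "java.lang.System" ).getenv( "OPENAI_API_KEY" );

		// Fail loudly at startup instead of failing quietly on the first AI call.
		if ( isNull( apiKey ) || !len( trim( apiKey ) ) ) {
			throw(
				type = "AiConfig.MissingKey",
				message = "The AI API key environment variable is not set."
			);
		}

		application.aiApiKey = trim( apiKey );

		return true;
	}

}
```

The key never appears in source control, and the rest of the examples can keep reading application.aiApiKey without caring where it came from.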

The smallest useful example

Here is a basic ColdFusion AI call:

<cfscript>
chatModelConfig = {
	provider : "openAI",
	modelName : "gpt-5-nano",
	apiKey : application.aiApiKey,
	temperature : 0.3,
	maxTokens : 500,
	timeout : 30
};

chatModel = ChatModel( chatModelConfig );

response = chatModel.chat( "Explain ColdFusion session scope in one short paragraph." );

writeOutput( encodeForHtml( response.message ) );
</cfscript>

That is the basic shape. Create the configuration. Create the chat model. Send a prompt. Display the response. If everything is configured correctly, ColdFusion sends the prompt to the configured provider and receives a response struct. The generated text is available in response.message.

That is your first ColdFusion AI “Hello World.”

Nobody gets a certificate, but you are allowed to nod confidently at your screen.

Breaking down the configuration

Let’s walk through the configuration struct.

chatModelConfig = {
	provider : "openAI",
	modelName : "gpt-5-nano",
	apiKey : application.aiApiKey,
	temperature : 0.3,
	maxTokens : 500,
	timeout : 30
};

provider

The provider tells ColdFusion which AI provider you want to use. This might be OpenAI, Anthropic, Gemini, Mistral, Azure OpenAI, Ollama, or another provider supported by ColdFusion’s AI services.

Provider names and model availability may vary, so check the ColdFusion AI documentation and your provider’s own documentation when configuring this in a real application.

The nice part is that your CFML code does not need to become a sprawling mess of provider-specific HTTP calls. ColdFusion’s AI framework gives you a consistent interface so the provider can be treated more like configuration and less like a permanent architectural tattoo.

modelName

The model name tells the provider which model to use.

This matters. Different models have different strengths, costs, speeds, limits, and supported parameters. Some are better for reasoning. Some are faster. Some are cheaper. Some support larger context windows. Some support parameters that others do not.

For a Hello World article, the specific model is not the point. The point is that your application should not assume every model behaves exactly the same. That assumption will eventually crawl into your logs at 2:00 a.m. wearing a stack trace.

apiKey

The API key authenticates your application with the provider. Again, do not hardcode this in your source. If you are using a local provider such as Ollama, you may not need an API key. But for cloud providers, you usually will.

temperature

Temperature controls how predictable or creative the output should be. For this example, I used:

temperature : 0.3

That keeps the answer relatively focused. For developer education, technical explanations, classifications, summaries, and structured responses, I generally start with a lower temperature. I do not want the model to get wildly creative when explaining session scope.

There are times for creative output. “Explain session scope as a sea shanty” is one of them. Production documentation is usually not.

maxTokens

maxTokens limits how long the response can be. This helps control cost, latency, and verbosity. Without a limit, a model may answer a simple question with a dissertation. Sometimes that is useful. Sometimes the user asked for one paragraph and got the Fellowship of the Ring extended edition.

timeout

timeout controls how long ColdFusion should wait for the response. AI calls are remote calls unless you are using a local model. Remote calls fail. They stall. They time out. They get rate limited. They occasionally do whatever network calls do when they sense you are demoing live.

Always set a timeout. Your application should have a plan for what happens when the model does not respond quickly enough.

The prompt

This line sends the prompt:

response = chatModel.chat( "Explain ColdFusion session scope in one short paragraph." );

The documented ChatModel.chat() call accepts a plain string. That string is treated as the user message. For simple tasks, that is enough.

Examples:

response = chatModel.chat( "Summarize this text in three bullet points: #articleText#" );
response = chatModel.chat( "Rewrite this announcement to sound friendlier: #announcementText#" );
response = chatModel.chat( "Classify this support request as billing, technical, account, or other: #requestText#" );

The simplicity is nice, but it also has limits.

With ChatModel.chat(), you are not sending a structured conversation with separate system, user, and assistant messages. You are sending a plain string. If you need structured prompts, personas, memory, or tools, that is where Agent() comes in.

For this article, plain string is perfect. The robot has to learn to crawl before it can run through your infrastructure holding scissors.

The response

This line displays the response:

writeOutput( encodeForHtml( response.message ) );

The response is a struct. The generated text is in response.message. I am using encodeForHtml() here intentionally. The model’s output is external content. It may be generated by an AI model instead of typed directly by a user, but you should still treat it as untrusted output.

Do not blindly dump model output into an HTML page. That is how you turn “AI assistant” into “cross-site scripting assistant.” If you expect plain text, encode it. If you expect HTML, sanitize it. If you expect JSON, parse and validate it. If you expect a number, validate that it is actually a number. The model is not your validation layer. ColdFusion is still invited to this meeting.
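To make the "validate that it is actually a number" point concrete, here is a small sketch. The parseRating() helper and the 1 to 5 rating scale are hypothetical, invented only for this illustration, and the sample strings stand in for whatever arrives in response.message.

```cfml
<cfscript>
// Hypothetical helper: accept the model's text only if it is a single digit from 1 to 5.
public numeric function parseRating( required string modelText, numeric fallback = 0 ) {
	var candidate = trim( arguments.modelText );

	// The pattern rejects answers such as "4 stars", "four", "4.5", or an empty string.
	if ( reFind( "^[1-5]$", candidate ) ) {
		return val( candidate );
	}

	// Anything else is treated as "no usable answer".
	return arguments.fallback;
}

// Sample strings standing in for response.message.
cleanRating = parseRating( "4" );          // accepted
chattyRating = parseRating( "4 stars!" );  // rejected, falls back to 0
</cfscript>
```

The model was asked for a number, but the application decides whether it actually got one.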

A slightly cleaner version

The first example is fine for a demo, but I usually prefer wrapping the call in a small function or CFC method so the page itself does not care about model configuration.

Here is a simple page-level version:

<cfscript>
public struct function askAi( required string prompt ) {
	var chatModelConfig = {
		provider : "openAI",
		modelName : "gpt-5-nano",
		apiKey : application.aiApiKey,
		temperature : 0.3,
		maxTokens : 500,
		timeout : 30
	};

	var chatModel = ChatModel( chatModelConfig );

	return chatModel.chat( arguments.prompt );
}

response = askAi( "Explain ColdFusion application scope in one short paragraph." );

writeOutput( encodeForHtml( response.message ) );
</cfscript>

This is still intentionally simple. In a real application, I would probably move this into a service CFC. That keeps the provider configuration, error handling, logging, and defaults in one place. For example:

// Saved as ai_service.cfc so the ai_service return type on init() resolves.
component {

	public ai_service function init(
		required string apiKey,
		string provider = "openAI",
		string modelName = "gpt-5-nano"
	) {
		variables.apiKey = arguments.apiKey;
		variables.provider = arguments.provider;
		variables.modelName = arguments.modelName;

		return this;
	}

	public struct function ask(
		required string prompt,
		numeric temperature = 0.3,
		numeric maxTokens = 500,
		numeric timeout = 30
	) {
		var chatModelConfig = {
			provider : variables.provider,
			modelName : variables.modelName,
			apiKey : variables.apiKey,
			temperature : arguments.temperature,
			maxTokens : arguments.maxTokens,
			timeout : arguments.timeout
		};

		var chatModel = ChatModel( chatModelConfig );

		return chatModel.chat( arguments.prompt );
	}

}

Then your page becomes simpler:

<cfscript>
response = application.aiService.ask(
	prompt = "Explain ColdFusion queryExecute() in one short paragraph."
);

writeOutput( encodeForHtml( response.message ) );
</cfscript>

This is already better. Your AI configuration has a home. Your page is cleaner. Your future self is slightly less likely to mutter at you in a code review.
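The page above assumes application.aiService already exists. One way to create it is once, at application startup. This is a sketch under a few assumptions: the component is saved as ai_service.cfc somewhere ColdFusion can resolve it by name, the application name is a placeholder, and application.aiApiKey has been loaded from your secure configuration earlier in the same method.

```cfml
// Application.cfc (sketch)
component {

	this.name = "aiHelloWorld"; // hypothetical application name

	public boolean function onApplicationStart() {
		// Assumption: load application.aiApiKey from your secure configuration here,
		// before the service is created.

		// "new" calls init() automatically and passes the named arguments through.
		application.aiService = new ai_service(
			apiKey = application.aiApiKey,
			provider = "openAI",
			modelName = "gpt-5-nano"
		);

		return true;
	}

}
```

Because the provider and model name are passed in here, changing either one is a configuration edit rather than a hunt through every page that calls the model.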

Add basic error handling

Now let’s be a little more realistic. AI calls can fail for several reasons:

  • invalid API key
  • unsupported model name
  • network timeout
  • provider outage
  • rate limit
  • bad parameter
  • quota issue
  • malformed response
  • cosmic nonsense

So we should handle errors.

<cfscript>
try {
	response = application.aiService.ask(
		prompt = "Explain ColdFusion queryExecute() in one short paragraph."
	);

	writeOutput( encodeForHtml( response.message ) );
} catch ( any error ) {
	writeLog(
		file = "ai",
		type = "error",
		text = "AI request failed: #error.message#"
	);

	writeOutput(
		encodeForHtml(
			"Sorry, I could not generate a response right now. Please try again later."
		)
	);
}
</cfscript>

This is not fancy, but it is much better than showing the user a raw exception. Users do not need to see provider errors. They especially do not need to see anything involving your configuration, environment, keys, model names, internal hostnames, or stack traces. That information belongs in logs, not in the browser.

A simple form example

Let’s turn this into a tiny working page. The user enters a prompt. ColdFusion sends it to the model. The response is displayed.

<cfparam name="form.prompt" default="">

<cfscript>
result = "";

if ( len( trim( form.prompt ) ) ) {
	try {
		response = application.aiService.ask(
			prompt = trim( form.prompt ),
			temperature = 0.3,
			maxTokens = 700,
			timeout = 30
		);

		result = response.message;
	} catch ( any error ) {
		writeLog(
			file = "ai",
			type = "error",
			text = "AI request failed: #error.message#"
		);

		result = "Sorry, I could not generate a response right now.";
	}
}
</cfscript>

<cfoutput>
<form method="post">
	<label for="prompt">Ask the model something</label>
	<br>

	<textarea
		id="prompt"
		name="prompt"
		rows="6"
		cols="80"
	>#encodeForHtml( form.prompt )#</textarea>

	<br>

	<button type="submit">
		Ask AI
	</button>
</form>

<cfif len( result )>
	<h2>Response</h2>
	<pre>#encodeForHtml( result )#</pre>
</cfif>
</cfoutput>

This is not a beautiful interface, and frankly, it’s not supposed to be. It’s a basic test harness. You can use this kind of page to test prompts, model settings, latency, response quality, and error handling before building a more polished experience.

Every AI feature should start with something boring and observable. Boring is good. Boring means you can see what is happening before the UI team adds gradients, loading animations, and a mascot named Prompty.

Non-streaming responses

The examples above are non-streaming. That means the browser does not receive the answer until the model finishes generating the full response and ColdFusion receives it.

The flow is:

  1. User submits prompt.
  2. ColdFusion sends request to model.
  3. Model generates complete answer.
  4. ColdFusion receives complete response.
  5. ColdFusion renders the page.

This is the simplest approach. It is also the right approach for many tasks. Use non-streaming when:

  • the response is short
  • the task runs in the background
  • the output must be validated before display
  • the response should be parsed as JSON
  • the model may request tools
  • the answer should appear all at once
  • the user does not need a chat-style typing experience

Non-streaming is also easier to log, test, validate, sanitize, and retry. That makes it an excellent default. Not every AI feature needs to look like a chatbot dramatically typing one word at a time as though it is revealing the location of buried treasure. Sometimes a normal response is just fine.
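As an illustration of the "easier to retry" point, here is a rough sketch of a retry wrapper around the service call. The askWithRetry() name, the attempt count, and the delay are all assumptions for illustration. It retries on any error, which is deliberately naive: a real version would skip retries for failures that cannot fix themselves, such as an invalid API key.

```cfml
<cfscript>
// Hypothetical wrapper: try the call a few times, waiting a little longer each time.
public struct function askWithRetry( required string prompt, numeric maxAttempts = 3 ) {
	for ( var attempt = 1; attempt <= arguments.maxAttempts; attempt++ ) {
		try {
			// A complete response comes back in one piece, so a retry is a clean do-over.
			return application.aiService.ask( prompt = arguments.prompt );
		} catch ( any error ) {
			// Out of attempts: let the caller's error handling deal with it.
			if ( attempt == arguments.maxAttempts ) {
				rethrow;
			}

			// Simple linear backoff: 500 ms, then 1000 ms, and so on.
			sleep( 500 * attempt );
		}
	}

	// Only reached when maxAttempts is less than 1.
	throw( type = "AiRequest.NoAttempts", message = "maxAttempts must be at least 1." );
}
</cfscript>
```

With a streamed response, the same retry would have to deal with text the user has already seen. Here there is nothing to take back.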

Streaming responses

Streaming means the user sees the answer as it is being generated. Instead of waiting for the full response, the browser receives chunks. (Remember <cfflush>?)

The flow is more like:

  1. User submits prompt.
  2. ColdFusion starts the model request.
  3. The provider/model begins generating text.
  4. Partial text is sent back to the browser.
  5. The UI updates as chunks arrive.
  6. The final response completes.

Streaming can make an AI feature feel much faster because the user sees progress almost immediately. This is especially useful for:

  • chat interfaces
  • long explanations
  • generated articles
  • brainstorming
  • summarization
  • interactive assistant experiences

Streaming is mostly about user experience. The model may take the same total amount of time to finish, but the user does not sit there staring at a blank screen wondering if the application died, the network died, or the robot is just being dramatic.

Streaming is not always better

Streaming looks slick, but it introduces tradeoffs. Use caution when streaming:

  • structured JSON
  • content that must be validated before display
  • sensitive output
  • tool-based workflows
  • workflows that may need to redact or block the final answer
  • anything where partial content could confuse the user

For example, if the model is generating JSON that your application needs to parse, streaming partial JSON to the browser is not helpful. Half a JSON object is not “interactive.” It is just broken with confidence.

If your application needs to run an output guardrail before showing the answer, streaming becomes more complicated. You cannot fully inspect the final answer if you have already shown half of it to the user.

If your model might request a tool call, you may not want to stream text until you know whether the model is answering directly or asking your application to do something.

Streaming is great for “write me an explanation.” It is less great for “produce validated data that drives application behavior.”

A practical rule

Start with non-streaming.

Add streaming when the user experience genuinely benefits from it.

That is the boring answer, which means it is probably correct.

For a first AI feature, non-streaming lets you focus on the important fundamentals:

  • model configuration
  • prompt quality
  • response handling
  • error handling
  • output encoding
  • latency
  • cost
  • logging

Once those are stable, streaming can be added where appropriate. Do not start by optimizing the typing animation while the application is still leaking stack traces and accepting prompts longer than a Victorian novel.
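As for the Victorian novel problem, a length check before the model call is cheap insurance. This is a minimal sketch: the 2,000 character limit is an arbitrary number I picked for illustration, and the call itself should still sit inside the try/catch shown earlier.

```cfml
<cfscript>
param name="form.prompt" default="";

maxPromptLength = 2000; // assumed limit, tune it for your feature
userPrompt = trim( form.prompt );
result = "";

if ( len( userPrompt ) > maxPromptLength ) {
	// Reject oversized input before it costs tokens, time, or money.
	result = "Please shorten your prompt to #maxPromptLength# characters or fewer.";
} else if ( len( userPrompt ) ) {
	response = application.aiService.ask( prompt = userPrompt );
	result = response.message;
}
</cfscript>
```

Characters are not tokens, so this is a rough guard rather than a precise budget. It still stops the worst offenders at the door.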

A note about documented ColdFusion behavior

At this point in the series, we are staying close to the documented ChatModel() behavior: configure a model, call .chat() with a plain string, and read the returned message. That gives us a clean foundation.

If you are using a provider or API path that supports true token-by-token streaming, you may be able to build a streaming endpoint around that provider’s streaming interface. But that is a separate concern from the basic ChatModel().chat() example shown here. In other words, do not confuse these two ideas:

  • ChatModel().chat() returning a complete response.
  • A streaming UI that displays generated content incrementally.

They are related, but they are not the same thing. The first is the documented ColdFusion AI “Hello World” path. The second is a user experience and integration pattern that depends on the streaming capabilities exposed by the layer you are using. We will keep our first example simple and reliable.

That is how we avoid turning this article into “Welcome to Async Streaming Provider Abstraction and Browser Buffering, Please Bring Along a Helmet.”

Prompt design matters immediately

Even in this simple example, prompt wording matters. Compare these two prompts:

Explain ColdFusion.

And:

Explain ColdFusion session scope to an experienced web developer who is new to CFML. Keep it under 120 words and include one practical use case.

The second prompt is much better. It gives the model:

  • audience
  • topic
  • length
  • format
  • practical expectation

You do not need elaborate prompt engineering for every task. But you do need to tell the model what you actually want. Bad prompts create bad answers. Vague prompts create vague answers. Contradictory prompts create weird answers.

This is not unique to AI. It is also true of project requirements, conference abstracts, and text messages that say “we need to talk.”
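If the same kind of prompt gets sent repeatedly, it can help to build it from named parts so the audience, topic, and length never get left out by accident. The buildExplainPrompt() helper below is hypothetical and only does string assembly. It simply reproduces the shape of the better prompt above.

```cfml
<cfscript>
// Hypothetical helper: assemble a bounded "explain this" prompt from its parts.
public string function buildExplainPrompt(
	required string topic,
	required string audience,
	numeric maxWords = 120
) {
	return "Explain #arguments.topic# to #arguments.audience#. "
		& "Keep it under #int( arguments.maxWords )# words and include one practical use case.";
}

prompt = buildExplainPrompt(
	topic = "ColdFusion session scope",
	audience = "an experienced web developer who is new to CFML"
);
</cfscript>
```

The resulting string can be passed straight to .chat(), and the required arguments make it hard to send a prompt that forgot who it is talking to.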

Try a few prompts

Here are some good starter prompts to test with your new page.

Explain ColdFusion's queryExecute() function to a developer coming from PHP. Keep it under 150 words.

Summarize the following text in three bullet points:
[paste text here]

Rewrite this message so it sounds professional but not corporate:
[paste message here]

Classify this support request as billing, technical, account, or other. Return only the category:
[paste request here]

Explain why storing API keys directly in source code is dangerous. Use one short paragraph.

These are good examples because they are bounded. They tell the model what to do. They don’t require the model to know your database. They don’t require memory. They don’t require tools. They don’t require RAG. They are exactly the kind of thing ChatModel() is good at.

Ask for structured output carefully

Sooner or later, you will want structured output. For example:

Classify this support request and return JSON with category, priority, and summary.

That can work, but do not blindly trust the output. If you ask for JSON, your application should still:

  • parse the JSON
  • handle parse errors
  • validate required keys
  • validate allowed values
  • reject unexpected structures
  • avoid using the output directly for sensitive actions

The model may return perfect JSON. It may also return almost JSON, which is the most annoying kind of JSON because it looks fine until your parser throws a chair.

If structured output matters, validate it like any other external input.
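Here is roughly what that checklist could look like in code. Treat it as a sketch: the parseClassification() name, the exception types, and the list of allowed priorities are my assumptions, while the categories come from the classification prompt used earlier in this article.

```cfml
<cfscript>
// Hypothetical validator for the JSON the classification prompt asks for.
public struct function parseClassification( required string modelText ) {
	var allowedCategories = [ "billing", "technical", "account", "other" ];
	var allowedPriorities = [ "low", "medium", "high" ]; // assumed values
	var requiredKeys = [ "category", "priority", "summary" ];
	var raw = trim( arguments.modelText );

	// "Almost JSON" stops here instead of deeper in the application.
	if ( !isJSON( raw ) ) {
		throw( type = "AiOutput.Invalid", message = "Model output is not valid JSON." );
	}

	var data = deserializeJSON( raw );

	// Reject arrays, bare values, and objects with missing or extra keys.
	if ( !isStruct( data ) || structCount( data ) != arrayLen( requiredKeys ) ) {
		throw( type = "AiOutput.Invalid", message = "Model output has an unexpected structure." );
	}

	for ( var key in requiredKeys ) {
		if ( !structKeyExists( data, key ) || !isSimpleValue( data[ key ] ) ) {
			throw( type = "AiOutput.Invalid", message = "Model output is missing a usable #key#." );
		}
	}

	if (
		!arrayFindNoCase( allowedCategories, data.category )
		|| !arrayFindNoCase( allowedPriorities, data.priority )
	) {
		throw( type = "AiOutput.Invalid", message = "Model output contains a value that is not allowed." );
	}

	// Return a clean copy with normalized casing.
	return {
		"category" : lCase( data.category ),
		"priority" : lCase( data.priority ),
		"summary" : data.summary
	};
}
</cfscript>
```

The caller would wrap this in the same kind of try/catch used for the model call itself. Note that the summary is still free text from the model, so it gets encoded on output like everything else.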

Logging

You should log AI requests, but carefully. Useful things to log:

  • provider
  • model name
  • request timestamp
  • latency
  • success/failure
  • error type
  • token usage if available
  • user ID or request ID
  • feature name

Be careful with logging full prompts and responses. Prompts may contain sensitive user data. Responses may contain generated sensitive data. If you log everything, your AI feature may turn your logs into a second, worse database.

A reasonable approach is to log enough metadata to debug production issues while avoiding unnecessary sensitive content. For development, verbose logging may be useful. For production, be deliberate.

The phrase “we log every prompt forever” should make at least one security person sit up suddenly.
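A metadata-only approach might look something like the sketch below. The askWithLogging() wrapper and the log line format are assumptions for illustration. It records the feature name, a request ID, the outcome, the error type, latency, and the prompt length, but never the prompt or response text. Provider and model name could be added if your service exposes them.

```cfml
<cfscript>
// Hypothetical wrapper: log what happened, not what was said.
public struct function askWithLogging( required string prompt, required string featureName ) {
	var requestId = createUUID();
	var startedAt = getTickCount();
	var outcome = "failure";
	var errorType = "";

	try {
		var response = application.aiService.ask( prompt = arguments.prompt );
		outcome = "success";
		return response;
	} catch ( any error ) {
		errorType = error.type;
		rethrow;
	} finally {
		// Runs on success and on failure, so every request leaves exactly one log line.
		var latencyMs = getTickCount() - startedAt;

		writeLog(
			file = "ai",
			type = ( outcome == "success" ) ? "information" : "error",
			text = "feature=#arguments.featureName# requestId=#requestId# outcome=#outcome# "
				& "errorType=#errorType# latencyMs=#latencyMs# promptChars=#len( arguments.prompt )#"
		);
	}
}
</cfscript>
```

That is enough to answer "which feature is slow?" and "what is failing?" without turning the log file into a transcript archive.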

Where ChatModel() fits

At this point, it is worth repeating the boundary. Use ChatModel() when the task is simple and stateless. Good fits:

  • summarize this text
  • rewrite this paragraph
  • classify this message
  • generate a short description
  • explain this error
  • translate this content
  • draft a response
  • produce a one-off answer

Poor fits:

  • remember this user across multiple messages
  • maintain a conversation
  • follow a detailed persona
  • call application functions
  • retrieve user-specific database records
  • answer from private documentation
  • enforce complex policy controls
  • coordinate multi-step workflows

Those poor fits are not impossible. They just require other layers. That is the point of the rest of this series of articles. ChatModel() is the front door, not the whole building.

Common mistakes

Here are the mistakes I would expect to see in first-generation AI features.

Hardcoding secrets

Do not put API keys in source code. Not in .cfm files. Not in CFCs. Not in JavaScript. Not in a comment that says “temporary.” Especially not in a comment that says “temporary” because that means it will outlive us all.

Forgetting that the model is stateless

A ChatModel() call does not remember the previous call. If you need memory, use the right layer. We will cover that in the next article.

Trusting the output

Model output is not automatically true, safe, valid, or properly formatted. Validate it. Encode it. Sanitize it. Review it when appropriate.

Using AI for deterministic business logic

Do not use a model to calculate sales tax, determine permissions, compute payroll, or decide whether a user can access a record. Use code. The model can explain the result. It should not be the source of truth.
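A quick sketch of that division of labor, using made-up numbers: the subtotal and tax rate below are hypothetical and would come from your own data and rules. The application does the arithmetic, and the model is only asked to word an explanation around values it is handed.

```cfml
<cfscript>
// Hypothetical values for illustration only.
subtotal = 120.00;
taxRate = 0.0825;

// The application computes the number. This is the source of truth.
taxAmount = round( subtotal * taxRate * 100 ) / 100;

// The model only phrases an explanation around values it is given.
response = application.aiService.ask(
	prompt = "In one friendly sentence, explain to a customer that a sales tax of "
		& "#numberFormat( taxAmount, '0.00' )# was applied to a subtotal of "
		& "#numberFormat( subtotal, '0.00' )#. Do not change or recalculate the numbers."
);

writeOutput( encodeForHtml( response.message ) );
</cfscript>
```

Anything that ends up on an invoice should still be rendered from taxAmount, not parsed back out of the model's sentence.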

Setting no timeout

Every external call needs a timeout. AI is not special. It is just more expensive when it gets weird.

Sending too much context

Do not dump everything into the prompt because “more context is better.” Relevant context is better. Random context is noise with a usage bill.

Skipping error handling

Provider calls fail. Write the error handling before production teaches you this through interpretive dance.

A better first feature

A good first AI feature in a ColdFusion application is something useful but low risk. For example:

  • summarize a support request for an admin
  • rewrite an announcement draft
  • suggest a shorter page description
  • classify a message into a queue
  • explain a validation error in friendlier language
  • generate a first draft of an email that a human reviews

These are good first features because the AI assists the user without directly controlling sensitive business logic. That’s where AI shines early: reducing friction, not replacing the application.

Start with assistance. Earn your way toward automation.

Where we go next

We now have a working ColdFusion AI call. That is useful, but it has a glaring limitation: It does not remember anything. If the user says:

My name is David.

And then asks:

What is my name?

A basic stateless ChatModel() call will not know unless your application sends the earlier context again. That is not a bug. That is the layer doing exactly what it was designed to do.
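To see the difference, compare two separate calls with one call that carries the earlier context along. This sketch reuses the application.aiService.ask() method from earlier. Stuffing prior context into the prompt string by hand is shown only to illustrate statelessness, not as a recommended memory strategy.

```cfml
<cfscript>
// Two separate calls: the second one knows nothing about the first.
application.aiService.ask( prompt = "My name is David." );
forgetful = application.aiService.ask( prompt = "What is my name?" );

// One call that carries the earlier context inside the plain-string prompt.
informed = application.aiService.ask(
	prompt = "Earlier in this conversation the user said: My name is David. "
		& "Now answer the user's question: What is my name?"
);

writeOutput( encodeForHtml( informed.message ) );
</cfscript>
```

Only the second approach gives the model anything to work with, and it works only because the application did the remembering.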

In the next article, we will move from one-off prompts to conversation state. We will introduce memory, user preferences, message windows, token windows, and per-user isolation.

That last part is important. Because if your AI assistant accidentally remembers one user’s conversation while talking to another user, you have not built a helpful productivity tool. You have built a privacy incident with a friendly typing animation.

Final thought

This first step is intentionally small. Configure a model. Send a prompt. Read response.message. Display it safely.

That is enough to begin. The real work is not making the model say something. That part is easy. The real work is deciding what context it gets, what it is allowed to do, how your application validates the result, and where AI actually improves the user experience.

ColdFusion gives us the integration point. The application still needs architecture, security, validation, logging, and common sense.

In other words, the robot can help… but ColdFusion is still driving.
