Connector tuning is an essential part of setting up a ColdFusion server. Several connector settings need to be tuned, and incorrect values may lead to “Service Unavailable” or “Server too busy” errors. In this blog, we will discuss how to handle errors caused by incorrect tuning and how to tune the connectors correctly for your site.
The connector settings may vary from site to site, so it is very important to configure the connectors appropriately for your application. This blog covers the connector tuning parameters for IIS. During installation, the user can choose to configure the connector for an “Individual Site” or for “ALL” sites in the IIS connector configuration.
After installation, the user can launch the “Web Server Configuration tool” and create the connector for an “Individual Site” or for “ALL” sites in IIS.
When the connector is configured for individual sites, a separate connector for each site is placed under {CF-Home}/config/wsconfig/{some no}/. With the “ALL” configuration, the connector is configured at the global level, which means the same connector binary is used across multiple sites.
The three most important parameters, and the role each one plays, are discussed here:-
- Reuse Connections
- Connection pool size
- Connection pool timeout
Re-use connections: – This setting determines the number of connections that can be re-used. When the Tomcat connector makes a connection to the Tomcat server, it does not close the connection after it has finished serving the request. Instead, it keeps the connection active so that the same connection can be re-used for the next request. This improves performance by avoiding the overhead of creating a new connection to the Tomcat server for every request. This setting needs to be tuned when the connector is configured with multiple sites. The maximum value for re-use connections is determined by the number of sites configured with the same CF server and the load on each site.
The default re-use connection count is 250.
Connection pool size: – This setting determines the maximum number of connections that can be created in the connection pool. When multiple requests arrive at the connector from IIS, the connector creates new connections in the connection pool only if no free connections are available in the pool. The connector will not create a new connection once the number of connections reaches the connection pool size limit. When the connector is configured with “ALL” sites, the same connection pool is used to serve requests for all sites. So the default connection pool size works well with a single-site configuration, but in some scenarios it does not work well with the “ALL” sites configuration. Hence this value should be increased carefully, based on need and on the number of sites present in IIS.
The default connection pool size is the same as the default re-use connection count, which is 250.
Connection pool timeout: – This setting determines the timeout value (in seconds) for idle connections in the connection pool. This value must be kept in sync with the connectionTimeout attribute (in milliseconds) of your AJP connector in Tomcat's server.xml.
If it is not set explicitly in server.xml, the default connection timeout is indefinite.
There are other parameters that the CF connector inherits from the Tomcat AJP connector. Details of those settings can be found in the AJP documentation (http://tomcat.apache.org/connectors-doc/reference/workers.html)
The worker.properties file is available at {CF-Home}/config/wsconfig/{some no}/ and server.xml can be found at {CF-Home}/cfusion/runtime/conf/. The following changes are required to tune the site:-
- Open the worker.properties file and add the following line as a new entry: worker.cfusion.connection_pool_size=500 (This is the size of the connection pool inside the connector, that is, the number of connections available to handle requests.)
- Tune the max_reuse_connections entry to an appropriate value based on the number of sites. The optimal value is connection_pool_size / {no of sites}.
- Add another entry on a new line: worker.cfusion.connection_pool_timeout=60 (This is the idle connection timeout, in seconds. When sites are not under load, connections will be recycled back to IIS.)
- Now open server.xml from {CF-Home}/cfusion/runtime/conf/ and add or update maxThreads="500" and connectionTimeout="60000" on the Connector node containing the AJP entry.
<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false"/>
- Now the AJP entry in server.xml should look like
<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000"/>
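The server.xml step above can also be sketched programmatically. The following is a minimal Python illustration, not part of the ColdFusion tooling: the function name tune_ajp_connector is hypothetical, and it works on the Connector element as a string rather than on a real server.xml file.

```python
import xml.etree.ElementTree as ET


def tune_ajp_connector(connector_xml, max_threads, timeout_ms):
    """Return the Connector element with maxThreads and connectionTimeout set.

    Hypothetical helper for illustration only; it does not read or write
    the real server.xml.
    """
    element = ET.fromstring(connector_xml)
    # Only touch the AJP entry, as the steps above describe.
    if element.get("protocol") != "AJP/1.3":
        raise ValueError("not an AJP connector")
    element.set("maxThreads", str(max_threads))
    element.set("connectionTimeout", str(timeout_ms))  # milliseconds
    return ET.tostring(element, encoding="unicode")


original = ('<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" '
            'tomcatAuthentication="false"/>')
tuned = tune_ajp_connector(original, max_threads=500, timeout_ms=60000)
print(tuned)
```

Existing attributes such as port and redirectPort are left untouched; only the two tuning attributes are added or updated.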
There can be multiple use cases. Let us consider the three most widely used scenarios:-
- Connector created with “ALL” OR with “Individual” Site and single site in IIS
- Connector created with “ALL” and multiple sites in IIS
- Connector created with “Individual” site and multiple sites in IIS
Use Case# 1: Connector created with “ALL” OR with “Individual” Site and single site in IIS
In an ideal scenario, where the user has only one site (configured with the ALL or individual connector) that is not running under high load, the worker.properties can look like this
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=250
worker.cfusion.connection_pool_size=500
worker.cfusion.connection_pool_timeout=60
And server.xml should look like
<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000"/>
So, we added the connection_pool_size and connection_pool_timeout (in seconds) in the worker.properties. The corresponding connectionTimeout (in milliseconds) is added to server.xml along with maxThreads whose value is equivalent to the connection_pool_size in the worker.properties.
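The relationship between the two files can be checked with a short script. This is only a sketch under stated assumptions: it covers the single-connector case described above, the function names parse_worker_properties and check_consistency are hypothetical, and it works on text copied from the files rather than on the files themselves.

```python
import xml.etree.ElementTree as ET


def parse_worker_properties(text, worker="cfusion"):
    """Collect worker.<name>.<key>=<value> entries for one worker."""
    prefix = "worker." + worker + "."
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            settings[key[len(prefix):]] = value.strip()
    return settings


def check_consistency(worker_text, connector_xml, worker="cfusion"):
    """Return a list of mismatches between worker.properties and server.xml."""
    settings = parse_worker_properties(worker_text, worker)
    connector = ET.fromstring(connector_xml)
    pool = int(settings["connection_pool_size"])
    reuse = int(settings["max_reuse_connections"])
    timeout_s = int(settings["connection_pool_timeout"])
    problems = []
    if reuse > pool:
        problems.append("max_reuse_connections exceeds connection_pool_size")
    if int(connector.get("maxThreads")) != pool:
        problems.append("maxThreads differs from connection_pool_size")
    # worker.properties uses seconds, server.xml uses milliseconds.
    if int(connector.get("connectionTimeout")) != timeout_s * 1000:
        problems.append("connectionTimeout does not match connection_pool_timeout")
    return problems


WORKER = """worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=250
worker.cfusion.connection_pool_size=500
worker.cfusion.connection_pool_timeout=60
"""
CONNECTOR = ('<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" '
             'tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000"/>')
print(check_consistency(WORKER, CONNECTOR))
```

An empty list means the three values agree for this use case.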
Use Case# 2: Connector created with “ALL” and multiple sites in IIS
Consider a scenario in which the connector is created with “ALL” and there is only one site, which is running under load. The default 250 re-use connections are utilized by site 1. Later on, the user adds another site in IIS.
Site 1 will make all 250 re-usable connections with ColdFusion, and any request for a new connection from site 2 will be ignored by ColdFusion. Hence the re-use connection count must be increased to an optimal value so that site 2 is not starved of new connections. This can be achieved by configuring an optimal max_reuse_connections value. Assuming that site 2 is not running under high load, 100 re-use connections will be enough. So max_reuse_connections becomes 350 {250 (for site 1) + 100 (for site 2)}. However, it is good practice to tune connection_pool_size first, and then set max_reuse_connections accordingly.
This case would require connection_pool_size=700, as max_reuse_connections = connection_pool_size / {no of sites}. So, the worker.properties will look like this
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=350
worker.cfusion.connection_pool_size=700
worker.cfusion.connection_pool_timeout=60
And server.xml should look like
<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" maxThreads="700" connectionTimeout="60000"/>
Note: The connectionTimeout is in milliseconds
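The arithmetic for this use case can be written out as a small Python sketch. The function name shared_connector_settings is hypothetical, and the per-site re-use figures (250 and 100) are the estimates used in the text above, not measured values.

```python
def shared_connector_settings(reuse_per_site):
    """Derive settings for one connector shared by several sites ("ALL").

    reuse_per_site: estimated re-use connections needed by each site.
    Follows the rule quoted above:
    max_reuse_connections = connection_pool_size / number of sites.
    """
    site_count = len(reuse_per_site)
    max_reuse = sum(reuse_per_site)
    pool_size = max_reuse * site_count
    return {
        "max_reuse_connections": max_reuse,
        "connection_pool_size": pool_size,
        "maxThreads": pool_size,  # kept equal to connection_pool_size
    }


# Site 1 under load (250) plus site 2 under low load (100).
print(shared_connector_settings([250, 100]))
```

With these inputs the sketch yields max_reuse_connections=350 and a pool size and maxThreads of 700.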
Use Case# 3: Connector created with “Individual” site and multiple sites in IIS
Consider a scenario in which individual connectors are created for each site. There are three sites: site 1 is running under high load, while site 2 and site 3 are running under low load. Ideally in this scenario, we should start tuning with the site running under high load. If we are not sure about the timeout, we can disable it for high-traffic sites; if it is not defined, the default connection timeout is indefinite. To start with, don’t specify the re-use parameter. Set connection_pool_size=500 and monitor the site. Increase the value gradually, in steps of 100, until the site is stable. Say the site is stable at connection_pool_size=800. Now set max_reuse_connections=270 (connection_pool_size / {no of sites}, i.e. 800/3 = 270 approx).
Site 1
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=270
worker.cfusion.connection_pool_size=800
worker.cfusion.connection_pool_timeout=60
Site 2 and site 3 are running under low traffic, but are bound to the same ColdFusion instance (cfusion in this case). The settings below should be optimal:-
Site 2
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=100
worker.cfusion.connection_pool_size=250
worker.cfusion.connection_pool_timeout=60
Site 3
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=100
worker.cfusion.connection_pool_size=250
worker.cfusion.connection_pool_timeout=60
And server.xml should look like
<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" maxThreads="1300" connectionTimeout="60000"/>
Note: The connectionTimeout is in milliseconds and the maxThreads is the value equivalent to summation of all the connection_pool_size(s). So, in this case maxThreads=1300 {800 (for site 1) + 250 (for site 2) + 250 (for site 3)}.
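The numbers in this use case can be reproduced with a short Python sketch. The function names are hypothetical, and rounding up to the next multiple of 10 is an assumption made here only to match the "270 approx" figure quoted for site 1.

```python
import math


def reuse_from_pool(pool_size, site_count, step=10):
    """connection_pool_size / number of sites, rounded up to a multiple of step.

    The rounding step is an assumption; the text only says "approx".
    """
    return math.ceil(pool_size / site_count / step) * step


def total_max_threads(pool_sizes):
    """maxThreads for server.xml: the sum of all connection_pool_size values."""
    return sum(pool_sizes.values())


pools = {"site1": 800, "site2": 250, "site3": 250}
print(reuse_from_pool(pools["site1"], site_count=3))
print(total_max_threads(pools))
```

This gives 270 for site 1's max_reuse_connections and 1300 for maxThreads, the values used in the configuration above.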
Some key points to remember:-
- The connector tool should always be run with the “Run as Administrator” option, even if the user belongs to the Administrators group. This applies whether the tool is launched from the command line or from the GUI.
- Any changes made to {CF-Home}/config/wsconfig/{some no}/, including isapi_redirect.dll or worker.properties, would require an IIS restart.
- Any changes made to {CF-Home}/cfusion/runtime/conf/server.xml require a restart of the “ColdFusion 11 Application Server” service.
- max_reuse_connections should always be less than or equivalent to connection_pool_size. It can’t be larger than the connection_pool_size.
- The above use cases are scenario based and may vary from site to site, depending upon the load, architecture and traffic on the site.
I made some adjustments based on the above. I took out the adjustment from the server XML of each site and put the settings below in the main CFUSION server XML, based on the total number of threads in the workers.properties files of all sites (78 * 250 = 19500 connections in total). Below are a snippet from the log and the settings I adjusted in the workers.properties file.
When I set the sites up in wsconfig.exe, I use the settings below
AppServer Host: Localhost
AppServer Cluster: NONE
Web Server: IIS
IIS Web Site: Site79 <– I've tried all of them; it does not work
========
C:\ColdFusion11\cfusion\runtime\conf
========
added: maxThreads="19500" connectionTimeout= "60000"/
========
========================================================================================
On Sat morning all the sites went down again.
Workers.properties
======================
worker.list=SITE79
worker.SITE79.type=ajp13
worker.SITE79.host=localhost
worker.SITE79.port=8111
worker.SITE79.max_reuse_connections=250
worker.SITE79.connection_pool_size=250
worker.SITE79.connection_pool_timeout=60
======================
isapi_redirect.log
======================
[Fri Sep 30 05:15:18.796 2016] [67720:25840] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 1 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 05:15:20.262 2016] [67720:58892] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 2 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 06:10:39.559 2016] [67720:25156] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 3 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:55:38.158 2016] [67720:13672] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 4 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:55:58.219 2016] [67720:25156] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 5 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.274 2016] [67720:52596] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 6 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.430 2016] [67720:8728] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 7 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.446 2016] [67720:32092] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 8 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.602 2016] [67720:28808] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 9 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.618 2016] [67720:57376] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 10 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.774 2016] [67720:8240] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 11 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.805 2016] [67720:26736] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 12 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.976 2016] [67720:57660] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 13 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:29.976 2016] [67720:64136] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 14 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 10:58:30.148 2016] [67720:58892] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 15 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 11:04:03.330 2016] [67720:25840] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 16 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 11:04:03.346 2016] [67720:39288] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 17 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 12:57:24.783 2016] [67720:24940] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 18 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 13:39:24.079 2016] [67720:30796] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 19 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 13:39:40.132 2016] [67720:64136] [info] ajp_process_callback::jk_ajp_common.c (2244): (SITE79) current reuse count is 20 of max reuse connection 250 and total endpoint count 250
[Fri Sep 30 21:45:33.844 2016] [67720:50664] [info] TerminateFilter::jk_isapi_plugin.c (2822): Tomcat/ISAPI/isapi_redirector/1.2.41 stopping
[Sat Oct 01 05:00:02.426 2016] [73156:2916] [info] jk_log_version::jk_connector_version.h (21): Connector Version: 295187
[Sat Oct 01 05:00:02.426 2016] [73156:2916] [info] init_jk::jk_isapi_plugin.c (3157): Starting Tomcat/ISAPI/isapi_redirector/1.2.41
[Sat Oct 01 05:00:02.426 2016] [73156:2916] [info] init_jk::jk_isapi_plugin.c (3355): Tomcat/ISAPI/isapi_redirector/1.2.41 initialized
[Sat Oct 01 05:00:02.457 2016] [73156:2916] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1409): (SITE79) can’t receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8111) is down (errno=54)
[Sat Oct 01 05:00:02.457 2016] [73156:2916] [error] ajp_get_reply::jk_ajp_common.c (2376): (SITE79) Tomcat is down or refused connection. No response has been sent to the client (yet)
[Sat Oct 01 05:00:02.457 2016] [73156:2916] [info] ajp_service::jk_ajp_common.c (2903): (SITE79) sending request to tomcat failed (recoverable), (attempt=1)
[Sat Oct 01 05:00:02.566 2016] [73156:2916] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1409): (SITE79) can’t receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8111) is down (errno=54)
[Sat Oct 01 05:00:02.566 2016] [73156:2916] [error] ajp_get_reply::jk_ajp_common.c (2376): (SITE79) Tomcat is down or refused connection. No response has been sent to the client (yet)
[Sat Oct 01 05:00:02.566 2016] [73156:2916] [info] ajp_service::jk_ajp_common.c (2903): (SITE79) sending request to tomcat failed (recoverable), (attempt=2)
[Sat Oct 01 05:00:02.566 2016] [73156:2916] [error] ajp_service::jk_ajp_common.c (2924): (SITE79) connecting to tomcat failed (rc=0, errors=1, client_errors=0).
[Sat Oct 01 05:00:02.566 2016] [73156:2916] [error] HttpExtensionProc::jk_isapi_plugin.c (2763): service() failed with http error 502
[Sat Oct 01 05:00:17.698 2016] [73156:2916] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1409): (SITE79) can’t receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8111) is down (errno=54)
In the Workers.properties I added these edits
.max_reuse_connections=500
.connection_pool_size=1000
.connection_pool_timeout=60
In the server XML I added
maxThreads="1000" connectionTimeout="60000"
When we receive the error, ColdFusion.exe is still running.
Hi @Anit
I have an issue using CF11 with IIS8 and a pool with integrated pipeline mode (DefaultAppPool)
At least once a week all our CF sites go down (including CF Administrator), but the .NET site stays alive.
This is the error message:
**********************************************
Service temporarily Unavailable!
The server is temporarily unable to service your request due to maintenance or capacity problems. Please try again later.
Tomcat/ISAPI/isapi_redirector/1.2.41
**********************************************
After reading this post we made the changes in worker.properties and server.xml as you described, but the error is still present.
Below is part of isapi_redirect.log
[Tue Sep 13 11:14:48.123 2016] [24484:62376] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 62 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.216 2016] [24484:19248] [error] start_response::jk_isapi_plugin.c (1325): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Tue Sep 13 11:14:48.232 2016] [24484:19248] [error] isapi_write_client::jk_isapi_plugin.c (1454): WriteClient failed with 1229 (0x000004cd)
[Tue Sep 13 11:14:48.232 2016] [24484:19248] [info] ajp_process_callback::jk_ajp_common.c (2175): (cfusion) Writing to client aborted or client network problems
[Tue Sep 13 11:14:48.263 2016] [24484:19248] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
[Tue Sep 13 11:14:48.279 2016] [24484:19248] [info] HttpExtensionProc::jk_isapi_plugin.c (2759): service() failed because client aborted connection
[Tue Sep 13 11:14:48.310 2016] [24484:35796] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 67 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.248 2016] [24484:40200] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 65 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.248 2016] [24484:21576] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 66 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.388 2016] [24484:15404] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 68 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.388 2016] [24484:14716] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 69 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.388 2016] [24484:74436] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 70 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.404 2016] [24484:87292] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 71 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.435 2016] [24484:52508] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 72 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.435 2016] [24484:85128] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 73 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.482 2016] [24484:72824] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 74 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.482 2016] [24484:69484] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 75 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.498 2016] [24484:74180] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 76 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:48.935 2016] [24484:29368] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 77 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:28852] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 78 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:28596] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 79 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:34160] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 80 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:86516] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 81 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:51896] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 82 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:65600] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 88 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:8248] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 84 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:56724] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 85 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:78172] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 86 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:83320] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 87 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:75492] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 83 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:7616] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 91 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:23372] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 93 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:36712] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 94 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:50264] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 95 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:98552] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 97 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:78448] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 96 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:27948] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 98 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:28024] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 99 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:89012] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 89 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:12440] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 90 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.685 2016] [24484:70800] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 92 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 11:14:49.748 2016] [24484:30328] [info] ajp_process_callback::jk_ajp_common.c (2244): (cfusion) current reuse count is 100 of max reuse connection 250 and total endpoint count 500
[Tue Sep 13 13:10:20.236 2016] [24484:1156] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:20.236 2016] [24484:1156] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:20.236 2016] [24484:1156] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:20.236 2016] [24484:1156] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Sep 13 13:10:21.251 2016] [24484:62412] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:21.251 2016] [24484:62412] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:21.251 2016] [24484:62412] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:21.251 2016] [24484:62412] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [error] ajp_service::jk_ajp_common.c (2924): (cfusion) connecting to tomcat failed (rc=-3, errors=65, client_errors=2).
[Tue Sep 13 13:10:21.470 2016] [24484:1156] [error] HttpExtensionProc::jk_isapi_plugin.c (2763): service() failed with http error 503
[Tue Sep 13 13:10:22.251 2016] [24484:80024] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:22.251 2016] [24484:80024] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:22.251 2016] [24484:80024] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:22.251 2016] [24484:80024] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [error] ajp_service::jk_ajp_common.c (2924): (cfusion) connecting to tomcat failed (rc=-3, errors=66, client_errors=2).
[Tue Sep 13 13:10:22.564 2016] [24484:62412] [error] HttpExtensionProc::jk_isapi_plugin.c (2763): service() failed with http error 503
[Tue Sep 13 13:10:22.611 2016] [24484:1156] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
[Tue Sep 13 13:10:22.611 2016] [24484:1156] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1127): (cfusion) Failed opening socket to (127.0.0.1:8014) (errno=61)
[Tue Sep 13 13:10:22.611 2016] [24484:1156] [error] ajp_send_request::jk_ajp_common.c (1787): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[Tue Sep 13 13:10:22.611 2016] [24484:1156] [info] ajp_service::jk_ajp_common.c (2903): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Sep 13 13:10:23.329 2016] [24484:49060] [info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8014 failed (errno=61)
Any idea about a solution?
@David, thank you. The connection_pool_size is always equal to maxThreads, so you can’t keep these two values different. You can do the connector tuning based on your requirements, keeping in mind the different use cases mentioned above. In the case of the Standard edition, the value of “Maximum number of simultaneous Template requests” in the ColdFusion Administrator changes to the value set for “maxThreads” in server.xml.
What is the relationship between maxThreads in server.xml and “Maximum number of simultaneous Template requests” in CFIDE?
The CFIDE value appears to be saved in neo-runtime.xml, however is synchronized with maxThreads on some re-starts. (it’s not 100% consistent, but very close).
The reason I ask is that our CFIDE value has traditionally been very low – around 50. The numbers being quoted above are huge by comparison (700 in scenario 2)
We run 2 sites per server; 1 busy site and a CFIDE. Single connector configuration. We were finding the CFIDE site was becoming unresponsive for new connections, even though active requests on the server were well under the configured value for “Maximum number of simultaneous Template requests”.
We were seeing this sort of thing in the metrics.log file: “Max threads: 50 Current thread count: 50 Current thread busy: 50 Max” – Although only around 5-10 active requests according to FusionReactor (multiple thread dumps confirmed this)
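To make that symptom easier to spot, here is a minimal Python sketch that pulls the three counters out of a metrics.log line and flags saturation. It assumes the line contains the fields exactly as quoted above; the real log line may carry more fields, which the sketch ignores. The function names are made up for illustration.

```python
import re

# Matches only the portion of the metrics.log line quoted in this comment.
METRICS_PATTERN = re.compile(
    r"Max threads:\s*(\d+)\s+"
    r"Current thread count:\s*(\d+)\s+"
    r"Current thread busy:\s*(\d+)"
)

def parse_metrics_line(line):
    """Return (max_threads, current_count, busy), or None if the line does not match."""
    match = METRICS_PATTERN.search(line)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())

def is_saturated(line):
    """True when the line reports every available thread as busy."""
    parsed = parse_metrics_line(line)
    if parsed is None:
        return False
    max_threads, _current, busy = parsed
    return busy >= max_threads

sample = "Max threads: 50 Current thread count: 50 Current thread busy: 50 Max"
print(parse_metrics_line(sample))
print(is_saturated(sample))
```

Comparing the saturated lines against the active request count from a monitoring tool would show the mismatch described here.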
We came to the conclusion that we were running out of connector threads, therefore tuned the connector with the values for Use Case 2 above. – Then we noticed the “Maximum number of simultaneous Template requests” had changed.
What I think we want to do is run with large values for “connection_pool_size” but keep MaxServerThreads at 50, so that if we have >50 threads executing in CF, excess threads get queued in the connector until a CF thread becomes available.
However, this goes against the above advice of having connection_pool_size = maxThreads.
Thoughts/Suggestions?
@Byron, Here are my responses again.
**
A
**
I will reiterate: you can make connection_pool_size as large as you would like. You could even set connection_pool_size/maxThreads very large and then keep updating max_reuse_connections as and when you add sites. But you would be allocating threads that may be used only in the future, or not at all, while the sockets remain open and the threads consume resources and memory. The memory, resource, or socket utilization can’t be calculated. The tuning is like any other server-level tuning, where the basics remain the same: don’t set parameters too low or too high, but optimize. There are several such discussions around the web, and I am sharing one such link for you http://www.genericarticles.com/mediawiki/index.php?title=How_to_optimize_tomcat_performance_in_production#Guidelines_for_maxThreads:
**
B
**
I see the sense here Byron. But, the connector tool is not automated. It doesn’t check the corresponding value for connection_pool_size, when max_reuse_connections is set or vice versa. All four suggestions are valid in this case and any one out of these, could be taken into consideration. At this point, you can go ahead and log a bug (as you said) and we will consider it from an ER perspective.
As far as your query below is concerned:
“What is the logic for having the max_reuse_connections setting, but not the connection_pool_size setting, present in the web connector configuration program? Perhaps you are trying to prevent the web connector configuration program from “undoing” custom settings that CF-ops may already have in place in their worker.properties files(?)”
We already have an ER for this and are working on it.
**
C
**
Okay, I see the confusion here. There are two defaults in this context: the Tomcat default, which is maxThreads=200, and the ColdFusion default, which is 250. I should have written “The default connection pool size for ColdFusion is 250” instead. The Tomcat defaults are not relevant here. I have updated the blog there 🙂
**
D
**
Correct. The max_reuse_connections would be connection pool size based, not CF instance based since max_reuse_connections are set in the worker.properties file for each connection pool, and not per CF server. I correct myself here.
Let’s consider the Use Case# 3 again. Three sites and individual connectors are created for each site.
-Site 1 is running under high load
-Site 2 and Site 3 are running under low load
For calculation purposes and better understanding, let’s assign some values for connection_pool_size (as in the actual example).
Site 1
connection_pool_size = 800
Site 2 and Site 3
connection_pool_size = 250
Total connection_pool_size/maxThreads = 1300 {800 (for site 1) + 250 (for site 2) + 250 (for site 3)}
If you have a connector created with “ALL”, then the resources allocated will be shared across connections. Thus, max_reuse_connections = connection_pool_size / {no. of sites}, i.e. 1300/3 ≈ 433, rounded up to 450 or 500, gives the max_reuse_connections. The important point to note here is that the max_reuse_connections set here applies to all websites configured with that single connector, irrespective of the load/traffic on an individual site.
For the same scenario, the individual connectors would have their own dedicated connections. I have done some more analysis around this. The tuning needs to be done per individual site here. The max_reuse_connections calculation will again be based on the connection_pool_size. However, max_reuse_connections should also be proportionate to the load/traffic on the site, and not simply based on the max_reuse_connections = connection_pool_size / {no. of sites} calculation. New values “CAN” be:-
Site 1
connection_pool_size = 800
max_reuse_connections = 270 (As per the previous calculation)
max_reuse_connections = 500 (New value based on connection_pool_size & proportionate to the load/traffic)
Site 2 and Site 3
connection_pool_size = 250
max_reuse_connections = 100 (As per the previous calculation)
max_reuse_connections = 100 (This value didn’t change, because the impact is not that proportionate for low traffic sites)
Again, the point to remember is that while creating individual connectors, we would prefer that max_reuse_connections not go below 100/150 (even for a low-traffic site). For moderate traffic, the value may be 200/250. For a site running under high volume, the value may go to around 500 or as high as required.
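The arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration of the ground rule and the suggested floor of 100, not a tool shipped with ColdFusion; the function names are hypothetical, and the load-proportionate adjustment (for example raising Site 1 to 500) remains a manual judgment that the sketch does not attempt.

```python
import math

def shared_reuse(pool_sizes):
    """Ground rule for an "ALL" connector: total pool size / number of sites."""
    return sum(pool_sizes) / len(pool_sizes)

def individual_reuse(pool_size, site_count, floor=100):
    """Ground rule per site, never going below the suggested floor."""
    return max(floor, math.ceil(pool_size / site_count))

# connection_pool_size values from the example above
pool_sizes = {"Site 1": 800, "Site 2": 250, "Site 3": 250}

total = sum(pool_sizes.values())
print("total connection_pool_size / maxThreads:", total)
print("shared max_reuse_connections (before rounding up):",
      round(shared_reuse(list(pool_sizes.values()))))

for site, size in pool_sizes.items():
    print(site, "starting max_reuse_connections:",
          individual_reuse(size, len(pool_sizes)))
```

The per-site result for Site 1 comes out at 267, which the reply rounds to 270 before scaling it up for load.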
And yes, let me know, if there are any further concerns. 🙂
@Charlie, I did check the posts to be “moderated”. However, didn’t find any of your posts, awaiting moderation. Could you please comment again? I have responded to Byron’s query and could answer your queries as well, if not covered yet.
And yes apologies for the trouble you had with my blogs. There was some maintenance going on. But, now it’s up and running again.
Thank you Anit for taking the time to reply. You mentioned “Let me know, if there are any further concerns”, and so I will continue. Lucky you 🙂
(The following is in regard to my prior post (#48) and your reply to it (#50)).
Item **A** is my top priority and the rest is included to complete the discussion.
You started by talking about my summary of “RE-USE CONNECTIONS”. I would like to reply to that (and talk about my own testing), but I will put that section at the BOTTOM of this post since it covers new ground.
My original post had summary sections *A,B,C,D* and you replied to those summary sections. Below are my follow up replies.
==========================
**
A
**
Your reply is pretty much what I was hoping it would be. However, I will reword my last point (from item *A*) and I am hoping for a reply when you have time. This is the number one item that I was curious about in my first post:
I originally asked:
“What sort of resource hit does a LARGE connection pool size take?
It would be nice, for example, to make the connection_pool_size (and MaxThreads) large enough IN ADVANCE so that as we ADD more sites to our pool(s), that we don’t have to remember to reset those two parameters (and thus be forced to reset IIS and CF!).”
(and you replied: “You can make connection_pool_size as large as you would like. However, the resources would be occupied, even if they are not used actually. Same explanation applies for max_reuse_connections as well.”)
Here is my question again, reworded:
In the interest of “set it one time and forget it”, once I have my max_reuse_connections count set (based on my heaviest trafficked site), I would like to make my connection_pool_size(s) and maxThreads VERY large so that when I add more IIS sites, I don’t have to constantly adjust those two values (which require an IIS and CF service restart). What is the actual resource hit ON THE SERVER and CF-SERVICE (if any) in setting a VERY LARGE connection_pool_size/maxThreads size? (so that I understand the implications of setting them a LOT larger than I really need). Are there strong reasons to not do this?
===========================
**
B
**
I call it a bug, you say it is not a bug.
We both agree on the key point:
…..
“… isn’t pool size supposed to be at least as big as the reuse size?” (me)
“… yes, max_reuse_connections should always be less than or equivalent to connection_pool_size.” (Anit)
…..
The bottom line is that it is EASILY possible to run the web connector configuration program in a way that causes a misconfiguration in the worker.properties file.
You say it is not a bug because the CF-op is supposed to MANUALLY adjust the connection_pool_size setting in the worker.properties file to compensate for the value selected for the max_reuse_connections setting when the connector configuration program is run. To be fair, yes, you do mention this in your blog.
I would suggest that either:
A. the connector configuration program should also prompt for a value for connection_pool_size (since it already prompts for max_reuse_connections) and warn that connection_pool_size needs to be set at least as big as the max_reuse_connections size
or
B. the connector configuration program should set the connection_pool_size to be as large as the max_reuse_connections to try to eliminate misconfigurations for CF-ops that do not take the time to manually adjust the worker.properties file
or
C. remove the max_reuse_connections selection entirely from the connector configuration program (and let BOTH settings be set manually by the CF-op in the workers.properties file, and default them both to 250 when the connector configuration program is run).
or
D. at a minimum, place a note on the same screen as the max_reuse_connections setting (in the web connector configuration program) to “remind” the CF-op that raising this value over 250 means that care should be taken to manually raise the connection_pool_size in the worker.properties file to a value matching or greater than the max_reuse_connections setting. Of course, to take that to its conclusion, you would also add a note mentioning that the server.xml file also needs to be tweaked to match the summation of the connection_pool_sizes :-
(What is the logic for having the max_reuse_connections setting, but not the connection_pool_size setting, present in the web connector configuration program? Perhaps you are trying to prevent the web connector configuration program from “undoing” custom settings that CF-ops may already have in place in their worker.properties files(?)
But there are also a lot of CF-ops that will never tweak their worker.properties files by hand, and instead rely on the connector configuration program to make the right choices. And for those CF-ops, it currently is easily possible to “break” their worker.properties settings by setting the max_reuse_connections in the connector configuration program to a value greater than 250.
I don’t envy your position in this. Tough choices for sure).
================
**
C
**
I see an inconsistency. Early in the blog you mention the defaults are 200 for both connection_pool_size and max_reuse_connections. Don’t you really mean 250? (The web connector configuration program sets max_reuse_connections to 250 by default, and my isapi_redirect.log shows endpoint count to be 250 by default when both connection_pool_size and maxThreads are left unset. Thus we can ignore the Tomcat docs about server.xml’s default maxThreads=200 (http://tomcat.apache.org/tomcat-5.5-doc/config/ajp.html)).
================
**
D
**
I am sorry, but I still fail to understand this one. You are telling us:
…….
1. “In Use Case# 3, there are three sites and individual connectors are created for each site. However, all the three sites are bound to the same ColdFusion instance (cfusion in that case). Thus the divisor in this case would remain 3”
but then later you say:
2. “If you have connector created with “ALL” then the resources allocated, will be shared across connections. Whereas, the individual connectors would have their own dedicated connections.” …
……
Drilling down on your above comment: Individual connectors would have their OWN dedicated connections.
This tells me that max_reuse_connections would be connection POOL based, not CF instance based since max_reuse_connections are set in the worker.properties file for each connection pool, no? And the isapi_redirect.log file seems to back this up since it tracks the max_reuse_connections PER connection pool, not per CF server.
Sorry to be dense, but you lost me on this one and I still don’t understand the math in your Case# 3.
=================
**
“RE-USE CONNECTIONS”.
**
It is funny you should bring up my “mistake”. I did a lot more testing AFTER I wrote that post, and I agree that my “RE-USE CONNECTIONS” summary was off base, BUT only in this regard: IIS sites that share a CF connection pool DO share one common max_reuse_connections setting (as you say) as long as they ALSO share the same IIS application pool! However, my testing and examination of my isapi_redirect.log file show that each IIS APPLICATION POOL will use its OWN reuse pool, even if only one CF connection pool is shared between those application pools. At least, that is what the isapi_redirect.log file is telling me. Each application pool has its own unique identifier in the isapi_redirect.log file and tracks its OWN max reuse connection count. No?
Example: IIS CF site #1 is in application pool #1. IIS CF site #2 is in application pool #2. Both application pools share one CF connection pool. Each application pool tracks its own independent max reuse connection count in the isapi_redirect.log file. So if the connection pool’s max_reuse_connections is set to 250, then both application pools show how many of their OWN 250 are being used, INDEPENDENT from one another, in the same isapi_redirect.log file (since they share the same connection pool). In this case (two IIS application pools sharing one CF connection pool), the connection_pool_size needs to be set to a minimum of {{number of IIS application pools sharing the same CF connection pool} * {max_reuse_connections for that connection pool}}.
Am I off base here?
And if I reset one application pool, the ‘current reuse count’ in the isapi_redirect.log file is reset for ONLY that one application pool. (There is another discussion to be had about one application pool that contains multiple CF connection pools, but I’ll stop here).
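To pin down the rule I am proposing, here it is as a tiny Python sketch. It encodes only my own reading of the log (each IIS application pool keeps its own reuse count); it is not something Adobe has confirmed, and the function name is just for illustration.

```python
def min_pool_size(app_pool_count, max_reuse_connections):
    """Smallest connection_pool_size under the rule proposed above."""
    if app_pool_count < 1 or max_reuse_connections < 0:
        raise ValueError(
            "need at least one application pool and a non-negative reuse value"
        )
    return app_pool_count * max_reuse_connections

# Two application pools sharing one CF connection pool, reuse set to 250
print(min_pool_size(2, 250))
```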
Thanks again Anit for taking the time. I understand that you may be too busy to address everything I wrote. But I wanted to get it on the record and be complete.
Byron
Hi Byron,
Apologies for the delayed response. I was just caught up with some other ColdFusion issues. I appreciate your deep-dive analysis and will try to answer as descriptively as possible, covering all your concerns.
Before I respond to your questions below, I would like to amend your example for “RE-USE CONNECTIONS”. This setting determines the count of connections that can be re-used. EACH WEBSITE in a connection pool can use UP TO the re-use value (max_reuse_connections). So, for example: let’s say that you configure your connectors with the “ALL” option. You have 10 websites. You decide to set your max_reuse_connections to 300 (based on metric logging results, or based on advice found in this blog). Therefore, the 300 reuse connections would be shared by all 10 websites; the total is 300, not 3000. Please see the “Use Case# 2: Connector created with “ALL” and multiple sites in IIS”. The connection_pool_size/MaxThreads should be based on the traffic/load on the site.
You are correct. The timeout (connection_pool_timeout/connectionTimeout) set to 60/60000 should work in almost all scenarios.
==============================
Now jumping to your questions:
==============================
**
A.
**
The tuning can be started from either max_reuse_connections or connection_pool_size. There is no defined rule for it. You can start by setting some value for max_reuse_connections, leaving the connection_pool_size (infinite), and once you have the approximate value for max_reuse_connections, you can tune connection_pool_size. For us, instead of going with a random start value for max_reuse_connections, we rely on metrics.log. Once we get the max thread count, we know what value to start with for connection_pool_size. Then we can further tune max_reuse_connections. That’s the reason we start with connection_pool_size: we have some base to start tuning with. Having said that, it all depends upon your approach to connector tuning. Start with one, tune it, and then tune the other dependent one.
You can make connection_pool_size as large as you would like. However, the resources would be occupied, even if they are not used actually. Same explanation applies for max_reuse_connections as well.
**
B.
**
It’s not a bug.
You can set max_reuse_connections as per your requirement, and similarly connection_pool_size. However, changing one will not automatically change the other. You need to manually update connection_pool_size/max_reuse_connections/connection_pool_timeout as per your requirement, every time you want a change.
At our end, we change the following in worker.properties
max_reuse_connections=1000
connection_pool_size=500
And the isapi_redirect.log shows the change:-
[Thu May 07 04:06:12.434 2015] [12116:4320] [info] ajp_process_callback::jk_ajp_common.c (2154): current reuse count is 1 of max reuse connection 1000 and total endpoint count 500
And yes, max_reuse_connections should always be less than or equivalent to connection_pool_size. It can’t be larger than the connection_pool_size. I have mentioned the same in the “Some key points to remember” section as well.
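A quick way to catch this kind of mismatch is to read the values back from isapi_redirect.log. The following Python sketch is only an illustration: it assumes the log line has exactly the wording shown above, and the function name and returned keys are hypothetical.

```python
import re

# Wording taken from the isapi_redirect.log line quoted above.
REUSE_PATTERN = re.compile(
    r"current reuse count is (\d+) of max reuse connection (\d+) "
    r"and total endpoint count (\d+)"
)

def check_reuse_line(line):
    """Parse one log line; report whether max reuse fits inside the pool."""
    match = REUSE_PATTERN.search(line)
    if match is None:
        return None
    current, max_reuse, endpoints = (int(value) for value in match.groups())
    return {
        "current": current,
        "max_reuse": max_reuse,
        "endpoints": endpoints,
        # The rule from this reply: max_reuse_connections <= connection_pool_size
        "consistent": max_reuse <= endpoints,
    }

line = (
    "[Thu May 07 04:06:12.434 2015] [12116:4320] [info] "
    "ajp_process_callback::jk_ajp_common.c (2154): current reuse count is 1 "
    "of max reuse connection 1000 and total endpoint count 500"
)
result = check_reuse_line(line)
print(result)
```

For the test values used here (1000 and 500), the check reports the configuration as inconsistent, which is the situation the rule above warns against.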
**
C.
**
It’s not a bug, but probably can be considered as an ER (Enhancement Request). A default worker.properties looks like:-
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8014
worker.cfusion.max_reuse_connections=250
If you notice, connection_pool_size is not a part of default settings. Thus, the corresponding maxThreads is not set. As a part of tuning, when we set connection_pool_size in the worker.properties, we add the corresponding maxThreads in server.xml.
**
D.
**
In Use Case# 3, there are three sites and individual connectors are created for each site. However, all the three sites are bound to the same ColdFusion instance (cfusion in that case). Thus the divisor in this case would remain 3 and not ONE, because there are three sites – Site 1, Site 2 & Site 3.
If you have a connector created with “ALL”, then the resources allocated will be shared across connections, whereas the individual connectors would have their own dedicated connections. The reuse connection plays an important role in terms of performance, but the calculation has a little tweak to it. While creating individual connectors, we would prefer that max_reuse_connections not go below 100 or 150 (even for a low-traffic site). For moderate traffic, the value may be 200/250. For a site running under high volume, the value may go to around 500 or as high as required.
Thus, for Site 1 running under high load, the max_reuse_connections may range between 270 and 500 or even go higher, depending on the site load.
Similarly, for Site 2 and Site 3, max_reuse_connections=100 (connection_pool_size / {no. of sites}, i.e. 250/3 ≈ 83, rounded up to 100)
The ground rule max_reuse_connections = connection_pool_size / {no. of sites} holds good for all scenarios, with some minor tweaks.
Let me know, if there are any further concerns.
Anit, my prior post (#42) has bad assumptions (because I misunderstood one key premise).
I can’t edit it, so instead I will post an update here. PLEASE IGNORE post #42. All relevant info has been moved to this re-written post.
I would really appreciate it if you could address the QUESTIONS section when you have time.
IF YOU ONLY HAVE TIME TO ANSWER QUESTION #A, it would be GREATLY appreciated! Thanks in advance.
……..
Thanks to Charlie Arehart for sharing his experiences with the connection timeout (and how it fixes his client’s problems). At first, I could not figure out why that was so, since from the way this blog is worded, I assumed that connection timeouts did NOT apply to the RE-USABLE connections (because I mis-understood what this blog meant by re-use connections).
After hours of time on google, reading the Tomcat docs, and then doing my own testing and log examinations, I would like to offer this:
SUGGESTION: for me, it would be less confusing if this blog FIRST defined what the connection pool is (the big picture). THEN define the connection timeouts (and how it applies to ALL connections). THEN LASTLY define the reusable connections … but perhaps present the RE-USE CONNECTIONS section as follows:
—–
RE-USE CONNECTIONS: This setting determines the count of connections that can be re-used. EACH WEBSITE in a connection pool can use UP TO the re-use value (max_reuse_connections). So, for example: let’s say that you configure your connectors with the “ALL” option. You have 10 websites. You decide to set your max_reuse_connections to 300 (based on metric logging results, or based on advice found in this blog). Therefore {{10 websites} times {300-re-use-per-site} =3000-possible-total-connections-at-once}. And so your connection POOL size (set with connection_pool_size and MaxThreads) could be 3000 (or possibly higher, see **).
The corollary is that IF you configure each IIS website to use its OWN connection pool, then it is OK to have max_reuse_connections and connection pool size (connection_pool_size and MaxThreads) be set to the same value since only one website is sharing the pool in this scenario. But it is also OK to have them (connection_pool_size and MaxThreads) be set larger than max_reuse_connections (see **).
If the connections don’t get recycled, then you may experience system lock-up or instability once the entire pool is used up. SO IT IS CRITICAL to set the timeout (connection_pool_timeout / connectionTimeout) to something other than the default (infinity). Values of 60 / 60000 seem to work well.
(end suggestion section)
———
** ANIT: please see QUESTION #A (below) relating directly to the above summary, thanks.
===============
Sections below:
> TESTING for appropriate re-use value
> NOTES
> QUESTIONS (I hope you have time to answer them)
===============
TESTING:
(this is how I did it; I do not claim to be the final word here! Anit has test suggestions in his blog too)
===============
1. By taking the time to fully ‘load test’ our sites, and carefully monitor the Metric Logging while our sites are under FULL load, one can establish a fairly accurate “active” thread count that each site uses. Try here to get metrics for a “worst case” load so that the “current thread count” seen in the metric logs reflects this load. Load test the busiest site that you host (or, if unsure, load test them all, but do so one by one). While running these load tests, set connection_pool_size to a very high value (I used 3000), remove the max_reuse_connections setting (infinity), and for the connection_pool_timeout, I chose to set it at 60 (which is what I plan to use in production). This way, spikes in thread use under load are “captured” in the metric logs, but don’t accumulate (they time out and go back to the pool).
2. If you plan to segregate sites into separate connection pools, then be selective during the load tests so you know which results go to which pool.
3. Taking the results from step 1, set max_reuse_connections to the highest “current thread count” that you observed in the metric logs.
4. By flipping the equation Anit provided, I set connection_pool_size = (# of sites) * (max_reuse_connections). This is subject to adjustment; for example, perhaps you have one VERY active site, and the rest are very low activity. Then connection_pool_size could be a lower value than what the formula says. (ANIT: please see QUESTION #A (below) relating directly to this).
5. IF you plan to add more sites to a connection pool, then make sure to compensate the connection_pool_size: either ahead of time, or each time you add a site. (ANIT: please see QUESTION #A (below) relating directly to this). If this new site is MORE active than the one(s) you tested earlier, then compensate the max_reuse_connections for that too (see #1 above).
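Steps 3 and 4 can be written down as a small Python sketch. It is only my own way of turning the load-test results into starting values; the site names and peak counts below are made-up numbers, and the function name is hypothetical.

```python
def settings_from_load_test(peak_thread_counts, timeout_seconds=60):
    """Derive starting values for one connection pool.

    peak_thread_counts maps each site in the pool to the highest
    "current thread count" observed for it in the metric logs.
    """
    if not peak_thread_counts:
        raise ValueError("need at least one site")
    # Step 3: reuse value follows the busiest site in the pool.
    max_reuse = max(peak_thread_counts.values())
    # Step 4: pool size = (# of sites) * (max_reuse_connections).
    pool_size = len(peak_thread_counts) * max_reuse
    return {
        "max_reuse_connections": max_reuse,
        "connection_pool_size": pool_size,
        "connection_pool_timeout": timeout_seconds,
    }

# Hypothetical peaks for two sites sharing one pool
peaks = {"site-a": 180, "site-b": 40}
print(settings_from_load_test(peaks))
```

As noted in step 4, the pool size from the formula is an upper starting point and can be lowered when most sites in the pool are quiet.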
===============
NOTES:
===============
> Anit’s formula {(max_reuse_connections) = (connection_pool_size) / (# of sites)} is a general GUIDE ONLY. Even his own case examples don’t use it 4 out of 5 times. It is much better to run metric tests and figure out the values based on your actual loads.
> connection_pool_size must always be equal to or larger than max_reuse_connections. If sharing a connection pool among sites, then connection_pool_size is generally set to a value greater than max_reuse_connections, as discussed in the blog and prior sections in this post.
> connection_pool_timeout is not set by default and that means it is infinity. If we set it, it is in seconds. (This setting needs to be set in production in most cases!)
> {max_reuse_connections, connection_pool_size and connection_pool_timeout} are set in the worker.properties file(s). Changes to the worker.properties file(s) require IIS restart.
> maxThreads needs to be the SAME value as the summation of ALL connection_pool_size(s).
> connectionTimeout needs to be the SAME value as connection_pool_timeout (and it is in milliseconds, not seconds).
> maxThreads and connectionTimeout are set in the server.xml file. Changes to the server.xml file require CF service restart.
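The relationships in these notes can be checked mechanically. The following Python sketch is a rough illustration only: it takes the values as plain numbers (it does not read worker.properties or server.xml), and the function and pool names are hypothetical.

```python
def check_settings(pools, max_threads, connection_timeout_ms):
    """Return a list of problems found, empty when the notes above are satisfied.

    pools maps a pool name to a dict with connection_pool_size,
    max_reuse_connections and connection_pool_timeout (seconds).
    """
    problems = []
    for name, pool in pools.items():
        if pool["connection_pool_size"] < pool["max_reuse_connections"]:
            problems.append(f"{name}: connection_pool_size < max_reuse_connections")
        # connection_pool_timeout is in seconds, connectionTimeout in milliseconds
        if pool["connection_pool_timeout"] * 1000 != connection_timeout_ms:
            problems.append(f"{name}: timeout does not match connectionTimeout")
    total = sum(pool["connection_pool_size"] for pool in pools.values())
    if max_threads != total:
        problems.append(f"maxThreads is {max_threads}, pool sizes sum to {total}")
    return problems

# Values in the style of the three-site example discussed in this thread
pools = {
    "site1": {"connection_pool_size": 800, "max_reuse_connections": 500,
              "connection_pool_timeout": 60},
    "site2": {"connection_pool_size": 250, "max_reuse_connections": 100,
              "connection_pool_timeout": 60},
    "site3": {"connection_pool_size": 250, "max_reuse_connections": 100,
              "connection_pool_timeout": 60},
}
print(check_settings(pools, max_threads=1300, connection_timeout_ms=60000))
```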
===============
QUESTIONS:
===============
**
A.
**
This blog says “it is a good practice to start tuning the connection_pool_size first”, THEN back our way into the re-use size. In other words, this blog says: {re-use = {pool-size} / {#-of-sites-in-pool}}.
BUT since heavy-use sites take LOTS of re-use, WHY should we throttle re-use? Why can’t we tune re-use size first? and THEN set pool-size {pool-size = {re-use} * {#-of-sites-in-pool}} ?? What sort of resource hit does a LARGE connection pool size take?
It would be nice, for example, to make the connection_pool_size (and MaxThreads) large enough IN ADVANCE so that as we ADD more sites to our pool(s), that we don’t have to remember to reset those two parameters (and thus be forced to reset IIS and CF!).
……..
B. Bug?: I see problems when I increase the max_reuse_connections while running the connector installation program (web server configuration tool). In a test, I increased the max_reuse_connections to 555 but the connection_pool_size remains at 250 as seen in my isapi_redirect.log:
ajp_process_callback::jk_ajp_common.c : current reuse count is 4 of max reuse connection 555 and total endpoint count 250.
Shouldn’t the connector install program set the connection_pool_size to 555 as well, since the pool size is supposed to be at least as big as the re-use size?
……
C. Bug?: Also, the MaxThreads in the server.xml file is not getting set. Tomcat docs say the default is 200
(at http://tomcat.apache.org/tomcat-5.5-doc/config/ajp.html)
and so EVEN for a standard connector installation (max_reuse_connections = 250 and connection_pool_size = 250), shouldn’t the installation program set the MaxThreads size to be 250 as well, since I thought that MaxThreads was supposed to be the SAME size as the pool?
……
D. Typo?: In your blog at “Case#3/Site1”, there is only ONE site in its pool. But in your formula, you divide by 3. Although 3 is the total sites in IIS in your example, there is only ONE site in Site 1’s connection pool, so the divisor should be ONE, correct ??
……………….
Respectfully,
Byron
@Anit, thanks, but to be clear I didn’t suggest anything about “tweaking” the timeout setting. I said simply that “changing the timeout from its infinite value (not ‘indefinite’, as said in the blog entry) has been the only change necessary”, so we’re in agreement on that.
As for your observing that “In case you have Reuse Connections and Connection pool size values set incorrectly, then you will definitely experience ‘Service Unavailable’ errors”, I’ll just say again that I have never (not once, in hundreds of sessions helping people solve those errors) solved it by changing the other connection settings. This is why I say that I feel the article puts way too much emphasis on it.
But so be it: if you guys are confident that some people experience that problem and solve it with that solution, then I’ll say no more on that and again will await (popcorn in hand) thoughts on the other things Byron. 🙂
@Colin, hope you find happiness in the new world you’re departing for. I’ll stay here back in the old country, serving folks who are happy to have the assistance when they need it. I definitely help them avoid “wasting their life” tuning and troubleshooting, and out of thousands I have helped, I can’t remember one who’s been left concluding that CF was not “fit for purpose”. But hey, to each his own.
@Byron, We will definitely respond to your queries. Please allow us some time.
@Charlie, thanks for chiming in. I completely agree that connection_pool_timeout is the most important parameter, in terms of tuning. But, you don’t need to tweak that actually. Even if you go with 60s, it works in almost all cases. In case you have Reuse Connections and Connection pool size values set incorrectly, then you will definitely experience “Service Unavailable” errors. Thus, these values require “tuning”.
@Colin, We had a blog article for CF10 and this article for CF11 is almost 8 months old. Tuning is important, for getting optimized performance from connector. Adobe never charged anyone for Connector Tuning.
“ColdFusion tuning” is the main reason why we left this product two years ago. There is zero documentation on this subject beyond default settings from Adobe. Servers continue to crash meanwhile, and the same old threads about this subject continue year after year for well over a decade. Everyone and his dog has an opinion on this subject. You can waste your life “tuning” this thing – to no avail. It just seems a way for consultants to make money out of people.
CF is just not fit for purpose in our eyes, and we are very glad we dumped it.
@Byron, while you await a reply from Anit or others, I’d like to simplify things. From my experience, helping now many dozens of people, the only thing needed in this “connector tuning” is to set the timeout (in both files). All the other stuff about threads and number of connections and such has not once proven to be necessary with anyone I’ve helped.
I said this in comment 38 above, but I know a lot of people are probably not inclined to read all the comments. And it’s also only one opinion–and I realize it’s counter to what Adobe says here.
I just think that (as your lengthy comment shows) the whole blog entry (this one and the one for CF10) just puts so much focus on thread counts, when I’ve found that changing the timeout from its infinite value (not “indefinite”, as said in the blog entry) has been the only change necessary.
Folks can do with that what they will.
In any case, I will be curious to see the answers to your questions.
THANK-you Anit for presenting this subject. Your willingness to answer questions is GREATLY appreciated since it helps us understand this better.
Since the information presented in this Blog and comments has become so scattered, I wrote the following summary to try to pull it all back together and to distill the thinking process (I hope you don’t mind).
I tried very hard to make sure there are no errors, but maybe I misunderstood something and if so, I hope you (or others) will correct me! (Also, there are some things that I couldn’t state with certainty, so I put them in the QUESTIONS section instead of the ASSUMPTIONS section).
Sections below:
> ASSUMPTIONS
> NOTES
> QUESTIONS (I hope you have time to answer them)
===============
ASSUMPTIONS:
===============
1. There can be ONE connection pool for all IIS CF sites, or there can be SEVERAL connection pools spanning multiple sites.
2. For EACH connection pool, there are TWO types of connections:
> “persistent” connection types (set by max_reuse_connections)
> “use and release back to pool” connection types {(connection_pool_size) MINUS (max_reuse_connections) EQUAL (“use and release back to pool” connections)}.
**NOTE**: for lack of a better word, I use this term (“use and release back to pool” connection types) a lot in the following discussion.
3. The “persistent” connections (max_reuse_connections) are initially opened by a request, the request finishes, and then that connection stays open INDEFINITELY and waits for another request (ie: connection_pool_timeout does not apply to these connections).
4. The “use and release back to pool” connections are only opened if all the “persistent” (max_reuse_connections) connections are busy.
5. Once a “use and release back to pool” connection is opened, then after the request that opens it goes idle AND the connection_pool_timeout is reached, it is closed and released back to the pool.
6. If we take the time to fully ‘load test’ our sites, and carefully monitor the Metric Logging while our sites are under FULL load, we can establish a fairly accurate “active” thread count that each site uses. We should try here to get metrics for a “worst case” load so that the “current thread count” seen in the metric logs reflects this load. While running these load tests, we should set connection_pool_size to a VERY high value, set max_reuse_connections to 0 for simplicity, and set connection_pool_timeout to infinity by removing that setting (default is infinity). This way, spikes in thread use under load are “captured” and the logs continue to show the high thread use even after the spike is past (because the previously used threads are not returned to the pool due to the infinite timeout). That is a very cool way to do this, thanks Anit! (Also related is question (E) below). Note: connection_pool_timeout should usually be used in production (also related to question (F) below).
7. If we plan to segregate sites into separate connection pools, then we need to be selective during our load tests. If we are putting all our sites into one pool, then we will need to load test them all simultaneously (or load test them one by one and then add up their max current thread counts).
8. Taking the results from step 6 & 7, we can then use the ‘current thread count’ results, and that would be a good number to use for max_reuse_connections.
9. However, total current request attempts might, at times, exceed the “persistent” connections available (ie: the max_reuse_connections that you set in the previous step). To account for this, we should make the connection_pool_size bigger than the max_reuse_connections. By flipping the equation Anit provided us, we would set connection_pool_size = (# of sites) * (max_reuse_connections).
Setting connection_pool_size this high also compensates for a “too low” max_reuse_connections setting caused by one (or more) of the following three situations: (a) a CF admin who is unsure what their connection requirements are because they didn’t load test with Metric Logging; (b) a CF admin whose Metric Test results are out-of-date and whose connection requirements have drifted upward; (c) a CF admin who has added additional active CF sites to IIS and has not had a chance to re-run load testing metrics.
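The arithmetic in steps 8 and 9 can be sketched in a few lines of Python. This is only an illustration of the assumptions in this summary, not Adobe guidance: the function name `suggest_pool_settings` is hypothetical, the 60-second timeout is simply the value discussed in this thread, and the peak of 85 and the 3 sites are example inputs.

```python
def suggest_pool_settings(peak_thread_count, site_count, pool_timeout_s=60):
    """Derive candidate workers.properties values from a load-test peak.

    Assumptions taken from the summary above (not official guidance):
      max_reuse_connections = peak "Current thread count" seen under load
      connection_pool_size  = site_count * max_reuse_connections
    """
    if peak_thread_count < 1 or site_count < 1:
        raise ValueError("peak_thread_count and site_count must be >= 1")
    max_reuse = peak_thread_count
    pool_size = site_count * max_reuse
    return {
        "max_reuse_connections": max_reuse,
        "connection_pool_size": pool_size,
        "connection_pool_timeout": pool_timeout_s,  # seconds
    }


# Example inputs only: a peak of 85 threads across a pool serving 3 sites.
settings = suggest_pool_settings(peak_thread_count=85, site_count=3)
for key, value in settings.items():
    print(f"{key}={value}")
```

The output is a starting point to monitor and adjust, since the formula itself is described in the notes below as a general guide only.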
===============
NOTES:
===============
> Anit’s formula {(max_reuse_connections) = (connection_pool_size) / (# of sites)} is a general GUIDE ONLY. Even his own case examples don’t use it 4 out of 5 times.
> If connection_pool_size is not set, then it defaults to the value that max_reuse_connections is set to.
> connection_pool_size must always be equal or larger than max_reuse_connections (which is not a concern if the only setting we tweak is max_reuse_connections (see above comment)).
> connection_pool_timeout is not set by default and that means it is infinity. If we set it, it is in seconds. If we set the connection_pool_size to be LARGER than the max_reuse_connections size, then make sure to set a value for connection_pool_timeout or the new connections (called “use and release back to pool” throughout my post) made available by the enlarged pool can only be used once then sit idle for time=infinity.
> {max_reuse_connections, connection_pool_size and connection_pool_timeout} are set in the workers.properties file(s) … there can be more than one workers.properties file; see item #1 in the Assumptions section above. Changes to the workers.properties file(s) require IIS restart.
> maxThreads needs to be the SAME value as the summation of ALL connection_pool_size(s).
> connectionTimeout needs to be the SAME value as connection_pool_timeout (but it is in milliseconds, not seconds).
> maxThreads and connectionTimeout are set in the server.xml file. Changes to the server.xml file require CF service restart.
> Anit’s notes mention the default max_reuse_connections is 200. In my CF.11.Ent, it is 250 (and so the default connection_pool_size would also be 250). This must have been changed in newer builds since you wrote this blog.
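The relationships in these notes can be checked mechanically. Below is a rough Python sketch, assuming you have already read the values out of each workers.properties file and out of server.xml by hand; the function name `check_connector_consistency` and the dictionary layout are hypothetical, and the numbers in the example are illustrative only.

```python
def check_connector_consistency(worker_pools, max_threads, connection_timeout_ms):
    """Return a list of problems; an empty list means the values agree.

    worker_pools: one dict per workers.properties file with the keys
      max_reuse_connections, connection_pool_size,
      connection_pool_timeout (seconds).
    max_threads, connection_timeout_ms: values from server.xml.
    """
    problems = []

    # maxThreads should equal the sum of all connection_pool_size values.
    total_pool = sum(p["connection_pool_size"] for p in worker_pools)
    if max_threads != total_pool:
        problems.append(
            f"maxThreads={max_threads} but pool sizes sum to {total_pool}"
        )

    for i, pool in enumerate(worker_pools, start=1):
        # connection_pool_size must be >= max_reuse_connections.
        if pool["connection_pool_size"] < pool["max_reuse_connections"]:
            problems.append(f"pool {i}: connection_pool_size < max_reuse_connections")
        # connectionTimeout is in milliseconds, connection_pool_timeout in seconds.
        if pool["connection_pool_timeout"] * 1000 != connection_timeout_ms:
            problems.append(f"pool {i}: timeout does not match connectionTimeout")

    return problems


# Illustrative values: two connectors, 60 s idle timeout, maxThreads = 500 + 250.
pools = [
    {"max_reuse_connections": 250, "connection_pool_size": 500, "connection_pool_timeout": 60},
    {"max_reuse_connections": 250, "connection_pool_size": 250, "connection_pool_timeout": 60},
]
print(check_connector_consistency(pools, max_threads=750, connection_timeout_ms=60000))
```

An empty list is printed when every rule holds; any mismatch is reported as a short message instead.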
===============
QUESTIONS:
===============
A. WHAT is the actual RESOURCE hit for using a LARGE max_reuse_connections?? You mention it is packet-oriented/TCP/socket based, but in a 64b, high capacity server, that can’t be much of a hit. WHAT are the other reasons, if any, that we should be concerned about using a HIGH max_reuse_connections setting ??
B. WHAT is the actual RESOURCE hit for the “use and release back to pool” portion of a LARGE connection_pool_size when ALL (or most) of those “use and release” connections are in use? You mention that the socket needs to be recreated for each reconnection, but in a 64b, high capacity server, that can’t be much of a hit. WHAT are the other reasons, if any, that we should be concerned about using a HIGH connection_pool_size setting ??
(Richard Herbert also asked this on Jun 10, 2014, but you did not answer directly).
(NOTE: a lot of us are wondering WHY we shouldn’t just set max_reuse_connections very high and connection_pool_size even higher and call it good? That is why I ask the above two questions).
C. In your Case#3/Site1, there is only ONE site in its pool. But in your formula, you divide by THREE. Although 3 is the total sites in IIS in your example, there is only ONE site in Site 1’s connection pool, so the divisor should be ONE, correct ??
D. IF connection_pool_timeout is infinity (ie: not set) AND the max_reuse_connections is set too low, THEN raising the connection_pool_size is only a temporary fix since a server UNDER LOAD may still fail because the “use and release back to pool” connections are NEVER released back to the pool(!) after their first use (because of the infinite timeout!) and therefore they are no longer able to contribute to helping a server under load. Thus, I have to wonder WHY is the DEFAULT for connection_pool_timeout set to infinity in the first place?? In what situation does that make sense? There would seem to be NO HARM in setting the default to 60 (or whatever) because even if max_reuse_connections and connection_pool_size ARE the same size (and connection_pool_timeout is not used in that case), STILL, having connection_pool_timeout set does no harm. Wouldn’t it be better for Adobe to set it to a default value of something other than infinity IN PREPARATION FOR its possible need, especially in light of the possible negative effects of having it set to infinity ??
E. In the Metric Logging, is this true?
“current thread count” = {((active max_reuse_connections) plus (waiting max_reuse_connections)) PLUS ((active “use and release back to pool” connections) plus (inactive but not yet returned to pool “use and release back to pool” connections))} ?? If not, could you correct it so we understand exactly what “current thread count” in the metric logs mean?
F. And finally, once we dial in our connector tuning, THEN should the four request limits in CF admin (template, flash, web, cfc) be considered as a possible bottleneck to performance as well? (I know, I hijacked my own post, but I figured it was worth asking now, since I have your attention about request tuning. You can’t blame me for trying 🙂 … if you don’t want to discuss this here, can you point us to anything written about tuning these settings?)
Respectfully,
Byron Knapp
(Coldfusion admin since version 2)
I have been working with instability issues on our servers for some time now.
My main issue turned out to be the number of templates in the system.
With the setting “save class files” turned on we have had a lot of problems where the server simply “locks up” and starts queuing requests/locking up (and more often than not never recovers so a restart is needed).
My initial thoughts and efforts went into connector tuning but I never got the servers to behave as “expected”.
This link in stack overflow got me in the right direction finally: http://stackoverflow.com/questions/10941977/how-to-resolve-deadlock-issue-in-coldfusion-9-coldfusion-util-abstractcacheloc
This indicates that the locking problem has something to do with the saved class files getting corrupted in some way.
When I deselected the “save class files” on my servers the behaviour became more normal and understandable.
This might be something to look out for for those who have many sites / applications and many .cfm templates on the server.
(our number is close to 100 000 templates total)
Regards,
Helge
Thank you all for your feedback and suggestions.
@Chris, we did a fresh install of CF11 on a new instance of Windows 2012, so migration shouldn’t be a factor but it’s a good thing to look at when I have a chance.
@Charlie, Anit – interesting! I was hoping the installation would be pretty forgiving and not require tuning, but I realize that going with the individual connectors per site is a bad idea. I’m a little hesitant to switch to All since we had a bad accidental experience with that during ramp-up, but once we get proper backups and schedule some downtime I think we’ll move to a single connector and see if that fixes matters.
I’ll post back once we have a chance to try these out!
@Jeff, it would be a lot of work to create individual site-level connectors. You are welcome to do so if you wish, but I would suggest creating a connector with “ALL”. “Use Case# 2” should be your scenario. How did you identify that the issue is caused by connector tuning and not server tuning, such as memory tuning? Please look at https://coldfusion.adobe.com/post.cfm/coldfusion-11-iis-connector-tuning#comment-7849D9C1-F887-67D0-F7EFA395EAAF603C for Metric logging.
@Jeff, as to the question about these settings (and in addition to my lengthy comments on many related issues above), I’ll add that I have helped many people where the ONLY change that was necessary was the timeout (60 secs and 60000 ms in each file, as discussed in the blog entry above).
And note that this is about IDLE connection timeout. It has NOTHING to do with how long you think a page may run (many people misconstrue this and are afraid to set the value “this low”).
I’ll add also that folks can use the IIS “worker process monitor” feature (at the server level in the IIS interface) to see when requests are piling up in the app pools (a “worker process” is an instance of a given app pool).
While requests can be shown piling up there because they ARE running in CF for a long time (which would be its own separate problem to solve), if you see them piling up there and there are NO requests running in CF (as shown by CFSTAT, the metrics log, the CF Server Monitor, FusionReactor, SeeFusion, or a java thread monitoring tool), then I’d argue those are idle connections that ought to be timed out. As the blog entry notes, the default is infinite (they say “indefinite” but the Tomcat docs do clarify it should be “infinite”.)
Finally, I’ll note that it has never yet been necessary (in any assistance I’ve offered) to change the pool_size or max_reuse. I think that is a total mystery and until we get diagnostics to know what those should be, I doubt anyone can tell you the “right numbers”. (Please vote up the bug entry I mention above, 3783364, to get Adobe to help us to better monitor and troubleshoot that setting.)
Hi Jeff,
I’ve just been through this, with around 100 websites, and upgrading from cf8 to cf11, using cf10 as an intermediate stage.
While I had multiple issues to work through, the intermittent cf hangs appear to have been because the migration program didn’t update all the jvm references from cf10 to cf11.
Both the java.home and application.home were still pointing to ColdFusion10.
I then checked through all my settings, and found that Java Ext Dirs and /gateway were also pointing to ColdFusion10 addresses.
After correcting these, my webserver has been running for the last 3 weeks without a problem.
There are some other changes you can try, but this would be the first thing to check.
After installing CF11 and having the server crash once or twice a day with the Warning message, I see we need to do some tuning. However, having read through dozens of CF/Tomcat/IIS blog and forum posts I’m still left confused.
We’re running Windows 2012 with CF11 and have about 50 sites running in IIS. We have individual connectors for each site which is where I’m running into trouble, I think. Should I just go back and choose ‘All’?
I initially thought doing each individually would give me better performance customization per site, thinking we could give more resources to our more-trafficked sites, but this seems like a huge headache to have to go through each workers.properties file. Also, I’m not sure what I should be using for values with so many sites defined. Should I, going by Use Case #3, eg:
maxThreads=750 in server.xml
leaving only:
max_reuse_connections=1 (connection_pool_size / number of sites)
connection_pool_size=15 (15+15+15+….+15 at 50 sites = 750)
connection_pool_timeout=60
I realize that definitely isn’t the correct answer, but I’m not finding any good responses when hosting a large number of sites using multiple connectors. Is this just a bad idea?
And if so, can I just run wsconfig.exe and select All to convert them to one big connection? Then if so, as Tom asked here (https://coldfusion.adobe.com/post.cfm/tuning-coldfusion-10-iis-connector-configuration), what would reasonable values be? I realize there won’t be a specific number here, but ballpark and an explanation would be very helpful.
Thanks for any help and direction!
Hey @Chris, I’ll note that sometimes I get errors going to that site (and some others at Adobe), where I find that if I just go to another browser it works. Something to try.
BTW, I don’t think it’s that there’s a problem with Adobe supporting a given browser. Rather I think it’s a cookie problem, and by going to another browser you may not have there the same cookies as the one that’s failing for you.
Or you can use browser features like “new incognito window” in Chrome, or “new private window” in Firefox, which should open the page without passing in any previously created cookies. I’ve found that to work also when my normal browsing of the page did not. (Just experienced it in FF, and it worked in a “new private window”.)
And finally one could also clear their cookies for Adobe and maybe find it would work normally.
Of course, it could have been just that there was a server-side burp. I see your comment was at 3:11am, which if that’s eastern time may have been a maintenance window for them. To be clear, I was able to visit the page now, at about 11:30am ET.
Hope that’s helpful. And thanks for considering trying to look at those feature requests. 🙂
Hello,
I’ve been monitoring the discussion here and I have also tried to make some changes on our servers, and things are getting more stable.
One thing I had troubles with was the cfstat utility – it gave “strange” results.
It turns out that the CFSTAT settings in neo-metric.xml were the same on all servers (for the setting cfstatport). This is a bug (https://bugbase.adobe.com/index.cfm?event=bug&id=3361929); I don’t know if it’s still a bug in CF11.
The solution is to change the port address and start cfstat with the port parameter (e.g. cfstat 10 -x -port 7997).
(The connector port in cfadmin also needs to be set to the AJP port in server.xml – that setting must also be updated manually when there is more than one instance of CF running on the same server)
The strange thing about this is that after changing the stat port it seems that ColdFusion (10 Enterprise) is also serving requests much faster after restart. Earlier the requests would queue up for 2-3 minutes before the server was “stable”, but now I don’t see any delay.
It might be a coincidence (why would these things be connected..)
But it might be worth looking into if you have a similar setup.
@Anit, I’m not sure I agree with your point against changing the defaults that CF creates.
You say “the environment varies from user to user”, and of course that’s true, But again, in all 3 of your use cases you show adding 2 values which are never enabled by default, and you change a 3rd.
Given that, and that you have to pick SOME value for the default, I’m simply asking why you don’t put these other values in as the default initially? I know it doesn’t now, but why not change the connector to do that? (Perhaps you may prefer to respond to that in the feature request I just posted.)
@Anit, thanks for that, and I have now created two feature requests:
https://bugbase.adobe.com/index.cfm?event=selectBug&CFGRIDKEY=3783361
and
https://bugbase.adobe.com/index.cfm?event=selectBug&CFGRIDKEY=3783364
Interested folks should please go vote for and/or comment on them.
@Charlie, It’s not easy to write paragraph(s) as notes on behalf of the users, and we appreciate your concerns w.r.t. ColdFusion and the users.
The max_reuse_connection and the timeout settings are already available during the connector configuration (Web server configuration tool > “Add” > Advanced Settings), but users usually skip that part. The connection_pool_size is an explicit parameter that needs to be added. Its value can be identified from metric.log or monitoring tools.
The reason we don’t set a “default value” in workers.properties is that the environment varies from user to user. That is why I have mentioned clearly in the blog that “The above use cases are scenario based and may vary from site to site, depending upon the load, architecture and traffic on the site.” Thus, we leave it to the users to tune the values and adjust the parameters based on their application.
Yes, I agree that the web server configuration tool can be enhanced and the connector tuning could be simpler and optimized. Please log an enhancement request and we will look into it.
Once again, we appreciate everyone’s thoughts on this one and thank everyone for helping to make the product better and better.
@Charlie – You make some great points. The worst part of running ColdFusion is that you don’t really know what’s going on under the covers. If ColdFusion had some way of alerting or logging these issues and providing recommendations it would certainly make the tuning process a lot easier. The Adobe recommendation of trying a certain setting and then waiting for it to fail sucks because I have clients to answer to. Once the server becomes unresponsive my phone rings off the hook. Again, if ColdFusion was more pro-active I think it would be a lot better. Maybe these features will be considered for ColdFusion 11 or possibly the next version.
@Anit, given that you have confirmed that you are in fact proposing a change to the last 3 lines you list for workers.properties, will you guys be changing how CF configures these values by default, going forward?
I realize not everyone is in any single use case, but you clearly configure the current default values, which again all these recommendations here change, so it seems best to change the default, right? It seems you likely currently create them to default as for a use case 1, so why not set CF to create the values you propose here for that?
And along those same lines, is there any chance you guys may consider evolving the web server configuration tool (and CF installer, which runs it implicitly if external web server integration is selected) so that they prompt folks to consider these various scenarios, to help them set better values than the current default, or even the single new default values as I just suggested?
I realize I asked in the last comment for a tool to help determine better values over time. But this is about at least giving people a heads-up, at initial configuration, to consider the very ideas that you are proposing in this blog entry.
And if you may consider changing that web server configuration tool, could you (while you’re at it) change it so that (if on Windows) you require the user to use “run as administrator”? Right now, people have to KNOW to do that for it to work completely, and people often don’t realize it, to their downfall (as it gives no error or warning. It just may do the job incompletely.).
This idea of the tool detecting whether the user is running as admin (on Windows) is not outlandish: the CF10 Mandatory Update tool (also a java app) did the same thing.
Finally, and perhaps most important, everyone reading this blog entry and applying the changes Anit suggests should note that THOSE CHANGES WILL BE LOST if the web server configuration tool is run again. It always sets them back to the current defaults (which Anit is telling us to change.) This is why I think it’s important that the tool create better defaults, and/or prompt us to consider use cases and make better default choices.
But perhaps better still, related to this last point, perhaps the tools should detect if users have made non-standard changes to these files (workers.properties and server.xml) with respect to these settings, and it should prompt them to either preserve them or let them be overwritten.
As I said in the last comment, the problems caused for folks by the issues raised in this blog entry are really hurting a lot of CF10 and 11 users. Desperate times call for desperate measures: the web server config tool should be made a LOT smarter to help with this problem–and not hurt it by losing the hard-won tweaks that people may make.
@Anit, first, thanks for responding to my observation and adding that additional commentary in the blog. Great to see you responding (quickly and positively) to such suggestions. Indeed, thanks for being so responsive to all the questions folks are asking.
This seems to be an issue that is striking a LOT of people, whether on CF10 or CF11.
**One of the challenges, I think, is that we currently have no insight into the connection pooling: no metrics at all, whether from CF or Tomcat (or the web server, whether IIS or Apache) to tell us how many connections are used, by what site, etc.**
If we had that info, I think it would be MUCH easier for people to understand when they are having the problem, and why, as well as perhaps how better to try to address it with the tuning knobs discussed here.
Can you guys either expose something, or show folks how they could? I’ve tried to do a little digging but so far to no avail.
For now, trying to do these web server tweaks is like the early days of CF6 and above, when it was first based on Java, and people were often finding their servers hanging up in ways they had not with CF 5 and before. This was before CF8 added the CF Server Monitor or tools like FusionReactor or SeeFusion became available, where we could really see what was going on.
And instead people were for years taking stabs in the dark to solve problems: “change this jvm argument, change your heap size, change your gc algorithm, change your CF admin settings”.
Only when we had better diagnostics could we understand what was REALLY going on in our servers. Perhaps it was that we were hitting the heap size limit, and now we had tools to tell us that (many could not get JVM-provided tools working for various reasons). And even then, with better tools they found out WHY the heap was high: too many sessions (often from spiders), or heavy use of query caching, or ehcache caching, etc.
Or perhaps a different cause of a CF outage was hung requests, perhaps waiting for a CFHTTP call or CFQUERY, etc, which had nothing to do with jvm tweaking.
My point is, only with better diagnostics did we really tend to get to the bottom of the problems. (Sadly, many still operate in the “point and shoot” mode of troubleshooting, and don’t avail themselves of those valuable tools.)
So I think we’re in the same boat here, and I REALLY, REALLY hope that you (Adobe) or someone in the community will help us to really see what the true situation is on our servers with respect to these connections. Right now, it’s again “throwing darts against the wall hoping to pop the balloon with a prize”. We might win, we may have to keep throwing, and people are getting frustrated with that approach.
Most important, this is only people who are finding this blog entry (or the old one for CF10) and trying these settings. There are many (perhaps most using CF 10 and 11) who may have this problem and NOT know of this blog entry.
We really need CF not only to help us diagnose the problem, but even better would be if it would help us in some PROACTIVE way. Perhaps tracking the key connection metrics over time and providing a report in either the logs or the CF Admin (or even an email) to pre-empt the problem by recommending tweaks based on analysis over time.
I realize the latter idea is total pie-in-the-sky dreaming. But this problem is so significant for so many, that it seems radical reconsideration of the problem and its solutions is warranted. Until then, anything will be better than the current situation.
I have some more thoughts related to the tuning process and the blog entry, but I’ll raise them in a separate comment, as I’m sure I lost some readers 10 paragraphs ago. 🙁
I recently enabled metrics and have been monitoring the log for over a week now. My log seems to be a bit different from what has been reported above by @Anit.
For instance, please see a line from my log below:
“Information”,”scheduler-1″,”06/27/14″,”08:10:14″,,”Max threads: 150 Current thread count: 0 Current thread busy: 0 Max processing time: 0 Request count: 0 Error count: 0 Bytes received: 0 Bytes sent: 0 Free memory: 1746275528 Total memory: 2081423360 Active Sessions: 111″
The “Current Thread Count”, “Current Thread Busy”, “Max processing Time”, “Request Count”, “Error Count”, “Bytes Received”, and “Bytes Sent” fields are always 0. Is there another setting that has to be enabled in order to record that data?
The memory and Active Session columns appear to be accurately recording data.
@Charlie, I have added additional information for scenario 1 so that it makes more sense to users. Thanks for your suggestion; we welcome your input on this.
@Dave Cordes, As mentioned above, In such scenarios, keep the max_reuse_connection to 200 (default value). To start with, increase the connection_pool_size to a value between 800-1000 and monitor your site. If you experience “Service unavailable”, gradually increase the pool size and tune the connector.
Once you are able to identify an appropriate value for connection_pool_size, then set the max_reuse_connection to somewhere between 100 to 200. If needed, increase it to a higher value, and monitor the site, till it becomes stable.
Note: Remember to set the maxThreads in server.xml, as you change the connection_pool_size.
@Carl, Log rotation can be implemented for connector.
You can use both log_rotationtime and log_filesize, as per your requirement. This should not impact performance unless the logs grow very large and consume a lot of memory.
@Anit – Do you think rotating the connector log would be useful to apply? One could apply log rotating to minimise the log size which makes it easier to read either current or past log. Does a large log have an impact on performance over time once it grows say beyond 200Mb?
EG sample isapi_redirect.properties:
extension_uri= /jakarta/isapi_redirect.dll
log_file= c:\ColdFusion11\config\wsconfig\1\isapi_redirect_%d-%m-%Y.log
log_level= info
log_rotationtime= 86400
worker_file= c:\ColdFusion11\config\wsconfig\1\workers.properties
worker_mount_file= c:\ColdFusion11\config\wsconfig\1\uriworkermap.properties
iis_buffer_enable= true
where log_rotationtime is specified in seconds so 24 hours in this case.
Thanks in advance, Carl M.
I’m still confused about Use Case #2. If I have about 50 websites on my server and my connection pool size set to 1,000. Doing the math I should set my max reuse connections to 20. However, the majority of the websites on my server don’t get a ton of traffic so I adjusted the value higher to 50. My thoughts are I could most likely set it to even higher like 100 but I don’t know if that would be a bad decision or not. Anyone have an environment like myself and can comment on some appropriate settings?
Folks (Anit, and everyone reading this), about scenario 1 above, beware that it’s not at all clear that the changes there entail a change from the default values currently in those files on initial configuration.
Anit, I really think it would help people a lot if you made that more clear, even if just a sentence after the displayed changes. I have talked to more than one person who has misinterpreted things (they did not compare it to their current settings, so did not realize they ought to make any change at all, even if theirs was scenario 1).
This is less an issue in scenario 2, as the text conveys a need to think through what values to use.
I have some more thoughts on all this, again from helping people deal with trying to use the info, which I’ll share later (hopefully today).
@Anit – Interesting, thanks for sharing.
So for my situation, where 99% of entries over a 120 hour period are as above and only 1 entry is > 9, your analysis would infer a value for connection_pool_size of, say, 12?
When you say “tune it further if needed” do you mean, if you noticed that the “Current thread count” was starting to hit your new “connection_pool_size” value you would revise it upwards?
Does the “Active Sessions” value have any bearing? I see yours are in the same order as your “Current thread count” value. Mine are 1200-2500. Does that sound right? E.g.
“Information”,”scheduler-2″,”06/16/14″,”07:47:04″,,”Max threads: 150 Current thread count: 4 Current thread busy: 0 Max processing time: 83425 Request count: 9072 Error count: 72 Bytes received: 1649 Bytes sent: 22768583 Free memory: 306065064 Total memory: 1052966912 Active Sessions: 1218″
“Information”,”scheduler-2″,”06/16/14″,”08:37:04″,,”Max threads: 150 Current thread count: 7 Current thread busy: 2 Max processing time: 84617 Request count: 9133 Error count: 72 Bytes received: 1649 Bytes sent: 22920761 Free memory: 147950448 Total memory: 1051000832 Active Sessions: 2565″
I appreciate your support here but if there’s a better medium for this discussion let me know.
Below is a sample output of metric log
"Information","scheduler-0","01/02/14","14:11:23",,"Max threads: 500 Current thread count: 85 Current thread busy: 12 Max processing time: 29266063 Request count: 31045 Error count: 583 Bytes received: 642863 Bytes sent: 1061422778 Free memory: 529568096 Total memory: 1801977856 Active Sessions: 81"
"Information","scheduler-3","01/02/14","14:12:23",,"Max threads: 500 Current thread count: 85 Current thread busy: 4 Max processing time: 29679670 Request count: 31682 Error count: 584 Bytes received: 653848 Bytes sent: 1085861435 Free memory: 256575784 Total memory: 1803091968 Active Sessions: 80"
The “Current thread count” gives you an estimated value for the connection pool size. For example, in the above scenario, based on the two log entries, I would set connection_pool_size above 100 (comfortably more than 85), and then tune it further if needed.
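As a rough illustration of the reading described above, the following Python sketch pulls the “Current thread count” field out of metrics.log entries and derives a starting value for connection_pool_size. The function names and the 20% headroom are assumptions made for this example, not an Adobe recommendation. The sample lines are the two entries quoted above.

```python
import re

# Matches the "Current thread count: N" field of a metrics.log entry.
THREAD_COUNT = re.compile(r"Current thread count: (\d+)")


def peak_thread_count(lines):
    """Return the highest 'Current thread count' seen, or None if no line matches."""
    counts = []
    for line in lines:
        match = THREAD_COUNT.search(line)
        if match:
            counts.append(int(match.group(1)))
    return max(counts) if counts else None


def suggest_pool_size(peak, headroom_percent=20):
    """Hypothetical starting point: the observed peak plus some headroom.

    The 20% default is an assumption for illustration only; the extra
    amount is rounded up using integer arithmetic.
    """
    extra = (peak * headroom_percent + 99) // 100
    return peak + extra


sample = [
    '"Information","scheduler-0","01/02/14","14:11:23",,"Max threads: 500 '
    'Current thread count: 85 Current thread busy: 12 Max processing time: 29266063 '
    'Request count: 31045 Error count: 583 Bytes received: 642863 '
    'Bytes sent: 1061422778 Free memory: 529568096 Total memory: 1801977856 '
    'Active Sessions: 81"',
    '"Information","scheduler-3","01/02/14","14:12:23",,"Max threads: 500 '
    'Current thread count: 85 Current thread busy: 4 Max processing time: 29679670 '
    'Request count: 31682 Error count: 584 Bytes received: 653848 '
    'Bytes sent: 1085861435 Free memory: 256575784 Total memory: 1803091968 '
    'Active Sessions: 80"',
]

peak = peak_thread_count(sample)
print(peak)                     # 85
print(suggest_pool_size(peak))  # 102
```

In practice you would feed it every line of the log collected over a representative period, then keep monitoring and adjust the value by hand.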
@Anit – Thanks for the prompt response.
Does Adobe have any writings on how to interpret the metrics.log output and how that output would influence the “connection pool size” setting? After several hours of logging all I’m seeing is loads of…
Max threads: 150 Current thread count: 4 Current thread busy: 0
@Richard “max_reuse_connections” is not a function of “connection_pool_size”; rather, the two depend on each other, and a change in one calls for a change in the other. Setting a very high value leaves resources engaged even when they are not required. Setting a value that is too low means a new connection is established for every request instead of an existing one being reused (as few or none are available), which hurts response time.
Connection pool size is determined by your developer, based on the connection requests you are receiving. If you cannot estimate a suitable value, enable metric logging (http://helpx.adobe.com/coldfusion/kb/coldfusion10-enable-metric-logging.html). The steps are the same for ColdFusion 11. The log gives you an idea of the connection pool size you need, and you can then tune connection_pool_size accordingly.
“connection_pool_timeout” is indeed a very important parameter. Extremely high values let requests queue up without timing out, and the server hangs and returns “Service Unavailable”. A very low value, in turn, kills requests before they are even processed.
Given that the suggested value of “max_reuse_connections” is a function of “connection_pool_size”, how do you determine a value for “connection_pool_size”? What would be the consequences of setting it to a ridiculous value of “100000” or “2”? What factors influence the choice of the “connection_pool_timeout” value and what would be the consequences of similar extreme values?
@Helge, if you set max_reuse_connections to its lowest value, your site will still work. But every time a new request arrives, a new thread would be created (even though existing threads are available), and that would hurt performance.
In your scenario, keep max_reuse_connections at 200 (the default value). To start with, increase connection_pool_size to a value between 800 and 1000 and monitor your site. If you experience “Service Unavailable”, gradually increase the pool size and tune the connector.
Note: Remember to set the maxThreads in server.xml, as you change the connection_pool_size.
My scenario is:
300+ sites on IIS 8 where some have very low traffic (even zero traffic) and some have high traffic.
Connector created with “ALL” and multiple sites in IIS
It’s impossible to know which site has the most traffic, as some sites are running events where traffic is very high in short periods of time (during registration etc.)
My setup now is
worker.CL.max_reuse_connections=100
worker.CL.connection_pool_size=600
worker.CL.connection_pool_timeout=60
If I were to follow the “formula” my max_reuse_connections would be very low (say 1-2) or my connection_pool_size would be very high.
So what would your advice be on such scenarios?
Thanks,
Helge
Regarding Use Case 3, the site with heavy traffic and load should follow the steps mentioned above (max_reuse_connections = connection_pool_size / {number of sites}). Sites with low traffic that are configured with the same instance should use the standard parameters, or the ones mentioned in Use Case 3 above, as their traffic is lower.
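The rule in parentheses above can be written out as a small calculation. This is only a sketch: the function name, the rounding down and the floor of 1 are assumptions made for illustration. With a pool of 600 shared by 300 sites (the figures from the “ALL” scenario described earlier in the thread) it gives 2, which matches the “1-2” estimate given there.

```python
def max_reuse_connections(connection_pool_size, site_count):
    """Apply max_reuse_connections = connection_pool_size / number of sites.

    Rounding down and never returning less than 1 are assumptions for
    this sketch, not documented connector behaviour.
    """
    if site_count < 1:
        raise ValueError("site_count must be at least 1")
    return max(1, connection_pool_size // site_count)


print(max_reuse_connections(600, 300))  # 2
```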
reuse_connections keeps the connection active so that the same connection can be used again for the next request, whereas connection_pool_size is the pool of connections that can be used when multiple requests come to the connector. So one is the number of connections in the pool and the other is how many connections can be reused. They are definitely different from each other.
The ajp13 protocol is packet-oriented. The web server communicates with the servlet container over TCP connections. To cut down on the expensive process of socket creation, the web server will attempt to maintain persistent TCP connections to the servlet container, and to reuse a connection for multiple request/response cycles.
There is a reuse flag in AJP Tomcat, which is set to true. The TCP connection is then kept active and used to handle new incoming requests within the same, already established connection. Since the “reuse” value is set to true for ColdFusion, there needs to be a parameter that sets the maximum for it. That parameter is called “max_reuse_connections”.
I don’t understand max_reuse_connections. How is it different from connection_pool_size?
Why is it connection_pool_size / {number of sites}?
Why does Use Case 3 not follow this rule for sites 2 & 3?
Is max_reuse_connections a custom Adobe extension, as it is not on the AJP connector page?