Hello Readers,

Trust you are doing well.

Today, I am going to discuss an interesting issue we encountered in one of our 4-node production RAC databases. We experienced intermittent connection failures from the application servers, with the error "ORA-12537: TNS:connection closed". This setup was running on Solaris SPARC machines.

Upon analyzing the issue, we found that it occurred mostly during peak periods. It was not limited to application services; we also received an alert from OEM indicating that a DB connection from the DBSNMP user to the database had failed once during a peak period.

Initially, we suspected a firewall timeout, but since there was no firewall between the application server and the database, we ruled out this possibility.

Further investigation revealed that none of the scan listener logs showed any traces related to this error. However, the local listener on node 2 had a flood of “TNS-12518/TNS-12547/TNS-00517” errors during peak time, directing our focus to node 2. According to the application configuration, certain services were bound to node 2, resulting in a significantly higher number of application connections on that node compared to the others.
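A quick way to quantify these errors is to grep the listener trace log. The path below is the default ADR location; the hostname placeholder and listener name are assumptions to adjust for your environment:

```shell
# Count listener hand-off failures per error code in the local listener log.
# Path is the standard ADR location; adjust <hostname>/listener name as needed.
LOG=$ORACLE_BASE/diag/tnslsnr/<hostname>/listener/trace/listener.log
for err in TNS-12518 TNS-12547 TNS-00517; do
  printf '%s: %s\n' "$err" "$(ggrep -c "$err" "$LOG")"
done
```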

Upon discussion with the application team, we learned that they start additional JVMs during peak periods to handle the increased connection requests. This suggested that some limit was being reached at the DB/OS level, causing the local listener to refuse connections once a certain threshold was crossed.

Related errors:

  • ORA-12537: TNS:connection closed
  • TNS-12518: TNS:listener could not hand off client connection
  • TNS-12547: TNS:lost contact
  • TNS-12560: TNS:protocol adapter error
  • TNS-00517: Lost contact
  • Solaris Error: 32: Broken pipe

Constant monitoring of the number of connections, processes, and sessions revealed that the local listener began refusing or closing connections once the count reached approximately 8,000.
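The monitoring itself can be sketched as a simple loop; the listener port (1521) and the Solaris netstat output format are assumptions here, and the process filter is illustrative:

```shell
# Hypothetical monitoring loop (port and filters are assumptions):
# log established connections on the listener port and the Oracle
# process count once a minute, to correlate with refusal times.
while true; do
  conns=$(netstat -an | ggrep '\.1521 ' | ggrep -c ESTABLISHED)
  procs=$(ps -ef | ggrep -c '[o]racle')
  echo "$(date '+%Y-%m-%d %H:%M:%S') connections=$conns processes=$procs"
  sleep 60
done
```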

In Solaris, the resource controls facility is configured through the project database. The values for each resource control are enclosed in parentheses and appear as plain text separated by commas; together they form an "action clause," consisting of a privilege level, a threshold value, and an action associated with that threshold. The "project.max-port-ids" resource control defines the maximum allowable number of event ports per project. The default threshold for this value in Solaris 11 is 8192.
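As a hypothetical illustration, an entry in /etc/project carrying such an action clause might look like the following; the project name, id, and user are assumptions, not taken from our environment:

```shell
# /etc/project entry (illustrative; project name, id, and users are assumptions).
# Field layout: projname:projid:comment:users:groups:attributes
# The action clause (privilege,threshold,action) follows the resource control name.
user.oracle:100:Oracle DB project:oracle::project.max-port-ids=(priv,8192,deny)
```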

During our investigation, we found that this value was intermittently reaching its maximum limit, which could be causing the local listener to refuse or close connections. 

The following command can be used to capture details about the listener's current event-port usage (ggrep is GNU grep on Solaris):

prctl <pid of local listener> | ggrep -A 3 project.max-port-ids


On node 2, the event-port count for the local listener process was intermittently reaching 8.19K (the 8192 default), while on the other nodes this value remained well below 6K. Consequently, no connection rejections or refusals were observed on the other nodes.


Based on our analysis, we recommended increasing the project.max-port-ids value from 8K (8192) to 16K (16384) on node 2 of the production DB server to address the connection rejections by the local listener. After implementing this change, we did not observe any further connection issues, indicating that the solution worked as expected.
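For reference, the change can be sketched with standard Solaris commands; the project name "user.oracle" is an assumption here, so substitute the project the listener actually runs under:

```shell
# Temporarily raise the limit on the running project (takes effect
# immediately, does not persist across reboot). Project name is assumed.
prctl -n project.max-port-ids -v 16384 -r -i project user.oracle

# Persist the new action clause in the project database (/etc/project).
projmod -s -K "project.max-port-ids=(priv,16384,deny)" user.oracle

# Verify the new limit against the listener process.
prctl -n project.max-port-ids -i process <pid of local listener>
```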

Concurrently, we engaged the application team to investigate the application/middleware tier for potential connection pool changes or connection leaks.

Recommended Reads:

  • How to Use the prctl Command to Display Default Resource Control Values
  • How to Use prctl to Temporarily Change a Value
  • How to Use prctl to Display, Replace, and Verify the Value of a Control on a Project
  • Resolving "OS system dependent operation:pwportcr failed with status: 11" issues on Solaris (Doc ID 2359700.1)

Let me know if you have any questions, or reach out for further information, in the comments or on LinkedIn.

Regards
Adityanath.
