November 21, 2015 / kiranpatils

Your Content Delivery Server is not starting up?

Challenge:

This week we faced a strange issue. After a deployment, one of our production content delivery (CD) servers would not come up: it wrote a 3 KB log file and then crashed. We were eager to see the site’s homepage, but all our efforts were in vain. We were sure the deployment itself was not the cause, because the same code had been deployed to another CD server, where it was working absolutely fine.

Finally, we were able to bring it up. We still don’t know the root cause and are in touch with the Sitecore folks. But I am sure you will enjoy the journey of how we tracked down the issue and how we worked around it.

Solution:

It was a hard crash, and on checking the Event Log (System) we found the following entries:

A process serving application pool ‘ABC’ suffered a fatal communication error with the Windows Process Activation Service. The process id was ‘8008’. The data field contains the error number.

Application popup: w3wp.exe – Application Error : The exception unknown software exception (0xe0434352) occurred in the application at location 0x0760aea8.

A process serving application pool ‘ABC’ suffered a fatal communication error with the Windows Process Activation Service. The process id was ‘5324’. The data field contains the error number.

A process serving application pool ‘ABC’ suffered a fatal communication error with the Windows Process Activation Service. The process id was ‘3776’. The data field contains the error number.

We tried the following things:

  1. Disabled custom modules
  2. Changed the Generate Recycle Event Log Entry settings as per this blog: http://blogs.msdn.com/b/webapps/archive/2014/10/29/suppress-was-warnings-due-to-regular-iis-application-pool-recycling.aspx
  3. Granted rights on the appropriate folder (somehow they differed from the other Content Delivery server, so please be careful whenever you change folder rights on web servers): http://stackoverflow.com/questions/11010807/application-pool-defaultapppool-is-being-automatically-disabled-due-to-a-serie
  4. Changed the Rapid-Fail Protection interval to 10 minutes: http://geekswithblogs.net/VBTips/archive/2011/09/30/application-pool-crashing-issue.aspx
  5. Cleaned up records from the following tables:
    1. EventQueue
    2. History
    3. PublishQueue

But nothing helped. To analyze this further, we installed DebugDiag, configured it to capture a crash dump, and analyzed the second-chance exceptions. We found the following error:

In w3wp__ABCWCMS__PID__4156__Date__11_19_2015__Time_09_42_20AM__967__Second_Chance_Exception_E0434352.dmp an unhandled .net exception happened on 71 and has caused the process to crash.

When an unhandled exception is thrown in a Microsoft ASP.NET-based application that is built on the Microsoft .NET Framework 2.0, the application unexpectedly quits because the default policy for unhandled exceptions has changed in the .NET Framework 2.0. By default, the policy for unhandled exceptions is to end the worker process.

Exception Details
System.AccessViolationException
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

at Sitecore.Pipelines.HttpRequest.BeginDiagnostics.Process(HttpRequestArgs args)
at (Object , Object[] )
at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
at Sitecore.Nexus.Web.HttpModule.(Object , EventArgs )

at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
at System.Web.HttpApplication.PipelineStepManager.ResumeSteps(Exception error)
at System.Web.HttpApplication.BeginProcessRequestNotification(HttpContext context, AsyncCallback cb)
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest wr, HttpContext context)
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)

It looks like something goes wrong when Sitecore’s Nexus module starts up. The dump analysis pointed us to the following links:

  1. https://support.microsoft.com/en-us/kb/911816
  2. http://blogs.msdn.com/b/tess/archive/2006/04/27/584927.aspx

Basically, this happens when code that runs outside an HttpContext (for example, work queued via ThreadPool.QueueUserWorkItem) throws an unhandled exception: the worker process is terminated and the application pool restarts. The links suggest two solutions:

  1. Write code to handle those errors: this is the good approach
  2. Switch back to the legacy exception handling mechanism: this is an okay approach

For now we have switched to the legacy exception handling mechanism, and we are in touch with the Sitecore folks to find the root cause of the problem. Will circle back with you guys!
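
For reference, the legacy-policy switch described in the Microsoft KB article linked above (911816) is a single runtime setting. The fragment below is a sketch of that setting as the KB article documents it, not a copy of our server's file. The KB article places it in the Aspnet.config file under the .NET Framework folder (it names %WINDIR%\Microsoft.NET\Framework\v2.0.50727); which framework folder applies to your server depends on the CLR version and bitness of your application pool, so treat the location as an assumption to verify.

```xml
<!-- Aspnet.config (location is an assumption; check the framework folder
     that matches your application pool's CLR version and bitness) -->
<configuration>
  <runtime>
    <!-- Restores the pre-2.0 behaviour: an unhandled exception on a
         non-request thread no longer ends the worker process. -->
    <legacyUnhandledExceptionPolicy enabled="true" />
  </runtime>
</configuration>
```

Keep in mind that this only hides the symptom: the exceptions still occur and are silently swallowed, which is why handling them in code remains the better long-term fix. An application pool recycle (or iisreset) is typically needed before the change takes effect.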

If you would like to know more about crash dumps, DebugDiag, and how to analyze them, I would suggest reading the following links:

  1. https://kb.sitecore.net/articles/499200
  2. https://kb.sitecore.net/articles/488758
  3. http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx
  4. http://blogs.msdn.com/b/tess/archive/tags/debugging+labs/

Or give me a shout. Will help you!

Happy Sitecoring! 🙂
