July 22, 2011 / kiranpatils

Do your CD servers crash?

Challenge:

In my experience with Sitecore to date, I have seen a few issues where CD (Content Delivery) servers were crashing, and after a lot of R&D and effort I was able to fix them. So I just wanted to share my findings with you!

If your CD servers are crashing, then this post is for you! It may give you a solution, or at least pointers towards one.

Solution:

Before I get to the main point, let me first explain the difference between a “Hard Crash” and a “Soft Crash”:

Monitor Sitecore log files for any application restarts. If a new log file is generated, this often means that a restart has occurred. Identify the reason for the restart and address it.

  • If the log file ends with a “Sitecore shutting down” message – this is a “soft crash” and for some reason the hosting environment has forced the process to recycle (for example, due to some scheduled recycling or some critical file changes).
  • If the log file does not end with a “Sitecore shutting down” message – this is a “hard crash” due to, for example, a stack overflow error or some worker thread deadlocks. A good solution could be to collect a process crash dump and analyze it with WinDbg.

The content above is copied from a Sitecore scrapbook entry, and it explains everything!
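The soft/hard rule quoted from the scrapbook entry can be sketched as a small script. This is only an illustration: the function names and the `tail_lines` window are my own assumptions, and the only thing taken from the text is the “Sitecore shutting down” marker.

```python
from pathlib import Path

# Marker quoted in the scrapbook entry; everything else here is hypothetical.
SHUTDOWN_MARKER = "Sitecore shutting down"


def classify_log_ending(log_text: str, tail_lines: int = 20) -> str:
    """Return 'soft' if the shutdown marker appears near the end of the log, else 'hard'."""
    # Ignore blank lines so trailing newlines do not hide the last real entries.
    lines = [line for line in log_text.splitlines() if line.strip()]
    tail = lines[-tail_lines:]
    if any(SHUTDOWN_MARKER.lower() in line.lower() for line in tail):
        return "soft"
    return "hard"


def classify_log_file(path: str) -> str:
    """Read a closed log file from disk and classify how it ended."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return classify_log_ending(text)
```

Run it only on log files that are already closed, that is, ones followed by a newer log file. The log currently being written has no shutdown message yet and would be reported as 'hard' by this sketch.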

There can be several reasons why your Sitecore CD servers crash. Let me explain them one by one:

1. “HostingEnvironment initiated shutdown” : This falls in the “Soft Crash” category, and you will be able to see this message in your Sitecore log file.

–  ASP.NET worker process restarted : This happens when IIS restarts the ASP.NET worker process automatically, either because of the Application Pool settings or because of a change under the webroot that forces a restart:

  • Check Application Pool settings : First, go through your Application Pool settings and check whether any of them could have triggered this restart. We found one setting that was causing it: the idle time-out configured in IIS. IIS shuts down the application pool’s worker process if it has been idle for X amount of time. This will not matter for a Sitecore instance that gets requests throughout the day. But if yours has quiet periods, you should use Sitecore’s URL Agent to keep it alive. How to do it? Please read my earlier blog post, which explains it in detail! [Following the DRY rule: Don’t Repeat Yourself!]
  • Check ASP.NET worker process restart reasons : Second, check whether anything happened that caused the ASP.NET worker process to restart. Read my other blog post, which explains the cases in which the ASP.NET worker process restarts.
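To judge whether the idle time-out could be the culprit, one option is to look for quiet periods in the request timestamps (for example, taken from the IIS logs). Below is a minimal sketch of that check. The function name is hypothetical, and the 20-minute default is the usual IIS default for the idle time-out, so please check the value on your own application pool.

```python
from datetime import datetime, timedelta


def find_idle_gaps(request_times, idle_timeout=timedelta(minutes=20)):
    """Return (start, end) pairs where no request arrived for at least idle_timeout."""
    ordered = sorted(request_times)
    gaps = []
    # Compare each request with the next one and keep the long silences.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier >= idle_timeout:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    start = datetime(2011, 7, 22, 1, 0)
    # Made-up sample: two requests, a 35-minute silence, then two more.
    sample = [start + timedelta(minutes=m) for m in (0, 5, 40, 45)]
    for gap_start, gap_end in find_idle_gaps(sample):
        print("idle from", gap_start, "to", gap_end)
```

If the gaps line up with the times at which new Sitecore log files were created, the idle time-out is a likely suspect and a keep-alive such as the URL Agent should help.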

2. OutOfMemoryException – This falls in the “Hard Crash” category, where your Sitecore instance restarts abruptly, and when you open the Sitecore log files you will see lots of OutOfMemoryException errors. Let’s see how to get out of it:

  • Check caching configurations : You may have aggressively configured cache sizes that are too big for your RAM. So, please make sure the caching configuration fits the memory you actually have!

You can use my shared source module, CacheTuner, and set the MemoryThreshold value in Web.config to 60% of your RAM size. That way the worker process should not cross the limit, and if it does, the module will write a log entry for you!
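The 60% rule of thumb can be checked with simple arithmetic. The sketch below adds up a set of configured cache size limits and compares the total with 60% of the RAM. The cache names and sizes in the example are made up, the helper names are hypothetical, and it assumes sizes are written like “500MB” or “2GB”. It does not read Web.config and it does not use CacheTuner itself.

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as '10MB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|KB|MB|GB)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def memory_budget(total_ram_bytes: int, fraction: float = 0.6) -> int:
    """Return the share of RAM (60% by default, as suggested above) in bytes."""
    return int(total_ram_bytes * fraction)


def check_cache_sizes(cache_sizes: dict, total_ram_bytes: int, fraction: float = 0.6):
    """Return (configured_bytes, budget_bytes, fits) for the given cache limits."""
    configured = sum(parse_size(value) for value in cache_sizes.values())
    budget = memory_budget(total_ram_bytes, fraction)
    return configured, budget, configured <= budget


if __name__ == "__main__":
    # Hypothetical cache limits on a server with 4 GB of RAM.
    sizes = {"html": "500MB", "data": "2GB", "items": "1GB"}
    configured, budget, fits = check_cache_sizes(sizes, 4 * 1024 ** 3)
    print(configured, budget, fits)
```

Keep in mind that caches are only one part of what the worker process holds in memory, so a total that fits the budget is necessary but not sufficient.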

  • Check that IIS is running in x64 mode : This was another reason we faced. One of our CD servers was crashing abruptly, and frankly it took us a lot of time to find the root cause. We did a lot of R&D on this server and overlooked that we were running IIS in 32-bit mode, where the worker process has a much smaller address space and can run out of memory long before the server’s RAM is used up. Sitecore also recommends running IIS in 64-bit mode only, so you will have better performance too! So, how to check it? Nice scrapbook entry : http://sdn.sitecore.net/scrapbook/64%20bit%20environments.aspx
  • Get a crash dump and analyze it : If nothing above works for you, install the DebugDiagnostics tool, collect the dumps and analyze them! [Frankly, a very complex task! All the best!]

3. Check Memory usage : This is the main problem we faced on our CD servers. We sized the RAM of our CD servers when we started the project, when our DB was not too big. Years later we had never increased the RAM, while the DB had grown a lot. To analyze the situation, we configured performance counters on the CD servers and analyzed the collected data. [Did you know PAL is a nice tool that helps you analyze performance counters easily? I strongly recommend using it!] If you see that memory is always under pressure, it is either a memory leak or time to order more RAM!
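To tell “leak” from “not enough RAM” in the collected counter data, a rough first pass is to look at the trend of the worker process memory over time. The sketch below fits a straight line through equally spaced samples (in MB) and labels the result. The function names and both thresholds are assumptions picked for illustration, not values from PAL or Sitecore.

```python
def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def describe_memory(samples_mb, total_ram_mb, pressure_fraction=0.8, growth_mb_per_sample=1.0):
    """Label a series of memory samples as 'growing', 'high' or 'ok'."""
    slope = slope_per_sample(samples_mb)
    average = sum(samples_mb) / len(samples_mb)
    if slope >= growth_mb_per_sample:
        return "growing"  # keeps climbing: worth checking for a leak
    if average >= pressure_fraction * total_ram_mb:
        return "high"     # flat but close to the RAM size: consider more RAM
    return "ok"


if __name__ == "__main__":
    # Made-up samples from a server with 4096 MB of RAM.
    print(describe_memory([1000, 1100, 1200, 1300], 4096))
    print(describe_memory([3500, 3510, 3490, 3500], 4096))
```

A steady climb that only resets after a restart points towards a leak, while a flat line close to the RAM size suggests the server has simply outgrown its memory. A real analysis needs far more samples than this example, and a tool such as PAL does this kind of work much more thoroughly.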

Digging into a CD crash is a really “fun” task! [You need a lot of patience to solve it, because for days and days you get no hint about what is going wrong or where to look for the problem and the solution 🙂 ]

Special thanks to Sitecore support team [Especially Maxim Savelyev] for helping us!

I hope I have covered all the cases that cause CD servers to crash. Have you found anything new? Please do share it with everyone!

Happy CD Crash Stopping! 🙂


5 Comments

  1. link / May 27 2013 9:56 am

    My programmer is trying to convince me to move to .net from PHP.
    I have always disliked the idea because of the costs.
    But he’s trying none the less. I’ve been using WordPress on a
    number of websites for about a year and am anxious about switching to another platform.
    I have heard excellent things about blogengine.net.

    Is there a way I can transfer all my wordpress posts into it?

    Any help would be greatly appreciated!

    • kiranpatils / Jun 1 2013 10:07 pm

      I think yes, just export your WP posts and import them into blogengine.net

      Kiran

  2. Kapil Naker / Feb 3 2015 12:33 pm

    We ran into a very sporadic issue where connections timed out and the CD site crashed. When everything else failed to fix it, we started checking our network settings and realized that the Infrastructure team had made a mistake. We had 2 DB servers: one for the core, master and web DBs, and the other for Analytics. The Infra team had cloned the Analytics DB server but forgot to change the MAC address of the clone. Hence both DB servers had the same MAC address, which is why we were getting network-related errors and connection timeouts. We changed the MAC address of the cloned server and it resolved the issue.

    Hope this information helps!!.

    • kiranpatils / Feb 26 2015 12:15 am

      Thanks Mr. Kapil for sharing your learnings! Really helpful!

