Archive for the ‘iis’ Category

scom web application monitoring – making it useful – part 1

August 30, 2010

I could go on for days about SCOM URL monitoring and how it needs to be improved. Honestly, it kinda sucks. So here I will attempt to describe what I think is wrong with it and how I work around it. The items in bold below are what I feel are failures in the way this was designed.

Also, I am not writing this strictly as a “how to monitor a web app” post; there are already plenty of those. This is just about the changes required to make it useful. Here is a good article with the basics on setting up a web application monitor in SCOM.

  • Requirements

To begin with, you will need to figure out what you need to monitor. In many cases it is simple enough to pull up the main page of a website, and as long as it comes up in a reasonable timeframe and returns an HTTP status code of 200, you’re OK. This sort of monitoring is useful, but you can get a lot more out of it.

What I like to do is get the devs to code up something special through some sort of bribery or blackmail. In our case they defined 5 business processes, for example “make a payment”, and created a page that does the back end work of making that transaction and then cleans up after itself. What you get in the end isn’t exactly user experience, but it’s a good way to track the ongoing performance of a process relative to itself, and it’s a very good up/down indicator.

Since we have dev environments as well, I have those on a development SCOM server, and I have the web monitoring described below in place there too, in the first production-like environment. This allows our QA folks to compare state and response time and see if the environment is working before they release code or start a test, and they can also see the impact of new code by comparing response times from before and after the release.

  • Once you have your URLs, it’s time to get to work.

Create a web application monitor and give it your URL. The problem with the default settings is that you are only logging the transaction response time, not alerting on it. From an alert standpoint there is no timeout for your web request. As a matter of fact, the only thing SCOM will tell you out of the box is whether it was eventually able to pull up the URL without getting an HTTP status code of 400 or higher. This default setting is not useful!

To fix this, what you want to do is add response time criteria like this.

image
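To make the difference concrete, here is a minimal Python sketch of the two checks. It is not SCOM code; the function names and the 5 second limit are made up for illustration, and it assumes errors start at status code 400.

```python
def default_check(status_code):
    """Out-of-the-box style check: only the HTTP status code matters."""
    return status_code < 400


def check_with_response_time(status_code, response_seconds, max_seconds=5.0):
    """Same check plus a response time criterion (max_seconds is an assumed limit)."""
    return status_code < 400 and response_seconds <= max_seconds


# A page that takes 45 seconds but eventually returns 200:
print(default_check(200))                   # True  -> looks healthy
print(check_with_response_time(200, 45.0))  # False -> flagged as too slow
```

With only the first check, a site that crawls but eventually answers never alerts, which is the whole reason for adding the response time criteria.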

Because of a problem with the service level dashboard that I will explain later, I only put one HTTP request in each web application monitor. This brings me to a little UI weirdness here because you can also set response times in the “configure settings” for the specific URL pull like this.

 image

I always leave this performance criteria blank because I can see the other one easier and get more out of it. This one here just seems redundant.

  • Seeing the data

Once you have gathered some data you will want to, well, see what’s going on. To do this, create a new performance view in the monitoring console, scope it to “collected by specific rules”, and then manually pick your rules. This is where Microsoft fails again, because the list of rules is not searchable and they all have arbitrary names. For web requests I figured out they are called “Performance Collection: Transaction response time total for [name of web app monitor]”, like in this screenshot.

image

Now that you have done that, you will be able to see a nice blank performance chart with some stuff to check.

image

Now when we pick one, we get a pretty graph like this.

image

This brings me to my next issue with all of this: the performance chart settings are user specific, meaning I cannot create a view of any sort that contains performance information and have the counters checked already. It doesn’t matter which counters I put in, or whether it is a performance view or a dashboard view that contains a performance view; they have to be selected every time. This is a pain!

This also means that if you wanted to say, get fancy with a URL to a specific view, you cannot just create one of these and have folks click the link and end up at a pretty performance chart with the counters already checked. The fact that you cannot do this is a serious limitation with SCOM, IMO.

  • setting up alert parameters (what you cannot change)

You will likely have to play with the values a bit to keep them from false alerting. This brings me to my next problem with SCOM web monitoring: you cannot change anything about how it samples other than where it samples from (what host) and how often. What I would love to do is be able to say “only alert when two consecutive thresholds are exceeded”, but that’s not an option. We get a lot of failures at night during our backup window that cause a single transaction to go out of SLA, and we get alerts based on that. As a result, we have to set our response time thresholds to the highest level they could possibly reach so that we aren’t false alerted every night, but this makes them so high that the alerting becomes less useful during the daytime. As of now I do not have a workaround for this.
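For what it’s worth, the “two consecutive” rule I am wishing for is simple logic. Here is a small Python sketch of it. This is not something SCOM offers; the function name and the sample numbers are made up.

```python
def alerts_for(samples, threshold, consecutive=2):
    """Return the indexes of the samples at which an alert would fire.

    An alert fires once `consecutive` samples in a row exceed the threshold,
    and can fire again only after a sample comes back under it.
    """
    fired = []
    streak = 0
    for i, value in enumerate(samples):
        if value > threshold:
            streak += 1
            if streak == consecutive:
                fired.append(i)
        else:
            streak = 0
    return fired


# Response times in seconds: one slow sample (index 2, think backup window)
# and a sustained slowdown (indexes 5 to 7).
samples = [1.2, 1.1, 9.0, 1.3, 1.2, 8.5, 9.1, 8.8]
print(alerts_for(samples, threshold=5.0))  # [6]
```

The single slow sample is ignored and the sustained slowdown alerts once, which would let the threshold stay low enough to be useful during the day.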

  • stopping duplicate alerts

When you do get your first alert you will see that two are sent: one for the URL pull and one for the aggregate monitor on the web application monitor. It doesn’t make sense to me why it would be set up this way, so let’s fix it.

Start by right clicking on one of the alerts and open the health explorer for it. Expand it out and you will see something like this.

image

Each of the red lines has an alert set up for it, and the lower one for the actual request rolls up into the web application one. In my mind the web application one is redundant, so I am going to disable it. Right click, choose “monitor properties”, go to alerting, and uncheck it.

image

Now you will receive one alert instead of two.

  • useful alert details

Of course the text of the alerts isn’t useful at all out of the box (it doesn’t tell you if the URL failed for time, SSL, http response, or anything). I am using this article as a basis for fixing this, but I don’t have it totally worked out yet. This will continue to require some further tweaking.

This post ended up being longer than I intended (there’s a lot to fix) so I am going to break it up into two parts and get the service level dashboard stuff into a 2nd post.

scom bug with the service level dashboard

August 30, 2010

I have a web application URL monitor here and I am attempting to remove a performance counter on which I have a service level objective set. Because of this, if I try to delete the counter from the web application monitor… I cannot and I receive this error.

image

Since I just made these service level objectives recently, I was able to quickly figure this one out, but the product really should handle this condition more gracefully.

scom and redirects to views

August 24, 2010

I am probably the only person on the planet doing things this way, but I want to document this anyway.

In the SCOM web console, I am publishing views for various bits of the business. Then I am using this procedure mentioned here. This is not perfect because you cannot save the way you want a performance view to look (which counters are checked) but it is at least a start. (Hello Microsoft! Please fix!) To make the whole thing easy to remember, I create a URL like http://weberrors, and this URL shows… the website errors.

How I am doing this is pretty easy.

  1. On any server really with IIS (I use my scom server), I create a new website.
  2. For the site name I use redirect.weberrors so that I know what it is for (I have quite a few of these)
  3. Create the path inside IIS as c:\inetpub\redirect.weberrors
  4. for binding I use “http” and “all unassigned”, but you have to enter a host name, which is “weberrors” for my example
  5. once the site is created, click on it and look for “http redirect” on the right hand side.
  6. click “redirect requests to this destination” and input the URL you made from the link at the top of this post
  7. it is also important that you check “redirect all requests to exact destination”, if you do not, see the note at the bottom
  8. now the IIS part is all done, open up DNS for your domain
  9. create a new entry, CNAME and make “weberrors” CNAME to your IIS server
  10. once everything replicates, folks in your company will be able to type in “weberrors” in their browser and see the errors

This is a pretty simple thing and it makes navigating to specific spots in the SCOM UI much easier.

 

Note: If you do not check the box for “redirect all requests to exact destination” then when IIS redirects it will add an extra slash “/” to your URL. SCOM does not like this! You will get an error:

  • Unfortunately the "Name of your view" view cannot be displayed.

All you have to do is what’s in step #7 above. That’ll make the redirects do their thing properly.
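Here is a rough Python model of why the extra slash shows up. It is a simplification based on the behavior described in the note, not IIS’s actual implementation, and the view URL is a made-up placeholder.

```python
def redirect_location(destination, request_path, exact_destination):
    """Simplified model of where the browser gets sent.

    With exact_destination off, the requested path is appended to the
    destination; with it on, the destination is used as-is.
    """
    if exact_destination:
        return destination
    return destination + request_path


# Hypothetical view URL; someone types "weberrors" and the browser requests "/".
view_url = "http://scomserver/default.aspx?ViewID=example"
print(redirect_location(view_url, "/", exact_destination=False))
print(redirect_location(view_url, "/", exact_destination=True))
```

The first call ends in the stray “/” that SCOM chokes on; the second returns the URL untouched.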

.net health monitoring

July 20, 2010

This is a little blurb I use almost everywhere for almost everything that will log all sorts of useful info about a .net app in the application log. It will grab unhandled exceptions as well as application lifetime events (app pool or domain restarts, etc.) This is a really good one to use when your devs won’t add this to the code themselves! It will work (or has for me) straight up in any .net code. All you do is place this in the web.config.

<healthMonitoring enabled="true">
  <eventMappings>
    <clear />
    <!-- Log ALL error events -->
    <add name="All Errors" type="System.Web.Management.WebBaseErrorEvent" startEventCode="0" endEventCode="2147483647" />
    <!-- Log application startup/shutdown events -->
    <add name="Application Lifetime Events" type="System.Web.Management.WebApplicationLifetimeEvent" startEventCode="0" endEventCode="2147483647" />
  </eventMappings>
  <rules>
    <clear />
    <add name="Application Events" eventName="Application Lifetime Events" provider="EventLogProvider" profile="Default" minInstances="1" maxLimit="Infinite" minInterval="00:01:00" custom="" />
    <add name="All Errors Default" eventName="All Errors" provider="EventLogProvider" profile="Default" minInstances="1" maxLimit="Infinite" minInterval="00:00:00" />
  </rules>
</healthMonitoring>

SCOM 2007 R2 – workgroup/DMZ server notes

July 1, 2010

This is harder than it should be. Here are my notes on doing this.

1. On cert server go here: http://blah/certsrv/

2. request cert. choose type other and paste in the below OID

3. OID = 1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2

4. Make sure to check key exportable. Make sure to use FQDN of server for name and common name.

5. On the cert server, open the certificate manager from server management and approve the request.

6. Go back to website, install the cert.

7. MMC, Certificates snap-in, Personal store. Export the cert, including the private key.

8. Copy cert to client server.

9. On the client server, open MMC, import the cert, and mark the key as exportable.

10. Run MOMCertImport on client, choose cert.

11. Restart the System Center Management service on client.

12. Wait a min and go to mom console, administration, pending management. Approve it.

13. Done!

Adsiedit.msc – where is it?

May 26, 2010

I was trying to use this and did not realize it was not installed. In order to get it you need to install the support tools. They are on the windows server 2003 cd, or can be downloaded here.

IIS6 potential gotcha

April 14, 2010

If you install .net 2.0 before you install IIS6 then you will not be able to see .net 2 in the allowed extensions list, because it needs to register with IIS. Here’s a screenshot where it is missing:

image

 

To resolve this, go to the following directory:

c:\windows\Microsoft.NET\Framework\v2.0.50727

Then run this command:

aspnet_regiis.exe -i

Now you can refresh the MMC and enable .net 2.0!

windows dns server and EDns – update (added namebench info/link)

March 30, 2010

I have had an issue with DNS server in Windows 2003 server previously that’s covered pretty well in this article by my buddy Marcus. The short version is that EDNS is enabled by default on 2003 server, and this doesn’t play well with the rest of the internet, so it’s best to turn it off if you are using windows 2003 for external (internet) DNS.

Right now I’m working on a Windows 2008 R2 server and was having problems similar to the ones that made me check for EDNS many moons ago on 2003 server. This link came up in a search and it says that they turned EDNS off in 2008 RTM, but it’s back on again in R2. He includes a link to Microsoft’s KB article about EDNS.

Luckily this is pretty easy to turn off. All you do is run this command:

dnscmd /config /enableednsprobes 0

I wanted to update this post with a link to a cool tool I have been using. It’s called namebench and it’s a DNS benchmarking tool. Works well, does exactly what you want, and the price is right.

a very cool audit utility

March 10, 2010

I have been using a very cool utility for a while now that I just realized I failed to plug. It’s called Open Audit. What it basically does is run a WMI scan on your network (it will do nmap too) and submit that data back to a MySQL database through a web service. Then there’s a fancy UI where you can do searches, queries, etc. You can get cool stuff like hardware type, how many sticks of RAM, an IP address, a driver, or a hotfix.

The application is written in PHP and uses MySQL. I run it on a Windows host (it can run on just about anything) and use XAMPP, which is pretty cool. It’s a single download that contains Apache, PHP and MySQL, all preconfigured and ready to rock and roll. All you need is some minor configuration.

While it’s pretty cool as it comes, the real power is that you can modify it all you want. So what you can do is take one of the default “views” such as list_viewdef_all_servers.php. If you open it, there’s a sql query inside that looks like this:

SELECT * FROM `system` WHERE (system_os_name LIKE '%Server%')

Then you can copy/rename the page and modify that query however you like. Here’s a modification that I made so I could find servers on our internal (but shouldn’t be) net.

SELECT * FROM `system` WHERE (system_os_name LIKE '%Server%') AND net_ip_address LIKE '192.100%'

After doing this for every custom query that I wanted, I realized that all of this data is in a mysql database. This allows you to run queries straight up, and since you used XAMPP, well you can then log in there, choose your database, find the query window, and paste your sql query right there and get the results on the spot. It’s pretty cool.

This now leads me to a gotcha I encountered today while doing my own query this way. What I had was a query that looked like this:

SELECT * FROM `system` WHERE (system_os_name LIKE '%Server%') AND net_ip_address LIKE '192.100%'

And I was trying to find a subnet that was 10.1.13, so I made the query like this:

SELECT * FROM `system` WHERE (system_os_name LIKE '%Server%') AND net_ip_address LIKE '10.1.13.%'

This kept returning zero results, which I knew was not the case. After looking at the data, I saw that the IP addresses were stored like this:

010.001.013.xxx

As a result I had to change my query to look like this instead:

SELECT * FROM `system` WHERE (system_os_name LIKE '%Server%') AND net_ip_address LIKE '010.001.013.%'
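If you build these queries a lot, the padding is easy to script. Here is a small Python sketch that turns a normal dotted prefix into the zero-padded LIKE pattern. The helper name is made up, and it assumes every octet is stored padded to three digits as shown above.

```python
def padded_like_pattern(prefix):
    """Zero-pad each octet of a dotted IP prefix and append the SQL wildcard."""
    octets = [part for part in prefix.split(".") if part]
    return ".".join(octet.zfill(3) for octet in octets) + ".%"


print(padded_like_pattern("10.1.13"))  # 010.001.013.%
```

The result drops straight into the net_ip_address LIKE clause.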

wsus and cloning vm’s

March 10, 2010

I am currently working on WSUS (Windows Server Update Services) here at work, and for the most part I’m following this excellent article at Ars. After screwing around with this for much longer than I should have, I still had about half the servers not showing up in the WSUS console. Many things could have been the culprit and I checked them all: group policy, DNS, firewall rules, etc. What was most frustrating was that I could see the clients touch the WSUS server by looking in the IIS logs, and there were no errors whatsoever, but half the servers wouldn’t show up.

Eventually I realized that it was exactly half of the servers and a light bulb went off. In our environment we have a bunch of web and app servers that are all virtual, and when we build them, we get the first node working right, and then clone and rename the VM to be the redundant node in the farm. This led me to do some searching and I found this link. Admittedly, this is an old problem, but this is the first time I have run across it.

The following is a repost of the pertinent bits that have caused my trouble and are the resolution for it.

5. Imaged clients with a duplicate client ID will only appear once in the WSUS Admin Console. Each AU client must have a unique id which is created for each individual install. When imaging systems it is recommended always to use SysPrep. The WSUS admin console will only display one client for each unique ID. If you have multiple clients created from one image which are sharing the same ID, only one will appear in the WSUS admin console. All clients will check in and download updates, but only one will appear and display status in the WSUS admin console. In cases where clients are not checking in, and they were created from images without running SysPrep, the following steps will reset the existing duplicative client IDs.

a. Run regedit and go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate

b. Delete the PingID, SUSClientID and the AccountDomainSID values

c. Stop and start the Wuauserv Service

d. From the command prompt run: wuauclt /resetauthorization /detectnow

-or-

From the command line, once you are sure the AU client is properly configured and not disabled, you could run a batch file (which might look something like this sample) and get the same results:

rem Fixes problem with client machines not showing up on the server due to imaging method
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v AccountDomainSid /f
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v PingID /f
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v SusClientId /f
cls
@echo Triggering detection after resetting WSUS client identity
net stop wuauserv
net start wuauserv
wuauclt /resetauthorization /detectnow
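If you want to confirm which servers are sharing an ID before resetting anything, the check is just a group-by. Here is a Python sketch that assumes you have already collected each server’s SusClientId value somehow; the hostnames and IDs below are made up.

```python
from collections import defaultdict


def find_shared_client_ids(machines):
    """Group hostnames by client ID and keep only IDs used by more than one host.

    `machines` is a list of (hostname, client_id) pairs gathered however you like.
    """
    by_id = defaultdict(list)
    for hostname, client_id in machines:
        by_id[client_id].append(hostname)
    return {cid: sorted(hosts) for cid, hosts in by_id.items() if len(hosts) > 1}


# Hypothetical data: web02 was cloned from web01 and kept its ID.
machines = [
    ("web01", "id-aaaa"),
    ("web02", "id-aaaa"),
    ("app01", "id-bbbb"),
]
print(find_shared_client_ids(machines))  # {'id-aaaa': ['web01', 'web02']}
```

Any ID that comes back with more than one host is a clone pair that needs the reset above.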