SCOM has what I feel is a major bug: it will let you save items (monitors, rules, overrides, etc.) in the default MP. Doing this is bad for a lot of reasons, the main one being that every MP you touch this way ends up referenced by the default MP, and SCOM will not let you delete an MP that another MP depends on. Not only does SCOM allow this, it is the default option as well. In my case an occasional lack of attention let me do it, and removing MPs later became a huge pain in the rear. Anyway, I found this good article on how to clean up the mess.
cleaning up the default management pack
December 10, 2010
exchange 2010 MP for scom = room for improvement
November 30, 2010
At first I really liked this MP. It knows a LOT about Exchange, and there was some serious effort put into making sure it grabs everything. After a while, though, there are some things you need to be able to change but can’t.
Take this alert for disk space. We want to change the % that it alerts on. Well, guess what you can change with the override?
That’s right, the only thing you can do is enable or disable the rule. That’s it. And while we’re on the subject of disabling a rule, that’s not working for at least this one:
Here are a few of the instances of this alert..
And if you look at the overrides, this rule is clearly disabled… but still alerting.
I’m still trying to figure both of these out.
weird scom error
November 30, 2010
I’m getting this repeatedly in SCOM and I’m not sure why. I can’t seem to find anything about it.
- Event data collection process unable to write data to the Data Warehouse. Failed to store data in the Data Warehouse. The operation will be retried.
Exception ‘InvalidOperationException’: The given value of type Int32 from the data source cannot be converted to type tinyint of the specified target column.
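For context on the exception text: SQL Server's tinyint holds 0 through 255, while a .NET Int32 is a signed 32-bit integer, so any value outside 0 to 255 cannot be stored in a tinyint column. Which value and which column are involved here is unknown. The sketch below only illustrates the range mismatch, and the sample values are made up.

```python
# SQL Server tinyint range (0..255) versus a signed 32-bit integer.
TINYINT_MIN, TINYINT_MAX = 0, 255
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_tinyint(value: int) -> bool:
    """True when an integer can be stored in a tinyint column without overflow."""
    return TINYINT_MIN <= value <= TINYINT_MAX

# Hypothetical sample values, not taken from the actual event data.
samples = [0, 255, 256, -1, 70000]
for v in samples:
    print(v, "fits" if fits_tinyint(v) else "does not fit")
```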
urlscan issue
November 5, 2010
I have the following URLScan configuration:
RuleList=DenyUserAgent
[DenyUserAgent]
DenyDataSection=AgentStrings
ScanHeaders=User-Agent
[AgentStrings]
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1
Opera/9.02 (Windows NT 5.1; U; ru)
In the logfiles I am seeing it block non-Russian Mozilla user agents, like this:
2010-11-05 21:49:23 76.94.140.86 896362 GET /programs/images/t8.jpg Rejected rule+’DenyUserAgent’+triggered User-Agent: mozilla/5.0+(windows;+u;+windows+nt+6.1;+en-us;+rv:1.9.2.3)+gecko/20100401+firefox/3.6.3 mozilla/5.0+(windows
(The logfile truncates after a certain length.) I do not understand why it is blocking this Mozilla version, which has a totally different user agent.
Looking for an answer on this one….
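One possible explanation, offered as an unverified assumption rather than a confirmed diagnosis: if URLScan treats a semicolon in its ini file as the start of a comment and matches the remaining text as a case-insensitive substring, then each deny string above would be cut off at its first ";". The first entry would shrink to "Mozilla/5.0 (Windows", which matches nearly every Firefox on Windows. The trailing "mozilla/5.0+(windows" in the log line would be consistent with that, but that is a guess too. The Python below only simulates that assumed behavior; it is not URLScan's actual code.

```python
def effective_pattern(ini_line: str) -> str:
    """Return the text before the first ';' (ASSUMED to be a comment marker)."""
    return ini_line.split(";", 1)[0].strip()

def is_denied(user_agent: str, ini_lines: list) -> bool:
    """ASSUMED matching: case-insensitive substring test against each pattern."""
    ua = user_agent.lower()
    for line in ini_lines:
        pattern = effective_pattern(line).lower()
        if pattern and pattern in ua:
            return True
    return False

# The two deny strings from the config above.
AGENT_STRINGS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1",
    "Opera/9.02 (Windows NT 5.1; U; ru)",
]

# The en-US Firefox user agent from the log line.
english_firefox = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) "
                   "Gecko/20100401 Firefox/3.6.3")

print([effective_pattern(line) for line in AGENT_STRINGS])
print(is_denied(english_firefox, AGENT_STRINGS))
```

If this is what is happening, a shorter deny string with no semicolon in it would be the thing to try, but that needs testing against the real filter.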
.net apps using SQL connections that aren’t closed properly
October 15, 2010
I already blogged about this partially here, and of course having debug disabled comes into play as well. Anyway, we figured out a problem here at work that is well documented here. It’s a good read and really helped us figure out how to recode the app.
2008 R2 DCs and SCOM
September 27, 2010
We migrated our domain from 2003 AD to 2008 AD last week and found a couple of issues via SCOM.
- DNSSec Zone TrustAnchors - for this one you’ll get an alert that looks like this: Zone TrustAnchors on DNS Server dns.name is not responding to queries.
- DNS 2008 Server External Address Resolution Alert – for this you’ll just get a failure of the alert. To be fair the same alert for 2003 has the same problem, but we had fixed that a long time ago, thus the info had been forgotten.
That’s it!
Dawson Forest, AKA Georgia Nuclear Aircraft Laboratory
September 21, 2010
I’m a bit of a conspiracy theorist at heart, and I love stories about secret facilities and whatnot. Today I was reading ajc.com and stumbled across an article that mentioned one, which led me down this path of discovery. To be clear, I don’t think there is really any conspiracy here, but it is interesting to know that a secret test facility is close.
This is the first thing I found that started this whole bit of research:
http://www.ajc.com/news/atlanta/former-secret-test-site-616831.html?cxtype=rss_news
Dawson Forest is owned by the City of Atlanta and is planned for a water reservoir or a 2nd airport:
http://en.wikipedia.org/wiki/Dawson_Forest
It used to be the site of the Georgia Nuclear Aircraft Laboratory:
http://en.wikipedia.org/wiki/Georgia_Nuclear_Aircraft_Laboratory
This is a link to the history of the testing facility:
http://www.pickensprogress.com/archive/insidedawsonforest.html
Another link to the history, summary basically:
http://northgeorgiamountainramblings.wordpress.com/2010/04/28/when-the-cold-war-came-to-dawsonville/
YouTube video about the facility:
http://www.youtube.com/watch?v=Bn6N2iV2_os
Here are some flickr pictures about it (some very good ones):
http://www.flickr.com/photos/robertlz/sets/72157600036376038/detail/
Some more pics in this page:
http://www.abovetopsecret.com/forum/thread230310/pg8
Some dude’s blog post about it:
http://northgeorgiamountainfreak.blogspot.com/2008/05/north-georgias-area-51-in-dawsonville.html
Some more pictures and a map:
AboveTopSecret.com link with a bunch of stuff:
http://www.abovetopsecret.com/forum/thread230310/pg15
Some videos from a guy who went there:
http://www.youtube.com/user/Ratz667#p/a
Facebook page about it:
http://www.facebook.com/pages/Dawson-Forest-GNAL/154640534555862?v=app_4949752878
Geocache site with some pics:
http://www.geocaching.com/seek/gallery.aspx?guid=40ed95a9-8c89-44da-83d8-130820a25849
Link to a pdf about the radiation measurements:
http://www.gaepd.org/Files_PDF/gaenviron/radiation/radrpt2002_dfw.pdf
Effect on pine beetles:
http://links.jstor.org/pss/2473643
Time Magazine article on the pines:
http://www.time.com/time/magazine/article/0,9171,895712,00.html
Pictures (old ones):
http://www.abovetopsecret.com/forum/thread230310/pg12
More pictures from satellite:
http://virtualglobetrotting.com/map/abandoned-government-lab/view/?service=1
scom web application monitoring part 2 – presenting the data – service levels and the dashboard
August 30, 2010
This is the 2nd post in a short series on monitoring web applications with SCOM. Part 1 is here.
One of the biggest issues I have with SCOM is the sheer amount of data. It is so easy to grab a parameter here and a value there, and once you throw that in with all of the stuff the management packs already give you, you suddenly have a lot to choose from, and picking and presenting that data becomes the difficult part. Do yourself a favor and don’t show management the SCOM console. It looks more complicated than it is, and I don’t think it presents well except to technical folks.
Creating dashboards is limited, and there needs to be some more work here from Microsoft. For example, as I mentioned in my previous post, you cannot save what a performance view is supposed to look like, meaning which (or all) counters are checked. I understand why Microsoft did this for the default performance view per user, but IMO once you create a dashboard view that becomes impractical, and there should be a way to make the selections a part of the view.
The dashboard also has the problem of not looking too great via the web console. It’s limited and looks kinda fugly. As a result we have tried using the actual SCOM client, installed as a Citrix app, so that we can display it on the flat screen via the Wyse terminal. This has the problem of not being able to default a view without a lot of work. We keep running into cases where you need the detail pane in one view but not another, or you need to be able to select your views on the left-hand side sometimes but don’t want the “action” pane visible, and you end up with something that looks like a hack.
Microsoft seems to have realized this and has since created a “solution accelerator” called the service level dashboard. I’m not going to go into what it takes to install this because there are already a ton of sites out there that have the info. It isn’t the easiest thing to get installed because it requires a SharePoint installation, which it customizes and bastardizes quite a bit, and it also needs access to the operations manager database, the data warehouse, pretty much everything involving SCOM. In my case it was easier to put the actual SharePoint install on my SCOM server, which I did, and I ended up having to figure out why SharePoint stepped all over my SCOM website. This wasn’t rocket science but it took some effort. If I were doing it over again, I would install SharePoint before I installed SCOM, or find it a home somewhere that isn’t the SCOM RMS.
Once you go through the motions of getting SharePoint and the service level dashboard installed, we can get to work.
I ran out of time today so it looks like this will be a 3 part post.
scom web application monitoring – making it useful – part 1
August 30, 2010
I could go on for days about SCOM and the URL monitoring and how it needs to be improved. Honestly.. it kinda sucks. So here I will attempt to describe what I think is wrong with it and how I work around it. The items in bold below are what I feel are failures in the way this was designed.
Also, I am not writing this as strictly a “how to monitor a web app” post; there are already plenty of those. This is just about the changes required to make this useful. Here is a good article with the basics on setting up a web application monitor in SCOM.
- Requirements
To begin with, you will need to figure out what you need to monitor. In many cases it is simple enough to pull up the main page of a website, and as long as it comes up in a reasonable timeframe and gives an HTTP status code of 200, you’re OK. This sort of monitoring is useful, but you can get a lot more out of it. What I like to do is get the devs to code up something special through some sort of bribery or blackmail. In our case they defined 5 business processes, for example “make a payment”, and created a page for each that does the back end work of making that transaction and then cleans up after itself. What you get in the end isn’t exactly user experience, but it’s a good way to track the ongoing performance of a process relative to itself, and it’s a very good up/down indicator. Since we have dev environments as well, I have a development SCOM server with the same web monitoring in place for the first production-like environment. This allows our QA folks to compare state and response time and see if the environment is working before they release code or start a test, and they can also see the impact of new code by comparing response times from before and after the release.
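As a rough sketch of what such a “business process” page does behind the scenes: run the transaction, time it, and always clean up, even if the transaction blows up. Everything here is hypothetical; the in-memory `ledger` dict and the `make_payment` / `remove_payment` helpers just stand in for whatever back end your devs actually call.

```python
import time

# Hypothetical in-memory stand-in for the real payment back end.
ledger = {}

def make_payment(payment_id: str, amount: float) -> None:
    ledger[payment_id] = amount

def remove_payment(payment_id: str) -> None:
    ledger.pop(payment_id, None)

def run_synthetic_payment(payment_id: str = "synthetic-check") -> dict:
    """Run the test transaction, time it, and always clean up afterwards."""
    start = time.perf_counter()
    ok = False
    try:
        make_payment(payment_id, 0.01)
        ok = ledger.get(payment_id) == 0.01
    finally:
        # Cleanup runs even if the transaction raised, so no test data is left behind.
        remove_payment(payment_id)
    elapsed = time.perf_counter() - start  # includes the cleanup time
    return {"ok": ok, "seconds": elapsed}

result = run_synthetic_payment()
print(result["ok"], round(result["seconds"], 4))
```

A page built like this could return HTTP 200 when `ok` is true and a 500 otherwise, so a plain status code check is enough to tell up from down.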
- Once you have your URLs, it’s time to get to work.
Create a web application monitor and give it your URL. The problem with the default settings is that you are only logging the transaction response time, not alerting on it. From an alert standpoint there is no timeout for your web request. As a matter of fact, the only thing SCOM will tell you out of the box is whether it was eventually able to pull up the URL without getting an HTTP status code of 400 or higher. This default setting is not useful!
To fix this, what you want to do is add response time criteria like this.
Because of a problem with the service level dashboard that I will explain later, I only put one HTTP request in each web application monitor. This brings me to a little UI weirdness here because you can also set response times in the “configure settings” for the specific URL pull like this.
I always leave these performance criteria blank because I can see the other ones more easily and get more out of them. This one here just seems redundant.
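To make the difference concrete, here is a minimal sketch of the two sets of criteria, assuming the default error criterion is a status code of 400 or above. This is not SCOM code; `is_healthy` and the 5 second threshold are made up for illustration.

```python
from typing import Optional

def is_healthy(status_code: int, seconds: float,
               max_seconds: Optional[float] = None) -> bool:
    """Evaluate one request against status-code and optional response-time criteria."""
    if status_code >= 400:
        return False
    if max_seconds is not None and seconds > max_seconds:
        return False
    return True

# A page that took 45 seconds but eventually came back with a 200.
print(is_healthy(200, 45.0))                   # True: no response-time criterion
print(is_healthy(200, 45.0, max_seconds=5.0))  # False: over the example threshold
```

With only the status code check, a page that takes 45 seconds still counts as healthy, which is exactly the problem with the defaults.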
- Seeing the data
Now once you gather some data you will want to, well, see what’s going on. To do this, create a new performance view in the monitoring console, scope it to “collected by specific rules”, and then you get to go manually pick your rules. This is where Microsoft fails again, because the list of rules is not searchable and they all have arbitrary names. For web requests I figured out they are called “Performance Collection: Transaction response time total for Name of web app monitor”, like this screenshot.
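If you can get the rule names out as plain text (how you do that is up to you, and is an assumption here), the naming pattern at least makes them easy to filter outside the console. A small sketch with hypothetical rule names:

```python
PREFIX = "Performance Collection: Transaction response time total for "

def web_app_rules(rule_names, search=""):
    """Return web app monitor names whose collection rule matches the search text."""
    matches = []
    for name in rule_names:
        if name.startswith(PREFIX):
            monitor = name[len(PREFIX):]
            if search.lower() in monitor.lower():
                matches.append(monitor)
    return sorted(matches)

# Hypothetical rule names, as if copied out of the rule picker.
rules = [
    "Performance Collection: Transaction response time total for Make a payment",
    "Performance Collection: Transaction response time total for Login page",
    "Collect Processor Time",
]
print(web_app_rules(rules))         # ['Login page', 'Make a payment']
print(web_app_rules(rules, "pay"))  # ['Make a payment']
```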
Now that you have done that, you will be able to see a nice blank performance chart with some stuff to check.
Now when we pick one, we get a pretty graph like this.
This brings me to my next issue with all of this: the performance chart settings are user specific, meaning I cannot create a view of any sort that contains performance information and have the counters already checked. No matter which ones I put in, and whether it is a performance view or a dashboard view that contains a performance view, the counters have to be selected every time. This is a pain!
This also means that if you wanted to, say, get fancy with a URL to a specific view, you cannot just create one of these and have folks click the link and end up at a pretty performance chart with the counters already checked. The fact that you cannot do this is a serious limitation with SCOM, IMO.
- setting up alert parameters (what you cannot change)
You will likely have to play with the values a bit to keep them from false alerting. And this brings me to my next problem with SCOM web monitoring: you cannot change anything about how it samples other than where it samples from (which host) and how often. What I would love to do is say “only alert when the threshold is exceeded on two consecutive samples”, but that’s not an option. We get a lot of failures at night during our backup window that push a single transaction out of SLA, and we get alerts based on that. As a result, we have to set our response time thresholds high enough that we aren’t false alerted every night, which makes them so high that the alerting becomes less useful during the daytime. As of now I do not have a workaround for this.
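For the record, the logic being wished for here is simple. The sketch below is plain Python, not something the web application monitor can be configured to do, and the sample response times are made up.

```python
def consecutive_breach_alerts(samples, threshold, required=2):
    """Return the indexes at which an alert would fire.

    An alert fires when `required` samples in a row exceed `threshold`;
    it fires once per run of breaches, not on every sample in the run.
    """
    alerts = []
    streak = 0
    for i, seconds in enumerate(samples):
        if seconds > threshold:
            streak += 1
            if streak == required:
                alerts.append(i)
        else:
            streak = 0
    return alerts

# Hypothetical response times in seconds: one slow sample during backups,
# then a real slowdown later on.
times = [1.2, 9.5, 1.1, 1.3, 8.7, 9.9, 10.4, 1.0]
print(consecutive_breach_alerts(times, threshold=5.0))  # [5]
```

The single slow sample at index 1 is ignored, and only the sustained slowdown alerts, so the threshold could stay at a level that is still meaningful during the day.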
- stopping duplicate alerts
When you do get your first alert you will see that two are sent: one for the URL pull and one for the aggregate monitor on the web application monitor. It doesn’t make sense to me why it would be set up this way, so let’s fix it.
Start by right clicking on one of the alerts and opening the health explorer for it. Expand it out and you will see something like this.
Each of the red lines has an alert set up for it, and the lower one for the actual request rolls up into the web application one. In my mind the web application one is redundant, so I am going to disable it. Right click, choose “monitor properties”, go to alerting, and uncheck it.
Now you will receive one alert instead of two.
- useful alert details
Of course the text of the alerts isn’t useful at all out of the box (it doesn’t tell you if the URL failed for time, SSL, HTTP response, or anything). I am using this article as a basis for fixing this, but I don’t have it totally worked out yet. This will continue to require some further tweaking.
This post ended up being longer than I intended (there’s a lot to fix) so I am going to break it up into two parts and get the service level dashboard stuff into a 2nd post.