Work
Dell… Why do you hate me so?
by danlor on May.19, 2005, under Technology, Work
Speaking of Dell… I had a hard disk crash in one of our 2600s yesterday. Actually, I just discovered it yesterday. For some reason I was not alerted to the problem so I’ll have to look into that later. The drive really failed a couple days ago. It was late in the afternoon, but I decided to call it in anyways. The server was critical to operations, and it was better to get things moving on the support end. Leaving a RAID 5 array in degraded mode is never a good idea. I called up and got a guy named Carlos who spoke good English and was technically knowledgeable. Cool! He asked if I had reseated the drive. I told him that I had not because different vendors have different policies about that. He asked me to try and manually rebuild the drive, which immediately failed. He then had me reseat the drive, which immediately failed. I thought we were in good shape for replacement… but no. “It could be the backplane or something… maybe the controller”, he said.
Now for me, there was a simple solution. Send both. I don’t really care if the problem cannot be pin pointed. They can do that in their labs with test equipment if they really want to know. I just want my drive fixed.
I offered to look into the logs to see if there was an event there, and sure enough, there was. Luckily it turned the call around, and he offered to send me a replacement drive. Considering it was 5 pm, and the weather was hideous, I could not see them making it that evening. He assured me they would, and I went home for dinner.
I was quite shocked to get a call about 7:30 telling me that the courier had arrived. I drove into the office, pulled the failed drive out of the server and opened the box. Uh… what the hell? They sent me a bare drive. Servers use hot swap rails and cages for their drives. This means I have to swap the rails from one drive to the other… except for a small problem… the new drive is BIGGER than the old, and does not fit properly.
After staring at the drive for a few moments… I notice that there is a bright orange sticker on the side of the drive shipping box… “REFURBISHED”. You have got to be kidding me. They sent me a USED drive for a server?!?! That has already FAILED for someone!!! Perfect. After applying much more pressure than I was comfortable with, I got the rails to accept the replacement drive. I was afraid it would not fit in the server, but it slid in fine. I then sat and watched with trepidation as the server attempted to rebuild the array with the new drive.
It took about 2 and a half hours, but the rebuild completed, and the array is now back online. I’m not sure what to do at this point. We are virtually guaranteed a double drive failure in this array in the future.
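For anyone wondering why a degraded RAID 5 array makes me so nervous, here is a minimal sketch of the parity idea. The block contents and the helper name `xor_blocks` are made up for illustration, and a real controller works on stripes across physical disks, not little byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One drive lost: the survivors plus parity are enough to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])

# Two drives lost: what is left no longer determines either missing block.
not_enough = xor_blocks([data[0], parity])
```

With one drive gone, every read of the missing block depends on all the remaining drives being healthy. Lose a second one before the rebuild finishes and the data is gone.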
Beware Windows 2003 Service Pack 1
by danlor on Apr.02, 2005, under Technology, Work
While it has been tested for eighteen months, it still is not ready for prime time. After installing it on a clean 2003 server at work, the server was no longer bootable. Uninstallation failed as well. The machine locks up during boot, right after the splash screen shows.
So… You have been warned.
Another Amazing Week in Technology
by danlor on Dec.07, 2004, under Technology, Work
So. Where to start.
Just to show that I am an equal opportunity bitcher, I had an hp d530 ultra slim blow a motherboard last week. hp was nice enough to send me a used motherboard to replace it. Unfortunately, they did not make sure it was usable before shipping it. The video connector was mostly broken off the board. They did not argue and sent me a working used board. At least it works now. TAKE NOTES DELL!!!! [sigh]
Ok, so… the fun of my day today. Everyone is really keen on all these lovely Cisco IP phones we have now. I have to say the whole thing has been quite mixed from my point of view. On a scale of 1 to 10, our success runs in the 3-4 range.
And now that I have discovered the most wonderful “feature” ever, I can tell everyone I know to run away from these pieces of shit screaming. As fast as you can. Are you ready? You sure?
Every cisco IP phone ever made is a network time bomb. Every fucking one. Want proof? Read this:
If the phone is not powered by an AC power adapter and the phone is connected to an Ethernet switch that does not provide POE support, the circuit inside the phone’s uplink port remains closed. In this state, any traffic sent by the switch to the phone may loop back to the switch and create a loop back storm that disables the entire VLAN.
This may mean nothing to most of you non network admins, but all you real admins right now are shuddering. Read that again to just make sure you fully understand the situation.
It took out one of my branches last night. A cleaning lady knocked a power supply loose. Took everything out: security, ATM, teller machines, phones, everything. And it can happen to anyone. Any company that has IP phones, and a mix of POE and non POE ports on their system (everyone), is in danger. Just imagine little Suzie moving some stuff around in her cube, and accidentally plugging her computer and phone into the opposite ports. POOF the main VLAN for the company is now DEAD! By design no less. Here is another quote for you:
Devices that are capable of receiving POE, such as Cisco IP Phones, close the loop back circuit on their uplink Ethernet port when they are powered down to enable the POE discovery pulse message to be looped back to the switch.
Again, you might want to read that twice. Cisco IP phones use the biggest mistake in networking to power themselves up. It is completely unnecessary as well. 802.3af uses resistance and capacitance on the line to determine the same thing. And it is backward compatible to NOT take your network down. How quaint.
Cisco is quick to trumpet the fact that Cisco hardware is “immune” to this problem. They claim that their hardware has loopback recovery features to prevent the VLAN from crashing down. Just one problem. It shuts down real live working servers. Cisco has shut down their own fucking Call Manager software (phone switch) on our network three times due to “errors”. The problem has been remedied by moving the server to an HP switch that does not suck.
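To give a rough idea of what the 802.3af approach looks like, here is a minimal sketch of resistance-based detection. The switch probes the line at two low voltages, works out the resistance from the slope, and only applies power if it sees a signature near the nominal 25 kΩ. The acceptance window below is from my memory of the standard, so treat it as an assumption. The function names are made up, and the capacitance check is left out entirely.

```python
# Assumed acceptance window (ohms) around the nominal 25 kOhm signature.
ACCEPT_MIN_OHMS = 19_000
ACCEPT_MAX_OHMS = 26_500

def signature_resistance(v1, i1, v2, i2):
    """Slope resistance in ohms from two voltage/current probe points."""
    return (v2 - v1) / (i2 - i1)

def should_apply_power(v1, i1, v2, i2):
    """True only if the device presents a valid detection signature."""
    if i2 == i1:
        # No change in current: nothing is connected, so no power.
        return False
    resistance = signature_resistance(v1, i1, v2, i2)
    return ACCEPT_MIN_OHMS <= resistance <= ACCEPT_MAX_OHMS

# A device with a 25 kOhm signature gets power.
powered_device = should_apply_power(4.0, 4.0 / 25_000, 8.0, 8.0 / 25_000)

# Something that looks like a near short (1 ohm) gets nothing.
near_short = should_apply_power(4.0, 4.0, 8.0, 8.0)
```

The point is that nothing in this scheme needs the device to loop traffic back at the switch. It is a passive measurement, and a port that sees the wrong answer just stays unpowered.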
Full doc on this “feature” can be found here
SBC on strike
by danlor on May.24, 2004, under Technology, Work
Wow. Looks like this week will be fun. SBC is on strike for three days, and we have three lines down. Looks like our phone system that SBC was supposed to install will be put on hold as well. They could be out as long as three months!
At least there is some good news. Our Mailfrontier stopped another few hundred viruses that got through Symantec over the weekend, and on top of that, I got vindication from our “virus protection” outsourcing company.
I have been working with them for over a month now to try and tighten our virus filters. Many strains were getting through, and I felt it was just a matter of time before our luck ran out and our network got leveled. But I kept getting responses like this which was addressed to our department:
…The strong majority of Ben’s emails are actually benign, even though the desktop software recognizes them as a virus. Here’s what is happening:
These viruses are composed of two components: an infectious attachment, and HTML code in the message that uses an IE exploit (http://www.kb.cert.org/vuls/id/980499) to execute the attachment. The NAV gateway processes mail in two stages: first checking for any attachments files to be removed, and then checking for viruses.
It is instructed to remove any file that matches the following patterns:
letter.zip
*.pif
*.scr
*.rar.
These patterns were configured by us to mitigate the most common exploitable attachments.
Once the attachments are processed, the gateway software will then scan the mail with its AV component. Now, because the attachment has already been deleted, the AV component considers it benign and forwards it to the destination. You can corroborate this by viewing the source of the xxxxx@xxxxxx.xxx mails; they have a “DELETED0.TXT” attachment which shows the exploit has been stripped.
After being delivered to the end user, the desktop AV software notices the exploitable HTML (this HTML is in the body of the message, not encoded as an attachment and therefore not stripped) and complains that it has seen a virus. In this case, however, it has only seen the HTML code, not the attachment necessary for the virus to propagate.
I should note that it is possible to disable the attachment scanning and rely only on the AV software, which may make for more thorough cleansing, at the cost of lesser protection. Let us know if you want to try this route.
The true situation is this…
SAV strips Attachment
SAV scans message – Sees no problem
SAV sends message on
Mailfrontier Scans message – Finds exploit scripts
Mailfrontier Blocks message with virus/vulnerability and redirects it to an external “holding pen” on one of my personal email servers. Now THAT is a dangerous mailbox. I don’t even like to LOOK at it.
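The sequence above can be sketched in a few lines. This is a toy model, not how SAV or Mailfrontier actually work inside: the strip patterns are the ones from the vendor’s email, but `EXPLOIT_MARKER`, the message layout, and both scan functions are stand-ins I made up to show the ordering problem.

```python
from fnmatch import fnmatch

# Patterns quoted in the vendor's email.
STRIP_PATTERNS = ["letter.zip", "*.pif", "*.scr", "*.rar"]

# Hypothetical stand-in for the exploit HTML in the message body.
EXPLOIT_MARKER = "<!--exploit-->"

def strip_attachments(message):
    """Stage 1: drop attachments matching the patterns, leave the body alone."""
    kept = [name for name in message["attachments"]
            if not any(fnmatch(name.lower(), p) for p in STRIP_PATTERNS)]
    return {"body": message["body"], "attachments": kept}

def gateway_scan(message):
    """Stage 2 (toy): only flags mail that still carries a bad attachment."""
    return any(fnmatch(name.lower(), p)
               for name in message["attachments"] for p in STRIP_PATTERNS)

def body_scan(message):
    """Second product (toy): flags exploit code in the body itself."""
    return EXPLOIT_MARKER in message["body"]

infected = {"body": "Hi! " + EXPLOIT_MARKER, "attachments": ["photo.pif"]}
stripped = strip_attachments(infected)

passed_gateway = not gateway_scan(stripped)   # first stage sees nothing wrong
caught_downstream = body_scan(stripped)       # exploit code is still in there
```

Stripping first hides the evidence from the scanner that runs second, so the message goes out looking clean while the script in the body rides along.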
Imagine my joy today when the SMTP gateway of our security vendor showed up in my inbox complaining about viruses I had sent to them! Turns out when accessing the “benign” scripts in my inbox, THEY GOT HACKED!!!! The virus harvested their inboxes for email addresses, and then started sending!
I think they learned their lesson. I asked them if they would like to use my consulting service to get a handle on their virus vulnerabilities. 🙂
Dell – The Modules Arrived
by danlor on May.21, 2004, under Technology, Work
Lucky for me, I have an on-the-ball lady working in receiving. She knew I was expecting a parts shipment from Dell. She also knew that our CAO was not. Guess who Dell shipped the package to. I’m going to have to start taking pictures. I’m not sure how much more of this you all will believe without a little physical proof!
So, she brought me the box to verify if it was what I was looking for. Sure enough, it was the memory… But wait… Here we go. 4 refurbished memory modules. Unfriggin believable. They want me to put USED, FAILED memory in my server. But I’m sure this meets Dell’s stringent QA testing. Wait, I seem to remember a quote in a previous email that makes me nervous…
2004-05-19
>Ben
> send the log. Because if there is less the 10 errors within a given
>time thats normal and well within specs. thats why it is ECC . Thank
>you for using Dell’s online support for Workstations/Servers.
On top of that, the modules are not paired. This is DDR RAM. Shit. I’m screwed. And of course they want these all back within 10 calendar days as well. I have no way to know if any of these sticks will work, and not enough time to thoroughly test. I have a feeling I’m going to be calling CDW to buy new ram.
Mailfrontier – A bright spot in my day
by danlor on May.21, 2004, under Technology, Work
Not all is frustration and gloom here at work. Overall, things run quite well. So, in order to counteract the one-two punch that is Dell and Symantec, there must be some pretty good stuff over here! What could it be? It’s Mailfrontier ASG/EG. The ASG/EG stands for Antispam Gateway/Enterprise Gateway. These guys are cool. They have been around for about two years now, but the ASG product is just a tad over a year old. It is great. Truly great stuff.
The software was primarily designed to block SPAM, but it now also blocks viruses and does Policy based filtering as well. And it does its job admirably.
When we decided to look into running the ASG software, we were getting about 10000 spam messages a month, with 7000 real. We thought we had trouble then. So far we are looking to break 80000 messages this month for spam alone. It is breathtaking to watch the stats climb.
Our users are happy, and they get beautifully rendered spam reports personalized every morning sitting in their inboxes. They are easy to read, and concise. The users get personalized white and black listings as well.
The policy stuff is pretty new, so there is room for improvement, but the virus handler is deft. Yesterday, Mailfrontier/McAfee stopped 276 viruses from getting into our network. Not a big deal? Well keep in mind that Mailfrontier sits INSIDE our Symantec antivirus firewall. That’s 276 infections it saved me from cleaning up today. That makes me very happy.
To fill out one of Mailfrontier’s HUGE gaps in functionality, I built an SQL/Crystal reporting engine that automatically imports the logs from the previous day and gives me stats any way I can imagine.
This is what software should be. I want software to make my life easier.
Dell… Continued
by danlor on May.21, 2004, under Technology, Work
Got a response back yesterday and as expected it was amazing. Well… I might as well go through the entire mail sequence.
2004-05-19
>Ben
>send the log. Because if there is less the 10 errors within a given
>time thats normal and well within specs. thats why it is ECC . Thank
>you for using Dell’s online support for Workstations/Servers.
Now, I was a little taken aback by this. ECC memory is there to PREVENT CRASHES of applications and data corruption caused by memory errors. It IS NOT there to give Dell an excuse not to replace my RAM. It’s as bad as Sony! So, I took the exported log file that the Dell Open Manage system generated, and attached it to the reply. I got this later in the day.
2004-05-19
> the logfile sent is unreadable. all i see if a bunch of number and
>letter strings. Try and resend it using TXT format .. that should
>work..david
Now, keep in mind that I sent them their log file directly exported from Open Manage. They export a ZIP file, not a txt file. When you attach a zip file, it gets MIME encoded as base-64 so the SMTP gateways can handle the binary transfer. He got it, and couldn’t figure out what it was. I’m getting tech support on a SERVER from a guy who doesn’t understand EMAIL! I know the attachment was ok because I extracted it on my side and opened it from his reply :-).
So. He wanted a plain text version of the log. This is a problem. When I unzipped the log, it was HTML. Lots-o-HTML. Why would they export HTML if they want txt? Go figure… oh yea… this is Dell! Frustrated, I just cut and pasted the entire rendered page into a text email. My email program threw out the tables and formatting, and I ended up with a rather jumbled mess of log entries. I didn’t feel like repaginating, so I just hit send. I got this yesterday.
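For anyone curious what a “bunch of number and letter strings” looks like, here is a minimal sketch using Python’s standard `email` and `zipfile` modules. The file names and log contents are made up. The real export came out of Open Manage, and I have no idea what mail client was on the other end.

```python
import io
import zipfile
from email.message import EmailMessage

# Build a small zip in memory, standing in for the exported log archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("log.html", "<html><body>ECC error, DIMM 1</body></html>")
zip_bytes = buf.getvalue()

# Attach it to a message; binary attachments are base64 encoded by default.
msg = EmailMessage()
msg["Subject"] = "Log export"
msg.set_content("Log attached.")
msg.add_attachment(zip_bytes, maintype="application", subtype="zip",
                   filename="log.zip")

attachment = next(msg.iter_attachments())
encoding = attachment["Content-Transfer-Encoding"]  # how the part is encoded
raw_text = attachment.get_payload()                 # what it looks like undecoded
decoded = attachment.get_payload(decode=True)       # what a mail client recovers

# The decoded bytes are a perfectly good zip again.
names = zipfile.ZipFile(io.BytesIO(decoded)).namelist()
```

Looking at `raw_text` directly gives exactly the wall of letters and numbers he described. Any mail client that handles attachments does the decode step for you.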
2004-05-20
>Ben,
>Swapping the ram seems to have make the issue follow the stick of
>ram since the error are consistent with 1 stick at a time . I
>will go ahead and ship out 4sticks and make sure we cover
>all bases. They will be there friday morning and the ref. number is
>xxxxxxxxx
4 Sticks…. that’s weird. I only have 2 in the machine. Hmmm… I hope they are the right size. Oh well. If they aren’t, we’ll try again. You have to be patient with these guys. It is currently 1:11 pm, and the sticks are not here yet. Looks like we will continue play on Monday.
Another Day, Another Dell Support Call!
by danlor on May.18, 2004, under Technology, Work
So, just in case you thought that those other problems were isolated issues, and the server support is better, this one is for you!
In March, the little amber light on the front of our 2600 started blinking. I took a look at the Open Manage software and all the indicators were green. No failures. I then browsed through all the diagnostic lists looking for an error, but none could be found. Stumped, I called support and got a hold of a woman who pointed me to the Open Manage log. There I found the problem. The DIMM in slot one was getting ECC errors. SLAM DUNK! Thank god…. Right? WRONG. She instructed me to go ahead and down the server and swap the modules between slots to see if the error followed. I asked her to just send out the modules, but she refused. She wanted to be sure it wasn’t the memory slot… Yea. Ok. Whatever. I asked her to just send out a motherboard with the memory, but she didn’t want to do that either.
The problem is that this server is always in use. It is a primary system. Downing it is a major problem. After three days, I was able to get a window to work on the server. Unfortunately, the server is designed in such a way that it cannot be worked on in the rack. The covers can’t be removed while it is mounted. So I have to pull it out of the rack, then dismantle the cooling system to get down to the motherboard. I swap the modules and power it back up.
My luck was poor that day, and the error did not reoccur. I was destined to deal with this issue again at a later date… Today!
Dell likes playing chicken with their warranties. They try to push out repairs to future dates, after the warranty expires. Little did the last tech know that we extended our warranty.
So I call Dell up again… Guess what. They want me to swap modules! I just sent them the error log including the errors from the previous call. I’m currently waiting for the next hoop.
Dell… How would you like to be screwed over today?
by danlor on May.16, 2004, under Technology, Work
It’s hard to believe I was once a Dell fan. I recommended their hardware to many people over the years. They built their machines out of off the shelf components, and made sure that when things sometimes went wrong, they were put back the way they should be quickly.
But things changed. Soon, Dell decided to scrap its three year lockdown for corporate machines. They began using proprietary designs and hardware. They raised their prices. They moved their support departments overseas.
When I started work where I am today, we were in the process of moving the entire company to Dell desktops. Things were going ok, until we started to have motherboard failures in our GX150s. The machines would randomly reset for no reason, even when booted to DOS from a floppy. We spent over a year off and on trying to get the machine diagnosed to no avail. We were never able to get any kind of error code, and we could never narrow it down to any one component. No matter what we did, the problem kept on coming. We eventually just retired the machine. It was still under warranty, but we could not get it repaired. They insisted on getting an error code of some kind before authorizing any kind of repair. They also didn’t want to spend time swapping out components that might not have been bad.
A few months later, we found the problem in THREE more GX150s. Luckily, these units were under Dell’s “gold” support program. “Gold” means that Dell will “waste” money trying to fix your machine. We got a hold of a tech who had actually SEEN the problem before, and knew exactly what it was. Turns out that quite a few Dell computers have trouble with the CPU socket and static generated by the CPU fan. It is a known defect internally. He sent us out a tech with new motherboards, and fixed the three machines. Gleefully, I called normal support back and referenced my gold support incident, hopeful that I would finally get the machine repaired. Nope. No error, no tech. We called our sales rep, and he told us there was nothing he could do. I called tech support multiple times trying to get someone who would listen, but they all said the same thing. There was nothing they could do. The final guy I talked to told me to call Gold support and lie about my service tag, and use a tag off another machine. The machine still sits in my junk bin.
Last September, one of our Dell laptops decided not to charge its batteries any more. We narrowed it down to the logic board, and scheduled a repair. The expected repair date was DECEMBER 25. Three months. That’s right. I called them every couple weeks to make sure everything was still on schedule. In December, the 25th came and went, no tech. I called them back on the 27th to be informed my warranty had now expired. We called our sales rep, who once again told us there was nothing he could do. Fun fun fun.
We have since switched to whitebox and hp/compaq machines. So far no trouble. We call for support, repair tech comes out.