Monday, November 14, 2011

Installing FAST Search Server 2010 for SharePoint, Gotchas, That'll Get 'Cha!

My client has a need for FAST Search Server 2010 for SharePoint in their SharePoint 2010 farm. So, I went about installing it... Two tickets with Microsoft and three weeks later FAST is installed and configured. I think I ran in to every strange exception and gotcha that FAST has, and I haven't even started to configure it in SharePoint yet. Wow. This was an interesting one, especially since the install and configuring went so easily in my R&D environment. The install at the client site, absolute nightmare.

If you want the breakdown on how to install FAST, your best bet is to hit up Microsoft's install instructions on MSDN. They are very complete and go over everything that you need for any type of deployment. I'm not really going to go in to the install here. I might do a post on the deployment.xml file, but Microsoft does better with their install instructions than I could. However, if you are looking for the weird stuff, and the gotchas, you have come to the right place.

First you need to have Windows Server 2008 R2 64bit. FAST won't install on anything else. I chose to update the servers with all of the latest service packs and updates before beginning my install. This is a good idea to do, just because if you run in to a problem and you need to call Microsoft, the engineers there will likely insist on you updating your OS first. Updating first gets all of that noise out of the way.
My client wanted to have a fault tolerant system, so my architecture included two servers. In FAST terms that meant that I would have one "Admin" server, the server that would run the Administration service, and one "non-admin" server, a server that was part of the FAST cluster, but did not run the Admin service. Only one server in the FAST farm can run the Admin service.
In creating the multi-server deployment you need to create a file to tell FAST what services will be running where. This is a simple XML file that is referred to as the deployment.xml. More on that little terror later.

Similar to SharePoint, you install the binaries on to your FAST servers then run a configuration wizard to configure the farm. That part is as simple as three or four clicks. Not a lot is done by the install program other than to do the usual file move, registry updates and assembly deployments. The configuration is where the real action happens. After installing FAST's binaries, I downloaded and installed FAST's service pack. This is a recommended step by Microsoft and just makes good sense. Why not do the service pack install before you configure your deployment? I don't see a valid reason why not. So that is what I do. For installs, I like to turn off the Windows Firewall so that I don't have any trouble with that service blocking any ports that need to be open for the install and configuration to work. After configuration is complete, I add rules in to the firewall for my newly installed services, and turn it back on. So, fortified by my experience with EVERY SINGLE Microsoft program to date and how they dealt with the Windows Firewall, I switched it off and went to start the configuration.

PowerShell Requirements
Now, FAST's configuration wizard is basically just a user interface that passes what it gathers and validates to a PowerShell script that actually does the configuration. In order to run PowerShell scripts, you first have to run a quick command in PowerShell to tell the server that it is OK to run PowerShell scripts. You can run individual cmdlets until you are blue in the face, but once you try to run those same cmdlets from a ps1 file, you get a nasty error. Sooooo, one gotcha that I managed to avoid right off of the bat is that I always set my servers to be able to run PowerShell scripts during their initial Windows configuration. Check out the Set-ExecutionPolicy cmdlet for more information.
I run a lot of my own scripts, and I never sign them, so I always set my policy to Unrestricted. It is a little bit of a security risk, but I mitigate that by setting the policy to AllSigned after I have completed my installation. All scripts that will be run on a regular basis should be signed and taken through the normal configuration management policy of good software design.
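The policy dance described above might look something like the sketch below. It assumes an elevated PowerShell prompt and that no Group Policy setting overrides the local execution policy; if one does, Set-ExecutionPolicy will not have the final say.

```powershell
# Show the policy currently in effect before changing anything.
Get-ExecutionPolicy

# Allow unsigned scripts while the install and configuration are running.
Set-ExecutionPolicy Unrestricted -Force

# ... install and configure FAST here ...

# Tighten the policy again once the install is complete.
Set-ExecutionPolicy AllSigned -Force
```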

Service Account Requirements
The account requirements are kind of misleading, and you really need to be careful with them, ESPECIALLY if you are in an environment that is heavy handed when it comes to GPOs. I chose my Admin server and started to run the configuration wizard on that server. The first thing it asks for is the username and password of the account that will be running the FAST Windows services. I had prepared for this by making sure that the service account was a domain account, that it had the rights FAST needs on the database server to create and configure databases, and that it had the minimum rights for an account to run a service (log on locally, log on as a service). So, after I entered the username and password, I got a validation error saying that the account is invalid. What?

I double check the hardware and software requirements and find that all I need is an account with log on locally, and that the install account is an administrator on the server.
The install account MUST be a member of the local Administrators group. This is a hard requirement, the script does a validation check. The goofy thing is, that if the install account is not a member of the Administrators group it is the validation on the SERVICE account that fails saying it is an invalid account. This was a HUGE gotcha for me.
So, after I added that account to the administrators group, I was able to get past that validation error.
The other minor gotcha here is that the account that the install script uses to do the database work is not the install account, it is the service account. So, before you begin the FAST configuration, be sure to grant the service account at least dbcreator rights. I like to set the account that is doing the database work to SA during the time of install. That way any scripts or whatever the install wants to do it can do on the database server. If you do this, be extra careful that you remove the service account from the SA role IMMEDIATELY after completing the install!!
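Since the validation message points at the wrong account, it can save some time to check the install account before launching the wizard. Here is a minimal sketch; Test-IsLocalAdmin is a helper name I made up for illustration, not something that ships with FAST. Note that with UAC turned on, this returns False in a non-elevated shell even when the account is a member of the Administrators group, so run it from an elevated prompt.

```powershell
# Hypothetical helper: reports whether the current session is running
# with local Administrators rights.
function Test-IsLocalAdmin {
    $identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
    $principal = New-Object Security.Principal.WindowsPrincipal($identity)
    return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}

if (-not (Test-IsLocalAdmin)) {
    Write-Warning "The install account is not running as a local administrator."
}
```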

Disjointed Namespace
The next section of the configuration wants you to enter the Fully Qualified Name of the server in to a text box. This I do, FASTAdmin.SharePoint.MyClient.com. Blam! Validation error: Please enter in a valid computer name. After going through the normal spell checks and whatnot, I found that I did indeed have a valid computer name. I went to another server in the same subnet, and pinged my Admin server to see if I got proper DNS resolution. I did. So... Whiskey Tango Foxtrot?
When all else fails... Read the log files. So I go to the log files and find that my LDAP look up is failing on the server FQN. What?? I learn that FAST might have a problem if your DNS namespace is disjointed. What does that mean?
Well... Say you are in an environment that was originally set up on UNIX, or some other type of network that does not support Directory Services and DNS integration. To segment your network in to logical units, the network engineer used DNS to designate the divisions of the domain. This is done easily enough by adding prefixes in DNS through BIND commands. When everything is done you have a nicely segmented namespace with each division having a prefix telling you exactly what division owns what computer. The FQN of the computers ends up being COMPUTERNAME.Division.MyClient.com.

Enter Active Directory. Active Directory uses a protocol called Lightweight Directory Access Protocol. It also tightly integrates with DNS. When configuring AD, it is recommended that it is implemented to closely mirror your DNS environment. So, for each DNS segmentation, AD should be segmented with a child domain. This is to ensure that LDAP resolution and DNS resolution both can occur. HOWEVER, Windows has some boxes you can check and settings you can hack so that you can have a single domain, yet keep your segmented DNS environment. This is called a disjointed DNS namespace. It causes problems...

Enter Windows Identity Foundation. WIF is new in the Windows world, and what it does is introduce claims based authentication. SharePoint 2010 and FAST use claims extensively. FAST uses WIF for all of its authentication, and it just so happens that WIF uses LDAP exclusively to validate computer names. What happens if your LDAP and DNS do not match? Your LDAP query fails and you can't install FAST...
In my log file I see exactly that. The LDAP query that is failing, CN=FASTAdmin DC=SharePoint DC=MyClient DC=com does not exist because there is no DC=SharePoint. The SharePoint part of my FQN is simply a DNS prefix, not an actual child domain. I would have to fix this problem before I could move on.
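To see the mismatch for yourself, you can turn the server's FQN in to the domain path that an LDAP lookup would expect, then compare it with the domain you actually have. A rough sketch follows; Convert-FqdnToDomainDn is a made-up helper name, and it only illustrates the naming convention, not the exact query FAST runs.

```powershell
# Hypothetical helper: maps the domain part of an FQN to an LDAP-style
# distinguished name, one DC= component per DNS label.
function Convert-FqdnToDomainDn {
    param([Parameter(Mandatory = $true)][string]$Fqdn)

    $labels = $Fqdn.Split('.')
    if ($labels.Length -lt 2) {
        throw "Expected a fully qualified name, got '$Fqdn'"
    }

    # Drop the host label; everything after it is treated as the domain.
    $domainLabels = $labels[1..($labels.Length - 1)]
    return (($domainLabels | ForEach-Object { "DC=$_" }) -join ',')
}

Convert-FqdnToDomainDn 'FASTAdmin.SharePoint.MyClient.com'
# DC=SharePoint,DC=MyClient,DC=com
```

In a disjointed namespace the real domain would only be DC=MyClient,DC=com, so the first component has nothing to match. On a domain-joined box, [System.DirectoryServices.DirectoryEntry]::Exists("LDAP://" + $dn) is one way to test whether a given path actually resolves.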

I am happy to learn that the FAST team and Microsoft have addressed my very problem, and have fixed it in FAST Search SP1. So I download that guy and find the setting in the new psconfig.ps1 file that needs to be updated (set $disjointNamespace = $True) . I save the file and run it... only to see the exact same error pop up in my log file... Grrrrrr....

As far as I can tell with my work at the client site, and my R&D work, FAST simply will not work with a disjointed namespace. It failed every time I tried it. So, to fix the issue, the DNS prefix was removed and the FQN of the server was set to FASTAdmin.MyClientDomain.com. If you know how to get FAST running in a disjointed namespace, let me know!!

After the move, I no longer saw that particular exception in my log file... New and exciting exceptions awaited me!

FIPS Encryption
The next exception encountered was a lot easier to solve... technically speaking. The solution, however, kicked off a political firestorm that rages to this day. Anyway, this blog is not about office politics...
So, in the logs the exception is:
"This implementation is not part of the Windows Platform FIPS validated cryptographic algorithms"


This one is easy to solve. Simply open the Registry Editor, navigate to HKLM\System\CurrentControlSet\Control\Lsa\FipsAlgorithmPolicy and set the Enabled value to 0. Close the Registry Editor and that is all there is to that one.
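If you would rather not click through the registry by hand, the same change can be sketched in PowerShell. This assumes the Windows Server 2008 R2 layout, where the setting lives in a DWORD value named Enabled under the FipsAlgorithmPolicy key, and an elevated prompt. If the "System cryptography: Use FIPS compliant algorithms" policy is pushed down by a GPO, expect the value to flip back at the next policy refresh.

```powershell
# Assumed location of the FIPS switch on Windows Server 2008 R2.
$fipsKey = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\FipsAlgorithmPolicy'

# Show the current setting (1 = FIPS enforced, 0 = not enforced).
(Get-ItemProperty -Path $fipsKey -Name Enabled).Enabled

# Turn FIPS enforcement off.
Set-ItemProperty -Path $fipsKey -Name Enabled -Value 0
```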

Microsoft Distributed Transaction Coordinator (MSDTC)
MSDTC is required for FAST to run. There is a part of the install script that will attempt to install this service if it is not installed. BUT if you have a GPO that blocks MSDTC from installing you will get the following exception:
Error Unable to get a handle to the Transaction Manager on this machine. (8004d01b)

This one requires that you change your Group Policy settings, then confirm that DTC is installed properly. Next, go in to the DTC properties and make sure that the Allow Remote Clients check box on the Security tab is checked. This is important if you are installing a non-admin server later.
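A quick way to confirm the DTC state without opening Component Services is sketched below. The registry path and value names are my assumption of where the Security tab stores its settings for the local DTC; treat them as a convenience check and trust the properties dialog if the two disagree.

```powershell
# Confirm the Distributed Transaction Coordinator service exists and is running.
Get-Service -Name MSDTC | Select-Object Name, Status

# Assumed registry location behind the Security tab check boxes
# (1 = checked, 0 = unchecked).
$dtcSecurity = 'HKLM:\SOFTWARE\Microsoft\MSDTC\Security'
Get-ItemProperty -Path $dtcSecurity |
    Select-Object NetworkDtcAccess, NetworkDtcAccessClients
```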

Firewall
Now things get really funky. Microsoft is notorious for adding their firewall product to their Operating Systems, then having their applications completely ignore that a firewall is even there. This leaves the user to scratch their head and wonder why they can't connect to anything on their computer. So, a common process is to disable the firewall completely, do the installs, configure the ports that are needed to access the application, write firewall rules for them, THEN turn the firewall back on. FAST Search Server breaks from this mold... FAST actively looks for the firewall, confirms it is on, then writes its own firewall rules. If the firewall is not detected, yup you guessed it, an exception is thrown and the configuration fails. You can turn the firewall off after FAST is running, but during the install you must have the firewall turned on. If not, the configuration will not continue. Major headache.
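Before kicking off the configuration, it is worth a ten second check that the firewall really is on. A small sketch, assuming Windows Server 2008 R2, where the firewall service is named MpsSvc and netsh advfirewall is available:

```powershell
# The Windows Firewall service itself should be running.
Get-Service -Name MpsSvc | Select-Object Name, Status

# Each profile (Domain, Private, Public) reports its own ON/OFF state.
netsh advfirewall show allprofiles state
```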

PSConfig.ps1 Problems
The final problem I ran in to on the configuration of the Admin server was that during the execution of the cmdlets that create the Administration Database, the script would somehow lose the FAST PowerShell snap ins. I don't know why, I don't know how. It seems to be a problem unique to this client's environment, however I thought I would include it here, just in case somebody else is getting this exception and does not know how to clear it.
The following exception kept popping up during the configuration:
Exception- at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)

If you poke through the PSConfig.ps1 script you can see that when this method is called, the install script spawns a new PowerShell instance. For whatever reason it was when this instance was spawned that my FAST snap ins would be lost.
What I did to clear it was to trace back to where the method was being called. It was in another file called commontasks.ps1, located in the %FASTInstallDirectory%\installer\scripts\include folder. At line 2118, I added a little logic to detect if the FAST snap ins were loaded, and, if not, load them:
If((Get-PsSnapin|?{$_.Name -eq "Microsoft.FASTSearch.PowerShell"})-eq $null){$PSSnapin = Add-PsSnapin Microsoft.FASTSearch.PowerShell -ErrorAction SilentlyContinue | Out-Null}
This code will ensure that the snap in is loaded and the configuration can continue. What I found out later was that because of this problem, my database did not get created. Not really a big deal, because the configuration of FAST doesn't write anything to the database it just configures the FAST services' XML files and other such things. After the configuration completed, a database will need to be created for FAST to use. It must be created before you do any other configuration, such as adding a non-admin server or connecting FAST to the SharePoint farm. Fortunately, it is very easy to do, provided you use all of the same database settings that you passed to the PSConfig.ps1 script.
Run the following in a FAST Administrative PowerShell instance:
Install-FASTSearchAdminDatabase -DbName YOURADMINDATABASENAME -DbServer YOURDBSERVERNAME -Force

It is important to realize as you run this particular cmdlet, that it will run as the account that you are logged in as. So, be sure to run it using an account that has at least dbcreator on the database instance.

After all of these problems, my Admin instance of FAST Search Server 2010 for SharePoint was complete! My problems were over right?? WRONG!!

Deployment.XML File and IPSec Requirements
I'll tackle these two issues together, because they are closely related, and throw the same types of exceptions... Grrrrrr...
After the pain and suffering of installing the Admin server, the non-admin server should have been a walk in the park. I knew all of the gotchas, and I was able to avoid them during the binary install, and most of the server configuration. But as soon as I attempted to configure the non-admin boxes, problems started to pop up.
After running for a good amount of time, the configuration would fail, and the following exception would be in the log:
"%FASTInstallDirectory%\bin\MonitoringServiceConfig.exe" Output - Error: The file '%FASTInstallDirectory%\etc\middleware.cfg' was not found.

What??? Can an exception get more cryptic? Why not just throw "Object reference not set to an instance of an object"? That would be equally as useless, and appropriate at the same time! OK, rant over. This post is too long to subject you to my feelings on Microsoft's exception handling.
What happens during the configuration of a non-admin server is that the non-admin server will attempt to make an IPSec connection to the admin server, and download a series of files that configures the services on the non-admin box. In this way you can set up many non-admin servers quickly, with only having to enter in configuration data once. Sounds great right? Right...
The problem is that the configuration script does not check to see if this whole download procedure completes successfully. It happily chugs on to validate if the files that were supposed to have been downloaded exist. If they don't exist, then the script blows up, and you get the idiotic, meaningless exception above...
So, again PowerShell to the rescue. You can attempt to run the IPSec connection and file download using a PowerShell cmdlet. The good news with this cmdlet is that you will get a MUCH better exception if it fails.
The cmdlet is as follows:
Set-FASTSearchIPSec -Create
The cmdlet will prompt you for a username and password; use your FAST service account.
The cmdlet would chug along for a bit, then produce the following exception:
An error occurred while configuring IPSec - Could not connect to the admin node.
This may be because of,
  1. Invalid admin node name
  2. Invalid baseport. Baseport of admin node and non-admin node must be same
  3. Admin node is not up and running
  4. Missing IPSec rules on admin node. If you added this host to the deployment.xml after running this script on the admin node, you need to rerun the IPSec cmdlet on the admin node
Awesome... What if, as in my case, all of that stuff is good? What if when you run this same cmdlet on the admin node everything is awesome??? You have to look at the underlying technology to figure out what is up.
What is the underlying technology? IPSec. What is IPSec? Internet Protocol Security. On Microsoft Operating Systems, what does everything that uses the Internet come down to? Internet Explorer. I freaking E.
Here is what is going on. IPSec and IE use the same connection settings, for some goofball reason, configured in IE. So if you go in to IE's Internet Options, Connections Tab, and click on LAN settings you see a little check box that toggles if IE automatically detects the Internet connection settings. What this is really doing is sending out a broadcast to see if an Internet Proxy Server responds. If it does, IE uses that Proxy server to connect to the Internet. If it detects nothing, the process will time out and IE will happily connect directly.
If this box is checked, IPSec will attempt to connect to the Internet the exact same way as IE... Only IPSec doesn't handle the time out like IE does. If IPSec does not detect a Proxy server, IPSec up and fails. Fun, huh?
You clear this problem by unchecking the "Automatically detect settings" box, or, if you are using a proxy server, manually inputting your proxy server settings.
This kind of crap is why people hate Microsoft so much. If you integrate everything, fine integrate EVERYTHING. Don't integrate half, then just quit!! It is frustrating as all outdoors when you run in to a problem like this. Why would you check in IE for an IPSec issue?????? It makes zero sense. It is like checking your MS Paint setting so that your video card will work correctly.

With that exception cleared we are good to go, right? Wrong... Run the configuration script again and... same error. Repressing the urge to go on a multi-state shooting spree, I run the Set-FASTSearchIPSec cmdlet again to see what is happening now. A new exception now graces my screen:
XML Validation error: Data at the root level is invalid. Line 1, position 1.
Really??? Really??? What XML file???!!!???
One of the files downloaded is the deployment.xml file that you configured and put in to your Admin server configuration. This file tells the configuration script what your indexing configuration is going to look like, which servers are running what service, etc. But there is a little catch. If you try to use Visual Studio to create your XML file, Visual Studio will save the file using an encoding that FAST doesn't like: it adds an invisible byte order mark (BOM) to the very beginning of the file. The parser trips over it before it ever reaches the first angle bracket, which is why the error points at line 1, position 1. That will cause everything to go kerplow!! Stupid, right? Yup!! Sure is. Can't use Microsoft's products with... Microsoft's products.
So how to clear this exception?
First you need to go back to your Admin server. When you get there you need to navigate to %FASTInstallDirectory%\etc\config_data\deployment and find the deployment.xml file. Open that guy up and copy everything but the encoding statement at the top of the file. If you don't have an encoding statement at the top of the file, get everything.
Rename deployment.xml to something else... Microsoft_Is_Stupid.xml sounds good to me.
Open up Notepad and paste everything in to the new file that it creates when the program is started. Save the file in the same location as the other file, and name it deployment.xml.
Now open up a FAST Admin Shell and run Set-FASTSearchConfiguration. Or restart all of the FAST Services, or reboot, whatever...
After all that is done and the Admin server is back up, run the Set-FASTSearchIPSec cmdlet again. Confirm that it completes successfully. When it has, re-run your configuration script. It should complete successfully this time.
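If you would rather script the Notepad shuffle, the sketch below checks a file for a UTF-8 byte order mark and rewrites it without one. Test-Utf8Bom and Remove-Utf8Bom are helper names I made up for illustration, and the sketch only handles the UTF-8 BOM (bytes EF BB BF), which is an assumption about what Visual Studio wrote. Unlike the manual steps, it does not touch the encoding statement inside the file. Pass a full path, since the .NET file methods do not follow PowerShell's current location, and keep a backup copy first.

```powershell
# Hypothetical helper: true when the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([Parameter(Mandatory = $true)][string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Hypothetical helper: rewrites the file as UTF-8 without a BOM.
function Remove-Utf8Bom {
    param([Parameter(Mandatory = $true)][string]$Path)
    if (Test-Utf8Bom -Path $Path) {
        # ReadAllText consumes the BOM while decoding the text.
        $text = [System.IO.File]::ReadAllText($Path)
        # UTF8Encoding($false) writes UTF-8 with no BOM in front.
        $noBom = New-Object System.Text.UTF8Encoding($false)
        [System.IO.File]::WriteAllText($Path, $text, $noBom)
    }
}

# Example usage (the path is a placeholder, adjust it for your install):
# Remove-Utf8Bom -Path 'D:\FASTSearch\etc\config_data\deployment\deployment.xml'
```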

I realize that a lot of these problems come from FAST being a newly acquired software package. I know that the developers on the FAST side of things are working to come in to the Microsoft fold, and that this version of FAST is a "1.0" product. I realize that my client's environment was unique considering their security zeal. This eases the pain of the install a bit, but what really grinds my gears is the absolute lack of documentation of these issues. I had to search blog after blog after blog to find out what was going on both under the hood and with the exceptions. Microsoft provided very little in terms of providing support. Sure, I learned a TON about how FAST works and all of the moving parts that go along with configuring it, but I paid for that with the stress and absolute agony of this install.