Monday, November 18, 2013

Moving Folders to Managed Metadata - Keeping the Folder Taxonomies of Lists

I started a new client. They have had a SharePoint deployment for a while, and, because of the natural mistrust that IT people have for their users, they have not implemented a distributed Administration model. Users have the ability to add files and folders to lists and libraries, but do not have the ability to create new sites or libraries.

When a company does this, what happens is that the users recognize the power of lists and libraries, but they are frustrated with the lack of freedom they have to organize and categorize their data. So, what do they do? They create folders. Lots and lots and lots of folders. When you only have the "Shared Documents" library to work with, and all sorts of different files to share, some organization is better than none, and, without guidance, folders are a familiar and easy-to-use organizational tool.

But what happens when users are left to put folders in place unchecked??? You get DEEEEEEEP hierarchies of folders. Data becomes difficult to find, files get duplicated, AND you can run into URLs and file names that are too long for SharePoint to manipulate. Bad news.

I am in a situation where I need to remove all of the folders from my document libraries, preserve the folder taxonomy, maintain the ability to see only what is in a specific folder, and make it easy for users to add new documents to the library using their familiar folder taxonomy.

The answer, of course, is to import the folder taxonomy into the Managed Metadata Service, then add a Managed Metadata column to that list. The only other wrinkle that I have in this project is that I do not want to publish these folder taxonomies out to the rest of the enterprise. I actually want to get rid of them at a later date. So, I need to create LOCAL term sets rather than GLOBAL term sets. What's the difference? Well, the local term set, while existing in the global Managed Metadata Service, is only available to the site collection within which it resides, via a Group whose IsSiteCollectionGroup property is set to true. The global term set, of course, is available to anyone who subscribes to the Managed Metadata Service.

Now, you COULD accomplish all of the above requirements by simply going to the library, adding a new Managed Metadata column, clicking on Customize your term set, and populating the term set manually. Then open each folder and apply the proper tag to each file in the folder. Then open the list in Explorer view and copy everything to the root, deleting folders on the way. That works great for small sets, but I have libraries with over 50,000 items in them and hundreds of folders. I need something to automate the process.

So... First things first. I need to go to the list and get all of the folder names, but I need to maintain the taxonomy. How do we do this?? We first need to know a little something about the SPFolder class. What is great about the SPFolder class is that it has a collection property called SubFolders. This gives a listing of all of the child subfolders that are contained within the original folder. That makes our job easier. What makes our job harder is that each child folder has its own SubFolders property, and the child below that, and the child below that until the end of time. Whew... How do we pull all of these folder names out?? We have to create our own object that has a nested list, and then we use something called a recursive method to pull everything out.

What is a recursive method? Put simply, a recursive method is a method that calls itself. They are very useful in situations where you have to get the factors of something or where there are many nested elements within an object, such as an SPFolder object or an SPWeb object. This type of nested object is very, very common in the SharePoint world, as well as everywhere else.

So first we create our custom object. Since we are only looking for the names of the folders, we only have to deal with strings.
public class FolderObject {
        // Name of this folder.
        public string folderName { get; set; }
        // Child folders, each with its own nested list of children.
        public List<FolderObject> subFolders { get; set; }
    }

Pretty simple, right? All we have is a string with a generic list of the object that we just created. This gives us the same structure as the folder name with its child subfolder names.
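To make the shape of that object concrete, here is a minimal, self-contained sketch that builds a small tree by hand and walks it with a recursive method. The folder names and the FolderObjectDemo helper class are made up for illustration and are not part of the migration code; it uses no SharePoint types, so it can run in a plain console project.

```csharp
using System;
using System.Collections.Generic;

public class FolderObject {
    public string folderName { get; set; }
    public List<FolderObject> subFolders { get; set; }
}

// Hypothetical helper, for illustration only.
public static class FolderObjectDemo {
    // Adds one line per folder, indented two spaces per level of depth.
    private static void Collect(FolderObject folder, int depth, List<string> lines) {
        lines.Add(new string(' ', depth * 2) + folder.folderName);
        foreach (FolderObject child in folder.subFolders) {
            // The method calls itself for every child: that is the recursion.
            Collect(child, depth + 1, lines);
        }
    }

    public static List<string> Describe(FolderObject root) {
        List<string> lines = new List<string>();
        Collect(root, 0, lines);
        return lines;
    }

    public static void Main() {
        // Made-up sample data standing in for a real folder hierarchy.
        FolderObject root = new FolderObject {
            folderName = "Finance",
            subFolders = new List<FolderObject> {
                new FolderObject {
                    folderName = "2013",
                    subFolders = new List<FolderObject> {
                        new FolderObject { folderName = "Invoices", subFolders = new List<FolderObject>() }
                    }
                },
                new FolderObject { folderName = "Archive", subFolders = new List<FolderObject>() }
            }
        };
        foreach (string line in Describe(root)) {
            Console.WriteLine(line);
        }
    }
}
```

Note that the leaf folders carry an empty list rather than null, which is what lets the foreach run safely at the bottom of the tree.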

Now, how do we get all of the folders within a list? Easy, we do a CAML query. BUT, we have to be a bit careful. We want to EVENTUALLY get all of the folders in the entire list, but we want to maintain the taxonomy, so we must do a NON-recursive CAML query. Otherwise our query would return all of the folders in the list on the same level. We don't want that. We just want the folders in the topmost level so that we can walk down the folder hierarchy train.
Remember that when dealing with SharePoint lists, there is only one type of object that a list can contain: SPListItem objects. So, when you do the query, the output will be an SPListItemCollection.


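public List<FolderObject> GetAllFolders(SPList list) {
            List<FolderObject> folderNames = new List<FolderObject>();
            SPQuery query = new SPQuery();
            // FSObjType = 1 matches folders. No recursive scope is set,
            // so only the folders at the root of the list are returned.
            query.Query = "<Where><Eq><FieldRef Name='FSObjType' /><Value Type='Integer'>1</Value></Eq></Where>";
            SPListItemCollection items = list.GetItems(query);
            foreach (SPListItem item in items) {
                SPFolder itemFolder = item.Folder;
                folderNames.Add(GetFolderTaxonomy(itemFolder));
            }
            return folderNames;
        }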

No problem so far. This little bit of code gets us the root folders in the list. After we have the SPListItemCollection, we run a loop on them to obtain the SPFolder item associated with the SPListItem. Now comes the fun part. We send the folder object to our recursive method, GetFolderTaxonomy.

public FolderObject GetFolderTaxonomy(SPFolder folder) {
            FolderObject folderObject = new FolderObject();
            List<FolderObject> subFolderList = new List<FolderObject>();
            folderObject.folderName = folder.Name;
            if (folder.SubFolders.Count > 0) {
                foreach (SPFolder subFolder in folder.SubFolders) {
                    FolderObject subFolderObject = GetFolderTaxonomy(subFolder);
                    subFolderList.Add(subFolderObject);
                }
            }
            folderObject.subFolders = subFolderList;
            return folderObject;
        }

Here we first create a FolderObject for our folder and set the folder's name as its folderName property. Next, we check whether the folder has any child folders. If it does, we send each child SPFolder object to the GetFolderTaxonomy method, which creates a FolderObject for that child. So on it goes until the very bottom of the taxonomy is reached, where SubFolders.Count equals zero. There the folder name is added to its FolderObject, and that object, with an empty List property, is added to its parent's List. So on and so forth back up the taxonomy chain until all folders have been added.
They don't call recursive methods "brute force" methods for nothing...

Now we have a List<FolderObject> that contains the root folders of the list, and contained within each one's own List are its child folders. We can move on to the next part: adding a local Managed Metadata term set and creating a term for each folder.

Doing anything in the Managed Metadata Service requires us to reach out and get a series of instances. The Managed Metadata Service is a hierarchical service, not unlike how folders work. There are many differences, but, for simplicity's sake, think of the Managed Metadata Service like a folder structure. You need to get into the top folder before you can get into the child folders. The Managed Metadata Service is set up like this:
Term Store
Taxonomy Group
Term Set
Terms

When you are wiring up your code, you have to make sure you have an instance of the parent before you can adjust the child.

First, you need an instance of the TaxonomySession that is associated with the SPSite. Think of this like creating the client session of a WCF service... Because that is what you are doing...
Then, you reach out to the TermStore that contains or will contain your Taxonomy Group. The TermStore is going to be associated with the Managed Metadata Service Application. Unless you have a very large deployment of the Managed Metadata Service, you will only have one term store, so it is reasonably easy to get a hold of.
Next, you want to either get a hold of, or create, the Taxonomy Group that will hold your Term Set. In this case, our Taxonomy Group is going to be special. We want a group that is associated with our Site Collection and our Site Collection only. Typically, we would call the CreateGroup method on the TermStore object to create a new Taxonomy Group, but we don't want a global Group, we want a local group.

Back in the early days of SharePoint 2010, creating a local term set programmatically was a big deal. The method to create it was internal, and could only be accessed via reflection or by playing around with some of the Publishing Features to make it happen. Now, what we do is call the GetSiteCollectionGroup method of the TermStore object. You can't have more than one local Taxonomy Group, so this method will check to see if there is a Site Collection Taxonomy Group. If there is, it passes on that instance; if there isn't, it creates it.

Now that we have our Taxonomy Group, we can create our Term Set. Since I want a Term Set that is unique to my folder structure in my list, I am going to create a new one with the name of my list. After that we simply create the terms in the same taxonomy as our folders.

Again, each Term object has its own Terms collection property, just like the SPFolders had their own SubFolders collection property. We use the same recursive method technique to pull our folder names out of our List<FolderObject> and into Taxonomy Terms.

As always when using the Managed Metadata objects, you need to reference the Microsoft.SharePoint.Taxonomy.dll and add a using statement for Microsoft.SharePoint.Taxonomy.

public TermSet PopulateTermSet(List<FolderObject> folderNames, SPList list, SPSite site) {
            TaxonomySession taxSession = new TaxonomySession(site);
            TermStore store = null;
            TermSet listTermSet = null;
            if (taxSession.TermStores.Count > 0) {
                store = taxSession.TermStores[0];
                Group localGroup = store.GetSiteCollectionGroup(site);
                try {
                    listTermSet = localGroup.TermSets[list.Title];
                } catch { }
                if (listTermSet == null) {
                    listTermSet = localGroup.CreateTermSet(list.Title);
                } else {
                    listTermSet = localGroup.CreateTermSet(string.Format("{0}-{1}", list.ParentWeb.Title, list.Title));
                }
                foreach (FolderObject folderName in folderNames) {
                    Term term = null;
                    try {
                        string normalizedName = TermSet.NormalizeName(folderName.folderName);
                        term = listTermSet.Terms[normalizedName];
                    } catch { }
                    string termName = string.Empty;
                    if (term == null) {
                        termName= folderName.folderName;
                    } else {
                        term = null;
                        termName = string.Format("{0}-{1}",list.Title, folderName.folderName);
                    }
                    term = listTermSet.CreateTerm(termName, 1033);
                    if (folderName.subFolders.Count > 0) {
                        PutSubTermSet(folderName.subFolders, term);
                    }
                }
                store.CommitAll();
            }
            return listTermSet;
}

public void PutSubTermSet(List<FolderObject> folderObjects, Term rootTerm) {
            foreach (FolderObject folderObject in folderObjects) {
                Term term = null;
                try {
                    string normalizeName = TermSet.NormalizeName(folderObject.folderName);
                    term = rootTerm.Terms[normalizeName];
                } catch { }
                if (term == null) {
                    term = rootTerm.CreateTerm(folderObject.folderName, 1033);
                }
                if (folderObject.subFolders.Count > 0) {
                    PutSubTermSet(folderObject.subFolders, term);
                }
            }
}

In the code you see a lot of try/catch blocks with empty catch blocks. This is because we need to check whether there is already a Term with the same name on the same level, and the lookup by name can throw when there is no match. If a Term of that name already exists at the top level, we change the name of the new term to be more descriptive as to its location. The wrinkle in this code that should be noted is the use of TermSet.NormalizeName. When a Term is created, its name is normalized: leading and trailing whitespace is trimmed, runs of spaces are collapsed, and certain characters such as the ampersand are replaced with full-width equivalents. So when looking for Terms of the same name, you need to make sure that you are searching with the proper string. TermSet.NormalizeName does that for you.

That's all there is to creating a local Managed Metadata Term Set for a List and transposing the folder taxonomy in to the Term Set. From here it is just a matter of adding the proper term to the proper file.

For this you need to first add a Managed Metadata Column to the list, and associate it with the Term Set you created for that list. This is very easy and I have covered it in other blog posts.

public bool AddTermSetToList(SPList list, TermSet termSet) {
            try {
                TaxonomyField locationField = list.Fields.CreateNewField("TaxonomyFieldTypeMulti", "File Location") as TaxonomyField;
                locationField.AllowMultipleValues = true;
                locationField.SspId = termSet.TermStore.Id;
                locationField.TermSetId = termSet.Id;
                list.Fields.Add(locationField);
                list.Update();
                return true;
            } catch {
                return false;
            }
        }

Then, you must iterate through each SPListItem in the list, adding the proper term to the newly created File Location TaxonomyField. This is relatively easy: get the SPFile for the SPListItem (SPListItem.File), then read the name of the folder that contains it (SPListItem.File.ParentFolder.Name).

public bool ApplyTermsToItems(SPList targetList, TermSet listTermSet) {
            foreach (SPListItem item in targetList.Items) {
                string folderName = item.File.ParentFolder.Name;
                if (!string.IsNullOrEmpty(folderName) && targetList.Title != folderName) {
                    TaxonomyField locationField = (TaxonomyField)targetList.Fields["File Location"];
                    Term term = null;
                    try {
                        term = listTermSet.Terms[folderName];
                    } catch { }
                    if (term == null) {
                        foreach (Term innerTerms in listTermSet.Terms) {
                            term = GetTerm(folderName, innerTerms);
                            if (term != null) {
                                break;
                            }
                        }
                    }
                    if (term != null) {
                        string termString = string.Concat(term.GetDefaultLabel(1033), TaxonomyField.TaxonomyGuidLabelDelimiter, term.Id);
                        TaxonomyFieldValueCollection taxonomyCollection = new TaxonomyFieldValueCollection(locationField);
                        taxonomyCollection.PopulateFromLabelGuidPairs(string.Join(TaxonomyField.TaxonomyMultipleTermDelimiter.ToString(), new[] { termString }));
                        item["File Location"] = taxonomyCollection;
                        item.Update();
                    } else {
                        return false;
                    }
                }
            }
            return true;
        }

Because we are, again, dealing with a hierarchical structure, to find the proper term you must search through the child terms recursively:

public Term GetTerm(string termName, Term rootTerm) {
            foreach (Term innerTerm in rootTerm.Terms) {
                if (innerTerm.Name == termName) {
                    return innerTerm;
                }
                // Search this child's subtree and stop at the first match, so a
                // later sibling cannot overwrite a result that was already found.
                Term match = GetTerm(termName, innerTerm);
                if (match != null) {
                    return match;
                }
            }
            // No descendant of rootTerm has the requested name.
            return null;
        }
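The important detail in a search like this is to stop as soon as a match is found in any subtree. If the loop keeps going, a later sibling that has children of its own can replace the found value with null. Here is a minimal sketch of that early-return pattern on a plain tree. The Node and TreeSearch types and the sample names are hypothetical stand-ins for Term objects, not SharePoint classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a taxonomy Term: a name plus child nodes.
public class Node {
    public string Name { get; private set; }
    public List<Node> Children { get; private set; }

    public Node(string name, params Node[] children) {
        Name = name;
        Children = new List<Node>(children);
    }
}

public static class TreeSearch {
    // Returns the first descendant of root with the given name, or null if none exists.
    public static Node Find(string name, Node root) {
        foreach (Node child in root.Children) {
            if (child.Name == name) {
                return child;
            }
            Node match = Find(name, child);
            if (match != null) {
                return match; // stop here; do not let later siblings reset the result
            }
        }
        return null;
    }

    public static void Main() {
        // "Archive" comes after "2013" and has a child of its own, which is
        // exactly the shape that trips up a search without an early return.
        Node root = new Node("Finance",
            new Node("2013", new Node("Invoices")),
            new Node("Archive", new Node("Old")));

        Node found = Find("Invoices", root);
        Console.WriteLine(found != null ? found.Name : "(not found)");
    }
}
```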

From here, we move our files to the root folder of the list.

public bool MoveAllFiles(SPList targetList) {
            string rootUrl = targetList.RootFolder.Url;
            foreach (SPListItem item in targetList.Items) {
                // Compare URLs rather than names, so a subfolder that happens to
                // share the root folder's name is not skipped.
                if (item.File.ParentFolder.Url != rootUrl) {
                    string fileName = string.Format("{0}/{1}", rootUrl, item.File.Name);
                    // The second argument enables overwrite, so files that share a name
                    // in different folders will replace one another at the root.
                    item.File.MoveTo(fileName, true);
                }
            }
            return true;
        }
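One thing to watch for when flattening: MoveTo is called with overwrite turned on, so two files with the same name in different folders will end up as a single file at the root. Below is a rough sketch of a check you could run beforehand. It works on plain URL strings, so how you collect them (for example from each item's File.Url) is up to you. The FlattenCheck class and the sample paths are made up for illustration, and the case-insensitive comparison is an assumption about how file names should be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class FlattenCheck {
    // Returns the file names that occur in more than one folder.
    public static List<string> FindCollisions(IEnumerable<string> fileUrls) {
        return fileUrls
            // Everything after the last '/' is the file name.
            .GroupBy(url => url.Substring(url.LastIndexOf('/') + 1),
                     StringComparer.OrdinalIgnoreCase)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
    }

    public static void Main() {
        // Made-up sample paths.
        string[] urls = {
            "Shared Documents/Finance/report.docx",
            "Shared Documents/HR/Report.docx",
            "Shared Documents/HR/policy.docx"
        };
        foreach (string name in FindCollisions(urls)) {
            Console.WriteLine("Name used in more than one folder: " + name);
        }
    }
}
```

If the check reports anything, those files need to be renamed, or moved with a folder-based prefix, before the folders are flattened.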

Finally, we delete all of the folders.

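public bool DeleteAllFolders(SPList targetList) {
            // GetAllFoldersToCollection is not shown in this post; it is assumed to
            // return the folder items of the list as an SPListItemCollection.
            SPListItemCollection folderCollection = GetAllFoldersToCollection(targetList);
            // Delete from the end so the remaining indexes stay valid.
            for (int i = folderCollection.Count - 1; i >= 0; i--) {
                folderCollection.Delete(i);
            }
            return true;
        }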

Now you have the same list, but without folders. I also adjusted the list's properties to disable folder creation, and added Metadata Navigation. Now my users can navigate through their familiar folder structure without having to deal with the problem of file names that are too long.