Splitting a Name into First and Last Names

By Jonathan Wood on 1/28/2011 (Updated on 1/30/2011)
Language: C#
Technology: .NET
Platform: Windows
License: CPOL

Introduction

I recently had to import customer data from an old database to a new one. One difference was that the old database stored the first and last names in a single column while the new database had separate columns.

At first glance, writing a routine to split a name into first and last names seemed like a trivial exercise. However, on examining the data more carefully, I could see a number of variations that made the task slightly more complex.

For example, if you take a name like "Richard Smith", you can simply search for the space and split the name there. But what about names like "Richard R. Smith", "Richard R. Smith, Sr.", or "Richard R. Smith, III"?

To handle these cases, a little more work is required.

The SplitName Class

Listing 1 shows my SplitName class. This class has a static method Split(), which takes a name and returns the corresponding first and last names.

The Split() method calls the private FindWordStart() method, which scans backward from the end of the string to locate the last space-delimited token in the name. It then calls the private IsSuffix() method to determine whether that token is a suffix such as "Jr.", "Sr.", or "III"; I've found that some users append one of these to their name. A suffix is not the last name, so if IsSuffix() returns true, Split() calls FindWordStart() again to step back to the second-to-last token, and the last name then includes both that token and the suffix.
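To make the two-pass backward scan concrete, here is a minimal stand-alone sketch. It copies the scanning loop from Listing 1 into a hypothetical ScanDemo class (that class name is an assumption for illustration only) so that it compiles by itself.

```csharp
using System;

// Hypothetical demo class; the method body mirrors FindWordStart() in Listing 1.
static class ScanDemo
{
    public static int FindWordStart(string s, int startIndex)
    {
        // Skip any whitespace at the starting position
        while (startIndex > 0 && Char.IsWhiteSpace(s[startIndex]))
            startIndex--;
        // Walk back until the whitespace that precedes the word (or index 0)
        while (startIndex > 0 && !Char.IsWhiteSpace(s[startIndex]))
            startIndex--;
        return startIndex;
    }
}
```

For the name "Richard R. Smith, Sr.", calling ScanDemo.FindWordStart(name, name.Length - 1) returns 17, so name.Substring(17) is " Sr.". Calling it again with 17 returns 10, and name.Substring(10) is " Smith, Sr.". Note that the returned index points at the whitespace in front of the word rather than at its first letter, which is why Split() trims both results.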

The first and last names are then formed by splitting the name at the final position that was found. If the name contains only one word (no spaces), this approach assumes that word is the last name. Note that you could also use String.Split() to separate the space-delimited tokens. However, String.Split() allocates an array and a new string for every token, which is more work than this task needs, and I would expect my code to run faster by avoiding it.
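For comparison, a String.Split()-based version might look roughly like the sketch below. This is not the approach used in Listing 1; the class name SplitNameViaTokens is hypothetical, and the suffix list is simply copied from the listing.

```csharp
using System;
using System.Text;

// Hypothetical alternative to Listing 1, shown only for comparison.
static class SplitNameViaTokens
{
    static readonly string[] Suffixes =
        { "jr", "sr", "esq", "ii", "iii", "iv", "v", "2nd", "3rd", "4th", "5th" };

    public static void Split(string name, out string firstName, out string lastName)
    {
        // A null separator array makes String.Split() split on whitespace
        string[] tokens = (name ?? String.Empty)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // The last name starts at the final token...
        int lastStart = Math.Max(tokens.Length - 1, 0);
        // ...or one token earlier when the final token is a suffix
        if (tokens.Length > 1 && IsSuffix(tokens[tokens.Length - 1]))
            lastStart--;

        firstName = String.Join(" ", tokens, 0, lastStart);
        lastName = String.Join(" ", tokens, lastStart, tokens.Length - lastStart);
    }

    static bool IsSuffix(string token)
    {
        // Strip punctuation, as Listing 1 does, before comparing
        StringBuilder sb = new StringBuilder();
        foreach (char c in token)
            if (Char.IsLetterOrDigit(c)) sb.Append(c);
        string bare = sb.ToString();
        return Array.Exists(Suffixes,
            x => String.Equals(x, bare, StringComparison.OrdinalIgnoreCase));
    }
}
```

One behavioral difference to be aware of: because the tokens are rejoined with a single space, runs of whitespace inside the name are collapsed, whereas the substring approach keeps the original spacing within each part.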

Listing 1: SplitName Class

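using System;
using System.Collections.Generic;
using System.Linq;   // Needed for the Contains() overload that takes a comparer
using System.Text;

/// <summary>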
/// Class to split names into first and last names
/// </summary>
class SplitName
{
    private static List<string> Suffixes = null;

    static SplitName()
    {
        // Initialize suffixes
        Suffixes = new List<string>();
        Suffixes.Add("jr");
        Suffixes.Add("sr");
        Suffixes.Add("esq");
        Suffixes.Add("ii");
        Suffixes.Add("iii");
        Suffixes.Add("iv");
        Suffixes.Add("v");
        Suffixes.Add("2nd");
        Suffixes.Add("3rd");
        Suffixes.Add("4th");
        Suffixes.Add("5th");
    }

    /// <summary>
    /// Splits a name string into the first and last name
    /// </summary>
    /// <param name="name">Name to be split</param>
    /// <param name="firstName">Returns the first name</param>
    /// <param name="lastName">Returns the last name</param>
    public static void Split(string name, out string firstName, out string lastName)
    {
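        // A null or empty name has nothing to split
        // (name.Length - 1 would otherwise be an invalid index)
        if (String.IsNullOrEmpty(name))
        {
            firstName = lastName = String.Empty;
            return;
        }

        // Parse last name
        int pos = FindWordStart(name, name.Length - 1);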

        // If last token is suffix, include next token
        // as part of last name also
        if (IsSuffix(name.Substring(pos)))
            pos = FindWordStart(name, pos);

        // Set results
        firstName = name.Substring(0, pos).Trim();
        lastName = name.Substring(pos).Trim();
    }

    /// <summary>
    /// Finds the start of the word at or before startIndex. Returns the index
    /// of the whitespace preceding that word, or 0 if no whitespace precedes it.
    /// </summary>
    /// <param name="s">String to examine</param>
    /// <param name="startIndex">Position to begin search</param>
    private static int FindWordStart(string s, int startIndex)
    {
        // Find end of previous word
        while (startIndex > 0 && Char.IsWhiteSpace(s[startIndex]))
            startIndex--;
        // Find start of previous word
        while (startIndex > 0 && !Char.IsWhiteSpace(s[startIndex]))
            startIndex--;
        // Return final position
        return startIndex;
    }

    /// <summary>
    /// Returns true if the given string appears to be a name suffix
    /// </summary>
    /// <param name="s">String to test</param>
    private static bool IsSuffix(string s)
    {
        // Strip any punctuation from string
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (Char.IsLetterOrDigit(c))
                sb.Append(c);
        }
        return Suffixes.Contains(sb.ToString(), StringComparer.OrdinalIgnoreCase);
    }
}

Conclusion

Having tested the routine above with a fairly long list of names, I can say it is quite reliable in the vast majority of cases. During my testing, I found a couple of entries that the code could not handle, although, to be honest, some of them confused me as well. Whether customers enter their own names or an employee types them in, there's no telling what you may end up with. Sometimes a name is entered in such a way that there is simply no way to determine which parts constitute the first and last names.

Note that my code makes no attempt to remove prefixes such as "Mr.", "Mrs.", etc. that some people may include with their name. The main purpose of this routine is to make a smart and reasonably reliable guess at where to split a name when converting a single-column name to a two-column name. It doesn't attempt to remove any part of the original name.

You can easily modify the code above to check additional suffixes such as "President", "CEO", etc. But that seems a little excessive to me. Obviously, these are not part of someone's name and probably shouldn't be entered in the first place.
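If you did want to recognize such extra entries, one possible shape for the check is sketched below. The SuffixCheck class is a hypothetical stand-alone helper rather than part of Listing 1, and the two added entries are only the examples mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-alone helper, not part of Listing 1.
static class SuffixCheck
{
    // Case-insensitive set: the suffixes from Listing 1 plus two illustrative titles
    static readonly HashSet<string> Suffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "jr", "sr", "esq", "ii", "iii", "iv", "v",
            "2nd", "3rd", "4th", "5th",
            "president", "ceo"
        };

    public static bool IsSuffix(string s)
    {
        // Strip punctuation and whitespace before the lookup
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
            if (Char.IsLetterOrDigit(c)) sb.Append(c);
        return Suffixes.Contains(sb.ToString());
    }
}
```

Because only the final token is tested, a multi-word title such as "Vice President" would still not be handled correctly by this approach.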

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.