String manipulation with worksheet formulas - Part 2
Deutsche Version / German version
Now continuing the first part of string manipulations with formulas the following will describe how to extract the nth element of a larger string. This technique is for example useful in the following cases:
- Extract first, middle and last name
- Get the last or first word of a sentence (words separated by spaces)
As an assumption for the following I’d assume that words in a larger string are separated by spaces. e.g. "This is a longer text of words separated by spaces"
So now let’s start extracting the individual elements of this string
1. Getting the first element
Getting the first word is rather simple. Just combine the functions LEFT and FIND (and add some error checking):
=IF(LEN(A1)=0,"",IF(ISNUMBER(FIND(" ",A1)),LEFT(A1,FIND(" ",A1)-1),A1))
This formula searches for the first occurence of a space character and returns all characters to the right of this space. If no space character is found the complete string is returned. For our example this formula would return "This".
2. Getting the last element
Now it’s getting a little bit more complicated as we don’t know in advance how many elements our string contains and Excel does not provide a FIND function which searches from the right (as you have in VBA with the method InStrRev). But is is still possible to achieve with formulas alone. We start with getting the position of the last delimiter (the last space character):
- Using SUBSTITUTE and FIND:
=FIND("^^",SUBSTITUTE(" "&A1," ","^^",LEN(" "&A1)-LEN(SUBSTITUTE(" "&A1," ",""))))-1 - Using LOOKUP:
=LOOKUP(2,1/(MID(" "&A1,seq,1)=" "),seq)-1
where seq is again a defined name with the formula: =ROW(INDIRECT("1:1024"))
Personally I prefer the second alternative as it is shorter (and for me easier to remember). But both formulas should return the same result. For our example theywill both return: 44
Now the only thing you have to do use this formula part within a MID formula (and don’t forget to add 1 to the position to skip the space):
- Using SUBSTITUTE and FIND:
=MID(A1,FIND("^^",SUBSTITUTE(" "&A1," ","^^",LEN(" "&A1)-LEN(SUBSTITUTE(" "&A1," ",""))))-1+1,1024) - Using LOOKUP:
=MID(A1,LOOKUP(2,1/(MID(" "&A1,seq,1)=" "),seq)-1+1,1024)
Both formulas would return the string: "spaces"
Getting the nth element
Adapting the approach of SUBSTITUTE and FIND for getting the last element you could also get the position for the nth and n+1 element within a string. Using this you get a kind of ‘monster’ formula:
=IF(OR(LEN(A1)=0,ISERROR(FIND("^^",SUBSTITUTE(" "&A1," ","^^",$B$1)))),"", MID(A1,FIND("^^",SUBSTITUTE(" "&A1," ","^^",$B$1)),IF(ISERROR(FIND("^^",SUBSTITUTE(
A1," ","^^",$B$1))),1024,FIND("^^",SUBSTITUTE(A1," ","^^",$B$1)))-FIND("^^",SUBSTITUTE(
" "&A1," ","^^",$B$1))))
where B1 stores the desired element number n.
And as stated in the first part of this article this could of course be achieved much easier using VBA but maybe it wouldn’t be that much fun …
Frank
Pages: 1 2
Andrew:
Hello Mr. Kabel,
Both formulas for extracting the last element seem to include the last space also. I think that removing the “-1″ or using TRIM will prevent this. (Apologies if I am mistaken of course :-))
This seems to work also.
=RIGHT(A1,LEN(A1)-FIND(”^^”,SUBSTITUTE(A1,” “,”^^”,LEN(A1)-LEN(SUBSTITUTE(A1,” “,”")))))
Anyway, I really appreciate your tips.
15 December 2004, 6:08 amFrank Kabel:
Hi Andrew
thanks for your comments and you’re right. Messed this up during the translation. I’ll correct the posting accordingly.
Frank
15 December 2004, 6:22 amJon Edmonds:
For the 1st bit, would this be better?
=IF(LEN(A1),LEFT(A1,FIND(” “,A1&” “)-1),”")
Assuming TRIM is not needed.
16 December 2004, 8:01 amFrank Kabel:
Hi Jon
think your idea/solution is better (and at least shorter)
Frank
16 December 2004, 8:32 amPatrick Rabau:
=LEFT(A1,FIND(” “,A1&” “)-1)
is even shorter and seems to work also when A1 is empty.
17 December 2004, 9:47 pmMalcolm Morris:
Hi,
I know this is off topic, but this page in your blog is the nearest to a discussion of my (excel) problem that I can find.
In Excel, worksheet functions, I need to extracting a substring from a longer string which I can handle, except for the case when the substring does not exist. In that case Ecxel returns #VALUE!. eg find ‘d’ in ‘abcefg’ fails
Is there any way to avoid/override that and put in a ‘not found’ or similar?
Any help would be appreciated.
Malcolm
15 March 2005, 11:53 pmDianne Butterworth:
Malcolm,
Try something like
=IF(ISERROR(FIND(”d”,”abcefg”)),”Not Found”,FIND(”d”,”abcefg”))
Dianne
16 March 2005, 8:32 amross:
use 2 nested if’s?
16 March 2005, 4:58 pmGlen:
Hi,
I’m having a similar problem trying to extract the last part of a string from a users profile path that has been imported into excel from AD.
The path reads \\server\profiles\firstname.lastname
I need to extract the firstname.lastname. Can anyone help?
Thanks,
Glen.
16 May 2005, 4:51 amDianne Butterworth:
Glen,
You could use a user-defined function like this (taking advantage of the InStrRev VBA function):
Function FullName(strToSearch As String) As String
If InStrRev(strToSearch, “\”) > 0 Then
FullName = Mid(strToSearch, _
InStrRev(strToSearch, “\”) + 1)
End If
End Function
Which raises a question for me that others might be able to answer.
When writing a function that I’ll be using on a worksheet as =MyFunction(A1), I normally would write the UDF to take a range argument:
Function MyFunction (rngMyArg as Range) as String
then convert rngMyArg.Value into a string to be handled.
Is there any advantage/disadvantage/difference in writing the UDF:
Function MyFunction (strMyArg as String) as String
as I’ve done in the above FullName function?
I tested it (in a limited way) and it seems to return the same string values.
Best practice?
16 May 2005, 9:27 amfloor:
I have a large database (10 000 items)with address information. I would like to seperate a substring (country and institution) withinin a large string that does not have a uniform format. For example:
Department of Obstetrics and Gynaecology, Sahlgrenska University Hospital, Gothenburg, Sweden. elisabeth.jangsten@vgregion
or:
MRC Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
or:
Halmstad University, Halmstad, Sweden. karen.odberg-pettersson@phs.ki.se
Could you think of a formula to:
1. create a seperate variable/string for the country (Sweden, UK)(maybe after first seperating the email adres, “@” ?)
2. make a seperate variable/string for the insstitutions “Sahlgrenska University Hospital”, “University of Glasgow”, “Halmstad University”. So whenever somewhere the string “university” is can excel seperate the whole string that is around “University” delimited by “, ”
I would be very gratefull with your help!
Thank you very much!!
6 June 2008, 7:31 amMichael:
floor -
Here's a quick UDF that returns the country. It may show you the way for the University:
Dim stuff() As String
stuff = Split(Phrase, ", ", -1, vbTextCompare)
Country = stuff(UBound(stuff))
If InStr(1, Country, "@", vbTextCompare) Then
Country = Left(Country, InStr(1, Country, ".", vbTextCompare) - 1)
End If
End Function
...mrt
6 June 2008, 2:34 pmHans Schraven:
You stated:
[italic]=IF(LEN(A1)=0,"",IF(ISNUMBER(FIND(" ",A1)),LEFT(A1,FIND(" ",A1)-1),A1))
This formula searches for the first occurence of a space character and returns all characters to the [Bold]RIGHT[[/bold] of this space.[/italic]
I suppose you meant to the LEFT.
second remark:
instead of using formulas I'd prefer a UDF, using the split function in VBA, assuming words are being separated by spaces.
This creates an array with all the separate words.
firstword=sq(0)
secondword=sq(1)
lastword=sq(ubound(sq))
floor:
Michael!
Thank you for your help! I really have no experience with this and do not quite understand the formula BUT have been able to make a seperate column with the country in it using your UDF! It is not working for all cells because there is no uniform format I guess, still it is a great help! So thank you.
If you could help me with a UDF that seperates the University I would be very gratefull again. Sorry I really have no experience at all with this, so if you could help me I would appreciate it very much!
What I need is a UDF that whenever in a text string the word university appeares it gives the whole string around university delimited by commas. Ideally in addition if the word istitute appears it should also give me the string around institute delimited by comas.
For istance:
Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan. bushra.moiz@aku.edu
should return: The Aga Khan University Hospital
Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur 208016, India
in this case it should return:
Indian Institute of Technology
Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India
should again return: Aligarh Muslim University
Thanks,
Floor
9 June 2008, 4:03 amHans Schraven:
A more universal UDF
9 June 2008, 4:39 am[VB]Function nthword(tekst As String, n As Integer, separator As String)
sq = Split(tekst, IIf(separator = "", " ", separator))
nthword = sq(UBound(sq))
If n
Hans Schraven:
sq = Split(tekst, IIf(separator = "", " ", separator))
nthword = sq(UBound(sq))
If n &#060 UBound(sq) + 1 Then nthword = sq(n - 1)
End Function
for instance:
tekst = "Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan"
n = 2
separator = ", "
nthword= "The Aga Khan University Hospital"
NB. the separator can be any character or group of characters
9 June 2008, 4:45 amfloor:
Hans-
Thank you for your answer!
I copy pasted the UDF as you have written into VBA excel, and then wrote =nthword(A1). This does not work so I am probably doing something wrong? Sorry I am a real beginner in this.
9 June 2008, 7:40 amHans Schraven:
I'd prefer (it's necessary) for your example
9 June 2008, 2:27 pm=nthword(A1;2;", ")
floor:
Hans thank you very much, but.. the formula is still not working.. What am I doing wrong? Can anyone help?
10 June 2008, 2:32 amHans Schraven:
read in my code
10 June 2008, 3:20 am< : that must be the sign for "greater-than"
unfortunately I don't know the method tot put those characters on this board. The link, linked to 'escape' doesn't inform me either.
floor:
I think you are referring to <
I have changed that to > and then put in the formula in VB.
In excel sheet I wrote
=nthword(A1;2;", ")
Still it is not recognizing the formula. Does it work at your side?
Thanks Hans, I really appreciate your helping me!
10 June 2008, 3:53 amHans Schraven:
Excel/Menu Bar/Insert/Function/Userdefined.
10 June 2008, 4:13 amProbably you put the UDF in a module in another workbook.(personal.xls or something like that)
Sorry, I made a mistake it should be:
If n
Hans Schraven:
I made the mistake that it should be the sign for 'less-than'.
10 June 2008, 4:15 amHow did you manage tot put the 'greater than' sign in your post ?
floor:
Hi Hans,
The formula is working now! It is not completely what though because it is always taking the second string delimited by commas, wheras I would like to have a function that searches for the word University and takes the string were the word university is in... (because for instance in my third example as you can see the University is in the third substring: Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India) This is my problem, there is no universal display in the address infromation, it is not always on the same place/ in the same order...
For the > or
10 June 2008, 4:42 amfloor:
For the > or
10 June 2008, 4:43 amfloor:
look at the left lower corner of your key bord. there is a key for it on my keyboard next to shift.. hope I can help you too!
10 June 2008, 4:44 amHans Schraven:
if you only want the part with University in it:
wordfilter = join(filter(Split(tekst, IIf(separator = "", " ", separator)),filt),"")
End Function
=wordfilter(A1;"University";", ")
10 June 2008, 6:15 amMichael:
Hi Floor -
My UDF assumes that the country is found after the last comma-space combination, per your three examples, and if email address exists, it follows after a period-space combination after the country. As you saw, it won't work in other cases. A different version is below. Here's the UDF for the Institution. It should work as long as they are all universities. If not, Country2 should show you the way.
Dim stuff() As String
Dim i As Integer
stuff = Split(Phrase, ", ", -1, vbTextCompare)
For i = LBound(stuff) To UBound(stuff)
If InStr(1, stuff(i), "University", vbTextCompare) Then
Institute = stuff(i)
Exit For
Else
Institute = vbNullString
End If
Next i
End Function
Here's a new UDF for the Country that will require you to add all possible countries per this example, but should catch the format anomalies:
Const NumCountries = 4 'Change to suit
Dim stuff() As String
Dim i As Integer
Dim j As Integer
Dim Countries(NumCountries) As String
Countries(0) = "UK"
Countries(1) = "Sweden"
Countries(2) = "France"
Countries(3) = "Germany"
Countries(4) = "Norway" 'Stay in synch with NumCountries, zero-based indexing
stuff = Split(Phrase, ", ", -1, vbTextCompare)
For j = 0 To NumCountries
For i = LBound(stuff) To UBound(stuff)
If InStr(1, stuff(i), Countries(j), vbTextCompare) Then
Country2 = stuff(i)
Exit For
Else
Country2 = vbNullString
End If
Next i
If Country2 vbNullString Then Exit For
Next j
If InStr(1, Country2, "@", vbTextCompare) Then
Country2 = Left(Country2, InStr(1, Country2, ".", vbTextCompare) - 1)
End If
If Country2 = vbNullString Then Country2 = "Not Recognized"
End Function
Good luck
10 June 2008, 6:17 amMichael
Michael:
The line after "Next i" should be
If Country2 "not equal" vbNullString etc.
10 June 2008, 6:19 amfloor:
Thank you both Hans and Michael!
I am so glad you are willing to help me!
Michael, if I change If Country2 "not equal" vbNullString etc. I get a syntax error... If you have a suggestion, please let me know again.. thanks so much!
Floor
10 June 2008, 9:30 amMichael:
Hi Floor -
I'm sorry, by "not equal" I meant the "less than" sign followed by the "greater than" sign in VBA. I was too lazy to look up the escape characters, and hoped you'd catch my meaning. Here's an attempt at what I meant.
If Country2 < > vbNullString Then Exit For
...Michael
10 June 2008, 1:43 pmfloor:
Michael, yes the formula works now..
I get a lot of not recognized. I tried to add more coutries by just typing for instance countries (5) = "India" but I guess I need to change more underneath that as well, because that is not working.
Also for some reason the formula already picks up strings like scotland, Holland, Department of health, because of it's position?
Anyway, If you don't feel like helping anymore I understand, you have already put a lot of effort in this.
I can always go with your first country UDF as that works better for me now...
If you do feel like it, could you let me know what I need to add in addition to all the countries that I want to add?
Thanks, and thanks again for all your help so far, it really is great because I was about to copy past everything manually..
Floor
11 June 2008, 2:27 amfloor:
I have figured it out.. Added more countries.. so thanks.. no need anymore!

11 June 2008, 8:24 amHans Schraven:
Michael,
Your function 'institute' is a re-invention of the function 'filter' that is part of VBA.
11 June 2008, 8:36 ammrt:
Hans -
Thank you. I've rolled my own that way for so long, I never looked for a built in function that did the same thing.
Floor -
Now that you've got it ;-), here's an improved version that would have made your life easier. Sorry to be so long getting back to you.
Dim Stuff() As String
Dim Countries As String
Dim Country() As String
Dim i As Integer
Dim j As Integer
Countries = "UK, Sweden, France, Germany, Norway" 'Add Countries separated by comma-space
Country = Split(Countries, ", ", -1, vbTextCompare)
Stuff = Split(Phrase, ", ", -1, vbTextCompare)
For j = LBound(Country) To UBound(Country)
For i = LBound(Stuff) To UBound(Stuff)
If InStr(1, Stuff(i), Country(j), vbTextCompare) Then
Country3 = Stuff(i)
Exit For
Else
Country3 = vbNullString
End If
Next i
If Country3 #&060; #&062; vbNullString Then Exit For
Next j
If InStr(1, Country3, "@", vbTextCompare) Then
Country3 = Left(Country3, InStr(1, Country3, ".", vbTextCompare) - 1)
End If
If Country3 = vbNullString Then Country3 = "Not Recognized"
End Function
And here's a version of Institute() that handles Institutes
Dim Stuff() As String
Dim i As Integer
Stuff = Split(Phrase, ", ", -1, vbTextCompare)
For i = LBound(Stuff) To UBound(Stuff)
If InStr(1, Stuff(i), "University", vbTextCompare) Then
Institute = Stuff(i)
Exit For
ElseIf InStr(1, Stuff(i), "Institute", vbTextCompare) Then
Institute = Stuff(i)
Exit For
Else
Institute = vbNullString
End If
Next i
End Function
Glad we're there.
11 June 2008, 2:00 pm...Michael
Hans Schraven:
Michaal
to shrink and speed up your code:
If Instr(Phrase,"University")>0 Then
c0="University"
ElseIf Instr(Phrase,"Institute")>0 Then
c0="Institute"
End If
If c0 [#&060; #&062;] "" Then Institute=join(filter(split(phrase,", "),c0),"")
End Function
mrt:
Hi Hans -
Thank you. I can see that it does by taking a larger view. Got to think about that last part. I'm learning something here
For the first part, no need to check if InStr() is greater than zero. InStr() returns zero if not found, giving False to the If. Life would be easier if formula FIND() or SEARCH() did that instead of returning errors.
...Michael
...Michael
12 June 2008, 6:07 amHans Schraven:
Michael,
Your remarks led me to the one-liner below:
Institute=join(filter(split(phrase,", "),IIf(InStr(phrase, "University"), "University", IIf(InStr(phrase, "Institute"), "Institute", ""))),"")
End Function
mrt:
Hi Hans -
Ooo...I like that. Now we're approaching Obfuscated C
Well, maybe not.
I didn't get the need for Join() until I found that you can't get at something like Filter(0)
Thanks
12 June 2008, 1:01 pmMichael (aka mrt)