Formula Tips

Al sent me a formula to revise. Before:

=IF($B11<>0,IF(OR(ISNUMBER(FIND(“-“,J11))=TRUE,ISNUMBER(FIND(” “,J11))=TRUE,ISNUMBER(FIND(“_”,J11))=TRUE),
IF(ISNUMBER(FIND(“-“,J11))=TRUE,TRIM(SUBSTITUTE(J11,”-“,””)),IF(ISNUMBER(FIND(” “,J11))=TRUE,TRIM(SUBSTITUTE(J11,”
 “,””)),IF(ISNUMBER(FIND(“_”,J11))=TRUE,TRIM(SUBSTITUTE(J11,”_”,””)),TRIM(J11)))),TRIM(J11)),””)

After:

=IF($B1<>0,TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(J1,”,”,””),” “,””),”-“,””),”_”,””)))

What the formulas do and what they are supposed to do is not why I am sharing them with you. There are few techniques that can be demonstrated with these formulas that might help you become a better formula writer. This isn’t a comprehensive guide to writing formulas, just a few tips.

Formula Length

Making a formula shorter doesn’t necessarily make it better, but usually it does. I’d guess about 5% of the time, a longer formula is preferable to a shorter one. If the longer formula makes it easier to read and understand and doesn’t introduce volatility or undue calculation time, then it’s good. But verbosity for it’s own sake doesn’t appeal to me. If it’s shorter, that’s less stuff I have to keep in short term memory while I’m deciphering.

Long formulas are hard to read. I don’t care who you are or how long you’ve been writing Excel formulas, fifty character formula take time to understand (unless you wrote it). So if you get a workbook with some monster formulas, don’t feel bad that you can’t trace through it in no time.

Terminology

Formulas are anything in Excel that starts with an equal sign (=). Examples of formulas are =1+1, =A1+1, and =SUBTOTAL(9,A1:A100). Formulas, like the last one, can contain functions. Functions are the built-in Excel keywords like SUM, SUBTOTAL, IF, SUMIF, OFFSET, and many others. When you’re referring to those keywords, call them functions. When you’re referring to the whole schmear, call it a formula.

Nesting

Functions take arguments (sometimes called parameters) and spit out a return value. Arguments can be many things like numbers, strings, cell references, and even other functions. When a function is used as an argument in another function, it’s called nesting.

Not nested: SUBSTITUTE("Richard J. Kusleika", ".","") returns “Richard J Kusleika” and says ‘replace all the periods in the string with nothing (an empty string)’.

Nested: SUBSTITUTE(SUBSTITUTE("Richard J. Kusleika",".","")," ","") returns RichardJKusleika and says ‘replace all the periods in the string with nothing, then take that result and replace all the spaces with nothing’.

The result of the innermost SUBSTITUTE is the first argument to the outer SUBSTITUTE. Excel always solves formulas from the inside out.

Error Proof Functions

Some functions are error proof. SUBSTITUTE is one of them. =SUBSTITUTE("Some Text","blah","") will return “Some Text”. It replaces every instance of “blah” with nothing. It’s just that there aren’t any instances. But it doesn’t return an error, and you can use that fact to make your formulas leaner.

TRIM is another function that doesn’t error out. TRIM will remove any spaces from the start and end of a string. If there aren’t any spaces, it just returns the string. You can use TRIM with any string you want and not worry that an error will be returned.

Other functions like FIND do return errors. FIND returns the position of the string-to-find in the string-to-be-searched. If the string-to-find isn’t in there, an error is returned.

Knowing what functions return in error cases like this can help you write more succinct formulas.

Booleans

Booleans are TRUE and FALSE. If you’re testing whether something is TRUE, you don’t have to say IF(Something=TRUE,… because Excel knows how to evaluate whatever you use as the first IF argument. Instead of =IF(A1=TRUE, A2, B2) write =IF(A1, A2, B2)

Instead of =IF(ISERR(FIND("blah","Some Text"))=TRUE, "Not found", "Found") write =IF(ISERR(FIND("blah","Some Text")), "Not found", "Found")

When Excel expects a Boolean, it will do the conversion for you.

Patterns

Look for patterns in your formulas and try to call functions the fewest number of times. In the first formula at the top of this post, the TRIM function is called over and over. Whatever else this formula does, it could first be improved by moving the TRIM function to the outside:

=TRIM(IF($B11<>0,IF(OR(ISNUMBER(FIND(“-“,J11))=TRUE,ISNUMBER(FIND(” “,J11))=TRUE,ISNUMBER(FIND(“_”,J11))=TRUE),
IF(ISNUMBER(FIND(“-“,J11))=TRUE,SUBSTITUTE(J11,”-“,””),IF(ISNUMBER(FIND(” “,J11))=TRUE,
SUBSTITUTE(J11,” “,””),IF(ISNUMBER(FIND(“_”,J11))=TRUE,SUBSTITUTE(J11,”_”,””),J11))),J11),””))

Because TRIM is error proof, we can call it one time on the result of the rest of the formula and simplify. If you find you’re using the same function over and over, ask yourself if you can’t move it more to the outside and call it less.

Multi-cell Formulas

I’m a big fan of huge one-cell formulas, but I shouldn’t be (an neither should you). If you’re typing the same sequence over and over in a formula, move that sub-formula out to a new cell and refer to the cell in your main formula. The classic case of this is the VLOOKUP function. In certain cases, VLOOKUP returns an error when it can’t find the value. To catch this, you call it twice, like this:

=IF(ISNA(VLOOKUP(A1,Sheet2!A1:C100,3,FALSE)),”Not found”,VLOOKUP(A1,Sheet2!A1:C100,3,FALSE))

If you put the VLOOKUP in another cell, like B1, you could say

=IF(ISNA(B1),”Not Found”,B1)

Share your formula tips in the comments.

Posted in Uncategorized

36 thoughts on “Formula Tips

  1. I wish the SUBSTITUTE() formula was abbreviated to, say, SUB(). More often than not, I’ll need to nest them.

  2. In Excel 2007 and above, the IFERROR function can greatly simplify your error-catching.

    =IFERROR(VLOOKUP(A1,Sheet2!A1:C100,3,FALSE),”Not Found”) returns the same results as =IF(ISNA(VLOOKUP(A1,Sheet2!A1:C100,3,FALSE)),”Not found”,VLOOKUP(A1,Sheet2!A1:C100,3,FALSE)). It also executes the VLOOKUP only once, so it saves processing time.

    As for VLOOKUPs, I always label the range instead of using addresses. I find it to be MUCH more readable.

    My 2 cents…

  3. When I use VLOOKUPS, I replace FALSE with 0,

    Instead of this: VLOOKUP(A1,Sheet2!A1:C100,3,FALSE)

    I use this: VLOOKUP(A1,Sheet2!A1:C100,3,0)

    Hey don’t laugh. Those 4 extra characters really add up over the course of 15 years.

  4. Good write up Dick
    The only thing I would add would be the use of named ranges to simplify and make long formulas more readable

  5. You touched on the fact that FIND (and SEARCH also) will error out when the text being searched for is not found within the text being searched. Because of this, most people will set up an IF function to see whether FIND is returning a number (which is the test I prefer) or an error or not; however, depending on what is being done, you do not always have to use such a construction. Consider this example… there is a text string in A1 and you want to return all the text after the first dash. You do NOT have to do it this way…

    =IF(ISNUMBER(FIND(“-“,A1)),MID(A1,FIND(“-“,A1)+1,LEN(A1)),””)

    rather, you can do this much shorter formula instead…

    =MID(A1,FIND(“-“,A1&”-“)+1,LEN(A1))

    All I did is concatenate the text being searched for (a dash in this case, but it could be any text string of any length) onto the text that was being searched. If there is no dash in the string, the starting position for the MID function will end up being a value past the end of the text, so nothing will be returned. Okay you say, so how do I apply this for finding the text in front of the dash then? Consider this…

    =LEFT(A1,MOD(FIND(“-“,A1&”-“)-1,LEN(A1)))

    I use the same trick of concatenating the text being looked for inside the FIND function onto the text being search, but now I take the MOD (with respect to the length of the text being searched) of one less (to get in front of the dash) than the returned value from the FIND function… this forces the value to zero when the dash being sought turns out to be the one we concatenated onto the text. Now, of course, you cannot use this trick everywhere, but when the opportunity to use it does present itself, the result is usually a smaller more efficient (less function calls being executed) formula.

  6. Very good write up Dick… (and thanks to Mike and Rick for the extra tips..)

    While one cell formulas can be impressive, it is easy and simple to have helper cells. Also, I like the IFERROR, it reduces the formula lengths.. Thanks for this.

  7. The TRIM function does more than remove spaces from “the start and end of the string” as you stated. The equivalent VBA TRIM function, in fact does that, removes both leading and trailing spaces from a string.

    The worksheet function TRIM does the following: “Removes all spaces from text except for single spaces between words.”

    The worksheet TRIM function is much more powerful than the VBA TRIM function.

  8. Mike Alexander,

    Speaking of making formulas easier to read.

    David Wasserman label the ranges instead of using addresses so here is what your formula should look like:

    Vlookup(Bacon, Fridge,3,0)

  9. Mike,
    you might save another character by skipping zero as the 4th argument of VLOOKUP is optional
    VLOOKUP(A1,Sheet2!A1:C100,3,) works too

  10. Ah but Dick, your formula won’t work. Al was testing B11 and J11, you test B1 and J1. You can’t just apply it wherever you like!!

  11. “you might save another character by skipping zero as the 4th argument of VLOOKUP is optional
    VLOOKUP(A1,Sheet2!A1:C100,3,) works too”

    Won’t this return approximate results? A little off topic but has anyone found a good use for getting approximate results? I have always wondered why it was an option. “Please return a wild guess as to what I am looking for.”

    I just discovered the site yesterday. I spend all day in Excel and am finding amazing new ways to do things I used to do the long way. TRIM, SUBSTITUTE, ISERROR, ISNA… why was I not told about these before? Time to start redoing my formulas.

  12. @Dick

    Your formula does not work the same as Al’s formula does. First, a minor point, you are missing the “false” argument for your IF function… if B1 is zero, your formula reports FALSE where as Al’s formula reports the empty string. The fix for that is obvious and simple. Now, for the major difference in your formulas. Let me start by saying your formula is probably what Al actually wants, but it does not work the same as the formula Al gave you. Al’s formula will replace only **one** of the symbols (the first tested one that exists) in the text string while leaving any of the other characters alone… your function replaces all of the characters (which, as I said, is more than likely what Al really wants to do). So, as an example, if the text in the cell was this…

    “One Two_Three-Four”

    then Al’s formula returns this…

    “One TwoThree_Four”

    and your formula returns this…

    “OneTwoThreeFour”

  13. @David Wasserman
    Absolutely! I love Microsoft for their beautiful ISERROR()…who needs backward compatability?

  14. Using your techniques I hope to make my long unwieldy formula shorter. In the meantime, I break up them up with line breaks (alt-enter) to make it easier to read like so:

    =IF(ISNA(OFFSET(Setup!Table,
    MATCH(C49,OFFSET(Setup!Table,0,MATCH(Table!TestType,OFFSET(Setup!Table,0,0,1,COLUMNS(Setup!Table)),0)-1,ROWS(Setup!Table),1),0)-1,
    2,
    1,
    1)),””,OFFSET(Setup!Table,
    MATCH(C49,OFFSET(Setup!Table,0,MATCH(Table!TestType,OFFSET(Setup!Table,0,0,1,COLUMNS(Setup!Table)),0)-1,ROWS(Setup!Table),1),0)-1,
    2,
    1,
    1))

  15. Mark Kayser
    you’re right, 0 needs to be there
    I’ve tested that before my previous post but incorrect results haven’t occurred

  16. Ah, the final parameter of the VLOOKUP function…

    The TRUE vs FALSE or 1 vs 0, if you prefer, will do an inexact match (if the lookup range is in ascending order) or match exactly. The default is TRUE.

    One reason to use the inexact match is if you are trying to find a result that occurs between a range of values.

    For example, say you want to match a score to a grade, your lookup table might look like this:
    0 F
    60 D
    70 C
    80 B
    90 A
    100 A+

    If your score is greater than or equal to the lower range and less than the next higher range, you will get that grade.

    Note that if your score is less than the lowest value in the lookup range, the VLOOKUP formula will return a #N/A. If your lookup range is not in ascending order, the VLOOKUP will not function as expected. This is not the case if you use a FALSE in the final parameter. In that case, the lookup values can be in any order (just be careful to avoid duplicates).

  17. @Mark

    David’s answer is dead-on, of course, but I just wanted to mention the broader context of discretizing data for data mining.

    The data mining add-in for Excel 2007 (using SQL Sever 2005/2008 in the background) is a great tool. Frequently you need to create discrete instead of continuous variables, e.g. I don’t want to know your age, just Kid, Teen, Adult, Senior or Short, Medium, Tall instead of your height in inches.

    There are other tools for doing this, but the VLOOKUP formula can be a very quick way of binning continuous data for a preview before data modeling.

  18. I know you were using the VLOOKUP() merely as a “for-instance”, but it can be shored up a bit without having to repeat the function which does cause overhead even if it isn’t noticeable on today’s PC.
    The VLOOKUP() test can be changed to:

    =IF(COUNTIF(Sheet2!A1:A100,A1)=0,”Not found”,VLOOKUP(A1,Sheet2!A1:C100,3,FALSE))

    I believe the COUNTIF() is more lightweight than the VLOOKUP(). Comparison makes sense, if it can’t find the value, no sense in doing the VLOOKUP.

  19. @Rick –

    If A1 doesn’t contain any dashes, should your last formula return “” or the entire contents of A1? The problem is your unnecessary MOD call. LEFT is another of the error-free functions. LEFT(“abc”,5) returns “abc”, not an error, so it’s sufficient to use

    =LEFT(A1,FIND(“-“,A1&”-“)-1)

    @JimC –

    You may want to avoid unnecessary use of OFFSET and other volative functions. Excel has to recalc cell formulas calling volatile functions whenever anything triggers a recalc. Excel only needs to recalc other cell formulas when any range they refer to changes. In your sample formula, your outermost OFFSET calls could be rewritten as

    INDEX(Setup!Table,MATCH(C49,INDEX(Setup!Table,0,MATCH(Table!TestType,INDEX(Setup!Table,1,0),0)),0),3)

    There are 2 things which could cause this to return #N/A: Table!TestType not appearing in the in the first row of Setup!Table or C49 not appearing in the column of Setup!Table in which Table!TestType appears in the in the first row Setup!Table. When multiple function calls could each return errors, it often makes sense to isolate the error. In this case it’d make sense to use 3 cells to calculate this single result.

    AA1:
    =MATCH(Table!TestType,INDEX(Setup!Table,1,0),0)

    AA2:
    =MATCH(C49,INDEX(Setup!Table,0,AA1),0)

    AA3:
    =IF(COUNT(AA2),INDEX(Setup!Table,AA2,3),IF(COUNT(AA1),C49&” not in Table column “&AA1,
    “TestType “&Table!TestType&” not in top row of Table”))

    That said, long formulas can be less susceptible to #REF! errors since there are fewer or no intermediate cells which could be deleted or pasted over following cuts.

    Minirant: when will Excel come with a 3rd entry mode besides enter in cell or enter in formula bar, namely, enter or edit constants in cell but formulas in the formula bar?

  20. @fzz

    I assumed that if there was no dash, then there could not be any characters to the left of the (missing) dash, so I figured the empty string should be returned for that case; that is why I used the MOD function, to force the number of characters to return from the LEFT function to zero.

  21. A small formula improvement I picked up recently from Daniel of Excel Hero

    Every one knows a single match columns and mutiple index columns is faster than vlookup

    But a Single Match Column and 1 Array entered index per column is still faster.

    This is the fastest way to search and pick

    It beats VLOOKUP by a factor of 3. And yet most people continue using VLOOKUPS.

  22. @sam-

    Thanks for that!

    @Dick-

    I try to avoid IF() functions whenever I can. One method to do so with text output would be to change your formula from this:

    =IF($B10,TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(J1,”,”,””),” “,””),”-“,””),”_”,””)))

    to this:

    =REPT(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(J1,”,”,””),” “,””),”-“,””),”_”,””)),$B10)

    Same number of function calls, but it eliminates branching and saves the IF() nest for when mandatory.

    I use this technique a lot.

    Here’s a link to my Excel 2007 Chess Game Viewer that uses absolutely no IF() functions at all (and no VBA):

    http://www.excelhero.com/blog/2010/02/excel-2007-chess-game-viewer.html

    Cheers,

    Daniel Ferry
    excelhero.com/blog

  23. @Rick –

    Interesting logical question whether there were neither characters before nor after the first dash in “abcdefg”. Maybe just me, but this intent would be more explicitly stated as

    =LEFT(A1,COUNTIF(A1,”*-*”)*(FIND(“-“,A1&”-“)-1))

    for such characters before.

  24. Excellent post!

    The long formula used here brought up an old wish I have:
    A formula editor with syntax highlighting, identing, intellisense and a debugger. Similar to the VBA IDE. This would make the live of a Excel maniac much easier.

    Dominik

  25. @David Wasserman
    Unfortunately IFERROR(… does not return the same result as IF(ISNA(…

    IFERROR is whitewash which will hide any errors in your formula’s results. ISNA is very specific.

    The important points for most lookup functions such as VLOOKUP (with 0 or FALSE parameter) and MATCH (with 0 for exact match) are that a #N/A error specifically means the data you were looking for is not in the lookup range. #REF or #NUM most likely means a malformed lookup (eg VLLOKUP with column range width).
    Others errors such as #DIV/0 or #NAME would be the result of bad data in your lookup range, in other words you rlookup returned a result from a cell which itself had an error in (of course this error could be a #N/A or #REF or whatever.

    ISNA allows you to much more precisely pin down errors which are the result of missing data in your list and either highlight these or suppress them as appropriate to your model. If I am asked to do some work on a big model that someone else has built I will sometimes do a find for =iferror(*,””) to see if there are formulas which are sweeping all errors under the carpet.

    If only we had IFNA(… – but Dick’s method of only using the VLOOKUP once and refering the ISNA to that cell removes the calculation overhead of doing it twice, and the maintenance overhead of updating your function twice (when necessary), so it is not a biggie.

    Good response on the use of the True / approximate match. Other examples I use to show True doing something useful are finding the period (eg year/quarter) that a date falls in and getting the discount band for an order depending on volume ordered (bot depending on sorted data, as with the exam grades).
    Using “True” is also the only situation where VLOOKUP(value, range, 1, True) makes sense – getting the value back that you just looked up? In these “approximate match” situations this would return the boundary value rather than the category it is in. Again, useful to find the date that the relevant period starts on, or the level at which a discount band starts (maybe the discount only applies to the quantity over that level, not the whole order).

    @Mark – you should also be aware that if your data is in sorted order and you use the “True” parameter to lookup something which is there (so no approximation needed) but is repeated, you get the last result found, whereas False returns the first result found. So if you had a product price list in which you never overwrite a price but add the new one with the date it is in effect from, True would give you the current price if the list is sorted in ascending order by product code then date price came into force.
    False would give you the original price unless you sort ascending by product code and desceding by date, but this is not very intuitive, most people expect data to have newest info at the bottom generally.

    @Sam – I agree, very often a single MATCH can be used for multiple INDEX functions (with or without arrays, depends on the user level) and is easier to maintain than many VLOOKUPS, far more efficient, and lends itself well to the ISNA function being directed only at the MATCH result, rather than for many separate lookup functions.

    @David Hager – explicit intersections, good. Explicit intersections using named ranges in place of cell refs, great. Implicit intersections on named ranges / whole columns, even greater (where appropriate). A formula that reads =Units * (SalePrice – Cost) is so easy to read, yet baffles so many users as to how this could possibly work on every row of a large list of data. (Arthur C Clarke and all that)

  26. My post had greater and less than signs in which got killed (fair enough). This bit should have read:
    …malformed lookup (eg VLOOKUP with column value LESS THAN 1 or GREATER THAN range width).

  27. About IFERROR:
    I discovered yesterday that when you use the IFERROR function, Excel2007 creates a named variable. It’s name is “_xlfn.IFERROR” and it is to be found in the ThisWorkbook.Names collection. Its visible property is false so it does not appear in the name manager. Moreover its RefersToRange creates an error (“=#NAME?”) so that VBA code used to manage names can run into unexpected errors…
    Does anyone know why this varibale is created by excel?

  28. @Sam-

    When you said:
    “Every one knows a single match columns and mutiple index columns is faster than vlookup
    But a Single Match Column and 1 Array entered index per column is still faster.”

    Could you elaborate on the “Single Match Column and 1 Array entered index per column”? Maybe give an example.

    Thanks

  29. >> Mike Alexander said: When I use VLOOKUPS, I replace FALSE with 0

    I prefer not to do that. Especially in long formulas, using the explicitly spelled out TRUE or FALSE in a VLOOKUP helps when auditing the formula. A “TRUE” or “FALSE” is much harder to overlook than a “1? or “0? (And I never EVER omit the last parameter, to make sure I WANT the TRUE and haven’t just forgotten to put in the desired value!).
    The few extra key strokes don’t matter, if they help with the maintainability of the formula. And with the “intelligent” parameter options in Excel 2007 and later, I just hit “tab” for TRUE or “down arrow” and “tab” for FALSE to fill the parameter, so for “False” it is just “down arrow – tab” – close banana. Done. One more keystroke than typing a zero. But lots of benefits when auditing and revisiting the formula after three months!!!


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.