Dynamic Text Sorting
A few days ago I solved a problem that I've been working on for more than five years: How to create a multi-cell array formula that returns a sorted list of text entries. It's easy enough to do this with numeric entries, but the text solution has eluded me.

Range A2:A21 is named data. Select C2:C21 and type (or paste) the formula below. Then press Ctrl+Shift+Enter (rather than Enter).
=INDEX(data,MATCH(ROW(INDIRECT("1:"&ROWS(data)))-1,COUNTIF(data,"<" &data),0))
Text entries made in the data range appear in the C2:C21 range in alphabetical order.
Caveat: It doesn't work for numeric entries, or duplicated text entries.
If you'd like to get rid of the #N/A display, wrap it up in an IF function:
=IF(ISNA(INDEX(data,MATCH(ROW(INDIRECT("1:"&ROWS(data)))-1,COUNTIF(data,"<" &data),0))),"",INDEX(data,MATCH(ROW(INDIRECT("1:"&ROWS(data)))-1,COUNTIF(data,"<" &data),0)))
Here's how to do it using the new IFERROR function (and this may be the best reason yet to upgrade to Excel 2007):
=IFERROR(INDEX(data,MATCH(ROW(INDIRECT("1:"&ROWS(data)))-1,COUNTIF(data,"<" &data),0)),"")
Why did it take me so long to figure this out? I was fixated on an entirely different approach — one that I'm convinced would never work. I hadn't thought about it for about a year, but then all of a sudden it came to me.
I can't think of any good reason to actually use this, but I feel pretty good about figuring it out.
Toad:
Too bad your array formula can’t correct the spelling of “javelina”.
Seriously, this is a great achievement, and one of which you can be justly proud. As soon as you can get it to work with numeric entries and duplicated text entries, that is. Have it on my desk in the morning. Thanks.
11 October 2006, 8:38 pmJohn Walkenbach:
Every time I write the word “javelina,” I know that there’s a 50-50 chance that I spelled it correctly.
But enough of that. The champagne is flowing.
11 October 2006, 9:22 pmPaul B:
John, have a look at the file here from Biff it looks like it will do numeric entries also
Found it on the excel newsgroups, but can’t find it again
See this sample file:
http://cjoint.com/?kmd6wEsiac
11 October 2006, 9:43 pmPaul B:
Found the link on Google
11 October 2006, 9:48 pmhttp://heh.pl/&355
gruff999:
Ouch. It`s hard to look at someones prepacked solution to something you`ve worked on for so long. But there aren`t really any shortcuts. Mr Google and his cut-and-paste confections can`t replace the experience gained through 5 years of determined effort. Kudos to John (as though he needs any more).
12 October 2006, 2:25 amAndre:
Maybe you can’t think of any use for it, but I can!!
In my work I WILL use this quite often — Thank you very much.
12 October 2006, 3:36 amAlderaic:
actually I might have some uses for this, it is a perfect way to feed a sorted combobox or validation list (I dont think you can autosort validation lists)
so it might be more usefull than you think (in my book, everything to prevent using a vba macro for small details is better, because it means that the end user will probably not disable anything and mess with the end result)
12 October 2006, 6:09 amAlderaic:
after a quick test, it works indeed like a charm to sort validation lists, I have some worksheets were user always have to add vendors and stuff like that and it will help a lot.
12 October 2006, 7:14 am(actually MS should make a feature of auto-sorting validation lists but it is another point)
chip:
Good stuff. I had not read about the IFERROR in 2007. And right when I finally made myself a button to add that automatically.
12 October 2006, 7:32 amBrett:
Very nice!
Only tenously related - I wrote this to avoid wrapping IF(ISERROR( )) around everything (I won’t see Excel 2007 in my workplace for 10 years). However, I’m probably not recognizing a fault with this. What’s bad about this UDF, please?
Function ErrTrap(Returned, Default)
‘Function to trap returned errors and substitute Default
If WorksheetFunction.IsError(Returned) Then
ErrTrap = Default
Else
ErrTrap = Returned
End If
End Function
Brett
12 October 2006, 7:50 amJohn Walkenbach:
That solution from cjoint.com is pretty good. For my text trick, I will modify it so it uses only one array formula. Check back in 2011.
12 October 2006, 8:00 amJohn Walkenbach:
Brett, the only potential problem with that function is if the user disables macros. And, it would probably increase recalc time significantly if you used it a lot. VBA functions are MUCH slower than built-in functions.
12 October 2006, 9:47 amLori:
Nice formula - guess it could be useful for the source of a self sorting chart perhaps?
12 October 2006, 9:52 amAlso replacing ‘INDEX(’ by ‘HLOOKUP(”žž”,’ should return an array usable in a named formula (assuming no duplicates).
Zach:
Brett,
I would add that UDFs in general are just too glitchy to bother if you need to share the workbook. That's just my personal opinion based on too many bad experiences, and I will use a UDF when there is no other way, but usually there is. My approach to the problem is a toolbar macro to put an IF-ISERROR wrapper on for me. IFERROR sounds neat, though I'd much rather just have a workbook level option to just suppress errors in the same way you can suppress zeros. Most of the time the error is not an issue, it's just a matter of not wanting the ugly thing on your final printout.
Dim rngCell As Range
Dim intState As Integer
On Error Resume Next
Application.ScreenUpdating = False
intState = Application.Calculation
Application.Calculation = xlCalculationManual
For Each rngCell In Selection
If Left(rngCell.Formula, 11) "=IF(ISERROR" Then
rngCell.Formula = "=if(iserror(" & Mid(rngCell.Formula, 2, 10000) _
& "),0," & Mid(rngCell.Formula, 2, 10000) & ")"
End If
Next
Application.Calculation = intState
Application.ScreenUpdating = True
End Sub
Brett:
John,
Thanks for the feedback!
Zach,
Thanks for the example!! I just needed to put back in the angle brackets for the inequality and deal with the ampersands and it works like a charm.
Brett
12 October 2006, 3:47 pmTushar Mehta:
I doubt John expected this discussion to drift so far from his original post... {grin}
I've never understood this fascination that people have with (a) hiding errors and (b) cramming what at times seems like all the world's knowledge into a single cell.
Addressing the the first issue. Errors are *good.* Stop hiding them. They help people discover problems with the software model. And, with their business (or scientific) model. They also help people discover inconsistencies in their operating environment and their data. Essentially, blindly hiding errors is wrong. Flat out wrong.
For example, if a particular division leads to a "divide by zero" error, it should be an invitation to find out why that happens, not an excuse to mask the error and pretend everything is OK. Yes, after investigating the error, it may be perfectly legitimate to work around the error but that should be the end of the process not an excuse to flood a worksheet with error hiding formulas.
The bottom line is I cannot think of a single good designer -- of code, of other software model, or of business or scientific or engineering process or in any other discipline -- who hides errors.
And, that brings me to the next point. What's with this frenzy to cram all the world's knowledge into a single cell? I dislike this kind of design because it degrades readability and maintainability of a worksheet. A secondary effect is that the performance of that Excel model deteriorates. And, that is particular ironic since most people who use these kinds of put-everything-into-a-single-formula are also among those who get excited with formulas that use 3.14159 fewer CPU cycles and 2.718 fewer keystrokes.
Look at a simple formula =IF(ISERROR(xxx), "", xxx). At the very least, maintenance suffers since a change to xxx requires the developer to remember to change both instances of xxx. Also, from a performance perspective Excel has to evaluate xxx twice. It would be so much more elegant to use 2 cells, the first with the =xxx formula, the 2nd with =if(ISERROR(xxx-cell), "", xxx-cell). Now, maintenance and performance both benefit.
Yes, there are a few legitimate reasons for complicated formulas. But, by and large, maybe someone can help figure out if
1) people are confusing soil conservation with cell conservation,
2) it's a way for the expert to put the lay person in their place, and/or
3) it's a macho thing about "Mine's bigger than yours."
Frivolity aside, the simpler software model, by being more transparent and providing more intermediate results, can provide significantly better business support.
For an example where sacrificing cell conservation benefits the software model see my posts in http://www.mrexcel.com/board2/viewtopic.php?t=216489
For an example of improving performance see my suggestions in http://mrexcel.com/board2/viewtopic.php?t=221217.
For an example of where an overly complex software model masks potential design and implementation bugs and also dramatically degrades business support see my suggestions and comments in http://www.mrexcel.com/board2/viewtopic.php?t=117743.
12 October 2006, 6:49 pmZach:
Yikes. The ampersand is easy enough to deal with, but the loss of the confused me, and it's my code. I had to go back to my original to figure out what I was trying to do. I hate to clutter up the site, but I want to give a workaround a shot.
if x "" y
if x ""y
if x []y
if x []y
if x> y
if x>y
this & that
this "&" that
this [&] that
this {&} that
this that
this that
Hopefully one of these will work.
12 October 2006, 6:55 pmSimon Murphy:
John
nice work on the sort formula, I'm sure I needed that recently
Zach in 2003 (and XP - maybe others?) you can set errors to be not displayed in printouts on the sheet tab in print preview.
for iferror (and other simple UDFs) I've gone retro and started (re)using XLMs as they are much faster than VBA, heres an iferror one:
=ARGUMENT("PossibleError",31)
=ARGUMENT("ErrorReplacement",31)
=RESULT(7)
=RETURN(IF(ISERROR(PossibleError),ErrorReplacement,PossibleError))
Professional Excel Development has an iferror xll which is the best solution if you can get people to install it.
Cheers
13 October 2006, 2:52 amSimon
jkpieterse:
Hi Zach,
I wrote a similar routine here:
http://www.jkp-ads.com/Articles/FormulaWrapper.htm
13 October 2006, 8:08 amKevin Vaughn:
Tushar Mehta, I followed those links. Good reading.
13 October 2006, 9:38 amJon Peltier:
Tushar -
"What's with this frenzy to cram all the world's knowledge into a single cell?"
I often wonder about this. I'm not dumb (not TOO dumb) and still my brain overheats when I try to decipher some of these one-cell formulas that people have cobbled together. One-cell is a good design for bacteria, but for us higher life forms, multi-cell is better.
I'm also struck by the refusal of people to use multiple ranges to allow different arrangements of the same data. The optimum arrangements for a pivot table data source, a chart, and a tabular report are all different (in fact, a tabular report for printed output is different than for on-screen display). Rather than trying to make one arrangement fit all uses poorly, start with a flat table as the primary database. Then use whatever dynamic or static formulas are needed to generate linked ranges for the other purposes. Hide the ranges you don't want the user to mess with; for example, the printed output need never be visible, as it's only required when printing the report. Sure, the file's a bit larger, but I can't possibly fill my 200 GB hard drive, and the time and effort required to force fit one range into multiple functions is excessive.
13 October 2006, 11:20 amAlderaic:
totally agree, anyway once you get to the point where you are unable to understand your OWN formula, you take a step back from the one cell formula.
Actually I am usually avoiding array formula if I can, though this exemple (not the error catching part, the sorting one) is really usefull to me and understandable too.
on the other part, when you need two dimensional lookup for exemple, you quite often end up with big formulas, and so far I dont know any way around it except for UDF, and I refuse to use UDF as I think it is the most unreliable thing in excel today.
13 October 2006, 11:47 amDoug Glancy:
Jon,
You've mentioned this type of multi-table layout here before and I was, and am, interested in seeing an example if you ever have time to put one up on your site.
13 October 2006, 12:03 pmJohn Walkenbach:
Jon & Tushar, creating incomprehensible one-cell solutions is all about the challenge of being able to do it, and pushing Excel to its limits. I agree that it's not always the best way to do things, but I'm more of an Excel hobbyist than someone who actually uses it for useful work.
And, I've found that using a huge incomprehensible "megaformula" (rather than a bunch of intermediate formulas) reduces calculation time and makes a significantly smaller file.
13 October 2006, 12:09 pmCharles Chickering:
Jon -
"but I can't possibly fill my 200 GB hard drive"
That's what I said about my 400MB hard drive back in the 90's... after I last installed XP Pro I checked disk usage b4 installing anything else it was around 2.5GB for a bare install.
Charles
P.S. I miss DOS....
13 October 2006, 2:02 pmHarald Staff:
Where else should all the world's knowledge be if not in a single spreadsheet cell? It is the perfect location for it. Engraved on an atom.
13 October 2006, 2:57 pmWord and Access MVPs might disagree of course. What do they know
Zach:
Simon, that's exactly what I want. I can live with the errors on screen, but not on final printouts. I can't believe that was there all along and I missed it. I guess that's what I get for being an XL2000 holdout for so long.
jk, does the use of a static variable make it perform faster? I rarely have performance issues with such a simple thing, but it'd be worth noting anyway. I was amused to see you use right(x,len(x)-1) to get the cell value starting from the second position. It seems more elegant than using mid with an arbitrary value, and so obvious too. I guess it kind of goes back to John's original point about being so stuck on one way of doing something that you somehow block that far simpler solution from popping up.
Tushar, I don't disagree about hiding errors, I just disagree on the definition of an error. Sometimes the #DIV/0 would be better replaced with a 100% or (100%) when doing a variance analysis. I know it's not mathematically correct, but I'm in finance, not physics. Another example would be a failed vlookup. I don't consider an unfound value to be an error and would appreciate some sort of default instead of that #N/A.
I do agree with you on the megaformula point. I wouldn't qualify megaformulas based on length, though. I think a very long sumproduct formula doing a 2 dimensional lookup can be highly readable and wouldn't call it a megaformula. I see megaformulas more as those that combine different steps of a process into one, and the only reason I see for doing so is to make your spreadsheet hard to audit and I'm sure there's a lot of not so good reasons people would want to do such a thing. In my experience dealing with auditors (big 4 firms, I might add) a nice spreadsheet is given far too much credibility.
13 October 2006, 7:43 pmJon Peltier:
John -
I know, those megaformulas help to sell books!
"using a huge incomprehensible "megaformula" (rather than a bunch of intermediate formulas) reduces calculation time and makes a significantly smaller file."
As I stated, I hardly care about file size. For most of my programs, the worksheets aren't too crowded, there's a mix of formulas and VBA, and more time is wasted waiting for the user to click a button than waiting for a formula to recalc. So I'm not too worried about speed either.
13 October 2006, 9:08 pmJon Peltier:
Charles -
"That's what I said about my 400MB hard drive back in the 90's..."
I suppose if I loaded up the PC with multimedia files, I'd be able to fill up 200 GB. I notice that 200 GB external drives are now down below $100, so that's not too critical these days.
13 October 2006, 9:14 pmBrett:
Tushar: to help you understand-
I'm modeling reimbursement changes for health plan fee schedules. I've got two or more lists which can vary in length and reimbursement value as management dictates. I want changes to flow to the summary, so I keep utilization liked to reimbursement with Vlookups. Often if fees are not on one schedule, but are on the other, that fee gets to play. If neither exists a manual fee has been calculated and that is used. I use the ISERROR to catch the NA from VLookup and lookup in another schedule. I'm sure there a many different ways to do this. I could create a comprehensive list of the reimbursement codes to be looked up, use helper columns to figure the fee to use, then use lookup to pull a text version of the result cell address, make the address values and use text-to-columns to make the text real cell references. This would do away with lookups altogether in the final model. I'm always ready to learn new techniques, though.
If I really want to suppress an error from printing, I'll use conditional formatting to change the font color, blending it to the cell color. No doubt errors should be traced to their causes in almost all circumstances.
Brett
14 October 2006, 7:53 amjkpieterse:
Zach: The statis is just there so that if you decide to use a different "template" for the formula, it will be remembered next time.
14 October 2006, 11:43 amjkpieterse:
It should read Static of course.
14 October 2006, 11:44 amjkpieterse:
John:
"using a huge incomprehensible "megaformula" (rather than a bunch of intermediate formulas) reduces calculation time "
That strongly depends on the type of calc. I once speeded up a workbook by a factor 4 (!) by splitting up single-cell array formulas into about 5 columns of intermediate calculations. FIlezize of course increased, but it was great to have calc time down from about 8 seconds to 2 (which is still slow).
14 October 2006, 11:48 amBrad Yundt:
Yet another formula to alphabetize a list. Duplicates are repeated, while blanks and empty strings are ignored. When the list is exhausted, empty strings are returned instead of #N/A. This one is not volatile, and was originally posted here: http://www.experts-exchange.com/Applications/Q_21731705.html
=IF(COUNTIF(List,"?*")
15 October 2006, 11:11 pmColin Delane:
Whilst I agree with Tushar (Reply #16 October 12th, 2006 at 6:49 pm) that “blindly hiding errors is wrong”, the solution is not to allow errors to occur and then display them as suggested, but to build mechanisms into your model that will handle foreseeable errors and thereby prevent them (eg. By using Data Validation, or VBA).
16 November 2006, 10:14 pmIn addition, often a formulae may return an error value when the underlying data is not incorrect – just the method of dealing with that data. (eg. Percentage calculations often return "#DIV/0!" errors because of zero value denominators. It is not incorrect to have these zeroes, but it is bad modelling to allow the error value to show – because the error then flows through to any subsequent formulae that are dependent upon that cell. Accordingly, use IF statements to deal with the error so that a zero is returned rather than "#DIV/0!".
I believe that errors in models are never good – it means that the model builder hasn’t done enough work to make the model sufficiently robust. Imagine the airline industry if error handling wasn’t part of the design process – we’d have planes falling out of the sky on a daily basis because there were no duplicate systems, or automatic actions like the deployment of oxygen masks, etc.!!
vsoler:
I have a solution that works numbers, texts, duplicate numbers, duplicate texts and blanks.
10 February 2007, 1:29 pmHowever, I am unable to pack it into a single megaformula. I need some extra columns for intermediate calculations. Can it be of any interest?
vsoler:
Suppose that Range A1 to A10 contains
m
2
3
z
b
(blank)
b
99
2
(blank)
As you can see, there are numbers, texts, blanks, duplicate numbers and duplicate texts.
Then key in, in B1, the following formula:
=SUMPRODUCT(--(A1>$A$1:$A$10))+SUMPRODUCT(--($A$1:A1=A1))-COUNT.BLANK($A$1:$A$10)
and copy it down to B10.
Eventually, in C1 key in:
10 February 2007, 5:59 pm=IF(ROW(A1)-ROW($A$1)+1
vsoler:
For any unknown reason, the previous post did not appear complete. Here is what was left:
Eventually, in C1 key in:
10 February 2007, 6:02 pm=IF(ROW(A1)-ROW($A$1)+1
Timothy:
John's original post DOES work for numeric values! I needed to do what John Walkenbach does, but I needed it for numeric values. My data were in cells F14:F25. I entered the following, and it worked perfectly. ...but it still does not work for duplicated numeric entries. Note I used ">" rather than "" &F14:F25),0))
3 January 2008, 5:33 pmTom:
Thank you!
I came across your post by chance and it solved a problem that has been bugging me for ages (lots of time anyway....). I have made a list of content of DVD where each DVD contained more than one item. I wanted to keep the DVD's in numerical order and added somewhere a dropdown box which allows me to look up the content and get the DVD number. but it wasnt alphabetized (which sucked). I think now it should work perfectly.
Now all I need to do is understand what it does
6 June 2008, 1:50 am