Unique Fruit
I need help. I have this:

I can tell my how many unique fruits are in column A with this array formula
E4: =SUM(1/COUNTIF(A2:A13,A2:A13))
And I can tell how many rows have both Apple in column A and 1 in column B with this array formula
E5: =SUM((A2:A13=”Apple”)*(B2:B13=1))
But I can’t seem to figure out the formula to tell me the count of unique fruits that have a 1 in column B. The answer is two, apple and peach. What’s the formula that gets me there?
Jeff:
=SUMPRODUCT((A2:A13=”Apple”)*(B2:B13=1)+(A2:A13=”Peach”)*(B2:B13=1))
11 July 2007, 4:48 amIoannis Varlamis:
try (Ctrl+Shift+Enter):
11 July 2007, 4:58 am=SUM(1*(MATCH(IF(B2:B13=1,A2:A13,”"),IF(B2:B13=1,A2:A13,”"),0)=ROW(INDIRECT(”1:”&ROWS(A2:A13)))))-1
Jamie Collins:
Stolen from Chip’s site:
=SUM(IF(FREQUENCY(IF(LEN(IF(B2:B13=1,A2:A13,”"))>0,MATCH(IF(B2:B13=1,A2:A13,”"),IF(B2:B13=1,A2:A13,”"),0),”"),
IF(LEN(IF(B2:B13=1,A2:A13,”"))>0,MATCH(IF(B2:B13=1,A2:A13,”"),IF(B2:B13=1,A2:A13,”"),0),”"))>0,1))
Jamie.
–
11 July 2007, 4:59 amavner:
If you enter in c2 the formula =A2&B2 and drag it to c13
11 July 2007, 5:22 amthen the answer can be :
{=SUM(1*(RIGHT(C2:C13)=”1″)*(1/COUNTIF(C2:C13,C2:C13)))}
Bob Phillips:
You really should post such queries on the public NGs Dick, you would have been swamped with answers in minutes.
11 July 2007, 5:29 amJamie Collins:
11 July 2007, 7:08 amSELECT COUNT(*) AS unique_fruit_count
FROM (
SELECT F1 AS fruit_name
FROM [EXCEL 8.0;HDR=NO;IMEX=1;DATABASE=C:\Tempo\DB.xls;].[Sheet4$A2:B13]
WHERE F2 = 1
GROUP BY F1
) AS DT1;
Michael:
Hi Dick -
If you sort on B2:B13 ascending or your data is entered such that Apple 1 will be above Apple 2 and Peach 1 above Peach 2 etc, then
=SUMPRODUCT(–(B2:B13=1),–(A2:A13A1:A12))
works.
Non-array alternate for E5: =SUMPRODUCT(–(A2:A13=”Apple”),–(B2:B13=1))
…Michael
ps. A learning experience: When I copied the original E5 formula and pasted it into my spreadsheet it didn’t work. Took me a while to find that the problem was that from the web, it comes over with curly quotes.
11 July 2007, 7:46 amfzz:
Why even post in newsgroups? Why not just search them? This has been asked AND ANSWERED many times. But I suppose bloggers can choose to reinvent wheels whenever they want. So WTH,
=COUNT(1/FREQUENCY(MATCH(Fruits,Fruits,0)*(Numbers=1),
ROW(Fruits)-MIN(ROW(Fruits))))-1
More generally,
=COUNT(1/FREQUENCY(MATCH(IDs,IDs,0)*CriteriaArray,
11 July 2007, 10:44 amROW(IDs)-MIN(ROW(IDs))))-1
fzz:
BTW, I liked the SQL answer, especially the Jet-centric nature of it, requiring a nested query rather than the more direct
SELECT Count(Distinct Fruit)
FROM TableHoweverSpecified
WHERE Number=1;
that’s possible using real SQL RDBMSs.
11 July 2007, 10:50 amJan:
Hey Jamie I thought you didn’t trust the Excel SQL. (External Data - Mixed Data Types 2004 3rd June) but I admire the originality. But surely it should be.
SELECT F1
11 July 2007, 4:42 pmFROM Table1
GROUP BY F1, F2=1
HAVING (F2=1)=True
Ola Sandström:
Without restrictions, this should be the easiest:
11 July 2007, 8:01 pm=SUM(1/COUNTIF(C2:C13;C2:C13))-1
Column C: =IF(B2=1;A2)
//Ola
byundt:
This is a harder calculation to make foolproof than it appears.
fzz’ COUNT and FREQUENCY formula is nice because it does not require array entry. However, it underpredicts by 1 if every entry in Fruits is unique and every value in Numbers is 1. And it will return -1 if there are one or more blanks in Fruits.
Ioannis Varlamis’ formula underpredicts by 1 when every entry in Fruits is unique and every value in Numbers is 1. Otherwise, it works.
The formula on Chip’s site (as cited by Jamie Collins) handles the above cases, but underpredicts by 1 when one of the Fruits is blank but its Numbers is still 1. This somewhat shorter array formula behaves similarly. It assumes row 14 is either blank, has a duplicate Fruits or Numbers does not equal 1:
=SUM((N(MATCH(IF(B2:B14=1, A2:A14,”"),IF(B2:B14=1, A2:A14,”"),0)=ROW(A2:A14)-ROW(A$2)+1)))-1
Brad
11 July 2007, 9:07 pmJamie Collins:
Jan: “Hey Jamie I thought you didn’t trust the Excel SQL”
Very observant, but my true feelings on this (i.e. put the data in a SQL DBMS) is probably OT for an Excel blog .
Note I used
GROUP BYmerely to get a distinct set e.g. could also have done this:SELECT DISTINCT F1 AS fruit_name
FROM Table1
WHERE F2 = 1;
A
HAVINGclause is normally reserved for predicates involving set functions; yourF2 = 1(no need for= TRUE) can and IMO should go in theWHEREclause e.g. consider this contrived example:SELECT F1 AS fruit_name, F2 AS foo
FROM Table1
WHERE F2 = 1
GROUP BY F1, F2
HAVING COUNT(*) > 1;
The COUNT(*) cannot be evaluated in the
WHEREclause, hence the need for aHAVINGclause.I recently read a paper by Hugh Darwen, in which he ponders whether
HAVINGwould be in the SQL language at all if early SQL DBMSs had supported derived tables e.g. contrast this to the above:SELECT DT1.fruit_name, DT1.foo
FROM (
SELECT F1 AS fruit_name, F2 AS foo, COUNT(*) AS bar_tally
FROM Table1
GROUP BY F1, F2
) AS DT1
WHERE DT1.bar_tally > 1;
Jamie.
–
12 July 2007, 2:31 amJan:
Please ignore my previous post I must have been drunk.
12 July 2007, 3:24 amWhen I use it properly “Having” is much more elegant than the sub select.
Jamie Collins:
Jan, That’s one of my least favourite Per Gessle songs (”It must have been lunch but I’m sober now…”)
Sure,
HAVINGis elegant and a ‘nice to have’ feature but non-essential and one that confused the heck out of me as a newbie.Jamie.
–
12 July 2007, 4:42 amIan Ashforth:
Brilliant, just what I was looking for and needed this very day, thanks to all, saved me loads of time.
12 July 2007, 5:16 amIoannis Varlamis:
Byundt, thanks for remark.
Perhaps, better:
=SUM(1*((MATCH(A2:A13,A2:A13,0)=ROW(INDIRECT(”1:”&ROWS(A2:A13))))+(B2:B13=1)=2))
(Ctrl+Shift+Enter)
Ioannis
12 July 2007, 9:32 amElias:
Give a ty to this.
=SUMPRODUCT(–(B2:B13=1),–(MATCH(A2:A13&”#”&B2:B13,A2:A13&”#”&B2:B13,0)=ROW(A2:A13)-ROW(A2)+1))
Saludos
12 July 2007, 8:26 pmOla:
Just shortend, Elias formula:
13 July 2007, 6:41 pm=SUM((B2:B13=1)*(MATCH(A2:A13;A2:A13;0)=ROW(A2:A13)-ROW(A2)+1))
As a bonus, it makes it easy to add a 3rd, 4th, … column with restrictions.
//Ola Sandström
Elias:
Hola Ola,
Your formula isn’t working; to use this kind of formula you should concatenate the two columns to find a unique entry.
Saludos
15 July 2007, 8:41 amOla:
Elias, it must have been too early in the morning. Thanks for spotting that.
15 July 2007, 5:11 pm//Ola
Ola:
It could be interesting to know that Google now supports Array formulas.
http://google-d-s.blogspot.com/2007/07/array-formulas-without-ctrl-shift-enter.html
Example:
=ARRAYFORMULA(SUMPRODUCT((B3:B14=1)*(MATCH(A3:A14&”#”&B3:B14;A3:A14&”#”&B3:B14;0)=ROW(A3:A14)-ROW(A3)+1)))
=ARRAYFORMULA(SUM((N(MATCH(IF(B3:B14=1,0;A3:A14;”");IF(B3:B14=1,0;A3:A14;”");0,0)=ROW(A3:A14)-ROW(A$3)+1,0)))-1,0)
Both works when imported: http://spreadsheets.google.com/pub?key=pAxhcqJ1DpABcRxoPqKNFYw
However Zoho seams to have problems
http://web.omnidrive.com/APIServer/public/8qr1RsuHYalOM7wsHSHS4J8w/UniqueFruits2.xls
//Ola
15 July 2007, 7:30 pmJan:
Dick I know it isn’t Excel but I think we need another look at Google Spreadsheets vs Excel article (things have come along in the last year since you last looked at it and to be fair the last article was a bit sparse). We all know this is going to have a long term effect on the Microsoft strangle hold and I would hate to be caught out.
The other day I bought a computer and I dreaded loading all my stuff back on, but it turns out it was a breaze. Nearly everything I needed was already on the web, except MS Office. Then I remembered that other than for work, I didn’t even need that ! How long till all my stuff is on-line at Google?
16 July 2007, 5:03 amOla:
I must say that the ARRAYFORMULA is a smart move by Google (maybe inspired by TED.com).
It’s the sort of thing the computer press would write about - it’s a good story.
If your thinking of it, I have a fresh zip-file with documents from Zoho, OpenOffice and Google that might help:
http://web.omnidrive.com/APIServer/public/8qr1RsuHYalOM7wsHSHS4J8w/Spreadsheet_Functions.zip
//Ola
16 July 2007, 8:16 amPieter Brueghel the Elder:
The best formula is the one you are able to use. Ugliness can be interesting insofar it is functional, but in general, simple is beautiful and desirable in the face of a problem.
From the formulae that solve Dick’s problem as it is, and whithout blanks, the shortest one is 51 characters long, the longest (from Chip pearson’s website) has 207 characters. The one formula I will remember is avner’s; the fact that he concatenates the two colums, to me, just makes his solution more elegant:
=SUM(1*(RIGHT(c2:c13)=”1″)*1/COUNTIF(c2:c13,c2:c13))
The non-array form is longer (58 chars) but just as simple:
=SUMPRODUCT((RIGHT(c2:c13)=”1″)*1,1/COUNTIF(c2:c13,c2:c13))
16 July 2007, 1:31 pmfzz:
FTHOI, more robust.
=COUNT(1/FREQUENCY(MATCH(Fruits&”",Fruits&”",0)
16 July 2007, 3:59 pm*(Fruits”")*(Numbers=1),ROW(Fruits)-MIN(ROW(Fruits))))
-(SUMPRODUCT(1/COUNTIF(Fruits,Fruits&”"),
(Fruits”")*(Numbers=1))
fzz:
@#$% edit box!
FTHOI, more robust.
=COUNT(1/FREQUENCY(MATCH(Fruits&”",Fruits&”",0)
*(Fruits<>”")*(Numbers=1),ROW(Fruits)-MIN(ROW(Fruits))))
-(SUMPRODUCT(1/COUNTIF(Fruits,Fruits&”"),
(Fruits<>”")*(Numbers=1))<ROWS(Fruits))
Ola: does ThinkFree’s spreadsheet handle array formulas?
16 July 2007, 4:02 pmOla:
fzz: Not as good as Google. I took the spreadsheet that I used before and imported it to ThinkFree-Calc. Half of the array formulas worked. Ctrl+Shift+Enter works, but I had to use that to make SUMPRODUCT work so somehow, Googles solution feels more robust. What speaks for ThinkFree is that it has the best ‘Excel feeling’ to it and the graphics that was imported in the spreadsheet came along just perfect.
I made a quick update of the formula comparison sheet (ThinkFree is now included): http://web.omnidrive.com/APIServer/public/8qr1RsuHYalOM7wsHSHS4J8w/Spreadsheet_Functions2.zip
This made me think, maybe there should be an acid-test spreadsheet here. It could be a good challenge for all developers. My second thought was, what happens when everyone come up to standard? Will we have a group of developers who will want to go further? Could we get that long sought integration of the Rocks and the Water (Juice Analytics)?
16 July 2007, 7:18 pm//Ola
Anthony:
Refined Jamie’s formula; no additional columns and works without reference to rows.
=SUM(IF(FREQUENCY(IF(B2:B13=1,MATCH(A2:A13,A2:A13,0),”"),IF(B2:B13=1,MATCH(A2:A13,A2:A13,0),”"))>0,1))
(Ctrl+Shift+Enter)
17 July 2007, 1:49 amOla:
EditGrid is also included in the file. When if comes to number of formulas, no one beats EditGrid (500+).
If you like to try editgrid.com I made an open account - free for us (everyone). Lgn: ExcelUser Paswrd: ExcelUser (misspelled to avoid searchers). There is one file for now, UniqueFruits: http://www.editgrid.com/user/exceluser/UniqueFruits2 but you can test and upload any file, as long as it’s smaller than 2Mb.
17 July 2007, 7:11 am//Ola
Jonathan:
Ola,
I have been meaning to get around to a thorough side by side comparison for some time now. Thank you so much for doing this.
I will check in with our developers at ThinkFree to see about the Array formula problem you mentioned.
Are you planning on doing a similar comparison with charts and graphs or determining compatibility with Excel?
We definitely agree with you - EditGrid is a very cool spreadsheet application. ThinkFree Calc comes in two modes - Quick Edit (AJAX) and Power Edit (Java). We partnered with TnC (the makers of EditGrid) to have EditGrid replace our current version of Quick Edit Calc. Hopefully the integration will be done in the next month or so.
Thanks,
18 July 2007, 11:34 amJonathan
zb:
I’ve slightly refined Anthony’s refined version of Jamie’s formula
=SUM(SIGN(FREQUENCY(IF(B2:B13=1,MATCH(A2:A13,A2:A13,0),”"),IF(B2:B13=1,MATCH(A2:A13,A2:A13,0),”"))))
(Ctrl+Shift+Enter)
I’m using it to count multiple columns on a large sheet, so processing speed/efficiency is an issue, and the SIGN(…) function is slightly more efficient than the IF(…,>0,1) function.
19 July 2007, 6:37 amOla:
Jonathan,
Thanks for the appreciation. EditGrid and Thinkfree sounds like an interesting cooperation. I look forward to try the end result.
The “function comparison file” is now updated. It will never be 100% correct but it’s probably the best function comparison to date. The biggest problem was to handle the differences between the Manual and the User Interface (fx button). All suppliers have some updating to do.
Have I though about to do Charts and Graphs comparison? No, I could consider a small acid test for spreadsheet functions but not for charts and graphs. Someone else might feel inclined though?
I though I mention it also, ThinkFree often freezes my computer (had to reboot 3 times). So any improvement there would be appreciated.
zb,
I have taken the liberty to put your solution in an EditGrid test file (link below).
Dick,
I am sorry if you disapprove of this the conversation since it’s not related to your original post.
//Ola
Function comparison file: http://web.omnidrive.com/APIServer/public/8qr1RsuHYalOM7wsHSHS4J8w/Spreadsheet_Functions2.zip
20 July 2007, 11:36 amTest file: http://www.editgrid.com/user/exceluser/UniqueFruits2
To edit the test file, see login/psw in the previous post
fzz:
*IF* online spreadsheets made it relatively simple to process results of web queries via FORMULAS, provided some form controls for use inside worksheets, and provided hidden worksheets and protection so that only a spreadsheet’s creator ID could unprotect it, then I could see online spreadsheets as a fairly complete replacement for spreadsheet models that don’t need VBA/macros.
Naively, I’d assume it wouldn’t be too difficult for the online spreadsheets to provide these security features (hiding rows, columns, sheets and creator-only protect/unprotect). Actually, that alone would make it superior to Excel in terms of IP protection.
As for the latter, maybe a function named wget (after the Unix-originated utility of that same name) that would return whatever it’s url argument returned. In the short term, it’d be sufficient just to be able to enter the result as an array formula that could be accessed by other, parsing formulas.
Just thinking out loud.
20 July 2007, 12:05 pmMichael:
Hello All -
Is it possible to determine the number of unique items in a filtered list? I wrote a UDF that returns TRUE() if the row is visible, but I can’t figure out the array to multiply it by, and I didn’t get anywhere using SUBTOTAL().
For instance, in Dick’s above example, if we filter on “Apple” I would like to know that of the three 1’s and two 2’s in Column B, there are just 2 unique values in Column B.
Thanks,
14 August 2007, 7:09 amMichael
Dorothy:
Michael,
Will you please share an example of the UDF you wrote that returns TRUE() if the row is visible?
Thanks, Dorothy
17 October 2007, 8:44 amMichael:
Hi Dorothy -
Couldn't be much simpler. Glad to share.
Application.Volatile
IsVisible = Not Cell.EntireRow.Hidden
End Function
...Michael
19 October 2007, 10:46 am