Multi Column Match

Here is a little trick for matching two values against a two column table.
It’s much like creating a hash code for each row, then matching the hash code.

With this spreadsheet, the top section allows lookup, the lower section is the lookup table.

Here we are searching for a Value by looking up a City and Season:

The array formula in cell C3:
=MATCH(A3 & CHAR(1) & B3, A7:A16 & CHAR(1) & B7:B16, 0)
Enter the formula by pressing Ctrl+Shift+Enter.

The formula in cell D3:
=INDEX(C7:C16, C3)

We simply concatenate A3 and B3 to match against concatenated A7:A16 and B7:B16.

The CHAR(1) is to ensure uniqueness.
For example: “userdata” & “base” would also match “user” & “database”
Fixed by: “userdata” & CHAR(1) & “base” does not match “user” & CHAR(1) & “database”

Update:
Daniel M points out a different, more efficient, match lookup using boolean comparison:
The array formula in cell C3:
=MATCH(1, (A3=A7:A16) * (B3=B7:B16), 0)
Enter the formula by pressing Ctrl+Shift+Enter.

D3 remains the same.

This works because…
A3=A7:A16 results in an array of TRUE, FALSE values
Arithmetic on a boolean results in a number. eg. TRUE * FALSE = 0.
Only TRUE * TRUE will result in the “1″ MATCH is looking for.

Thanks Daniel!

21 Comments

  1. Ola Sandström:

    Good idea. I will use that.

  2. Jon Peltier:

    You can replace C3 and D3 with a single array formula in C3:

    =INDEX(C7:C16, MATCH(A3 & CHAR(1) & B3, A7:A16 & CHAR(1) & B7:B16, 0))

    Enter the formula by pressing Ctrl+Shift+Enter.

    - Jon

  3. Daniel M.:

    Hi Rob,

    IMO, it’s better to use a boolean matrix (less space, particularly in the case of long names):

    As in the following ARRAY formula:

    =INDEX(C7:C16,MATCH(1,(A3=A7:A16)*(B3=B7:B16),0))

    For anyone confortable with summing with criterias (à la SOMMEPROD), it follows the same pattern.

    - Daniel M.

  4. excelthoughts:

    I have been looking for something like this for a while. Excellent.

    Andrew

  5. simon:

    Is this a lot faster than creating a that combines the City and Season and performing a VLOOKUP instead?

  6. Z:

    I would echo simon’s question. Any thoughts?

  7. Rob van Gelder:

    Simon,

    VLOOKUP only performs matching on one column. This is why the multicolumn MATCH is useful.

    Single column vs. multi column lookup cannot be directly compared.

    If I was a betting man, I’d say VLOOKUP would win.

  8. Rob van Gelder:

    Simon,

    Sorry - I misread your question.

    The overhead of so many formulas in your helper column would mean VLOOKUP is slower.

    Cheers,
    Rob

  9. jkpieterse:

    Rob:

    Array formulas aren’t the most efficient regarding calculation time.

    I created a three column range (A1:C65531) with RAND formulas.

    With this formula:

    {=SUM((A1:A65531>0.5)*(B1:B65531>0.5)*(C1:C65531>0.5))}

    The calc time on my system (workbook recalc) was 0.213 seconds.

    Next I removed the formula and filles column D with:

    =AND(A1>0.5,B1>0.5,C1>0.5)

    And replaced the array formula with a simple:

    =COUNTIF(D1:D65531,TRUE)

    Calc time now equals 0.156, which is about 27 percent faster.

    I have had a file that I was able to speed up by a factor three (!) just by removing the array formulas and replacing them with helper columns.
    I still am a great fan of array formulas though :-)

  10. Rob van Gelder:

    JK:
    Very helpful - thank you.

    Simon:
    Please ignore my previous two comments. The 1st comment I misread - the 2nd I totally botched my experiment for calculation timing. Sorry.
    I hate spreading misinformation like that - :(

    Cheers,
    Rob

  11. Jon Peltier:

    JK -

    “Array formulas aren’t the most efficient regarding calculation time.”

    Back when Excel 97 came out, I was building a resource deployment worksheet. List of personnel down column A, list of projects across row 1, percent of FTE in the grid. I had it all worked out, using a sheet with three columns as the data, and displaying the output in a pivot table (which I’d just encountered for the first time, and thought they were pretty neat).

    But nobody else could figure out the pivot table, or should I say, none of the bosses. And I wasn’t programming these things yet. So I was told to find another way.

    The answer was array formulas. And manual calculation, because so many dozens of array formulas can choke your system. Changing any cell in the grid required 45 minutes to update the row and column totals. The pivot tables were taking, what, some fractions of a second to update. I could have explained the pivot table in just one of the 45 minute delays in array formula calculation (or maybe two, because they were the bosses).

    - Jon

  12. jkpieterse:

    Jon:

    “or maybe two, because they were the bosses”

    :-))))

  13. Jazzer:

    Hi,

    How about using Excel’s excellent database functions?

    =DGET(A6:C16,”Value”,A2:B3)

    - Asser

  14. ChrisNoyes:

    Great Post, i used that to great effect on a quoting spreadsheet based on courier destination and state!

  15. Tony Colpus:

    The DGET function looks to be quicker than an array But…How do I extend down below the top row

    week crew time
    W1 C 14 =DGET($A$6:$C$15,”time”,$F$2:G3)
    W2 C
    W3 C

  16. jehangir:

    pls provide solution ffor multiple coloumn and multiple row solution eg
    As/Ac Ab/Ac 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
    0.5 0.25 3.44 0.7 0.3 0.2 0.17 0.16 0.16 0.17 0.18
    0.5 0.5 11 2.37 1.06 0.64 0.52 0.47 0.47 0.47 0.48
    0.5 1 60 13 4.78 2.06 0.96 0.47 0.31 0.27 0.26
    0.75 0.25 2.19 0.55 0.35 0.31 0.33 0.35 0.36 0.37 0.39
    0.75 0.5 13 2.5 0.89 0.47 0.34 0.31 0.32 0.36 0.43
    0.75 1 70 15 5.67 2.62 1.36 0.78 0.53 0.41 0.36
    1 0.25 3.44 0.78 0.42 0.33 0.3 0.31 0.4 0.42 0.46
    1 0.5 15.5 3 1.11 0.62 0.48 0.42 0.4 0.42 0.46
    1 1 67 13.75 5.11 2.31 1.28 0.81 0.59 0.47 0.46

    Branch Duct
    As =Area of duct before branch entry 300 Qs =Airflow in duct before branch enrty 3200
    Ac =Area of duct after branch entry 400 Qc =Airflow in duct after branch enrty 4000
    Ab =Area of brach duct 400 Qb =Airflow in branch duct 800
    As/Ac 0.75
    Ab/Ac 1.00 Qb/Qc 0.20

    Branch Run (Co) 1.36

    the answer should be 15.00

  17. Rudolph:

    Good Job! Keep it up.

  18. Jess:

    I would like some suggestions if possible. I have 2 columns of data from 2 sources. One is a catalog of cd’s and the other the a database file of what should be in the cd collection. I need a way to organize the data. I need if cd001 is in A2 and its matching counter part cd001 is also on line B724, I need them to find each other. However the data is not all alike cd538 is in column A but not in column B and cd539-2 may be in column B but not A. I need the non-matching entries to have there own line. This way I can compare the two sources of information. Thanks for any help.

  19. Eric:

    Wondeful help with 2 columns vlookup…It help me solve my most tricky problems.

  20. Jason B:

    I used this again today and the recalculation is a bit slow, but the result is great as always…

  21. MaN:

    hi Rob…it is possible to do this in java programming? how do is use this in java? can you help me?

Leave a comment