Remove Duplicates From Arrays

I have a number of arrays that can’t contain duplicates. Not just within themselves, but between them as well. It would be easy enough to dump them into a big array, sort it, and remove the duplicates, but I need to put the non-duplicate numbers back into the original array. Eventually, I need to select a limited number of random entries from each array and the number selected needs to be somewhat equally distributed among the arrays.

I don’t know of any way to remove a single element from an array, so I would have to load the non-duplicate elements into another array. My first attempt at this failed because I wasn’t able to

<span class="text">ReDim</span>

a dynamic array that was contained in another array. For instance,

Dim aMain(0 To 0) As Variant
Dim aOne() As Variant
 
aMain(0) = aOne
 
redim amain(0)(0 to 1)

The

<span class="text">ReDim</span>

is a syntax error. I ended up attempting to add each element of each array to a Collection to identify the duplicate elements. If the Add method of the Collection object produced an error, I knew the element was a duplicate and I changed the element to an empty string so I could identify it later. Because I would be pulling elements out of each array later and I wanted the selected elements to be roughly equally distributed among the arrays, I needed to remove the duplicates from the larger arrays. The first operation I had to perform was to sort the arrays by size. Luckily the order of the secondary arrays within the primary array doesn’t matter.

I believe that all the secondary arrays need to be Variants for this to work. Normal dynamic arrays like

<span class="text">Dim aMyArr() as Variant</span>

won’t work because of the way the temporary array is reassigned back to the original. The variables have to be declared like

<span class="text">Dim vaMyArr as Variant</span>

.

Sub StartArrays()
   
    Dim vaOne As Variant
    Dim vaTwo As Variant
    Dim vaThree As Variant
    Dim vaMain As Variant
   
    ‘Set up some secondary arrays
   vaOne = Array(1, 2, 3, 4, 5)
    vaTwo = Array(5, 6, 7)
    vaThree = Array(1, 5, 8, 9)
   
    ‘load the primary array with the secondary ones
   vaMain = Array(vaOne, vaTwo, vaThree)
   
    ‘pass the primary array to remove the dupes
   ‘and display the results
   RemoveDuplicates vaMain
    ShowResults vaMain
   
End Sub
 
Sub RemoveDuplicates(ByRef vaMain As Variant)
   
    Dim i As Long, j As Long, k As Long
    Dim vItm As Variant
    Dim cDupes As Collection
    Dim aTemp() As Variant
    Dim vaSort As Variant
   
    ‘check for array arguments
   If Not IsArray(vaMain) Then Exit Sub
   
    For i = LBound(vaMain) To UBound(vaMain)
        If Not IsArray(vaMain(i)) Then Exit Sub
    Next i
   
    ‘Sort arrays by number of elements so as to remove duplicates from
   ‘the most populous arrays
   For i = LBound(vaMain) To UBound(vaMain) – 1
        For j = i + 1 To UBound(vaMain)
            If UBound(vaMain(i)) > UBound(vaMain(j)) Then
                vaSort = vaMain(i)
                vaMain(i) = vaMain(j)
                vaMain(j) = vaSort
            End If
        Next j
    Next i
   
    ‘blank out array elements that are duplicates.  Later
   ‘I’ll test the length of the element to remove those
   ‘elements that were duplicates and are now empty strings
   Set cDupes = New Collection
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
            On Error Resume Next
                cDupes.Add vaMain(i)(j), CStr(vaMain(i)(j))
               
                If Err.Number > 0 Then
                    vaMain(i)(j) = “”
                    Err.Clear
                End If
            On Error GoTo 0
        Next j
    Next i
   
    ‘put non-blank elements in a temp array and reassign
   ‘the temporary array back to the secondary array
   For i = LBound(vaMain) To UBound(vaMain)
   
        ‘reinitialize variables for each secondary array
       ReDim aTemp(0 To 0)
        k = 0
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
       
            ‘skip elements that are empty strings
           If Len(vaMain(i)(j)) > 0 Then
           
                ‘assign non-empty strings to the temporary array and
               ‘increment the counter
               ReDim Preserve aTemp(0 To k)
                aTemp(k) = vaMain(i)(j)
                k = k + 1
            End If
        Next j
       
        ‘clear out the secondary array once the temp array is filled
       Erase vaMain(i)
       
        ‘assign the temporary array back to the recently-erased secondary array
       vaMain(i) = aTemp
    Next i
   
End Sub
 
Sub ShowResults(ByRef vaMain As Variant)
   
    Dim i As Long, j As Long
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
            Debug.Print i, vaMain(i)(j)
        Next j
        Debug.Print “—————”
    Next i
   
End Sub

The actual secondary arrays I’m working with are two dimensional, so my next task is revise this code to work with two dimensional arrays.

4 Comments

  1. Tushar Mehta says:

    Interesting exercise. FWIW, I wrote up an alternative approach to the task and posted the result to he programming NG.

    I would have posted it here but the code always gets all messed up. And, yes, if there is a way to fix that I don’t know it. :)

    The post’s ID is MPG.1e399e4a4bdd1b3598b2fa@msnews.microsoft.com

    The direct google.com archive is http://groups.google.com/group/microsoft.public.excel.programming/msg/b923181ccbdff1e1

  2. ross says:

    test some code:

    “‘Checks if SO, PO, YO, CO need to be corrected
    “‘ Function is passed and will ruten the WHOLE outcode
    Function O_Excprtions(sOutcode As String)
    Dim sFirstLetter As String
    sFirstLetter = UCase(Left(sOutcode, 1))
    ‘ if one of the do-dars
    If sFirstLetter = “S” Or sFirstLetter = “P” Or _
    sFirstLetter = “Y” Or sFirstLetter = “C” Then
    ‘replace with O
    O_Excprtions = WorksheetFunction.Replace(sOutcode, 2, 1, “O”)
    Else
    O_Excprtions = sOutcode
    End If
    End Function

  3. How to make a cell constant during sum.

    For example:

    Date Files Done Daily Total Sum

    1/20/20065 30
    1/21/20065
    1/22/20065
    1/23/20065
    1/24/20065
    1/25/20065

    I review daily certain number of files.
    I want to make the total sum constant.
    that whenever i review more files daily it should be added automatically to Total Sum.

    And its also should not effect the Total Sum, if i a value under, Files Done Daily.

    Thanks,

  4. J.O. says:

    This code does NOT remove duplicates!

Leave a Reply