Copying HTML Tables over Merged Cells

I ran into some strangeness while copying tables from the web to Excel. To demonstrate, I’ll use Debra’s sample data table. I run this code

Sub GetTable()

    Dim ieApp As Object
    Dim ieDoc As Object
    Dim Clip As DataObject  'requires a reference to Microsoft Forms 2.0 Object Library

    Set ieApp = CreateObject("InternetExplorer.Application")

    With ieApp
        .navigate "http://www.contextures.com/xlsampledata01.html"
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        Set ieDoc = .document
        Sheet1.UsedRange.ClearContents

        'wrap the table's outer HTML so Excel parses it as a table on paste
        Set Clip = New DataObject
        Clip.SetText "<html>" & ieDoc.all.tags("TABLE").Item(3).outerhtml & "</html>"
        Clip.PutInClipboard
        Sheet1.Range("A1").Select
        Sheet1.PasteSpecial "Unicode Text"

        .Quit
    End With

End Sub

That code automates Internet Explorer, navigates to the page, gets the table, puts it in the clipboard, and pastes it to a cell.

Nice. I delete the data and merge A1:C1.

I run the code again and this time the paste comes in truncated.

Not so nice. I fixed it by adding this line to the code:

Sheet1.UsedRange.UnMerge

That was easy enough. The harder part was that I didn’t know merged cells were causing the problem. Some tables came in fine and others were truncated at seemingly random spots. Once I discovered that some of the cells were merged, I deleted those rows and it ran fine. Then later it would screw up again. Finally I figured out that the HTML tables had colspan attributes in some of the td tags. A colspan attribute, when pasted into Excel, creates merged cells. Now you know.
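
If you’re wondering where that line goes, one natural spot is right before the old contents are cleared, so any merged cells left over from a previous paste (or created by a colspan) are broken apart before the new table comes in. A minimal sketch of that part of GetTable:

        Set ieDoc = .document
        Sheet1.UsedRange.UnMerge        'break apart any merged cells before clearing and pasting
        Sheet1.UsedRange.ClearContents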

And if you’re wondering, the tables in question were on a website that required a login, so I couldn’t use a web query. By automating IE, I could navigate to the login page, fill in the form, then navigate to the page with the tables I wanted. I’ll post about that as soon as I work out a few bugs.
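
In the meantime, here’s roughly the shape that takes. This is a sketch only: the URLs, the element IDs, and the form index are placeholders for whatever the real login page uses, and there’s no error handling.

Sub LoginThenNavigate()

    Dim ieApp As Object
    Dim ieDoc As Object

    Set ieApp = CreateObject("InternetExplorer.Application")

    With ieApp
        .Visible = True
        .navigate "http://www.example.com/login"    'placeholder login page
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        'fill in the form - inspect the real page for the actual element ids
        Set ieDoc = .document
        ieDoc.getElementById("username").Value = "myname"
        ieDoc.getElementById("password").Value = "mypassword"
        ieDoc.forms(0).submit
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        'then navigate to the page with the tables and reuse the
        'clipboard technique from GetTable above
        .navigate "http://www.example.com/tables"   'placeholder data page
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
    End With

End Sub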


10 thoughts on “Copying HTML Tables over Merged Cells”

  1. The Worksheet.PasteSpecial method has the Format parameter “Unicode Text”. What are all of the possible options for that argument? Help doesn’t provide an enum list.

  2. Maybe you could run this sub before you bring the table in:

    Sub Prepare()
        Dim rng As Range
        For Each rng In Sheet1.UsedRange
            If rng.MergeCells Then
                MsgBox "You have merged cells. Please slap yourself.", vbExclamation
                rng.UnMerge
            End If
        Next rng
    End Sub

  3. Oh, wow. Every day, it seems, I find something new and useful that Excel can do.

    Is there a simple way to grab only a section of a table from a web page?

    For example, on ‘http://finance.yahoo.com/q/pm?s=NAESX+Performance’ I would only like to grab the ‘Past Quarterly Returns (%) for NAESX’ section of the table.

    By the way, I love your Disconnected Recordset post. Thanks for sharing your knowledge.

  4. You might consider this instead; it needs a reference to Microsoft XML, version 2.0:

    Sub faster()
        Dim Clip As DataObject  'needs a reference to Microsoft Forms 2.0 Object Library

        With New XMLHTTP
            'synchronous request - .send doesn't return until the page is loaded
            .Open "GET", "http://www.contextures.com/xlsampledata01.html", False
            .send

            'grab the fourth table from the source and wrap it so Excel parses it
            Set Clip = New DataObject
            Clip.SetText "<html><table " & Split(Split(.responseText, "<table ")(4), "/table>")(0) & "/table></html>"
            Clip.PutInClipboard

            [K1].PasteSpecial
        End With
    End Sub
  5. This is extremely useful. This is what I have been trying to write for the past year. Thank you very much.

    If there is more than one table, how do I select a particular table?

    Take this URL:
    http://in.finance.yahoo.com/q?s=CCCL.NS&ql=0

    There are many tables. When I use the full address of this URL in the .navigate statement of your code, like this:

    .navigate "http://in.finance.yahoo.com/q?s=CCCL.NS&ql=0"

    and run the macro, I get the “comparison” table. Suppose I want to get only the “key statistics” table; how do I modify the statement above?

  6. Those of you interested in extraction of particular items from web pages may be interested in my free, open-source add-in:

    http://finance.groups.yahoo.com/group/smf_addin (or just click on the link on my name)

    The add-in is basically a collection of user-defined functions to extract data out of the source code of web pages. It uses the XMLHTTP process to grab the data and parse the page based on parameters passed to the function.

    For example, to get the Market Capitalization of ticker symbol “MMM” from Yahoo’s Key Statistics page:

    =RCHGetTableCell("http://finance.yahoo.com/q/ks?s=MMM",1,"Market Cap (intraday)")

    Documentation on the functions is in the “Files” area of the Yahoo group, and the “Links” area of the group contains links to tips and FAQs.

  7. Thank you for the explanation. I have used the SMF add-in for Yahoo Finance and for Indian stocks, and it works seamlessly. Is there any restriction on the number of symbols? You have said to highlight B2:M100; does that mean there is a restriction of 100 symbols?

    My second comment is about this:
    quote
    Those of you interested in extraction of particular items from web pages may be interested in my free, open-source add-in:

    http://finance.groups.yahoo.com/group/smf_addin (or just click on the link on my name)
    unquote

    I am interested in downloading tables from any webpage, not only Yahoo Finance.
    Can I use the above approach, just replacing the webpage?

    Thanks for having the patience to try to educate us.

  8. >>> [Does the add-in have] any restriction on the number of symbols?

    It has a limit of 1000 web pages at a time, because that’s the size of the array I use to store web pages from which to extract data. I’d prefer people NOT use it to download a ton of data from sites — such as to load a database — because I don’t want the free data providers to be overburdened. It wouldn’t be that difficult for them to make the pages inaccessible.

    However, the RCHGetYahooQuotes() function can grab numerous data items on up to 200 ticker symbols, because it uses a CSV file from Yahoo for that purpose. So this function is independent of the 1000-web page limit mentioned above, because it’s not extracting the data from a web page. The CSV file from Yahoo is great, because all of the data can be retrieved with a single Internet access. Most of the other sources would require at least one page per ticker symbol.

    Both Zacks and FinViz also have multiple fields and multiple company download files that can be collected with a single Internet access, using the export options of their screening process.

    >>> I am interested in downloading tables from any webpage, not only Yahoo Finance.
    Can I use the above approach, just replacing the webpage?

    Different function. To download the entire table containing the data item from my previous example:

    =RCHGetHTMLTable("http://finance.yahoo.com/q/ks?s=MMM","Market Cap (intraday)",-1,"",1)

    One disadvantage of using this versus getting the individual table cells is that retrieval of the entire table has to be done with an array-entered formula over a worksheet range. That means the retrieved data cannot be sorted.

