Exporting mbed wiki pages

This is a piece of Python 2.7.x code I wrote as a workaround for saving mbed wiki pages to a personal repository, so that all content can be regenerated if it is ever lost. The work was prompted by this question.

import HTMLParser
import urllib2

# The URL must end with the "?action=edit" query string, for example:
# "https://developer.mbed.org/teams/Digi-International-Inc/code/XBeeLib/wiki/Homepage?action=edit"

def find_between(s, first, last):
    """Return the text between the first occurrence of `first` and the next occurrence of `last`."""
    start = s.index(first) + len(first)
    end = s.index(last, start)
    return s[start:end]

def save_wiki_plaintext_in_file(url, output_file):
    # Decode the raw page bytes to unicode before unescaping (assumes the page is served as UTF-8)
    file_contents = urllib2.urlopen(url).read().decode("utf8")
    textarea_contents = find_between(file_contents, "<textarea", "</textarea>")
    # Skip the textarea attributes, as in <textarea cols="40" id="id_content" name="content" rows="20" style="width: 100%;">
    textarea_contents = textarea_contents.split(">", 1)[1]

    html_decoded_string = HTMLParser.HTMLParser().unescape(textarea_contents)

    # Convert all end-of-lines into Unix style
    html_decoded_string = html_decoded_string.replace("\r\n", "\n")
    html_decoded_string = html_decoded_string.replace("\r", "\n")

    # Drop the single newline that follows <textarea> and make sure the file ends with a newline
    if html_decoded_string.startswith("\n"):
        html_decoded_string = html_decoded_string[1:]
    if not html_decoded_string.endswith("\n"):
        html_decoded_string += "\n"

    # Encode back to UTF-8 bytes when writing
    with open(output_file, "w") as f:
        f.write(html_decoded_string.encode("utf8"))
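To check the parsing logic without fetching a live page, the text-processing part of the function can be separated from the network and file I/O. The sketch below is a minimal, hypothetical refactoring: the names extract_wiki_text and to_edit_url are illustrative only, not part of mbed or of the code above. It assumes the page has already been decoded to a unicode string and that the wiki source sits in the first textarea of the edit page. It uses html.unescape on Python 3 and falls back to HTMLParser on Python 2.7.

```python
# Hypothetical helpers (not an mbed API): pure text processing, no network access.
try:
    from html import unescape  # Python 3.4+
except ImportError:  # Python 2.7 has no "html" module
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def to_edit_url(page_url):
    """Append the "?action=edit" query string unless it is already present."""
    if page_url.endswith("?action=edit"):
        return page_url
    return page_url + "?action=edit"


def extract_wiki_text(page_html):
    """Return the unescaped, Unix-newline contents of the first <textarea>.

    page_html is assumed to be a decoded (unicode) string, not raw bytes.
    """
    start = page_html.find("<textarea")
    if start == -1:
        raise ValueError("no <textarea> found in page")
    # Skip the tag's attributes: the content starts after the first ">".
    content_start = page_html.index(">", start) + 1
    content_end = page_html.index("</textarea>", content_start)
    text = unescape(page_html[content_start:content_end])

    # Convert Windows and old Mac line endings to Unix style.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Browsers ignore one newline right after <textarea>; mirror that.
    if text.startswith("\n"):
        text = text[1:]
    # Make sure the saved text ends with a newline.
    if not text.endswith("\n"):
        text += "\n"
    return text
```

For example, a textarea holding "\r\nHello &amp; welcome\r\nLine 2" comes out as "Hello & welcome\nLine 2\n". Keeping the parsing in a pure function makes it easy to test against saved HTML before pointing it at the live site.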

