Using the Blogger Python API to publish from Markdown files

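This post continues an earlier idea: write blog posts in Markdown, whose syntax is simple and easy to read, convert them to HTML, and then have a Python program scan the posts, decide whether each one is new or needs updating, and upload the converted HTML through the Blogger API as a new post or as an update to an existing one. Below I record the difficulties I ran into along the way and what I learned.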

1. How to use the Blogger Python API

Install Blogger APIs Client Library

Install the Python library with pip: pip install --upgrade google-api-python-client

How to build the Authorization Request URL and obtain the access token & refresh token

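This document explains which URL to send the user to for authentication. Apps that use Google APIs normally redirect to Google's authorization page automatically, so that the user can grant the app access to their data. The document covers what the HTTP requests and responses actually contain if you want to build them yourself.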
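As a rough illustration of what "building the request yourself" means, the sketch below assembles an authorization URL by hand. It is written for Python 3, unlike the Python 2.7 snippets in the rest of this post. The endpoint, scope, and parameter names are the ones Google's OAuth 2.0 web-server flow uses as far as I know; build_auth_url, the client ID, and the redirect URI are placeholders made up for this example, so check the linked document for the authoritative list.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint and the Blogger scope.
AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth'
BLOGGER_SCOPE = 'https://www.googleapis.com/auth/blogger'


def build_auth_url(client_id, redirect_uri):
    """Return the URL the user should open to grant access (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'response_type': 'code',   # ask for an authorization code
        'scope': BLOGGER_SCOPE,
        'access_type': 'offline',  # also request a refresh token
    }
    return AUTH_ENDPOINT + '?' + urlencode(params)


# Placeholder values, not real credentials.
url = build_auth_url('YOUR_CLIENT_ID', 'http://localhost:8080/')
print(url)
```

The authorization code that Google sends back to the redirect URI is then exchanged for the access token and refresh token in a second request, which the client library in the next example handles for you.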

Simple API & Authorized API example

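This example does the authentication through the Google client library before calling the API. A Simple API accesses public information and needs no user authorization, while an Authorized API needs the user's consent to access private data. Since my goal is to publish and update Blogger posts from Python, I need the Authorized API. With this example it is quite easy: it even opens the web browser automatically to run the authorization step so the user can grant access.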

The Google API is also easy to follow: first create a service object with build, then call the methods that the Blogger API provides directly through that service object.

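import pprint
from googleapiclient.discovery import build

# In the class constructor; http is an already-authorized HTTP object
self.service = build('blogger', 'v3', http=http)

def getBlogs(self):
    # List the blogs that belong to the authenticated user
    blogs = self.service.blogs()
    thisuserblogs = blogs.listByUser(userId='self').execute()
    pprint.pprint(thisuserblogs)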

2. How to tell whether a local markdown post has changed

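Once the basic Blogger API calls work, the next question is how to decide which markdown files in the folder are new posts to publish and which are updates to existing posts. A simple approach is to compute a checksum of the post content. The content uploaded to Blogger is HTML, so the checksum is taken over the HTML. The local md files therefore have to be converted to HTML first and then checksummed before they can be compared with the content on Blogger.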
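A minimal sketch of the checksum idea, in Python 3 syntax. The post does not say which hash is used, so SHA-1 via hashlib is an assumption here, and html_checksum is a name invented for the example.

```python
import hashlib


def html_checksum(html):
    """Return a hex digest of the HTML text (SHA-1 is an assumed choice)."""
    return hashlib.sha1(html.encode('utf-8')).hexdigest()


local_html = '<p>Hello</p>'
remote_html = '<p>Hello</p>'
edited_html = '<p>Hello, world</p>'

# Identical HTML gives identical checksums; any edit changes the digest.
print(html_checksum(local_html) == html_checksum(remote_html))
print(html_checksum(local_html) == html_checksum(edited_html))
```

The comparison is only meaningful if the HTML read back from Blogger is byte-for-byte what was uploaded, so both sides must go through the same conversion and encoding.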

2.1 Connect to Blogger, fetch all published posts, and build a local database

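Using the Blogger API, download the full information for every post in the specified blog and store the fields I need in a local sqlite3 database. The Blogger API returns each post's id, title, content, published timestamp, and updated timestamp. I compute a checksum of the content before storing the row, so that local posts can later be compared with the posts on Blogger.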
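The storage step could look roughly like the following Python 3 sketch. The table layout, the save_posts helper, and the sample record are all assumptions made for illustration; the sample only mimics the fields named above and is not a real API response.

```python
import hashlib
import sqlite3


def save_posts(conn, posts):
    """Store post metadata plus a checksum of each post's HTML content."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS posts ('
        'id TEXT PRIMARY KEY, title TEXT, checksum TEXT, '
        'published TEXT, updated TEXT)'
    )
    for p in posts:
        checksum = hashlib.sha1(p['content'].encode('utf-8')).hexdigest()
        # INSERT OR REPLACE keeps one row per post id when re-syncing.
        conn.execute(
            'INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)',
            (p['id'], p['title'], checksum, p['published'], p['updated']),
        )
    conn.commit()


# Sample data shaped like the fields named above; not a real API response.
sample = [{
    'id': '1', 'title': 'Hello', 'content': '<p>Hello</p>',
    'published': '2016-12-22T01:43:04-08:00',
    'updated': '2016-12-22T01:43:04-08:00',
}]
conn = sqlite3.connect(':memory:')
save_posts(conn, sample)
rows = conn.execute('SELECT title, checksum FROM posts').fetchall()
print(rows)
```

Only the checksum is stored, not the content itself, which keeps the database small while still allowing the later comparison.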

2.2 Scan the local posts (markdown .md files) and convert them to HTML

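The md files must be converted to HTML before the checksum can be computed. Some markdown syntax is non-standard, so if I relied on any of those handy non-standard features I would have to pick a converter that supports them as well. I don't think I have used any yet, so I use Python's markdown library (ref link).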

Install Python markdown: pip install markdown

Conversion command

python -m markdown readme.md > readme.html

When called from Python, it supports extensions, including fenced code blocks (see here). The extra extension already includes fenced code blocks, so I use markdown.extensions.extra directly.

import markdown
html = markdown.markdown(content, extensions=['markdown.extensions.extra'])

So the flow is:

Scan every .md file (posts in markdown format)
Convert each md file to HTML and compute its checksum
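The two steps above can be sketched as a single function, shown here in Python 3 syntax. scan_markdown and its to_html parameter are hypothetical names, and SHA-1 is again an assumed hash. The converter is passed in as an argument so the demo runs without the markdown package installed.

```python
import hashlib
import os
import tempfile


def scan_markdown(folder, to_html):
    """Convert every .md file in folder; return {filename: (html, checksum)}."""
    results = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith('.md'):
            continue  # skip anything that is not a markdown post
        with open(os.path.join(folder, name), encoding='utf-8') as f:
            html = to_html(f.read())
        checksum = hashlib.sha1(html.encode('utf-8')).hexdigest()
        results[name] = (html, checksum)
    return results


# Demo with a stand-in converter and a temporary folder.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'post.md'), 'w', encoding='utf-8') as f:
    f.write('hello')
with open(os.path.join(demo_dir, 'notes.txt'), 'w', encoding='utf-8') as f:
    f.write('ignored')

found = scan_markdown(demo_dir, lambda text: '<p>' + text + '</p>')
print(sorted(found))
```

In real use the converter would be something like lambda text: markdown.markdown(text, extensions=['markdown.extensions.extra']), matching the call shown earlier.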

2.3 Compare the md file's modification time with the Blogger post's update time

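Besides comparing checksums, if the local md file's modification time is earlier than the post's updated time on Blogger, the local content is older than what is online, so there is no need to push it to Blogger. The md file's modification time therefore has to be recorded. In addition, when converting to HTML, if the HTML file is older than the md file, the HTML has to be regenerated and its checksum updated so that the later checksum comparison works correctly.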
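The "is the HTML file stale?" check can be done with file modification times. This is a minimal Python 3 sketch; needs_rebuild is a made-up helper name, and the demo sets the timestamps explicitly so the result does not depend on how fast the files are written.

```python
import os
import tempfile


def needs_rebuild(md_path, html_path):
    """True if the HTML file is missing or older than its markdown source."""
    if not os.path.exists(html_path):
        return True
    return os.path.getmtime(html_path) < os.path.getmtime(md_path)


demo_dir = tempfile.mkdtemp()
md_path = os.path.join(demo_dir, 'post.md')
html_path = os.path.join(demo_dir, 'post.html')
for path in (md_path, html_path):
    with open(path, 'w') as f:
        f.write('x')

# Set explicit (atime, mtime) values so the demo is deterministic.
os.utime(md_path, (2000, 2000))
os.utime(html_path, (1000, 1000))
stale = needs_rebuild(md_path, html_path)   # HTML is older than the md file

os.utime(html_path, (3000, 3000))
fresh = needs_rebuild(md_path, html_path)   # HTML is newer than the md file
print(stale, fresh)
```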

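When both the title and the checksum match the database, no update is needed.

When the title matches the database but the checksum differs, and modified_timestamp > updated timestamp, the post has been edited and needs to be updated online.

When no entry in the database has the same title, and the title does not start with [draft], it is a new post.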
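These three rules fit in one small function. The sketch below is in Python 3 syntax. The decide function and the shape of db_rows are assumptions for illustration, and it treats a matching title whose local copy is not newer, as well as unmatched [draft] titles, as "skip".

```python
from datetime import datetime


def decide(title, checksum, modified, db_rows):
    """Return 'skip', 'update', or 'new' for one local post.

    db_rows maps title -> (checksum, updated) as stored from Blogger;
    modified and updated must be comparable (here both are datetimes).
    """
    if title in db_rows:
        remote_checksum, updated = db_rows[title]
        if checksum != remote_checksum and modified > updated:
            return 'update'   # edited locally after the last online update
        return 'skip'         # unchanged, or the local copy is older
    if title.startswith('[draft]'):
        return 'skip'         # drafts are never published
    return 'new'


db = {'Hello': ('abc', datetime(2016, 12, 22, 17, 43))}
later = datetime(2016, 12, 23)
earlier = datetime(2016, 12, 21)

print(decide('Hello', 'abc', later, db))
print(decide('Hello', 'xyz', later, db))
print(decide('Hello', 'xyz', earlier, db))
print(decide('[draft] Idea', 'q', later, db))
print(decide('Brand new', 'q', later, db))
```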

Problem 1:

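Posts record their updated and published times, which are returned in a format like 2016-12-22T01:43:04-08:00. The leading part can be parsed with strptime and '%Y-%m-%dT%H:%M:%S', but the trailing -08:00 means GMT-08:00. I had originally set the wrong timezone in Blogger, so it returned this wrong timezone and the computed time was not Taiwan time. Because strptime's %z timezone format is not supported in Python 2.7, I followed the reference below and computed the offset myself.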

Reference here

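from datetime import datetime, timedelta

def dt_parse(t, GMT_offset=8):
    # Parse the date and time part, e.g. '2016-12-22T01:43:04'
    ret = datetime.strptime(t[0:19], '%Y-%m-%dT%H:%M:%S')
    # Undo the UTC offset in the string (e.g. '-08:00') to get UTC
    if t[19] == '-':
        ret += timedelta(hours=int(t[20:22]), minutes=int(t[23:]))
    elif t[19] == '+':
        ret -= timedelta(hours=int(t[20:22]), minutes=int(t[23:]))
    else:
        raise ValueError('wrong format: %r' % t)

    # Shift from UTC to the target zone (default GMT+8, Taiwan);
    # a negative GMT_offset moves the time backwards as expected
    ret += timedelta(hours=GMT_offset)
    return ret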
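For comparison, on Python 3.7 or later strptime's %z accepts an offset written with a colon, so the manual slicing is not needed there. This is only an alternative sketch for newer interpreters, not what this post's Python 2.7 code does; dt_parse_tz is a name chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def dt_parse_tz(t, gmt_offset=8):
    """Parse a timestamp with a UTC offset and convert it to GMT+gmt_offset."""
    aware = datetime.strptime(t, '%Y-%m-%dT%H:%M:%S%z')
    target = timezone(timedelta(hours=gmt_offset))
    # Drop tzinfo so the result is a naive datetime, like dt_parse returns.
    return aware.astimezone(target).replace(tzinfo=None)


print(dt_parse_tz('2016-12-22T01:43:04-08:00'))  # 2016-12-22 17:43:04
```

01:43:04 at GMT-08:00 is 09:43:04 UTC, which is 17:43:04 in Taiwan (GMT+8).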

Problem 2:

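I often pull data out of the database, but each record comes back as a list. That means I have to remember the column order of the SQL SELECT query and then use list[i] to reach the field I want. If the table's columns ever change, code in many places may have to change too.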

TBL_POSTS_FIELDS=['id', 'title', 'filename', 'checksum', 'published', 'updated']
for row in r:
    if len(row)==len(TBL_POSTS_FIELDS):
        data=dict(zip(TBL_POSTS_FIELDS, row))
        print 'title=', data['title']
        print 'updated=', data['updated']

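The approach above defines the column names up front. Assuming r is the data returned by sqlite3 and row is a single record, dict pairs the values with the column names, after which data['column'] gives access by name. The benefit is that when columns are added or reordered later, only the field list changes and this loop does not have to be rewritten. The key line is data=dict(zip(TBL_POSTS_FIELDS, row)): with the record held in a dict, fields are accessed by key, which is far more intuitive than by index and easier to understand when maintaining the code later.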
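The standard sqlite3 module also offers a built-in route to the same goal: setting row_factory to sqlite3.Row makes each row addressable by column name without a separate field list. The sketch below uses Python 3 syntax and an invented three-column table purely for demonstration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute('CREATE TABLE posts (id TEXT, title TEXT, updated TEXT)')
conn.execute(
    "INSERT INTO posts VALUES ('1', 'Hello', '2016-12-22T01:43:04-08:00')"
)

for row in conn.execute('SELECT * FROM posts'):
    print('title=', row['title'])
    print('updated=', row['updated'])

first = conn.execute('SELECT * FROM posts').fetchone()
```

Because the names come from the query result itself, they cannot drift out of sync with the SELECT column order the way a hand-written field list can.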

3. Publishing a new post or updating an existing one

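I could not find an example online of how to upload a new post or update a post with the Python Blogger API, but fortunately a quick experiment worked without much trouble. The API document shows that new posts use the posts.insert API, but all we see there is: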

body: object, The request body. (required)
    The object takes the form of

{
  "status": "A String", # Status of the post. Only set for admin-level requests
  "content": "A String", # The content of the Post. May contain HTML markup.
  "kind": "blogger#post", # The kind of this entity. Always blogger#post

...

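The body passed in is an object with many keys, but we should only need title & content. It really is that simple: passing title & content is enough to create or modify a post. Modifying also requires the postId parameter, as shown below.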

How to use python blogger API to insert new post and update a post:

self.service = build('blogger', 'v3', http=http)

def post_update(self, blogId, postId, title, content):
    # Update an existing post; only title and content are sent
    posts = self.service.posts()
    post = {'title': title, 'content': content}
    r = posts.update(blogId=blogId, postId=postId, body=post).execute()
    if r:
        print '[update]', title, ',Success'
    else:
        print '[update]', title, ',failed'

def post_insert(self, blogId, title, content):
    # Create a new post from title and content
    posts = self.service.posts()
    post = {'title': title, 'content': content}
    r = posts.insert(blogId=blogId, body=post).execute()
    if r:
        print '[insert]', title, ',Success'
    else:
        print '[insert]', title, ',failed'

ref blogger API document

my GitHub for this example: python-blogger-for-markdown