[Python] Downloading the URLs listed in a Google Spreadsheet with youtube-dl and gspread

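This time, a programming topic.

This post explains how the Python youtube-dl setup I mentioned in my previous Termux article works:

[iPlay40Pro] Things I do after installing Termux

The code in that Termux article called the youtube-dl command-line tool. This time I have revised it to import the youtube-dl library (installed with pip) into Python instead.

Here is the relevant part before the change: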

    args = ["youtube-dl",url]
    subprocess.run(args,stdout=subprocess.PIPE)

values_sheet holds the URLs of newly added videos.
args holds the youtube-dl command together with the URL that is its argument,
and subprocess.run then launches that command as an external process.
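
As a minimal sketch of this pattern, the snippet below passes a command and its arguments to subprocess.run as a list and captures the output. Since youtube-dl may not be installed everywhere, it uses the Python interpreter itself (sys.executable) as a stand-in command. The run_command helper and the sample URL are illustrative assumptions and are not part of the original script.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command given as a list and return (exit code, stdout text)."""
    result = subprocess.run(args, stdout=subprocess.PIPE)
    return result.returncode, result.stdout.decode()


# Stand-in for ["youtube-dl", url]: a child Python process that echoes its argument.
url = "https://example.com/watch?v=dummy"  # placeholder URL, not a real video
code, out = run_command([sys.executable, "-c", "import sys; print(sys.argv[1])", url])
print(code, out.strip())
```

Because args is a list, the program is started directly without going through a shell, so the URL does not need any quoting.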

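I had been designing around command-line automation, so this was something of a makeshift solution.

This time I reworked that part: by loading the youtube-dl library in the code itself,
there is no longer any need to hand the work off to an external command as above, and everything is done within the Python code.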

ydl_opts = {
    "outtmpl": path + "%(title)s.%(ext)s"
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download([urls])
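
One detail worth noting: path + "%(title)s.%(ext)s" only works if path ends with a path separator. The sketch below builds the same options with os.path.join instead, which is my substitution rather than what the original script does. The names build_ydl_opts and download are hypothetical, and youtube_dl is imported lazily so the option-building part can be tried without the library installed.

```python
import os


def build_ydl_opts(download_dir, fmt=None):
    """Build a youtube-dl options dict that saves files as <download_dir>/<title>.<ext>."""
    opts = {"outtmpl": os.path.join(download_dir, "%(title)s.%(ext)s")}
    if fmt:
        # Optional format selector, e.g. "mp4" as in the full script.
        opts["format"] = fmt
    return opts


def download(url, download_dir, fmt=None):
    """Download a single URL (requires the third-party youtube_dl package)."""
    import youtube_dl  # imported here so build_ydl_opts works without it

    with youtube_dl.YoutubeDL(build_ydl_opts(download_dir, fmt)) as ydl:
        ydl.download([url])


print(build_ydl_opts("videos", fmt="mp4"))
```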

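The full source code is below.
If you want to reuse it, adjusting the parts shown in red in the original post (the ****...**** placeholders) to match your own environment should be all you need.
(Download destination, spreadsheet name, API key)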

#!/usr/bin/python3
#
import datetime
import time
import subprocess
import gspread
import youtube_dl
import apiclient
import json
from oauth2client.service_account import ServiceAccountCredentials
#
today = datetime.datetime.today()
today_s = today.strftime("%Y/%m/%d %H:%M:%S")
dlPath = "****DOWNLOAD_PATH****"
#
def connect_gspread(jsonf,key):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
    credentials = ServiceAccountCredentials.from_json_keyfile_name(jsonf, scope)
    gc = gspread.authorize(credentials)
    spread_sheet_key = key
    spread_sheet_name = "****Sheet_Name****"
    worksheet = gc.open_by_key(spread_sheet_key).sheet1
    return worksheet
#
jsonf="./****your****.json"
spread_sheet_key="****Your_Key****"
#
ws = connect_gspread(jsonf,spread_sheet_key)
values_sheet = ws.get_all_values()
#
def get_video(urls,path):
    print('-- Downloading... --')
    ydl_opts = {
            "format": 'mp4',
            "outtmpl": path + "%(title)s.%(ext)s"
            }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([urls])
    print('-- Downloaded !!! --')
#
def do_download():
    get_video(spSongUrl,dlPath)
    cnt_today = datetime.datetime.today()
    cnt_today_s = cnt_today.strftime("%Y/%m/%d %H:%M:%S")
    ws.update_cell(cnt+1,5,'Downloaded!! ' + cnt_today_s)
#
maxNo = len(values_sheet)
#
cnt = 0
while cnt < maxNo:
    spDate = values_sheet[cnt][0]
    spSongTitle= values_sheet[cnt][1]
    spSongComment = values_sheet[cnt][2]
    spSongUrl = values_sheet[cnt][3]
#
    if (len(values_sheet[cnt])==4):
        spFlag = '0'
    if (len(values_sheet[cnt])==5):
        spFlag = values_sheet[cnt][4]
#
    print(cnt,spDate,spSongTitle.replace('[NCS Release]',''),spSongComment[0:12])
    print(spSongUrl,str(spFlag))
    print("-")
#
    if (str(spFlag) == '0'):
        print('spFlag == str(0)')
        do_download()
        time.sleep(1)
    elif (spFlag[0:1] != 'D'):
        print("spFlag does not start with 'D'")
        do_download()
        time.sleep(1)

#
    cnt += 1
    time.sleep(1)

Explanation of each part of the code

First, the following part reads the Google Spreadsheet and stores its contents in the values_sheet list.

def connect_gspread(jsonf,key):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
    credentials = ServiceAccountCredentials.from_json_keyfile_name(jsonf, scope)
    gc = gspread.authorize(credentials)
    spread_sheet_key = key
    spread_sheet_name = "****Sheet_Name****"
    worksheet = gc.open_by_key(spread_sheet_key).sheet1
    return worksheet
#
jsonf="./****your****.json"
spread_sheet_key="****Your_Key****"
#
ws = connect_gspread(jsonf,spread_sheet_key)
values_sheet = ws.get_all_values()

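Next, the number of rows in values_sheet is stored in the maxNo variable.
cnt is set to 0, and a while loop repeats until cnt reaches maxNo.

Each iteration decides whether the row has already been downloaded
and downloads the ones that have not.

The check looks at the status cell in the spreadsheet: if it is the string "0", or does not start with "D",
the row is treated as pending and do_download is called to download it. Rows that have only four columns are given the flag "0".

The reason for checking "0" is that after I clear a cell's data, the cell looks empty
but is not null; it still contains 0.

sleep(1) pauses for one second.
I have sprinkled these calls through the source because running many operations back to back
is treated as excessive access and rejected by Google Spreadsheet.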

maxNo = len(values_sheet)
#
cnt = 0
while cnt < maxNo:
    spDate = values_sheet[cnt][0]
    spSongTitle= values_sheet[cnt][1]
    spSongComment = values_sheet[cnt][2]
    spSongUrl = values_sheet[cnt][3]
#
    if (len(values_sheet[cnt])==4):
        spFlag = '0'
    if (len(values_sheet[cnt])==5):
        spFlag = values_sheet[cnt][4]
#
    print(cnt,spDate,spSongTitle.replace('[NCS Release]',''),spSongComment[0:12])
    print(spSongUrl,str(spFlag))
    print("-")
#
    if (str(spFlag) == '0'):
        print('spFlag == str(0)')
        do_download()
        time.sleep(1)
    elif (spFlag[0:1] != 'D'):
        print("spFlag does not start with 'D'")
        do_download()
        time.sleep(1)

#
    cnt += 1
    time.sleep(1)
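
The decision made inside the loop can be isolated into a small function and tried on sample rows without touching the spreadsheet. This is only a sketch: the function name needs_download and the sample rows are made up for illustration, and the two branches of the original if/elif are folded into a single "does not start with D" test, which gives the same result.

```python
def needs_download(row):
    """Return True when a sheet row should be downloaded."""
    # Rows with only four columns have no status cell yet; treat them as "0".
    flag = row[4] if len(row) >= 5 else "0"
    # Anything that does not start with "D" (e.g. "0" or an empty cell) is pending.
    return not str(flag).startswith("D")


# Made-up rows: date, title, comment, URL, and optionally the status cell.
rows = [
    ["2021/01/01", "Song A", "comment", "https://example.com/a"],
    ["2021/01/02", "Song B", "comment", "https://example.com/b",
     "Downloaded!! 2021/01/03 10:00:00"],
    ["2021/01/04", "Song C", "comment", "https://example.com/c", ""],
]

# enumerate(start=1) yields the 1-based sheet row, matching cnt+1 in the script.
for sheet_row, row in enumerate(rows, start=1):
    print(sheet_row, row[1], "download" if needs_download(row) else "skip")
```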

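Finally, the download part.

do_download passes the URL and the path to get_video as arguments.
Once get_video finishes, "Downloaded!! " followed by the date and time is written into the fifth column of the spreadsheet.

This way, anything that has already been downloaded is skipped.

If you want to download something again, clear its date cell in the spreadsheet and it becomes downloadable again.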

def get_video(urls,path):
    print('-- Downloading... --')
    ydl_opts = {
            "outtmpl": path + "%(title)s.%(ext)s"
            }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([urls])
    print('-- Downloaded !!! --')
#
def do_download():
    get_video(spSongUrl,dlPath)
    cnt_today = datetime.datetime.today()
    cnt_today_s = cnt_today.strftime("%Y/%m/%d %H:%M:%S")
    ws.update_cell(cnt+1,5,'Downloaded!! ' + cnt_today_s)
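
The marking step can also be tried without a Google account. The sketch below writes the same "Downloaded!! " marker through update_cell(row, col, value), but against a stand-in object. FakeWorksheet and mark_downloaded are hypothetical names introduced here; in the real script the worksheet would be the gspread worksheet returned by connect_gspread.

```python
import datetime


def mark_downloaded(worksheet, index, now=None):
    """Write 'Downloaded!! <timestamp>' into column 5 of the row for list index `index`."""
    now = now or datetime.datetime.today()
    stamp = "Downloaded!! " + now.strftime("%Y/%m/%d %H:%M:%S")
    # Sheet rows are 1-based while list indexes are 0-based, hence index + 1.
    worksheet.update_cell(index + 1, 5, stamp)
    return stamp


class FakeWorksheet:
    """Hypothetical stand-in that records update_cell calls instead of calling Google."""

    def __init__(self):
        self.calls = []

    def update_cell(self, row, col, value):
        self.calls.append((row, col, value))


ws = FakeWorksheet()
stamp = mark_downloaded(ws, 0, now=datetime.datetime(2021, 1, 3, 10, 0, 0))
print(ws.calls)
```

Taking the timestamp at the moment of marking, rather than once at script start, means each row records when its own download finished.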

So, how was it?

I hope this is useful to someone.

Happy Hacking Life!
