nicovideo-dl patch proposal - Wisteria::Diary

I hacked on Nicovideo Downloader a bit. The result is nowhere near commit quality, though, so I'll just paste it into the blog and leave it at that orz

Options added

nFinder-style output file name (--title-nfinder)
Print the output file name and exit (--print-filename)
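
As a rough illustration of what --title-nfinder is aiming for, here is a minimal Python 3 sketch of the file name composition (the patch itself targets the Python 2 script). The function name nfinder_filename and the sample page title are made up for this example; the two regular expressions and the "[エコノミー]" prefix are copied from the diff.

```python
import re

# Characters stripped from titles (same set as const_filename_invalid_re in the diff).
INVALID_CHARS_RE = re.compile(r'[\\/:,;*?"<>|]')
# Trailing site name on the page title (same pattern as const_site_title_re in the diff).
SITE_SUFFIX_RE = re.compile(r' *‐ *ニコニコ動画\(.*?\)$')


def nfinder_filename(page_title, video_id, extension, economy=False):
    """Build '<title>(<video id>)<ext>', prefixed with '[エコノミー]' for economy mode.

    Hypothetical helper for illustration; it is not part of nicovideo-dl.
    """
    title = SITE_SUFFIX_RE.sub('', page_title)   # drop the site name suffix
    title = INVALID_CHARS_RE.sub('', title)      # drop characters unsafe in file names
    name = '%s(%s)%s' % (title, video_id, extension)
    if economy:
        name = '[エコノミー]%s' % name
    return name


# The page title below is an assumed example, not taken from a real page.
print(nfinder_filename('未来日記 #01「サインアップ」 ‐ ニコニコ動画(原宿)',
                       'so15809985', '.mp4', economy=True))
```

The hard part is not the formatting but getting the right video_id to pass in, which is what the issue below is about.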

Known issues

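With this patch, the nFinder-style file name comes out right only for SMILEVIDEO videos (sm*) and Niconico Channel videos, because any video ID that does not start with "sm" is simply assumed to have the "so" prefix.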

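An nFinder-style file name, for example "[エコノミー]未来日記 #01「サインアップ」(so15809985).mp4" (where [エコノミー] marks economy mode), carries the video ID in parentheses*1. A Niconico Channel watch page URL such as "http://www.nicovideo.jp/watch/1317899356", on the other hand, is addressed by thread ID and does not contain the video ID, so the video ID has to be obtained some other way. The real URL of the video file, e.g. "http://smile-cln47.nicovideo.jp/smile?m=15809985.14619low", contains the numeric part of the video ID but not the prefix. So how do I get the video ID with its prefix? Is scraping the HTML of the watch page the only way?? Writing a scraper in Python... no thanks.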
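
For what it's worth, the scraping route might not need a full scraper. Below is a minimal Python 3 sketch under one loud assumption: that the watch page HTML mentions the prefixed video ID somewhere as plain text (I have not verified this). It first pulls the numeric part and the economy flag out of the real URL, then searches the page text for two lowercase letters followed by that number. parse_smile_url and find_prefixed_id are hypothetical names, not part of nicovideo-dl.

```python
import re

# Real video URL shape, as in the example above:
#   http://<host>.nicovideo.jp/smile?<type>=<numeric id>.<token>[low]
SMILE_URL_RE = re.compile(r'^http://.*\.nicovideo\.jp/smile\?(\w+)=(\d+)\.(\w+)$')


def parse_smile_url(url):
    """Return (numeric_id, is_economy), or None if the URL has another shape."""
    m = SMILE_URL_RE.match(url)
    if m is None:
        return None
    # The patch treats a token containing "low" as economy mode;
    # this sketch checks for a trailing "low" instead.
    return m.group(2), m.group(3).endswith('low')


def find_prefixed_id(html, numeric_id):
    """Search the page text for '<two lowercase letters><numeric_id>'.

    ASSUMPTION: the watch page contains the prefixed ID somewhere. Not verified.
    """
    pattern = r'(?<![a-z0-9])([a-z]{2}%s)(?![0-9])' % re.escape(numeric_id)
    m = re.search(pattern, html)
    return m.group(1) if m else None


parsed = parse_smile_url('http://smile-cln47.nicovideo.jp/smile?m=15809985.14619low')
if parsed is not None:
    numeric_id, is_economy = parsed
    # The HTML fragment here is invented for the example.
    print(find_prefixed_id('<a href="/watch/so15809985">', numeric_id), is_economy)
```

If the page turns out not to contain the prefixed ID, find_prefixed_id returns None and the caller would have to fall back to guessing the prefix, as the patch does now.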

Source code diff

3a4
> # Copyright (c) 2011 Ikki Fujiwara
47c48
< const_version = '2011.02.08'
---
> const_version = '2011.10.14'
62c63,66
< const_video_type_re = re.compile(r'^http://.*\.nicovideo\.jp/smile\?(.*?)=.*')
---
> const_video_type_re = re.compile(r'^http://.*\.nicovideo\.jp/smile\?(.*?)=(\w+)\.(\w+)')
> const_site_title_re = re.compile(ur' *‐ *ニコニコ動画\(.*?\)$')
> const_filename_low_str = '[エコノミー]%s'
> const_filename_invalid_re = re.compile(r'[\\/:,;*?"<>|]')
103c107
< 	if not (cmdl_opts.quiet or cmdl_opts.get_url):
---
> 	if not (cmdl_opts.quiet or cmdl_opts.get_url or cmdl_opts.get_filename):
106a111,119
> # Title string normalization for nFinder-style filename
> def title_string_nf(title):
> 	title_s = unicode(title.decode('utf-8', 'ignore'))
> 	title_s = const_site_title_re.sub(u'', title_s)
> 	title_s = const_filename_invalid_re.sub(u'', title_s)
> 	title_s = title_s.replace(os.sep, u'%')
> 	title_s = title_s.encode('utf-8','ignore')
> 	return title_s
> 
233a247,248
> cmdl_parser.add_option('--title-nfinder', action='store_true', dest='use_title_nf', help='use nFinder-style title in file name')
> cmdl_parser.add_option('--print-filename', action='store_true', dest='get_filename', help='print final file name and exit')
254,255c269,270
< if cmdl_opts.quiet and cmdl_opts.get_url:
< 	sys.exit('Error: cannot be quiet and print final URL at the same time.')
---
> if cmdl_opts.quiet and (cmdl_opts.get_url or cmdl_opts.get_filename):
> 	sys.exit('Error: cannot be quiet and print something at the same time.')
333c348
< 	if cmdl_opts.use_title or cmdl_opts.use_literal or cmdl_opts.get_title:
---
> 	if cmdl_opts.use_title or cmdl_opts.use_title_nf or cmdl_opts.use_literal or cmdl_opts.get_title:
363c378
< 	if cmdl_opts.use_title or cmdl_opts.use_literal:
---
> 	if cmdl_opts.use_title or cmdl_opts.use_title_nf or cmdl_opts.use_literal:
365a381,391
> 			video_filename = '%s-%s%s' % (prefix, video_url_id, video_extension)
> 		elif cmdl_opts.use_title_nf: 
> 			prefix = title_string_nf(video_title)
> 			video_real_id = video_url_id
> 			# FIXME: How can we extract video-id (e.g. so15784703) from thread-id (e.g. 1317636899)?
> 			if video_url_id.find('sm') != 0:
> 				# FIXME: Video-id is not limited to so********. cf. http://dic.nicovideo.jp/a/id
> 				video_real_id = 'so%s' % const_video_type_re.match(video_url_real).group(2)
> 			video_filename = '%s(%s)%s' % (prefix, video_real_id, video_extension)
> 			if const_video_type_re.match(video_url_real).group(3).find("low") >= 0:
> 				video_filename = const_filename_low_str % video_filename
368c394
< 		video_filename = '%s-%s%s' % (prefix, video_url_id, video_extension)
---
> 			video_filename = '%s-%s%s' % (prefix, video_url_id, video_extension)
387c413,416
< 		if cmdl_opts.simulate or cmdl_opts.get_url:
---
> 		if cmdl_opts.get_filename:
> 			print video_filename
> 		
> 		if cmdl_opts.simulate or cmdl_opts.get_url or cmdl_opts.get_filename:

*1: Presumably this is how videos that have already been downloaded are identified.