Download YouTube videos to watch in Plex with automatically translated subtitles

There is so much entertaining, funny and educational material available on YouTube, most of it in English. Some of the videos have Dutch subtitles.

This is my wishlist for these videos:

  1. Preference for Dutch and English subtitles, if YouTube has them available (automatically translated or not)
  2. Best quality available. According to the documentation this is the default setting for youtube-dl, but I select it explicitly so the streams can be merged into MP4
  3. Converted to MP4, so I can watch it in the web browser as well as on the RasPlex, without transcoding (my NAS is not fast enough)

This is my method to watch these videos with Plex:

  • Log in to my private Google account and give the video a like
  • On the NAS make two directories youtube-likes-workdir and youtube-likes
  • In the youtube-likes-workdir directory, run:
    youtube-dl --username  \
     --cookies _youtube-dl-cookies.txt \
     --keep-video \
     --ignore-errors \
     --write-auto-sub \
     --sub-lang 'nl,en' \
     --prefer-ffmpeg \
     --format 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4' \
     --merge-output-format mp4 \
     
    cd ../youtube-likes
    # Hard-link subtitles and merged videos; skip intermediate streams such as name.f137.mp4
    for f in ../youtube-likes-workdir/*.vtt ; do ln -f "$f" . ; done
    for f in ../youtube-likes-workdir/*.mp4 ; do echo "$f" | grep -Eq '\.f[0-9]+\.mp4$' || ln -f "$f" . ; done
    

The separation between youtube-likes-workdir and youtube-likes is required to prevent Plex from listing videos twice, which it would do because the workdir contains both the separate audio/video files and the merged video files, and I want to keep the original files to prevent excessive downloads. As a workaround, Plex is pointed at the youtube-likes folder, where only the merged videos and subtitles are linked from the working directory. The last two commands above create these links. Hard links instead of symlinks are required for Plex to play the video.
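The linking step can also be sketched in Python. This is only an illustration, not part of my actual setup: the function name link_for_plex is made up, and it assumes the intermediate streams are named like title.f137.mp4.

```python
import os
import re
from pathlib import Path

# Assumption: intermediate streams are named like "title.f137.mp4".
INTERMEDIATE = re.compile(r"\.f[0-9]+\.mp4$")


def link_for_plex(workdir, plexdir):
    """Hard-link subtitles and merged videos from workdir into plexdir."""
    workdir, plexdir = Path(workdir), Path(plexdir)
    plexdir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(workdir.iterdir()):
        if src.suffix not in (".vtt", ".mp4"):
            continue  # keep only subtitles and MP4 files
        if INTERMEDIATE.search(src.name):
            continue  # separate video stream, not the merged file
        dst = plexdir / src.name
        if dst.exists():
            dst.unlink()  # mimic "ln -f": replace an existing link
        os.link(src, dst)  # hard link, not a symlink
        linked.append(dst.name)
    return linked
```

Calling link_for_plex("../youtube-likes-workdir", ".") from the youtube-likes directory would do roughly the same as the two shell loops.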

This method works with the latest Plex versions, which support vtt subtitles, and with Google accounts that use 2FA for YouTube authentication.

 


Stop a docker swarm stack and remove volumes

I often want to restart a stack with clean volumes. In swarm mode, docker stack down stops containers (services) and removes the network, but keeps the volumes, so you need to remove the volumes with docker volume rm.

Volumes cannot be removed until a stack has been shut down completely. If you remove a volume that is still attached to a container, the following error is thrown:

Error response from daemon: unable to remove volume: remove mystack_myvolume: volume is in use.

Since it is quite annoying to retry this manually, you can use the following three-liner in an interactive bash shell (the !! in the last line is history expansion) to stop your stack and remove the associated volumes:

docker stack down mystack
sleep 1 | docker volume ls | grep mystack | awk '{print $2}' | xargs docker volume rm -f
while [ $? -ne 0 ]; do !!; done
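For use in a script, where history expansion is not available, the same retry idea could be sketched in Python. This is a rough sketch under assumptions: the function name remove_stack_volumes and the retries and delay parameters are made up, the volumes are assumed to be prefixed with the stack name (as in mystack_myvolume above), and the stack itself must already have been stopped.

```python
import subprocess
import time


def remove_stack_volumes(stack, run=subprocess.run, retries=30, delay=1.0):
    """Retry 'docker volume rm' until no volume of the stack is left."""
    for _ in range(retries):
        # List the names of volumes whose name contains "<stack>_".
        listing = run(
            ["docker", "volume", "ls", "-q", "--filter", f"name={stack}_"],
            capture_output=True, text=True,
        )
        volumes = listing.stdout.split()
        if not volumes:
            return True  # nothing left to remove
        # This fails while a container still uses a volume; just try again.
        run(["docker", "volume", "rm", *volumes], capture_output=True, text=True)
        time.sleep(delay)
    return False  # gave up after the configured number of retries
```

The run parameter only exists so the function can be tried without a Docker daemon; by default it calls the real docker command line.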

Increase hyperkit memory for Docker on Mac

I am running Docker CE version 17.09 on a Mac. The overall experience with the default configuration feels so much like running Docker on Linux that you’d almost forget that the Docker containers are running in their own VM, with its own disks, CPU and memory settings. Long story short, here is how to watch memory usage, and increase it if necessary.

Enter the HyperKit VM to watch memory usage of the docker host

Run this command and press Enter to get a shell in the HyperKit VM:

screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
<press enter>
free
top
etc..

Increase HyperKit memory

Click the docker symbol in the menu bar, and select ‘Preferences’ from the docker menu. Then select the ‘Advanced’ tab. Here you will be able to change the amount of memory.

Note that this is maximum memory. The hyperkit vm will not consume that much memory if you do not use it.

Finding the right memory setting for HyperKit

HyperKit’s memory itself can be monitored with the Mac’s Activity Monitor. If you see that it runs at its maximum, check inside the HyperKit VM whether the memory is used by Linux filesystem buffers or by real application usage. In the former case, restart Docker. In the latter case, you might need to give HyperKit more or less memory, depending on your actual usage.
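To tell filesystem buffers apart from application usage, the numbers in /proc/meminfo inside the VM can be split up. The sketch below is a simplification with made-up function names: it only looks at the MemTotal, MemFree, Buffers and Cached fields and ignores finer details such as reclaimable slab memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_breakdown(info):
    """Split used memory into buffers/cache and application usage (kB)."""
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    application = info["MemTotal"] - info["MemFree"] - cache
    return {"cache_kb": cache, "application_kb": application}
```

Pasting the output of cat /proc/meminfo from the VM into parse_meminfo and passing the result to memory_breakdown gives both numbers side by side.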

Background

I was hit by a Python command reporting a database crash, after switching from docker-compose to docker swarm mode. The crash looked like a process killed by the OOM killer, but since the setup worked fine in docker-compose mode, I did not immediately suspect memory problems. I watched memory usage on the host with the Mac’s Activity Monitor (wrong, should have accessed the HyperKit VM!) and that did not show excess memory usage. HyperKit’s default memory setting of 2GB prevented my application from using more memory, so it crashed instead.

 

Mirror (open) source projects with Gitlab

The recent EES and Bronze editions of Gitlab have a great feature for automating the mirroring of source projects. If your project depends on open source projects that are not under your control, mirroring is a wise choice. Full documentation is available at Gitlab Repository Mirroring.

The mirroring feature is not available in the free plan; an upgrade to the Bronze plan ($48 per year) is required. I could spend some time writing my own mirroring software using git (setting remotes, fetching all, pushing and monitoring the processes), but getting this right and monitoring it would probably take more of my spare time than $48 is worth to me. Besides, Gitlab is great software, so why not just support it!

This screenshot shows how to create a new project from an existing git URL:

[Image: mirrorproject]

If you do not have a paid subscription, the project settings -> repository will show that you need to upgrade to get access to the repository mirroring feature:

[Image: upgradeplan]

Migrating PyPI packages to a newer python minor version

The output of pip freeze contains the version numbers of all installed packages. I do not want to pin package versions unless there is a problem with some package. The solution is to use pip-chill. If you use a virtualenv, activate it and run

pip install pip-chill
pip-chill --no-version > requirements.txt.no-version

In the new virtualenv (I needed to upgrade from 3.4 to 3.6, see Install a Python 3.6 virtualenv with no site packages on a mac) you can then run

pip install -r requirements.txt.no-version

If a package causes trouble, perhaps it is not required anymore, which saves you from trying to get it to work. In my case ‘mlab’ caused problems, so I installed everything except that package:

for i in $(grep -v 'mlab' requirements.txt.no-version) ; do pip install "$i" ; done
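If pip-chill is not at hand, the idea of dropping the version numbers and skipping a troublesome package can be sketched in a few lines of Python. This is an illustration only: the function name unpinned_requirements is made up, it assumes plain lines such as name==1.2.3, and unlike pip-chill it does not leave out packages that were only installed as dependencies.

```python
import re


def unpinned_requirements(lines, skip=()):
    """Return package names without version specifiers, leaving out `skip`."""
    skipped = {name.lower() for name in skip}
    names = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        # Keep everything before the first version operator, extra or marker.
        name = re.split(r"[=<>!~;\[\s]", line, maxsplit=1)[0]
        if name and name.lower() not in skipped:
            names.append(name)
    return names
```

For example, feeding it the lines of a pip freeze file with skip=("mlab",) gives a list of names that can be written to requirements.txt.no-version.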

 

 

Install a Python 3.6 virtualenv with no site packages on a mac

I needed to upgrade from Python 3.4 to 3.6 to be able to use the SIP library required by PyQt5. Using brew virtualenv-3.x failed, and the pyvenv script has been deprecated as of Python 3.6.

brew install python3
brew link --overwrite python3
deactivate
python3.6 -m venv ./pyenv
source ./pyenv/bin/activate

The defaults of the venv command (pip installed, no system site packages) are exactly what I required. For more information refer to Python’s venv documentation.
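The same environment can also be created from Python with the standard library venv module. A minimal sketch, where the function name create_clean_env is made up and the two keyword arguments spell out the defaults mentioned above:

```python
import venv


def create_clean_env(path, with_pip=True):
    """Create a virtualenv at `path` without access to system site packages."""
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=with_pip)
    builder.create(path)
    return path
```

Calling create_clean_env("./pyenv") should be roughly equivalent to python3.6 -m venv ./pyenv when run with the Python 3.6 interpreter.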

See also Migrating PyPI packages to a newer python minor version for a quick method to populate your new environment with the packages from a previous environment.

Towards a book printable jupyter notebook

TLDR: this page describes how to produce a PDF from a notebook with:

  • Inclusion of input cells
  • Size A5 paper, one-column
  • Markdown images placed here [H] instead of floating
  • Two-page book printing and less vertical whitespace with the LaTeX book document class
  • Summarized in the template a5_book.tplx that is listed at the bottom of this post

[Image: front]

This year I started to write lecture notes for the MOOC Analyzing the Universe in Python notebooks. It would be nice to have something tangible after finishing the course. The idea of having just one button to save a book-printable PDF was appealing. However, I also remember spending days in the past getting LaTeX content ready for print, and several days is prohibitively long these days. Except now it is holiday 🙂

Yesterday I time-boxed one day to turn a notebook into a PDF suitable for two-page print format. Though this endeavor is far from finished, the things I learned might help others save some time, and help me with further refinement in the future.

I started with the instructions provided by Making publication ready Python Notebooks by Julius Schulz. The main takeaway of his article was to write an additional template that extends an nbconvert-provided LaTeX template, and fine-tune your own work from there. I needed some more fine-tuning, since I definitely wanted to keep the input cells with astropy calculations. Another big difference concerned image conversion: I do not have images that are output of cells, such as plots; all images in my notebook are inline markdown images.

TODO for future versions

  1. Number equations with the equation numbering extension that is now part of the jupyter contrib nbextensions project.
  2. Refine the Python markup so cell input is rendered more like it is on the screen, or at least show some thin lines to indicate the start and end of a code cell.
  3. Find the right paper weight so images do not show through on the other side.
  4. Maybe go a little bit bigger than A5, perhaps b5paper with 8pt font size.
  5. Combine multiple notebooks into one book, with one chapter per notebook.
  6. Replace the [width=.8\maxwidth] default setting from upstream nbconvert and set it in my custom template, or make it configurable per image.

Running nbconvert from the command line

! Package pdftex.def Error: File `images/week6_lecture1_m31_nebula.png' 
not found.

If your images are specified with relative paths from a notebook that is not in the root directory of the jupyter notebook server, you need to run the command jupyter nbconvert in the same directory as the notebook. Running from the command line is also required to specify a custom latex template.

/notebooks/Analyzing the universe$ jupyter nbconvert \
--to=latex --template=a5_book.tplx Week\ 6\ Lecture\ notes.ipynb \
&& pdflatex Week\ 6\ Lecture\ notes.tex
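The requirement to run in the notebook's own directory can be captured in a small Python wrapper. This is a sketch with made-up function names; it uses the same nbconvert options as the command above and assumes jupyter and pdflatex are on the PATH.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook, template="a5_book.tplx"):
    """Build the nbconvert call and the directory it should run in."""
    notebook = Path(notebook)
    cmd = ["jupyter", "nbconvert", "--to=latex",
           f"--template={template}", notebook.name]
    return cmd, notebook.parent


def convert(notebook, template="a5_book.tplx"):
    """Run nbconvert and pdflatex from the notebook's directory."""
    cmd, cwd = nbconvert_command(notebook, template)
    subprocess.run(cmd, cwd=cwd, check=True)
    tex_name = Path(notebook).with_suffix(".tex").name
    subprocess.run(["pdflatex", tex_name], cwd=cwd, check=True)
```

Because both commands run with cwd set to the notebook's directory, relative image paths such as images/... resolve the same way as in the browser.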

Paper size A5

I chose paper size A5 for the following reasons:

  1. On A4 the images, which are set to max 80% of page width by the nbconvert latex base template, are just too big, vertically and horizontally.
  2. On two-column A4 size, the python code is too wide, so it overflows into the other column.
  3. On single-column A5, the width is right for most python code, and images do not become too large.

Images part of the markdown notes

Make images non-floating

With the following markdown syntax an image can be included:

![Caption text](image/file/location.png)

When the notebook is rendered on the screen, images appear between the text at the place where they are declared. In the default notebook PDF rendering of markdown images, the images are floating; they do not follow the normal flow of text. I did not like this at all, since it differed a lot from the way the notebook was edited and rendered in the browser. The fix requires an understanding of the notebook to PDF conversion process. When the notebook is converted to PDF, it is converted by Pandoc to LaTeX and from LaTeX to PDF. The LaTeX image declaration looks like this:

\begin{figure}

\centering
\includegraphics{images/week6_lecture1_m31_nebula.png}

\caption{Location of M31 in the sky}

\end{figure}

The default image configuration in LaTeX is to make it a floating image, which means that images are entities separate from the text and can even be placed on a different page than the text they were originally placed between.

A couple of solutions are available. One is to edit the markdown and put \ after each image, which causes Pandoc to skip the \begin{figure} and use only the LaTeX \includegraphics{} declaration. That is a lot of work. Another solution is to generate or post-process the LaTeX file so all images are declared with \begin{figure}[h].
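The post-processing variant could look like the Python sketch below. I did not use this myself; the function name pin_figures is made up, and it simply adds a placement argument to every \begin{figure} that does not have one yet.

```python
import re

# Match \begin{figure} only when no [..] placement argument follows.
FIGURE_BEGIN = re.compile(r"\\begin\{figure\}(?!\[)")


def pin_figures(latex, placement="h"):
    """Add a placement argument to every figure that lacks one."""
    return FIGURE_BEGIN.sub(lambda m: f"{m.group(0)}[{placement}]", latex)
```

It would have to run on the .tex file between the nbconvert step and the pdflatex step.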

The solution that I could use, from the TeX Stack Exchange post Latex Figures appear before text in pandoc markdown, was adding the following two lines to the LaTeX template header, to place all floats ‘here’ with the [H] argument:

\usepackage{float}
\floatplacement{figure}{H}

In the LaTeX document images are still declared without [], but none of them float anymore.

Controlling image width (fail)

The default image size is set by the nbconvert latex base template to 80% of the page width. It is possible to control the width of a markdown image declaration by appending {width=..}, as is shown below.

![Computing the plate scale](images/week6_lecture2_platescale.png){ width=50% }

Pandoc will correctly translate the width command, but together with the width specification already given by the base template, this duplicate width specification is incorrect LaTeX code that gives the following error:

Runaway argument?
width=.8\maxwidth ][width=0.50000\textwidth ]

The failure to make images smaller was one of the reasons for me choosing A5 paper size.

Book LaTeX document class

I switched from article to the book document class for these reasons:

  1. The book document class is meant for two-sided printing; it can keep track of different inner and outer margin sizes.
  2. The book class also takes care of creating a separate title page, and starting the actual content on the third page, so if you let it print with a different kind of paper for the cover page, the title page alone is printed on the special cover paper, and the actual content starts on the first normal sheet.
  3. The book document class produces fewer pages with much vertical whitespace. With the article class some of the pages had much vertical whitespace between text boxes, which was probably caused by my choice of A5 paper size and non-floating images. The book document class uses \flushbottom, whereas the article class uses \raggedbottom. For more information, refer to the TeX Stack Exchange post Why does latex stretch small sections across the whole page vertically?

Prevent \geometry from resetting book margin settings

The nbconvert latex base template contains a \geometry command that redefines all margin settings. To keep the book inner and outer margins, I used the xparse package to renew the \geometry command to do nothing, which prevents later \geometry commands from resetting the book template margin settings.

\usepackage{xparse}
\RenewDocumentCommand{\geometry}{om}{%
}

Top level division from section to chapter

The book document class introduced a side effect: the top-level division of the document is no longer ‘section’ but ‘chapter’, and chapter names and numbers are printed in the header. While Pandoc has the setting --top-level-division, I could not find an option to convince nbconvert to let Pandoc use section as the top-level division. In the Python code the call to Pandoc has a kwargs argument for additional arguments, but nbconvert --help-all does not expose this option. The best way to use the book document class is probably to write a book.tplx as an alternative to nbconvert’s own article.tplx. However, since I was time-boxed, I put the chapter counter and title in the notebook metadata and let a5_book.tplx write the content of these variables to the LaTeX document, resulting in code like this:

\setcounter{chapter}{5}

\chapter{Week 6}

Notebook metadata for title page and chapter info

Go to the notebook -> Edit Menu -> Edit Notebook Metadata and create a JSON property “latex_metadata” to specify the variables that can be used by the custom LaTeX template. Note that the object format is tightly coupled to the custom template a5_book.tplx below; it is not read by any other software. The latex_metadata object in my notebook looks as follows:

{
  "language_info": {
   ...
  },
  "latex_metadata": {
    "affiliation": "Rutgers the State University of New Jersey",
    "title": "Analyzing the Universe, Week 6 lecture notes",
    "author": "Dr. Terry A. Matilsky",
    "chapter": {
      "setcounter": 5,
      "title": "Week 6"
    }
  }
}
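Since a notebook is a plain JSON file, the same metadata could also be written with a small Python script instead of the menu. The sketch below is an assumption-laden illustration: the function name set_latex_metadata and its parameters are made up, and only the key names are taken from the object above.

```python
import json


def set_latex_metadata(notebook_path, title, author, affiliation,
                       chapter_number, chapter_title):
    """Write the latex_metadata object into the notebook file."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    # Keep all other metadata (such as language_info) untouched.
    nb.setdefault("metadata", {})["latex_metadata"] = {
        "affiliation": affiliation,
        "title": title,
        "author": author,
        "chapter": {"setcounter": chapter_number, "title": chapter_title},
    }
    with open(notebook_path, "w", encoding="utf-8") as fh:
        json.dump(nb, fh, indent=1, ensure_ascii=False)
    return nb["metadata"]["latex_metadata"]
```

Close the notebook in the browser before running something like this, otherwise the next save from the browser may overwrite the change.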

Example pages printed

This was printed and shipped for less than $10,-. Next time I’ll ask the print shop which paper is suitable for full colour print and prevents images on the other side from showing through. Still, I was quite pleased with the result!

[Image: content]

a5_book.tplx

The final template I ended up with was the following:

((*- extends 'article.tplx' -*))

((* block docclass *))
\documentclass[9pt, reprint, floatfix, groupaddress, prb, twoside]{book}

% Use a wider inner margin for the two-sided book
\usepackage[a5paper, margin=0.5in, inner=1in]{geometry}

% Ignore future geometry commands with optional and mandatory arguments
\usepackage{xparse}
\RenewDocumentCommand{\geometry}{om}{%
}

% Let all figures float 'H'ere
\usepackage{float}
\floatplacement{figure}{H}

((* endblock docclass *))

% Author and Title from metadata
((* block maketitle *))

((*- if nb.metadata["latex_metadata"]: -*))
((*- if nb.metadata["latex_metadata"]["author"]: -*))
\author{((( nb.metadata["latex_metadata"]["author"] )))}
((*- endif *))
((*- endif *))

((*- if nb.metadata["latex_metadata"]: -*))
((*- if nb.metadata["latex_metadata"]["title"]: -*))
\title{((( nb.metadata["latex_metadata"]["title"] )))}
((*- endif *))
((*- else -*))
\title{((( resources.metadata.name )))}
((*- endif *))

\date{\today}
\maketitle

((*- if nb.metadata["latex_metadata"]: -*))
((*- if nb.metadata["latex_metadata"]["chapter"]: -*))
((*- if nb.metadata["latex_metadata"]["chapter"]["setcounter"]: -*))
\setcounter{chapter}{((( nb.metadata["latex_metadata"]["chapter"]["setcounter"] )))}
((*- endif *))

((*- if nb.metadata["latex_metadata"]["chapter"]["title"]: -*))
\chapter{((( nb.metadata["latex_metadata"]["chapter"]["title"] )))}
((*- endif *))
((*- endif *))
((*- endif *))

((* endblock maketitle *))