<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>my tech blog &#187; perl</title>
	<atom:link href="http://billauer.se/blog/category/perl/feed/" rel="self" type="application/rss+xml" />
	<link>https://billauer.se/blog</link>
	<description>Anything I found worthy to write down.</description>
	<lastBuildDate>Thu, 12 Mar 2026 11:36:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Perl one-liner for adding newlines to HTML</title>
		<link>https://billauer.se/blog/2026/03/perl-tidy-html/</link>
		<comments>https://billauer.se/blog/2026/03/perl-tidy-html/#comments</comments>
		<pubDate>Thu, 12 Mar 2026 11:35:32 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[Rich text editors]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7235</guid>
		<description><![CDATA[When the rich editor puts all HTML in one line, and I want to edit it, I could always use the &#8220;tidy&#8221; utility, however it does too much. All I want is a newline here and there to make the whole thing accessible. So this simple one-liner does the job: perl -pe 's/(&#60;\/(?:p&#124;h\d&#124;div&#124;tr&#124;td&#124;table&#124;ul&#124;ol&#124;li)&#62;)/"$1\n"/ge' Not perfect, [...]]]></description>
			<content:encoded><![CDATA[<p>When the rich editor puts all HTML in one line, and I want to edit it, I could always use the &#8220;tidy&#8221; utility, however it does too much. All I want is a newline here and there to make the whole thing accessible.</p>
<p>So this simple one-liner does the job:</p>
<pre>perl -pe 's/(&lt;\/(?:p|h\d|div|tr|td|table|ul|ol|li)&gt;)/"$1\n"/ge'</pre>
<p>Not perfect, but gives something to work with.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2026/03/perl-tidy-html/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting vtt to srt subtitles with a simple Perl script</title>
		<link>https://billauer.se/blog/2025/09/vtt-to-srt-subtitles-conversio/</link>
		<comments>https://billauer.se/blog/2025/09/vtt-to-srt-subtitles-conversio/#comments</comments>
		<pubDate>Fri, 19 Sep 2025 16:45:46 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7169</guid>
		<description><![CDATA[I tried to use ffmpeg to convert a vtt file to srt, but that didn&#8217;t work at all: $ ffmpeg -i in.vtt out.srt Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used) I tried a whole lot of suggestions from the Internet, and eventually I gave up. So [...]]]></description>
			<content:encoded><![CDATA[<p>I tried to use ffmpeg to convert a vtt file to srt, but that didn&#8217;t work at all:</p>
<pre>$ <strong>ffmpeg -i in.vtt out.srt</strong>
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)</pre>
<p>I tried a whole lot of suggestions from the Internet, and eventually I gave up.</p>
<p>So I wrote a simple Perl script to get the job done. It took about 20 minutes, because I made a whole lot of silly mistakes:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>

<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;

<span class="hljs-keyword">my</span> $n = <span class="hljs-number">1</span>;
<span class="hljs-keyword">my</span> $l;

<span class="hljs-keyword">my</span> $timestamp_regex = <span class="hljs-regexp">qr/[0-9]+:[0-9]+:[0-9:\.]+/</span>; <span class="hljs-comment"># Very permissive</span>

<span class="hljs-keyword">while</span> (<span class="hljs-keyword">defined</span> ($l = &lt;&gt;)) {
  <span class="hljs-keyword">my</span> ($header) = ($l =~ <span class="hljs-regexp">/^($timestamp_regex --&gt; $timestamp_regex)/</span>);
  <span class="hljs-keyword">next</span> <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $header);

  $header =~ <span class="hljs-regexp">s/\./,/g</span>;

  <span class="hljs-keyword">print</span> <span class="hljs-string">"$n\n"</span>;
  <span class="hljs-keyword">print</span> <span class="hljs-string">"$header\n"</span>;

  $n++;

  <span class="hljs-keyword">while</span>  (<span class="hljs-keyword">defined</span> ($l = &lt;&gt;)) {
    <span class="hljs-keyword">last</span> <span class="hljs-keyword">unless</span> ($l =~ <span class="hljs-regexp">/[^ \t\n\r]/</span>); <span class="hljs-comment"># Nothing but possibly whitespaces</span>

    <span class="hljs-keyword">print</span> $l;
  }
  <span class="hljs-keyword">print</span> <span class="hljs-string">"\n"</span>;
}

$n--;
<span class="hljs-keyword">print</span> STDERR <span class="hljs-string">"Converted $n subtitles\n"</span>;</pre>
<p>Maybe not a piece of art, and it can surely be made more accurate, but it does the job with simply</p>
<pre>$ <strong>./vtt2srt.pl in.vtt &gt; out.srt</strong>
Converted 572 subtitles</pre>
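<p>To see which lines the script above treats as the beginning of a subtitle, here is a minimal sketch that runs the same timestamp regex on a few sample lines. The cue_header() helper and the sample lines are made up for this illustration. Note that WebVTT also allows timestamps without the hours part, and the last sample shows that such lines are skipped by this regex.</p>

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Same permissive timestamp regex as in the script above
my $timestamp_regex = qr/[0-9]+:[0-9]+:[0-9:\.]+/;

# Hypothetical helper: returns the cue header, or undef for any other line
sub cue_header {
  my ($l) = @_;
  my ($header) = ($l =~ /^($timestamp_regex --> $timestamp_regex)/);
  return $header;
}

foreach my $l ("WEBVTT",
               "00:00:01.500 --> 00:00:04.000 align:start",
               "01:02.345 --> 01:05.000") {
  my $header = cue_header($l);
  my $verdict = (defined $header) ? "cue header" : "skipped";
  print $verdict, ": ", $l, "\n";
}
```

<p>Only the second line is recognized as a cue header, and the cue settings after the second timestamp (align:start) are dropped from the output, because the capture ends where the timestamp does.</p>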
<p>And here&#8217;s why Perl is a pearl.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2025/09/vtt-to-srt-subtitles-conversio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perl script for mangling SRT subtitle files</title>
		<link>https://billauer.se/blog/2024/05/perl-subtitle-fixing/</link>
		<comments>https://billauer.se/blog/2024/05/perl-subtitle-fixing/#comments</comments>
		<pubDate>Tue, 21 May 2024 09:06:03 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7075</guid>
		<description><![CDATA[I had a set of SRT files with pretty good subtitles, but with one annoying problem: When there was a song in the background, the translation of the song would pop up and interrupt the dialogue&#8217;s subtitles, so it became impossible to understand what&#8217;s going on. Luckily, those song-translating subtitles all had a [...]]]></description>
			<content:encoded><![CDATA[<p>I had a set of SRT files with pretty good subtitles, but with one annoying problem: When there was a song in the background, the translation of the song would pop up and interrupt the dialogue&#8217;s subtitles, so it became impossible to understand what&#8217;s going on.</p>
<p>Luckily, those song-translating subtitles all had a &#8220;{\a6}&#8221; string, which is an ASS tag meaning that the text <a rel="noopener" href="https://aegisub.org/docs/3.2/ASS_Tags/" target="_blank">should be shown at the top of the picture</a>. mplayer ignores these tags, which explains why these subtitles make sense, but mess up things for me. So the simple solution is to remove these entries.</p>
<p>Why don&#8217;t I use VLC instead? Mainly because I&#8217;m used to mplayer, and I&#8217;m  under the impression that mplayer gives much better and easier control  of low-level issues such as adjusting the subtitles&#8217; timing. But also  the ability to run it with a lot of parameters from the command line and  jumping back and forth in the displayed video, in particular through a  keyboard remote control. But maybe it&#8217;s just a matter of habit.</p>
<p>Here&#8217;s a Perl script that reads an SRT file and removes all entries with such a string. It fixes the numbering of the entries to make up for those that have been removed. Fun fact: The entries don&#8217;t need to appear in chronological order. In fact, most of the annoying subtitles appeared at the end of the file, even though they messed up things everywhere.</p>
<p>This can be a boilerplate for other needs as well, of course.</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>
<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;

<span class="hljs-keyword">my</span> $fname = <span class="hljs-keyword">shift</span>;

<span class="hljs-keyword">my</span> $data = readfile($fname);

<span class="hljs-keyword">my</span> ($name, $ext) = ($fname =~ <span class="hljs-regexp">/^(.*)\.(.*)$/</span>);

<span class="hljs-keyword">die</span>(<span class="hljs-string">"No extension in file name \"$fname\"\n"</span>)
  <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $name);

<span class="hljs-comment"># Regex for a newline, swallowing surrounding CR if such exist</span>
<span class="hljs-keyword">my</span> $nl = <span class="hljs-regexp">qr/\r*\n\r*/</span>;

<span class="hljs-comment"># Regex for a subtitle entry</span>
<span class="hljs-keyword">my</span> $tregex = <span class="hljs-regexp">qr/(?:\d+$nl.*?(?:$nl$nl|$))/s</span>;

<span class="hljs-keyword">my</span> ($pre, $chunk, $post) = ($data =~ <span class="hljs-regexp">/^(.*?)($tregex*)(.*)$/</span>);

<span class="hljs-keyword">die</span>(<span class="hljs-string">"Input file doesn't look like an SRT file\n"</span>)
  <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $chunk);

<span class="hljs-keyword">my</span> $lpre = <span class="hljs-keyword">length</span>($pre);
<span class="hljs-keyword">my</span> $lpost = <span class="hljs-keyword">length</span>($post);

<span class="hljs-keyword">print</span> <span class="hljs-string">"Warning: Passing through $lpre bytes at beginning of file untouched\n"</span>
 <span class="hljs-keyword">if</span> ($lpre);

<span class="hljs-keyword">print</span> <span class="hljs-string">"Warning: Passing through $lpost bytes at end of file untouched\n"</span>
 <span class="hljs-keyword">if</span> ($lpost);

<span class="hljs-keyword">my</span> @items = ($chunk =~ <span class="hljs-regexp">/($tregex)/g</span>);

<span class="hljs-comment">#### This is the mangling part</span>

<span class="hljs-keyword">my</span> @outitems;
<span class="hljs-keyword">my</span> $removed = <span class="hljs-number">0</span>;
<span class="hljs-keyword">my</span> $counter = <span class="hljs-number">1</span>;

<span class="hljs-keyword">foreach</span> <span class="hljs-keyword">my</span> $i (@items) {
  <span class="hljs-keyword">if</span> ($i =~ <span class="hljs-regexp">/\\a6/</span>) {
    $removed++;
  } <span class="hljs-keyword">else</span> {
    $i =~ <span class="hljs-regexp">s/\d+/$counter/</span>;
    $counter++;
    <span class="hljs-keyword">push</span> @outitems, $i;
  }
}

<span class="hljs-keyword">print</span> <span class="hljs-string">"Removed $removed subtitle entries from $fname\n"</span>;

<span class="hljs-comment">#### Mangling part ends here</span>

writefile(<span class="hljs-string">"$name-clean.$ext"</span>, <span class="hljs-keyword">join</span>(<span class="hljs-string">""</span>, $pre, @outitems, $post));

<span class="hljs-keyword">exit</span>(<span class="hljs-number">0</span>); <span class="hljs-comment"># Just to have this explicit</span>

<span class="hljs-comment">############ Simple file I/O subroutines ############</span>

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">writefile</span> </span>{
  <span class="hljs-keyword">my</span> ($fname, $data) = @_;

  <span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $out, <span class="hljs-string">"&gt;:utf8"</span>, $fname)
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open \"$fname\" for write: $!\n"</span>;
  <span class="hljs-keyword">print</span> $out $data;
  <span class="hljs-keyword">close</span> $out;
}

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">readfile</span> </span>{
  <span class="hljs-keyword">my</span> ($fname) = @_;

  <span class="hljs-keyword">local</span> $/; <span class="hljs-comment"># Slurp mode</span>

  <span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $in, <span class="hljs-string">"&lt;:utf8"</span>, $fname)
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open $fname for read: $!\n"</span>;

  <span class="hljs-keyword">my</span> $input = &lt;$in&gt;;
  <span class="hljs-keyword">close</span> $in;

  <span class="hljs-keyword">return</span> $input;
}</pre>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/05/perl-subtitle-fixing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perl: &#8220;$&#8221; doesn&#8217;t really mean end of string</title>
		<link>https://billauer.se/blog/2023/03/perl-dollar-match-end-of-string/</link>
		<comments>https://billauer.se/blog/2023/03/perl-dollar-match-end-of-string/#comments</comments>
		<pubDate>Mon, 20 Mar 2023 02:59:26 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6816</guid>
		<description><![CDATA[Who ate my newline? It&#8217;s 2023, Perl is ranked below COBOL, but I still consider it as my loyal working horse. But even the most loyal horse will give you a grand kick in the bottom every now and then. So let&#8217;s jump to the problematic code: #!/usr/bin/perl use warnings; use strict; my $str = [...]]]></description>
			<content:encoded><![CDATA[<h3>Who ate my newline?</h3>
<p>It&#8217;s 2023, Perl is ranked below COBOL, but I still consider it as my loyal working horse. But even the most loyal horse will give you a grand kick in the bottom every now and then.</p>
<p>So let&#8217;s jump to the problematic code:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>
<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;

<span class="hljs-keyword">my</span> $str = <span class="hljs-string">".\n\n"</span>;

<span class="hljs-keyword">my</span> $nonn = <span class="hljs-regexp">qr/[ \t]|(?&lt;!\n)\n(?!\n)/</span>;

<span class="hljs-keyword">my</span> ($pre, $match, $post) = ($str =~ <span class="hljs-regexp">/^($nonn*)(.*?)($nonn*)$/s</span>);

<span class="hljs-keyword">print</span> <span class="hljs-string">"pre = \"$pre\"\n"</span>;
<span class="hljs-keyword">print</span> <span class="hljs-string">"match = \"$match\"\n"</span>;
<span class="hljs-keyword">print</span> <span class="hljs-string">"post = \"$post\"\n"</span>;

<span class="hljs-keyword">print</span> <span class="hljs-string">"This doesn't add up!\n"</span>
  <span class="hljs-keyword">unless</span> ($str eq <span class="hljs-string">"$pre$match$post"</span>);</pre>
<p>For now, never mind what I tried to do here. Let&#8217;s just note that $nonn doesn&#8217;t capture anything: Those two expressions with parentheses are a lookbehind and a lookahead, and hence don&#8217;t capture.</p>
<p>So now let&#8217;s look at</p>
<pre><span class="hljs-keyword">my</span> ($pre, $match, $post) = ($str =~ <span class="hljs-regexp">/^($nonn*)(.*?)($nonn*)$/s</span>);</pre>
<p>This is an enclosure between ^ and $, and everything in the middle is captured into three matches. So no matter what, the concatenation of these three matches should equal $str, shouldn&#8217;t it? Let&#8217;s give it a test run:</p>
<pre>$ <strong>./try.pl</strong>
pre = ""
match = ".
"
post = ""
<span class="punch">This doesn't add up!</span></pre>
<p>So $pre and $post are empty. OK, fine. Hence $match should equal $str, which is &#8220;.\n\n&#8221;. But I see only one newline. Where&#8217;s the other one?</p>
<h3>RTFM</h3>
<p>The one thing that I really like about Perl, is that even when it plays a dirty trick, the answer is in the plain manual. As in &#8220;man perlre&#8221;, where it says, black on white in the description of $:</p>
<blockquote><p>Match the end of the string (<span class="punch">or before newline at the end of the string</span>; or before any newline if /m is used)</p></blockquote>
<p>So there we have it. &#8220;$&#8221; can also consider the character before the last newline as the end. Note that &#8220;$&#8221; itself will not match the last newline, so even if there&#8217;s a capture on the &#8220;$&#8221; itself, as in &#8220;($)&#8221;, that last newline is still not captured. It&#8217;s a Perl quirk. One of those things that make Perl do exactly what you really want, except for when you&#8217;re surgical about it.</p>
<p>I&#8217;ve been using Perl a lot for 20 years, and I wasn&#8217;t aware that &#8220;$&#8221; could match anything but the end of the string (let alone the &#8220;/m&#8221; modifier).</p>
<p>So that&#8217;s what happened above: $ considered the character before the last newline to be the end, and one newline went up in smoke.</p>
<h3>Use \z instead</h3>
<p>The second thing that I really like about Perl, is that even when it&#8217;s quirky, there&#8217;s always a simple solution. The same &#8220;man perlre&#8221; also says:</p>
<blockquote><p>\z     Match only at end of string</p></blockquote>
<p>Simple, isn&#8217;t it? From now on and until the end of time, always use \z if you really mean the end of string. Like, character-wise. And if I change &#8220;$&#8221; to &#8220;\z&#8221; in the code above, I get:</p>
<pre><span class="hljs-keyword">my</span> ($pre, $match, $post) = ($str =~ <span class="hljs-regexp">/^($nonn*)(.*?)($nonn*)\z/s</span>);</pre>
<p>and the test run gives:</p>
<pre>$ ./try.pl
pre = ""
match = ".

"
post = ""</pre>
<p>The working horse is back on track again.</p>
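<p>The difference between the two anchors can also be checked in isolation. This is a minimal sketch with a made-up test string, not part of the script above:</p>

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $str = "abc\n";

# "$" also matches just before a newline that ends the string
my $dollar = ($str =~ /abc$/) ? "match" : "no match";

# "\z" matches only at the very end, so the trailing newline is in the way
my $strict = ($str =~ /abc\z/) ? "match" : "no match";

# With the newline spelled out, "\z" matches as well
my $spelled = ($str =~ /abc\n\z/) ? "match" : "no match";

print "with \$: $dollar\n";
print "with \\z: $strict\n";
print "with \\n\\z: $spelled\n";
```

<p>The first and third tests match, the second doesn&#8217;t: with \z, the trailing newline must be accounted for explicitly.</p>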
<h3>What I really wanted to do</h3>
<p>Since I messed up with this regex, I should maybe explain what it does:</p>
<pre><span class="hljs-keyword">my</span> $nonn = <span class="hljs-regexp">qr/[ \t]|(?&lt;!\n)\n(?!\n)/</span>;</pre>
<p>First, let&#8217;s note that $nonn only matches one character (or none): It&#8217;s either a plain space, a tab or a newline. But what&#8217;s the mess with the newline?</p>
<p>The &#8220;(?&lt;!\n)\n(?!\n)&#8221; part says this: Match a \n character that isn&#8217;t preceded by a \n, and isn&#8217;t followed by a \n. Or in other words, match a newline only if it isn&#8217;t part of a sequence of newlines. Only if it&#8217;s one, isolated \n.</p>
<p>No double \n. Or for short, &#8220;nonn&#8221;.</p>
<p>I needed this for a script that handles multiple newlines later on (in LaTeX, a double newline means a new paragraph, that&#8217;s the reason).</p>
<p>And it actually worked. The &#8220;\n\n&#8221; part in the string was matched into neither $pre nor $post. But the (.*?), which attempts to match as little as possible, sold off the last newline to $. Tricky stuff.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2023/03/perl-dollar-match-end-of-string/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using git send-email with Gmail + OAUTH2, but without subscribing to cloud services</title>
		<link>https://billauer.se/blog/2022/10/git-send-email-with-oauth2-gmail/</link>
		<comments>https://billauer.se/blog/2022/10/git-send-email-with-oauth2-gmail/#comments</comments>
		<pubDate>Sun, 30 Oct 2022 09:08:44 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[email]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6761</guid>
		<description><![CDATA[Introduction There is a widespread belief, that in order to use git send-email with Gmail, there&#8217;s a need to subscribe to Google Cloud services and obtain some credentials. Or that a two-factor authentication (2fa) is required. This is not the case, however. If Thunderbird can manage to fetch and send emails through Google&#8217;s mail servers [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>There is a widespread belief, that in order to use git send-email with Gmail, there&#8217;s a need to subscribe to Google Cloud services and obtain some credentials. Or that a two-factor authentication (2fa) is required.</p>
<p>This is not the case, however. If Thunderbird can manage to fetch and send emails through Google&#8217;s mail servers (as well as other OAUTH2 authenticated mail services), there&#8217;s no reason why a utility won&#8217;t be able to do the same.</p>
<p>The subscription to Google&#8217;s services is indeed required if the communication with Google&#8217;s server must be done without human supervision. That&#8217;s the whole point with API keys. If a human is around when the mail is dispatched, there&#8217;s no need for any special measures. And it&#8217;s quite obvious that there&#8217;s a responsive human around when a patch is being submitted.</p>
<p>What is actually needed, is a client ID and a client secret, and these are indeed obtained by registering to Google&#8217;s cloud service (<a href="https://gitlab.com/fetchmail/fetchmail/-/blob/e92e57cb1ce93b5a09509e65f26bbb5aee5de533/README.OAUTH2" target="_blank">this</a> explains how). But here&#8217;s the thing: Someone at Mozilla has already obtained these, and hardcoded them into Thunderbird itself. So there&#8217;s no problem using these to access Gmail with another mail client. It seems like many believe that the client ID and secret must be related to the mail account to access, and therefore each and every one has to obtain their own pair. That&#8217;s a mistake that has made a lot of people angry for nothing.</p>
<p>This post describes how to use git send-email without any further involvement with Google, except for having a Gmail account. The same method surely applies for other mail service providers that rely on OAUTH2, but I haven&#8217;t gotten into that. It should be quite easy to apply the same idea to other services as well however.</p>
<p>For this to work, Thunderbird must be configured to access the same email account. This doesn&#8217;t mean that you actually have to use Thunderbird for your mail exchange. It&#8217;s actually enough to configure the Gmail server as an <strong>outgoing</strong> mail server for the relevant account. In other words, you don&#8217;t even need to fetch mails from the server with Thunderbird.</p>
<p>The point is to make Thunderbird set up the OAUTH2 session, and then fetch the relevant piece of credentials from it. And take it from there with Google&#8217;s servers. Thunderbird is a good candidate for taking care of the session&#8217;s setup, because the whole idea with OAUTH2 is that the user / password session (plus possible additional authentication challenges) is done with a browser. Since Thunderbird is Firefox in disguise, it integrates the browser session well into its general flow.</p>
<p>If you want to use another piece of software to maintain the OAUTH2 session, that&#8217;s most likely possible, given that you can get its refresh token. This will also require obtaining its client ID and client secret. Odds are that it can be found somewhere in that software&#8217;s sources, exactly as I found it for Thunderbird. Or look at the https connection it runs to get an access token (which isn&#8217;t all that easy, encryption and that).</p>
<h3>Outline of solution</h3>
<p>All below relates to Linux Mint 19, Thunderbird 91.10.0, git version 2.17.1, Perl 5.26 and msmtp 1.8.14. But except for Thunderbird and msmtp, I don&#8217;t think the versions are going to matter.</p>
<p>It&#8217;s highly recommended to read through my <a rel="noopener" href="https://billauer.se/blog/2022/06/fetchmail-gmail-lsa-oauth2/" target="_blank">blog post on OAUTH2</a>, in particular the section called &#8220;The authentication handshake in a nutshell&#8221;. You&#8217;re going to need to know the difference between an access token and a refresh token sooner or later.</p>
<p>So the first obstacle is the fact that git send-email relies on the system&#8217;s sendmail to send out the emails. That utility doesn&#8217;t support OAUTH2 at the time of writing this. So instead, I used msmtp, which is a drop-in replacement for sendmail, plus it supports OAUTH2 (since version 1.8.13).</p>
<p>msmtp identifies itself to the server by sending it an access token in the SMTP session (see a dump of a sample session below). This access token is short-lived (3600 seconds from Google as of writing this), so it can&#8217;t be fetched from Thunderbird just like that. In particular because most of the time Thunderbird doesn&#8217;t have it.</p>
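<p>For reference, this is roughly what goes over the wire. The sketch below assumes the XOAUTH2 format as documented by Google: the user name and the access token are joined with Ctrl-A characters, and the whole thing is base64 encoded. The xoauth2_string() function, the address and the token are placeholders made up for this illustration. msmtp does all this by itself, so there is no need to run anything like this for sending mail.</p>

```perl
#!/usr/bin/perl
use warnings;
use strict;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: builds the client response for SASL XOAUTH2
sub xoauth2_string {
  my ($user, $token) = @_;

  # The "" as second argument prevents encode_base64 from adding newlines
  return encode_base64("user=$user\x01auth=Bearer $token\x01\x01", "");
}

# Placeholder values, not real credentials
my $blob = xoauth2_string('mail.username@gmail.com', 'placeholder-access-token');

print "AUTH XOAUTH2 $blob\n";
```

<p>Decoding the base64 blob that appears in a verbose SMTP session reveals the access token, so such dumps should be treated with care for as long as the token is valid.</p>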
<p>What Thunderbird does have is a refresh token. It&#8217;s a completely automatic task to ask Google&#8217;s server for the access token with the refresh token at hand. It&#8217;s also an easy task (once you&#8217;ve figured out how to do it, that is). It&#8217;s also easy to get the refresh token from Thunderbird, exactly in the same way as getting a saved password. In fact, Thunderbird treats the refresh token as a password.</p>
<p>msmtp allows executing an arbitrary program in order to get the password or the access token. So I wrote a Perl script (<a rel="noopener" href="https://github.com/billauer/oauth2-helper/blob/main/oauth2-helper.pl" target="_blank">oauth2-helper.pl</a>) that reads the refresh token from a file and gets an access token from Google&#8217;s server. This is how msmtp manages to authenticate itself.</p>
<p>So everything relies on this refresh token. In principle, it can change every time it&#8217;s used. In practice, as of today, Google&#8217;s servers don&#8217;t change it. It seems like the refresh token is automatically replaced every six months, but even if that&#8217;s true today, it may change.</p>
<p>But that doesn&#8217;t matter so much. All that is necessary is that the refresh token is correct once. If the refresh token goes out of sync with Google&#8217;s server, a simple user / password session rectifies this. And as of now, that virtually never happens.</p>
<p>So let&#8217;s get to the hands-on part.</p>
<h3>Install msmtp</h3>
<p>Odds are that your distribution offers msmtp, so it can be installed with something like</p>
<pre># apt install msmtp</pre>
<p>Note however that the version needs to be at least 1.8.13, which wasn&#8217;t my case (Linux Mint 19). So I installed it from the sources. To do that, first install the TLS library, if it&#8217;s not installed already (as root):</p>
<pre># apt install gnutls-dev</pre>
<p>Then clone the git repository, compile and install:</p>
<pre>$ GIT_SSL_NO_VERIFY=true git clone http://git.marlam.de/git/msmtp.git
$ cd msmtp
$ git checkout msmtp-1.8.14
$ autoreconf -i
$ ./configure
$ make &amp;&amp; echo Success
$ sudo make install</pre>
<p>The installation goes to /usr/local/bin and other /usr/local/ paths, as one would expect.</p>
<p>I checked out version 1.8.14 because later versions failed to compile on my Linux Mint 19. OAUTH2 support was added in 1.8.13, and judging by the commit messages it hasn&#8217;t been changed since, except for commit 1f3f4bfd098, which is &#8220;Send XOAUTH2 in two lines, required by Microsoft servers&#8221;. Possibly cherry-pick this commit (I didn&#8217;t).</p>
<p>Once everything has been set up as described below, it&#8217;s possible to send an email with</p>
<pre>$ msmtp -v -t &lt; ~/email.eml</pre>
<p>The -v flag is used only for debugging, and it prints out the entire SMTP session.</p>
<p>The -t flag tells msmtp to fetch the recipients from the mail&#8217;s own headers. Otherwise, the recipients need to be listed in the command line, just like sendmail. Without this flag or recipients, msmtp just replies with</p>
<pre>msmtp: no recipients found</pre>
<p>The -t flag isn&#8217;t necessary with git send-email, because it explicitly lists the recipients in the command line.</p>
<h3>The oauth2-helper.pl script</h3>
<p>As mentioned above, Thunderbird has the refresh token, but msmtp needs an access token. So the script that talks with Google&#8217;s server and grabs the access token can be downloaded from its <a rel="noopener" href="https://github.com/billauer/oauth2-helper/blob/main/oauth2-helper.pl" target="_blank">Github repo</a>. Save it, with execution permission to /usr/local/bin/oauth2-helper.pl (or whatever, but this is what I assume in the configurations below).</p>
<p>Some Perl libraries may be required to run this script. On a Debian-based system, the packages&#8217; names are  probably something like libhttp-message-perl, libwww-perl and libjson-perl.</p>
<p>It&#8217;s written to access Google&#8217;s token server, but can be modified easily to access a different service provider by changing the parameters at its beginning. For other email providers, check if it happens to be listed in <a rel="noopener" href="https://github.com/mozilla/releases-comm-central/blob/master/mailnews/base/src/OAuth2Providers.sys.mjs" target="_blank">OAuth2Providers.sys.mjs</a>. I don&#8217;t know how well it will work with those other providers, though.</p>
<p>The script reads the refresh token from ~/.oauth2_reftoken as a plain file containing the blob only. There&#8217;s an inherent security risk of having this token stored like this, but it&#8217;s basically the same risk as the fact that it can be obtained from Thunderbird&#8217;s credential files. The difference is the amount of security by obscurity. Anyhow, the refresh token isn&#8217;t your password, and it can&#8217;t be derived from it. Either way, make sure that this file has a 0600 or 0400 permission, if you&#8217;re running on a multi-user computer.</p>
<p>The script caches the access token in ~/.oauth2_acctoken, with an expiration timestamp. As of today, it means that the script talks with Google&#8217;s server once in 60 minutes at most.</p>
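<p>The idea behind such a cache can be sketched as follows. This is not the code of oauth2-helper.pl: the cached_token() function, the two-line cache layout (expiry time in epoch seconds, then the token) and the 60 seconds safety margin are all assumptions made for the sake of illustration.</p>

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical cache layout, assumed for this sketch only:
# The first line is the expiry time (epoch seconds), the second is the token.
sub cached_token {
  my ($cache_text, $now, $margin) = @_;

  my ($expiry, $token) = ($cache_text =~ /^(\d+)\n(\S+)/);

  return unless (defined $token); # Malformed or empty cache
  return unless ($expiry > $now + $margin); # Expired, or about to be

  return $token;
}

my $now = time();

# Placeholder cache content: A token that is valid for another hour
my $cache = ($now + 3600) . "\nplaceholder-access-token\n";

my $token = cached_token($cache, $now, 60);
my $msg = (defined $token) ? "Using cached token" : "Need a new token";

print "$msg\n";
```

<p>Only when the function returns undef is there a reason to contact the token server, which is why a token that lasts 3600 seconds results in one request per hour at most.</p>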
<h3>Setting up config files</h3>
<p>So with msmtp installed and the script downloaded into /usr/local/bin/oauth2-helper.pl, all that is left is configuration files.</p>
<p>First, create ~/.msmtprc as follows (put your Gmail username instead of mail.username, of course):</p>
<pre>account default
host smtp.gmail.com
port 587
tls on
tls_starttls on
auth xoauth2
user mail.username
passwordeval /usr/local/bin/oauth2-helper.pl
from mail.username@gmail.com</pre>
<p>And then change the [sendemail] section in ~/.gitconfig to</p>
<pre>[sendemail]
        smtpServer = /usr/local/bin/msmtp</pre>
<p>That&#8217;s it. Only that single line. It&#8217;s however possible to use smtpServerOption in the .gitconfig to add various flags. So for example, to get the entire SMTP session shown while sending the email, it should say:</p>
<pre>[sendemail]
        smtpServer = /usr/local/bin/msmtp
        smtpServerOption = <span class="punch">-v
</span></pre>
<p>But really, don&#8217;t, unless there&#8217;s a problem sending mails.</p>
<p>Other than that, don&#8217;t keep old settings. For example, there should <strong>not</strong> be a &#8220;from=&#8221; entry in .gitconfig. Having such an entry causes a &#8220;From:&#8221; header to be added into the mail body (so it&#8217;s visible to the reader of the mail). This header is created when there is a difference between the &#8220;From&#8221; that is generated by git send-email (which is taken from the &#8220;from=&#8221; entry) and the patch&#8217;s author, as it appears in the patch&#8217;s &#8220;From&#8221; header. The purpose of this in-body header is to tell &#8220;git am&#8221; who the real author is (i.e. not the sender of the patch). So this extra header won&#8217;t appear in the commit, but it nevertheless makes the sender of the message look somewhat clueless.</p>
<p>So in short, no old junk.</p>
<h3>Sending a patch</h3>
<p>Unless it&#8217;s the first time, I suggest just trying to send the patch to your own email address, and see if it works. There&#8217;s a good chance that the refresh token from the previous time will still be good, so it will just work, and no point hassling more.</p>
<p>Actually, it&#8217;s fine to try like this even on the first time, because the Perl script will fail to grab the access token and then tell you what to do to fix it, namely:</p>
<ul>
<li>Make sure that Thunderbird has access to the mail account itself, possibly by attempting to send an email through Gmail&#8217;s server.</li>
<li>Go to Thunderbird&#8217;s Preferences &gt; Privacy &amp; Security and click on Saved Passwords. Look for the account whose Provider starts with oauth://. Right-click that line and choose &#8220;Copy Password&#8221;.</li>
<li>Create or open ~/.oauth2_reftoken, and paste the blob into that file, so it contains only that string. No need to be uptight with newlines and whitespaces: They are ignored.</li>
</ul>
<p>And then go, as usual:</p>
<pre>$ git send-email --to 'my@test.mail' 0001-my.patch</pre>
<p>I&#8217;ve added the output of a successful session (with the -v flag) below.</p>
<h3>Room for improvements</h3>
<p>It would have been nicer to fetch the refresh token automatically from Thunderbird&#8217;s credentials store (that is from logins.json, based upon the decryption key that is kept in key4.db), but the available scripts for that are written in Python. And to me Python is equal to &#8220;will cause trouble sooner or later&#8221;. Anyhow, <a rel="noopener" href="https://apr4h.github.io/2019-12-20-Harvesting-Browser-Credentials/" target="_blank">this tutorial</a> describes the mechanism (in the part about Firefox).</p>
<p>Besides, it could have been even nicer if the script was completely standalone, and didn&#8217;t depend on Thunderbird at all. That requires doing the whole dance with the browser, something I have no motivation to get into.</p>
<h3>A successful session</h3>
<p>This is what it looks like when a patch is properly sent, with the smtpServerOption = -v line in .gitconfig (so msmtp produces verbose output):</p>
<pre><span class="yadayada">Send this email? ([y]es|[n]o|[q]uit|[a]ll): y</span>
ignoring system configuration file /usr/local/etc/msmtprc: No such file or directory
loaded user configuration file /home/eli/.msmtprc
falling back to default account
Fetching access token based upon refresh token in /home/eli/.oauth2_reftoken...
using account default from /home/eli/.msmtprc
host = smtp.gmail.com
port = 587
source ip = (not set)
proxy host = (not set)
proxy port = 0
socket = (not set)
timeout = off
protocol = smtp
domain = localhost
auth = XOAUTH2
user = mail.username
password = *
passwordeval = /usr/local/bin/oauth2-helper.pl
ntlmdomain = (not set)
tls = on
tls_starttls = on
tls_trust_file = system
tls_crl_file = (not set)
tls_fingerprint = (not set)
tls_key_file = (not set)
tls_cert_file = (not set)
tls_certcheck = on
tls_min_dh_prime_bits = (not set)
tls_priorities = (not set)
tls_host_override = (not set)
auto_from = off
maildomain = (not set)
from = mail.username@gmail.com
set_from_header = auto
set_date_header = auto
remove_bcc_headers = on
undisclosed_recipients = off
dsn_notify = (not set)
dsn_return = (not set)
logfile = (not set)
logfile_time_format = (not set)
syslog = (not set)
aliases = (not set)
reading recipients from the command line
&lt;-- 220 smtp.gmail.com ESMTP m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
--&gt; EHLO localhost
&lt;-- 250-smtp.gmail.com at your service, [109.186.183.118]
&lt;-- 250-SIZE 35882577
&lt;-- 250-8BITMIME
&lt;-- 250-STARTTLS
&lt;-- 250-ENHANCEDSTATUSCODES
&lt;-- 250-PIPELINING
&lt;-- 250-CHUNKING
&lt;-- 250 SMTPUTF8
--&gt; STARTTLS
&lt;-- 220 2.0.0 Ready to start TLS
TLS session parameters:
    (TLS1.2)-(ECDHE-ECDSA-SECP256R1)-(CHACHA20-POLY1305)
TLS certificate information:
    Subject:
        CN=smtp.gmail.com
    Issuer:
        C=US,O=Google Trust Services LLC,CN=GTS CA 1C3
    Validity:
        Activation time: Mon 26 Sep 2022 11:22:04 AM IDT
        Expiration time: Mon 19 Dec 2022 10:22:03 AM IST
    Fingerprints:
        SHA256: 53:F3:CA:1D:37:F2:1F:ED:2C:67:40:A2:A2:29:C2:C8:E8:AF:9E:60:7A:01:92:EC:F0:2A:11:E8:37:A5:88:F3
        SHA1 (deprecated): D4:69:6E:59:2D:75:43:59:02:74:25:67:E7:57:40:E0:28:43:A8:62
--&gt; EHLO localhost
&lt;-- 250-smtp.gmail.com at your service, [109.186.183.118]
&lt;-- 250-SIZE 35882577
&lt;-- 250-8BITMIME
&lt;-- 250-AUTH LOGIN PLAIN XOAUTH2 PLAIN-CLIENTTOKEN OAUTHBEARER XOAUTH
&lt;-- 250-ENHANCEDSTATUSCODES
&lt;-- 250-PIPELINING
&lt;-- 250-CHUNKING
&lt;-- 250 SMTPUTF8
--&gt; AUTH XOAUTH2 dXNlcj1lbGkuYmlsbGF1ZXIBYXV0aD1CZWFyZXIgeWEyOS5hMEFhNHhyWE1GM1gtOTJMVWNidjE4MFdVOBROENRcUdSbk5KaUFSY0VSckVaXzdzbDlHMTNpdFIyUTk0NjlKWG45aHVGLQVRBU0FSTVXJpSjRqMjBLcWh6WU9GekxlcU5BYVpFNUU4WXRhNjdLUXpCRm1HRDg3dFgzeHJ4amNPTnRVTkZFVWdESXhsUlcxOFhVT0pqQ1hPSlFwZlNGUUVqRHZMOWw4RExkTjlKZlNbGRTazNNbFNMNjVfQWFDZ1lLVVF2Y0luOWNSSUEwMTY2AQE=
&lt;-- 235 2.7.0 Accepted
--&gt; MAIL FROM:&lt;mail.username@gmail.com&gt;
--&gt; RCPT TO:&lt;test@mail.com&gt;
--&gt; RCPT TO:&lt;mail.username@gmail.com&gt;
--&gt; DATA
&lt;-- 250 2.1.0 OK m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
&lt;-- 250 2.1.5 OK m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
&lt;-- 250 2.1.5 OK m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
&lt;-- 354  Go ahead m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
--&gt; From: Eli Billauer &lt;mail.username@gmail.com&gt;
--&gt; To: test@mail.com
--&gt; Cc: Eli Billauer &lt;mail.username@gmail.com&gt;
--&gt; Subject: [PATCH v8] Gosh! Why don't you apply this patch already!
--&gt; Date: Sun, 30 Oct 2022 07:01:14 +0200
--&gt; Message-Id: &lt;20221030050114.49299-1-mail.username@gmail.com&gt;
--&gt; X-Mailer: git-send-email 2.17.1
--&gt; 

<span class="yadayada">[ ... email body comes here ... ]</span>

--&gt; --
--&gt; 2.17.1
--&gt;
--&gt; .
&lt;-- 250 2.0.0 OK  1667106108 m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
--&gt; QUIT
&lt;-- 221 2.0.0 closing connection m8-20020a7bcb88000000b003c6d21a19a0sm3316430wmi.29 - gsmtp
OK. Log says:
Sendmail: /usr/local/bin/msmtp -v -i test@mail.com mail.username@gmail.com
From: Eli Billauer &lt;mail.username@gmail.com&gt;
To: test@mail.com
Cc: Eli Billauer &lt;mail.username@gmail.com&gt;
Subject: [PATCH v8] Gosh! Why don't you apply this patch already!
Date: Sun, 30 Oct 2022 07:01:14 +0200
Message-Id: &lt;20221030050114.49299-1-mail.username@gmail.com&gt;
X-Mailer: git-send-email 2.17.1

Result: OK</pre>
<p>Ah, and the fact that the access token can be copied from here is of course meaningless, as it has expired long ago.</p>
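<p>For the curious, the long string after AUTH XOAUTH2 is nothing but the username and the access token, packed in the format that the XOAUTH2 mechanism defines and then Base64 encoded. The following sketch builds and unpacks such a string. It assumes the base64 utility from GNU coreutils, the function names are made up for this example, and so is the token.</p>
<pre>#!/bin/bash
# Build the argument of AUTH XOAUTH2 from a username and an access token.
# The format is user=...^Aauth=Bearer ...^A^A (^A being the byte 0x01),
# all of it Base64 encoded without line breaks.
xoauth2_arg() {
  printf 'user=%s\001auth=Bearer %s\001\001' "$1" "$2" | base64 | tr -d '\n'
}

# The other way around: show what such a string contains, with the
# 0x01 bytes replaced by "|" for readability.
xoauth2_decode() {
  printf '%s' "$1" | base64 -d | tr '\001' '|'
}

xoauth2_decode "$(xoauth2_arg mail.username ya29.dummy-token)"
echo</pre>
<p>This prints user=mail.username|auth=Bearer ya29.dummy-token||, which is why anyone who gets to see a verbose log like the one above can read the access token from it for as long as it&#8217;s valid.</p>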
<h3>Thunderbird debug notes</h3>
<p>These are some random notes I made while digging in Thunderbird&#8217;s guts to find out what&#8217;s going on.</p>
<p>So this is Thunderbird&#8217;s official <a rel="noopener" href="https://github.com/mozilla/releases-comm-central" target="_blank">git repo</a>. Not that I used it.</p>
<p>To get logging info from Thunderbird: Based upon <a rel="noopener" href="https://wiki.mozilla.org/MailNews:Logging#Setting_Thunderbird_Preference" target="_blank">this page</a>, go to Thunderbird&#8217;s preferences &gt; General and click the Config Editor button. Set mailnews.oauth.loglevel to All (was Warn). Same with mailnews.smtp.loglevel. Then open the Error Console with Ctrl+Shift+J.</p>
<p>The cute thing about these logs is that the access token is written in the log. So it&#8217;s possible to skip the Perl script, and use the access token from Thunderbird&#8217;s log. Really inconvenient, but possible.</p>
<p>The OAuth2 token requests are implemented in <a rel="noopener" href="https://github.com/mozilla/releases-comm-central/blob/master/mailnews/base/src/OAuth2.jsm" target="_blank">Oauth2.jsm</a>. It&#8217;s possible to set a breakpoint in this module by going through Tools &gt; Developer Tools &gt; Developer Toolbox, and once it opens (after requesting permission for an external connection), going to the debugger.</p>
<p>Find Oauth2.jsm in the sources pane to the left (of the Debugger tab), under resource:// modules &gt; sessionstore. Add a breakpoint in requestAccessToken() so that the clientID and consumerSecret properties can be revealed.</p>
<h3><span style="color: #888888;">Sending a patch from Thunderbird directly</span></h3>
<p><span style="color: #888888;">This is a really bad idea. But if you have Thunderbird, and need to send a patch right now, this is a quick, dirty and somewhat dangerous procedure for doing that.</span></p>
<p><span style="color: #888888;">Why is it dangerous? Because at some point, it&#8217;s easy to pick &#8220;Send now&#8221; instead of &#8220;Send later&#8221;, and boom, a junk patch is mailed to the whole world.</span></p>
<p><span style="color: #888888;">The problem with Thunderbird is that it makes small changes to the patch&#8217;s body. So to work around this, there&#8217;s a really silly procedure. I used it once, and I&#8217;m not proud of that.</span></p>
<p><span style="color: #888888;">So here we go.</span></p>
<p><span style="color: #888888;">First, a very simple script that outputs the patch mail into a file. Say that I called it dumpit (should be executable, of course):</span></p>
<pre><span style="color: #888888;">#!/bin/bash

cat &gt; /home/eli/Desktop/git-send-email.eml
</span></pre>
<p><span style="color: #888888;">Then change ~/.gitconfig, so it reads something like this in the [sendemail] section:</span></p>
<pre><span style="color: #888888;">[sendemail]
        from = mail.username@gmail.com
        smtpServer = /home/eli/Desktop/dumpit
</span></pre>
<p><span style="color: #888888;">So basically it uses the silly script as a mail server, and the content goes out to a plain file.</span></p>
<p><span style="color: #888888;">Then run git send-email as usual. The result is a git-send-email.eml as a file.</span></p>
<p><span style="color: #888888;">And now comes the part of making Thunderbird send it.</span></p>
<ul>
<li><span style="color: #888888;">Close Thunderbird. All windows.</span></li>
<li><span style="color: #888888;">Change directory to where Thunderbird keeps its profile files, to under Mail/Local Folders</span></li>
<li><span style="color: #888888;">Remove &#8220;Unsent Messages&#8221; and &#8220;Unsent Messages.msf&#8221;</span></li>
<li><span style="color: #888888;">Open Thunderbird again</span></li>
<li><span style="color: #888888;">Inside Thunderbird, go to Hamburger Icon &gt; File &gt; Open &gt; Saved Message&#8230; and select git-send-email.eml. The email message should appear.</span></li>
<li><span style="color: #888888;">Right-Click somewhere in the message&#8217;s body, and pick Edit as New Message&#8230;</span></li>
<li><span style="color: #888888;"><strong>Don&#8217;t send this message as is</strong>! It&#8217;s completely messed up. In particular, there are some indentations in the patch itself, which renders it useless.</span></li>
<li><span style="color: #888888;">Instead, pick File &gt; Send Later.</span></li>
<li><span style="color: #888888;">Once again, close Thunderbird. All windows.</span></li>
<li><span style="color: #888888;">Remove &#8220;Unsent Messages.msf&#8221; (only)</span></li>
<li><span style="color: #888888;">Edit &#8220;Unsent Messages&#8221; as follows: Everything under the &#8220;Content-Transfer-Encoding: 7bit&#8221; part is the mail&#8217;s body. So remove the &#8220;From:&#8221; line after it, and paste the email&#8217;s body from git-send-email.eml instead.</span></li>
<li><span style="color: #888888;">Note that there are normally two blank lines after the mail&#8217;s body. Retain them.</span></li>
<li><span style="color: #888888;">Open Thunderbird again. Verify that those indentations are gone.</span></li>
<li><span style="color: #888888;">Look at the mail inside Outbox, and verify that it&#8217;s OK now. These are the three things to look for in particular:</span>
<ul>
<li><span style="color: #888888;">The &#8220;From:&#8221; part at the beginning of the message is gone.</span></li>
<li><span style="color: #888888;">At the end of the message, there&#8217;s a &#8220;&#8211;&#8221; and git&#8217;s version number. These should be in <strong>separate lines</strong>.</span></li>
<li><span style="color: #888888;">Look at the mail&#8217;s source. The &#8220;+&#8221; and &#8220;-&#8221; signs of the diffs must not be indented.</span></li>
</ul>
</li>
<li><span style="color: #888888;">If all is fine, right-click Outbox, and pick &#8220;Send unsent messages&#8221;. And hope for the best.</span></li>
</ul>
<p><span style="color: #888888;">Are you sure you want to do this?</span></p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/10/git-send-email-with-oauth2-gmail/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Blocking bots by their IP addresses, the DIY version</title>
		<link>https://billauer.se/blog/2022/08/spiders-bots-denial-iptables-ipset/</link>
		<comments>https://billauer.se/blog/2022/08/spiders-bots-denial-iptables-ipset/#comments</comments>
		<pubDate>Tue, 16 Aug 2022 10:26:37 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[Server admin]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6670</guid>
		<description><![CDATA[Introduction I had some really annoying bots on one of my websites. Of the sort that make a million requests (like really, a million) per month, identifying themselves as a browser. So IP blocking it is. I went for a minimalistic DIY approach. There are plenty of tools out there, but my experience with things [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>I had some really annoying bots on one of my websites. Of the sort that make a million requests (like really, a million) per month, identifying themselves as a browser.</p>
<p>So IP blocking it is. I went for a minimalistic DIY approach. There are plenty of tools out there, but my experience with things like this is that in the end, it&#8217;s me and the scripts. So I might as well write them myself.</p>
<h3>The IP set feature</h3>
<p>Iptables has an IP set module, which allows feeding it with a set of random IP addresses. Internally, it creates a hash with these addresses, so it&#8217;s an efficient way to keep track of multiple addresses.</p>
<p>IP sets have been in the kernel for ages, but the feature has to be enabled in the kernel&#8217;s configuration with CONFIG_IP_SET. Which it most likely is.</p>
<p>The ipset utility may need to be installed, with something like</p>
<pre># apt install ipset</pre>
<p>There seems to be a protocol mismatch issue with the kernel, which apparently is a non-issue. But every time something goes wrong with ipset, there&#8217;s a warning message about this mismatch, which is misleading. So it looks something like this.</p>
<pre># ipset <span class="yadayada">[ ... something stupid or malformed ... ]</span>
ipset v6.23: Kernel support protocol versions 6-7 while userspace supports protocol versions 6-6
<span class="yadayada">[ ... some error message related to the stupidity ... ]</span></pre>
<p>So the important thing to be aware of is that odds are that the problem isn&#8217;t the version mismatch, but somewhere between the chair and the keyboard.</p>
<h3>Hello, world</h3>
<p>A quick session</p>
<pre># ipset create testset hash:ip
# ipset add testset 1.2.3.4
# iptables -I INPUT -m set --match-set testset src -j DROP
# ipset del testset 1.2.3.4</pre>
<p>Attempting to add an IP address that is already in the list causes a warning, and the address isn&#8217;t added. So there&#8217;s no need to check if the address is already there. Besides, there&#8217;s the -exist option, which is really great.</p>
<p>List the members of the IP set:</p>
<pre># ipset -L</pre>
<h3>Timeout</h3>
<p>An entry can have a timeout feature, which works exactly as one would expect: The entry vanishes from the set after the timeout expires. The timeout value shown by ipset -L counts down.</p>
<p>For this to work, the set must be created with a default timeout attribute. Zero means that timeout is disabled (which I chose as a default in this example).</p>
<pre># ipset create testset hash:ip timeout 0
# ipset add testset 1.2.3.4 timeout 10</pre>
<p>The &#8216;-exist&#8217; flag causes ipset to re-add an existing entry, which also resets its timeout. So this is the way to keep the list fresh.</p>
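<p>A small demonstration of this, as a sketch: The ban function and the IPSET variable are names made up for this example, and the commands in the usage part assume that testset doesn&#8217;t exist yet.</p>
<pre>#!/bin/bash
IPSET="${IPSET:-/sbin/ipset}"

# Add an address to a set with a timeout. If the address is already
# listed, -exist turns this into a refresh of the timeout.
ban() {
  "$IPSET" add -exist "$1" "$2" timeout "$3"
}

# Usage (as root):
#   ipset create testset hash:ip timeout 0
#   ban testset 1.2.3.4 10
#   sleep 5
#   ban testset 1.2.3.4 10   # The countdown starts over from 10
#   ipset list testset</pre>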
<h3>Don&#8217;t put the DROP rule first</h3>
<p>It&#8217;s tempting to put the DROP rule with &#8211;match-set first, because hey, let&#8217;s give those intruders the boot right away. But doing that, there might be TCP connections lingering, because the last FIN packet is caught by the firewall as the new rule is added. Given that adding an IP address is the result of a flood of requests, this is a realistic scenario.</p>
<p>The solution is simple: There&#8217;s most likely a &#8220;state RELATED,ESTABLISHED&#8221; rule somewhere in the list. So push it to the top. The rationale is simple: If a connection has begun, don&#8217;t chop it in the middle in any case. It&#8217;s the first packet that we want killed.</p>
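<p>In terms of commands, this order could be obtained with something like the following (as root). It&#8217;s only a sketch: It assumes the INPUT chain and the testset from the example above, and it inserts a fresh RELATED,ESTABLISHED rule rather than moving an existing one, so adapt it to the rules that are already there.</p>
<pre>#!/bin/bash
# Rule 1: Packets of connections that are already open pass right away
iptables -I INPUT 1 -m state --state RELATED,ESTABLISHED -j ACCEPT

# Rule 2: Everything else from the blacklisted addresses is dropped,
# which in practice means new connection attempts
iptables -I INPUT 2 -m set --match-set testset src -j DROP

# Verify the order
iptables -L INPUT -n --line-numbers</pre>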
<h3>Persistence</h3>
<p>The rule in iptables must refer to an existing set. So if the rule that relies on the set is part of the persistent firewall rules, the set must be created before the script that brings up iptables runs.</p>
<p>This is easily done by adding a rule file like this as /usr/share/netfilter-persistent/plugins.d/10-ipset</p>
<pre><span class="hljs-meta">#!/bin/sh</span>

IPSET=/sbin/ipset
SET=mysiteset

<span class="hljs-keyword">case</span> <span class="hljs-string">"<span class="hljs-variable">$1</span>"</span> <span class="hljs-keyword">in</span>
start|restart|reload|force-reload)
	<span class="hljs-variable">$IPSET</span> destroy
	<span class="hljs-variable">$IPSET</span> create <span class="hljs-variable">$SET</span> <span class="hljs-built_in">hash</span>:ip <span class="hljs-built_in">timeout</span> 0
	;;

save)
	<span class="hljs-built_in">echo</span> <span class="hljs-string">"ipset-persistent: The save option does nothing"</span>
	;;

stop|flush)
	<span class="hljs-variable">$IPSET</span> flush <span class="hljs-variable">$SET</span>
	;;
*)
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Usage: <span class="hljs-variable">$0</span> {start|restart|reload|force-reload|save|stop|flush}"</span> &gt;&amp;2
    <span class="hljs-built_in">exit</span> 1
    ;;
<span class="hljs-keyword">esac</span>

<span class="hljs-built_in">exit</span> 0</pre>
<p>The idea is that the index 10 in the file&#8217;s name is smaller than that of the file that sets up iptables, so it runs first.</p>
<p>This script is a dirty hack, but hey, it works. There&#8217;s a <a rel="noopener" href="https://sourceforge.net/p/ipset-persistent/wiki/Home/" target="_blank">small project</a> on this, for those who like to do it properly.</p>
<p>The operating system in question is systemd-based, but this old school style is still in effect.</p>
<h3>Maybe block by country?</h3>
<p>Since all offending requests came from the same country (cough, cough, China, from more than 4000 different IP addresses) I&#8217;m considering blocking them in one go. A list of 4000+ IP addresses that I busted in August 2022 with aggressive bots (all from China) can be downloaded as a simple <a href="/download/china-bots-ip-addresses.txt.gz" target="_blank">compressed text file</a>.</p>
<p>So the idea is going something like</p>
<pre>ipset create foo hash:net
ipset add foo 192.168.0.0/24
ipset add foo 10.1.0.0/16
ipset add foo 192.168.0/24</pre>
<p>and download the per-country IP ranges from <a href="https://www.ipdeny.com/ipblocks/" target="_blank">IP deny</a>. That&#8217;s a simple and crude tool for denial by geolocation. The only thing that puts me off a bit is that it&#8217;s &gt; 7000 entries, so I wonder if that doesn&#8217;t put a load on the server. But what really counts is the number of different subnet mask sizes, because each mask size has its own hash. So if the list covers all possible sizes, from a full /32 down to, say, /16, there are 17 hashes to look up for each arriving packet.</p>
<p>On the other hand, since the rule should be after the &#8220;state RELATED,ESTABLISHED&#8221; rule, it only covers SYN packets. And if this whole thing is put as late as possible in the list of rules, it boils down to handling only packets that are intended for the web server&#8217;s ports, or those that are going to be dropped anyhow. So compared with the CPU cycles of handling the http request, even 17 hashes isn&#8217;t all that much.</p>
<p>The biggest caveat is however if other websites are colocated on the server. It&#8217;s one thing to block offending IPs, but blocking a whole country from all sites, that&#8217;s a bit too much.</p>
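<p>If I go for it, loading such a list could look more or less like this. It&#8217;s a sketch under a few assumptions: cn.zone stands for a downloaded file with one CIDR block per line, countryset is an arbitrary name for the set, and zone_to_restore is a helper made up for this example. Feeding everything to a single ipset restore is much quicker than calling ipset add thousands of times.</p>
<pre>#!/bin/bash
# Turn a list of CIDR blocks (read from stdin, one per line) into input
# for "ipset restore". Lines that don't look like a CIDR block (comments,
# blank lines) are skipped.
zone_to_restore() {
  echo "create $1 hash:net"
  awk -v name="$1" '/^[0-9.]+\/[0-9]+$/ { print "add", name, $0 }'
}

# Usage (as root). -exist prevents errors on entries that are already there:
#   zone_to_restore countryset &lt; cn.zone | ipset -exist restore</pre>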
<p><em>Note to self: In the end, I wrote a little Perl-XS module that says if the IP belongs to a group. Look for byip.pm.</em></p>
<h3>The blacklisting script</h3>
<p>The Perl script that performs the blacklisting is crude and inaccurate, but simple. This is the part to tweak and play with, and in particular adapt to each specific website. It&#8217;s all about detecting abnormal access.</p>
<p>Truth be told, I replaced this script with a more sophisticated mechanism pretty much right away on my own system. But what&#8217;s really interesting is the call to ipset.</p>
<p>This script reads through Apache&#8217;s access log file, and analyzes each minute in time (as in 60 seconds). In other words, all accesses that have the same timestamp, with the seconds part ignored. Note that the regex part that captures $time in the script ignores the last part of :\d\d.</p>
<p>If the same IP address appears 50 times or more within such a minute, that address is blacklisted, with a timeout of 86400 seconds (24 hours). Log entries that correspond to page requisites and such (images, style files etc.) are skipped for this purpose. Otherwise, it&#8217;s easy to reach 50 accesses within a minute with legit web browsing.</p>
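<p>To see what the regular expression of the script (listed below) extracts, it can be tried out on a single log line from the command line. The log line in this sketch is made up (in Apache&#8217;s combined log format), and parse_line is just a name for this example.</p>
<pre>#!/bin/bash
# Print the IP address, the timestamp without seconds and the request,
# as captured by the same regular expression that the script uses
parse_line() {
  perl -ne 'print "$1 | $2 | $3\n" if /^([^ ]+).*?\[(.+?):\d\d[ ].*?\"\w+[ ]+([^\"]+)/'
}

echo '1.2.3.4 - - [16/Aug/2022:10:26:37 +0000] "GET /blog/ HTTP/1.1" 200 1234' |
  parse_line</pre>
<p>This prints 1.2.3.4 | 16/Aug/2022:10:26 | /blog/ HTTP/1.1, so all requests that arrive during 10:26 are counted together. Note that the request part includes the protocol, which makes no difference for the check of its beginning.</p>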
<p>There are several imperfections about this script, among others:</p>
<ul>
<li>Since it reads through the entire log file each time, it keeps relisting each IP address until the access file is rotated away, and a new one is started. This causes an update of the timeout, so effectively the blacklisting takes place for up to 48 hours.</li>
<li>Looking in segments of accesses that happen to have the same minute in the timestamp is quite inaccurate regarding which IPs are caught and which aren&#8217;t.</li>
</ul>
<p>The script goes as follows:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>
<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;

<span class="hljs-keyword">my</span> $logfile = <span class="hljs-string">'/var/log/mysite.com/access.log'</span>;
<span class="hljs-keyword">my</span> $limit = <span class="hljs-number">50</span>; <span class="hljs-comment"># 50 accesses per minute</span>
<span class="hljs-keyword">my</span> $timeout = <span class="hljs-number">86400</span>;

<span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $in, <span class="hljs-string">"&lt;"</span>, $logfile)
  <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open $logfile for read: $!\n"</span>;

<span class="hljs-keyword">my</span> $current = <span class="hljs-string">''</span>;
<span class="hljs-keyword">my</span> $l;
<span class="hljs-keyword">my</span> %h;
<span class="hljs-keyword">my</span> %blacklist;

<span class="hljs-keyword">while</span> (<span class="hljs-keyword">defined</span> ($l = &lt;$in&gt;)) {
  <span class="hljs-keyword">my</span> ($ip, $time, $req) = ($l =~ <span class="hljs-regexp">/^([^ ]+).*?\[(.+?):\d\d[ ].*?\"\w+[ ]+([^\"]+)/</span>);
  <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $ip) {
    <span class="hljs-comment">#    warn("Failed to parse line $l\n");</span>
    <span class="hljs-keyword">next</span>;
  }

  <span class="hljs-keyword">next</span>
    <span class="hljs-keyword">if</span> ($req =~ <span class="hljs-regexp">/^\/(?:media\/|robots\.txt)/</span>);

  <span class="hljs-keyword">unless</span> ($time eq $current) {
    <span class="hljs-keyword">foreach</span> <span class="hljs-keyword">my</span> $k (<span class="hljs-keyword">sort</span> <span class="hljs-keyword">keys</span> %h) {
      $blacklist{$k} = <span class="hljs-number">1</span>
	<span class="hljs-keyword">if</span> ($h{$k} &gt;= $limit);
    }

    %h = ();
    $current = $time;
  }
  $h{$ip}++;
}

<span class="hljs-keyword">close</span> $in;

<span class="hljs-keyword">foreach</span> <span class="hljs-keyword">my</span> $k (<span class="hljs-keyword">sort</span> <span class="hljs-keyword">keys</span> %blacklist) {
  <span class="hljs-keyword">system</span>(<span class="hljs-string">'/sbin/ipset'</span>, <span class="hljs-string">'add'</span>, <span class="hljs-string">'-exist'</span>, <span class="hljs-string">'mysiteset'</span>, $k, <span class="hljs-string">'timeout'</span>, $timeout);
}</pre>
<p>It has to be run as root, of course. Most likely as a cronjob.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/08/spiders-bots-denial-iptables-ipset/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Translate, LaTeX and Asian languages: Technical notes</title>
		<link>https://billauer.se/blog/2022/08/google-translate-pdflatex-technical/</link>
		<comments>https://billauer.se/blog/2022/08/google-translate-pdflatex-technical/#comments</comments>
		<pubDate>Mon, 15 Aug 2022 07:18:50 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6665</guid>
		<description><![CDATA[Introduction This post contains a few technical notes on using Google Translate for translating LaTeX documents into Chinese, Japanese and Korean. The insights on the language-related issues are written down in a separate post. Text vs. HTML Google&#8217;s cloud translator can be fed with either plain text or HTML, and it returns the same format. [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>This post contains a few technical notes on using Google Translate for translating LaTeX documents into Chinese, Japanese and Korean. The insights on the language-related issues are written down in a <a title="Translating technical documentation with Google Translate" href="https://billauer.se/blog/2022/08/google-translate-insights/" target="_blank">separate post</a>.</p>
<h3>Text vs. HTML</h3>
<p>Google&#8217;s cloud translator can be fed with either plain text or HTML, and it returns the same format. Plain text format is out of the question for anything but translating short sentences, as it becomes impossible to maintain the text&#8217;s formatting. So I went for the HTML interface.</p>
<p>The thing with HTML is that whitespaces can take different forms and shapes, and they are redundant in many situations. For example, a newline is often equivalent to a plain space, and neither makes any difference between two paragraphs that are enclosed by &lt;p&gt; tags.</p>
<p>Google Translate takes this notion to the extreme, and typically removes all newlines from the original text. OK, that&#8217;s understandable. But it also adds and removes whitespaces where it had no business doing anything, in particular around meaningless segments that aren&#8217;t translated anyhow. This makes it quite challenging when feeding the results for further automatic processing.</p>
<h3>Setting up a Google Cloud account</h3>
<p>When creating a new Google Cloud account, there&#8217;s an automatic credit of $300 to spend within three months. So there&#8217;s plenty of room for much needed experimenting. To see the status of the evaluation period, go to Billing &gt; Cost Breakdown and wait a minute or so for the &#8220;Free trial status&#8221; strip to appear at the top of the page. There&#8217;s no problem with &#8220;activating full account&#8221; immediately. The free trial credits remain, but it also means that real billing occurs when the credits are consumed and/or the trial period is over.</p>
<p>First create a new Google cloud account and enable the Google Translate API.</p>
<p>I went for Basic v2 translation (and not Advanced, v3). Their pricing is the same, but v3 is not allowed with an API key, and I really wasn&#8217;t into setting up a service account and struggling with OAuth2. The main advantage with v3 is the possibility to train the machine to adapt to a specific language pattern, but as mentioned in <a title="Translating technical documentation with Google Translate" href="https://billauer.se/blog/2022/08/google-translate-insights/" target="_blank">that separate post</a>, I&#8217;m hiding away anything but common English language patterns.</p>
<p>As for authentication, I went for <a rel="noopener" href="https://cloud.google.com/docs/authentication/api-keys" target="_blank">API keys</a>. I don&#8217;t need any personalized info, so that&#8217;s the simple way to go. To obtain the keys, go to main menu (hamburger icon) &gt; APIs and services &gt; Credentials and pick Create Credentials, and choose to create API keys. Copy the string and use it in the key=API_KEY parameters in POST requests. It&#8217;s possible to restrict the usage of this key in various ways (HTTP referrer, IP address etc.) but it wasn&#8217;t relevant in my case, because the script runs only on my computer.</p>
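<p>Before running any script, the key can be checked with a plain curl command. This is a sketch that assumes the v2 endpoint&#8217;s usual parameters (q, target and format), with API_KEY as a shell variable that holds the key. Note that even a tiny request like this one counts as billable characters.</p>
<pre>#!/bin/bash
API_KEY='your API key here'

# Ask for a translation of a short HTML snippet into Japanese.
# The response is a JSON structure that contains the translated text.
curl -s "https://translation.googleapis.com/language/translate/v2?key=$API_KEY" \
  --data-urlencode 'q=&lt;p&gt;Hello, world&lt;/p&gt;' \
  --data-urlencode 'target=ja' \
  --data-urlencode 'format=html'</pre>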
<p>The web interface for setting up cloud services is horribly slow, which is slightly ironic and a bit odd for a company like Google.</p>
<h3>The translation script</h3>
<p>I wrote a simple script for taking a piece of text in English and translating it into the language of choice:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>

<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;
<span class="hljs-keyword">use</span> LWP::UserAgent;
<span class="hljs-keyword">use</span> JSON <span class="hljs-string">qw[ from_json ]</span>;

<span class="hljs-keyword">our</span> $WASTEMONEY = <span class="hljs-number">0</span>; <span class="hljs-comment"># Prompt before making request</span>
<span class="hljs-keyword">my</span> $MAXLEN = <span class="hljs-number">500000</span>;
<span class="hljs-keyword">my</span> $chars_per_dollar = <span class="hljs-number">50000</span>; <span class="hljs-comment"># $20 per million characters</span>

<span class="hljs-keyword">our</span> $APIkey = <span class="hljs-string">'your API key here'</span>;

<span class="hljs-keyword">my</span> ($outfile, $origfile, $lang) = @ARGV;

<span class="hljs-keyword">die</span>(<span class="hljs-string">"Usage: $0 outfile origfile langcode\n"</span>)
  <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $origfile);

<span class="hljs-keyword">my</span> $input = readfile($origfile);

askuser() <span class="hljs-keyword">unless</span> ($WASTEMONEY);

<span class="hljs-keyword">my</span> $len = <span class="hljs-keyword">length</span> $input;

<span class="hljs-keyword">die</span>(<span class="hljs-string">"Cowardly refusing to translate $len characters\n"</span>)
  <span class="hljs-keyword">if</span> ($len &gt; $MAXLEN);

writefile($outfile, translate($input, $lang));

<span class="hljs-comment">################## SUBROUTINES ##################</span>

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">writefile</span> </span>{
  <span class="hljs-keyword">my</span> ($fname, $data) = @_;

  <span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $out, <span class="hljs-string">"&gt;"</span>, $fname)
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open \"$fname\" for write: $!\n"</span>;
  <span class="hljs-keyword">binmode</span>($out, <span class="hljs-string">":utf8"</span>);
  <span class="hljs-keyword">print</span> $out $data;
  <span class="hljs-keyword">close</span> $out;
}

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">readfile</span> </span>{
  <span class="hljs-keyword">my</span> ($fname) = @_;

  <span class="hljs-keyword">local</span> $/; <span class="hljs-comment"># Slurp mode</span>

  <span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $in, <span class="hljs-string">"&lt;"</span>, $fname)
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open $fname for read: $!\n"</span>;

  <span class="hljs-keyword">my</span> $input = &lt;$in&gt;;
  <span class="hljs-keyword">close</span> $in;

  <span class="hljs-keyword">return</span> $input;
}

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">askuser</span> </span>{
  <span class="hljs-keyword">my</span> $len = <span class="hljs-keyword">length</span> $input;
  <span class="hljs-keyword">my</span> $cost = <span class="hljs-keyword">sprintf</span>(<span class="hljs-string">'$%.02f'</span>, $len / $chars_per_dollar);

  <span class="hljs-keyword">print</span> <span class="hljs-string">"\n\n*** Approval to access Google Translate ***\n"</span>;
  <span class="hljs-keyword">print</span> <span class="hljs-string">"$len bytes to $lang, $cost\n"</span>;
  <span class="hljs-keyword">print</span> <span class="hljs-string">"Source file: $origfile\n"</span>;
  <span class="hljs-keyword">print</span> <span class="hljs-string">"Proceed? [y/N] "</span>;

  <span class="hljs-keyword">my</span> $ans = &lt;STDIN&gt;;

  <span class="hljs-keyword">die</span>(<span class="hljs-string">"Aborted due to lack of consent to proceed\n"</span>)
    <span class="hljs-keyword">unless</span> ($ans =~ <span class="hljs-regexp">/^y/i</span>);
}

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">translate</span> </span>{
  <span class="hljs-keyword">my</span> ($text, $lang) = @_;

  <span class="hljs-keyword">my</span> $ua = LWP::UserAgent-&gt;new;
  <span class="hljs-keyword">my</span> $url = <span class="hljs-string">'https://translation.googleapis.com/language/translate/v2'</span>;

  <span class="hljs-keyword">my</span> $res = $ua-&gt;post($url,
		      [
		       <span class="hljs-string">source =&gt;</span> <span class="hljs-string">'en'</span>,
		       <span class="hljs-string">target =&gt;</span> $lang,
		       <span class="hljs-string">format =&gt;</span> <span class="hljs-string">'html'</span>, <span class="hljs-comment"># Could be 'text'</span>
		       <span class="hljs-string">key =&gt;</span> $APIkey,
		       <span class="hljs-string">q =&gt;</span> $text,
		      ]);

  <span class="hljs-keyword">die</span>(<span class="hljs-string">"Failed to access server: "</span>. $res-&gt;status_line . <span class="hljs-string">"\n"</span>)
    <span class="hljs-keyword">unless</span> ($res-&gt;is_success);

  <span class="hljs-keyword">my</span> $data = $res-&gt;content;

  <span class="hljs-keyword">my</span> $json = from_json($data, { <span class="hljs-string">utf8 =&gt;</span> <span class="hljs-number">1</span> } );

  <span class="hljs-keyword">my</span> $translated;

  <span class="hljs-keyword">eval</span> {
    <span class="hljs-keyword">my</span> $d = $json-&gt;{data};
    <span class="hljs-keyword">die</span>(<span class="hljs-string">"Missing \"data\" entry\n"</span>) <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $d);

    <span class="hljs-keyword">my</span> $tr = $d-&gt;{translations};
    <span class="hljs-keyword">die</span>(<span class="hljs-string">"Missing \"translations\" entry\n"</span>)
      <span class="hljs-keyword">unless</span> ((<span class="hljs-keyword">defined</span> $tr) &amp;&amp; (<span class="hljs-keyword">ref</span> $tr eq <span class="hljs-string">'ARRAY'</span>) &amp;&amp;
	     (<span class="hljs-keyword">ref</span> $tr-&gt;[<span class="hljs-number">0</span>] eq <span class="hljs-string">'HASH'</span>));

    $translated = $tr-&gt;[<span class="hljs-number">0</span>]-&gt;{translatedText};

    <span class="hljs-keyword">die</span>(<span class="hljs-string">"No translated text\n"</span>)
      <span class="hljs-keyword">unless</span> (<span class="hljs-keyword">defined</span> $translated);
  };

  <span class="hljs-keyword">die</span>(<span class="hljs-string">"Malformed response from server: $@\n"</span>) <span class="hljs-keyword">if</span> ($@);

  $translated =~ <span class="hljs-regexp">s/(&lt;\/(?:p|h\d+)&gt;)[ \t\n\r]*/"$1\n"/g</span>e;

  <span class="hljs-keyword">return</span> $translated;
}</pre>
<p>The substitution at the end of the translate() function adds a newline after each closing tag of a paragraph or heading (e.g. &lt;/p&gt;, &lt;/h1&gt; etc.), and removes any whitespace that follows the tag, so that the HTML is more readable in a text editor. Otherwise it&#8217;s all on one single line.</p>
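<p>To see what this substitution does in isolation, here&#8217;s a minimal sketch that applies it to a made-up one-line HTML string. The add_newlines() name and the sample text are for illustration only; the regular expression itself is the one from the script.</p>
<pre>use strict;
use warnings;

# Same substitution as at the end of translate()
sub add_newlines {
  my ($html) = @_;

  $html =~ s/(&lt;\/(?:p|h\d+)&gt;)[ \t\n\r]*/"$1\n"/ge;

  return $html;
}

# Made-up sample, all in one line
print add_newlines('&lt;h1&gt;Title&lt;/h1&gt;  &lt;p&gt;First.&lt;/p&gt;&lt;p&gt;Second.&lt;/p&gt;');</pre>
<p>Each closing tag now ends its line, and the two spaces after the heading&#8217;s closing tag are gone. Tags that aren&#8217;t in the list (a closing div, for example) are left as they are.</p>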
<h3>Protecting your money</h3>
<p>By obtaining an API key, you effectively give your computer permission to spend money. Which is fine as long as it works as intended, but a plain bug in a script that leads to an infinite loop or recursion, or maybe just feeding the system with a huge file by mistake, can end up with consequences that are well beyond the CPU fan spinning a bit.</p>
<p>So there are two protection mechanisms in the script itself:</p>
<ul>
<li>The script prompts for permission, stating how much it will cost (based upon <a rel="noopener" href="https://cloud.google.com/translate/pricing" target="_blank">$20 / million chars</a>).</li>
<li>It limits a single translation to 500k chars (to prevent a huge file from being processed accidentally).</li>
</ul>
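<p>The arithmetic behind these two limits, as a short sketch. The figures are those used in the script above ($20 per million characters, 500k characters at most), and the estimate_cost() name is made up. The actual rate may differ, so check the pricing page.</p>
<pre>use strict;
use warnings;

my $chars_per_dollar = 50000; # $20 per million characters
my $maxlen = 500000;          # The script's limit for a single request

# Estimated cost in dollars for translating $len characters
sub estimate_cost {
  my ($len) = @_;

  return $len / $chars_per_dollar;
}

printf('Largest allowed request: %d characters, about $%.2f' . "\n",
       $maxlen, estimate_cost($maxlen));</pre>
<p>So with these figures, a single run of the script can&#8217;t cost more than about $10.</p>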
<p>Another safety mechanism is to set up budgets and budget alerts. Go to Main menu (hamburger) &gt; Billing &gt; Budgets &amp; Alerts. Be sure to check &#8220;Email alerts to billing admins and users&#8221;. If I got it right, budgets don&#8217;t protect against spending, but only send notifications. So I selected a sum, and enabled only the 100% threshold. It seems to make sense to check all the Discounts and Promotion options in the Credits part, so that the alert relates to the money actually spent, after deducting all promotion credits.</p>
<p>On top of that, it&#8217;s a good idea to set quota limits: Go to Main menu (hamburger) &gt; IAM &amp; Admin &gt; Quotas. Set the filter to Translation to get rid of a lot of lines.</p>
<p>It&#8217;s also the place to get an accurate figure for the current consumption.</p>
<p>Enable the quota for &#8220;v2 and v3 general model characters per day&#8221;, which is the only character limit that isn&#8217;t per minute, and set it to something sensible, for example 2 million characters if you&#8217;re a modest user like myself. That&#8217;s $40, which is fairly acceptable damage if the computer goes crazy, and high enough not to hit the roof normally.</p>
<p>Also do something with &#8220;v3 batch translation characters using general models per day&#8221; and same with AutoML custom models. I don&#8217;t use these, so I set both to zero. Just to be safe.</p>
<p>There&#8217;s &#8220;Edit Quotas&#8221; to the top right. Which didn&#8217;t work, probably because I did this during the trial period, so quotas are meaningless, and apparently disabled anyhow (or more precisely, enabled to fixed limits).</p>
<p>So the way to do it was somewhat tricky (as it&#8217;s probably pointless): To enable a quota, right-click the &#8220;Cloud Translation API&#8221; to the left of the quota item, and open it in a new tab. Set up the quota figure there. But this description of how to do it might not be accurate for real-life use. Actually, the system ignored my attempts to impose limits. They appeared on the page for editing them, but not on the main page.</p>
<h3>Supporting CJK in LaTeX</h3>
<p>I&#8217;m wrapping up this post with notes on how to feed LaTeX (pdflatex, more precisely) with Chinese, Japanese and Korean, with UTF-8 encoding, and get a hopefully reasonable result.</p>
<p>So first grab a few packages:</p>
<pre># apt install texlive-lang-european
# apt install texlive-lang-chinese
# apt install texlive-lang-korean
# apt install texlive-cjk-all</pre>
<p>Actually, texlive-lang-european isn&#8217;t related, but as its name implies, it&#8217;s useful for European languages.</p>
<p>I first attempted with</p>
<pre><span class="hljs-keyword">\usepackage</span>[UTF8]{ctex}</pre>
<p>but pdflatex failed miserably with an error saying that the fontset &#8216;fandol&#8217; is unavailable in current mode, <a rel="noopener" href="https://tex.stackexchange.com/questions/545681/critical-package-ctex-errorctex-fontsetfandol-is-unavailable-in-current" target="_blank">whatever that means</a>. After trying a few options back and forth, I eventually went for the rather hacky solution of using CJKutf8. The problem is that CJK chars are allowed only within</p>
<pre><span class="hljs-keyword">\begin</span>{CJK}{UTF8}{gbsn}

<span class="yadayada">[ ... ]</span>

<span class="hljs-keyword">\end</span>{CJK}</pre>
<p>but I want it on the whole document, and I need the language setting to be made in a file that is included by the main LaTeX file (a different included file for each language). So I went for this simple hack:</p>
<pre><span class="hljs-keyword">\AtBeginDocument</span>{<span class="hljs-keyword">\begin</span>{CJK}{UTF8}{gbsn}}
<span class="hljs-keyword">\AtEndDocument</span>{<span class="hljs-keyword">\end</span>{CJK}}</pre>
<p>As for the font, <a rel="noopener" href="https://www.overleaf.com/learn/latex/Chinese" target="_blank">it appears like</a> gbsn or gkai fonts should be used with Simplified Chinese, and bsmi or bkai with Traditional Chinese. Since I translated into Simplified Chinese, some characters just vanished from the output document when trying bsmi and bkai. The back-translation to English of a document made with bsmi was significantly worse, so these dropped characters had a clear impact on the intelligibility of the Chinese text.</p>
<p>I got this LaTeX warning saying</p>
<pre>LaTeX Font Warning: Some font shapes were not available, defaults substituted.</pre>
<p>no matter which of these fonts I chose, so it doesn&#8217;t mean much.</p>
<p>So the choice is between gbsn and gkai, but which one? To decide, I copy-pasted Chinese text from up-to-date Chinese websites, and compared it with the outcome of LaTeX, based upon the TeX file shown below. It was quite clear that gbsn is closer to the fonts in use on these sites, even though I suspect it&#8217;s a bit of a Times New Roman: The fonts used on the web have fewer serifs than gbsn. So gbsn it is, even though it would have been nicer with a font with fewer serifs.</p>
<p>For Japanese, there&#8217;s &#8220;min&#8221;, &#8220;maru&#8221; and &#8220;goth&#8221; fonts. &#8220;Min&#8221; is a serif font, giving it a traditional look (calligraphy style) and judging from Japanese websites, it appears to be used primarily for logos and formal text (the welcoming words of a university&#8217;s president, for example).</p>
<p>&#8220;Maru&#8221; and &#8220;goth&#8221; are based upon simple lines, similar to plain text in Japanese websites. The latter is a bit of a bold version of &#8220;maru&#8221;, but it&#8217;s what seems to be popular. So I went with &#8220;goth&#8221;, which has a clean and simple appearance, similar to the vast majority of Japanese websites, even though the bold of &#8220;goth&#8221; can get a bit messy with densely drawn characters. It&#8217;s just that &#8220;maru&#8221; looks a bit thin compared to what is commonly preferred.</p>
<p>Korean has two fonts in theory, &#8220;mj&#8221; and &#8220;gt&#8221;. &#8220;mj&#8221; is a serif font with an old fashioned look, and &#8220;gt&#8221; is once again the plain, gothic version. I first failed to use the &#8220;gt&#8221; font even though it was clearly installed (there were a lot of files in the same directories as where the &#8220;mj&#8221; files were installed, only with &#8220;gt&#8221;). Nevertheless, trying the &#8220;gt&#8221; font instead of &#8220;mj&#8221; failed with</p>
<pre>LaTeX Font Warning: Font shape `C70/gt/m/it' undefined
(Font)              using `C70/song/m/n' instead on input line 8.

! Undefined control sequence.
try@size@range ...extract@rangefontinfo font@info
                                                  &lt;-*&gt;@nil &lt;@nnil</pre>
<p>But as it turns out, it should be referred to as &#8220;nanumgt&#8221;, e.g.</p>
<pre>\begin{CJK}{UTF8}{<span class="punch">nanumgt</span>}
나는 멋진 글꼴을 원한다
\end{CJK}</pre>
<p>It&#8217;s worth mentioning XeLaTeX, which allows using an arbitrary TrueType font within LaTeX, so the font selection is less limited.</p>
<p>See <a rel="noopener" href="https://tex.my/2010/06/21/cjk-support-in-latex/" target="_blank">this page</a> on fonts in Japanese and Korean.</p>
<p>For these tests, I used the following LaTeX file for use with e.g.</p>
<pre>$ pdflatex test.tex</pre>
<pre><span class="hljs-keyword">\documentclass</span>{hitec}
<span class="hljs-keyword">\usepackage</span>[utf8]{inputenc}
<span class="hljs-keyword">\usepackage</span>[T1]{fontenc}
<span class="hljs-keyword">\usepackage</span>{CJKutf8}
<span class="hljs-keyword">\newcommand</span>{<span class="hljs-keyword">\thetext</span>}
{

它说什么并不重要，重要的是它是如何写的。
}

<span class="hljs-keyword">\AtBeginDocument</span>{}
<span class="hljs-keyword">\AtEndDocument</span>{}
<span class="hljs-keyword">\title</span>{This document}
<span class="hljs-keyword">\begin</span>{document}

gbsn:

<span class="hljs-keyword">\begin</span>{CJK}{UTF8}{gbsn}
<span class="hljs-keyword">\thetext</span>
<span class="hljs-keyword">\end</span>{CJK}

gkai:

<span class="hljs-keyword">\begin</span>{CJK}{UTF8}{gkai}
<span class="hljs-keyword">\thetext</span>
<span class="hljs-keyword">\end</span>{CJK}

bsmi:

<span class="hljs-keyword">\begin</span>{CJK}{UTF8}{bsmi}
<span class="hljs-keyword">\thetext</span>
<span class="hljs-keyword">\end</span>{CJK}

bkai:

<span class="hljs-keyword">\begin</span>{CJK}{UTF8}{bkai}
<span class="hljs-keyword">\thetext</span>
<span class="hljs-keyword">\end</span>{CJK}

<span class="hljs-keyword">\end</span>{document}</pre>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/08/google-translate-pdflatex-technical/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Random notes on Perl Regular Expressions</title>
		<link>https://billauer.se/blog/2022/07/perl-regex-notes/</link>
		<comments>https://billauer.se/blog/2022/07/perl-regex-notes/#comments</comments>
		<pubDate>Sun, 10 Jul 2022 04:05:23 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6647</guid>
		<description><![CDATA[It&#8217;s 2022, Perl isn&#8217;t as popular as it used to be, and for a moment I questioned its relevance. Until I had a task requiring a lot of pattern matching, which reminded me why Perl is that loyal companion that always has an on-spot solution to whatever I need. These are a few notes I [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s 2022, Perl isn&#8217;t as popular as it used to be, and for a moment I questioned its relevance. Until I had a task requiring a lot of pattern matching, which reminded me why Perl is that loyal companion that always has an on-spot solution to whatever I need.</p>
<p>These are a few notes I took as I discovered the more advanced, and well-needed, features of Perl regexps.</p>
<ul>
<li>If a regex is passed as a value generated by qr//, the modifiers in this qr// have a significance. So e.g. if the match should be case-insensitive, add it after the qr//.</li>
<li>Quantifiers can be used on regex groups, whether they capture or not. For example, \d+(?:\.\d+)+ means one or more digits followed by one or more patterns of a dot and one or more digits. Think BNF.</li>
<li>Complex regular expressions can be created relatively easily by breaking them down into smaller pieces and assigning each a variable with qr//. The complex expression becomes fairly readable this way. Almost needless to say, quantifiers can be applied on each of these subexpressions.</li>
<li>It&#8217;s possible to give capture elements names, e.g. $t =~ /^(<span class="punch">?&lt;pre&gt;</span>.*?)(<span class="punch">?&lt;found&gt;</span>[ \t\n]*${regex}[ \t\n]*)(<span class="punch">?&lt;post&gt;</span>.*)$/s. The capture results then appear in e.g. $+{pre}, $+{found} and $+{post}. This is useful in particular if the regex in the middle may have capture elements of its own, so the usual counting method doesn&#8217;t work.</li>
<li>Captured elements can be used in the regex itself, e.g. /([\'\"])(.*?)\1/ so \1 stands for either a single or double quote, whichever was found.</li>
<li>Even better, there&#8217;s e.g. \g{-1} instead of absolute group numbers, which means the capture group that was most recently opened before this point. Once again, useful in a regex that may be embedded in more complicated contexts.</li>
<li>When there are nested unnamed capture parentheses, the outer parenthesis gets the first capture number.</li>
<li>If there are several capture parentheses with a &#8216;|&#8217; between them, all of them produce a capture position, but those that weren&#8217;t in use for matching get undef.</li>
<li>(?:&#8230;) grouping can be followed by a quantifier, so this makes perfect sense ((?:[^\\\{\}]|\\\\|\\\{|\\\})*) for any number of characters that aren&#8217;t a backslash or a curly bracket, or any of these preceded by an escaping backslash.</li>
<li>Quantifiers can be super-greedy in the sense that they don&#8217;t allow backtracking. So e.g. /a++b/ is exactly like /a+b/, but with the former the computer won&#8217;t attempt to consume fewer a&#8217;s (if such are found) in order to try to find a &#8220;b&#8221;. In this example it&#8217;s just an optimization for speed, but in general a super-greedy quantifier can make a match fail where the regular one would succeed. All of these extra-greedy quantifiers are made with an extra plus sign.</li>
<li>There are lookbehind and lookahead assertions, which are really great, in particular the negative assertions. E.g. /(?&lt;![ \t\n\r])(\d+)/ captures a number that doesn&#8217;t come right after a whitespace, and /(\d+)(?![ \t\n\r])/ captures a number that isn&#8217;t followed by a whitespace. Note that the parentheses around these assertions are for grouping, but not capturing, so in these examples only the number was captured.</li>
<li>Lookaheads and lookbehinds also work inside grouping parentheses (whether capturing or not), as grouping is treated as an independent regex.</li>
</ul>
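<p>Several of these notes can be tried out with a short script. What follows is a minimal sketch with made-up variable names and sample strings (the parse_line() function and the input line are hypothetical), not code taken from the task that prompted these notes.</p>
<pre>use strict;
use warnings;

# Small building blocks, combined into larger patterns with qr//
my $int     = qr/\d+/;
my $version = qr/$int(?:\.$int)+/;                # E.g. 91.10.0
my $quoted  = qr/(["'])(?&lt;body&gt;.*?)\g{-2}/;       # \g{-2} is the opening quote

# Named captures: no need to count the groups inside $quoted
sub parse_line {
  my ($line) = @_;

  return
    unless ($line =~ /^(?&lt;name&gt;\w+)[ \t]+(?&lt;ver&gt;$version)[ \t]+${quoted}$/);

  return { name =&gt; $+{name}, ver =&gt; $+{ver}, body =&gt; $+{body} };
}

my $p = parse_line(q{thunderbird 91.10.0 "mail client"});
print "$p-&gt;{name}, $p-&gt;{ver}, $p-&gt;{body}\n";

# Alternation: both groups get a number, the unused one is undef
my ($word, $num) = ('x=42' =~ /=(?:([a-z]+)|(\d+))/);
print "word is ", (defined $word ? $word : "undef"), ", num is $num\n";

# Negative lookbehind and lookahead. \d was added to both character
# classes here, so that a number isn't matched partially.
my @glued = ('a12b 34 c56' =~ /(?&lt;![ \t\n\r\d])(\d+)(?![ \t\n\r\d])/g);
print "Numbers with no whitespace around them: @glued\n";

# Super-greedy quantifier: a++ never gives back an "a"
print "a+a matches\n" if ('aaa' =~ /a+a/);
print "a++a doesn't match\n" unless ('aaa' =~ /a++a/);</pre>
<p>Note that \g{-2} keeps pointing at the opening quote even when $quoted is embedded in a larger regex that has capture groups of its own before it, which is exactly why the relative form is preferable to \1 in a reusable piece.</p>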
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/07/perl-regex-notes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Thunderbird: Upgrade notes</title>
		<link>https://billauer.se/blog/2022/06/thunderbird-installation-oauth2/</link>
		<comments>https://billauer.se/blog/2022/06/thunderbird-installation-oauth2/#comments</comments>
		<pubDate>Fri, 10 Jun 2022 15:16:22 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[email]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[stop updates]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6618</guid>
		<description><![CDATA[Introduction These are my notes as I upgraded Thunderbird from version 3.0.7 (released September 2010) to 91.10.0 on Linux Mint 19. That&#8217;s more than a ten year&#8217;s gap, which says something about what I think about upgrading software (which was somewhat justified, given the rubbish issues that arose, as detailed below). What eventually forced me [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>These are my notes as I upgraded Thunderbird from version 3.0.7 (released September 2010) to 91.10.0 on Linux Mint 19. That&#8217;s more than a ten-year gap, which says something about what I think about upgrading software (which was somewhat justified, given the rubbish issues that arose, as detailed below). What eventually forced me to do this was the need to support OAuth2 in order to send emails through Google&#8217;s Gmail server (supported <a rel="noopener" href="https://support.mozilla.org/en-US/kb/automatic-conversion-google-mail-accounts-oauth20" target="_blank">since 91.8.0</a>).</p>
<p>Thunderbird is essentially a Firefox browser which happens to be set up with a GUI that processes emails. So for example, the classic menubar is hidden, but can be revealed by pressing Alt.</p>
<h3>Using the correct profile</h3>
<p>When attempting to run a new version of Thunderbird, be sure to rename ~/.thunderbird into something else, or else the current profile will be upgraded right away. With some luck, the suffixes (e.g. -release) might make Thunderbird ignore the old information, but don&#8217;t trust that.</p>
<p>Actually, it seems like this is handled gracefully anyhow. When I installed exactly the same version in a different location on the disk, it ignored the profile with the -release suffix, and added one with -release-1. So go figure.</p>
<p>To select which profile to work with, invoke Thunderbird with Profile Manager with</p>
<pre>$ thunderbird -profilemanager &amp;</pre>
<p>For making the upgrade, first make a backup tarball from the original profile directory.</p>
<p>To adopt it into the new version of Thunderbird, invoke the Profile Manager and pick Create Profile&#8230;, create a new directory (I called it &#8220;mainprofile&#8221;), and pick that as the place for the new profile. Launch Thunderbird, quit right away, and then delete the new directory. Rename the old directory to the name of the directory that was just deleted. Then launch Thunderbird again.</p>
<h3>Add-ons</h3>
<p>Previously, I had the following add-ons:</p>
<ul>
<li>BiDi Mail UI (apparently still necessary)</li>
<li>Clippings. Just import the previous clippings from ~/clipdat2.rdf. Unlike the old version, the data is <a rel="noopener" href="https://aecreations.sourceforge.io/clippings/faq.php" target="_blank">kept in a database file</a> inside the profile, so the old file can be deleted.</li>
<li>Gnome Integration Options for calling a command on mail arrival. It was deprecated. So I went for Mailbox Alert, which allows adding specific actions: Sound, a message and/or command. With mail folder granularity, in fact.</li>
<li>Mail Tweak. It&#8217;s really really old, and probably unnecessary since long.</li>
<li>Outgoing Message Format (for text vs. HTML messages). Deprecated since long, as these options are integrated into Thunderbird itself.</li>
</ul>
<p>So I remained with the first two only.</p>
<h3>Installing Thunderbird</h3>
<p>The simplest Thunderbird installation involves downloading it from <a rel="noopener" href="https://www.thunderbird.net/en-US/" target="_blank">their website</a> and extracting the tarball somewhere in the user&#8217;s own directories. For a proper installation, I installed it under /usr/local/bin/ with</p>
<pre># tar -C /usr/local/bin -xjvf thunderbird-91.10.0.tar.bz2</pre>
<p>as root. And then reorganize it slightly:</p>
<pre># cd /usr/local/bin
# mv thunderbird thunderbird-91.10.0
# ln -s thunderbird-91.10.0/thunderbird</pre>
<h3>Composing HTML messages</h3>
<p>Right-click the account at the left bar, pick Settings and select the Composition &amp; Addressing item. Make sure Compose messages in HTML is unchecked: Messages should be composed as plain text by default.</p>
<p>Then go through each of the mail identities and verify that Compose messages in HTML is unchecked under the Composition &amp; Addressing tab.</p>
<p>However if Shift is pressed along with clicking Write, Reply or whatever for composing a new message, Thunderbird opens it as HTML.</p>
<h3>Recover old contacts</h3>
<p>Thunderbird went from the old *.mab format to SQLite for keeping the address books. So go to Tools &gt; Import&#8230; &gt; Pick Address Books&#8230; and pick Mork Database, and from there pick abook.mab (and possibly repeat this with history.mab, but I skipped this, because it&#8217;s too much).</p>
<h3>Silencing update notices</h3>
<p>Thunderbird, like most software nowadays, wants to update itself automatically, because who cares if something goes wrong all of a sudden as long as the latest version is installed.</p>
<p>I messed around with this for quite long until I found the solution. So I&#8217;m leaving everything I did written here, but it&#8217;s probably enough with just adding policies.json, as suggested below.</p>
<p>So to the whole story (which you probably want to skip): Under Preferences &gt; General &gt; Updates I selected &#8220;check for updates&#8221; rather than install automatically (it can&#8217;t anyhow, since I&#8217;ve installed Thunderbird as root), but then it starts nagging that there are updates.</p>
<p>So it&#8217;s down to setting the application properties manually by going to Preferences &gt; General &gt; Config Editor&#8230; (button at the bottom).</p>
<p>I changed app.update.promptWaitTime to 31536000 (365 days) but that didn&#8217;t have any effect. So I added an <a href="http://kb.mozillazine.org/App.update.silent" target="_blank">app.update.silent</a> property and set it true, but that didn&#8217;t solve the problem either. So the next step was to change app.update.staging.enabled to false, and that did the trick. Well, almost. With this, Thunderbird didn&#8217;t issue a notification, but its tab on the system tray gets focus every day. Passive aggressive.</p>
<p>As a side note, there are other suggestions I&#8217;ve encountered out there: To change app.update.url so that Thunderbird doesn&#8217;t know where to look for updates, or set app.update.doorhanger false. Haven&#8217;t tried either.</p>
<p>So what actually worked: Create a policies.json in /usr/local/bin/thunderbird/distribution/, with &#8220;<a href="https://github.com/mozilla/policy-templates/blob/master/README.md#disableappupdate" target="_blank">DisableAppUpdate</a>&#8221;: true, that is:</p>
<pre>{
 "policies": {
  "DisableAppUpdate": true
 }
}</pre>
<p>Note that the &#8220;distribution&#8221; directory must be in the same directory as the actual executable for Thunderbird (that is, follow the symbolic link if such exists). In my case, I had to add this directory myself, because of a manual installation.</p>
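<p>Finding the right directory can be left to a script. This is a rough sketch, assuming that the path given as the argument is the Thunderbird executable or a symbolic link to it. The distribution_dir() and write_policies() names, as well as the script&#8217;s name in the usage example, are made up. It relies only on modules that come with Perl (Cwd, File::Basename, File::Path and JSON::PP).</p>
<pre>use strict;
use warnings;
use Cwd qw[ abs_path ];
use File::Basename qw[ dirname ];
use File::Path qw[ make_path ];
use JSON::PP;

# The "distribution" directory next to the real executable, after
# resolving symbolic links
sub distribution_dir {
  my ($exe) = @_;

  my $real = abs_path($exe);
  die "Can't resolve $exe\n" unless (defined $real);

  return dirname($real) . '/distribution';
}

# Create the directory if necessary, and write policies.json into it
sub write_policies {
  my ($exe) = @_;

  my $dir = distribution_dir($exe);
  make_path($dir);

  my $file = "$dir/policies.json";
  my $policies = { policies =&gt; { DisableAppUpdate =&gt; JSON::PP::true } };

  open(my $out, "&gt;", $file)
    or die "Can't open $file for write: $!\n";
  print $out JSON::PP-&gt;new-&gt;pretty-&gt;canonical-&gt;encode($policies);
  close $out;

  return $file;
}

# Nothing is written unless a path is given as an argument
print "Wrote ", write_policies($ARGV[0]), "\n" if (@ARGV);</pre>
<p>With my installation, this would be run as root with something like &#8220;perl write-policies.pl /usr/local/bin/thunderbird&#8221;. Thunderbird needs a restart for the file to take effect.</p>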
<p>And, as suggested on <a href="https://blog.gnu-designs.com/deploying-firefox-and-thunderbird-policies-to-prevent-auto-updates-and-tune-other-features/" target="_blank">this page</a>, the successful deployment can be verified by restarting Thunderbird, and then looking at Help &gt; About inside Thunderbird, which now says (note the comment on updates being disabled):</p>
<p style="text-align: left;"><img class="aligncenter size-medium wp-image-6653" title="The About window after disabling updates with policies.json" src="https://billauer.se/blog/wp-content/uploads/2022/06/disable-updates.png" alt="The About window after disabling updates with policies.json" width="653" height="337" />In hindsight, I can speculate on why this works: The authors of Thunderbird really don&#8217;t want us to turn off automatic updates, mainly because if people start running outdated software, that increases the chance of a widespread attack on some vulnerability, which can damage the software&#8217;s reputation. So Thunderbird is designed to ignore the older ways of turning updates off.</p>
<p style="text-align: left;">There&#8217;s only one case where there&#8217;s no choice: If Thunderbird was installed by the distribution. In this case, it&#8217;s installed as root, so it can&#8217;t be updated by a plain user. Hence it&#8217;s the distribution&#8217;s role to nag. And it has the same interest to nag about upgrades (reputation and that).</p>
<p style="text-align: left;">So I guess that&#8217;s why Thunderbird respects this JSON file only.</p>
<h3>Folders with new mails in red</h3>
<p><a rel="noopener" href="https://billauer.se/blog/2012/05/thunderbird-tweaks/" target="_blank">Exactly like 10 years ago</a>, the trick is to create a &#8220;chrome&#8221; directory under .thunderbird/ and then add the following file:</p>
<pre>$ cat ~/.thunderbird/sdf2k45i.default/chrome/userChrome.css
@namespace
url("http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"); /* set default namespace to XUL */

/* Setting the color of folders containing new messages to red */

treechildren::-moz-tree-cell-text(folderNameCol, newMessages-true) {
 font-weight: bold;
 color: red !important;
}</pre>
<p>But unlike old Thunderbird, this file isn&#8217;t read by default. So to fix that, go to Preferences &gt; General &gt; Config Editor&#8230; (button at the bottom) and there change toolkit.legacyUserProfileCustomizations.stylesheets to true.</p>
<h3>New mail icon in system tray</h3>
<p>Thunderbird sends a regular notification when a new mail arrives, but exactly like <a rel="noopener" href="https://billauer.se/blog/2012/05/linux-icon-system-tray-thunderbird-zenity/" target="_blank">last time</a>, I want a dedicated icon that is dismissed only when I click it. The rationale is to be able to see if a new mail has arrived with a quick glance at the system tray. Neither zenity --notification nor notify-send was good for this, since they send the common notification (zenity used to just add an icon, but it &#8220;got better&#8221;).</p>
<p>But then there&#8217;s yad. I began with &#8220;apt install yad&#8221;, but that gave me a really old version that distorted the icon in the system bar. So I installed it from the <a rel="noopener" href="https://github.com/v1cont/yad" target="_blank">git repository&#8217;s</a> tag 1.0. I first attempted v12.0, but I ended up with the problem <a rel="noopener" href="https://githubmemory.com/repo/v1cont/yad/issues/119" target="_blank">mentioned here</a>, and didn&#8217;t want to mess around with it more.</p>
<p>Its &#8220;make install&#8221; adds /usr/local/bin/yad, as well as a lot of yad.mo under /usr/local/share/locale/*, a lot of yad.png under /usr/local/share/icons/*, yad.m4 under /usr/local/share/aclocal/ and yad.1 + pfd.1 in /usr/local/share/man/man1. So quite a lot of files, but in a sensible way.</p>
<p>With this done, the following script is kept (as executable) as /usr/local/bin/new-mail-icon:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>
<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;
<span class="hljs-keyword">use</span> Fcntl <span class="hljs-string">qw[ :flock ]</span>;

<span class="hljs-keyword">my</span> $THEDIR=<span class="hljs-string">"$ENV{HOME}/.thunderbird"</span>;
<span class="hljs-keyword">my</span> $ICON=<span class="hljs-string">"$THEDIR/green-mail-unread.png"</span>;

<span class="hljs-keyword">my</span> $NOW=<span class="hljs-keyword">scalar</span> <span class="hljs-keyword">localtime</span>;

<span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $fh, <span class="hljs-string">"&lt;"</span>, <span class="hljs-string">"$ICON"</span>)
  <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open $ICON for read: $!"</span>;

<span class="hljs-comment"># Lock the file. If it's already locked, the icon is already</span>
<span class="hljs-comment"># in the tray, so fail silently (and don't block).</span>

<span class="hljs-keyword">flock</span>($fh, LOCK_EX | LOCK_NB) <span class="hljs-keyword">or</span> <span class="hljs-keyword">exit</span> <span class="hljs-number">0</span>;

<span class="hljs-keyword">fork</span>() &amp;&amp; <span class="hljs-keyword">exit</span> <span class="hljs-number">0</span>; <span class="hljs-comment"># Only child continues</span>

<span class="hljs-keyword">system</span>(<span class="hljs-string">'yad'</span>, <span class="hljs-string">'--notification'</span>, <span class="hljs-string">"--text=New mail on $NOW"</span>, <span class="hljs-string">"--image=$ICON"</span>, <span class="hljs-string">'--icon-size=32'</span>);</pre>
<p>This script is the improved version of the <a rel="noopener" href="https://billauer.se/blog/2012/05/linux-icon-system-tray-thunderbird-zenity/" target="_blank">previous one</a>, and it prevents multiple icons in the tray much better: It locks the icon file exclusively and without blocking. Hence if there&#8217;s any other process that shows the icon, subsequent attempts to lock this file fail immediately.</p>
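<p>To see the locking behavior in isolation, here is a minimal sketch. It assumes Linux, where Perl&#8217;s flock() maps to the flock(2) system call, so that two separate opens of the same file compete for the lock even within one process. The temporary file is just a hypothetical stand-in for the icon file.</p>
<pre>#!/usr/bin/perl
use warnings;
use strict;
use Fcntl qw[ :flock ];
use File::Temp qw[ tempfile ];

# Hypothetical stand-in for the icon file (a temporary file, removed on exit)
my (undef, $file) = tempfile(UNLINK => 1);

# Open the same file twice, like two instances of the script would
open(my $first, '>>', $file) or die "Can't open $file: $!";
open(my $second, '>>', $file) or die "Can't open $file: $!";

my $got_first  = flock($first,  LOCK_EX | LOCK_NB) ? 1 : 0; # Lock is free
my $got_second = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0; # Already taken
close($first);                                              # Drops the lock
my $got_after  = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0; # Free again

print "first=$got_first second=$got_second after_release=$got_after\n";</pre>
<p>Under these assumptions, this should print first=1 second=0 after_release=1: the second attempt fails at once instead of blocking, and the lock becomes available again as soon as its holder closes the file (or exits).</p>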
<p>Since the &#8220;yad&#8221; call takes a second or two, the script forks and exits before that, so it doesn&#8217;t delay Thunderbird&#8217;s machinery.</p>
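<p>The fork-and-return idea can be sketched on its own as follows. This is only an illustration, not part of the script above: spawn_slow_job is a hypothetical helper, and a one-second sleep stands in for the slow yad call. Unlike the one-liner in the script, it also checks whether fork() failed.</p>
<pre>#!/usr/bin/perl
use warnings;
use strict;
use Time::HiRes qw[ time sleep ];

# Hypothetical helper: run a slow job in a child, return at once in the parent
sub spawn_slow_job {
  my ($seconds) = @_;

  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  return $pid if $pid; # Parent: don't wait for the child

  sleep($seconds);     # Child: stand-in for the slow yad call
  exit 0;              # Child must never return into the parent's code
}

my $start = time();
my $pid = spawn_slow_job(1);
my $elapsed = time() - $start;

printf "Parent resumed after %.3f s, child %d is still working\n",
  $elapsed, $pid;</pre>
<p>The parent is back in business long before the child finishes, which is exactly what keeps the caller (Thunderbird, in this case) from waiting.</p>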
<p>With this script in place, the Mailbox Alert is configured as follows. Add a new item to the list as in this dialog box:</p>
<p><a href="https://billauer.se/blog/wp-content/uploads/2022/06/new-mail-dialogbox.png"><img class="aligncenter size-full wp-image-6620" title="Setting dialog box for Mailbox Alert extension" src="https://billauer.se/blog/wp-content/uploads/2022/06/new-mail-dialogbox.png" alt="Setting dialog box for Mailbox Alert extension" width="566" height="407" /></a></p>
<p>The sound should be set to a WAV file of choice.</p>
<p>Then right-click the mail folder that should be covered (Local Folders in my case), pick Mailbox Alert and enable &#8220;New Mail&#8221; and &#8220;Alert for child folders&#8221;.</p>
<p>Then right-click &#8220;Inbox&#8221; under this folder, and verify that nothing is checked for Mailbox Alert for it (in particular not &#8220;Default sound&#8221;). The exceptions are the Outbox and Draft folders, for which &#8220;Don&#8217;t let parent folders alert for this one&#8221; should be checked, or else there&#8217;s a false alarm on autosaving and when using &#8220;send later&#8221;.</p>
<p>Later on, I changed my mind and added a message popup, so now all three checkboxes are ticked, and the Message tab reads:</p>
<p><a href="https://billauer.se/blog/wp-content/uploads/2022/06/mail-alert-with-message.png"><img class="aligncenter size-full wp-image-6624" title="Mail Alert dialog box, after update" src="https://billauer.se/blog/wp-content/uploads/2022/06/mail-alert-with-message.png" alt="Mail Alert dialog box, after update" width="566" height="407" /></a></p>
<p>I picked the icon as /usr/local/bin/thunderbird-91.10.0/chrome/icons/default/default32.png (this depends on the installation path, of course).</p>
<p>I&#8217;m not 100% clear why the original alert didn&#8217;t show up, even though &#8220;Show an alert&#8221; was still checked under &#8220;Incoming Mails&#8221; at Preferences &gt; General. I actually preferred the good old one, but it seems like Mailbox Alert muted it. I unchecked it anyhow, just to be safe.</p>
<h3>Refusing to remember passwords + failing to send through Gmail</h3>
<p>It&#8217;s not a real upgrade if a weird problem doesn&#8217;t occur out of the blue.</p>
<p>So attempting to Get Messages from the pop3 server at localhost failed quite oddly: Every time I checked the box to use Password Manager to remember the password, it got stuck with &#8220;Main: Connected to 127.0.0.1&#8230;&#8221;. Checking with Wireshark, it turned out that Thunderbird asked the server about its capabilities (CAPA), got an answer and then did nothing for about 10 seconds, after which it closed the connection.</p>
<p>On the other hand, when I didn&#8217;t request remembering the password, it went fine, and so did subsequent attempts to fetch mail from the pop3 server.</p>
<p>Another thing was that when attempting to use Gmail&#8217;s server, I went through the entire OAuth2 thing (the browser window, and asking for my permissions) but then the mail was just stuck on &#8220;Sending message&#8221;. Like, forever.</p>
<p>So I followed the advice <a rel="noopener" href="https://support.mozilla.org/en-US/questions/1342635" target="_blank">here</a>, and deleted key3.db, key4.db, secmod.db, cert*.db and all signon* files with Thunderbird not running of course. Really old stuff.</p>
<p>And that fixed it.</p>
<p>The files that were apparently created when things got fine were logins.json, cert9.db, key4.db and pkcs11.txt. But I might have missed something.</p>
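<p>For a look at what the file name patterns mentioned above (key3.db, key4.db, secmod.db, cert*.db and signon*) would catch before deleting anything, here is a dry-run sketch. It only lists files. The stale_files function and the demo directory with its empty files are hypothetical; point it at the real profile directory instead, with Thunderbird not running.</p>
<pre>#!/usr/bin/perl
use warnings;
use strict;
use File::Glob qw[ :bsd_glob ];
use File::Temp qw[ tempdir ];

# File name patterns taken from the list above
my @PATTERNS = qw[ key3.db key4.db secmod.db cert*.db signon* ];

# Return the existing files in $profile that match the patterns (no deletion)
sub stale_files {
  my ($profile) = @_;
  my @hits = grep { -e $_ } map { bsd_glob("$profile/$_") } @PATTERNS;
  return sort @hits;
}

# Hypothetical demo profile: a temporary directory with a few empty files
my $profile = tempdir(CLEANUP => 1);

for my $name (qw[ key4.db cert8.db cert9.db signons.sqlite logins.json prefs.js ]) {
  open(my $fh, '>', "$profile/$name") or die "Can't create $name: $!";
  close($fh);
}

# Strip the directory part for display
my @found = map { substr($_, length($profile) + 1) } stale_files($profile);
print "$_\n" for @found;</pre>
<p>With the demo files, this lists cert8.db, cert9.db, key4.db and signons.sqlite, and leaves logins.json and prefs.js out.</p>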
<h3>The GUI stuck for a few seconds every now and then</h3>
<p>This happened occasionally when I navigated from one mail folder to another. The solution I found somewhere was to delete all .msf files from where Thunderbird keeps the mail info, and that did the trick. Ehm, just for a while. After a few days, it was back.</p>
<p>As a side effect, it forgot the display settings for each folder, i.e. which columns to show and in what order.</p>
<p>These .msf files are apparently indexes to the files containing the actual messages, and indeed it took a few seconds before something appeared when I went to view each mail folder for the first time. At which time the new .msf files went from zero bytes to a significant figure.</p>
<p>Since the problem remained, I watched &#8220;top&#8221; when the GUI got stuck. And indeed, Thunderbird&#8217;s process was at 100%, but so was a completely different process: caribou. Which is a virtual keyboard. Do I need one? No. So to get rid of this process (which runs all the time, but doesn&#8217;t eat a lot of CPU normally), go to Accessibility settings, the Keyboard tab and turn &#8220;Enable the on-screen keyboard&#8221; off. The process is gone, and so is the problem with the GUI? Nope. It&#8217;s basically the same, but instead of two processes taking 100% CPU, now it&#8217;s Thunderbird alone. I have no idea what to do next.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/06/thunderbird-installation-oauth2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perl: Matching apparently plain space in HTML with regular expression</title>
		<link>https://billauer.se/blog/2022/01/perl-space-match-regex/</link>
		<comments>https://billauer.se/blog/2022/01/perl-space-match-regex/#comments</comments>
		<pubDate>Wed, 05 Jan 2022 14:21:31 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6517</guid>
		<description><![CDATA[I&#8217;ve been using a plain space character in Perl regular expressions for ages, and it has always worked. Something like this for finding double spaces: my @doubles = ($t =~ / {2,}/g); or for emphasis on the space character, equivalently: my @doubles = ($t =~ /[ ]{2,}/g); but then I began processing HTML representation from [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using a plain space character in Perl regular expressions for ages, and it has always worked. Something like this for finding double spaces:</p>
<pre><span class="hljs-keyword">my</span> @doubles = ($t =~ <span class="hljs-regexp">/ {2,}/g</span>);</pre>
<p>or for emphasis on the space character, equivalently:</p>
<pre><span class="hljs-keyword">my</span> @doubles = ($t =~ <span class="hljs-regexp">/[ ]{2,}/g</span>);</pre>
<p>but then I began processing HTML representation from the Mojo::DOM module (or TinyMCE&#8217;s output directly) and this just didn&#8217;t work. That is, \s detected the spaces (with Perl 5.26) but the plain space character didn&#8217;t.</p>
<p>As it turns out, TinyMCE put &amp;nbsp; instead of the first space (when there was a pair of them), which Mojo::DOM correctly translated to the 0xa0 Unicode character (0xc2, 0xa0 in UTF-8). Hence no chance that a plain space, i.e. a 0x20, will match it. Perl was clever enough to match it as a whitespace (with \s).</p>
<p>Solution: Simple. Just go</p>
<pre><span class="hljs-keyword">my</span> @doubles = ($t =~ <span class="hljs-regexp">/[ \xa0]{2,}/g</span>);</pre>
<p>In other words, match either the good old space or the non-breaking space.</p>
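<p>The difference between the three patterns can be checked with a short test. This is a sketch under assumptions: the sample string is made up, and Encode&#8217;s decode() stands in for what Mojo::DOM hands over, that is, a decoded character string in which the 0xc2, 0xa0 byte pair has become a single 0xa0 character.</p>
<pre>#!/usr/bin/perl
use warnings;
use strict;
use Encode qw[ decode ];

# Made-up sample: a non-breaking space plus a plain space after "one",
# and two plain spaces after "two"
my $t = decode('UTF-8', "one\xc2\xa0 two  three");

my @plain = ($t =~ / {2,}/g);       # Plain spaces only
my @both  = ($t =~ /[ \xa0]{2,}/g); # Plain or non-breaking space
my @white = ($t =~ /\s{2,}/g);      # Any whitespace

printf "plain=%d both=%d whitespace=%d\n",
  scalar(@plain), scalar(@both), scalar(@white);</pre>
<p>This prints plain=1 both=2 whitespace=2: the plain space pattern misses the pair that begins with the non-breaking space, while the other two catch both pairs. Note that \s also matches tabs and newlines, so the character class is the better choice when only spaces should be detected.</p>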
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/01/perl-space-match-regex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
