<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Nick's blog</title><link>https://blog.njodell.com/</link><description>Recent content on Nick's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2022 Nick ODell</copyright><lastBuildDate>Sun, 23 Jan 2022 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.njodell.com/index.xml" rel="self" type="application/rss+xml"/><item><title>How reducing parallelism can make your ARIMA model faster</title><link>https://blog.njodell.com/arima-omp-parallel/</link><pubDate>Sun, 23 Jan 2022 00:00:00 +0000</pubDate><guid>https://blog.njodell.com/arima-omp-parallel/</guid><description>&lt;p&gt;I recently discovered a strange performance issue in a popular
statistics library that makes it possible to speed up fitting an ARIMA
model by a factor of four, just by adding one line of code, and without
changing how the model is fit.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s how I discovered it.&lt;/p&gt;
&lt;h2 id="motivation"&gt;Motivation&lt;/h2&gt;
&lt;p&gt;Recently, I was working on a project which required me to fit ARIMA
models to over ten thousand time series using the Python package
&lt;a href="http://alkaline-ml.com/pmdarima/"&gt;pmdarima&lt;/a&gt;.
This is a handy Python package which automates the selection of ARIMA
models by varying the p, d, and q parameters, and measuring model fit
for each one, while penalizing more complex models.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I recently discovered a strange performance issue in a popular
statistics library that makes it possible to speed up fitting an ARIMA
model by a factor of four, just by adding one line of code, and without
changing how the model is fit.</p>
<p>Here&rsquo;s how I discovered it.</p>
<h2 id="motivation">Motivation</h2>
<p>Recently, I was working on a project which required me to fit ARIMA
models to over ten thousand time series using the Python package
<a href="http://alkaline-ml.com/pmdarima/">pmdarima</a>.
This is a handy Python package which automates the selection of ARIMA
models by varying the p, d, and q parameters, and measuring model fit
for each one, while penalizing more complex models.</p>
<p>Unfortunately, fitting these models is very slow, so I started looking
for a way to parallelize it. When I started this work, I ran into
something surprising: pmdarima was already using multiple cores!</p>
<p><img src="/images/htop2-300x177.png" alt="htop screenshot showing multiple cores in
use"></p>
<p>This contradicts the pmdarima documentation. The documentation says that
parallelism is only used when doing a grid search. However, I&rsquo;m using
the stepwise algorithm for fitting the model, which is supposedly not
parallelized.</p>
<p>As I&rsquo;ll show later in this blog post, although it&rsquo;s running in
parallel, it&rsquo;s not doing anything useful with those extra cores.</p>
<h2 id="dataset">Dataset</h2>
<p>In order to demonstrate this issue, I&rsquo;m using a <a href="https://www.kaggle.com/c/store-sales-time-series-forecasting/overview">time-series dataset
for grocery stores in
Ecuador</a>.
The way I&rsquo;m going to frame this problem is to consider each store and
category of items separately, and fit an ARIMA model to each one.</p>
<h2 id="measuring-performance">Measuring performance</h2>
<p>Most Python benchmarking tools only measure elapsed time, but to debug
this issue I needed a way to measure both wall-clock time and <a href="https://en.wikipedia.org/wiki/CPU_time">CPU
time</a>.
(Wall-clock time is another name for elapsed time. CPU time is
wall-clock time multiplied by the average number of cores that were
used.)</p>
<p>I wrote a context manager which measures both kinds of time - the full
details are in the notebook at the end of the post. The sections marked
with <code>with MyTimer() as timer:</code> are being timed this way.</p>
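<p>As a rough idea of what such a context manager can look like, here is a
minimal sketch. It is not the implementation from the notebook: it assumes
that <code>time.process_time()</code> is an acceptable CPU clock, which counts
CPU time for every thread in the current process but not for child
processes. Only the attribute names <code>wall_time</code> and
<code>cpu_time</code> are taken from the code in this post.</p>
<pre tabindex="0"><code class="language-py" data-lang="py">import time


class MyTimer:
    """Record wall-clock and CPU time spent inside a with-block (sketch)."""

    def __enter__(self):
        # perf_counter() is a monotonic wall clock; process_time() sums the
        # user and system CPU time of the whole process, all threads included.
        self._wall_start = time.perf_counter()
        self._cpu_start = time.process_time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.wall_time = time.perf_counter() - self._wall_start
        self.cpu_time = time.process_time() - self._cpu_start
        return False  # do not swallow exceptions raised inside the block


with MyTimer() as timer:
    total = sum(i * i for i in range(200_000))
print(f"wall: {timer.wall_time:.3f} cpu: {timer.cpu_time:.3f}")
</code></pre>
<p>Because this sketch only sees the current process, it would undercount
CPU time for the multiprocessing test later in this post.</p>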
<h2 id="how-parallel-is-auto_arima">How parallel is auto_arima?</h2>
<p>I wrote a test which uses a package called
<a href="https://github.com/joblib/threadpoolctl">threadpoolctl</a>
to restrict parallelism. I tested fitting the model with and without
restricting parallelism.</p>






<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-py" data-lang="py"><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 1</span><span><span style="color:#cba6f7">for</span> limit_cores <span style="color:#89dceb;font-weight:bold">in</span> [<span style="color:#fab387">True</span>, <span style="color:#fab387">False</span>]:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 2</span><span>    series <span style="color:#89dceb;font-weight:bold">=</span> array[<span style="color:#fab387">0</span>, <span style="color:#fab387">0</span>]
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 3</span><span>    <span style="color:#cba6f7">with</span> MyTimer() <span style="color:#cba6f7">as</span> timer:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 4</span><span>        <span style="color:#cba6f7">if</span> limit_cores:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 5</span><span>            controller <span style="color:#89dceb;font-weight:bold">=</span> ThreadpoolController()
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 6</span><span>            <span style="color:#cba6f7">with</span> controller<span style="color:#89dceb;font-weight:bold">.</span>limit(limits<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">1</span>, user_api<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#39;blas&#39;</span>):
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 7</span><span>                fit <span style="color:#89dceb;font-weight:bold">=</span> pm<span style="color:#89dceb;font-weight:bold">.</span>auto_arima(series)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 8</span><span>        <span style="color:#cba6f7">else</span>:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 9</span><span>            fit <span style="color:#89dceb;font-weight:bold">=</span> pm<span style="color:#89dceb;font-weight:bold">.</span>auto_arima(series)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">10</span><span>    <span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;lim: </span><span style="color:#a6e3a1">{</span>limit_cores<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> wall: </span><span style="color:#a6e3a1">{</span>timer<span style="color:#89dceb;font-weight:bold">.</span>wall_time<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">.3f</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> cpu: </span><span style="color:#a6e3a1">{</span>timer<span style="color:#89dceb;font-weight:bold">.</span>cpu_time<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">.3f</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> &#34;</span>)</span></span></code></pre></div>
<p>Here are the results of this test:</p>
<pre><code>lim: True wall: 8.006 cpu: 8.539 
lim: False wall: 8.312 cpu: 33.715 
</code></pre>
<p>There are several things we can learn from the output here:</p>
<ul>
<li>When limiting the parallelism, the wall-clock time is roughly equal
to the CPU time.</li>
<li>When not limiting the parallelism, the CPU time is about four times
larger than the wall-clock time, so <em>something</em> is using every core,
even if the pmdarima documentation says otherwise.</li>
<li>The multicore version takes longer to run than the singlecore
version. That&rsquo;s pretty surprising - I would expect the multicore
version to be faster.</li>
<li>The multicore version falls even further behind in CPU time:
although it is only slightly slower in elapsed time, it is 3.9x slower in
CPU time.</li>
</ul>
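<p>The ratios in this list can be checked directly from the two output
lines. The snippet below is only arithmetic on the numbers printed by the
test; the variable names are made up for illustration.</p>
<pre tabindex="0"><code class="language-py" data-lang="py"># Timings copied from the test output above (seconds).
results = {
    "limited": {"wall": 8.006, "cpu": 8.539},
    "unlimited": {"wall": 8.312, "cpu": 33.715},
}

# CPU time divided by wall-clock time is the average number of busy cores.
busy_cores = {name: t["cpu"] / t["wall"] for name, t in results.items()}

# How much more time the unlimited run needs than the limited run.
wall_ratio = results["unlimited"]["wall"] / results["limited"]["wall"]
cpu_ratio = results["unlimited"]["cpu"] / results["limited"]["cpu"]

for name, cores in busy_cores.items():
    print(f"{name}: {cores:.2f} cores busy on average")
print(f"wall-clock ratio: {wall_ratio:.2f}, CPU ratio: {cpu_ratio:.2f}")
</code></pre>
<p>The unlimited run keeps roughly four cores busy on average, yet takes
slightly longer to finish than the limited run, which keeps about one core
busy.</p>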
<h2 id="slowdown-cause">Slowdown cause</h2>
<p>Why does pmdarima slow down when you give it additional cores?</p>
<p>I&rsquo;m not exactly sure what causes the slowdown here. My rough theory is
that pmdarima uses statsmodels internally, which uses scipy internally,
which uses numpy internally, which uses OpenBLAS internally.</p>
<p>OpenBLAS is a linear algebra library which provides various matrix and
vector operations, and can use multiple threads to process a large
matrix operation. However, for some small operations, the overhead
associated with giving work to a different thread will be larger than
the gain from parallelism.</p>
<p>I&rsquo;m guessing that the threshold for where it switches from a singlecore
operation to a multicore operation is set too low, and that&rsquo;s why
restricting the parallelism makes it faster.</p>
<p>As evidence for this, note that
<code>with controller.limit(limits=1, user_api='blas'):</code> restricts
parallelism, but only for libraries that implement the BLAS API.</p>
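<p>If the BLAS backend really is OpenBLAS, the same restriction can probably
be applied without threadpoolctl, through environment variables. This is a
different technique from the one used in this post, and I have not
benchmarked it here. The sketch assumes the backend reads one of the three
variables below when it is first loaded, so they have to be set before
numpy is imported. The helper name <code>limit_blas_threads</code> is
hypothetical.</p>
<pre tabindex="0"><code class="language-py" data-lang="py">import os

# Assumption: each of these is read by one common BLAS or OpenMP backend
# (OpenBLAS, OpenMP runtimes, MKL) at load time, not afterwards.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Ask common BLAS backends to use at most n_threads threads."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


limit_blas_threads(1)
# Import numpy, statsmodels, pmdarima and so on only after this point.
</code></pre>
<p>Unlike <code>controller.limit()</code>, this applies to the whole process
and cannot be switched on and off around a single call.</p>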
<h2 id="adding-parallelism-back-in">Adding parallelism back in</h2>
<p>My original purpose for looking into this was that I needed to fit over
ten thousand ARIMA models to different time series. Rather than trying
to parallelize within a single ARIMA model, I can run multiple different
ARIMA models in parallel. This is more efficient, because the units of
work are larger.</p>
<p>I set up a very similar problem, except that instead of fitting a single
model, it fits one model for each category in a certain store. It uses
multiprocessing to distribute each time series to a different process.
(You cannot use ThreadPool here because <code>auto_arima()</code> holds the
<a href="https://wiki.python.org/moin/GlobalInterpreterLock">GIL</a>
most of the time.)</p>
<p>For the first test, each process is limited to a single thread. For the
second test, I put no limit on parallelism, so within each ARIMA model,
the calculation is also parallelized.</p>






<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-py" data-lang="py"><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 1</span><span>controller <span style="color:#89dceb;font-weight:bold">=</span> ThreadpoolController()
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 2</span><span>
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 3</span><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">attach_limit</span>(func, limit, <span style="color:#89dceb;font-weight:bold">*</span>args, <span style="color:#89dceb;font-weight:bold">**</span>kwargs):
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 4</span><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Call func() with BLAS limited to one thread if limit is true, or with no limit otherwise.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 5</span><span>    <span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">not</span> limit:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 6</span><span>        <span style="color:#cba6f7">return</span> func(<span style="color:#89dceb;font-weight:bold">*</span>args, <span style="color:#89dceb;font-weight:bold">**</span>kwargs)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 7</span><span>    <span style="color:#cba6f7">else</span>:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 8</span><span>        <span style="color:#cba6f7">with</span> controller<span style="color:#89dceb;font-weight:bold">.</span>limit(limits<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">1</span>, user_api<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#39;blas&#39;</span>):
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c"> 9</span><span>            <span style="color:#cba6f7">return</span> func(<span style="color:#89dceb;font-weight:bold">*</span>args, <span style="color:#89dceb;font-weight:bold">**</span>kwargs)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">10</span><span>
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">11</span><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">predict</span>(x):
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">12</span><span>    <span style="color:#cba6f7">return</span> pm<span style="color:#89dceb;font-weight:bold">.</span>auto_arima(x, error_action<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;ignore&#34;</span>)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">13</span><span>
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">14</span><span><span style="color:#cba6f7">for</span> limit <span style="color:#89dceb;font-weight:bold">in</span> [<span style="color:#fab387">True</span>, <span style="color:#fab387">False</span>]:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">15</span><span>    <span style="color:#cba6f7">with</span> multiprocessing<span style="color:#89dceb;font-weight:bold">.</span>Pool() <span style="color:#cba6f7">as</span> p:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">16</span><span>        <span style="color:#6c7086;font-style:italic"># Get one store</span>
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">17</span><span>        store_array <span style="color:#89dceb;font-weight:bold">=</span> array[<span style="color:#fab387">1</span>]
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">18</span><span>        <span style="color:#cba6f7">with</span> MyTimer() <span style="color:#cba6f7">as</span> timer:
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">19</span><span>            predict_restrict <span style="color:#89dceb;font-weight:bold">=</span> functools<span style="color:#89dceb;font-weight:bold">.</span>partial(attach_limit, predict, limit)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">20</span><span>            p<span style="color:#89dceb;font-weight:bold">.</span>map(predict_restrict, store_array)
</span></span><span style="display:flex;"><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f849c">21</span><span>        <span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;lim: </span><span style="color:#a6e3a1">{</span><span style="color:#89dceb">str</span>(limit)<span style="color:#89dceb;font-weight:bold">.</span>ljust(<span style="color:#fab387">4</span>)<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> time: </span><span style="color:#a6e3a1">{</span>timer<span style="color:#89dceb;font-weight:bold">.</span>wall_time<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">.3f</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">&#34;</span>)</span></span></code></pre></div>
<p>Result:</p>
<pre><code>lim: True time: 144.185
lim: False time: 534.951
</code></pre>
<p>Disabling BLAS parallelism gives a 3.7x speedup, with no changes to how
the model is fit. (This is on a 4-core computer - you may get different
numbers on a computer with more or fewer cores.)</p>
<h2 id="notebook">Notebook</h2>
<p>A Jupyter notebook demonstrating this technique can be downloaded
<a href="https://nbviewer.org/gist/nickodell/3cb070feeff805fa4b19307bb3bd459d">here</a>.</p>
<h2 id="summary">Summary</h2>
<ul>
<li>OpenBLAS does some things in parallel even if you don&rsquo;t ask for it.</li>
<li>You can turn this behavior off with the
<a href="https://github.com/joblib/threadpoolctl">threadpoolctl</a>
library.</li>
<li>Turning it off cuts CPU time by about 3.9x for a single model with
roughly the same wall-clock time, and gives a 3.7x wall-clock speedup if
you&rsquo;re also fitting multiple ARIMA models in parallel.</li>
</ul>
]]></content:encoded></item><item><title>Sandboxing nginx with systemd</title><link>https://blog.njodell.com/sandboxing-nginx/</link><pubDate>Sun, 24 May 2020 00:00:00 +0000</pubDate><guid>https://blog.njodell.com/sandboxing-nginx/</guid><description>&lt;p&gt;nginx uses a master process, and several worker processes. Normally, the
master process runs as root. If you look online, the &lt;a href="https://unix.stackexchange.com/questions/134301/why-does-nginx-starts-process-as-root/134303#134303"&gt;common wisdom&lt;/a&gt;
is that there&amp;rsquo;s no way around this, and nginx needs root access to bind to
low-numbered ports:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The process you noticed is the master process, the process that
starts all other nginx processes. This process is started by the init
script that starts nginx. The reason this process is running as root
is simply because you started it as root! [&amp;hellip;]&lt;br&gt;
Most importantly; Only root processes can listen to ports below 1024.
A webserver typically runs at port 80 and/or 443. That means it needs
to be started as root.&lt;br&gt;
In conclusion, the master process being run by root is completely
normal and in most cases necessary for normal operation.&lt;/em&gt;&lt;/p&gt;</description><content:encoded><![CDATA[<p>nginx uses a master process, and several worker processes. Normally, the
master process runs as root. If you look online, the <a href="https://unix.stackexchange.com/questions/134301/why-does-nginx-starts-process-as-root/134303#134303">common wisdom</a>
is that there&rsquo;s no way around this, and nginx needs root access to bind to
low-numbered ports:</p>
<blockquote>
<p><em>The process you noticed is the master process, the process that
starts all other nginx processes. This process is started by the init
script that starts nginx. The reason this process is running as root
is simply because you started it as root! [&hellip;]<br>
Most importantly; Only root processes can listen to ports below 1024.
A webserver typically runs at port 80 and/or 443. That means it needs
to be started as root.<br>
In conclusion, the master process being run by root is completely
normal and in most cases necessary for normal operation.</em></p>
</blockquote>
<p>However, Linux has a feature called
<a href="https://man7.org/linux/man-pages/man7/capabilities.7.html">capabilities</a>,
which allows a process to perform one specific privileged operation
without being able to perform every kind of privileged operation. If you look
through that manual page, you&rsquo;ll find one capability which is exactly
what we need: CAP_NET_BIND_SERVICE. This allows a process to bind to a
low-numbered port, despite not being root. Perfect!</p>
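<p>Once a process is running, you can check which capabilities it actually
holds. The sketch below decodes the <code>CapEff</code> line of
<code>/proc/PID/status</code>, which holds the effective capability set as a
hexadecimal bit mask. The helper names <code>has_capability</code> and
<code>effective_caps</code> are hypothetical, and the bit number 10 for
CAP_NET_BIND_SERVICE comes from the kernel header
<code>linux/capability.h</code>.</p>
<pre tabindex="0"><code class="language-py" data-lang="py"># Bit position of CAP_NET_BIND_SERVICE, as listed in linux/capability.h.
CAP_NET_BIND_SERVICE = 10


def has_capability(cap_mask_hex, cap_bit):
    """Return True if bit cap_bit is set in a hexadecimal capability mask."""
    return (int(cap_mask_hex, 16) // 2 ** cap_bit) % 2 == 1


def effective_caps(pid):
    """Return the CapEff mask of a process as a hex string (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError(f"no CapEff line found for pid {pid}")


# A mask with only bit 10 set, so this prints True.
print(has_capability("0000000000000400", CAP_NET_BIND_SERVICE))
</code></pre>
<p>To check a live process, pass its PID to <code>effective_caps()</code> and
feed the result to <code>has_capability()</code>.</p>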
<h3 id="editing-the-systemd-service-file">Editing the systemd service file</h3>
<p>Now we need a way to start nginx as an unprivileged user, with this one
additional capability. You can do this with systemd. We just need to
change a few configuration files.</p>
<p>First, stop the nginx process.</p>






<pre tabindex="0"><code>sudo systemctl stop nginx</code></pre>
<p>Now, copy the system-provided nginx service file into the local
configuration area.</p>






<pre tabindex="0"><code>sudo cp /lib/systemd/system/nginx.service \
    /etc/systemd/system/nginx.service</code></pre>
<p>Now, use your favorite text editor to edit
<code>/etc/systemd/system/nginx.service</code>. When we make edits to this file, it
will override the system-provided service file.</p>
<p>Go down to the <code>[Service]</code> section, and add these two lines:</p>






<pre tabindex="0"><code>User=www-data
Group=www-data</code></pre>
<p>This will start nginx as an unprivileged user. However, to make this
work, we need to give nginx the CAP_NET_BIND_SERVICE capability. Add
this line:</p>






<pre tabindex="0"><code>AmbientCapabilities=CAP_NET_BIND_SERVICE</code></pre>
<p>Next, we need to create a place for nginx to write its PID file.
Currently, it writes to <code>/run/nginx.pid</code>, inside <code>/run</code>, a directory owned by
root. We need to create a directory called <code>/run/nginx</code> which is owned
by www-data. To do this, add this line:</p>






<pre tabindex="0"><code>RuntimeDirectory=nginx</code></pre>
<p>systemd will automatically create this directory with the correct
ownership.</p>
<p>Now, we need to move the PID file. Edit the line starting with <code>PIDFile</code>
to read:</p>






<pre tabindex="0"><code>PIDFile=/run/nginx/nginx.pid</code></pre>
<p>We&rsquo;ll also need to tell nginx about this new PID file.</p>
<p>Edit the file <code>/etc/nginx/nginx.conf</code>. Change the line starting with
<code>pid</code> to read:</p>






<pre tabindex="0"><code>pid /run/nginx/nginx.pid;</code></pre>
<p>Now restart nginx. Run</p>






<pre tabindex="0"><code>sudo systemctl daemon-reload
sudo systemctl restart nginx</code></pre>
<p>If you get an error, run this command to see a detailed error message.</p>






<pre tabindex="0"><code>sudo journalctl -u nginx</code></pre>
<h3 id="additional-sandboxing">Additional sandboxing</h3>
<p><em>Note: the following section assumes you have a systemd version greater
than 235. To see your systemd version, run</em> <em><code>systemctl --version</code> .</em></p>
<p>Running nginx as a non-root user is a good first step, but what else can
we do to make this more secure? Linux has many built-in sandboxing
features which systemd can make use of.</p>
<p>I added the following to my systemd configuration for nginx.service:</p>






<pre tabindex="0"><code># Process may not gain any capabilities besides the one we just gave it
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# Process is not allowed to gain new privileges using SUID binaries such as sudo
NoNewPrivileges=true
# Disables use of the personality(2) system call, which may have security bugs
LockPersonality=true
# Allows only common service-related system calls
SystemCallFilter=@system-service
# When system call is disallowed, return error code instead of killing process 
SystemCallErrorNumber=EPERM</code></pre>
<p>You can download my full systemd service file and my nginx configuration
<a href="https://gist.github.com/nickodell/7862263f6e89ea202731db3c558c3c03">here</a>.</p>
<h3 id="using-systemd-analyze">Using systemd-analyze</h3>
<p>systemd ships with a tool to analyze how much each of the services on
your system makes use of systemd-related security features. (Note: this
report doesn&rsquo;t consider non-systemd methods of sandboxing, such as a
service dropping privileges using setuid.) Run this command to see the
report:</p>






<pre tabindex="0"><code>SYSTEMD_EMOJI=0 systemd-analyze security</code></pre>
<p>You can also get detailed information about a single unit by running</p>






<pre tabindex="0"><code>systemd-analyze security nginx</code></pre>
<p>By following this guide, you can reduce systemd&rsquo;s risk score for
nginx from 9.5 (UNSAFE) to 5.0 (MEDIUM).</p>
<h3 id="further-work">Further work</h3>
<p>There are several other things you could do to improve this sandbox:</p>
<ul>
<li>Make the syscall filter more restrictive. The @system-service
filter is very broad and over-inclusive. Using perf, you can record
exactly which syscalls a service makes, and allow only those
syscalls. However, keep in mind that loading new plugins into nginx,
or changing its configuration, may cause your syscall list to become
out-of-date. For example, an nginx configuration which serves static
files will use different syscalls than one which proxies traffic to
another service. Here&rsquo;s a writeup on how to do this:
<a href="https://prefetch.net/blog/2017/11/27/securing-systemd-services-with-seccomp-profiles/">https://prefetch.net/blog/2017/11/27/securing-systemd-services-with-seccomp-profiles/</a></li>
<li>Disallow nginx from changing kernel tunables and modules.</li>
<li>Disallow nginx from connecting to unix domain sockets, netlink
sockets, or opening raw sockets.</li>
<li>Whitelist which devices in /dev nginx is allowed to read/write.</li>
<li>Blacklist namespace-altering syscalls.</li>
</ul>
<p>However, I chose not to include these things. First, many of them would
require an attacker to have root privilege anyway, so once the service
is no longer running as root, they have little value. Second, they have
some possibility of breaking someone&rsquo;s configuration. The sandbox
settings I show are intended to be general-purpose and work in a variety
of contexts.</p>
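<p>If you do want to experiment with some of the items in the list above,
the following is a rough, untested sketch of <code>[Service]</code> directives
that correspond to them. These are standard systemd options, but they are
not part of my configuration, and I am using <code>PrivateDevices</code> as a
blunt stand-in for a real device whitelist.</p>
<pre tabindex="0"><code># Make kernel tunables read-only and deny loading kernel modules
ProtectKernelTunables=true
ProtectKernelModules=true
# Allow only IPv4 and IPv6 sockets; blocks unix domain, netlink and packet sockets
RestrictAddressFamilies=AF_INET AF_INET6
# Hide physical devices, leaving pseudo-devices such as /dev/null
PrivateDevices=true
# Deny creating or entering namespaces
RestrictNamespaces=true</code></pre>
<p>Leaving out <code>AF_UNIX</code> is likely to break any setup where nginx
proxies to a unix domain socket or logs to syslog, which is one example of
the breakage described above.</p>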
<h3 id="testing-notes">Testing notes</h3>
<p>I have tested this configuration on recent versions of Debian, Fedora,
and Ubuntu. Here&rsquo;s what I&rsquo;ve found:</p>
<ul>
<li>Works on Debian Buster</li>
<li>Partially works on Debian Stretch (Note: You must comment out
LockPersonality and SystemCallFilter.)</li>
<li>Doesn&rsquo;t work on Fedora 32. The use of NoNewPrivileges interferes
with SELinux somehow. If you skip the &ldquo;Additional sandboxing&rdquo; step,
and substitute &lsquo;nginx&rsquo; for &lsquo;www-data&rsquo;, it will work. This is
possibly fixable, but I don&rsquo;t have much knowledge of SELinux.</li>
<li>Works on Ubuntu 20.04</li>
<li>Partially works on Ubuntu 18.04. (Note: You must comment out
SystemCallFilter.)</li>
</ul>
]]></content:encoded></item></channel></rss>