Welcome to my on-again, off-again personal website, once again rebuilt on a new custom content management system written in Rust.


Thoughts on the hybrid cloud

So getting through a from-scratch Kubernetes build was fun and deeply interesting. On top of that, I’ve started finding all sorts of great hybrid cloud technologies that would have been great to have on so many projects I worked on in the past (these projects all suffered from big cloud myopia, unfortunately). One that is really interesting and worth noting is the OpenFaaS project. Think of it as AWS Lambda or Azure Functions, but running locally or in a Kubernetes cluster.

It’s really a great project for many reasons. One of the biggest is that the function limits are orders of magnitude larger than those of most major serverless function providers (12TB memory limit, 96 CPU cores, 290 year execution limit). Everyone working in AWS is aware of the strict limits Lambda imposes on workloads. These can be design crippling, forcing teams to rework how they orchestrate the logical components of their applications to accommodate alternatives. OpenFaaS, by contrast, only seems to have limits inherent to the programming language and frameworks, so it’s suitable for a vastly wider set of discrete processing tasks. I’ve been on several projects in the past where the AWS Lambda timeout limit suddenly killed forward progress, and the amount of rework required to the data or the logic/compute easily eclipsed the effort of standing up an OpenFaaS cluster. To the point where it seems almost criminal not to run an OpenFaaS cluster for at least long-running, occasional discrete functions.

And to be clear, OpenFaaS is not the same as AWS Lambda. It’s not necessarily running something like Firecracker underneath. There isn’t a sophisticated over-provisioning scheme in place. But it does make clever use of docker pause to conserve resources, so you can load a lot of functions on an OpenFaaS cluster. And you don’t even need a full Kubernetes cluster to take advantage of it: the basic daemon is called faasd, which can run independently of K8s on a VM or, say, a Raspberry Pi.

OpenFaaS is event driven, and provides its own REST API to support flexible invocation. There are built-in monitoring and control mechanisms to round out the project. So in many ways OpenFaaS can supplement or maybe even replace your serverless function subsystems. At the very least I feel this project is something you should keep in your back pocket for when the limits of big cloud serverless functions suddenly prove to be roadblocks for your projects.
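For a sense of what invocation over that REST API looks like, here is a minimal Python sketch. It assumes a gateway reachable at http://127.0.0.1:8080 (a common local default, yours may differ) and a deployed function called long-report, which is a hypothetical name used only for illustration. Functions are exposed under /function/<name> on the gateway.

```python
import urllib.request


def build_invocation(gateway: str, function_name: str, payload: bytes) -> urllib.request.Request:
    """Build a synchronous invocation request for a function behind an OpenFaaS gateway."""
    # Deployed functions are routed under /function/<name> on the gateway.
    url = f"{gateway.rstrip('/')}/function/{function_name}"
    return urllib.request.Request(url, data=payload, method="POST")


def invoke(gateway: str, function_name: str, payload: bytes, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body (needs a running gateway)."""
    request = build_invocation(gateway, function_name, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Something like invoke("http://127.0.0.1:8080", "long-report", b"2021-01") would block until the function returns, so the timeout should match the long-running work you moved off Lambda.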



Exploring Kubernetes the hard way

So I’ve been in the AWS cloud space for a long time, which has been great as they really have a phenomenal cloud offering. But in that time Kubernetes has been steadily gaining speed as an alternative, hybrid approach to cloud computing. And while I’ve read some things and worked on containers running in K8s, I’ve never really had an in-depth understanding of the cluster management system. So I decided to fix that by doing the "Kubernetes the Hard Way" tutorial. It’s been great. While not a whole lot of it is new, it really is great practice and a very good end-to-end, "secure" setup walkthrough. So these are some takeaways I have from the experience.

While the tutorial says "no scripts", that’s not exactly accurate. It will have you write your own setup scripts, and bash is the language used to explain those operations. While the code display pieces on the tutorial have an "easy button" to copy the code and paste it in your terminal, there are two main reasons not to do this:

  1. Don’t copy and paste code from the web into your terminal! There are well documented attack vectors that can compromise your entire system this way.
  2. The idea is to get a non-shallow understanding of Kubernetes. Typing the code out and writing it your way adds muscle memory to the exercise.

Additionally, unless you are a superhuman typist there are the inevitable typos, bugs and such that actually help you learn in depth how to diagnose problems in Kubernetes and fix them. This is really what I find most useful about doing it the hard way. For instance, the tutorial assumes you are using tmux in a parallel manner across 3 controllers when initializing etcd, which I didn’t do. That is how I found out that the second controller must be initialized within the timeout period of the daemon start on the first or you will get a timeout error (after those first two are initialized this is no longer an issue). I would likely never have learned this by using scripted setups for K8s clusters.

Or like when I typo’d a cert location as /etc/kubernetes.pem instead of /etc/etcd/kubernetes.pem and learned I’ve been absolutely spoiled by Rust’s detailed and helpful compiler error messages. The error message was something like ERROR: systemd returned a status of "error code" returned. I know what you are thinking, "error code" should be more than enough for you to know what went wrong. Unfortunately I needed a bit more detail to figure out the problem, so a bit of research showed me the command

journalctl -ocat -b -u etcd

Where "etcd" is really whatever your systemd daemon name is. I think I’m going to alias this to doh in my shell for future reference. I know journalctl, but this argument soup is a super useful combo for working with systemd daemons, and one which I’m not sure I’ve ever used, or have forgotten. So learning/re-learning it because I’m doing this the "hard way" has been really great. I’d highly recommend this tutorial if you’d like to learn hands-on about Kubernetes.
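A shell alias is the natural home for this, but the same idea can be sketched as a small Python helper. This is only an illustration: the function names journal_command and doh are made up here. The flags are the ones from the command above, spelled out: -o cat prints only the message text, -b limits output to the current boot, and -u selects the unit.

```python
import subprocess
from typing import List


def journal_command(unit: str) -> List[str]:
    """Build the journalctl invocation for one systemd unit."""
    # -o cat: message text only, -b: current boot only, -u: filter to this unit
    return ["journalctl", "-o", "cat", "-b", "-u", unit]


def doh(unit: str) -> str:
    """Run the command and return whatever the journal has (needs a systemd host)."""
    result = subprocess.run(
        journal_command(unit), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling print(doh("etcd")) on one of the controllers should show the same output as the command above.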

Kubernetes the Hard Way



Batteries included backends

So I’ve been working on learning a new backend stack to follow through on some ideas I want to code out, and it turns out there are quite a lot of batteries included offerings out there (sometimes referred to as BaaS offerings). Many of them are similar to Firebase in that they bundle data, storage, AA (authentication/authorization) and other useful bits together with slick management interfaces. All of these offer a relatively quick path to getting up and running with an app by bundling backend features together. I’m researching them, so I thought I’d share what I’ve found.

I’m going to stick with the obviously somewhat open source ones here:

Some that are open source license ambiguous, not that I’m one to judge.

Some of these are actually wrappers for other services and applications of the same type but with different visions, offerings, [ENTER SOME SUCH DISTINGUISHING FEATURE HERE].

And some are more simply ORM like middleware offerings. No batteries included, but sometimes you don’t need batteries cause you’re working with anti-matter or something. I’ll again stick with the apparently open source offerings that include PostgresDB here.



Connecting to Supabase from Rust in Reqwest async

So let’s just say you, my fair reader, are asking yourself, "How can I connect to a turnkey data and API solution from Rust?" Well, it just so happens I had the same question last night and decided to give it a shot. Here it is using the Rust Reqwest crate in the async pattern:

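use dotenv::dotenv; // Because we never hard code secrets, even in simple prototypes
use reqwest::header::{HeaderMap, HeaderValue, AUTHORIZATION};

#[macro_use]
extern crate dotenv_codegen; // Excessive for such a small piece, but I like the look better

fn construct_headers() -> HeaderMap {
    let key = dotenv!("SUPABASE_KEY");
    let bearer: String = format!("Bearer {}", key);
    let mut headers = HeaderMap::new();
    headers.insert("apikey", HeaderValue::from_str(key).unwrap());
    headers.insert(AUTHORIZATION, HeaderValue::from_str(&bearer).unwrap());
    headers
}

#[tokio::main] // Assumes the tokio runtime as the async executor
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    dotenv().ok(); // Also load .env at runtime
    let client = reqwest::Client::new();

    // Assumption: SUPABASE_URL is a hypothetical .env entry holding the full
    // REST endpoint to query; the original request URL is not shown here.
    let resp = client
        .get(dotenv!("SUPABASE_URL"))
        .headers(construct_headers())
        .send()
        .await?;
    println!("{:#?}", resp.text().await?);
    Ok(())
}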

Of course there is supposedly a Rust SDK library coming, which provides a more GraphQL-like linked query approach. But this is easy enough for what I’m curious about. And I’d just like to add that I am really growing fond of Supabase the more I get into it. It seems the team has made some good design decisions so far and their feature production speed is great. I hope they can layer in some good code and company maturity now that they have paid tiers. It’s just really nice to see a turnkey solution built on top of PostgreSQL like this, and I’d like to see them succeed.



Food that lies

VR Meeting Transcription

[Mark]> It’s not so much the impact of what is being reported, it’s about what is actually going on that I need to understand. Dave, could you explain to me what neurotic drift means?

Dave’s avatar takes center focus in the room.

[Dave]> Sure Mark. As you know, the Matrient packs provide the experience of a high quality traditional meal while being a standardized nutritional material, a synth ration essentially. Its taste, texture, presentation and form are little different from a compressed nutritional bar of synthetic materials that provide a perfect delivery of nutrition for an average human. To understand how we take this normally bland product and make it seem like a delicious meal you have to understand the nano-mechanized delivery of memory engrams.

Mark expands his group focus and interjects,

[Mark]> Yes, yes, try not to cover the details of what we make too much and get to the point.

Mark rescinds focus.

[Dave]> Ok. So the nanites embedded in the ration immediately infiltrate the bloodstream and target the brain and mouth nervous systems within a few seconds. They deliver carefully tailored memory engrams that make the consumer think they are eating, say, a delicious turkey dinner or a mouth-watering hamburger. This is of course at odds with the sensory input the consumer is receiving and must continue to receive to finish the meal. To counter this discontinuity a low dose of neurotropic N-adylhyde-metacystine produces a brief opioid-like response and dulls the brain’s confusion at the sensory discontinuity while also stimulating hunger briefly by breaking down quickly into ghrelin. This cascade of factors gives consumers the desired outcome of eating a ration bar while experiencing a fine meal 99.9999999% of the time.

Mark again expands his focus, but says nothing.

Dave relinquishes group focus while Mark considers that number with a lot of nines.

[Mark]> So every one in a billion meals something goes wrong with that "cascade of factors"?

Dave issues an avatar nod and resumes normal conversational focus.

[Mark]> And our product got approved for use based on that exact percentage you mention but based on people affected overall, not on the number of meals that fail to work?

Dave responds in conversational focus.

[Dave]> Yes, there was an error in the approval model that analyzed our submission. I’ve double checked our submission and our numbers are perfect and our data schema is correct and unambiguous. We are not at fault here.

Mark’s avatar indicates he is reviewing other information while holding focus.

[Mark]> That’s relieving to hear from you, but that doesn’t explain these reports filtering through the lower tiers of the net that we are worried about.

Dave takes conversational focus but with an icon that indicates importance and another that indicates speculation.

[Dave]> Well, when it doesn’t work as designed, normally it’s just the cascade of factors. Usually it’s just a confusing experience as the discontinuity mitigations fail. Sometimes the meal becomes difficult to eat, sometimes the actual taste is not overridden but dual experienced, sometimes the brief high is too pronounced. But sometimes it’s the memory engrams that fail to embed correctly.

Mark takes focus.

[Mark]> I thought we determined that engrams failing was impossible? That either the memory takes or it fails and breaks down. Is this new behavior?

Dave takes focus and responds.

[Dave]> Yes, well new behavior to us, the simulation budget being what it was. It turns out there can be interactions between other engram injection systems. Unforeseen behavior in excessive injection of similar engrams. And some extremely rare physiology types that accept the engram but receive a completely different memory. When the engram fails in one of these edge cases the results can be particularly undesirable.

Mark’s avatar portrays annoyance.

[Dave]> Well, uh, the effects are usually minor. But we have confirmed some cases of psychosis.

Mark takes focus.

[Mark]> Is that all?

Dave highlights the speculative icon.

[Dave]> Uh, that one case that turned a consumer into a psychotic uncontrollable cannibal was an unexpected permanent implantation of the engram in the wrong cognitive area of the brain. We think we can avoid that ever happening again by adding some additional targeting meta-proteins in the engram sheath.



Considerations in Distributed Work Models.

So I’ve been considering what can make a distributed work model effective. Thinking about these systems, for me at least, brings to mind a handful of successful distributed work systems: the blockchain for cryptocurrencies, the grid for projects such as BOINC/SETI/FAH (which I’ll collectively call Grid), and of course the logistics system for Coca-Cola in Africa (which I’ll call Soda). So we have our data set scope. Let’s ask it some questions and postulate answers. I’m going to abstract things down to as basic as I can easily get them.

What drives participation in a distributed work model?


  • Grid = A sense of helping a larger goal
  • Blockchain = A currency like thing
  • Soda = Actual money

So we kind of get a range between the altruistic group collective goal and actual cash micro-rewards, with maybe speculative value in between. I think this provides a good basis for a scale of reward. While it may seem obvious, describing it clearly can allow for the direct correlation of credit in accordance with the distributed model. This could be used to ambiguate a reward and use the scale to measure the reward in accordance with desired participation models. Maybe.

So there is also a scale here between centrally organized and truly distributed work. Where the closer we get to central management the more concrete the reward is. To some degree this correlates with the difficulty of the work, but I don’t see that as a hard correlation.

So assume we are dealing with a single network of reward and work to be done and a finite pool of credits with which to distribute for doing the work of the network. How would we use the scale to assign credit to best encourage the work to be done?

To postulate on what may work: a given distributed economy may be considered a distinct network and may form with an arbitrary pool of credits. If we assume the motivations listed are sufficiently accurate, the reward system would then scale the reward based on speculation or difficulty. The smallest rewards are given towards the largest goals, relying on the sense of community effort as the primary reward and the credit number as just a confirmation of personal contribution. Moderate rewards would be given towards work that may produce a larger return but at uncertain or unascertainable risk. Large rewards could be tied to complex work with less or no communal affirmation.
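To make the scale a bit more concrete, here is a toy Python sketch of splitting a finite credit pool across work of the three kinds. The weights 1, 3 and 9 and the category names are pure assumptions picked for illustration; the only thing taken from the reasoning above is the ordering (communal work earns the least credit, concrete work the most).

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed relative weights: only the ordering comes from the scale described above.
WEIGHTS: Dict[str, float] = {"communal": 1.0, "speculative": 3.0, "concrete": 9.0}


@dataclass
class Task:
    name: str
    kind: str  # one of the keys in WEIGHTS


def allocate(pool: float, tasks: List[Task]) -> Dict[str, float]:
    """Split a finite credit pool across tasks in proportion to their weight."""
    total_weight = sum(WEIGHTS[task.kind] for task in tasks)
    return {task.name: pool * WEIGHTS[task.kind] / total_weight for task in tasks}


if __name__ == "__main__":
    work = [
        Task("fold proteins", "communal"),
        Task("mine a block", "speculative"),
        Task("deliver crates", "concrete"),
    ]
    for name, credit in allocate(130.0, work).items():
        print(f"{name}: {credit:.1f} credits")
```

With a pool of 130 credits and one task of each kind, the split is 10, 30 and 90 credits. Tuning the weights is where the real questions start.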

I think key to the idea of distributed work systems, well, you know, actually working, is that there needs to be constant alignment with the idea of minimal investment and minimal infrastructure requirements. I think setting some monetary investment minimum as a basis moves away from this idea of minimal viability.

So how would that work? Just arbitrary assignment of value? That might work for altruistic reward only, but without a way to exchange to more generally accepted and exchangeable credit it seems lacking. To postulate without first closing previous postulates, maybe it’s a bit like you could issue credit similarly to stock but with contractually set exchanges. Balance exchange through a scale of tasks from speculative to concrete. Might work. Seems there would need to be clarity about the lack of fundamental value, speculation or existence of actual concrete backing. Though, to be deeply honest, this does not really exist in current real world fiat systems, so why should it exist in virtual systems?

This brings the concept of bootstrapped economies to mind. That something of large value can rise out of something of minimal value, a phenomenon of emergence. This is possible. This brings up some more questions worth pursuing. Does a plethora of micro economies increase the chance of value emergence? Do centralized features increase the chance of value emergence? What are the measurable features of value in a virtual economy? Given inputs and controls, what increases value emergence and what stifles it? Lots of interesting questions, I think I’ll explore these.

To be continued...



Back to Machine Learning

It’s been a while since I checked in with Machine Learning tools, so here’s a post which is mostly me getting back into it. This is more or less from scratch as I haven’t done any work on my current laptop with machine learning, much less any work on PopOS in Python beyond a few simple scripts. So if you have Python and Pip installed, and you are on PopOS or maybe an Ubuntu based distro, this is sort of a walkthrough of how things are set up.

This is the method of setting up these tools without Anaconda. While Anaconda seems great for beginners in Python, I’ve seen it cause all sorts of dependency issues when running in parallel with standard Python installs.

First make sure your Pip is up to date and you are not in some Python virtual environment:

gatewaynode@pop-os:~$ pip install --upgrade pip
Collecting pip
  Downloading pip-21.0.1-py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 4.5 MB/s 
Installing collected packages: pip
Successfully installed pip-21.0.1
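If you are not sure whether a virtual environment is active, one way to check is from Python itself. This is a small sketch, and in_virtualenv is just a name made up here. It relies on sys.prefix differing from sys.base_prefix inside a venv, with a fallback for the sys.real_prefix attribute that older virtualenv releases set.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv changes sys.prefix instead.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```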

Then install TensorFlow with pip. I seem to remember these need to be installed widely enough for Jupyter Notebooks to use (for any of you poor fools following this as a tutorial of sorts, notebooks are Python in an interactive browser based console), so at least as your user, possibly as system global installs.

gatewaynode@pop-os:~$ pip install tensorflow
Collecting tensorflow
  Downloading tensorflow-2.4.1-cp38-cp38-manylinux2010_x86_64.whl (394.4 MB)
<snip a lot of dependencies and such /> 
Successfully installed absl-py-0.11.0 astunparse-1.6.3 cachetools-4.2.1 flatbuffers-1.12 gast-0.3.3 google-auth-1.26.1 google-auth-oauthlib-0.4.2 google-pasta-0.2.0 grpcio-1.32.0 h5py-2.10.0 keras-preprocessing-1.1.2 markdown-3.3.3 numpy-1.19.5 opt-einsum-3.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-oauthlib-1.3.0 rsa-4.7 tensorboard-2.4.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.4.1 tensorflow-estimator-2.4.0 termcolor-1.1.0 typing-extensions- werkzeug-1.0.1 wheel-0.36.2 wrapt-1.12.1

We’ll still most likely need a few more dependencies for the Notebooks to do some of their fancy-fancy.

gatewaynode@pop-os:~$ pip install scipy
Collecting scipy
  Downloading scipy-1.6.0-cp38-cp38-manylinux1_x86_64.whl (27.2 MB)

And we’ll need the Notebook server itself (runs locally to serve the notebook to your browser).

gatewaynode@pop-os:~$ pip install jupyterlab
Collecting jupyterlab
  Downloading jupyterlab-3.0.7-py3-none-any.whl (8.3 MB)

I like to review the dependencies so I at least have a small chance of knowing what is installed on my system. Something that stands out for me on jupyterlab is the prometheus-client-0.9.0, which worries me a bit. Prometheus is an application performance monitoring tool for time-series data. There are security implications to this client being set up by default. I’d be more worried about the server being installed, but this is concerning as well. I hope it doesn’t set up a default connection, and that the open source models I want to try can’t use it to connect to some random Prometheus server in Seychelles.
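Scrolling pip output is one way to do that review. Another is asking the installed package metadata directly. The sketch below uses importlib.metadata from the standard library (Python 3.8+); the helper name declared_requirements is made up for this example. It only lists the requirements a package declares itself, so anything pulled in transitively would need a second lookup on the package that declares it.

```python
from importlib import metadata
from typing import List


def declared_requirements(package: str) -> List[str]:
    """Return the requirement strings an installed package declares, sorted."""
    try:
        # requires() returns None when a package declares no requirements.
        return sorted(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    for requirement in declared_requirements("jupyterlab"):
        print(requirement)
```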

Launch the notebook server to make sure everything is working:

gatewaynode@pop-os:~$ jupyter-lab

Since I’m getting back into machine learning from a security perspective a few things stand out that might be useful if any future vectors present themselves.

[I 2021-02-13 23:34:53.037 ServerApp] Writing notebook server cookie secret to /home/gatewaynode/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2021-02-13 23:34:53.048 ServerApp] http://localhost:8888/lab?token=3f3205496238dcc675a68eee723bd59b572c6c69d616be62
    To access the server, open this file in a browser:

Like "cookie secret", I’m not sure why this little detail is shared with the Notebook user. But if I’m trying to hack data scientists that I know are running a particular local webserver this is helpful. Now I’m not actually trying to hack data scientists, but who knows, maybe my corp will want to red team this vector some day.

Now I can start playing with tensors interactively. More specifically I can start breaking them, documenting failure modes and building a fuzzer.
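As a rough idea of where that fuzzer could start, here is a library-agnostic sketch in plain Python. It generates nested lists with random shapes (including empty dimensions) filled with awkward float values, feeds them to whatever callable you hand it, and records which shapes raised. All the names here are made up for the sketch, and a real fuzzer would need far more than this (dtypes, ragged inputs, crashes rather than exceptions).

```python
import random
from typing import Callable, List, Tuple

# Values that tend to expose edge cases in numeric code.
AWKWARD_VALUES = [0.0, -1.0, 1e308, float("inf"), float("nan")]


def random_shape(rng: random.Random, max_rank: int = 4, max_dim: int = 5) -> List[int]:
    """Pick a random shape; rank 0 means a scalar and a 0 dimension means empty."""
    return [rng.randint(0, max_dim) for _ in range(rng.randint(0, max_rank))]


def random_tensor(rng: random.Random, shape: List[int]):
    """Build a nested list of the given shape filled with awkward values."""
    if not shape:
        return rng.choice(AWKWARD_VALUES)
    return [random_tensor(rng, shape[1:]) for _ in range(shape[0])]


def fuzz(op: Callable, iterations: int = 100, seed: int = 0) -> List[Tuple[List[int], str]]:
    """Call op on random inputs and collect (shape, exception name) for each failure."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        shape = random_shape(rng)
        try:
            op(random_tensor(rng, shape))
        except Exception as exc:  # a failure mode worth documenting
            failures.append((shape, type(exc).__name__))
    return failures
```

In a notebook the callable could be something like tf.constant, with the fixed seed making any failure reproducible.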



Working on integrating Svelte as a progressive component system.

NOTE: While this whole site is rambling, this blog post is particularly so. This is not a "how-to". More of me just publishing my notes as I go along after finishing the first pass of the Svelte tutorial and trying to create some progressive components for my site (see the default component at the top of the base content page).

So starting a new Svelte app from scratch from inside my /static/ directory with:

npx degit sveltejs/template svelte

We get a templated project directory like so

 .
└──  svelte
   ├──  package.json
   ├──  public
   │  ├──  favicon.png
   │  ├──  global.css
   │  └──  index.html
   ├── 
   ├──  rollup.config.js
   ├──  scripts
   │  └──  setupTypeScript.js
   └──  src
      ├──  App.svelte
      └──  main.js

So the other starter instructions are to install with npm and start the dev server, which for brevity I’ll follow so:

cd svelte
npm install
npm run dev

This will get the build and rolling build update going, but I don’t really care about the dev server. I’ll get back to running the continuous build without it later. For now this gives us a /build directory and build artifacts, we just need the javascript bundle.js and the stylesheet bundle.css. A quick symlink of those into my standard /static/css and /static/javascript directories and now I can access them in the content using my CMS .content_meta file.

  "template_override": "",
  "javascript_include": [
  "javascript_inline": "",
  "css_include": [
  "css_inline": "",

There are more fields in the meta (which each piece of content has), but those five let you include other javascript and css files, inline snippets or even change the backend rendering template on a per piece of content basis. This is probably meant to be more global and long lived, but this is fine for now.

Save the content meta file, refresh my local page and whoa! Below my footer is the Svelte default starter template thingy.

Rollup looks pretty nice, at least the output is clean and colorful, and I’m looking for a potential replacement for Webpack after hitting a few snags with it so maybe Rollup will be the future replacement? I don’t know, we’ll see.

So to be minimally functional I need to be able to inject the Svelte components where I want them to go in the DOM. Sure I could write some Javascript to place them, but I’m thinking there might be support for that in Svelte itself. I don’t know, I just finished the tutorial so I really have no idea. Let’s look at the API reference.

Custom Element API: this looks like what I’m trying to do. I tried creating a custom element in content, which for my CMS just means copying the markdown filename, giving it the .html extension and replacing the file contents with <svelte-demo></svelte-demo>. The CMS will attempt to render the HTML content just above the markdown file I’m using.

But the element is still at the bottom of the page. Ah, but it now uses the <svelte-demo> tag. This seems like the right direction, but not quite where I wanted it.

Ok, what else might there be in the API? Not much, but it seems like this should have worked. I must be missing something.

Insert random awesome blog that shows exactly what I missed

So the custom component tag in the content can’t be <svelte-demo></svelte-demo>, it just needs to be <svelte-demo />. This injects the component in the custom HTML snippet I placed above this content, but it also wipes out the following content I created in Markdown. I’ve seen this before, so without diving into it much I just wrapped the tag in a container div and everything works as expected. This is what my content.html file looks like now:

<div id="svelte-container">
    <svelte-demo />
</div>

Well almost, it looks like I have 2 Svelte components now: one in the content card, where I’m expecting it, that is missing the property something, and one at the bottom of the body that has the expected default prop(erty). Duplicate Svelte components would be a nasty bug, but I don’t think this is a bug in Svelte, more just something I’m not setting up right yet.

There is a console error about a prop not getting set, so I check that first. A quick change to the html content to set the property something gets me what I expect in the content embedded component. That’s great as I can pass the prop from the static renderer to the component through the Tera template.

<div id="svelte-container">
    <svelte-demo something="Gatewaynode" />
</div>

But I still have 2 components being rendered into the page. Let’s take a look at the source layout.

 .
└──  svelte
   └──  src
      ├──  App.svelte
      └──  main.js

The App.svelte file is our main place to write Svelte style JS, the main.js file is vanilla Javascript to initialize and construct the app. So looking in the main.js file I found the culprit for double rendering.

import App from './App.svelte';

const demo = new App({
	// target: document.body,
	props: {
		something: 'Default'
	}
});

export default demo;

The section, commented out here, tells the app to initialize in the document body. Which I don’t need as I’m already declaring it in the DOM where it should be, which is enough for Svelte. So the component renders correctly, you can tell because it has a shadow DOM which only exists for Svelte components, whole apps don’t use a shadow DOM. There is a pretty good explanation of why they have to use a shadow DOM for components here. If you are not familiar with how to see the shadow DOM in the developer tools you can also look at the page source (right click and choose "view page source") and search for "Mars" or "svelte-container" and notice none of the text you see is in the DOM, that’s because it’s being rendered in Javascript in your browser.

Added a small CSS snippet in the .content_meta to add a little border to the HTML above the markdown content that contains the component:

  "javascript_include": [
  "javascript_inline": "",
  "css_include": [
  "css_inline": "#svelte-container{border: 3px solid grey;}",

And I think that’s a good place to stop with what is just a toy implementation of Svelte right now. It’s progress, but it also has a few bugs (component inline style API doesn’t seem to be working, something strange is going on with JS execution inside the component). Time to go back through the tutorial again and take notes as to what to study as I go.




Supply chain intake sandboxes

So to speculate on what a system of intake sandboxes would be. How it might work. I’d like to do it without breaking budget, but given this is rambling speculation I’m not going to worry about that here. Let’s consider as a basis for this exercise the SolarWinds breach and Sunspot as the model of the currently most successful supply chain attack.

So one of the interesting details of Sunspot is how the execution is delayed a specific amount of time (10 - 12 days), which implies a high security intake sandbox environment that doesn’t last longer than 9 days before releasing a binary to production environments. Probably some internal SLA for a high security environment that was the primary target of the attack (note how a secondary piece of information leaked or stolen provides the security bypass).

So knowing this, how would our sandbox work?

Well it needs to be continuous, more like a persistent staging environment for all internal systems and processes. We can’t have a predictable period of time in which we carefully scrutinize application behavior because that can be determined and easily bypassed.

Does this mean some arbitrary, internal 7 day SLA can’t be honored?

I think you can still honor it, but it doesn’t mean you stop watching the new patch or binary very carefully. It also means we want to keep things in our sandbox longer if we can, but there is probably little stopping us from pushing something through quickly when needed. Our sandbox might need additional mechanisms outside of it so that we can roll back to a known good state if one day that patch we installed months ago suddenly starts trying to beacon out to a sketchy data center in Seychelles. The SolarWinds compromise is proof positive that you can be infected for a long time and never know it.
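One way to picture "never stop watching, and never on a predictable clock" is a randomized review schedule that runs long past the SLA. The Python sketch below is only an illustration of that idea. The 180 day horizon and 3 day average gap are made-up defaults, and observation_schedule is a hypothetical helper, not part of any real tool.

```python
import random
from datetime import datetime, timedelta
from typing import List, Optional


def observation_schedule(
    start: datetime,
    horizon_days: float = 180.0,
    mean_gap_days: float = 3.0,
    seed: Optional[int] = None,
) -> List[datetime]:
    """Return review times with exponentially distributed gaps up to the horizon."""
    rng = random.Random(seed)
    checks = []
    # Exponential gaps give no fixed window for a dormancy timer to wait out.
    elapsed = rng.expovariate(1.0 / mean_gap_days)
    while elapsed < horizon_days:
        checks.append(start + timedelta(days=elapsed))
        elapsed += rng.expovariate(1.0 / mean_gap_days)
    return checks
```

The binary can still be released on day 7 to honor the SLA. The schedule only decides when its sandbox twin gets torn down and examined again.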

And our sandbox needs to look like production in a way that isn’t easily distinguishable. We can’t just throw a VM on an isolated box and expect it to be adequate. VMs are pretty much required here though, so that implies that our production environment also needs to be virtualized. There needs to be a network that looks like where we are going to run the software. This means an AD domain controller if it needs to live in a Windows network. Possibly a traffic generator of some sort so the system state isn’t idle too much. We’ll also need other active endpoints that look like our production network. Now none of these fake sandbox systems should be real, in fact they should probably just be very convincing honeypots. Or maybe ephemeral devices with a very short lifetime that are constantly getting re-hydrated and replaced, with the system state continuously examined on every tear down.

Entry into the sandbox should also trigger different analysis workflows, such as secondary checks on the file hashes through separate channels. And even though it’s not very effective, signature scanning engines are probably required. It might be worthwhile to be able to add the scanning engines into the sandbox periodically and watch for any changes once they are in the environment.
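The secondary hash check is the easiest piece to sketch. The Python below hashes the artifact locally and compares it against hashes gathered from independent channels. The channel names and the helper names are hypothetical, and how you actually obtain a trustworthy second copy of a hash is the hard part this sketch skips.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large installers don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_intake(path: str, published_hashes: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the local hash and the channels whose published hash disagrees."""
    actual = sha256_of(path)
    mismatches = [
        channel
        for channel, expected in published_hashes.items()
        if expected.strip().lower() != actual
    ]
    return actual, mismatches
```

An empty mismatch list only says every channel agrees with the file on disk. It says nothing about a build that was compromised before it was hashed and signed, which is why this is one workflow among several.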

Binary analysis is also something we should consider, or failing that VM level debugging. We would want to be looking for new network capabilities and process detection/manipulation capabilities. So being able to investigate our new or updated application at that very low level would give us a real chance of detecting something like Sunburst.

Ok, enough rambling speculation.

This sandbox seems like it would be a major initiative, most likely needing its own support and engineering team. Even though I haven’t considered budget while thinking through this, it’s certainly going to be considerable. When I think of how feasible this would be, I really think only the largest enterprises could consider a supply chain intake sandbox like this. Balanced against risk, most enterprises would not want to go through the effort to secure their upstream supply chain in this way. Maybe if a security vendor was to make such a system and spread the cost across enterprise clients? Maybe.




The year supply chain attacks exploded

So those of us working in the application security space have been watching with trepidation, and more than a little bit of fear, as the use of package managers and minimalist languages has caused exponential increases in application dependencies. I remember some years ago, when I was still in development and certain teams at a previous employer were moving to NodeJS for new projects, more than a few engineers were concerned with the hundreds of dependencies they were including in their projects. It was really quite plain that there would be no way to review all the code being pulled in, so the attitude we took was one of cross your fingers and hope for the best. Not exactly a good engineering technique.

The real wake-up call for me came in the following years, when I transitioned to application security and became exposed to the vast swaths of vulnerabilities being publicly acknowledged by major vendors as disclosure became a big thing. Just the sheer number of hard coded back doors and easily bypassed authentication systems from major vendors was crazy. It was mind boggling how much trust in large enterprise vendors had been misplaced. It was obvious back then that open source was generally better at preventing security problems just by being scrutable. And then the open source dependencies started having problems too. Not as bad as some large enterprise vendors, but still incidents were slipping through.

Now we seem to be at the beginning of an awakening to the massive weaknesses in IT supply chains. Obviously some of the recent attacks, like the one that hit SolarWinds, are very sophisticated and outside the realm of your average internet criminal organization. But with all the coverage and the release of techniques and malware source, we can be sure there will be copycats of lesser sophistication.

Sure, there are some security systems an enterprise can implement to help. Software Composition Analysis tools can make you aware of your dependency scope, and let you act on known problems. Some repository management software will help you confirm things like hashes and let you slow down intake of new dependencies. But none of it would really help you against something targeted and stealthy like Sunburst.
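
To make "slow down intake" a little more concrete, here is a minimal sketch of a cool-down policy: a new version of a dependency only becomes eligible once it has been public for some number of days. The function name, the 14 day default and the shape of the `releases` mapping are all assumptions for illustration, not how any specific repository manager works.

```python
from datetime import datetime, timedelta, timezone


def admissible_versions(releases, now, min_age_days=14):
    """Return versions that have been public for at least `min_age_days`.

    `releases` maps a version string to its timezone-aware publish time.
    The result is ordered from oldest to newest publish time.
    """
    cutoff = now - timedelta(days=min_age_days)
    eligible = [v for v, published in releases.items() if published <= cutoff]
    return sorted(eligible, key=lambda v: releases[v])


# Hypothetical release history for a single dependency.
releases = {
    "1.0.0": datetime(2020, 12, 1, tzinfo=timezone.utc),
    "1.1.0": datetime(2021, 1, 10, tzinfo=timezone.utc),
    "1.2.0": datetime(2021, 1, 25, tzinfo=timezone.utc),
}
today = datetime(2021, 1, 31, tzinfo=timezone.utc)
allowed = admissible_versions(releases, today)  # "1.2.0" is still too new
```

A delay like this only buys time for someone else to find the problem first, which is exactly why it does little against an implant that stays quiet for months.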

That’s not to say nothing can be done to stop such an attack, it’s just that what it would take is often well beyond what private enterprises and security vendors currently do. That’s not to mention that most engineers and managers in IT barely understand what the supply chain is, and often confuse it with the software development lifecycle, which means any defense gets placed too far down the line to be effective. So before I talk about speculative mitigations, let me define some logistics terms as I see them in software development:

"Upstream Supply Chain": Data, libraries and software that are brought into an enterprise to make software and services.

To understand how to defend this, it needs to be understood that the upstream supply chain ends at a definite point, and that point is where we need to enforce security, before bad things can happen.

"Manufacturing Process": The process of creating code to create additional software or services, to support or as, the business practice. This manufacturing process is often referred to as the software development lifecycle (SDLC).

This is the development, manipulation or redistribution of the supplies. If an attack gets this far it’s too late to stop any damage from happening.

"Downstream Supply Chain": The data, libraries and software and services sold or otherwise distributed to the next level of customers.

It gets confusing because the parts of an IT supply chain look the same at most points. Code doesn’t get refined from ore into shiny ingots, and often it looks almost the same coming out of the manufacturing process as it does going into it. And logistics isn’t often a requirement of a computer science degree, so the distinction between the parts is not always intuitive.

The important thing to note about these definitions is that they set clear enforcement boundaries and clear incident response scopes. Supply chain attacks should be stopped at the end of the upstream supply chain. This is the most effective place to stop them, and it leaves the smallest incident response scope to resolve when attacks are detected.

Yes, you could stop a malicious dependency at a release gate in your CI/CD pipeline. But that’s a bit too late if the developer workstation was the target all along, right?

So how do we prevent attacks from reaching our manufacturing process? Well, sandboxes are one way to do it, at least for full applications. Install and run them while looking for strange behavior. Maybe also run copies of the software we provide to our downstream in the sandbox with newer dependencies, to see if the behavior has changed in unexpected ways? I haven’t seen systems like this, but we know something similar exists, not least because Sunburst had specific code for hiding from sandboxes and detonating in a delayed manner.
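
To sketch what "see if the behavior has changed" could mean in practice, the example below compares the events recorded during a sandbox run of a known-good build against a run of the candidate build and reports anything new. How the events get collected is out of scope here. The function name, the category names and the sample data are all hypothetical.

```python
def new_behaviors(baseline, candidate):
    """Return behaviors seen in `candidate` but not in `baseline`, per category.

    Both arguments map a category name (e.g. "network", "process") to an
    iterable of observed events as strings. Categories with nothing new
    are left out of the result.
    """
    diff = {}
    for category, events in candidate.items():
        added = set(events) - set(baseline.get(category, ()))
        if added:
            diff[category] = sorted(added)
    return diff


# Hypothetical observations from two sandbox runs of the same application.
baseline_run = {
    "network": ["updates.example.com:443"],
    "process": ["app.exe"],
}
candidate_run = {
    "network": ["updates.example.com:443", "telemetry.example.net:443"],
    "process": ["app.exe"],
    "process_enumeration": ["listed running processes"],
}
report = new_behaviors(baseline_run, candidate_run)
```

A diff like this is only as good as the observation window. Anything that waits out the sandbox run before acting will produce an empty report, so a real system would need long or staggered runs.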

Another way to do it would be binary analysis. Decompiling binaries to scrutinize new behaviors seems like a possible way to catch upstream supply chain attacks like this. But what vendor is going to be ok with their customers running something like Ghidra on every release of their proprietary software and comparing behaviors?

There is a lot to think about when it comes to high tech upstream supply chains. One thing is certain though, they are generally not secure and attacks against them are trending up.
