Skip to content

What is Malstrom?

Malstrom is a distributed, stateful stream processing framework written in Rust. In usage it is similar to Apache Flink or bytewax, although implemented fundamentally differently. Malstrom's goal is to offer best-in-class usability, reliability and performance, enabling everyone to build fast parallel systems with unparalleled up-time.


Distributed: Malstrom can run on many machines in parallel, sharing the processing workload and enabling zero-downtime scaling to fit any demand. Kubernetes is supported as a first-class deployment environment, others can be added through a public trait interface.

Stateful: Processing jobs can hold arbitrary state, which is snapshotted regularly to persistent storage like disk or S3. In case of failure or restarts, the job resumes from the last snapshot. Malstroms utilizes the ABS Algorithm, ensuring every message affects the state exactly once.

Usability: Malstrom provides a straight-forward dataflow API, which can be extended when needed. A simple threading model means no async, no complex lifetimes, no Send or Sync needed. Data only needs to be serialisable when explicitly send to other processes.

Reliability: Using the world's safest programming language makes building highly-reliable stream processors a breeze. In any case [[zero-downtime scaling]] and [[zero-downtime upgrades]] (TBD) allow for awesome uptime.

Code Example

rs
//! Stream processing can be easy!
use malstrom::operators::*;
use malstrom::runtime::MultiThreadRuntime;
use malstrom::sinks::{StatelessSink, StdOutSink};
use malstrom::snapshot::NoPersistence;
use malstrom::sources::{SingleIteratorSource, StatelessSource};
use malstrom::worker::StreamProvider;

fn main() {
    MultiThreadRuntime::builder()
        .persistence(NoPersistence)
        .parrallelism(1)
        .build(build_dataflow)
        .execute()
        .unwrap();
}

fn build_dataflow(provider: &mut dyn StreamProvider) {
    provider
        .new_stream()
        .source(
            "words",
            StatelessSource::new(SingleIteratorSource::new([
                "Look",
                "ma'",
                "I'm",
                "streaming",
            ])),
        )
        .map("upper", |x| x.to_uppercase())
        .sink("stdout", StatelessSink::new(StdOutSink));
}

This outputs

{ key: NoKey, value: "LOOK", timestamp: 0 }
{ key: NoKey, value: "MA'", timestamp: 1 }
{ key: NoKey, value: "I'M", timestamp: 2 }
{ key: NoKey, value: "STREAMING", timestamp: 3 }