Posts
-
When Lance Hits the Wall at 70 Images on Cloudflare R2
First impressions of using Lance with Python were excellent. It took almost no code at all to upload images, vectorize them, save the data in Lance format in R2 and...
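The flow really is short. Below is a minimal sketch of that kind of pipeline (not the post's actual code) using the lancedb Python client: the bucket and table names are made up, embed() is a placeholder for a real image model, and R2 is reached through its S3-compatible API with credentials and endpoint taken from the usual AWS_* environment variables.
```python
import os
import lancedb

# Minimal sketch: bucket/table names are made up; R2 is accessed via its
# S3-compatible API using AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
# AWS_ENDPOINT from the environment.
db = lancedb.connect("s3://my-r2-bucket/lance")

def embed(image_bytes: bytes) -> list[float]:
    # Placeholder embedding; a real pipeline would call an image model here.
    return [0.0] * 512

rows = []
for name in os.listdir("images"):
    with open(os.path.join("images", name), "rb") as f:
        rows.append({"filename": name, "vector": embed(f.read())})

table = db.create_table("images", data=rows)
print(table.count_rows())
```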
-
Building a Digital Asset Management (DAM) project with different tools
-
Using OpenTofu to create an Apache Flink cluster on AWS
OpenTofu, the open-source fork of Terraform™, became generally available last month. I wanted to create a project using Tofu to try it out and see if there were...
-
Excluding sensitive data from Debezium CDC
When using Debezium to source data from a relational database like MariaDB, it might be necessary to block some data from being copied. It might be for privacy reasons like...
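As a rough illustration of the approach (not the exact config from the post), this sketch registers a Debezium connector through the Kafka Connect REST API and uses column.exclude.list to keep specific columns out of the change events. Hostnames, database names and column names are all made up, and Debezium 2.x property names are assumed.
```python
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST API

connector = {
    "name": "mariadb-source",
    "config": {
        # Debezium's MySQL connector is commonly used against MariaDB.
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mariadb",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "5400",
        "topic.prefix": "shop",
        "table.include.list": "shop.customers",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.shop",
        # Keep sensitive columns out of the change events entirely.
        "column.exclude.list": "shop.customers.email,shop.customers.phone",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(resp.json()["name"], "created")
```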
-
Sending Kafka topic events to HTTP endpoints
My experience with streaming data so far has been for internal use, streaming data within the same infrastructure. I wanted to try out the HTTP sink connector to see how...
-
Exposing metrics from logs using vector.dev
I’ve spent some time recently creating Apache Flink jobs to process data from a number of Kafka topics. The jobs work out some customer-related counts so that recurring work...
-
WarpStream, Apache Flink and Iceberg for a cost-effective, scalable logging solution
When developing and debugging systems, having logs is critical. A number of services exist to capture logs and provide a UI for searching and alerting, both 3rd party and self...
-
Monitoring Apache Flink containers using Prometheus
Apache Flink comes with a Prometheus plugin already in place in the /plugins folder and ready to use without adding any additional JAR files.
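With the reporter enabled in flink-conf.yaml, each Flink container exposes a metrics endpoint that Prometheus can scrape. A quick way to check it from Python, assuming the reporter is listening on its default port 9249:
```python
import requests

# 9249 is the PrometheusReporter's default port; override it with
# metrics.reporter.prom.port in flink-conf.yaml if needed.
metrics = requests.get("http://localhost:9249/metrics", timeout=5).text

# Print a few JobManager metrics to confirm the endpoint is live.
for line in metrics.splitlines():
    if line.startswith("flink_jobmanager_"):
        print(line)
```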
-
Apache Flink and Apache Iceberg
I recently tried out using Flink with Apache Paimon. Paimon is a “Streaming data lake platform with high-speed data ingestion”. My hope is to find a convenient way for Flink...
-
Trying out Apache Paimon with Flink
I’ve been working with Apache Flink recently processing data from Kafka topics. While creating pipelines I wanted to see if I could also send the data from the topics to...
-
Deploying Flink CDC Jobs with Docker Compose
Running Apache Flink containers using Docker Compose is a convenient way to get up and running to try out some Flink workloads.
-
Misusing Catalogs in Apache Flink for identifying Jobs
When jobs are created in Flink using SQL, they show up in the jobs list with default names such as insert-into_default_catalog.default_database.sink_name. If you’re pulling records from multiple sources and sinking...
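The trick, roughly, is to create an in-memory catalog whose name describes the pipeline, so that name ends up in the generated job name. A minimal PyFlink sketch of the idea, with made-up catalog and table names (the datagen and blackhole connectors are only there to make it runnable):
```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The catalog name is purely a label; generic_in_memory stores nothing durable.
t_env.execute_sql("CREATE CATALOG daily_orders WITH ('type' = 'generic_in_memory')")
t_env.execute_sql("USE CATALOG daily_orders")

t_env.execute_sql("""
    CREATE TABLE src (id BIGINT)
    WITH ('connector' = 'datagen', 'rows-per-second' = '1')
""")
t_env.execute_sql("""
    CREATE TABLE sink_name (id BIGINT)
    WITH ('connector' = 'blackhole')
""")

# The job now shows up as insert-into_daily_orders.default_database.sink_name
# (or similar) instead of the anonymous default_catalog name.
t_env.execute_sql("INSERT INTO sink_name SELECT id FROM src")
```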
-
Apache Flink using Checkpoints
When running some Flink Jobs that perform CDC from a relational database, there were times that a Job would restart. The restart could be caused by a source database restarting...
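For context, enabling checkpoints in a PyFlink job is only a couple of lines; the intervals below are made up, and the checkpoint storage location (state.checkpoints.dir) is assumed to be set in flink-conf.yaml:
```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Checkpoint every 60s so a restarted CDC job can resume from the last
# committed position instead of starting over.
env.enable_checkpointing(60_000)

cfg = env.get_checkpoint_config()
cfg.set_min_pause_between_checkpoints(30_000)  # breathing room between checkpoints
cfg.set_checkpoint_timeout(120_000)            # give up on checkpoints that take too long
```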
-
Query an RDS Snapshot on S3 using Apache Drill
In 2020 AWS announced that RDS snapshots can be exported to S3. The resulting export is in Parquet format, which is ideal for searching. “The Parquet format is up to...
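For a flavour of what querying the export looks like, here is a sketch using Drill's REST API from Python; the storage plugin name (s3) and the export path are placeholders for whatever the snapshot export produced.
```python
import requests

DRILL_URL = "http://localhost:8047/query.json"  # Drill's REST query endpoint

# `s3` is a Drill storage plugin pointed at the export bucket; the path is illustrative.
sql = """
SELECT COUNT(*) AS row_count
FROM s3.`rds-export/mydb/mydb.public.orders`
"""

resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql}, timeout=120)
resp.raise_for_status()
print(resp.json()["rows"])
```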
-
Vector.dev, AWS S3 and SingleStore for fast log search
I wanted to explore the possibility of using AWS S3 as a back end for a logging solution. Not only the storage of logs, which S3 handles very well, but...
-
Using AWS Redshift for fast count queries
I work with a database that stores Sendgrid event information. Sendgrid is a popular service for sending emails and for each email sent to a recipient, Sendgrid provides a webhook...
-
Starting a Kafka journey
My first exposure to binary logs, or binlogs, was that they were just something database replicas needed to use to keep up to date. I didn’t need to know anything more than...
-
Bash script for RDS maintenance
A useful Bash script to schedule a backup window, a maintenance window and some upgrades to perform on a number of RDS instances.
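The post describes a Bash script; as a sketch of the kind of call involved, here is a boto3 equivalent with instance identifiers and windows made up:
```python
import boto3

rds = boto3.client("rds")

# Illustrative identifiers and windows (times are UTC).
for db in ["app-db-1", "app-db-2"]:
    rds.modify_db_instance(
        DBInstanceIdentifier=db,
        PreferredBackupWindow="02:00-03:00",               # hh24:mi-hh24:mi
        PreferredMaintenanceWindow="sun:03:30-sun:04:30",  # ddd:hh24:mi-ddd:hh24:mi
        AutoMinorVersionUpgrade=True,
        ApplyImmediately=False,                            # wait for the window
    )
```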
-
Quick and simple steps to get a single-node SingleStore cluster running on AWS
Use at least an m5.xlarge EC2 instance for 4 CPUs.