Posts
-
When Lance Hits the Wall at 70 Images on Cloudflare R2
First impressions of using Lance with Python were excellent. It took almost no code at all to upload images, vectorize them, save the data in Lance format in R2 and...
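The flow really is short. Below is a minimal sketch of that kind of pipeline (not the post's actual code) using the lancedb Python client: the bucket and table names are made up, embed() is a placeholder for a real image model, and R2 is reached through its S3-compatible API with credentials and endpoint taken from the usual AWS_* environment variables.
```python
import os
import lancedb

# Minimal sketch: bucket/table names are made up; R2 is accessed via its
# S3-compatible API using AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
# AWS_ENDPOINT from the environment.
db = lancedb.connect("s3://my-r2-bucket/lance")

def embed(image_bytes: bytes) -> list[float]:
    # Placeholder embedding; a real pipeline would call an image model here.
    return [0.0] * 512

rows = []
for name in os.listdir("images"):
    with open(os.path.join("images", name), "rb") as f:
        rows.append({"filename": name, "vector": embed(f.read())})

table = db.create_table("images", data=rows)
print(table.count_rows())
```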
-
Building a Digital Asset Management (DAM) project with different tools
-
Using OpenTofu to create an Apache Flink cluster on AWS
OpenTofu, the open-source fork of Terraform™, became generally available last month. I wanted to create a project using Tofu to try it out and see if there were...
-
Excluding sensitive data from Debezium CDC
When using Debezium to source data from a relational database like MariaDB, it might be necessary to block some data from being copied. It might be for privacy reasons like...
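As a rough illustration of the approach (not the exact config from the post), this sketch registers a Debezium connector through the Kafka Connect REST API and uses column.exclude.list to keep specific columns out of the change events. Hostnames, database names and column names are all made up, and Debezium 2.x property names are assumed.
```python
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST API

connector = {
    "name": "mariadb-source",
    "config": {
        # Debezium's MySQL connector is commonly used against MariaDB.
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mariadb",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "5400",
        "topic.prefix": "shop",
        "table.include.list": "shop.customers",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.shop",
        # Keep sensitive columns out of the change events entirely.
        "column.exclude.list": "shop.customers.email,shop.customers.phone",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(resp.json()["name"], "created")
```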
-
Sending Kafka topic events to HTTP endpoints
My experience with streaming data so far has been for internal use, streaming data within the same infrastructure. I wanted to try out the HTTP sink connector to see how...
-
Exposing metrics from logs using vector.dev
I’ve spent some time recently creating Apache Flink jobs to process data from a number of Kafka topics. The jobs work out some customer-related counts so that recurring work...
-
WarpStream, Apache Flink and Iceberg for a cost-effective, scalable logging solution
When developing and debugging systems, having logs is critical. A number of services exist to capture logs and provide a UI for searching and alerting, both 3rd party and self...
-
Monitoring Apache Flink containers using Prometheus
Apache Flink comes with a Prometheus plugin already in place in the /plugins folder and ready to use without adding any additional JAR files.
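With the reporter enabled in flink-conf.yaml, each Flink container exposes a metrics endpoint that Prometheus can scrape. A quick way to check it from Python, assuming the reporter is listening on its default port 9249:
```python
import requests

# 9249 is the PrometheusReporter's default port; override it with
# metrics.reporter.prom.port in flink-conf.yaml if needed.
metrics = requests.get("http://localhost:9249/metrics", timeout=5).text

# Print a few JobManager metrics to confirm the endpoint is live.
for line in metrics.splitlines():
    if line.startswith("flink_jobmanager_"):
        print(line)
```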
-
Apache Flink and Apache Iceberg
I recently tried out using Flink with Apache Paimon. Paimon is a “Streaming data lake platform with high-speed data ingestion”. My hope is to find a convenient way for Flink...
-
Trying out Apache Paimon with Flink
I’ve been working with Apache Flink recently processing data from Kafka topics. While creating pipelines I wanted to see if I could also send the data from the topics to...
-
Deploying Flink CDC Jobs with Docker Compose
Running Apache Flink containers using Docker Compose is a convenient way to get up and running to try out some Flink workloads.
-
Misusing Catalogs in Apache Flink for identifying Jobs
When jobs are created in Flink using SQL, they show up in the jobs list with default names such as insert-into_default_catalog.default_database.sink_name. If you’re pulling records from multiple sources and sinking...
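The trick, roughly, is to create an in-memory catalog whose name describes the pipeline, so that name ends up in the generated job name. A minimal PyFlink sketch of the idea, with made-up catalog and table names (the datagen and blackhole connectors are only there to make it runnable):
```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The catalog name is purely a label; generic_in_memory stores nothing durable.
t_env.execute_sql("CREATE CATALOG daily_orders WITH ('type' = 'generic_in_memory')")
t_env.execute_sql("USE CATALOG daily_orders")

t_env.execute_sql("""
    CREATE TABLE src (id BIGINT)
    WITH ('connector' = 'datagen', 'rows-per-second' = '1')
""")
t_env.execute_sql("""
    CREATE TABLE sink_name (id BIGINT)
    WITH ('connector' = 'blackhole')
""")

# The job now shows up as insert-into_daily_orders.default_database.sink_name
# (or similar) instead of the anonymous default_catalog name.
t_env.execute_sql("INSERT INTO sink_name SELECT id FROM src")
```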
-
Apache Flink using Checkpoints
When running some Flink Jobs that perform CDC from a relational database, there were times that a Job would restart. The restart could be caused by a source database restarting...
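For context, enabling checkpoints in a PyFlink job is only a couple of lines; the intervals below are made up, and the checkpoint storage location (state.checkpoints.dir) is assumed to be set in flink-conf.yaml:
```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Checkpoint every 60s so a restarted CDC job can resume from the last
# committed position instead of starting over.
env.enable_checkpointing(60_000)

cfg = env.get_checkpoint_config()
cfg.set_min_pause_between_checkpoints(30_000)  # breathing room between checkpoints
cfg.set_checkpoint_timeout(120_000)            # give up on checkpoints that take too long
```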
-
Query an RDS Snapshot on S3 using Apache Drill
In 2020 AWS announced that RDS snapshots can be exported to S3. The resulting export is in Parquet format, which is ideal for searching. “The Parquet format is up to...
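For a flavour of what querying the export looks like, here is a sketch using Drill's REST API from Python; the storage plugin name (s3) and the export path are placeholders for whatever the snapshot export produced.
```python
import requests

DRILL_URL = "http://localhost:8047/query.json"  # Drill's REST query endpoint

# `s3` is a Drill storage plugin pointed at the export bucket; the path is illustrative.
sql = """
SELECT COUNT(*) AS row_count
FROM s3.`rds-export/mydb/mydb.public.orders`
"""

resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql}, timeout=120)
resp.raise_for_status()
print(resp.json()["rows"])
```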
-
Vector.dev, AWS S3 and SingleStore for fast log search
I wanted to explore the possibility of using AWS S3 as a back end for a logging solution. Not only the storage of logs, which S3 handles very well, but...
-
Using AWS Redshift for fast count queries
I work with a database that stores Sendgrid event information. Sendgrid is a popular service for sending emails and for each email sent to a recipient, Sendgrid provides a webhook...
-
Starting a Kafka journey
My first exposure to binary logs, or binlogs, was that they were just something database replicas needed to use to keep up to date. I didn’t need to know anything more than...
-
Bash script for RDS maintenance
A useful Bash script to schedule a backup window, a maintenance window and some upgrades to perform on a number of RDS instances.
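The post describes a Bash script; as a sketch of the kind of call involved, here is a boto3 equivalent with instance identifiers and windows made up:
```python
import boto3

rds = boto3.client("rds")

# Illustrative identifiers and windows (times are UTC).
for db in ["app-db-1", "app-db-2"]:
    rds.modify_db_instance(
        DBInstanceIdentifier=db,
        PreferredBackupWindow="02:00-03:00",               # hh24:mi-hh24:mi
        PreferredMaintenanceWindow="sun:03:30-sun:04:30",  # ddd:hh24:mi-ddd:hh24:mi
        AutoMinorVersionUpgrade=True,
        ApplyImmediately=False,                            # wait for the window
    )
```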
-
Quick and simple steps to get a single-node SingleStore cluster running on AWS
Use at least an m5.xlarge EC2 instance for 4 CPUs.