Data Strategy Is Your Next Move
I just got back from two weeks in South Africa. It was one of the more intense business trips I’ve had in a long time. Four to five meetings a day with MSSPs, enterprises, technology providers, and organizations trying to scale quickly. We went there primarily to talk about cybersecurity, SIEM, analytics, and AI. But by the end of the trip, it became clear that the real issue underneath almost every conversation was data strategy.
And this wasn’t just a security problem. I saw the same pattern everywhere. Organizations are growing fast, AI initiatives are accelerating, and telemetry volumes are exploding. Companies are adopting modern analytical technologies on top of infrastructure that was originally designed for much smaller environments and much simpler workloads. The result is that many organizations now have massive amounts of data, but no clear strategy for how to manage or operationalize it effectively.
Historically, most organizations approached data in one of two ways. The first was simple: collect everything and dump it into one centralized platform or database. The second approach was federated search, where the data stayed where it lived and systems queried it remotely across the environment when needed. Both approaches sound reasonable on paper, but both start breaking down under operational scale.
The centralized model becomes expensive because every piece of data receives the same storage, indexing, and computational treatment whether it has value or not. The federated model avoids moving data, but pushes the complexity into the query itself. Every search now has to normalize inconsistent schemas, pull results from different systems, and reconstruct context dynamically. In both cases, the operational result is the same: higher costs, slower analytics, and increasing complexity just to keep the environment functioning.
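To make the federated trade-off concrete, here is a minimal Python sketch of query-time normalization. The source names, field names, and mappings are hypothetical. The point is that every search has to rename fields for every record from every source before it can even compare values, and it pays that cost again on the next search.

```python
# Hypothetical per-source field mappings: each system names the same
# concept differently, so a federated query must translate at search time.
FIELD_MAPS = {
    "firewall": {"src": "source_ip", "dst": "dest_ip"},
    "proxy": {"client_ip": "source_ip", "server_ip": "dest_ip"},
    "cloud": {"sourceIPAddress": "source_ip"},
}


def normalize(source: str, record: dict) -> dict:
    """Rename a source's fields into the common schema; unknown fields pass through."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(key, key): value for key, value in record.items()}


def federated_search(sources: dict, field: str, value: str) -> list:
    """Query every source, normalizing each record before filtering.

    The normalization work is repeated on every search, for every record,
    which is the overhead the federated model pushes into the query.
    """
    results = []
    for source, records in sources.items():
        for record in records:
            row = normalize(source, record)
            if row.get(field) == value:
                results.append({"origin": source, **row})
    return results


sources = {
    "firewall": [{"src": "203.0.113.7", "dst": "198.51.100.2"}],
    "proxy": [{"client_ip": "203.0.113.7", "server_ip": "192.0.2.10"}],
    "cloud": [{"sourceIPAddress": "198.51.100.9"}],
}

hits = federated_search(sources, "source_ip", "203.0.113.7")
print(len(hits))  # 2
```

A centralized platform would do this mapping once at ingest. The federated model avoids moving the data but keeps paying for the mapping inside every query.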
What really changed the conversation was AI. AI is a data hog. It consumes enormous amounts of information, and it forced organizations to think about data differently. In the security world, we were already seeing this shift with platforms like Cribl and Ingext because SIEM environments were struggling with telemetry scale. Outside of security, the rest of the industry soon ran into the exact same issue: too much data, too much cost, and too much complexity trying to analyze everything at once.
That led to the rise of modern ETL and lakehouse architectures. ETL stands for extract, transform, and load. The goal was to pull data in, reshape it, normalize it, and store it in a structure better suited for analytics and AI processing. But AI also exposed another problem. Raw data by itself often has very little meaning.
Take something as simple as an IP address. By itself, it tells you almost nothing operationally useful. Once it is enriched, though, it suddenly becomes far more valuable. Now you know where it is located geographically, who owns it, whether it belongs to a cloud provider, or whether it is associated with anonymity services like Tor. That context does not exist naturally in the original telemetry. It has to be added.
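Here is a minimal sketch of what that enrichment step might look like, using Python's standard `ipaddress` module. The lookup tables are hypothetical stand-ins for real sources such as GeoIP databases, ASN/WHOIS data, published cloud provider IP ranges, and Tor exit-node lists. The addresses are RFC 5737 documentation ranges, and the owners and countries attached to them are made up.

```python
import ipaddress

# Hypothetical context tables. Real deployments would load these from
# GeoIP, ASN/WHOIS, cloud-provider range lists, and Tor exit-node feeds.
# The ranges are RFC 5737 documentation addresses; the context is invented.
NETWORK_CONTEXT = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "ZA", "owner": "ExampleNet", "cloud": False}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "US", "owner": "ExampleCloud", "cloud": True}),
]
TOR_EXIT_NODES = {ipaddress.ip_address("203.0.113.50")}


def enrich_ip(raw: str) -> dict:
    """Return the raw IP plus whatever context the lookup tables provide."""
    addr = ipaddress.ip_address(raw)
    enriched = {
        "ip": raw,
        "country": None,
        "owner": None,
        "cloud": False,
        "tor_exit": addr in TOR_EXIT_NODES,
    }
    # First matching network wins; unmatched addresses keep the defaults.
    for network, context in NETWORK_CONTEXT:
        if addr in network:
            enriched.update(context)
            break
    return enriched


print(enrich_ip("198.51.100.23"))
```

The input is one opaque string; the output is a record an analyst or a model can actually reason about.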
Most modern analytical pipelines do this enrichment after the data is already stored. Systems like Ingext can enrich the data while it is moving through the pipeline itself. Either way, the end result is the same. The original telemetry is transformed into a completely different operational dataset, usually stored in highly compressed columnar formats like Apache Parquet for large-scale analytics.
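Part of why columnar formats like Parquet work so well for enriched data is that values from the same field are stored together, so encodings such as dictionary encoding can shrink repetitive columns. The toy sketch below is not Parquet. Real writers (for example pyarrow) also handle row groups, pages, and compression codecs. It only illustrates the idea: pivot rows into columns, then replace a low-cardinality column like country with small integer indices.

```python
def to_columns(rows: list) -> dict:
    """Pivot row-oriented records into column-oriented lists (one list per field)."""
    fields = list(rows[0].keys()) if rows else []
    return {field: [row.get(field) for row in rows] for field in fields}


def dictionary_encode(values: list) -> tuple:
    """Replace each value with an index into a list of its distinct values."""
    dictionary, indices, positions = [], [], {}
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Hypothetical enriched records.
rows = [
    {"ip": "198.51.100.23", "country": "US"},
    {"ip": "192.0.2.4", "country": "ZA"},
    {"ip": "192.0.2.9", "country": "ZA"},
    {"ip": "198.51.100.40", "country": "US"},
]
columns = to_columns(rows)
dictionary, indices = dictionary_encode(columns["country"])
print(dictionary, indices)  # ['US', 'ZA'] [0, 1, 1, 0]
```

With millions of events and only a few hundred distinct countries, the column shrinks to a small dictionary plus a stream of small integers, which then compresses very well.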
So what does this have to do with you? Quite a lot. You already have enormous amounts of data flowing through your organization. The real question is whether all of that data deserves the same treatment. It does not.
Some data has no operational value at all. Delete it early and avoid paying indexing, storage, and analytical costs on information nobody is ever going to use. Other data may have future value, but not immediate value. Historical telemetry, logs, and supporting analytical data can be stored much more cheaply in lakehouse environments using compressed formats like Apache Parquet.
Then there is the small percentage of data you actually need right now: alerts, operational analytics, and high-priority workflows. This is the gold. It deserves expensive real-time analytical processing because it directly impacts operations.
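The three classes of data above can be sketched as a simple routing function. This is a minimal illustration, not how any particular product does it: the event types, the severity threshold, and the tier names are all hypothetical, and a real policy would come from your own use cases.

```python
from enum import Enum


class Tier(Enum):
    DROP = "drop"  # no operational value: discard before paying for it
    LAKE = "lake"  # possible future value: cheap compressed storage (e.g. Parquet)
    HOT = "hot"    # needed now: real-time analytics and alerting


# Hypothetical rules for illustration only.
NOISE_EVENTS = {"heartbeat", "debug"}
HOT_EVENTS = {"alert", "auth_failure"}


def classify(event: dict) -> Tier:
    """Assign an event to a tier based on its type and severity."""
    kind = event.get("type")
    if kind in NOISE_EVENTS:
        return Tier.DROP
    if kind in HOT_EVENTS or event.get("severity", 0) >= 8:
        return Tier.HOT
    return Tier.LAKE


def route(events: list) -> dict:
    """Split a batch into per-tier lists; dropped events are only counted."""
    tiers = {Tier.LAKE: [], Tier.HOT: []}
    dropped = 0
    for event in events:
        tier = classify(event)
        if tier is Tier.DROP:
            dropped += 1
        else:
            tiers[tier].append(event)
    return {"dropped": dropped, "lake": tiers[Tier.LAKE], "hot": tiers[Tier.HOT]}


batch = [
    {"type": "heartbeat"},
    {"type": "dns_query", "severity": 2},
    {"type": "auth_failure", "severity": 5},
    {"type": "flow", "severity": 9},
]
result = route(batch)
print(result["dropped"], len(result["lake"]), len(result["hot"]))  # 1 1 2
```

The important design choice is where this runs: as early in the pipeline as possible, so dropped events never incur indexing or storage cost at all.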
Once you separate these different classes of data, the downstream advantages become enormous. Costs drop because you stop treating every event equally. Operational systems become faster because they are no longer buried under low-value telemetry. Governance improves because data can be separated by tenant, geography, or regulatory boundary. Organizations also gain flexibility because they can decide where data should live and who should have access to it.
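One common way to get that separation by tenant or geography in a lakehouse is Hive-style partitioning, where each boundary becomes its own `key=value` directory. Many engines, including Spark and Hive, understand this layout. The sketch below assumes hypothetical `tenant`, `region`, and `ts` (Unix seconds) fields and an arbitrary root path.

```python
from datetime import datetime, timezone


def partition_path(event: dict, root: str = "lake/events") -> str:
    """Build a Hive-style partition path (key=value directories) for an event.

    Partitioning by tenant and region keeps each boundary physically
    separate, so access controls and data-residency rules can be applied
    per directory. Field names and the root path are hypothetical.
    """
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return (f"{root}/tenant={event['tenant']}/region={event['region']}"
            f"/date={ts:%Y-%m-%d}")


print(partition_path({"tenant": "acme", "region": "za", "ts": 1_700_000_000}))
```

Because the boundary is encoded in the storage layout itself, a query scoped to one tenant or one region never has to read the others.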
The cost savings part is fairly obvious. If I store and process data differently depending on its value, I save money. But the interesting part is that performance also improves, sometimes dramatically. That sounds backwards at first until you realize what is actually happening. The goal is not just reducing storage. The goal is increasing computational density.
Think about gold mining. If I’m digging through dirt looking for gold, the first thing I do is throw away the material that obviously has no value. Then I cheaply sift through lower-value material that might contain something useful later. Finally, I aggressively pull out the visible high-value chunks immediately because those are what matter operationally. Data works the same way.
When organizations separate data this way, their operational systems stop drowning in low-value telemetry. Searches touch fewer memory pages. Queries fan out across fewer datasets. Indexes become smaller, and enrichment does not have to be repeated constantly across irrelevant information. The result is faster searches, lower compute utilization, and more responsive analytics overall. Performance improves because unnecessary computation is eliminated before the search even begins.
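A toy illustration of computational density: the same search run against everything versus against a hot store that was separated at ingest. The workload is entirely made up (10,000 events, one in fifty marked high-value), and real engines use indexes rather than linear scans, but the effect is the same. The answer does not change, while the amount of work does.

```python
def search(events, predicate):
    """Linear scan that also reports how many records it had to examine."""
    examined, matches = 0, []
    for event in events:
        examined += 1
        if predicate(event):
            matches.append(event)
    return matches, examined


def is_alert(event):
    # Hypothetical query: the high-value events an analyst is hunting for.
    return event["hot"] and event["id"] % 1000 == 0


# Hypothetical workload: 10,000 events, of which every 50th is high-value.
all_events = [{"id": i, "hot": i % 50 == 0} for i in range(10_000)]
# Separation happens once, at ingest, instead of inside every search.
hot_store = [e for e in all_events if e["hot"]]

found_all, cost_all = search(all_events, is_alert)
found_hot, cost_hot = search(hot_store, is_alert)
print(found_all == found_hot, cost_all, cost_hot)  # True 10000 200
```

The separation cost is paid once; every subsequent search benefits from it.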
If you read my blogs regularly, you know I usually approach problems from a security perspective. And honestly, separating data this way creates a much stronger security model because it improves visibility, governance, and operational control. But this problem has moved far beyond security. What we are really talking about now is enterprise-scale analytics and operational data management.
The same pressures are showing up everywhere. AI systems, cloud operations, networking platforms, observability tooling, and business analytics are all generating enormous amounts of operational telemetry. Every group is trying to answer the same question: how do we manage all this data efficiently without losing the operational value inside it?
A modern data strategy also allows organizations to intelligently route data to the groups that actually need it. Security teams may need one view of the data while networking or operations teams need another. Business analytics teams may need a completely different perspective. The goal is not simply storing data. The goal is making sure the right people receive the right data, in the right form, at the right cost, so they can make better decisions.
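Routing different views of the same data to different groups can be sketched as a fan-out that projects each event into per-team shapes. The team names and field lists here are hypothetical. In this sketch a team receives a view only when the event carries every field that team needs, and fields a team does not need are never sent, which limits both cost and exposure.

```python
# Hypothetical per-team views: the fields each consumer needs.
TEAM_VIEWS = {
    "security": ["ts", "src_ip", "user", "action"],
    "networking": ["ts", "src_ip", "dst_ip", "bytes"],
    "business": ["ts", "region", "product"],
}


def fan_out(event: dict) -> dict:
    """Project one event into a tailored view for each team that can use it.

    A team gets a view only if the event contains all of its fields;
    every other field is left out of that team's copy.
    """
    views = {}
    for team, fields in TEAM_VIEWS.items():
        if all(field in event for field in fields):
            views[team] = {field: event[field] for field in fields}
    return views


event = {
    "ts": 1_700_000_000,
    "src_ip": "192.0.2.4",
    "dst_ip": "198.51.100.2",
    "user": "analyst1",
    "action": "login",
    "bytes": 5120,
}
views = fan_out(event)
print(sorted(views))  # ['networking', 'security']
```

The security view never carries byte counts, and the networking view never carries usernames. Each group gets the right data in the right form.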
What made this trip interesting for me personally is that I did not go to South Africa expecting to come back talking about data strategy. I went there expecting conversations around cybersecurity, SIEM operations, AI, and analytics. But somewhere in the middle of all those meetings, the larger pattern became obvious. Organizations are growing quickly, modernizing aggressively, and trying to operationalize enormous amounts of data using infrastructure that was never really designed for this scale.
What also stood out to me is that this is not isolated to one region or one industry. In many ways, organizations in Africa are leapfrogging technologically because they are modernizing so quickly. That naturally exposes the need for better data strategy earlier. But honestly, I’m seeing the exact same pressures back in the United States. AI, analytics, compliance, and operational telemetry are all forcing organizations to ask the same question: are we building modern analytical systems on top of infrastructure that was never designed to support them?
The organizations that solve that problem well are going to operate faster, cheaper, and with significantly more flexibility over the next decade.