UPDATED 18:52 EDT / JUNE 19 2023


Databases then and now: the rise of the digital twin

When I first started in information technology, back in the Mainframe Dark Ages, we had hulking big databases that ran on IBM Corp.’s Customer Information Control System, written in COBOL.

These mainframes ran on a complex collection of hardware and operating systems that was owned lock, stock, and bus and tag barrel by IBM. The average age of the code was measured in decades, and code changes were measured in months. They contained millions of transactions, and the data was always out of date since it was a batch system, meaning every night new data would be uploaded.

Contrast that with today’s typical database setup. Data is current to the second, code is changed hourly, and the nature of what constitutes a transaction has changed significantly to something that is now called a “digital twin.” Code is written in dozens of higher-level languages with odd names that you may never have heard of, and this code runs on a combination of cloud and on-premises equipment that uses loads of microprocessors and open-source products that can be purchased from hundreds of suppliers.

It really is remarkable, especially since these changes have all happened within the span of a little more than 35 years. Many of them have happened within the past five years.

Digital twins get real

Take Uber Technologies Inc. as an example. Now, if you had told me five years ago that I would be heaping praise on this superheated tech-bro Silicon Valley darling, I would have thought you were crazy. But indeed, it has come a long way. SiliconANGLE’s Dave Vellante took a closer look at how its technology has changed to create a digital twin.

Ironically, IBM has a very cogent definition and examples of what constitutes a digital twin. The term has come into its own thanks to a combination of trends. First, there’s hyperscale cloud computing that can be used to house very large databases and operate in nearly real time to absorb, process and react to all that data.

Second, this data is not just a table of customers and their invoices and payments, but data that reflects places and things that these customers find important, and why they are doing business with a company in the first place.

Third, the concept is based on a growing crop of what are called NewSQL databases. This term might not be familiar to all, but it combines the best of the original SQL databases (something that grew out of the mainframe era of the 1980s) and the NoSQL world of big data (think Hadoop and Cassandra). NewSQL databases combine the two into something that can store all these real-time data points and map them to a set of more meaningful data tables used to make business decisions.

Back in the 1980s, my first PC-oriented database app was something called dBase II. It was a transformational bit of tech, because it didn’t require users to wait for those mainframe COBOLers to code something up and return the result later in the year. Instead, users could sit down in front of their (character-mode) PC screen, write the code in an hour or a day and see the results almost immediately.

What does this have to do with tables? Tables are how humans think about data: in terms of rows and columns. The other PC software tool from that era was Lotus 1-2-3. Although it was a spreadsheet, it was often used to construct databases because you could see the rows and columns on the screen. And if you were fortunate, you could outfit your PC with a special graphics card so you could draw pretty visualizations on the screen from this data.

Real-time at scale

Anyway, back to the present-day Uber story: What makes NewSQL powerful is that you can map data transactions to real-time events. When you bring up the Uber app, you want to see if there are any drivers nearby who can take you where you want to go — and see this now, not in the indeterminate future. And, once you are paired up with a driver, you can track their car’s progress as it approaches your pickup spot.

This was the killer reason why regular taxi companies have fallen by the wayside. Now every customer, not just the taxi dispatcher, can see what is happening with the network of cars in real time.
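To make the "who is nearby right now" idea concrete, here is a minimal sketch in Python. It is not Uber's implementation, and nothing in it comes from a real NewSQL product: the FleetTwin class, the event fields and the in-memory dictionary are all assumptions made for illustration. The point is only that each location event overwrites the twin's current state, so a query always reflects the latest known position.

```python
import math
from dataclasses import dataclass


@dataclass
class DriverState:
    """Latest known state of one driver (hypothetical schema)."""
    driver_id: str
    lat: float
    lon: float
    available: bool


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


class FleetTwin:
    """Toy in-memory stand-in for the twin's 'current state' table."""

    def __init__(self) -> None:
        self._drivers: dict[str, DriverState] = {}

    def apply_event(self, event: dict) -> None:
        """Overwrite a driver's state with the newest event (assumed fields)."""
        self._drivers[event["driver_id"]] = DriverState(
            driver_id=event["driver_id"],
            lat=event["lat"],
            lon=event["lon"],
            available=event["available"],
        )

    def nearby_drivers(self, lat: float, lon: float, radius_km: float) -> list[str]:
        """Return IDs of available drivers within radius_km, nearest first."""
        hits = [
            (haversine_km(lat, lon, d.lat, d.lon), d.driver_id)
            for d in self._drivers.values()
            if d.available
        ]
        return [driver_id for dist, driver_id in sorted(hits) if dist <= radius_km]
```

A rider's app would call nearby_drivers with its own coordinates while driver phones keep feeding apply_event. A production system would replace the linear scan with a geospatial index and a distributed store, which is where the hard scaling work lives.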

The next trend is a term that is bandied about frequently, having to do with scalability. But the digital-twin world takes this to new highs (and rates of speed) because it’s scaling across multiple degrees of freedom concurrently. Those people, places and things are moving about in the real world, and our systems have to keep up to record all these digital tracks. It used to be that scalability had to do with starting up a new virtual machine or cloud instance as sales volume increased, but this is a much harder problem to solve.

It helps that today we have very powerful computers, and very large storage capacities, and programmers who think in terms of this massive amount of data. And with the current craze about artificial intelligence, these digital-twin situations are becoming more common. As one example, my daughter works for a company that has created a digital twin for retailers that are analyzing their shopping patterns.

IBM makes this point: “The difference between a digital twin and a simulation is largely a matter of scale: While a simulation typically studies one particular process, a digital twin can itself run any number of useful simulations in order to study multiple processes.” This makes it easier for a business such as Uber to keep up with changes in market conditions or customer demand, or to enter new markets such as food delivery.

Part of the challenge for Uber specifically was knowing what not to update in the database. Leaving some values alone, such as ETA estimates, which don’t really change minute by minute, is almost as important as deciding what to update and when.
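One common way to avoid needless writes is to skip an update when the new value barely differs from the stored one. The sketch below shows that idea for ETAs. It is an assumption about how such a filter might look, not a description of what Uber does: the EtaStore class, the 60-second threshold and the trip IDs are all hypothetical.

```python
from typing import Optional


class EtaStore:
    """Toy ETA table that ignores changes smaller than a threshold."""

    def __init__(self, min_change_s: float = 60.0) -> None:
        self.min_change_s = min_change_s   # assumed threshold, in seconds
        self._eta_s: dict[str, float] = {}
        self.writes = 0                    # how many updates were actually stored

    def offer(self, trip_id: str, eta_s: float) -> bool:
        """Store eta_s only if it moved enough; return True if written."""
        previous = self._eta_s.get(trip_id)
        # Compare against the last stored value so slow drift still
        # triggers a write once it adds up to the threshold.
        if previous is not None and abs(eta_s - previous) < self.min_change_s:
            return False
        self._eta_s[trip_id] = eta_s
        self.writes += 1
        return True

    def get(self, trip_id: str) -> Optional[float]:
        """Return the stored ETA in seconds, or None if the trip is unknown."""
        return self._eta_s.get(trip_id)
```

The trade-off is freshness against write volume: a larger threshold means fewer database updates but a staler number on the rider's screen.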

Doing effective digital twins ain’t easy. You need the right combination of programming talent, tech and an understanding of how data is created and consumed for a particular business. You have to dance carefully between data privacy and creepiness factors, something that advertisers and social media companies still haven’t figured out. And you have to innovate continually to modify the twin to reflect changes in the physical world.

It is a tough challenge. But we are certainly not returning to the COBOL era.

