Ahead of his presentation at the Big Data Innovation Summit in Singapore on March 1 & 2, we spoke to Cetin Karakus, Global Head, Analytics Core Strategies & Quantitative Development at BP.
Cetin has almost two decades of experience in designing and building large-scale software systems. Over the last decade, he has worked on the design and development of complex derivatives pricing and risk management systems at leading global investment banks and commodity trading houses. Prior to that, he worked on various large-scale systems ranging from VOIP stacks to ERP systems.
In his current role, he has had the opportunity to build an investment-bank-grade quantitative derivatives pricing and risk infrastructure from scratch. Most recently, he has shifted his focus to building a proprietary, state-of-the-art big data analytics platform based on existing open-source tools and technologies.
Cetin has a degree in Electrical & Electronics Engineering and enjoys thinking and reading on various fields of humanities in his free time.
How do you think big data has impacted the energy industry over the past decade?
Even before the term 'big data' was popular in the mainstream IT world, the energy industry, especially the upstream oil and gas exploration business, was making heavy use of data and analytics that would nowadays be considered big data analytics. Getting oil and gas out of deep offshore basins is an extremely complex and expensive undertaking. You had better know where you are digging and be sufficiently certain that you will find oil and/or gas reserves there. The way energy companies deal with this is to collect huge amounts of (geoseismic) data and feed it into sophisticated reservoir simulation and visualization models, and hence get a sense of what is out there before any expensive physical extraction operations are undertaken.
Big data analytical techniques have also been used in plant operations and transportation networks. You could have thousands of miles of pipelines transporting oil and gas, with associated pumps, valves, storage tanks, interconnect hubs, etc. in a pipeline network. There could be millions of sensors constantly monitoring the network and capturing vital operational statistics that have to be processed and acted upon by operations staff. Using big data analytics systems, you could resource your operations teams optimally, i.e. use the optimum number of resources that will maintain healthy and safe operations.
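To make the monitoring idea concrete, here is a minimal sketch (not from the interview; all sensor names and thresholds are invented for illustration) of rolling raw pipeline-sensor readings up into per-sensor statistics so that operations staff only have to act on the anomalies:

```python
# Hypothetical illustration of sensor-network monitoring: aggregate raw
# readings per sensor and flag only those exceeding an assumed safe limit.
from collections import defaultdict

# Assumed reading format: (sensor_id, pressure_in_bar) -- made up for the sketch.
readings = [
    ("valve-017", 41.2),
    ("valve-017", 43.9),
    ("pump-003", 88.0),
    ("pump-003", 90.5),
]

# Group readings by sensor.
by_sensor = defaultdict(list)
for sensor_id, value in readings:
    by_sensor[sensor_id].append(value)

# Flag sensors whose average reading exceeds the (assumed) safe threshold.
SAFE_LIMIT = 85.0
alerts = {
    sensor_id: sum(vals) / len(vals)
    for sensor_id, vals in by_sensor.items()
    if sum(vals) / len(vals) > SAFE_LIMIT
}
print(alerts)  # -> {'pump-003': 89.25}
```

In a real deployment this aggregation would run continuously over streaming data across millions of sensors, but the shape of the computation is the same: reduce raw telemetry to a small set of actionable signals.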
What do you think is the most important use of big data in BP today?
I would say, and this is my personal guess, it would be in upstream oil and gas exploration: reservoir simulation and visualization, and reserve prediction. These models not only consume huge amounts of data, they are also very computationally expensive, and we utilize a massively distributed computing infrastructure to run them. They play a key role in upstream investment project decisions.
What challenges are you currently facing in using big data?
Being an integrated energy company, BP deals with the entire spectrum of the energy business: exploration, production, transportation, refining, distribution and trading. Moreover, it operates across the globe, employs tens of thousands of people and has a presence in pretty much every country. When you think about it, a huge amount of data passes through BP's systems: some of it BP's own, some belonging to third-party data sources, and the rest publicly available. Combining all this data and getting a holistic picture of the energy world, mapping business and market dynamics, is, I would say, the main challenge. This is an enormous undertaking with a prize to match, and it is more of a journey than a fixed project.
How do you see the use of big data in the energy industry changing in the next 5 years?
I think we will see more usage of data analytical techniques in downstream (refining, marketing and distribution) businesses. This will improve demand forecasting and optimize the tail end of the supply chain. Ultimately, similar methods will be employed across the entire energy value chain, creating value by optimizing the whole supply chain, from production to distribution. As energy markets become more liquid and move from a long-term supply-based model to a trading-based model, the use of data analytical techniques to gauge market dynamics and supply/demand imbalances will also play a key role in any player's ability to remove those inefficiencies through profitable arbitrage operations.
The last question is about the presentation you are going to give at the summit. What can the audience expect to take away from your presentation?
There is a bewildering number of tools, applications and technologies in the big data space nowadays. The first thing someone new to the field will experience is confusion. While there is clearly no shortage of tools that do a specific task (e.g. store data in compact form, execute jobs, build a machine learning model, etc.), none of those tools will solve your specific problems, and hence create value, out of the box. Just like tools in a toolbox, they are only useful in the hands of a craftsman who knows how to put them to good use to solve real problems. Continuing with the toolbox analogy, tools are also most useful when used together. It is very hard to find a single tool that does a lot of things, and even when you find such a tool, it more often than not does each of the things it is supposed to do rather poorly.
In my presentation, I will go through a modular framework for combining big data tools into data flow pipelines and using those pipelines to accomplish different tasks. It is an extremely scalable approach, not only in terms of tools and software components but also in terms of the people and teams who develop those tools. In fact, it prescribes a role-based, distributed working model, with component developers, pipeline designers, pipeline executors, etc., that fits nicely into a diverse big data analytics team.
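The modular-framework idea described above can be sketched in a few lines. This is not BP's framework, only an illustration of the role split: "component developers" write independent stages, and a "pipeline designer" composes them. All names here are invented for the example.

```python
# A minimal sketch of a modular data-flow pipeline: each stage is an
# independent component that transforms a stream of records, and a
# pipeline is simply a composition of stages.
from typing import Callable, Iterable

Record = dict
Stage = Callable[[Iterable[Record]], Iterable[Record]]

def make_pipeline(*stages: Stage) -> Stage:
    """Compose independently developed stages into one pipeline."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return run

# Example components (the "component developer" role):
def parse(records):
    # Normalize raw string values into floats.
    for r in records:
        yield {**r, "value": float(r["value"])}

def threshold(limit):
    # Parameterized filter stage: keep records above the limit.
    def stage(records):
        return (r for r in records if r["value"] > limit)
    return stage

# The "pipeline designer" role wires components together:
pipeline = make_pipeline(parse, threshold(10.0))

raw = [{"sensor": "p1", "value": "8.5"}, {"sensor": "p2", "value": "12.3"}]
result = list(pipeline(raw))
print(result)  # -> [{'sensor': 'p2', 'value': 12.3}]
```

Because stages share only a simple record-stream contract, teams can develop, test and swap components independently, which is what makes this style of framework scale across people as well as software.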
You can catch Cetin's presentation at the Big Data Innovation Summit in Singapore on March 1 & 2.