Big data is a term that is constantly used these days. Doesn't it simply mean a huge chunk of data? Before we get to the basics of what big data is, let us look at the current challenges surrounding data and then outline the steps to remedy the situation.
The increasing problem of data growth
One of the key challenges faced by most organizations is the inability to manage the mammoth growth of data. As businesses are expanding, their customer base moves from local to national and subsequently global, and the amount of data is increasing too.
Organizations are discovering new mechanisms to increase their customer reach. For example, web portals allow customers to create and manage their products online, mobile apps let customers manage their products from mobile devices, telephone support lets customers interact with banks' customer service and other support teams, and email gives customers another channel for conversations.
Data captured across all of these channels arrives in large volumes, in a variety of formats and at high frequency.
Let us take a look at the possible data sources which keep on growing with time.
Recorded conversations: Companies need to store data pertaining to the conversations which they had with their customers.
Images, videos and documents: Files in these media formats contain important information about business sales and even customer demographics, and they contribute to the increase in data size.
Email conversations: With the advent of online business, a huge amount of data is also stored in the form of emails.
Third party data: Often, data needs to be acquired from third party sources, and this also needs to be stored.
Online data sources: A large amount of data is gathered from online sources, and it needs to be handled and managed so that it can be retrieved quickly and efficiently later.
Duplicate data: Duplicate and redundant data form the crux of our problem. This is a common problem faced by most organizations. Organizations that take a very reactive approach to ad-hoc business needs ultimately end up creating redundant data stores.
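To make the duplicate data problem concrete, here is a minimal sketch of how redundant customer records captured through different channels might be detected. It is an illustration only: the field names (`name`, `email`, `source`), the sample records and the helper functions are all hypothetical, and it assumes that records match once whitespace and letter case are normalized. Real matching rules are usually more involved.

```python
import hashlib


def record_fingerprint(record, key_fields):
    """Hash the normalized key fields so trivially different copies collide."""
    normalized = "|".join(
        str(record.get(field, "")).strip().lower() for field in key_fields
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(records, key_fields):
    """Keep the first record seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record, key_fields)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Hypothetical records for the same customer arriving through two channels.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "source": "web"},
    {"name": "ada lovelace ", "email": "ADA@example.com", "source": "email"},
    {"name": "Alan Turing", "email": "alan@example.com", "source": "mobile"},
]

unique_customers = deduplicate(customers, key_fields=("name", "email"))
print(len(unique_customers))
```

The second record is dropped because it differs from the first only in case and trailing whitespace, which is the kind of redundancy that builds up when each channel keeps its own store.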
Now that you know the problems of increasing data and the challenges it poses, it is time to seek the right solution.
Big Data: An Overview
Big Data is a data capability suited to storing and processing large sets of data, whether structured or unstructured. It is also capable of performing deep learning on image and text data, and of leveraging machine learning algorithms to intelligently automate data processing tasks that range from simple to complex.
Technologies that enable Big Data capabilities include Hadoop and HANA.
Below, you can see how different industries derive value out of Big Data capabilities.
In this example, big data capabilities are leveraged to capture large volumes of flight sensor data and perform real-time learning for early detection of engine performance problems. These results are not just used to prevent flight engine failures but are also used to develop more robust and higher-performing engines.
In this example, big data capabilities are used to capture and process fast-moving data and to perform real-time learning that helps organizations detect criminal activities and thereby safeguard themselves.
Large Retail Company
The retail sector perhaps has some of the highest data volumes to deal with. There is a large amount of information to process, and every sale changes the stock levels, which in turn changes the underlying data. So, big data here involves incorporating the changes from every single transaction and offering the ability to search and scale the analytics.
The retail sector needs real-time data snapshots for making inventory decisions, handling storage and assessing the sales output.
Big Data is widely used in the financial industry (banks and insurance companies) for large-scale analytics, supporting risk and regulatory requirements while also supporting customers.
Typical Banking Big Data Use Cases
As financial industry regulations become more stringent and data is generated through a variety of channels, banks and insurance companies need to store large volumes of structured and unstructured data, perform analytics to detect fraud and money laundering networks, and report on at least twice the volume and years of data as before.
Big Data platforms like Hadoop provide a cost-effective storage capability with distributed computing power to perform large-scale analytics on data, instead of moving data across applications.
The Big Data value proposition for some of the critical banking use cases:
Enhanced Customer Experience and behavioral analytics:
o Typically, customer operations and analytics are managed as separate capabilities. A customer master system is used to support customer operations, and a variety of analytical systems are used to derive insights from customer data. Bringing together customer master, transaction and historical data is close to impossible because the architecture is complex and the work is expensive and time-consuming. As a result, analytics are performed on limited customer states and fewer years of historical data.
o Hadoop with Spark and Machine Learning provides a cost-effective and highly scalable environment, capable of storing large volumes of customer-related transactions with decades of historical data. It also allows companies to ingest and process social information from online channels like Facebook, Twitter and online blogs. Distributed computing techniques can be applied to process information and run a pipeline of models to derive the customer's financial and social value. Most importantly, it avoids moving data out of the Big Data capability by allowing the machine learning algorithms to be developed and executed within it: move analytics to data, instead of moving data between applications.
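The idea of moving analytics to the data can be sketched in a few lines. The example below is plain Python standing in for a distributed engine, not Hadoop or Spark code. The `partitions` list, the customer IDs, the amounts and both function names are hypothetical. Each partition represents the transactions held on one node. The summary step runs against each partition where it sits, and only the small per-customer totals are merged afterwards.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical partitions: each inner list stands in for the
# (customer_id, amount) transactions stored on one node.
partitions = [
    [("c1", 120.0), ("c2", 40.0), ("c1", 30.0)],
    [("c2", 10.0), ("c3", 75.0)],
    [("c1", 50.0), ("c3", 25.0)],
]


def summarize_partition(transactions):
    """Runs next to the data: total spend per customer for one partition."""
    totals = defaultdict(float)
    for customer_id, amount in transactions:
        totals[customer_id] += amount
    return dict(totals)


def merge_summaries(left, right):
    """Combine two small summaries; raw transactions never leave their node."""
    merged = dict(left)
    for customer_id, total in right.items():
        merged[customer_id] = merged.get(customer_id, 0.0) + total
    return merged


customer_totals = reduce(
    merge_summaries, (summarize_partition(p) for p in partitions), {}
)
print(customer_totals)
```

Only the compact summaries cross partition boundaries, which is why this pattern scales better than copying every transaction into a separate analytics system.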
Improved Compliance Incident detection and reporting:
o Money movements are widely tracked by rules-based processes, and transaction monitoring systems are highly dependent on subject matter experts. The rate of growth in the variety and volume of incidents requires ad-hoc and continuous training of analysts and investigators and incurs significant expense and time, delaying detection and reporting. It also takes significant processing time to collect data from multiple systems and cleanse it for accurate deep learning outcomes.
o Hadoop with Spark and Machine Learning Libraries allows the data science community to develop newer methods for tracking BSA-related information and for preventing, detecting and reporting sanction violations, money laundering, terrorist financing and other financial crimes in a timely manner. Customer and related money movement transactions can be captured and prepared at the granular level required by Anti-Money Laundering processes. Distributed computing techniques with a pipeline of algorithms can be applied at scale to detect suspicious activities and learn from false positives without manual intervention and data relocation.
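As a rough illustration of combining a fixed rule with a data-driven check, the sketch below flags a transaction either when it reaches a fixed amount or when it is far above the customer's own historical median. It is a deliberately simplified stand-in, not an actual Anti-Money Laundering model and not Spark code. The constants `REPORTING_THRESHOLD` and `BASELINE_MULTIPLIER`, the function name and the sample data are assumptions made for this example.

```python
from statistics import median

REPORTING_THRESHOLD = 10_000.0  # hypothetical fixed rule, chosen for illustration
BASELINE_MULTIPLIER = 5.0       # hypothetical: flag amounts over 5x the customer's median


def flag_suspicious(history, new_transactions):
    """Return (customer_id, amount, reason) for each transaction that trips a check."""
    flagged = []
    for customer_id, amount in new_transactions:
        past_amounts = history.get(customer_id, [])
        if amount >= REPORTING_THRESHOLD:
            flagged.append((customer_id, amount, "fixed threshold"))
        elif past_amounts and amount > BASELINE_MULTIPLIER * median(past_amounts):
            flagged.append((customer_id, amount, "above customer baseline"))
    return flagged


# Hypothetical per-customer history and incoming transactions.
history = {
    "c1": [100.0, 120.0, 90.0],
    "c2": [4000.0, 5000.0, 4500.0],
}
incoming = [("c1", 900.0), ("c2", 6000.0), ("c2", 12000.0), ("c3", 50.0)]

alerts = flag_suspicious(history, incoming)
for alert in alerts:
    print(alert)
```

The fixed rule alone would miss the 900.0 transaction from `c1`. The baseline check catches it because it is unusual for that customer, while the 6000.0 transaction from `c2` passes because it is in line with that customer's history.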
Optimized Credit Risk Analytics:
o Analytics teams spend the majority of their time collecting data and fixing data issues, and they end up creating redundant data stores. Little time is spent on analytics compared to acquiring and standardizing the data to support models. It is very labor-intensive, expensive and time-consuming to collect the data at various granularity levels to support the credit risk models.
o Customer data and related transactions can be acquired once from sources, then governed and managed in Hadoop. Scanned loan documents can be processed using machine learning techniques within the Hadoop environment, and its distributed computing techniques can be used to process larger volumes of data within Hadoop, building a single version of the truth for active as well as historical data. The resulting productivity gains across analytical teams significantly reduce the time spent on data management activities and improve the accuracy of model outputs.
If properly designed and implemented, a Big Data platform can deliver significant value. It complements existing technology environments by significantly reducing data storage and processing costs and by meeting business requirements with less complex, more innovative solutions, laying the foundation for various transformation strategies.