Enhancing AML Model Using Graph Database

This post was originally published on Medium.com.


Money laundering is the process of making large amounts of money generated by unlawful activities, such as drug trafficking, human trafficking, corruption or other organized crime, appear to have come from legitimate sources.

According to one recent report on money laundering, 800 billion to 2 trillion US dollars are laundered globally every year, which is 2–5% of global GDP. Money on this scale, generated from unlawful activity, has an adverse impact on the global economy and has to be addressed.

To address this rising issue, global financial regulatory institutions and central banks have created strict statutory rules, policies and regulations to prevent money laundering. Financial institutions that violate these policies and provisions face massive monetary fines and sanctions.

Traditional Anti-Money Laundering (AML) Models

Financial institutions monitor all monetary transactions to prevent fraudulent transactions and activities. Banks across the globe invest a significant amount of money and effort in developing AML models to trace and detect fraudulent transactions. In the traditional anti-money laundering approach, financial institutions create rules around typologies to generate alerts and raise flags against transactions that are potentially fraudulent. This approach has its merits, but it also carries a risk: it does not take networks into account, so money laundered through a non-flagged person or entity may go undetected.

In the modern era, as the world evolves with new cutting-edge technologies, lawbreakers are also developing new techniques to launder money and escape the banking system’s monitoring. To stay ahead of the curve, financial institutions should explore and formulate new techniques to combat money laundering.

Harness the Power of Graphs in AML

Financial institutions face a wide variety of challenges as they strive to counter the new techniques fraudsters use to launder money. Since the digitization of banking processes, relational databases have been the backbone of enterprise applications. Relational databases have many benefits, but they also have inherent limitations that force enterprises to look beyond them. Enterprises are now turning to other analytical approaches that are proving effective.

The idea is to empower AML models with network capabilities for detecting fraudulent financial transactions. This approach also provides a bird’s-eye view of the transactions performed by the connected network. It can help reduce false positives and can be used to process Suspicious Activity Reports (SAR) for suspicious transactions with connected data.

A four-step process can be used to extract insight from the available structured and unstructured data, establish the network, and strengthen the AML model with the benefits of network analysis.

Data Sources for Establishing Network

In today’s digital world, we are surrounded by data. As LinkedIn CEO Jeff Weiner put it: “Data powers everything that we do”. According to one report, 2.5 quintillion bytes of data are produced every day at the current pace.

It is of prime importance to understand which data sets are relevant for generating insights on the network linked to an individual or company. Organizations can customize the data sources used for network development. Below is a list of the most commonly used data sources that financial institutions can use to establish the network connections of an individual or company.

  • Articles
  • KYC Data
  • Transaction Data
  • Social Network

Create Network Graphs from Articles

To understand how customers or counter-parties are connected, leverage the data available for individuals and entities. Data can be gathered from publicly available articles published in, say, the last 5 years, and insight can be derived from it. NLP can be used to extract meaningful data from the articles.

Packages like spaCy can be used to extract customers (nodes) and establish relationships. Divide the articles into sentences. To create the graph and understand how the customers are connected, extract the entities labeled PERSON (individual) or ORG (entity) from each sentence; these can be used as nodes. Derive a relationship between two nodes if they are mentioned together in a sentence.

To make things easier for the analysts who will work on these connections, it is good practice to extract the verb from the sentence and attach it to the relationship to show how the nodes are connected. Adding the sentence snippet from which the relationship was derived is also useful for analysts during an investigation.
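The co-occurrence step can be sketched as follows. This is a minimal illustration, not a full pipeline: it assumes the entities have already been extracted per sentence (for example from spaCy's `ent.text` and `ent.label_`), and the sample sentences, names and the `build_cooccurrence_edges` helper are hypothetical. Verb extraction is left out; only the supporting sentence snippet is stored on each relationship.

```python
from itertools import combinations

# Hypothetical pre-extracted input: one (sentence, entities) pair per sentence.
# In practice the (text, label) pairs would come from an NER package.
SENTENCES = [
    ("John Smith founded Acme Corp with Jane Doe.",
     [("John Smith", "PERSON"), ("Acme Corp", "ORG"), ("Jane Doe", "PERSON")]),
    ("Acme Corp paid Globex last year.",
     [("Acme Corp", "ORG"), ("Globex", "ORG")]),
    ("The weather was fine.", []),
]


def build_cooccurrence_edges(sentences, keep_labels=("PERSON", "ORG")):
    """Link every pair of PERSON/ORG entities that share a sentence.

    Returns a dict mapping a sorted (name_a, name_b) pair to the list of
    sentence snippets that support the relationship.
    """
    edges = {}
    for sentence, entities in sentences:
        # De-duplicate names within the sentence and fix their order so that
        # (a, b) and (b, a) end up under the same key.
        names = sorted({text for text, label in entities if label in keep_labels})
        for a, b in combinations(names, 2):
            edges.setdefault((a, b), []).append(sentence)
    return edges


edges = build_cooccurrence_edges(SENTENCES)
for pair, snippets in edges.items():
    print(pair, "<-", snippets)
```

Keeping the snippet on the edge means an analyst can always trace a connection back to the sentence that produced it.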

Create Network Graphs from KYC Data

When it comes to financial institutions, the value of KYC data can never be ignored. KYC stands for “Know Your Customer”. Financial institutions use the KYC process to establish identity and gather information about their customers, to assess appropriateness and risk, and to understand whether a business relationship is legitimate.

A graph database can be created from KYC data, with Customer Name as nodes and all available information, such as address and phone number, as attributes. If multiple customers have attributes in common, a relationship can be established between them.
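As a rough sketch of the shared-attribute idea, the snippet below links customers who have the same address or phone number. It uses plain Python dictionaries rather than a graph database, and the records, field names and the `link_by_shared_attributes` helper are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical KYC records; field names are assumptions for this sketch.
KYC_RECORDS = [
    {"name": "Customer A", "address": "12 High St", "phone": "555-0101"},
    {"name": "Customer B", "address": "12 High St", "phone": "555-0202"},
    {"name": "Customer C", "address": "9 Low Rd", "phone": "555-0202"},
    {"name": "Customer D", "address": "1 Far Ave", "phone": "555-0303"},
]


def link_by_shared_attributes(records, attributes=("address", "phone")):
    """Return {(customer_a, customer_b): {shared attribute names}}."""
    # Index customers by each (attribute, value) they hold.
    holders = defaultdict(set)
    for record in records:
        for attr in attributes:
            value = record.get(attr)
            if value:
                holders[(attr, value)].add(record["name"])

    # Any two customers holding the same value get a relationship.
    links = defaultdict(set)
    for (attr, _value), names in holders.items():
        for a, b in combinations(sorted(names), 2):
            links[(a, b)].add(attr)
    return dict(links)


kyc_links = link_by_shared_attributes(KYC_RECORDS)
print(kyc_links)
```

Real KYC data would need the attribute values cleaned first (address formats, phone prefixes) before an exact match like this is meaningful.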

Create Network Graphs from Transaction Data

Financial institutions can use their transaction data to understand how customers are connected to other individuals, entities or banks. This is very helpful when researching suspicious transactions in detail. It can show whether money is being transferred to multiple people within a given time frame and whether multiple connections are being used for money laundering. For example, A transfers money to B, C and D, and they all transfer money to E. The money is thus eventually transferred from A to E via B, C and D.

A graph database can be created from transaction data with Originator, Beneficiary, Intermediary, Originating Bank and Beneficiary Bank as nodes. A relationship can be established between an Originator and a Beneficiary if a transaction is processed between them. Relationships can also be established between each Originator and its Originating Bank, and between each Beneficiary and its Beneficiary Bank. Storing the transaction number on the relationship will help the analysts who do further investigative analysis.
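The A-to-E example above can be sketched in a few lines. This is only an illustration under simplifying assumptions: transactions are hypothetical `(transaction number, originator, beneficiary)` tuples, banks and intermediaries are left out, and no time window is applied. The function names and the `min_intermediaries` threshold are not from any particular product.

```python
from collections import defaultdict

# Hypothetical transactions: (transaction number, originator, beneficiary).
TRANSACTIONS = [
    ("T1", "A", "B"), ("T2", "A", "C"), ("T3", "A", "D"),
    ("T4", "B", "E"), ("T5", "C", "E"), ("T6", "D", "E"),
    ("T7", "A", "F"),
]


def build_transfer_graph(transactions):
    """Directed graph: originator -> beneficiary -> list of transaction numbers."""
    graph = defaultdict(lambda: defaultdict(list))
    for txn_id, originator, beneficiary in transactions:
        graph[originator][beneficiary].append(txn_id)
    return graph


def find_layered_paths(graph, min_intermediaries=2):
    """Find (source, target) pairs connected through several middle parties."""
    via = defaultdict(set)
    for source, beneficiaries in graph.items():
        for middle in beneficiaries:
            # .get avoids creating new keys while we iterate over the graph.
            for target in graph.get(middle, {}):
                if target != source:
                    via[(source, target)].add(middle)
    return {pair: sorted(mids) for pair, mids in via.items()
            if len(mids) >= min_intermediaries}


transfer_graph = build_transfer_graph(TRANSACTIONS)
layered = find_layered_paths(transfer_graph)
print(layered)
```

Because the transaction numbers are kept on each edge, an analyst can go from a flagged pair such as A and E straight to the underlying transactions, for example `transfer_graph["A"]["B"]`.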

Create Network Graphs from Social Network

In an era when the whole world is aware of the dominance and power of social networks, it is natural to use data from social networking sites to establish and understand connections.

Extract the data from the popular social media sites; their APIs are quite helpful for this. Create a network graph from the extracted data to understand the connections between individuals.

Create Rule to Fetch Associated Fraud Network

Once the network connections from all available data sources are established, a consolidated central graph database should be created. This requires entity resolution and entity consolidation, so that each individual or entity is represented by exactly one node and insights can be discovered from the connections across all data sources.
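A very simplified sketch of the consolidation step is shown below. It assumes that normalizing names (lowercasing, stripping punctuation and extra spaces) is enough to recognize the same individual or entity across sources, which is rarely true in practice; real entity resolution usually needs fuzzy matching and additional identifiers. The sample edge lists and helper names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name):
    """Naive resolution key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def consolidate(edge_lists):
    """Merge per-source edges into one undirected graph.

    Returns {node: {neighbor: set of source names that support the link}}.
    """
    graph = defaultdict(dict)
    for source_name, edges in edge_lists.items():
        for left, right in edges:
            a, b = normalize(left), normalize(right)
            if a == b:
                continue  # both names resolved to the same node
            graph[a].setdefault(b, set()).add(source_name)
            graph[b].setdefault(a, set()).add(source_name)
    return graph


# Hypothetical edges produced by the individual data sources.
SOURCES = {
    "articles": [("Acme Corp.", "John Smith")],
    "kyc": [("JOHN  SMITH", "Jane Doe")],
    "transactions": [("acme corp", "Jane Doe")],
}

central = consolidate(SOURCES)
print(sorted(central["john smith"]))
```

Recording which source supports each link lets an analyst see whether a connection comes from articles, KYC data, transactions or several of them at once.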

A Breadth First Search (BFS) rule can be created to fetch all the connections within a defined number of levels, n, from the suspicious entity. BFS is a graph traversal technique that moves from a node to its neighbors and then to the neighbors of those neighbors. Setting the level to 1 returns the direct connections of the node. Setting it to 2 returns the direct neighbors as well as the neighbors of those neighbors.
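A level-limited BFS can be sketched as follows. The graph is a hypothetical adjacency list held in a plain dictionary, with "S" standing in for the suspicious entity; in a graph database the same rule would typically be expressed in the database's own query language.

```python
from collections import deque

# Hypothetical consolidated network as an adjacency list.
GRAPH = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S"],
    "C": ["A", "D"],
    "D": ["C"],
}


def connections_within(graph, start, max_level):
    """Return {node: level} for every node within max_level hops of start."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if levels[node] == max_level:
            continue  # do not expand beyond the requested level
        for neighbor in graph.get(node, ()):
            if neighbor not in levels:  # first visit is the shortest path
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    del levels[start]  # the suspicious entity itself is not a connection
    return levels


print(connections_within(GRAPH, "S", 1))
print(connections_within(GRAPH, "S", 2))
```

Returning the level alongside each node lets the rule rank connections by how far they are from the suspicious entity.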


The four-step approach lays out how to use data to derive the information needed to fight fraud. It can be combined with traditional AML models to set up a more efficient and effective fraud detection system. The secret to successfully combating fraud in the modern era is to use the enormous amount of data available to analyze trends, patterns and networks, and to use the results effectively and efficiently in transaction models or risk models. As the saying goes, “Data is the most valuable resource”. So, keep looking for new and innovative ways to use this resource wisely and effectively to generate new insights and make a better, more secure world.
