How to use LLMs as a learning tool for Data Engineers

There are varying opinions on LLMs. I take it as a disrupting technology – a tool that can be used, misused or misapplied. Disruptive technologies tend to be over-hyped in the short-term and under-appreciated over longer horizons (Read How Innovation Works if you want to know more). LLMs in pedagogy are most exciting to me. I attest to its efficacy. 

One of the demands of Data Engineering is to learn new tech quickly. I want to show you how I use LLMs towards this end. 

A Framework for Skill Acquisition

Imagine you’re a Data Engineer, tasked with deploying a Memgraph graph database for a new client. The database will serve as a model for fraud detection, for an insurance company. 

The theme of self-evaluation (mentioned in my previous blogs) continues. The Dreyfus model is one methodology to use: 

I’d be a Novice if I’ve only heard about graph databases, or played around with them for exploration; An Advanced Beginner has used graph databases as part of his/her professional life; a Competent person can use graph databases for the correct purpose and efficiently etc. 

Here are some key observations: 

  • There is no shame in being a novice in any field. I am a novice in many areas and view it as an exciting challenge to learn something new
  • There’s a difference between being an expert and thinking you’re one. I have to repeat the importance of self-honesty in your evaluation 
  • Reaching a level of Competent is sufficient for most endeavours 
  • Try to objectively evaluate your level of expertise in a field if you find self-evaluation challenging. 

Enter LLMs

That is where LLMs come into play. Here is a PDF I generated with the prompts in this link: 

Dreyfus_Model_Graph_Database_Self_Assessment

You can download the file here:

The ability to generate a bespoke self-evaluation framework – for what I’m trying to learn –  already demonstrates the power of this approach. 

Interactive Learning

After completing the self-evaluation framework, I’ve determined I’m at a Novice level. I ask the LLM to base its teaching on my current level. Here is the prompt I used: 

I am at a Novice level. Teach me the basics of Graph DBs, using Memgraph. Use examples and links to references, where appropriate. 

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

  1. Nodes: Represent entities such as customers, policies, claims, and agents.
  2. Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
  3. Properties: Key-value pairs associated with nodes or edges.

Getting Started with Memgraph

Installation

To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.

Basic Operations

  1. Starting Memgraph After installation, start Memgraph by running:
   mgconsole
  1. Creating Nodes and Relationships
  • Create Nodes CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
  • Create Relationships
    cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
  1. Querying Data
  • Retrieve Customers and Policies MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
  • Retrieve Claims and Approval Agents
    cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;
  1. Updating Data
  • Update a Node MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
  • Update a Relationship
    cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;
  1. Deleting Data
  • Delete a Node MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
  • Delete a Relationship
    cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;

Examples

Example 1: Detecting Suspicious Claims

  1. Create Nodes and Relationships
   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
          (cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
  1. Query to Identify Suspicious Claims
   MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
         (c2:Customer)-[:CONNECTED_TO]->(c1)
   RETURN c1.name, cl.claimID, cl.amount, c2.name;

Example 2: Identifying Potential Fraud Rings

  1. Create Nodes and Relationships
   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (c3:Customer {name: 'Eve', customerID: 'C003'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (p3:Policy {policyID: 'P003', type: 'Health'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000}),
          (cl2:Claim {claimID: 'CL002', amount: 15000}),
          (cl3:Claim {claimID: 'CL003', amount: 25000}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (c3:Customer {name: 'Eve'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (p3:Policy {policyID: 'P003'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c3)-[:OWNS]->(p3),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (c3)-[:SUBMITTED]->(cl3),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (a1)-[:APPROVED]->(cl3),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
          (c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);
  1. Query to Identify Potential Fraud Rings
   MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
   WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
   RETURN c1.name, c2.name, c3.name, a.name;

References

By understanding and using these examples, you can start working with Memgraph to detect fraud in an insurance company context. As you gain more experience, explore advanced topics such as graph algorithms for anomaly detection, real-time analytics

The response from Chat-GPT is already helpful but I can make it even more powerful by incorporating my frame of reference. Remember we need to use it in an insurance company for fraud detection: 

Change the examples to be based on an Insurance company that wants to use Memgraph for fraud detection

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

  1. Nodes: Represent entities such as customers, policies, claims, and agents.
  2. Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
  3. Properties: Key-value pairs associated with nodes or edges.

Getting Started with Memgraph

Installation

To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.

Basic Operations

  1. Starting Memgraph After installation, start Memgraph by running:
   mgconsole
  1. Creating Nodes and Relationships
  • Create Nodes CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
  • Create Relationships
    cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
  1. Querying Data
  • Retrieve Customers and Policies MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
  • Retrieve Claims and Approval Agents
    cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;
  1. Updating Data
  • Update a Node MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
  • Update a Relationship
    cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;
  1. Deleting Data
  • Delete a Node MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
  • Delete a Relationship
    cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;

Examples

Example 1: Detecting Suspicious Claims

  1. Create Nodes and Relationships
   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
          (cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
  1. Query to Identify Suspicious Claims
   MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
         (c2:Customer)-[:CONNECTED_TO]->(c1)
   RETURN c1.name, cl.claimID, cl.amount, c2.name;

Example 2: Identifying Potential Fraud Rings

  1. Create Nodes and Relationships
   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (c3:Customer {name: 'Eve', customerID: 'C003'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (p3:Policy {policyID: 'P003', type: 'Health'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000}),
          (cl2:Claim {claimID: 'CL002', amount: 15000}),
          (cl3:Claim {claimID: 'CL003', amount: 25000}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (c3:Customer {name: 'Eve'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (p3:Policy {policyID: 'P003'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c3)-[:OWNS]->(p3),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (c3)-[:SUBMITTED]->(cl3),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (a1)-[:APPROVED]->(cl3),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
          (c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);
  1. Query to Identify Potential Fraud Rings
   MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
   WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
   RETURN c1.name, c2.name, c3.name, a.name;

References

By understanding and using these examples, you can start working with Memgraph to detect fraud in an insurance company context. As you gain more experience, explore advanced topics such as graph algorithms for anomaly detection, real-time analytics

The code examples include the creation of a Customer node. If you don’t know what a Node is, look at the conversation and ask more questions to the LLM.

MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
(c2:Customer)-[:CONNECTED_TO]->(c1)
RETURN c1.name, cl.claimID, cl.amount, c2.name;

Graph databases regard objects and the relationships between objects as equally important. If you read the line of code above, can you understand what it’s asking? Have a conversation with the LLM if you’re not sure. 

As a point of departure, be sure to get your hands dirty: 

Is there a playground I can use? 

Yes, Memgraph provides a Memgraph Playground where you can practice and experiment with graph databases using Cypher queries. This online platform allows you to run queries, visualize data, and explore graph structures without the need to install anything locally.

Here are the steps to get started with Memgraph Playground:

  1. Access the Playground
  1. Explore the Interface
  • Familiarize yourself with the user interface. You’ll see a query editor on the left and a results/visualization panel on the right.
  1. Run Basic Queries
  • You can start by running some basic Cypher queries to create nodes, relationships, and retrieve data. For example: CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'}); MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
  1. Visualize the Graph
  • After running the queries, you can visualize the graph structure in the results panel. This will help you see the connections and relationships between different entities.
  1. Experiment and Learn
  • Use the playground to experiment with different queries, explore the dataset, and learn how to manipulate and query graph data using Cypher.

Resources

Feel free to explore and practice your skills in the Memgraph Playground to get hands-on experience with graph databases.

Summary

The landscape of pedagogy is changing. LLMs are empowering people everywhere to learn quicker, better and more productively. You can choose to embrace this change or be left behind. I choose the former. I hope you do too. 

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *