How to use LLMs as a learning tool for Data Engineers

There are varying opinions on LLMs. I take it as a disrupting technology – a tool that can be used, misused or misapplied. Disruptive technologies tend to be over-hyped in the short-term and under-appreciated over longer horizons (Read How Innovation Works if you want to know more). LLMs in pedagogy are most exciting to me. I attest to its efficacy.

One of the demands of Data Engineering is to learn new tech quickly. I want to show you how I use LLMs towards this end.

A Framework for Skill Acquisition

Imagine you’re a Data Engineer, tasked with deploying a Memgraph graph database for a new client. The database will serve as a model for fraud detection, for an insurance company.

The theme of self-evaluation (mentioned in my previous blogs) continues. The Dreyfus model is one methodology to use:

I’d be a Novice if I’ve only heard about graph databases, or played around with them for exploration; An Advanced Beginner has used graph databases as part of his/her professional life; a Competent person can use graph databases for the correct purpose and efficiently etc.

Here are some key observations:

There is no shame in being a novice in any field. I am a novice in many areas and view it as an exciting challenge to learn something new
There’s a difference between being an expert and thinking you’re one. I have to repeat the importance of self-honesty in your evaluation
Reaching a level of Competent is sufficient for most endeavours
Try to objectively evaluate your level of expertise in a field if you find self-evaluation challenging.

Enter LLMs

That is where LLMs come into play. Here is a PDF I generated with the prompts in this link:

Dreyfus_Model_Graph_Database_Self_Assessment

You can download the file here:

Dreyfus_Model_Graph_Database_Self_Assessment-1 Download

The ability to generate a bespoke self-evaluation framework – for what I’m trying to learn – already demonstrates the power of this approach.

Interactive Learning

After completing the self-evaluation framework, I’ve determined I’m at a Novice level. I ask the LLM to base its teaching on my current level. Here is the prompt I used:

I am at a Novice level. Teach me the basics of Graph DBs, using Memgraph. Use examples and links to references, where appropriate.

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Nodes: Represent entities such as customers, policies, claims, and agents.
Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
Properties: Key-value pairs associated with nodes or edges.

Getting Started with Memgraph

Installation

To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.

Basic Operations

Starting Memgraph After installation, start Memgraph by running:

   mgconsole

Creating Nodes and Relationships

Create Nodes CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
Create Relationships
cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);

Querying Data

Retrieve Customers and Policies MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
Retrieve Claims and Approval Agents
cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;

Updating Data

Update a Node MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
Update a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;

Deleting Data

Delete a Node MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
Delete a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;

Examples

Example 1: Detecting Suspicious Claims

Create Nodes and Relationships

   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
          (cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);

Query to Identify Suspicious Claims

   MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
         (c2:Customer)-[:CONNECTED_TO]->(c1)
   RETURN c1.name, cl.claimID, cl.amount, c2.name;

Example 2: Identifying Potential Fraud Rings

Create Nodes and Relationships

   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (c3:Customer {name: 'Eve', customerID: 'C003'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (p3:Policy {policyID: 'P003', type: 'Health'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000}),
          (cl2:Claim {claimID: 'CL002', amount: 15000}),
          (cl3:Claim {claimID: 'CL003', amount: 25000}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (c3:Customer {name: 'Eve'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (p3:Policy {policyID: 'P003'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c3)-[:OWNS]->(p3),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (c3)-[:SUBMITTED]->(cl3),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (a1)-[:APPROVED]->(cl3),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
          (c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);

Query to Identify Potential Fraud Rings

   MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
   WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
   RETURN c1.name, c2.name, c3.name, a.name;

References

By understanding and using these examples, you can start working with Memgraph to detect fraud in an insurance company context. As you gain more experience, explore advanced topics such as graph algorithms for anomaly detection, real-time analytics

The response from Chat-GPT is already helpful but I can make it even more powerful by incorporating my frame of reference. Remember we need to use it in an insurance company for fraud detection:

Change the examples to be based on an Insurance company that wants to use Memgraph for fraud detection

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Nodes: Represent entities such as customers, policies, claims, and agents.
Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
Properties: Key-value pairs associated with nodes or edges.

Getting Started with Memgraph

Installation

To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.

Basic Operations

Starting Memgraph After installation, start Memgraph by running:

   mgconsole

Creating Nodes and Relationships

Create Nodes CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
Create Relationships
cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);

Querying Data

Retrieve Customers and Policies MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
Retrieve Claims and Approval Agents
cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;

Updating Data

Update a Node MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
Update a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;

Deleting Data

Delete a Node MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
Delete a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;

Examples

Example 1: Detecting Suspicious Claims

Create Nodes and Relationships

   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
          (cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);

Query to Identify Suspicious Claims

   MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
         (c2:Customer)-[:CONNECTED_TO]->(c1)
   RETURN c1.name, cl.claimID, cl.amount, c2.name;

Example 2: Identifying Potential Fraud Rings

Create Nodes and Relationships

   CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
          (c2:Customer {name: 'Bob', customerID: 'C002'}),
          (c3:Customer {name: 'Eve', customerID: 'C003'}),
          (p1:Policy {policyID: 'P001', type: 'Auto'}),
          (p2:Policy {policyID: 'P002', type: 'Home'}),
          (p3:Policy {policyID: 'P003', type: 'Health'}),
          (cl1:Claim {claimID: 'CL001', amount: 5000}),
          (cl2:Claim {claimID: 'CL002', amount: 15000}),
          (cl3:Claim {claimID: 'CL003', amount: 25000}),
          (a1:Agent {name: 'Charlie', agentID: 'A001'}),
          (a2:Agent {name: 'Dana', agentID: 'A002'});

   MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
         (c3:Customer {name: 'Eve'}),
         (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
         (p3:Policy {policyID: 'P003'}),
         (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
         (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
   CREATE (c1)-[:OWNS]->(p1),
          (c2)-[:OWNS]->(p2),
          (c3)-[:OWNS]->(p3),
          (c1)-[:SUBMITTED]->(cl1),
          (c2)-[:SUBMITTED]->(cl2),
          (c3)-[:SUBMITTED]->(cl3),
          (a1)-[:APPROVED]->(cl1),
          (a2)-[:APPROVED]->(cl2),
          (a1)-[:APPROVED]->(cl3),
          (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
          (c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);

Query to Identify Potential Fraud Rings

   MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
   WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
   RETURN c1.name, c2.name, c3.name, a.name;

References

The code examples include the creation of a Customer node. If you don’t know what a Node is, look at the conversation and ask more questions to the LLM.

MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
(c2:Customer)-[:CONNECTED_TO]->(c1)
RETURN c1.name, cl.claimID, cl.amount, c2.name;

Graph databases regard objects and the relationships between objects as equally important. If you read the line of code above, can you understand what it’s asking? Have a conversation with the LLM if you’re not sure.

As a point of departure, be sure to get your hands dirty:

Is there a playground I can use?

Yes, Memgraph provides a Memgraph Playground where you can practice and experiment with graph databases using Cypher queries. This online platform allows you to run queries, visualize data, and explore graph structures without the need to install anything locally.

Here are the steps to get started with Memgraph Playground:

Access the Playground

Go to Memgraph Playground.

Explore the Interface

Familiarize yourself with the user interface. You’ll see a query editor on the left and a results/visualization panel on the right.

Run Basic Queries

You can start by running some basic Cypher queries to create nodes, relationships, and retrieve data. For example: CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'}); MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);

Visualize the Graph

After running the queries, you can visualize the graph structure in the results panel. This will help you see the connections and relationships between different entities.

Experiment and Learn

Use the playground to experiment with different queries, explore the dataset, and learn how to manipulate and query graph data using Cypher.

Resources

Feel free to explore and practice your skills in the Memgraph Playground to get hands-on experience with graph databases.

Summary

The landscape of pedagogy is changing. LLMs are empowering people everywhere to learn quicker, better and more productively. You can choose to embrace this change or be left behind. I choose the former. I hope you do too.

How to use LLMs as a learning tool for Data Engineers

A Framework for Skill Acquisition

Enter LLMs

Interactive Learning

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Getting Started with Memgraph

Installation

Basic Operations

Examples

Example 1: Detecting Suspicious Claims

Example 2: Identifying Potential Fraud Rings

References

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Getting Started with Memgraph

Installation

Basic Operations

Examples

Example 1: Detecting Suspicious Claims

Example 2: Identifying Potential Fraud Rings

References

Resources

Summary

How to get a job as a data engineer (part 2)

The most important skill you need as an Engineer today

How to prepare for a Data Engineer interview – the research phase

How to get a job as a Data Engineer (Part 1)

Leave a Reply Cancel reply

A Framework for Skill Acquisition

Enter LLMs

Interactive Learning

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Getting Started with Memgraph

Installation

Basic Operations

Examples

Example 1: Detecting Suspicious Claims

Example 2: Identifying Potential Fraud Rings

References

Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company

Key Concepts

Getting Started with Memgraph

Installation

Basic Operations

Examples

Example 1: Detecting Suspicious Claims

Example 2: Identifying Potential Fraud Rings

References

Resources

Summary

Similar Posts

Leave a Reply Cancel reply