1.7 Applying Generative AI to an example problem domain

Here is the problem we will investigate:

  • The problem: The business problem is to keep a list of employees for a company that has several departments.

We’re going to keep it very simple, and we are not going to ask for persistence via some sort of database or any sort of user interface. We just want to see how the AI handles the key business concepts. We will generate Python because Python code is easy to read compared to many other languages. In the rest of this book, we will usually work at a level of abstraction that is higher than code but is precise enough to generate code. This would be an unnecessary distraction right now, so we will go straight to code. We will not define a level of abstraction, because we want to empirically identify the highest level of abstraction appropriate for code generation.

1.7.1 Generating a class

Let’s start right at the top of the abstraction tree by seeing whether Copilot knows enough about the business concept of an employee to generate some code. Copilot has the settings Creative, Balanced, and Precise. We set Copilot to Precise. The conversation is shown in Figure 1-3.

Figure 1-3 Employee 1.1 conversation

This Employee class is quite plausible, but is it consistently generated? If we clear the conversation and run the query again, we get a different answer, as shown in Figure 1-4.

Figure 1-4 Employee 2.1 conversation. We get a different answer.

If we clear the conversation and try it again (Employee 3.1, not shown), we get back to Employee 1.1. There seem to be at least two possible answers from Copilot, and it appears to be random which answer we get.

Despite this, Copilot has achieved quite a lot based on virtually zero information. It has

  1. Recognized that “employee” is a business concept that in object-oriented languages needs to be realized as a class, and it has reified this to an Employee class

  2. Come up with a plausible set of data that might be associated with an employee (name, age, and salary) and reified that data as attributes of the Employee class

  3. Asked us if there is anything else we would like to add to the Employee class or anything specific that we would like the code to do

  4. Suggested several follow-on prompts that sound wildly optimistic

The methods that Copilot has chosen to generate are strange. In Employee 1.1 we got the Python __str__(self) method. This is a Python “magic” or “dunder” (double underscore) method, a standard method that should return a string representation of an instance of the class. We also got a business method, give_raise(self, amount), that adds an amount to the salary attribute, which is wrong (as we will see later). In Employee 2.1, we got methods to get the values of the attributes (get_name(self), etc.), but we did not get the corresponding methods to set them.
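Copilot’s exact output varies from run to run, but a minimal sketch of the Employee 1.1 style of class (the names and method bodies here are our reconstruction, not verbatim Copilot output) looks something like this:

```python
class Employee:
    """Sketch of the Employee 1.1 style of generated class (our reconstruction)."""

    def __init__(self, name, age, salary):
        self.name = name
        self.age = age
        self.salary = salary

    def __str__(self):
        # The "dunder" method: returns a string representation of the instance.
        return f"Employee(name={self.name}, age={self.age}, salary={self.salary})"

    def give_raise(self, amount):
        # Plausible-looking, but as discussed, raises would not be handled
        # this way in a real payroll domain.
        self.salary += amount
```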

In both cases, the Suggested prompts are deeply concerning. In Employee 1.1, we got the following prompts.

  • Add a method to calculate the bonus.

  • Add a method to calculate the tax.

  • Add a method to calculate the net pay.

And in the second case, we got these prompts.

  • Add a method to give a raise.

  • Add a method to calculate years until retirement.

  • Add a method to display employee information.

It all sounds too good to be true, and indeed it is far, far too good to be true. Let’s return to conversation Employee 1.1 (Figure 1-3) and ask Copilot for a method to calculate tax. This is shown in conversation Employee 1.2 (Figure 1-5).

Figure 1-5 Employee 1.2 conversation. Copilot hallucinates a tax algorithm.

The calculate_tax(self) method is a dangerous fiction. In NLP terms, it is a generalization. We all know that tax rates vary and that tax can be very hard to calculate, yet here it is generalized to a hard-coded rate of 0.3. Moreover, taxes would never be calculated in an Employee class; they would be calculated in a separate accounting system that was properly audited.
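To make the problem concrete, the generated method is roughly of the following shape (our reconstruction; the hard-coded rate is the point, not the exact code):

```python
class Employee:
    def __init__(self, name, age, salary):
        self.name = name
        self.age = age
        self.salary = salary

    def calculate_tax(self):
        # A hard-coded, invented tax rate: a generalization,
        # not a real business rule from any tax jurisdiction.
        tax_rate = 0.3
        return self.salary * tax_rate
```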

Now we see that we can “Add a method to calculate pension.” Okay—we can’t resist it, let’s go down the rabbit hole (Figure 1-6).

Figure 1-6 Employee 1.3 conversation. Down the rabbit hole.

Once again, the method is a dangerous fiction for much the same reason as calculate_tax(self) is. We see that pension calculation is ridiculously naive with the pension_rate hard-coded at an arbitrary value (a generalization). Also, pensions are usually handled in their own system.

The first line of both Employee 1.1 and Employee 2.1 is interesting because in both cases it says “Sure! Here’s a simple Python class that represents an employee:”. The use of the word simple implies that we can ask for a more detailed answer. We will clear the conversation and try a new prompt, as shown in Figure 1-7.

Figure 1-7 A more detailed answer

We find that the Employee has now gained a position and a bonus as well as methods that promise to calculate tax, pension, bonus, net pay, and vacation days. We think this is a good example of what is commonly referred to as AI hallucination (a distortion). Because Copilot is operating in an information vacuum, yet we keep asking more of it, it has turned inward and is essentially guessing what an Employee class might look like based on some internal representation in the neural net that we can’t examine. This is both useful, as a source of ideas, and dangerous, as a source of truth.

Each of these new business methods has been implemented using a process of generalization. Copilot has given us what is the most probable implementation of these methods based on its huge training set. We are given general solutions to specific problems that probably satisfy no one.

We know that the term hallucination is not really accurate when applied to an AI, but it is surprisingly similar in effect to human hallucination, which is a kind of trance in which attention becomes internally directed and information is internally generated with little or no reference to external reality. We also must remember that Generative AIs such as Copilot are prediction engines based on Large Language Models. Given an input, the engine will predict the most probable response based on its enormous training data set. This response may or may not be reasonable, sensible, useful, or factually correct.

This leads us to state our first and most important principle of Generative Analysis for Generative AI:

  • If you don’t say exactly what you want, you will get what you are given.

Recognizing these problems, it is tempting to get into a dialogue with the AI to get it to fix the code to our satisfaction. This is certainly possible, but we think it is generally a bad idea because it is too easy to elicit hallucinatory results. Also, it is quite a slow and laborious technique, and it is much easier to just be very specific about what we want. This is the approach we always take in Generative Analysis. However, as Generative AIs advance, and certainly if we ever achieve AGI, then an approach based on dialogue might become feasible and even preferable.

How can we be more specific? We need to specify the attributes that we want the employee to have. We should also specify methods, but we will put this aside for now. Consider conversation Employee 4.1 (Figure 1-8).

Figure 1-8 Being more specific gets us what we want.

By being very specific, we got exactly what we wanted. Two out of three of the Suggested prompts are also quite reasonable.

  • Add a method to calculate the annual salary: This is plausible, but dangerous because it assumes that the “salary” is not already a yearly salary. Is it then weekly, monthly, quarterly, or something else? We don’t know until we ask the AI to generate the code.

  • Add a method to print the employee’s full name: This is just a simple Python print method that should work okay.

  • Add a method to change the employee’s job title: This is just a simple setter method for the job title. It should be okay.

The Generative Analysis approach to Suggested prompts is to take them as useful suggestions for things we might want to add into our model, but to avoid using them directly. However, it is always fine to try them out to see what they do and then incorporate that into your model or analysis activity in some other way if you like it. Our approach is pragmatic because Suggested prompts that promise to deliver interesting business semantics generally deliver fictions, whereas prompts that are merely about Python plumbing aren’t that useful and just suck you into a conversational rabbit hole.

1.7.2 Generating a model

Now that we can generate a class, we need to generate the whole model. Suppose we have the following fragment of a detailed analysis document:

  • “A company has many departments, and each department employs one or more employees. A company has a name, address, email address and vat number. A department has a name and a unique identifier. An employee has a first name, a last name, a unique identifier, an address, an email address, a salary and a job title.”

This is much more specific, so we should be able to generate some decent code from it, as shown in Figure 1-9.

Figure 1-9 Generating Python code from a precise narrative

That is so much better! By being specific, we have bypassed Copilot’s tendency to hallucinate, and it has given us pretty much what we want. We have a Company that has zero or more Departments, and we have a Company method to add a new Department. Each Department has zero or more Employees, and Department has a method to add Employees. The attributes for each class are just what we asked for. Furthermore, the Suggested prompts are now entirely reasonable because they relate directly to the structure of the model rather than to hallucinatory business semantics.
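Structurally, the generated code amounts to something like the following sketch (attribute and method names reconstructed from the narrative; Copilot’s actual output may differ in detail):

```python
class Company:
    def __init__(self, name, address, email_address, vat_number):
        self.name = name
        self.address = address
        self.email_address = email_address
        self.vat_number = vat_number
        self.departments = []  # zero or more Departments

    def add_department(self, department):
        self.departments.append(department)


class Department:
    def __init__(self, name, unique_identifier):
        self.name = name
        self.unique_identifier = unique_identifier
        self.employees = []  # zero or more Employees

    def add_employee(self, employee):
        self.employees.append(employee)


class Employee:
    def __init__(self, first_name, last_name, unique_identifier,
                 address, email_address, salary, job_title):
        self.first_name = first_name
        self.last_name = last_name
        self.unique_identifier = unique_identifier
        self.address = address
        self.email_address = email_address
        self.salary = salary
        self.job_title = job_title
```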

However, the Generative AI has omitted a business rule. Read that input prompt again, and then look at the code. Can you spot the missing rule?

We stated that “a department employs one or more employees.” However, this business rule is not enforced in the generated Python code. It has been ignored entirely and has not even been captured as a comment. This is a clear case of deletion.
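For contrast, here is one way the 1..* lower bound could have been enforced in Python. This is our own sketch, not generated output:

```python
class Department:
    """A department employs one or more employees (1..*), enforced here."""

    def __init__(self, name, unique_identifier, first_employee):
        self.name = name
        self.unique_identifier = unique_identifier
        # Requiring an employee at construction time makes the 1..* lower
        # bound impossible to violate on creation.
        self.employees = [first_employee]

    def add_employee(self, employee):
        self.employees.append(employee)

    def remove_employee(self, employee):
        # The lower bound also constrains removal: never drop below one.
        if len(self.employees) <= 1:
            raise ValueError("A department must employ at least one employee")
        self.employees.remove(employee)
```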

Let’s now add some requirements related to finding employees expressed as “shall” statements. We will look at how to formulate these in a later chapter.

  1. A company shall be able to return a list of its employees.

  2. A company shall be able to find an employee by name.

  3. A company shall be able to find an employee by unique identifier.

We can just append these requirements to our existing prompt (Figure 1-10).

Figure 1-10 Adding some requirements

Given how little effort it required to input the necessary information, this is not a bad result. It captures the gist of the problem in Python. We have Company, Department, and Employee classes with exactly the attributes we specified. A Company has zero or more Departments, and each Department has zero or more Employees, so the correct relationships are in place. We have also generated two business methods on Company to find an Employee by full name or identifier.
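The finder methods presumably iterate over the departments. A sketch consistent with the three “shall” requirements (our reconstruction, not Copilot’s exact code; the supporting classes are pared down to what the finders need):

```python
class Employee:
    def __init__(self, first_name, last_name, unique_identifier):
        self.first_name = first_name
        self.last_name = last_name
        self.unique_identifier = unique_identifier


class Department:
    def __init__(self, name):
        self.name = name
        self.employees = []


class Company:
    def __init__(self, name):
        self.name = name
        self.departments = []

    def get_employees(self):
        # Requirement 1: return a list of all employees across all departments.
        return [e for d in self.departments for e in d.employees]

    def find_employee_by_name(self, first_name, last_name):
        # Requirement 2: find an employee by full name (first match, or None).
        for e in self.get_employees():
            if e.first_name == first_name and e.last_name == last_name:
                return e
        return None

    def find_employee_by_id(self, unique_identifier):
        # Requirement 3: find an employee by unique identifier (or None).
        for e in self.get_employees():
            if e.unique_identifier == unique_identifier:
                return e
        return None
```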

Before we close this example, the Suggested prompts at the end of Company 2.1 are very intriguing, so let’s see what happens by continuing the conversation (Figure 1-11).

Figure 1-11 Generating example code

The generated code shows how to use the Company, Department, and Employee classes in a short program. The Suggested prompts offer to generate even more example code. Note that this is part of the same conversation. If we were to start a new conversation, Copilot would forget all about our Company example.

1.7.3 Generating UML

In the example above, we specifically looked at generating Python code, and we expect that code generation will be a primary use case in most software engineering projects. However, as much as we love Python, we would very much like to work at a higher level of abstraction. Although the level is higher, it will still be precise enough to generate code when needed. We really want to work at the level of UML models.

Here is our first attempt. We just ask Copilot for a UML model and see what we get (Figure 1-12).

Figure 1-12 An ASCII graphics UML class diagram generated by Copilot.

Just asking for a UML model sort of works, but the result is not useful. First, the diagram is in ASCII graphics! This is quite fun, but it is not fit for purpose. Even worse, the diagram is wrong. According to the diagram, a “Department has many (*) Employees, and each Employee works for one (1) Department.” However, the specification clearly states that a department has one or more (1..*) employees, which we can break down into the following atomic business rules.

  • Business rule: Each department shall have at least one employee.

  • Business rule: Each department may have more than one employee.

If we use the “Can you explain the diagram?” prompt, Copilot doubles down on this error (the error appears in dark gray shading in Figure 1-13).

Figure 1-13 Insisting on the error

We will show how to fix this multiplicity error shortly.

1.7.3.1 What about XMI?

UML has a standard XML textual representation called XMI (XML Metadata Interchange) format, and we can generate XMI by simply replacing “Generate UML” in Company 3.1 with “Generate XMI.” Can this solve our problem? No. The result is an abject failure, and we will not bother to show the details here. The generated XMI has syntax errors and will not load into any of the UML tools we have access to. XMI is, in principle, human readable, but in practice this is only with great difficulty, and different vendors have slightly different flavors of XMI, so fixing the syntax errors just isn’t worth it. None of this is surprising. While we can expect there to be a lot of Python code in the Generative AI training set, few developers use XMI, so there must be hardly anything there to work with.

In our opinion, XMI is one of the more problematical aspects of UML. It was designed as an import/export format for UML models so that there could be interoperability between UML modeling tools. As anyone who has ever tried to use it will tell you, this is a great idea in principle, but in practice it just doesn’t work. Each vendor seems to have their own flavor of XMI that is subtly (or sometimes not so subtly) incompatible with everyone else’s. And no matter whom you ask, the incompatibility is always the fault of the other party. The truth of the matter lies buried somewhere in the pages of the XMI standard, but good luck finding it.

Part of the problem is that XMI is a very complex and heavyweight import/export format. Even a simple UML class diagram generates pages of XMI because the whole underlying UML metamodel is exported. We think that UML urgently needs a lightweight import/export format that is human and Generative AI readable, and that XMI should be abandoned as unfit for purpose as we move forward into an AI-assisted future.

1.7.3.2 PlantUML

The solution to our UML generation problems is PlantUML.

PlantUML generates UML diagrams (not models!) from a simple textual representation. We explain the difference between the diagrams and models in considerable detail in UML 2 and the Unified Process [Arlow 1]. This immediately makes it much simpler than XMI. Also, PlantUML is used in the Microsoft GitHub code repository, so there is a decent amount of PlantUML code available.

Overall, Copilot generates PlantUML code very well, but it requires a small amount of prompt engineering, as we will explain. Let’s go back to our Company example and update it to generate PlantUML (Figure 1-14).

Figure 1-14 Generating a class diagram in PlantUML

You can see that the PlantUML code is quite readable, and there is excellent documentation on the PlantUML website should you want to create it or edit it yourself. To view the diagram, we need a PlantUML viewer. There are many options available, but we like the web-based viewer PlantText.

The generated class diagram is shown in Figure 1-15.

Figure 1-15 UML class diagram for our Company model

This is just what we want, but we had to do a bit of prompt engineering to get it.

If you look at the prompt in Company PlantUML 4.1 (Figure 1-14), we have broken the prompt down into propositions (PN) and requirements (RN), as follows.

  • P1: 1 company has 0..* departments.

  • P2: 1 department employs 1..* employees.

  • P3: A company has a name, address, email address and vat number.

  • P4: A department has a name and a unique identifier.

  • P5: An employee has a first name, last name, unique identifier, address, email address, salary, and a job title.

  • R1: A company shall be able to return a list of its employees.

  • R2: A company shall be able to find an employee by name.

  • R3: A company shall be able to find an employee by unique identifier.

This is a form that we have found always works very well with Generative AI. Propositions P1 and P2 are about relationships between things. P3, P4, and P5 are ontological statements about what things exist, and R1, R2, and R3 are requirements for the behavior of those things. You can generally put these things in any order. We will discuss propositions and requirements in much greater detail in Chapter 3.

The relationship propositions, P1 and P2, are stated in a very particular way. We have found that the only way to get PlantUML to get the multiplicities right on the relationships is to embed them in the prompt in UML syntax as shown. Although the wording “1 company has 0..* departments” is a bit clumsy, it is still clear enough that anyone can understand what it means, and it generates the correct PlantUML code. Unfortunately, statements such as “A company has zero or more departments” typically give the wrong multiplicities. However, if you want slightly better readability, then we find statements such as “One (1) company has many (0..*) departments” will also work.
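For reference, a PlantUML fragment that expresses the two relationship propositions with the correct multiplicities (our sketch of the kind of output we are after, not Copilot’s verbatim code) reads:

```plantuml
@startuml
class Company
class Department
class Employee

' P1: 1 company has 0..* departments
Company "1" o-- "0..*" Department

' P2: 1 department employs 1..* employees
Department "1" o-- "1..*" Employee
@enduml
```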

If we put the engineered prompt Company PlantUML 4.1 back into Copilot and ask for Python instead of PlantUML, the generated code still does not enforce the 1 to 1..* business rule between Department and Employee. This rule is not even noted as a comment in the code. Sometimes these multiplicities represent very important business rules (as we will see in Chapter 7), and it is disturbing that they can be lost so easily.

Once we have some satisfactory Generative AI output, such as PlantUML or Python code, we can feed it back into the AI to generate a narrative, as shown in Figure 1-16.

Figure 1-16 Generating a narrative

Notice that the 1 to 1..* business rule between Department and Employee has been stated correctly in the narrative.

When we ask Copilot to extract a list of propositions from the PlantUML code, it doesn’t know what we mean. However, we can get it to extract a list of requirements from the PlantUML, as shown in Figure 1-17.

Figure 1-17 Extracting requirements

This list is useful, but notice that once again the 1 to 1..* business rule between Department and Employee has not been captured, even though it was explicit in the PlantUML code and also appeared in the generated narrative. This is a serious issue that we need to monitor. We also notice that the terminology is not exact—we have both “employees” and “Employees” in the above requirements.

We have seen in this section that we can generate accurate class diagrams from a precise narrative using a little bit of prompt engineering. We have also seen that we can generate narratives and requirements. However, Copilot is prone to deletions, and a key business rule, the 1 to 1..* business rule between Department and Employee, seems to come and go. The lesson from this is that we need to check the outputs of Generative AI very carefully indeed. This leads us to another Generative Analysis principle:

  • Generative Analysis Principle

  • Never trust Generative AI. Check everything!

In fact, this is a specialization of a more general Generative Analysis principle that we call our first X Files principle:

  • X Files Principle

  • Trust no one.

Generative Analysis takes it as axiomatic that all information is to be distrusted until it has been analyzed. We discuss this in much more detail later. The good news is that our Second X Files principle is as follows:

  • Second X Files Principle

  • The truth is out there.

We also take it as axiomatic that through analysis and research, we can always get to the truth—at least in the restricted world of software engineering.

1.7.3.3 UML models and Generative AI

We have seen above that we can take a precise enough narrative and use it as a prompt to generate Python code, UML class diagrams, and UML requirements. Later in the book we will demonstrate that such a narrative can be used to create many kinds of UML artifacts, different types of code, databases, documentation, and even simulations. Thus, in a Generative AI–assisted analysis approach, the UML model loses a lot of its attraction. As it stands now, we can only get AI-generated artifacts into a UML model manually via transcription. Similarly, once the artifacts are in the UML model, we can only get information out to create prompts to use with Generative AI manually via reverse transcription. This is entirely unsatisfactory, and we hope that UML tool vendors will address this issue sooner rather than later.

Because sufficiently precise narratives can be used as prompts to generate code and UML artifacts, the narrative begins to take center stage as the main “source of truth” in the software development project. The implications for the UML model are that it will be incomplete and possibly inconsistent because some of the UML artifacts will only exist as generated diagrams outside of the model itself. We now have a complicated picture where the “source of truth” in the project is a combination of the UML model, the diagrams, and the narratives. Presently, these things are not well integrated, but we are sure that over time they will be. Figure 1-18 is a mind map that shows some of the pros and cons of UML models versus precise narratives as sources of truth.

Figure 1-18 Sources of truth

Fortunately, Generative Analysis already handles this complex situation well because it is predicated on Literate Modeling (which we discuss in detail in Chapter 7 and [Arlow 3]).

A Literate Model comprises a narrative written about, and directly referencing, a UML model. The aim is to make the information encoded in the UML model available to the largest possible number of stakeholders, even those who do not know UML, via a precise human-readable narrative. Let’s ask Copilot to give us a quick summary of Literate Modeling (Figure 1-19).

Figure 1-19 Copilot explains Literate Modeling.

It has always been the case that a precise Literate Model narrative is virtually interchangeable with the UML model itself. It is only a small step to refine these narratives to a sufficient degree so that Generative AI can generate UML and other artifacts directly from them. We will demonstrate this many times throughout the rest of the book. This new, AI-assisted object modeling process now works as follows.

  • Express the object model as precise narratives that can either be used directly as Generative AI prompts or be turned into Generative AI prompts with small modifications. We will find that the Literate Modeling style of narrative, which always reduces to a string of propositions and requirements, is ideal for this.

  • Generate UML diagrams, code, and other artifacts directly from the narratives as needed.

  • Use a UML modeling tool to capture, by transcription, the most important UML artifacts if that is deemed necessary.

  • Use a UML modeling tool to create UML artifacts that can’t be generated.

In this new world, rather than seeking some level of completeness in the UML model, we take the pragmatic approach that this is no longer necessary. In fact, this has already been the case in many software engineering projects for quite some time. Creation of a UML model has, in many cases erroneously, been seen as an unnecessary overhead. Now, with Generative AI, provided we have precise narratives supported by UML models where necessary, we have an adequate source of truth and we can generate many other artifacts as needed.

In this new world of Generative AI–assisted analysis, UML becomes more a matter of visualization than modeling because the “model” is now distributed between UML and precise narratives. The new “source of truth” is therefore the Literate Model because it naturally combines these two things in a precise manner.

1.7.4 What have we learned from the example?

To get good code and UML generation from Generative AI, we have learned that we need to be very precise.

  1. First, we need to specify an ontology, the things that exist.

    1. We need to specify, for each thing, the attributes we want it to have.

    2. We need to specify, for each thing, the business methods we want it to have.

  2. Then, we need to specify the relationships between the things.

We also need to be very critical.

  • We need to examine the generated output very carefully because it is likely to contain errors.

  • Suggested features are likely to be wrong or inappropriate. However, they can provide useful input into the modeling process.

  • Generative AI is not good at enforcing business rules expressed as multiplicities.

The simple Company example illustrates the point we have made several times since the start of this book: The level of abstraction for our model (in this case, some text) is significantly lower than the average business analysis document because we need to be very precise and detailed about the ontology and relationships, right down to the attribute and method levels. Similarly, the level of abstraction is very much higher than for a Python program because Copilot quite successfully fills in many of the Python coding details, leaving us to concentrate on the big picture. Remember:

  • If you don’t say exactly what you want, you will get what you are given.

It is gratifying to us that the prompts that gave decent code generation results look like fragments of the Literate Models we introduced in Enterprise Patterns and MDA that were created using our initial ideas on Generative Analysis. It appears that we managed to nail the level of abstraction pretty well. This isn’t surprising, because the level of abstraction was designed to be precise enough for code generation.
