Understanding ChatGPT prompt engineering through plug-ins manifests

Published a year ago by sai @ pretzelbox
ChatGPT plug-ins are an amazing way to learn more about prompt engineering. Specifically, each plugin exposes a field called description_for_model.

My sense is that the description_for_model corresponds to system messages that we include with API calls to the model. Both set up the context to guide the chat session.

Data Extraction

For this analysis, I looked at the extant 127 ChatGPT plugins. To quickly get the manifests of all 127 plugins, I followed the steps listed in this reddit post.

For Firefox (Chrome should be mostly the same):
  1. Open the ChatGPT website
  2. Press F12 and open the Network tab
  3. Open the plugins store in the same tab
  4. On the Developer menu in the network tab, search "p?" and right click the one at the bottom
  5. Select Copy Value > Copy Response then paste somewhere you can read the text

This gave me a 274kb JSON object containing an array of plugins with the following keys

[
  'id',
  'domain',
  'namespace',
  'status',
  'manifest',
  'oauth_client_id',
  'user_settings',
  'categories'
]


Of these keys, the most interesting key for analysing prompts sits inside manifest > description_for_model.

I first extracted the description_for_model of all 127 plugins yielding an array of 127 strings. Then, I got the length of each string to get an array of 127 numbers.

Statistical Analysis

I then asked ChatGPT4 to suggest some statistical analysis I could do on this array. It came back with the following five high level options.
  1. Basic statistical analysis (mean, median, mode, range, standard deviation, etc.)
  2. Frequency distribution
  3. Percentiles
  4. Outlier detection
  5. Visualization (histogram, box plot, etc.)
Wolfram Alpha
Luckily, since ChatGPT comes with the Wolfram Alpha plugin, I was able to throw this back to ChatGPT after enabling the Wolfram plugin.

In response, ChatGPT sent my array of 127 numbers to Wolfram and within seconds, came back with the following insights.

Basic Statistical Analysis:
  1. Mean: 820.53 - average description_for_model length
  2. Median: 280 - 50% of description_for_model are longer than 280 characters
  3. Mode: 60, 192, 98, 155, 104 (These numbers appear twice in the data) - not very useful imho
  4. Range: 16 - 7881 - that's a massive range in sizes
  5. Standard Deviation: 1345.78 - this means that 68% of the description_for_model strings are within 1345 characters in length from the median of 820. if the lengths were to be shown on a line, you'd see them well dispersed. This means that plugin developers still don't know the optimal length for a description_for_model
Frequency Distribution:
The frequency of each number in the list is mostly 1, except for the numbers 60, 192, 98, 155, 104 which appear twice. Not very useful information, imho.

Percentiles:
25th percentile (Q1): 117
50th percentile (Q2/Median): 280
75th percentile (Q3): 984

See the massive jump from 280 to 984? 

This means that about half the plugin developers are not taking their description_for_model string seriously. In other words, they are half-assing their only existing tool for AI Engine Optimization.

Outlier Detection:
The outliers in the data are: 5325, 2352, 2652, 6924, 7881, 4707, 4211, 2477, 3797, 3603, 5381, 3015. 

These numbers are significantly different from the rest of the data. Not very useful information, imho.

Visualizations:
Histogram
78db8fd6-8605-41eb-a153-38d3a20163ed.png

Box-Plot

f1f9c979-a775-457a-a958-faa2fde09643.png

Again, these visualizations are not super valuable imho. They are interesting to me for the ease with which I was able to generate them without fiddling around with Excel or Google Worksheets.

Conclusion
There are quite a few conclusions to be drawn from this exercise.
  1. Plugin developers have not coalesced around an optimal length for description_for_model
  2. ChatGPT along with plugins (like Wolfram Alpha) open up an entirely new range of workflows which remove the drudgery of becoming Excel masters. Personally, I think ChatGPT's strategy of opening up their language model to plugins makes them a far more compelling product than Google or Bing Search. These two companies are right now caught in the classic innovator's dilemma - should they protect their search business or should they pivot and open up their business? Say what you will about Mark Zuckerberg, his pivot towards the Metaverse showed courage and chutzpah that Google would be well advised to emulate. Can they?
  3. It took me less than an hour to go from looking at plugin data to writing this post all thanks to ChatGPT. This thing is a force multiplier in the right hands.



Attachments