The suggestions from Bing Chat and other generative AI tools always take the given context into account. As sources for its recommendations, the AI usually draws on neutral secondary sources such as trade journals, news sites, the websites of associations and public institutions, and blogs. The output of generative AI is based on statistical frequencies: the more often a sequence of words appears in the source data, the more likely the model is to produce the next word of that sequence in its output. Words that frequently appear together in the training data are statistically more similar and thus semantically more closely related.
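This frequency intuition can be illustrated with a toy bigram model. The following is a minimal sketch with a made-up corpus; real LLMs learn transformer weights rather than counting raw frequencies, but the principle of preferring frequent continuations is the same.

```python
# Minimal sketch: predicting the next word from raw co-occurrence counts.
# The corpus is invented for illustration only.
from collections import Counter, defaultdict

corpus = [
    "best running shoes for beginners",
    "best running shoes for marathons",
    "best hiking shoes for beginners",
]

# Count how often each word follows another across the corpus.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

# The most frequent continuation in the data becomes the most
# probable next word in the output.
print(bigrams["running"].most_common(1))  # [('shoes', 2)]
print(bigrams["for"].most_common(2))      # [('beginners', 2), ('marathons', 1)]
```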
Which brands and products are mentioned in a particular context can be explained by the way LLMs work.
How do Large Language Models (LLMs) work?
Modern transformer-based LLMs such as GPT or Bard are based on statistical evaluations of co-occurrences between tokens or words. For this purpose, texts and data are broken down into tokens for machine processing and positioned in semantic spaces as vectors. Such vectors can represent whole words (Word2Vec), entities or graph nodes (Node2Vec), and attributes.
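To make the token-to-vector step concrete, here is a hedged sketch using gensim's Word2Vec (assuming `pip install gensim`). The three-sentence corpus is far too small to yield meaningful embeddings and serves only to show the mechanics.

```python
# Sketch: turning tokenized text into word vectors with gensim Word2Vec.
from gensim.models import Word2Vec

sentences = [
    ["the", "smartphone", "has", "a", "good", "camera"],
    ["the", "smartphone", "has", "a", "large", "display"],
    ["the", "camera", "takes", "sharp", "photos"],
]

# Each word is positioned as a vector in a semantic space.
model = Word2Vec(sentences, vector_size=16, window=3, min_count=1,
                 workers=1, seed=1)

vector = model.wv["smartphone"]   # the word's position in vector space
print(vector.shape)               # (16,)
print(model.wv.most_similar("camera", topn=2))
```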
In semantics, such a semantic space is also called an ontology. Since LLMs are based more on statistics than on semantics, they are not really ontologies. However, the sheer amount of data allows the AI to approximate a semantic understanding.
Semantic proximity can be measured in the semantic space via the Euclidean distance or the cosine of the angle between vectors.
Semantic Proximity in Vector Space
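The sketch below computes both measures with numpy for three hypothetical three-dimensional word vectors (real embeddings have hundreds of dimensions; the values here are made up).

```python
# Sketch: measuring semantic proximity between hypothetical word vectors.
import numpy as np

smartphone = np.array([0.9, 0.2, 0.1])   # invented embeddings
camera     = np.array([0.8, 0.3, 0.0])
banana     = np.array([0.1, 0.9, 0.7])

def euclidean(a, b):
    # Straight-line distance between two points in vector space.
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors (1.0 = same direction).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Smaller distance / larger cosine similarity = semantically closer.
print(euclidean(smartphone, camera), cosine_similarity(smartphone, camera))
print(euclidean(smartphone, banana), cosine_similarity(smartphone, banana))
```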
If an entity is frequently mentioned in connection with certain other entities or properties in the training data, there is a high statistical probability of a semantic relationship.
This processing method is called transformer-based natural language processing. NLP describes the process of transforming natural language into a machine-understandable form, enabling communication between humans and machines. It comprises two areas: natural language understanding (NLU) and natural language generation (NLG).
The focus is on NLU when training LLMs and on NLG when outputting AI-generated results.
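Purely as a conceptual illustration of this split (the functions below are hypothetical toys, not a real library API): NLU maps text to a structured representation, and NLG maps structured data back to text.

```python
# Toy sketch of the NLU/NLG division of labor.

def nlu(text: str) -> dict:
    # "Understanding": extract a crude intent and topic from text.
    topic = "shoes" if "shoes" in text else "unknown"
    return {"intent": "recommendation", "topic": topic}

def nlg(meaning: dict) -> str:
    # "Generation": render the structured meaning back as text.
    return f"Here are popular {meaning['topic']} you might like."

parsed = nlu("Which running shoes are good for beginners?")
print(parsed)       # {'intent': 'recommendation', 'topic': 'shoes'}
print(nlg(parsed))  # Here are popular shoes you might like.
```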
Identifying entities via named entity recognition (also called named entity extraction) plays a special role, both for semantic understanding and for determining an entity's meaning within a thematic ontology.
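A short sketch of named entity recognition with spaCy, assuming `pip install spacy` and the small English model (`python -m spacy download en_core_web_sm`) are available:

```python
# Sketch: extracting named entities with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Microsoft integrated OpenAI's GPT models into Bing Chat.")

# Each recognized entity carries a label such as ORG or PRODUCT.
for ent in doc.ents:
    print(ent.text, ent.label_)
```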
Because certain words frequently co-occur, their vectors move closer together in the semantic space: semantic proximity increases, and with it the probability that the terms belong together.
NLG then outputs the results according to statistical probability.
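Generation can be pictured as sampling from a probability distribution over candidate next words. In this sketch, the scores are made-up "logits"; a softmax turns them into probabilities, from which the output is sampled, as real decoders do (often refined with temperature or top-k settings).

```python
# Sketch: output as sampling from a probability distribution.
import numpy as np

rng = np.random.default_rng(0)
candidates = ["shoes", "boots", "sandals"]
logits = np.array([2.0, 1.0, 0.1])   # hypothetical model scores

probs = np.exp(logits) / np.exp(logits).sum()   # softmax
next_word = rng.choice(candidates, p=probs)

# Frequent patterns in the training data get the highest probability.
print(dict(zip(candidates, probs.round(2))), "->", next_word)
```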