Firstly, the term ‘generative’ implies the ability to generate or create, a capability made possible by the Transformer, a neural network architecture built on the attention mechanism (Vaswani et al., 2017). This invention proved ground-breaking and revolutionised deep learning.
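At the heart of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V (Vaswani et al., 2017). A minimal NumPy sketch of that single operation, with illustrative matrix shapes, might look like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined by Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # attention-weighted sum of values

# Illustrative example: 4 tokens, 8-dimensional keys/values (shapes are assumptions)
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```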
In the years since, Large Language Models (LLMs) have consumed enormous datasets, with one survey reporting “a total data size exceeding 774.5 TB for pre-training corpora” (Liu et al., 2024), and they process that information in a fundamentally different way from classical, rule-based programs. OpenAI, whose name has become somewhat ironic given its closed models, introduced ChatGPT (Generative Pre-trained Transformer) in November 2022, and the consensus reaction was amazement at its ability: the LLM generated, within seconds, streams of text resembling the essays of professional experts (see Fig 2.1).
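As a minimal sketch of how such a model is prompted programmatically, the following uses OpenAI’s official Python SDK; the model name, the prompt, and the API key supplied via environment variable are illustrative assumptions, not details from the text above:

```python
# Minimal sketch: prompting an OpenAI chat model via the official Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the
# environment; the model name "gpt-4o-mini" is an illustrative assumption.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain generative AI in one paragraph."}],
)
print(response.choices[0].message.content)
```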

Other modalities emerged to create bespoke images from text prompts, such as the diffusion models Midjourney and DALL-E; GANs (Generative Adversarial Networks) excel at synthesising realistic images and enhancing image quality. VAEs (Variational Autoencoders) can handle text, image, audio and video using an encoder-decoder architecture, while Google’s BERT (Bidirectional Encoder Representations from Transformers) excels at finding context in text (Abdullahi, 2024).
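To make BERT’s bidirectional use of context concrete, below is a minimal sketch using Hugging Face’s fill-mask pipeline; the model choice and example sentence are illustrative assumptions:

```python
# Minimal sketch: BERT predicting a masked word from bidirectional context.
# Assumes the `transformers` library is installed; the model and sentence
# are illustrative choices, not taken from the essay.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Generative AI can [MASK] text, images and audio."):
    print(prediction["token_str"], round(prediction["score"], 3))
```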

Currently, affordable software subscriptions are making this technology accessible to everyone. For example, Suno can take a text prompt and generate a complete, publishable song within a minute, see Fig 2.2 (Suno, 2024).
Looking ahead, GenAI tools such as OpenAI’s Sora and Google’s Veo may bring hyper-realistic films and soundtracks to the box office; Disney has already been working at ‘replacing humans’ in parts of production for a decade (Dams, 2022).
[250 words]
References
Abdullahi, A. (2024) Generative AI Models: A Complete Guide, eWEEK. Available at: https://www.eweek.com/artificial-intelligence/generative-ai-model/ (Accessed: 2 June 2024).
Dams, T. (2022) AI transforming movie production at Disney – with more to come, IBC. Available at: https://www.ibc.org/news/ai-transforming-movie-production-at-disney-with-more-to-come/9075.article (Accessed: 1 June 2024).
Liu, Y. et al. (2024) Datasets for Large Language Models: A Comprehensive Survey, arXiv.org. Available at: https://arxiv.org/abs/2402.18041 (Accessed: 1 June 2024).
Suno (2024) Code to my Heart by @angelabevan, Suno.com. Available at: https://suno.com/song/7331f662-d7eb-4d8d-9019-b893626de621 (Accessed: 1 June 2024).
The Royal Institution (2023) ‘What is generative AI and how does it work? – The Turing Lectures with Mirella Lapata’, YouTube. Available at: https://www.youtube.com/watch?v=_6R7Ym6Vy_I (Accessed: 30 May 2024).
Vaswani, A. et al. (2017) Attention Is All You Need, Advances in Neural Information Processing Systems 30. Available at: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (Accessed: 22 May 2024).
Further Reading
Anthropic (2024) Claude, Anthropic.com. Available at: https://www.anthropic.com/claude (Accessed: 29 May 2024).
Crouse, M. (2024) ChatGPT Gets an Upgrade With ‘Natively Multimodal’ GPT-4o, TechRepublic. Available at: https://www.techrepublic.com/article/openai-next-flagship-model-gpt-4o/ (Accessed: 31 May 2024).
Meta AI (2024) Introducing Meta Llama 3: The most capable openly available LLM to date, Meta.com. Available at: https://ai.meta.com/blog/meta-llama-3/ (Accessed: 12 May 2024).
MusicLM (2024) Github.io. Available at: https://google-research.github.io/seanet/musiclm/examples/ (Accessed: 29 May 2024).
OpenAI (2024) Sora, OpenAI.com. Available at: https://openai.com/index/video-generation-models-as-world-simulators/ (Accessed: 29 May 2024).
Gemini Team, Google (2024) Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Available at: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf (Accessed: 19 February 2024).