There and back again...
This deck is a work in progress…
and always will be
@misc{a-tour-of-genai-jgalego,
title = {A Tour of GenAI},
author = {Galego, João},
howpublished = {\url{jgalego.github.io/GenAI}},
year = {2023}
}
The slides were created using reveal.js
and the presentation is hosted on GitHub Pages
Just open an issue/PR for this project
github.com/JGalego/GenAI
― Terry Pratchett, Lords and Ladies (1992)
Attention Is All You Need introduces the
Transformer
architecture
Previous seq2seq models were
SLOW 🐌 and FORGETFUL 🤔
Transformers
have driven
draw boundaries in data space
describe how data is placed
throughout the data space
Since the model is probabilistic,
we can just sample from it to create new data
Signal $\rightarrow$ … $\rightarrow$ Noise
Noise $\rightarrow$ … $\rightarrow$ Signal
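One way to picture the Signal → Noise direction is the closed-form forward process used by DDPM-style diffusion models. This is an assumption: the slides do not name a specific formulation. The linear `beta` schedule, the 1-D toy signal, and the helper names `alpha_bar` and `add_noise` are illustrative choices, not part of the deck.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps (linear schedule, assumed)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
signal = [math.sin(2 * math.pi * i / 16) for i in range(16)]  # toy "data"
slightly_noisy = add_noise(signal, t=10, rng=rng)       # still mostly signal
almost_pure_noise = add_noise(signal, t=1000, rng=rng)  # signal weight is close to zero
```

The reverse direction (Noise → Signal) is what a trained model has to learn; it is not sketched here.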
Théâtre D'opéra Spatial (Midjourney + Gigapixel AI)
Learn how Runway helped create the rock scene 🪨 in
'Everything Everywhere All at Once.'
Learning to follow instructions
from human preferences
and what's coming next?
📈
Stable Diffusion accumulated 40k stars on GitHub in its first 90 days
ChatGPT reached the 1M users mark in just 5 days
# 1
similar to what a human would produce
Which painting was generated with AI?
A
, you're in big trouble…or was it the other way around? 🤔
# 2
or FMs for short
or LLMs for short
0th-order approximation | XFOML RXKHRJFFJUJ ALPWXFWJXYJ FFJEYVJCQSGHYD QPAAMKBZAACIBZLKJQD |
1st-order approximation | OCRO HLO RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL |
2nd-order approximation | ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE |
1st-order word approximation | REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE |
2nd-order word approximation | THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED |
Source: Adapted from Shannon & Weaver (1963)
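The letter-level approximations in the table can be imitated with a tiny character n-gram model. The sketch below is illustrative only: the toy corpus and the function names are assumptions. A context length of `k` corresponds roughly to Shannon's `(k+1)`-th order approximation, so `k = 0` samples letters by frequency alone.

```python
import random
from collections import Counter, defaultdict


def train_char_ngrams(text, context_len):
    """Count which character follows each context of `context_len` characters."""
    counts = defaultdict(Counter)
    for i in range(len(text) - context_len):
        counts[text[i:i + context_len]][text[i + context_len]] += 1
    return counts


def sample_text(counts, context_len, length, rng):
    """Generate text by sampling each next character given the preceding context."""
    out = list(rng.choice(sorted(counts)))  # start from a random known context
    while len(out) < length:
        context = "".join(out[len(out) - context_len:])
        followers = counts.get(context)
        if not followers:  # dead end: restart from another known context
            out.extend(rng.choice(sorted(counts)))
            continue
        chars, weights = zip(*sorted(followers.items()))
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out[:length])


corpus = "the quick brown fox jumps over the lazy dog " * 3  # toy corpus (assumed)
rng = random.Random(0)
samples = {k: sample_text(train_char_ngrams(corpus, k), k, 40, rng) for k in (0, 1, 2)}
for k, text in samples.items():
    print(k, repr(text))
```

Longer contexts make the output look more like the corpus, which is the same trend the table shows.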
Prompts can be as simple as an instruction/question,
or as complex as huge chunks of text.
You can tell it who to be (role),
what it needs to know (context),
what to do (task) and how (instructions),
what it should avoid (constraints),
or you can show it what to do and how (examples)
There are no rules!
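As a rough sketch, the building blocks above (role, context, task, instructions, constraints, examples) can be assembled into a single prompt string. The function name `build_prompt` and the section labels are hypothetical; there is no required layout.

```python
def build_prompt(task, role=None, context=None, instructions=None,
                 constraints=None, examples=None):
    """Assemble a prompt from optional building blocks, skipping the empty ones."""
    parts = []
    if role:
        parts.append(f"You are {role}.")
    if context:
        parts.append(f"Context:\n{context}")
    parts.append(f"Task: {task}")
    if instructions:
        parts.append("Instructions:\n" + "\n".join(f"- {item}" for item in instructions))
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {item}" for item in constraints))
    for source, target in examples or []:
        parts.append(f"Input: {source}\nOutput: {target}")
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Summarize the text in one sentence.",
    role="a careful technical editor",
    constraints=["Do not add facts that are not in the text."],
    examples=[("A long paragraph about llamas.", "Llamas are domesticated camelids.")],
)
print(prompt)
```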
Assign a role to the model
to provide some context
Show the model a few examples
of what you want it to do
Talking a model into doing
what you want it to do
Talking a model into doing
something it's not supposed to
(either good or bad)
― Andrej Karpathy, AI Researcher
# 3
Productivity (Text Generation) 💬
Chat (Virtual Assistant) 💁
Summarization (Text Extraction) 📖
Search 🔎
Code Generation 👨💻
Music Creation 🎶
Video Editing 🎥
txt2img
img2txt
img2img
txt2gif
txt2video
txt2code
"(...) for a fixed task and a fixed model family, the researcher can choose a metric
to create an emergent ability or choose a metric to ablate an emergent ability"
"Every industry that requires
humans to create original work (…)
is up for
reinvention."
A causally impossible scientific theory
just a prompt away
Philosophically speaking,
Bullshit
Any statement produced without
particular concern for reality and truth
Human evaluators review the model's responses and pick the most appropriate one for each user prompt
Direct Preference Optimization (DPO) bypasses both explicit reward estimation and RL and optimizes the language model directly using preference data.
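For a single preference pair, the DPO objective can be written in a few lines. This is a minimal sketch: it assumes the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model are already available, and `beta=0.1` is only an illustrative value.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: the loss equals log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy favours the chosen response more than the reference does: lower loss
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

No reward model or RL loop appears anywhere: the preference data enters the loss directly through the two log-ratios.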
Identify hallucinated content and
use it during training
Often overlooked, these techniques
can help alleviate overfitting
Temperature regulates the randomness
or creativity of the responses
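A minimal sketch of how temperature acts on next-token sampling: the logits are divided by the temperature before the softmax. The logits below are made up for illustration, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature gives a sharper distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs)[0]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.1)   # close to greedy decoding
hot = softmax_with_temperature(logits, temperature=100.0)  # close to uniform
token = sample_token(logits, temperature=0.7, rng=random.Random(0))
```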
Using CoT prompting we can improve a model's ability to perform complex reasoning
Provide access to relevant data from a knowledge base
Treat the task as a search problem grounded in data
takes only a few hundred 'poisoned apples' 🍎☠️🤢
Attack that exploits the vulnerabilities of LLMs,
by manipulating their prompts
llama.cpp - initial release
Rounding off one data type to another
int8 absmax quantization
Introduces a number of innovations incl. Double Quantization
and Paged Optimizers
that save memory without compromising performance
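The int8 absmax scheme mentioned above can be sketched in plain Python: scale the values so that the largest magnitude maps to 127, round, and divide by the same scale to recover approximate floats. The example weights are made up, and real implementations work on tensors rather than lists.

```python
def absmax_quantize(values):
    """Map floats to the int8 range [-127, 127] using the largest magnitude as reference."""
    absmax = max(abs(v) for v in values)
    scale = 127.0 / absmax if absmax else 1.0
    quantized = [max(-127, min(127, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is the price of quantization."""
    return [q / scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up values
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
print(q, restored)
```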
Code Llama 👨💻🦙
No, and we need a new definition of open
Yes (citation needed)
1 LLM 💬 + 1 GPU ⚡🌱 + 1 day ⏳
🐧
For more information, visit aws.amazon.com/generative-ai
AWS supports GenAI in all layers of the stack
Let's start by looking at the bottom layer…
* Read the fine print!
Hardware 💻
Health Checks 👨⚕️⚠️
Orchestration 🎻🎶
Data 💾
Scale 📈
Cost 💰
Trn1/Trn1n Instances
Trn1 Instances
The 'Secret Sauce' behind AWS's success
pcluster
pcluster create-cluster -f config.yaml ...
Prepare Data
LAION-5B
Parquet files with SageMaker Processing Jobs
LAION-5B
images and text pairs
Train Model
Evaluate Model
Deploy Model
Run the math! 🧮
Transformer FLOPs equation:
$6 \times \#\,\text{parameters} \times \#\,\text{tokens}$
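To "run the math", the rule of thumb above can be turned into a back-of-the-envelope estimate. Every number below is an illustrative assumption (a 7B-parameter model, 1T tokens, 256 accelerators at 312 TFLOP/s peak, 40% utilization), not a figure from the deck.

```python
def training_flops(num_parameters, num_tokens):
    """Approximate training compute: 6 x parameters x tokens."""
    return 6 * num_parameters * num_tokens


def training_days(total_flops, num_accelerators, peak_flops_per_second, utilization):
    """Wall-clock days, assuming a constant fraction of peak throughput is sustained."""
    seconds = total_flops / (num_accelerators * peak_flops_per_second * utilization)
    return seconds / 86_400


flops = training_flops(num_parameters=7e9, num_tokens=1e12)
days = training_days(flops, num_accelerators=256,
                     peak_flops_per_second=312e12, utilization=0.4)
print(f"{flops:.2e} FLOPs, about {days:.0f} days")
```

Under these assumptions the job takes roughly 15 days; halving the utilization doubles the estimate, which is why the whole system matters and not just the chip.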
Training
Transformer Math 101 by EleutherAI
Inference
Transformer Inference Arithmetic by Kipply
Look at the whole system
Choose the right "GPU instance" not just the right "GPU"
Inf2
on Amazon SageMaker
Pre-trained models for each use case
Easy to customize + manage models at scale
Data is kept secure and private on AWS
Responsible AI support across ML lifecycle
Fully integrated with Amazon SageMaker
API-level access to FMs
For more information, visit
aws.amazon.com/bedrock
Build apps faster and more securely
with an AI coding companion
Use Cases, Patterns & Solutions
🛠️
since they treat vectors as first class citizens
A numerical representation of a piece of information
What if you had the embeddings of ALL of Wikipedia?
pgvector extension
-- Enable pgvector once per database before using the vector type
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE test_embeddings(product_id bigint, embeddings vector(3));
INSERT INTO test_embeddings VALUES
  (1, '[1, 2, 3]'), (2, '[2, 3, 4]'), (3, '[7, 6, 8]'), (4, '[8, 6, 9]');
-- <-> is the Euclidean (L2) distance operator
SELECT product_id, embeddings, embeddings <-> '[3,1,2]' AS distance
FROM test_embeddings
ORDER BY embeddings <-> '[3,1,2]';
/*
 product_id | embeddings |     distance
------------+------------+-------------------
          1 | [1,2,3]    | 2.449489742783178
          2 | [2,3,4]    |                 3
          3 | [7,6,8]    | 8.774964387392123
          4 | [8,6,9]    |   9.9498743710662
*/
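The distances in the query result can be checked by hand: pgvector's `<->` operator is Euclidean (L2) distance. The sketch below recomputes the same ranking in plain Python, with the rows copied from the `INSERT` statement.

```python
import math

# Same rows and query vector as in the SQL example
rows = {1: [1, 2, 3], 2: [2, 3, 4], 3: [7, 6, 8], 4: [8, 6, 9]}
query = [3, 1, 2]

# Brute-force nearest-neighbour search: compute every distance, then sort
ranked = sorted(
    ((product_id, math.dist(vector, query)) for product_id, vector in rows.items()),
    key=lambda item: item[1],
)
for product_id, distance in ranked:
    print(product_id, distance)
```

A database extension earns its keep once the table is too large for this kind of exhaustive scan and an approximate index is needed.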
pgvector extension
# Adapted from
# https://github.com/aws-samples/aurora-postgresql-pgvector/tree/main/apgpgvector-streamlit
import os

import streamlit as st
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.vectorstores.pgvector import PGVector
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from htmlTemplates import css, bot_template, user_template
# Load PDFs and split them into chunks
def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text

def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks
# Load the embeddings into Aurora PostgreSQL DB cluster
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.environ.get("PGVECTOR_DRIVER"),
    user=os.environ.get("PGVECTOR_USER"),
    password=os.environ.get("PGVECTOR_PASSWORD"),
    host=os.environ.get("PGVECTOR_HOST"),
    port=os.environ.get("PGVECTOR_PORT"),
    database=os.environ.get("PGVECTOR_DATABASE")
)

def get_vectorstore(text_chunks):
    embeddings = HuggingFaceInstructEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    vectorstore = PGVector.from_texts(texts=text_chunks, embedding=embeddings, connection_string=CONNECTION_STRING)
    return vectorstore
# Load the LLM and start a conversation chain
def get_conversation_chain(vectorstore):
    llm = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.5, "max_length": 1024})
    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain
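# Handle user input and perform Q&A
def handle_userinput(user_question):
    # The chain only exists after documents have been processed
    if st.session_state.conversation is None:
        st.warning("Please upload and process your documents first.")
        return
    response = st.session_state.conversation({'question': user_question})
    st.session_state.chat_history = response['chat_history']
    # Messages alternate: even indices are the user, odd indices are the bot
    for i, message in enumerate(st.session_state.chat_history):
        if i % 2 == 0:
            st.write(user_template.replace(
                "{{MSG}}", message.content), unsafe_allow_html=True)
        else:
            st.write(bot_template.replace(
                "{{MSG}}", message.content), unsafe_allow_html=True)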
# Create Streamlit app
def main():
    load_dotenv()
    st.set_page_config(page_title="Streamlit Question Answering App",
                       page_icon=":books::parrot:")
    st.write(css, unsafe_allow_html=True)
    st.sidebar.markdown(
        """
        ### Instructions:
        1. Browse and upload PDF files
        2. Click Process
        3. Type your question in the search bar to get more insights
        """
    )
    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None
    st.header("GenAI Q&A with pgvector and Amazon Aurora PostgreSQL :books::parrot:")
    user_question = st.text_input("Ask a question about your documents:")
    if user_question:
        handle_userinput(user_question)
    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'", accept_multiple_files=True)
        if st.button("Process"):
            with st.spinner("Processing"):
                # get pdf text
                raw_text = get_pdf_text(pdf_docs)
                # get the text chunks
                text_chunks = get_text_chunks(raw_text)
                # create vector store
                vectorstore = get_vectorstore(text_chunks)
                # create conversation chain
                st.session_state.conversation = get_conversation_chain(
                    vectorstore)

if __name__ == '__main__':
    main()
# Method excerpt: `model`, FAISS_DIR, self.index and self.content_group
# are assumed to be defined elsewhere in the surrounding code
from pathlib import Path

import faiss
import numpy as np

def save_faiss_model(self, text_list, id_list):
    # Convert abstracts to vectors
    embeddings = model.encode(text_list, show_progress_bar=False)
    # Step 1: Change data type (FAISS expects float32)
    embeddings32 = np.asarray(embeddings, dtype="float32")
    # Step 4: Add vectors and their IDs
    # (id_list is unused here: index.add() assigns sequential IDs)
    index_start_id = self.index.ntotal  # inclusive
    self.index.add(embeddings32)
    index_end_id = self.index.ntotal  # exclusive
    # Serialize index
    Path(f"{FAISS_DIR}/{self.content_group}").mkdir(parents=True, exist_ok=True)
    faiss.write_index(self.index, f"{FAISS_DIR}/{self.content_group}/{self.content_group}_faiss_index.bin")
    return (embeddings32, range(index_start_id, index_end_id))
Vector libraries are used to perform similarity search.
Examples: Facebook FAISS, Spotify Annoy, Google ScaNN, NMSLIB, Hnswlib
Vector databases are used to store and update data.
What, when and how to evaluate
Find the most relevant AI use cases with
related content and guidance to make them real
Public repository with the newest content
released by AWS on GenAI
github.com/aws-samples/gen-ai-atlas
For a set of fun and interactive explanations
of core ML concepts, check out MLU Explain
Hello World: Meet Generative AI
Werner Vogels and Swami Sivasubramanian
sit down to discuss GenAI and why it's not hype
Follow the Music 🎶
Learn the fundamentals of how GenAI works and
how to deploy it in real-world applications
Deploy a multi-LLM and multi-RAG powered chatbot using AWS CDK
Accelerate your GenAI startup in 10 weeks
abhi1thakur, MishaLaskin and 0xsanny
iScienceLuvr
Note: The emphasis is on breadth, not depth
mlabonne
Disclaimer: I take no responsibility for the content available through these links
Mooler0410
cckuailong
Hannibal046