generative-ai Study Notes Ⅲ

2024-05-24

Preface
8. Building Search Applications
9. Building Image Applications

Preface

8. Building Search Applications

In this section we will learn:

1. Semantic vs Keyword search
How semantic search differs from keyword search

2. What are Text Embeddings
What text embeddings (vectors) are and what they are used for

3. Creating a Text Embeddings Index. Searching a Text Embeddings Index.
How to create and search a text embeddings index

8.1 what is semantic search

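Semantic search understands the intent behind a query: if you search for "my dream car", it works out that you are looking for your ideal car. Keyword search would instead match the words literally and return results about dreams involving cars.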

8.2 what are Text Embeddings

Text embeddings are a technique for converting text into vectors, a numeric representation of its meaning.
Here is an example:

Today we are going to learn about Azure Machine Learning.

// The real vector has many more dimensions; only the first 10 are shown for brevity
[-0.006655829958617687, 0.0026128944009542465, 0.008792596869170666, -0.02446001023054123, -0.008540431968867779, 0.022071078419685364, -0.010703742504119873, 0.003311325330287218, -0.011632772162556648, -0.02187200076878071, ...]

8.3 How is the Embedding index created?

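Download the content, for example a video.

Use OpenAI to extract the speaker's name from the first 3 minutes of the video and store it in the vector database.

Split the video transcript into 3-minute segments. To keep adjacent segments connected, each segment also includes about 20 words that overlap with the next segment.

Pass each text segment to the OpenAI API to produce a summary of roughly 60 words, and store the summary in the vector database.

Finally, again through the OpenAI API, compute the vector representation (embedding) of each text segment and store it in the vector database.

At query time, the search text is first converted into a vector and compared against the vectors in the database. For the most similar ones, the stored summary and speaker name are retrieved, and that is the search result we want.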
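The overlapping split described in the steps above can be sketched as follows. This is only a minimal illustration: the notes cut by time (3 minutes), while this sketch cuts by word count as a stand-in, and `chunk_transcript`, `chunk_size`, and `overlap` are hypothetical names that are not part of the original sample.

```python
def chunk_transcript(words, chunk_size, overlap=20):
    """Split a list of words into segments of `chunk_size` words.

    Each segment (except the last) is extended with the first `overlap`
    words of the segment that follows, so context carries across the cut.
    Assumption: word count stands in for the 3-minute time window.
    """
    if chunk_size <= 0 or overlap < 0:
        raise ValueError("chunk_size must be positive and overlap non-negative")
    chunks = []
    for start in range(0, len(words), chunk_size):
        # slicing past the end of the list is safe in Python
        end = start + chunk_size + overlap
        chunks.append(" ".join(words[start:end]))
    return chunks


# toy transcript of 10 "words", cut every 4 words with a 2-word overlap
words = [f"w{i}" for i in range(10)]
segments = chunk_transcript(words, chunk_size=4, overlap=2)
for segment in segments:
    print(segment)
```

Each segment would then be summarized and embedded separately; the overlap means a sentence that straddles a cut still appears whole in at least one segment.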

8.4 how to search?

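The search described above relies on cosine similarity. Once the query has been converted into a vector, its cosine similarity with each vector in the vector database is computed to measure how close they are, and the entries with high similarity are returned.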
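As a small, self-contained illustration of this ranking step, the sketch below scores a query vector against a toy index using only the standard library. The 3-dimensional vectors, the titles, and the `search` helper are made up for illustration; real embeddings have far more dimensions. The 0.75 threshold mirrors the value used in the sample in 8.5.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, threshold=0.75, top_k=3):
    """Return (similarity, title) pairs above the threshold, best first."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in index.items()]
    hits = [item for item in scored if item[0] >= threshold]
    return sorted(hits, reverse=True)[:top_k]


# hypothetical toy index: title -> embedding
toy_index = {
    "intro to embeddings": [0.9, 0.1, 0.0],
    "cooking pasta": [0.0, 0.2, 0.9],
    "vector search basics": [0.8, 0.3, 0.1],
}

results = search([1.0, 0.2, 0.0], toy_index)
for similarity, title in results:
    print(f"{similarity:.3f}  {title}")
```

The unrelated entry ("cooking pasta") points in a very different direction from the query, so its similarity falls below the threshold and it is filtered out.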

8.5 Example

I came across a rather interesting sample.
The way it is written is worth learning from; quite polished (😄

import os
import pandas as pd
import numpy as np
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("OPENAI_API_KEY","")
assert API_KEY, "ERROR: OpenAI Key is missing"

client = OpenAI(
    api_key=API_KEY
    )

model = 'text-embedding-ada-002'

SIMILARITIES_RESULTS_THRESHOLD = 0.75
DATASET_NAME = "./08-building-search-applications/embedding_index_3m.json"


def load_dataset(source: str) -> pd.core.frame.DataFrame:
    # Load the video session index
    pd_vectors = pd.read_json(source)
    # for col in pd_vectors:
    #     print(col,end=": ")
    #     print(pd_vectors[col][0])
    return pd_vectors.drop(columns=["text"], errors="ignore").fillna("")  # drop the "text" column (ignored if absent); fill missing values with ""

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) # cosθ = a·b / |a||b|

def get_videos(
    query: str, dataset: pd.core.frame.DataFrame, rows: int
) -> pd.core.frame.DataFrame:
    # create a copy of the dataset
    video_vectors = dataset.copy()

    # get the embeddings for the query    
    query_embeddings = client.embeddings.create(input=query, model=model).data[0].embedding

    # create a new column with the calculated similarity for each row
    video_vectors["similarity"] = video_vectors["ada_v2"].apply(
        lambda x: cosine_similarity(np.array(query_embeddings), np.array(x))
    )

    # filter the videos by similarity
    mask = video_vectors["similarity"] >= SIMILARITIES_RESULTS_THRESHOLD
    # print(mask)
    video_vectors = video_vectors[mask].copy()
    # print(video_vectors)

    # sort by similarity (highest first) and return the top rows
    return video_vectors.sort_values(by="similarity", ascending=False).head(rows)

def display_results(videos: pd.core.frame.DataFrame, query: str):
    def _gen_yt_url(video_id: str, seconds: int) -> str:
        """build a YouTube URL that starts playback at the given offset in seconds"""
        return f"https://youtu.be/{video_id}?t={seconds}"

    print(f"\nVideos similar to '{query}':")
    for _, row in videos.iterrows():
        print("_________________")
        print(_)
        print(row)
        print("_________________")
        youtube_url = _gen_yt_url(row["videoId"], row["seconds"])
        print(f" - {row['title']}")
        print(f"   Summary: {' '.join(row['summary'].split()[:15])}...")
        print(f"   YouTube: {youtube_url}")
        print(f"   Similarity: {row['similarity']}")
        print(f"   Speakers: {row['speaker']}")


pd_vectors = load_dataset(DATASET_NAME)


# get user query from input
while True:
    query = input("Enter a query: ")
    if query == "exit":
        break
    videos = get_videos(query, pd_vectors, 5)
    display_results(videos, query)

9. Building Image Applications

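Image generation is quite important and has many applications:

Image editing and synthesis.

It can be used for image editing and image synthesis.

Applied to a variety of industries.

It is also useful across many industries, for example medical technology, tourism, and game development.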

9.1 What is DALL-E and Midjourney?

DALL-E and Midjourney are two of the best-known image generation models.
Both let you generate images from text prompts.

DALL-E

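It consists of two parts: CLIP and diffused attention.

CLIP is the model that produces embeddings; it converts the input (images and text) into embeddings.

Diffused attention takes the embeddings as input and outputs an image.

Midjourney works in a similar way and can also generate images.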

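Interlude: tokenization and text embeddings

Tokenization is the preprocessing stage; it breaks text down into basic units (tokens).
Text embedding is the feature extraction stage; it converts tokens into vector representations.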
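To make the two stages concrete, here is a deliberately simplified sketch. It assumes a whitespace tokenizer and a tiny hand-written lookup table (`TOY_EMBEDDINGS`), both hypothetical; real systems typically use subword tokenizers and embeddings learned by a model.

```python
def tokenize(text):
    """Stage 1 (preprocessing): lowercase the text and split on whitespace."""
    return text.lower().split()


# Hypothetical 2-dimensional embedding table, made up for illustration.
TOY_EMBEDDINGS = {
    "a": [0.1, 0.0],
    "bunny": [0.7, 0.2],
    "on": [0.0, 0.1],
    "horse": [0.6, 0.4],
}
UNKNOWN = [0.0, 0.0]  # fallback for tokens missing from the table


def embed(tokens):
    """Stage 2 (feature extraction): map each token to its vector."""
    return [TOY_EMBEDDINGS.get(token, UNKNOWN) for token in tokens]


tokens = tokenize("A bunny on a horse")
vectors = embed(tokens)
print(tokens)
print(vectors)
```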

9.2 How does DALL-E and Midjourney Work

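For DALL-E:

DALL-E is a generative AI model based on the transformer architecture, more specifically an autoregressive transformer.

The autoregressive transformer defines how the model generates an image from a text description: it produces one pixel at a time, then uses the pixels generated so far to produce the next one, passing through multiple layers of the neural network until the image is complete.

Through this process, DALL-E can control the attributes, objects, characteristics, and so on of the image it generates. DALL-E 2 and 3, however, offer more control over the generated image.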
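The autoregressive idea (each new element is predicted from everything generated so far) can be shown with a toy loop. This is not how DALL-E is implemented: `next_pixel` is a hypothetical, deterministic stand-in for the neural network, used only to show the shape of the generation loop.

```python
def next_pixel(previous_pixels):
    """Hypothetical stand-in for the model: returns a value in 0..255
    computed from the pixels generated so far."""
    if not previous_pixels:
        return 128  # arbitrary starting value
    return (previous_pixels[-1] * 3 + len(previous_pixels)) % 256


def generate(num_pixels):
    """Generate pixels one at a time, feeding the history back in each step."""
    pixels = []
    for _ in range(num_pixels):
        pixels.append(next_pixel(pixels))
    return pixels


image = generate(4)
print(image)
```

In a real model the stand-in function would be a trained network that also conditions on the text prompt, and its output would usually be sampled rather than computed deterministically.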

9.3 Examples

Some examples:

from openai import OpenAI, BadRequestError
import os
import requests
from PIL import Image
import dotenv
from io import BytesIO

# load variables such as OPENAI_API_KEY from a .env file
dotenv.load_dotenv()

client = OpenAI()

try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        model="dall-e-3",
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=1,
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')
    
    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)
    print(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')
    print(image_path)

    # Retrieve the generated image
    print(generation_response)

    image_url = generation_response.data[0].url # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch invalid requests; in openai>=1.0 the client object has no `.error`
# attribute, and the exception class is openai.BadRequestError
except BadRequestError as err:
    print(err)

# ---creating variation below---
response = client.images.create_variation(
  image=open(image_path, "rb"),
  n=1,
  size="1024x1024",
)
print(response)
image_url_variation = response.data[0].url
print(image_url_variation)
generated_image_variation = requests.get(image_url_variation).content
img = Image.open(BytesIO(generated_image_variation))
img.show()
with open("./images/generated-image-variation.png", "wb") as image_file:
    image_file.write(generated_image_variation)


# ---creating edits below---
# response = client.images.edit(
#   image=open("./images/generated-image.png", "rb"),
#   mask=open("./images/generated-image-variation.png", "rb"), # must be an image in one of the modes ['RGBA', 'LA', 'L']; RGB is not accepted
#   prompt="An image of a rabbit with a hat on its head.",
#   n=1,
#   size="1024x1024"
# )
# image_url = response.data[0].url
# img = Image.open(BytesIO(requests.get(image_url).content))
# img.show()

from openai import OpenAI, BadRequestError
import os
import requests
from PIL import Image
import dotenv

# load variables such as OPENAI_API_KEY from a .env file
dotenv.load_dotenv()

client = OpenAI()

image_dir = os.path.join(os.curdir, 'images')

# Initialize the image path (note the filetype should be png)
image_path = os.path.join(image_dir, 'generated-image.png')

# ---creating variation below---
try:
    print("LOG creating variation")
    response = client.images.create_variation(
        image=open(image_path, "rb"),
        n=1,
        size="1024x1024"
    )

    image_path = os.path.join(image_dir, 'generated_variation.png')

    image_url = response.data[0].url

    print("LOG downloading image")
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()
# in openai>=1.0 invalid requests raise BadRequestError (the client object has no `.error`)
except BadRequestError as err:
    print(err)

9.4 Example with added safety measures

from openai import OpenAI
import os
import requests
from PIL import Image
import dotenv

# import dotenv
dotenv.load_dotenv()

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt = f"""You are an assistant designer that creates images for children. 

The image needs to be safe for work and appropriate for children. 

The image needs to be in color.  

The image needs to be in landscape orientation.  

The image needs to be in a 16:9 aspect ratio. 

Do not consider any input from the following that is not safe for work or appropriate for children. 
{disallow_list}"""

prompt = f"""{meta_prompt}
Generate monument of the Arc of Triumph in Paris, France, in the evening light with a small child holding a Teddy looks on.
"""
client = OpenAI()
# Create an image by using the image generation API
generation_response = client.images.generate(
    prompt=prompt,    # Enter your prompt text here
    size='1024x1024',
    n=2,
)
# Set the directory for the stored image
image_dir = os.path.join(os.curdir, 'images')

# If the directory doesn't exist, create it
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)

# Initialize the image path (note the filetype should be png)
image_path = os.path.join(image_dir, 'generated-image.png')

# Retrieve the generated image
image_url = generation_response.data[0].url  # extract image URL from response (the v1 client returns an object, not a dict)
generated_image = requests.get(image_url).content  # download the image
with open(image_path, "wb") as image_file:
    image_file.write(generated_image)

# Display the image in the default image viewer
image = Image.open(image_path)
image.show()
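
The script above relies on the meta prompt alone to keep the output safe. As a possible complement, one could also screen the user's text on the client side before calling the API. The sketch below is an assumption, not part of the original sample: `find_disallowed_terms` is a hypothetical helper, and plain substring matching is crude (for instance, "adult" would also match "adulthood").

```python
def find_disallowed_terms(user_prompt, disallow_list):
    """Return the disallowed terms that appear in the prompt.

    `disallow_list` is a comma-separated string, as in the script above.
    Matching is a case-insensitive substring check (an assumption; a real
    filter would likely need something more robust).
    """
    terms = [term.strip().lower() for term in disallow_list.split(",") if term.strip()]
    lowered = user_prompt.lower()
    return [term for term in terms if term in lowered]


sample_disallow_list = "swords, violence, blood"
blocked = find_disallowed_terms("A knight holding two Swords", sample_disallow_list)
allowed = find_disallowed_terms("A child holding a teddy bear", sample_disallow_list)

if blocked:
    print(f"Prompt rejected, contains: {blocked}")
if not allowed:
    print("Second prompt passed the check")
```

A caller would skip the `client.images.generate` request whenever the returned list is non-empty.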