Lesson 4.3: Google (Gemini)
Duration: 45 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Understand Google's Gemini model family and capabilities
- Set up and authenticate with the Google AI API
- Make requests using the Google Generative AI SDK
- Work with Gemini's multimodal features (text, images, video)
- Leverage Gemini's integration with Google services
- Compare Gemini's strengths to other providers
Introduction
Google's Gemini is a family of multimodal AI models built from the ground up to understand and reason across text, images, audio, and video. Gemini is deeply integrated with Google's ecosystem, making it a powerful choice for applications that need to work with diverse content types. In this lesson, you will learn how to integrate Gemini into your TypeScript applications.
Gemini Model Lineup
Google offers several Gemini models optimized for different use cases:
┌─────────────────────────────────────────────────────────────────┐
│ Gemini Model Families │
├─────────────────┬───────────────────────────────────────────────┤
│ Gemini 2.0 │ Latest generation flagship │
│ Flash │ Fast, efficient, multimodal │
│ │ Best for: General tasks, real-time apps │
├─────────────────┼───────────────────────────────────────────────┤
│ Gemini 1.5 │ Excellent long-context capabilities │
│ Pro │ Up to 2M token context window │
│ │ Best for: Long documents, complex analysis │
├─────────────────┼───────────────────────────────────────────────┤
│ Gemini 1.5 │ Fast and cost-effective │
│ Flash │ Good multimodal support │
│ │ Best for: High-volume applications │
└─────────────────┴───────────────────────────────────────────────┘
Model Selection Guidelines
| Use Case | Recommended Model | Why |
|---|---|---|
| Real-time applications | gemini-2.0-flash | Fast inference |
| Long document analysis | gemini-1.5-pro | 2M context window |
| Video understanding | gemini-2.0-flash | Native video support |
| Cost-sensitive apps | gemini-1.5-flash | Best price/performance |
| Complex reasoning | gemini-1.5-pro | Best overall quality |
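If an application serves several of these use cases, it can help to centralize the model choice in one place. Here is a small sketch encoding the table above (the use-case names are illustrative; the model IDs match those used throughout this lesson):
type UseCase =
  | 'realtime'
  | 'long-document'
  | 'video'
  | 'cost-sensitive'
  | 'complex-reasoning';
// Encodes the selection table above in a single lookup.
const MODEL_FOR_USE_CASE: Record<UseCase, string> = {
  'realtime': 'gemini-2.0-flash',
  'long-document': 'gemini-1.5-pro',
  'video': 'gemini-2.0-flash',
  'cost-sensitive': 'gemini-1.5-flash',
  'complex-reasoning': 'gemini-1.5-pro',
};
function modelFor(useCase: UseCase): string {
  return MODEL_FOR_USE_CASE[useCase];
}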
Setting Up the Google AI SDK
Installation
npm install @google/generative-ai
Authentication
Get your API key from Google AI Studio (https://aistudio.google.com).
import { GoogleGenerativeAI } from '@google/generative-ai';
// Initialize with API key
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
// Get a specific model
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
Environment setup:
# .env file
GOOGLE_API_KEY=your-api-key-here
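Note that Node.js does not load .env files on its own. One common approach is the dotenv package (an extra dependency: npm install dotenv); newer Node versions (20.6+) also support a node --env-file=.env flag. A minimal sketch using dotenv:
// Load .env into process.env before anything reads GOOGLE_API_KEY.
import 'dotenv/config';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');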
Making Your First Request
Gemini uses a simple generateContent method:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function chat(prompt: string): Promise<string> {
const result = await model.generateContent(prompt);
const response = result.response;
return response.text();
}
// Usage
const answer = await chat('Explain TypeScript interfaces in simple terms');
console.log(answer);
Understanding the Response Structure
interface GenerateContentResult {
response: {
text(): string; // Get text content
candidates: Array<{
content: {
parts: Array<{ text: string }>;
role: string;
};
finishReason: string;
safetyRatings: Array<{
category: string;
probability: string;
}>;
}>;
usageMetadata?: {
promptTokenCount: number;
candidatesTokenCount: number;
totalTokenCount: number;
};
};
}
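Beyond text(), it is worth inspecting this metadata in production, for example whether generation finished naturally and how many tokens were consumed. A minimal sketch based on the structure above:
const result = await model.generateContent('Summarize TypeScript in one sentence.');
const response = result.response;

// 'STOP' indicates a natural finish; values such as 'MAX_TOKENS' or
// 'SAFETY' suggest the output may be truncated or blocked.
console.log('Finish reason:', response.candidates?.[0]?.finishReason);

// Token accounting, useful for the cost tracking covered later in this lesson.
console.log('Total tokens:', response.usageMetadata?.totalTokenCount);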
Using System Instructions
Gemini supports system instructions to define behavior:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction: `You are an expert TypeScript developer and teacher.
Your responses should:
- Be clear and beginner-friendly
- Include practical code examples
- Explain why, not just how
- Use analogies for complex concepts`,
});
async function askTypeScriptQuestion(question: string): Promise<string> {
const result = await model.generateContent(question);
return result.response.text();
}
// Usage
const answer = await askTypeScriptQuestion('What are union types?');
console.log(answer);
Multi-Turn Conversations
Use startChat for maintaining conversation context:
import { ChatSession, GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
class GeminiConversation {
private chat: ChatSession;
constructor(systemInstruction?: string) {
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction,
});
this.chat = model.startChat({
history: [],
generationConfig: {
maxOutputTokens: 2048,
},
});
}
async send(message: string): Promise<string> {
const result = await this.chat.sendMessage(message);
return result.response.text();
}
getHistory() {
return this.chat.getHistory();
}
}
// Usage
const conversation = new GeminiConversation(
'You are a friendly coding tutor helping students learn TypeScript.'
);
const response1 = await conversation.send('What are generics?');
console.log(response1);
const response2 = await conversation.send('Can you show a practical example?');
console.log(response2);
Multimodal Capabilities: Images
Gemini natively understands images alongside text:
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function analyzeImage(imagePath: string, prompt: string): Promise<string> {
// Read image and convert to base64
const imageBuffer = fs.readFileSync(imagePath);
const base64Image = imageBuffer.toString('base64');
// Determine MIME type
const extension = imagePath.split('.').pop()?.toLowerCase();
const mimeTypes: Record<string, string> = {
jpg: 'image/jpeg',
jpeg: 'image/jpeg',
png: 'image/png',
gif: 'image/gif',
webp: 'image/webp',
};
const mimeType = mimeTypes[extension || ''] || 'image/jpeg';
const result = await model.generateContent([
{
inlineData: {
mimeType,
data: base64Image,
},
},
prompt,
]);
return result.response.text();
}
// Usage
const description = await analyzeImage(
'./screenshot.png',
'Describe this UI screenshot and suggest improvements for accessibility.'
);
console.log(description);
Analyzing Multiple Images
async function compareImages(
image1Path: string,
image2Path: string,
prompt: string
): Promise<string> {
const image1 = fs.readFileSync(image1Path).toString('base64');
const image2 = fs.readFileSync(image2Path).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/png', data: image1 } },
{ inlineData: { mimeType: 'image/png', data: image2 } },
prompt,
]);
return result.response.text();
}
// Usage
const comparison = await compareImages(
'./design-v1.png',
'./design-v2.png',
'Compare these two UI designs. What changed? Which is better and why?'
);
Multimodal Capabilities: Video
Gemini can analyze video content:
import { GoogleGenerativeAI } from '@google/generative-ai';
import { GoogleAIFileManager } from '@google/generative-ai/server';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const fileManager = new GoogleAIFileManager(process.env.GOOGLE_API_KEY || '');
async function analyzeVideo(videoPath: string, prompt: string): Promise<string> {
// Upload the video file
const uploadResult = await fileManager.uploadFile(videoPath, {
mimeType: 'video/mp4',
displayName: 'Uploaded video',
});
// Wait for processing
let file = await fileManager.getFile(uploadResult.file.name);
while (file.state === 'PROCESSING') {
await new Promise((resolve) => setTimeout(resolve, 5000));
file = await fileManager.getFile(uploadResult.file.name);
}
if (file.state === 'FAILED') {
throw new Error('Video processing failed');
}
// Generate content with the video
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
const result = await model.generateContent([
{
fileData: {
mimeType: file.mimeType,
fileUri: file.uri,
},
},
prompt,
]);
return result.response.text();
}
// Usage
const analysis = await analyzeVideo(
'./demo.mp4',
'Summarize what happens in this video. Include timestamps for key moments.'
);
console.log(analysis);
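One housekeeping note: files uploaded through the Files API are retained only temporarily (currently around 48 hours) and count against your project's file storage, so it is good practice to delete them once the analysis is done. A minimal sketch, assuming the file object from the upload step above is still in scope:
// Optional cleanup: remove the uploaded video once we have the analysis.
await fileManager.deleteFile(file.name);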
Generation Configuration
Fine-tune Gemini's output with configuration options:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: {
// Temperature: 0-2, controls randomness
temperature: 0.7,
// Top P: nucleus sampling threshold
topP: 0.95,
// Top K: number of tokens to consider
topK: 40,
// Maximum output length
maxOutputTokens: 2048,
// Stop sequences
stopSequences: ['END', '---'],
},
});
async function generateCreativeContent(prompt: string): Promise<string> {
const result = await model.generateContent(prompt);
return result.response.text();
}
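These settings can also be supplied per request: a generationConfig on the request object overrides the model-level defaults for that call. A minimal sketch:
// Per-request override: this call uses temperature 0 regardless of the
// model-level default set above.
const result = await model.generateContent({
  contents: [
    { role: 'user', parts: [{ text: 'List three TypeScript utility types.' }] },
  ],
  generationConfig: { temperature: 0, maxOutputTokens: 256 },
});
console.log(result.response.text());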
Requesting JSON Output
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: {
responseMimeType: 'application/json',
},
});
interface ExtractedData {
name: string;
topics: string[];
sentiment: 'positive' | 'negative' | 'neutral';
}
async function extractStructuredData(text: string): Promise<ExtractedData> {
const prompt = `Extract information from this text and return as JSON:
Text: "${text}"
Return JSON with: name (string), topics (array of strings), sentiment (positive/negative/neutral)`;
const result = await model.generateContent(prompt);
return JSON.parse(result.response.text());
}
// Usage
const data = await extractStructuredData(
"Hi, I'm Alex. I really enjoyed your presentation about TypeScript and React!"
);
console.log(data);
// { name: "Alex", topics: ["TypeScript", "React"], sentiment: "positive" }
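For stronger guarantees than prompt instructions alone, recent SDK versions also accept a responseSchema alongside responseMimeType, constraining the output to a declared shape. A sketch, assuming an SDK version that exports SchemaType and supports responseSchema:
import { GoogleGenerativeAI, SchemaType } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const schemaModel = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash',
  generationConfig: {
    responseMimeType: 'application/json',
    // Mirrors the ExtractedData interface above.
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        name: { type: SchemaType.STRING },
        topics: { type: SchemaType.ARRAY, items: { type: SchemaType.STRING } },
        sentiment: { type: SchemaType.STRING },
      },
      required: ['name', 'topics', 'sentiment'],
    },
  },
});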
Error Handling
Handle Gemini-specific errors properly:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function safeGenerate(prompt: string): Promise<string> {
try {
const result = await model.generateContent(prompt);
return result.response.text();
} catch (error) {
if (error instanceof Error) {
// Check for specific error types
if (error.message.includes('SAFETY')) {
console.error('Content blocked by safety filters');
return '';
}
if (error.message.includes('QUOTA')) {
console.error('API quota exceeded');
return '';
}
if (error.message.includes('INVALID_ARGUMENT')) {
console.error('Invalid request:', error.message);
return '';
}
if (error.message.includes('UNAVAILABLE')) {
console.error('Service temporarily unavailable');
// Implement retry logic
return '';
}
console.error('API error:', error.message);
}
throw error;
}
}
Retry Logic
async function generateWithRetry(prompt: string, maxRetries: number = 3): Promise<string> {
let lastError: Error | null = null;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await model.generateContent(prompt);
return result.response.text();
} catch (error) {
lastError = error as Error;
if (error instanceof Error) {
// Retry on temporary errors
if (error.message.includes('UNAVAILABLE') || error.message.includes('RESOURCE_EXHAUSTED')) {
const delay = Math.pow(2, attempt) * 1000;
console.log(`Retrying in ${delay}ms...`);
await new Promise((resolve) => setTimeout(resolve, delay));
continue;
}
}
// Don't retry on other errors
throw error;
}
}
throw lastError ?? new Error('Max retries exceeded');
}
Understanding Pricing
Google offers competitive pricing for Gemini:
┌─────────────────────────────────────────────────────────────────┐
│ Gemini Pricing (as of 2024) │
├─────────────────┬──────────────────┬────────────────────────────┤
│ Model │ Input (per 1M) │ Output (per 1M) │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 Pro │ $1.25 │ $5.00 │
│ (up to 128K) │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 Pro │ $2.50 │ $10.00 │
│ (128K-2M) │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 │ $0.075 │ $0.30 │
│ Flash │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 2.0 │ Free tier │ Free tier │
│ Flash │ available │ available │
└─────────────────┴──────────────────┴────────────────────────────┘
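Combined with the usage metadata shown in the next subsection, these figures make it straightforward to estimate per-request cost. A rough sketch (prices hardcoded from the table above, so verify against current pricing before relying on it):
// Approximate USD prices per 1M tokens, taken from the table above.
const PRICING: Record<string, { input: number; output: number }> = {
  'gemini-1.5-pro': { input: 1.25, output: 5.0 }, // up to 128K context
  'gemini-1.5-flash': { input: 0.075, output: 0.3 },
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing data for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// e.g. 10K input + 1K output on Flash ≈ $0.00105
console.log(estimateCostUSD('gemini-1.5-flash', 10_000, 1_000));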
Tracking Usage
interface UsageReport {
inputTokens: number;
outputTokens: number;
totalTokens: number;
}
async function generateWithUsage(prompt: string): Promise<{
text: string;
usage: UsageReport;
}> {
const result = await model.generateContent(prompt);
const response = result.response;
const usage = response.usageMetadata;
return {
text: response.text(),
usage: {
inputTokens: usage?.promptTokenCount || 0,
outputTokens: usage?.candidatesTokenCount || 0,
totalTokens: usage?.totalTokenCount || 0,
},
};
}
Gemini's Unique Strengths
1. Massive Context Window
Gemini 1.5 Pro supports up to 2 million tokens:
async function analyzeEntireCodebase(files: Map<string, string>): Promise<string> {
// Combine all files into context
let codebaseContext = '';
for (const [path, content] of files) {
codebaseContext += `\n--- File: ${path} ---\n${content}\n`;
}
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });
const result = await model.generateContent(`
Analyze this entire codebase and provide:
1. Overall architecture summary
2. Main dependencies and their purposes
3. Potential issues or improvements
4. Code quality assessment
${codebaseContext}
`);
return result.response.text();
}
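Even 2 million tokens is a hard ceiling, so for inputs this large it is worth measuring the prompt before sending it. The SDK provides a countTokens method; a minimal sketch:
// Check a large prompt against the context window before sending it.
async function fitsInContext(prompt: string, limit: number = 2_000_000): Promise<boolean> {
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });
  const { totalTokens } = await model.countTokens(prompt);
  console.log(`Prompt size: ${totalTokens} tokens`);
  return totalTokens <= limit;
}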
2. Native Video Understanding
async function generateVideoSummary(videoUri: string): Promise<{
summary: string;
keyMoments: Array<{ timestamp: string; description: string }>;
}> {
// Use JSON mode so the response parses reliably below.
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: { responseMimeType: 'application/json' },
});
const result = await model.generateContent([
{
fileData: {
mimeType: 'video/mp4',
fileUri: videoUri,
},
},
`Analyze this video and provide:
1. A comprehensive summary
2. Key moments with timestamps
Return as JSON with "summary" (string) and "keyMoments" (array of objects with "timestamp" and "description")`,
]);
return JSON.parse(result.response.text());
}
3. Google Ecosystem Integration
Gemini integrates well with Google Cloud services:
// Example: Using with Google Cloud Storage
import { Storage } from '@google-cloud/storage';
const storage = new Storage();
async function analyzeCloudStorageImage(bucket: string, fileName: string): Promise<string> {
// Get signed URL for the file
const [url] = await storage
.bucket(bucket)
.file(fileName)
.getSignedUrl({
action: 'read',
expires: Date.now() + 15 * 60 * 1000, // 15 minutes
});
// Download and analyze
const response = await fetch(url);
const buffer = await response.arrayBuffer();
const base64 = Buffer.from(buffer).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/jpeg', data: base64 } },
'Describe this image in detail.',
]);
return result.response.text();
}
Safety Settings
Configure content safety filters:
import { GoogleGenerativeAI, HarmBlockThreshold, HarmCategory } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
],
});
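When a prompt trips one of these filters, calling response.text() throws; checking promptFeedback first lets you fail gracefully. A minimal sketch using the model configured above:
async function generateWithSafetyCheck(prompt: string): Promise<string | null> {
  const result = await model.generateContent(prompt);
  // If the prompt itself was blocked, no candidates were generated
  // and text() would throw.
  const feedback = result.response.promptFeedback;
  if (feedback?.blockReason) {
    console.warn(`Prompt blocked: ${feedback.blockReason}`);
    return null;
  }
  return result.response.text();
}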
Exercises
Exercise 1: Image Comparison Tool
Create a function that compares two images and describes the differences:
// Your implementation here
async function compareImages(
image1Path: string,
image2Path: string
): Promise<{
similarities: string[];
differences: string[];
recommendation: string;
}> {
// TODO: Implement using Gemini vision
}
Solution
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: { responseMimeType: 'application/json' },
});
async function compareImages(
image1Path: string,
image2Path: string
): Promise<{
similarities: string[];
differences: string[];
recommendation: string;
}> {
const image1 = fs.readFileSync(image1Path).toString('base64');
const image2 = fs.readFileSync(image2Path).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/png', data: image1 } },
{ inlineData: { mimeType: 'image/png', data: image2 } },
`Compare these two images in detail.
Return JSON with:
- similarities: array of things that are the same
- differences: array of things that are different
- recommendation: which image is better and why (or "both are equal" if applicable)`,
]);
return JSON.parse(result.response.text());
}
// Test
const comparison = await compareImages('./v1.png', './v2.png');
console.log('Similarities:', comparison.similarities);
console.log('Differences:', comparison.differences);
console.log('Recommendation:', comparison.recommendation);
Exercise 2: Document Q&A System
Build a system that answers questions about uploaded documents:
// Your implementation here
class DocumentQA {
private documentContent: string;
constructor() {
this.documentContent = '';
}
async loadDocument(filePath: string): Promise<void> {
// TODO: Load document content
}
async ask(question: string): Promise<string> {
// TODO: Answer questions about the loaded document
}
}
Solution
import { ChatSession, GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
class DocumentQA {
private chat: ChatSession | null = null;
private documentContent: string = '';
async loadDocument(filePath: string): Promise<void> {
this.documentContent = fs.readFileSync(filePath, 'utf-8');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction: `You are a helpful assistant that answers questions about the provided document.
Always base your answers on the document content.
If the answer is not in the document, say so.
Quote relevant sections when helpful.`,
});
this.chat = model.startChat({
history: [
{
role: 'user',
parts: [{ text: `Here is the document to analyze:\n\n${this.documentContent}` }],
},
{
role: 'model',
parts: [{ text: "I've read the document. Feel free to ask me any questions about it." }],
},
],
});
}
async ask(question: string): Promise<string> {
if (!this.chat) {
throw new Error('No document loaded. Call loadDocument() first.');
}
const result = await this.chat.sendMessage(question);
return result.response.text();
}
}
// Test
const qa = new DocumentQA();
await qa.loadDocument('./terms-of-service.txt');
const answer1 = await qa.ask('What is the cancellation policy?');
console.log(answer1);
const answer2 = await qa.ask('Are there any fees mentioned?');
console.log(answer2);
Exercise 3: Multi-Image Story Generator
Create a function that generates a story based on multiple images:
// Your implementation here
async function generateStory(
imagePaths: string[],
genre: 'adventure' | 'mystery' | 'comedy'
): Promise<string> {
// TODO: Generate a story that incorporates all images
}
Solution
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function generateStory(
imagePaths: string[],
genre: 'adventure' | 'mystery' | 'comedy'
): Promise<string> {
const images = imagePaths.map((path) => {
const data = fs.readFileSync(path).toString('base64');
const ext = path.split('.').pop()?.toLowerCase();
const mimeType = ext === 'png' ? 'image/png' : 'image/jpeg';
return { inlineData: { mimeType, data } };
});
const genreInstructions = {
adventure: 'exciting and action-packed with a brave protagonist',
mystery: 'suspenseful with clues and an unexpected twist',
comedy: 'humorous and lighthearted with witty dialogue',
};
const result = await model.generateContent([
...images,
`Create a ${genre} story that incorporates ALL of these images as key scenes.
Requirements:
- The story should be ${genreInstructions[genre]}
- Each image should represent a distinct chapter or scene
- The narrative should flow naturally between the images
- Length: 500-800 words
- Include vivid descriptions that match what's in the images
Start the story now.`,
]);
return result.response.text();
}
// Test
const story = await generateStory(['./scene1.jpg', './scene2.jpg', './scene3.jpg'], 'adventure');
console.log(story);
Key Takeaways
- Model Selection: Use Flash models for speed, Pro for quality and long context
- Multimodal Native: Gemini handles images, video, and text in the same request
- Massive Context: 2M token window enables analyzing entire codebases or books
- JSON Mode: Use responseMimeType: "application/json" for structured output
- Free Tier: Generous free tier for development and testing
- Safety Settings: Configurable content filters for production applications
- Google Integration: Works well with Google Cloud services
Resources
| Resource | Type | Description |
|---|---|---|
| Google AI Documentation | Documentation | Official API documentation |
| Google AI Studio | Tool | Interactive prompt testing |
| Gemini API Cookbook | Tutorial | Practical examples |
| Google AI TypeScript SDK | Repository | Official SDK source |
Next Lesson
You have learned how to work with Google's Gemini models. In the next lesson, you will explore open-source alternatives like Llama and Mistral, which offer the flexibility of self-hosting and customization.