Lesson 4.3: Google (Gemini)
Duration: 45 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Understand Google's Gemini model family and capabilities
- Set up and authenticate with the Google AI API
- Make requests using the Google Generative AI SDK
- Work with Gemini's multimodal features (text, images, video)
- Leverage Gemini's integration with Google services
- Compare Gemini's strengths to other providers
Introduction
Google's Gemini is a family of multimodal AI models built from the ground up to understand and reason across text, images, audio, and video. Gemini is deeply integrated with Google's ecosystem, making it a powerful choice for applications that need to work with diverse content types. In this lesson, you will learn how to integrate Gemini into your TypeScript applications.
Gemini Model Lineup
Google offers several Gemini models optimized for different use cases:
┌─────────────────────────────────────────────────────────────────┐
│ Gemini Model Families │
├─────────────────┬───────────────────────────────────────────────┤
│ Gemini 2.0 │ Latest generation flagship │
│ Flash │ Fast, efficient, multimodal │
│ │ Best for: General tasks, real-time apps │
├─────────────────┼───────────────────────────────────────────────┤
│ Gemini 1.5 │ Excellent long-context capabilities │
│ Pro │ Up to 2M token context window │
│ │ Best for: Long documents, complex analysis │
├─────────────────┼───────────────────────────────────────────────┤
│ Gemini 1.5 │ Fast and cost-effective │
│ Flash │ Good multimodal support │
│ │ Best for: High-volume applications │
└─────────────────┴───────────────────────────────────────────────┘
Model Selection Guidelines
| Use Case | Recommended Model | Why |
|---|---|---|
| Real-time applications | gemini-2.0-flash | Fast inference |
| Long document analysis | gemini-1.5-pro | 2M context window |
| Video understanding | gemini-2.0-flash | Native video support |
| Cost-sensitive apps | gemini-1.5-flash | Best price/performance |
| Complex reasoning | gemini-1.5-pro | Best overall quality |
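If an application serves several of these use cases, it can help to centralize the model choice in one place. Here is a small sketch encoding the table above (the use-case names are illustrative; the model IDs match those used throughout this lesson):
type UseCase =
  | 'realtime'
  | 'long-document'
  | 'video'
  | 'cost-sensitive'
  | 'complex-reasoning';
// Encodes the selection table above in a single lookup.
const MODEL_FOR_USE_CASE: Record<UseCase, string> = {
  'realtime': 'gemini-2.0-flash',
  'long-document': 'gemini-1.5-pro',
  'video': 'gemini-2.0-flash',
  'cost-sensitive': 'gemini-1.5-flash',
  'complex-reasoning': 'gemini-1.5-pro',
};
function modelFor(useCase: UseCase): string {
  return MODEL_FOR_USE_CASE[useCase];
}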
Setting Up the Google AI SDK
Installation
npm install @google/generative-ai
Authentication
Get your API key from Google AI Studio (https://aistudio.google.com).
import { GoogleGenerativeAI } from '@google/generative-ai';
// Initialize with API key
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
// Get a specific model
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
Environment setup:
# .env file
GOOGLE_API_KEY=your-api-key-here
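Note that Node.js does not load .env files on its own. One common approach is the dotenv package (an extra dependency: npm install dotenv); newer Node versions (20.6+) also support a node --env-file=.env flag. A minimal sketch using dotenv:
// Load .env into process.env before anything reads GOOGLE_API_KEY.
import 'dotenv/config';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');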
Making Your First Request
Gemini uses a simple generateContent method:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function chat(prompt: string): Promise<string> {
const result = await model.generateContent(prompt);
const response = result.response;
return response.text();
}
// Usage
const answer = await chat('Explain TypeScript interfaces in simple terms');
console.log(answer);
Understanding the Response Structure
interface GenerateContentResult {
response: {
text(): string; // Get text content
candidates: Array<{
content: {
parts: Array<{ text: string }>;
role: string;
};
finishReason: string;
safetyRatings: Array<{
category: string;
probability: string;
}>;
}>;
usageMetadata?: {
promptTokenCount: number;
candidatesTokenCount: number;
totalTokenCount: number;
};
};
}
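Beyond text(), it is worth inspecting this metadata in production, for example whether generation finished naturally and how many tokens were consumed. A minimal sketch based on the structure above:
const result = await model.generateContent('Summarize TypeScript in one sentence.');
const response = result.response;

// 'STOP' indicates a natural finish; values such as 'MAX_TOKENS' or
// 'SAFETY' suggest the output may be truncated or blocked.
console.log('Finish reason:', response.candidates?.[0]?.finishReason);

// Token accounting, useful for the cost tracking covered later in this lesson.
console.log('Total tokens:', response.usageMetadata?.totalTokenCount);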
Using System Instructions
Gemini supports system instructions to define behavior:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction: `You are an expert TypeScript developer and teacher.
Your responses should:
- Be clear and beginner-friendly
- Include practical code examples
- Explain why, not just how
- Use analogies for complex concepts`,
});
async function askTypeScriptQuestion(question: string): Promise<string> {
const result = await model.generateContent(question);
return result.response.text();
}
// Usage
const answer = await askTypeScriptQuestion('What are union types?');
console.log(answer);
Multi-Turn Conversations
Use startChat for maintaining conversation context:
import { ChatSession, GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
class GeminiConversation {
private chat: ChatSession;
constructor(systemInstruction?: string) {
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction,
});
this.chat = model.startChat({
history: [],
generationConfig: {
maxOutputTokens: 2048,
},
});
}
async send(message: string): Promise<string> {
const result = await this.chat.sendMessage(message);
return result.response.text();
}
getHistory() {
return this.chat.getHistory();
}
}
// Usage
const conversation = new GeminiConversation(
'You are a friendly coding tutor helping students learn TypeScript.'
);
const response1 = await conversation.send('What are generics?');
console.log(response1);
const response2 = await conversation.send('Can you show a practical example?');
console.log(response2);
Multimodal Capabilities: Images
Gemini natively understands images alongside text:
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function analyzeImage(imagePath: string, prompt: string): Promise<string> {
// Read image and convert to base64
const imageBuffer = fs.readFileSync(imagePath);
const base64Image = imageBuffer.toString('base64');
// Determine MIME type
const extension = imagePath.split('.').pop()?.toLowerCase();
const mimeTypes: Record<string, string> = {
jpg: 'image/jpeg',
jpeg: 'image/jpeg',
png: 'image/png',
gif: 'image/gif',
webp: 'image/webp',
};
const mimeType = mimeTypes[extension || ''] || 'image/jpeg';
const result = await model.generateContent([
{
inlineData: {
mimeType,
data: base64Image,
},
},
prompt,
]);
return result.response.text();
}
// Usage
const description = await analyzeImage(
'./screenshot.png',
'Describe this UI screenshot and suggest improvements for accessibility.'
);
console.log(description);
Analyzing Multiple Images
async function compareImages(
image1Path: string,
image2Path: string,
prompt: string
): Promise<string> {
const image1 = fs.readFileSync(image1Path).toString('base64');
const image2 = fs.readFileSync(image2Path).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/png', data: image1 } },
{ inlineData: { mimeType: 'image/png', data: image2 } },
prompt,
]);
return result.response.text();
}
// Usage
const comparison = await compareImages(
'./design-v1.png',
'./design-v2.png',
'Compare these two UI designs. What changed? Which is better and why?'
);
Multimodal Capabilities: Video
Gemini can analyze video content:
import { GoogleGenerativeAI } from '@google/generative-ai';
import { GoogleAIFileManager } from '@google/generative-ai/server';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const fileManager = new GoogleAIFileManager(process.env.GOOGLE_API_KEY || '');
async function analyzeVideo(videoPath: string, prompt: string): Promise<string> {
// Upload the video file
const uploadResult = await fileManager.uploadFile(videoPath, {
mimeType: 'video/mp4',
displayName: 'Uploaded video',
});
// Wait for processing
let file = await fileManager.getFile(uploadResult.file.name);
while (file.state === 'PROCESSING') {
await new Promise((resolve) => setTimeout(resolve, 5000));
file = await fileManager.getFile(uploadResult.file.name);
}
if (file.state === 'FAILED') {
throw new Error('Video processing failed');
}
// Generate content with the video
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
const result = await model.generateContent([
{
fileData: {
mimeType: file.mimeType,
fileUri: file.uri,
},
},
prompt,
]);
return result.response.text();
}
// Usage
const analysis = await analyzeVideo(
'./demo.mp4',
'Summarize what happens in this video. Include timestamps for key moments.'
);
console.log(analysis);
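One housekeeping note: files uploaded through the Files API are retained only temporarily (currently around 48 hours) and count against your project's file storage, so it is good practice to delete them once the analysis is done. A minimal sketch, assuming the file object from the upload step above is still in scope:
// Optional cleanup: remove the uploaded video once we have the analysis.
await fileManager.deleteFile(file.name);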
Generation Configuration
Fine-tune Gemini's output with configuration options:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: {
// Temperature: 0-2, controls randomness
temperature: 0.7,
// Top P: nucleus sampling threshold
topP: 0.95,
// Top K: number of tokens to consider
topK: 40,
// Maximum output length
maxOutputTokens: 2048,
// Stop sequences
stopSequences: ['END', '---'],
},
});
async function generateCreativeContent(prompt: string): Promise<string> {
const result = await model.generateContent(prompt);
return result.response.text();
}
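These settings can also be supplied per request: a generationConfig on the request object overrides the model-level defaults for that call. A minimal sketch:
// Per-request override: this call uses temperature 0 regardless of the
// model-level default set above.
const result = await model.generateContent({
  contents: [
    { role: 'user', parts: [{ text: 'List three TypeScript utility types.' }] },
  ],
  generationConfig: { temperature: 0, maxOutputTokens: 256 },
});
console.log(result.response.text());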
Requesting JSON Output
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: {
responseMimeType: 'application/json',
},
});
interface ExtractedData {
name: string;
topics: string[];
sentiment: 'positive' | 'negative' | 'neutral';
}
async function extractStructuredData(text: string): Promise<ExtractedData> {
const prompt = `Extract information from this text and return as JSON:
Text: "${text}"
Return JSON with: name (string), topics (array of strings), sentiment (positive/negative/neutral)`;
const result = await model.generateContent(prompt);
return JSON.parse(result.response.text());
}
// Usage
const data = await extractStructuredData(
"Hi, I'm Alex. I really enjoyed your presentation about TypeScript and React!"
);
console.log(data);
// { name: "Alex", topics: ["TypeScript", "React"], sentiment: "positive" }
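For stronger guarantees than prompt instructions alone, recent SDK versions also accept a responseSchema alongside responseMimeType, constraining the output to a declared shape. A sketch, assuming an SDK version that exports SchemaType and supports responseSchema:
import { GoogleGenerativeAI, SchemaType } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const schemaModel = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash',
  generationConfig: {
    responseMimeType: 'application/json',
    // Mirrors the ExtractedData interface above.
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        name: { type: SchemaType.STRING },
        topics: { type: SchemaType.ARRAY, items: { type: SchemaType.STRING } },
        sentiment: { type: SchemaType.STRING },
      },
      required: ['name', 'topics', 'sentiment'],
    },
  },
});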
Error Handling
Handle Gemini-specific errors properly:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function safeGenerate(prompt: string): Promise<string> {
try {
const result = await model.generateContent(prompt);
return result.response.text();
} catch (error) {
if (error instanceof Error) {
// Check for specific error types
if (error.message.includes('SAFETY')) {
console.error('Content blocked by safety filters');
return '';
}
if (error.message.includes('QUOTA')) {
console.error('API quota exceeded');
return '';
}
if (error.message.includes('INVALID_ARGUMENT')) {
console.error('Invalid request:', error.message);
return '';
}
if (error.message.includes('UNAVAILABLE')) {
console.error('Service temporarily unavailable');
// Implement retry logic
return '';
}
console.error('API error:', error.message);
}
throw error;
}
}
Retry Logic
async function generateWithRetry(prompt: string, maxRetries: number = 3): Promise<string> {
let lastError: Error | null = null;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await model.generateContent(prompt);
return result.response.text();
} catch (error) {
lastError = error as Error;
if (error instanceof Error) {
// Retry on temporary errors
if (error.message.includes('UNAVAILABLE') || error.message.includes('RESOURCE_EXHAUSTED')) {
const delay = Math.pow(2, attempt) * 1000;
console.log(`Retrying in ${delay}ms...`);
await new Promise((resolve) => setTimeout(resolve, delay));
continue;
}
}
// Don't retry on other errors
throw error;
}
}
throw lastError ?? new Error('Max retries exceeded');
}
Understanding Pricing
Google offers competitive pricing for Gemini:
┌─────────────────────────────────────────────────────────────────┐
│ Gemini Pricing (as of 2024) │
├─────────────────┬──────────────────┬────────────────────────────┤
│ Model │ Input (per 1M) │ Output (per 1M) │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 Pro │ $1.25 │ $5.00 │
│ (up to 128K) │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 Pro │ $2.50 │ $10.00 │
│ (128K-2M) │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 1.5 │ $0.075 │ $0.30 │
│ Flash │ │ │
├─────────────────┼──────────────────┼────────────────────────────┤
│ Gemini 2.0 │ Free tier │ Free tier │
│ Flash │ available │ available │
└─────────────────┴──────────────────┴────────────────────────────┘
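Combined with the usage metadata shown in the next subsection, these figures make it straightforward to estimate per-request cost. A rough sketch (prices hardcoded from the table above, so verify against current pricing before relying on it):
// Approximate USD prices per 1M tokens, taken from the table above.
const PRICING: Record<string, { input: number; output: number }> = {
  'gemini-1.5-pro': { input: 1.25, output: 5.0 }, // up to 128K context
  'gemini-1.5-flash': { input: 0.075, output: 0.3 },
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing data for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// e.g. 10K input + 1K output on Flash ≈ $0.00105
console.log(estimateCostUSD('gemini-1.5-flash', 10_000, 1_000));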
Tracking Usage
interface UsageReport {
inputTokens: number;
outputTokens: number;
totalTokens: number;
}
async function generateWithUsage(prompt: string): Promise<{
text: string;
usage: UsageReport;
}> {
const result = await model.generateContent(prompt);
const response = result.response;
const usage = response.usageMetadata;
return {
text: response.text(),
usage: {
inputTokens: usage?.promptTokenCount || 0,
outputTokens: usage?.candidatesTokenCount || 0,
totalTokens: usage?.totalTokenCount || 0,
},
};
}
Gemini's Unique Strengths
1. Massive Context Window
Gemini 1.5 Pro supports up to 2 million tokens:
async function analyzeEntireCodebase(files: Map<string, string>): Promise<string> {
// Combine all files into context
let codebaseContext = '';
for (const [path, content] of files) {
codebaseContext += `\n--- File: ${path} ---\n${content}\n`;
}
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });
const result = await model.generateContent(`
Analyze this entire codebase and provide:
1. Overall architecture summary
2. Main dependencies and their purposes
3. Potential issues or improvements
4. Code quality assessment
${codebaseContext}
`);
return result.response.text();
}
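Even 2 million tokens is a hard ceiling, so for inputs this large it is worth measuring the prompt before sending it. The SDK provides a countTokens method; a minimal sketch:
// Check a large prompt against the context window before sending it.
async function fitsInContext(prompt: string, limit: number = 2_000_000): Promise<boolean> {
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });
  const { totalTokens } = await model.countTokens(prompt);
  console.log(`Prompt size: ${totalTokens} tokens`);
  return totalTokens <= limit;
}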
2. Native Video Understanding
async function generateVideoSummary(videoUri: string): Promise<{
summary: string;
keyMoments: Array<{ timestamp: string; description: string }>;
}> {
// Use JSON mode so the response parses reliably below.
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: { responseMimeType: 'application/json' },
});
const result = await model.generateContent([
{
fileData: {
mimeType: 'video/mp4',
fileUri: videoUri,
},
},
`Analyze this video and provide:
1. A comprehensive summary
2. Key moments with timestamps
Return as JSON with "summary" (string) and "keyMoments" (array of objects with "timestamp" and "description")`,
]);
return JSON.parse(result.response.text());
}
3. Google Ecosystem Integration
Gemini integrates well with Google Cloud services:
// Example: Using with Google Cloud Storage
import { Storage } from '@google-cloud/storage';
const storage = new Storage();
async function analyzeCloudStorageImage(bucket: string, fileName: string): Promise<string> {
// Get signed URL for the file
const [url] = await storage
.bucket(bucket)
.file(fileName)
.getSignedUrl({
action: 'read',
expires: Date.now() + 15 * 60 * 1000, // 15 minutes
});
// Download and analyze
const response = await fetch(url);
const buffer = await response.arrayBuffer();
const base64 = Buffer.from(buffer).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/jpeg', data: base64 } },
'Describe this image in detail.',
]);
return result.response.text();
}
Safety Settings
Configure content safety filters:
import { GoogleGenerativeAI, HarmBlockThreshold, HarmCategory } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
],
});
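When a prompt trips one of these filters, calling response.text() throws; checking promptFeedback first lets you fail gracefully. A minimal sketch using the model configured above:
async function generateWithSafetyCheck(prompt: string): Promise<string | null> {
  const result = await model.generateContent(prompt);
  // If the prompt itself was blocked, no candidates were generated
  // and text() would throw.
  const feedback = result.response.promptFeedback;
  if (feedback?.blockReason) {
    console.warn(`Prompt blocked: ${feedback.blockReason}`);
    return null;
  }
  return result.response.text();
}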
Exercises
Exercise 1: Image Comparison Tool
Create a function that compares two images and describes the differences:
// Your implementation here
async function compareImages(
image1Path: string,
image2Path: string
): Promise<{
similarities: string[];
differences: string[];
recommendation: string;
}> {
// TODO: Implement using Gemini vision
}
Solution
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
generationConfig: { responseMimeType: 'application/json' },
});
async function compareImages(
image1Path: string,
image2Path: string
): Promise<{
similarities: string[];
differences: string[];
recommendation: string;
}> {
const image1 = fs.readFileSync(image1Path).toString('base64');
const image2 = fs.readFileSync(image2Path).toString('base64');
const result = await model.generateContent([
{ inlineData: { mimeType: 'image/png', data: image1 } },
{ inlineData: { mimeType: 'image/png', data: image2 } },
`Compare these two images in detail.
Return JSON with:
- similarities: array of things that are the same
- differences: array of things that are different
- recommendation: which image is better and why (or "both are equal" if applicable)`,
]);
return JSON.parse(result.response.text());
}
// Test
const comparison = await compareImages('./v1.png', './v2.png');
console.log('Similarities:', comparison.similarities);
console.log('Differences:', comparison.differences);
console.log('Recommendation:', comparison.recommendation);
Exercise 2: Document Q&A System
Build a system that answers questions about uploaded documents:
// Your implementation here
class DocumentQA {
private documentContent: string;
constructor() {
this.documentContent = '';
}
async loadDocument(filePath: string): Promise<void> {
// TODO: Load document content
}
async ask(question: string): Promise<string> {
// TODO: Answer questions about the loaded document
}
}
Solution
import { ChatSession, GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
class DocumentQA {
private chat: ChatSession | null = null;
private documentContent: string = '';
async loadDocument(filePath: string): Promise<void> {
this.documentContent = fs.readFileSync(filePath, 'utf-8');
const model = genAI.getGenerativeModel({
model: 'gemini-1.5-flash',
systemInstruction: `You are a helpful assistant that answers questions about the provided document.
Always base your answers on the document content.
If the answer is not in the document, say so.
Quote relevant sections when helpful.`,
});
this.chat = model.startChat({
history: [
{
role: 'user',
parts: [{ text: `Here is the document to analyze:\n\n${this.documentContent}` }],
},
{
role: 'model',
parts: [{ text: "I've read the document. Feel free to ask me any questions about it." }],
},
],
});
}
async ask(question: string): Promise<string> {
if (!this.chat) {
throw new Error('No document loaded. Call loadDocument() first.');
}
const result = await this.chat.sendMessage(question);
return result.response.text();
}
}
// Test
const qa = new DocumentQA();
await qa.loadDocument('./terms-of-service.txt');
const answer1 = await qa.ask('What is the cancellation policy?');
console.log(answer1);
const answer2 = await qa.ask('Are there any fees mentioned?');
console.log(answer2);
Exercise 3: Multi-Image Story Generator
Create a function that generates a story based on multiple images:
// Your implementation here
async function generateStory(
imagePaths: string[],
genre: 'adventure' | 'mystery' | 'comedy'
): Promise<string> {
// TODO: Generate a story that incorporates all images
}
Solution
import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'fs';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY || '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
async function generateStory(
imagePaths: string[],
genre: 'adventure' | 'mystery' | 'comedy'
): Promise<string> {
const images = imagePaths.map((path) => {
const data = fs.readFileSync(path).toString('base64');
const ext = path.split('.').pop()?.toLowerCase();
const mimeType = ext === 'png' ? 'image/png' : 'image/jpeg';
return { inlineData: { mimeType, data } };
});
const genreInstructions = {
adventure: 'exciting and action-packed with a brave protagonist',
mystery: 'suspenseful with clues and an unexpected twist',
comedy: 'humorous and lighthearted with witty dialogue',
};
const result = await model.generateContent([
...images,
`Create a ${genre} story that incorporates ALL of these images as key scenes.
Requirements:
- The story should be ${genreInstructions[genre]}
- Each image should represent a distinct chapter or scene
- The narrative should flow naturally between the images
- Length: 500-800 words
- Include vivid descriptions that match what's in the images
Start the story now.`,
]);
return result.response.text();
}
// Test
const story = await generateStory(['./scene1.jpg', './scene2.jpg', './scene3.jpg'], 'adventure');
console.log(story);
Key Takeaways
- Model Selection: Use Flash models for speed, Pro for quality and long context
- Multimodal Native: Gemini handles images, video, and text in the same request
- Massive Context: 2M token window enables analyzing entire codebases or books
- JSON Mode: Use responseMimeType: "application/json" for structured output
- Free Tier: Generous free tier for development and testing
- Safety Settings: Configurable content filters for production applications
- Google Integration: Works well with Google Cloud services
Resources
| Resource | Type | Description |
|---|---|---|
| Google AI Documentation | Documentation | Official API documentation |
| Google AI Studio | Tool | Interactive prompt testing |
| Gemini API Cookbook | Tutorial | Practical examples |
| Google AI TypeScript SDK | Repository | Official SDK source |
Next Lesson
You have learned how to work with Google's Gemini models. In the next lesson, you will explore open-source alternatives like Llama and Mistral, which offer the flexibility of self-hosting and customization.