Androidify: How Androidify leverages Gemini, Firebase and ML Kit

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences. In this post, we'll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we'll provide some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app. App flow The overall app functions as follows, with various parts of it using Gemini and Firebase along the way: Gemini and image validation To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar. Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and assessing image safety, including whether the image contains abusive content. val jsonSchema = Schema.obj( properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()), optionalProperties = listOf("error"), ) val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()) .generativeModel( modelName = "gemini-2.5-flash-preview-04-17", generationConfig = generationConfig { responseMimeType = "application/json" responseSchema = jsonSchema }, safetySettings = listOf( SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE), SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE), SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE), SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE), SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE), ), ) val response = generativeModel.generateContent( content { text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)") image(image) }, ) val jsonResponse = Json.parseToJsonElement(response.text) val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content In the snippet above, we’re leveraging structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig. We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a json object with success = true/false and an optional error message explaining why the image doesn't have enough information. Structured output is a powerful feature enabling a smoother integration of LLMs to your app by controlling the format of their output, similar to an API response. Image captioning with Gemini Flash Once it's established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output. val jsonSchema = Schema.obj( properties = mapOf( "success" to Schema.boolean(), "user_description" to Schema.string(), ), optionalProperties = listOf("user_description"), ) val generativeModel = createGenerativeTextModel(jsonSchema) val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..." val response = generativeModel.generateContent( content { text(prompt) image(image) }) val jsonResponse = Json.parseToJsonElement(response.text!!) val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content The other option in the app is to start with a text prompt. You can enter in details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative. Android generation via Imagen We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details around what we would like to generate and include the bot color selection as part of this too, including the skin tone selected by the user. val imagenPrompt = "A 3D rendered cartoonish Android mascot in a p

May 20, 2025 - 19:31
 0
Androidify: How Androidify leverages Gemini, Firebase and ML Kit
Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we'll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we'll provide some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

flow chart demonstrating Androidify app flow

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and assessing image safety, including whether the image contains abusive content.

val jsonSchema = Schema.obj(
   properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
   optionalProperties = listOf("error"),
   )
   
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
   .generativeModel(
            modelName = "gemini-2.5-flash-preview-04-17",
   	     generationConfig = generationConfig {
                responseMimeType = "application/json"
                responseSchema = jsonSchema
            },
            safetySettings = listOf(
                SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
                SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
                SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
                SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
                SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
    	),
    )

 val response = generativeModel.generateContent(
            content {
                text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
                image(image)
            },
        )

val jsonResponse = Json.parseToJsonElement(response.text)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we’re leveraging structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a json object with success = true/false and an optional error message explaining why the image doesn't have enough information.

Structured output is a powerful feature enabling a smoother integration of LLMs to your app by controlling the format of their output, similar to an API response.

Image captioning with Gemini Flash

Once it's established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
            properties = mapOf(
                "success" to Schema.boolean(),
                "user_description" to Schema.string(),
            ),
            optionalProperties = listOf("user_description"),
        )
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
content { 
       	text(prompt) 
             	image(image) 
	})
        
val jsonResponse = Json.parseToJsonElement(response.text!!) 
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true

val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content

The other option in the app is to start with a text prompt. You can enter in details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.

Android generation via Imagen

We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details around what we would like to generate and include the bot color selection as part of this too, including the skin tone selected by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002" 
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
            "imagen-3.0-generate-002",
            safetySettings =
            ImagenSafetySettings(
                ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
                personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
            ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.

Finetuning the Imagen model

The Imagen 3 model was finetuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable "adapters" that make small changes to the model's performance. We ran a fine tuning pipeline on the Imagen 3 model generally available with Android bot assets of different color combinations and different assets for enhanced cuteness and fun. We generated text captions for the training images and the image-text pairs were used to finetune the model effectively.

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year and we are also planning on adding support for fine-tuned models to Firebase AI Logic SDK later in the year.

moving image of Androidify app demo turning a selfie image of a bearded man wearing a black tshirt and sunglasses, with a blue back pack into a green 3D bearded droid wearing a black tshirt and sunglasses with a blue backpack
The original image... and Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.

To do this, we add the SDK to the app, and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks that are in the streaming image coming from the Camera, and we set the _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    val detectedLandmarks = mutableListOf()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}
moving image showing the camera shutter button activating when an orange droid figurine is held in the camera frame
The camera shutter button is activated when a person (or a bot!) enters the frame.

Get started with AI on Android

The Androidify app makes an extensive use of the Gemini 2.5 Flash to validate the image and generate a detailed description used to generate the image. It also leverages the specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, go to the Gemini and Imagen documentation for Android.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.