I built a tool on Google Colab to easily create demo videos and convert them into GIFs or MP4s.
Changelog
| Date | Changes |
|---|---|
| 2026/03/29 | Added BGM feature (MP4 output only). • Download audio from a YouTube URL and merge it into the final MP4 with configurable start position, speed, and volume • Advanced license auto-detection via metadata analysis • Slider changes reflected in the audio player in real time |
| 2026/03/08 | Applied # @title / # @markdown form mode to all cells. Overhauled zoom UI: • Added "🏁 Record end position" button to allow direct end-time input • Hold duration is now calculated automatically • Added collapsible event panels and selection highlighting |
| 2026/02/24 | Zoom feature overhauled. Start times can now be recorded automatically during preview playback. Overlapping zoom events are detected and prevented. |
When you share a tool or project on GitHub or social media, attaching a demo GIF or video makes a real difference in how well it lands. But the moment you try to convert a screen recording, you run into the usual friction: SaaS tools need a paid plan for fine-grained quality settings, and local GUI apps have a habit of breaking whenever the OS updates.
So I built a tool that runs ffmpeg inside Google Colab's free environment and converts screen recordings directly to GIF or MP4. Everything runs in the browser — no environment setup, no software to install.
What I Built
No setup required. Open the link below and run it directly in your browser.
⚡️ Run in Google Colab
Make Demo GIF / MP4 (English)
Click the link and run cells from top to bottom — that's all it takes.
🐙 View the code on GitHub
hiroaki-com/colab-video-converter
Browse the source, leave a Star, or fork it.
Why I Built This
Adding a GIF to a GitHub README noticeably improves first impressions. Seeing something in motion immediately tells visitors what a project actually does, and I think it meaningfully increases the chance they keep reading.
That said, converting a screen recording to a GIF has always been quietly annoying.
SaaS tools are convenient because they run in the browser, but fine-grained quality control tends to sit behind a paid plan. Local GUI apps work until they don't — an OS update often breaks them, and keeping them working takes ongoing effort.
As an engineer, spending time learning a video conversion tool isn't the point, and paying a recurring subscription for it feels like overkill. I figured I might as well build something I could reuse however I want.
Google Colab can run ffmpeg for free, and building a UI directly in the notebook means settings are handled through a GUI without ever touching the command line. That was the idea behind this tool.
How to Use It
Run cells from top to bottom and interact with the UI as prompted. No Python code required. Each cell has # @title form mode applied, so running in Colab's form view gives a clean, focused interface.
1. Setup
Running the first cell installs ffmpeg and yt-dlp automatically. A _ffmpeg helper function is also defined here to make error output easier to handle. This only needs to be run once.
2. Upload Videos
A file selection dialog opens. Select the video file(s) you want to convert. Selecting multiple files enables merging.
Supported input formats: .mov .mp4 .avi .mkv .webm
3. Set Merge Order (Multiple Files Only)
If you uploaded multiple videos, use the dropdowns to specify the merge order. This step is skipped automatically for a single file.
4. Configure Format, Quality, and Speed
Use the radio buttons to choose an output format and quality level.
Choosing an Output Format
| Format | Characteristics | Best For | Not Ideal For |
|---|---|---|---|
| GIF | Auto-plays, loops, works in any Markdown renderer. 256 colors, larger file | GitHub README, Zenn / Qiita | Size-constrained environments |
| MP4 (H.264) | High quality, small file size, full color. Requires a media player. | X / Slack / GitHub README (click-to-play) | Scenarios requiring auto-play or looping |
GIF Quality Presets
| Preset | Width | FPS | Colors | Use Case |
|---|---|---|---|---|
| GitHub README | 960px | 15fps | 256 | README embedding (recommended) |
| SNS / Lightweight | 640px | 10fps | 128 | Minimize file size |
| High Quality | 1280px | 20fps | 256 | Quality-first output |
| Custom | any | any | any | Fine-grained control |
MP4 Quality Presets
| Preset | CRF | Notes |
|---|---|---|
| High Quality | 18 | Larger file size |
| Standard | 23 | Balanced (recommended) |
| Lightweight | 28 | Smaller file size |
| Custom | 0–51 | Fine-grained control |
Playback speed is also adjustable. Speeding up a long recording to 1.5x or 2.0x makes for a more compact demo.
5. Generate Preview
Run the cell to generate a preview. The preview is rendered at the same resolution and frame rate as the final output, so you can use it to identify exact timestamps for zoom configuration. To regenerate the preview, adjust the format and quality settings and re-run this cell — no need to start from the beginning.
The preview is generated using the fast preset with no audio. Playback speed changes (via the setpts filter) are already applied at this stage.
6. Review Preview
The preview video appears inside the notebook with playback controls. Play and pause to find the exact timestamps where you want to apply zoom effects.
7. Zoom Settings (Optional)
You can add zoom effects to highlight specific moments in your demo.
How to use:
- Check "Add zoom" — the first zoom event is added automatically
- Play the embedded preview video and pause at the scene you want to zoom
- Click "📍 Record start position" to apply the current playback position to the selected event's start time
- Pause at the scene where you want the zoom to end, then click "🏁 Record end position" to set the end time. The hold duration (time spent at peak zoom) is calculated automatically from the in and out durations
- Each event's header can be collapsed by clicking it. Use the "Set as target event" button or the dropdown to switch which event you're recording to — the active event is highlighted with a green border
- Click "+ Add Zoom Event" to add more zoom points. Adding a new event collapses existing ones automatically, and the next event's start time is calculated from the end of the previous one
- Configure each event:
- Zoom area: select from the 3×3 grid (top-left, center, bottom-right, etc.)
- Max zoom level: peak magnification (1.1x – 5.0x)
- Start (sec): when the zoom begins (can be set via the 📍 button)
- In (sec): time to ramp from 1x to peak zoom
- Out (sec): time to ramp back down to 1x
- End (sec): when the zoom effect fully ends (can be set via the 🏁 button). Hold duration is auto-calculated as
End − Start − In − Out
Timeline example: with Start=5.0, In=0.3, Out=0.3, End=6.6:
- Zoom begins at 5.0 seconds
- Ramps from 1x to peak over 0.3 seconds (5.0 → 5.3 sec)
- Holds at peak for 1.0 second (5.3 → 6.3 sec) ← auto-calculated
- Ramps back to 1x over 0.3 seconds (6.3 → 6.6 sec)
- Normal playback resumes after 6.6 seconds
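The hold arithmetic in the timeline above can be sketched in a few lines (an illustrative helper, not the notebook's actual code):

```python
def zoom_timeline(start, in_dur, out_dur, end):
    """Derive the auto-calculated hold duration and the peak-zoom window."""
    hold = round(end - start - in_dur - out_dur, 2)
    assert hold >= 0, 'End must leave room for the in/out ramps'
    return {
        'hold': hold,
        'peak_starts': round(start + in_dur, 2),
        'peak_ends': round(end - out_dur, 2),
    }

# Start=5.0, In=0.3, Out=0.3, End=6.6 gives hold 1.0 s, peak from 5.3 s to 6.3 s
```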
Tips:
- If zoom events overlap in time, an error is reported when the final output runs
- The 3×3 grid makes it straightforward to zoom in on a specific UI element
- Unchecking "Add zoom" disables all zoom effects without deleting your settings
Use cases:
- Draw attention to a button click in a UI walkthrough
- Highlight a code change in an editor
- Zoom in on specific data points in a dashboard demo
- Focus on a form field during an input demo
8. BGM Settings (Optional — MP4 Output Only)
When exporting as MP4, you can add background music sourced from a YouTube video.
How to use:
- Check "Add BGM" — the option is disabled when GIF is selected
- Enter a YouTube URL and click "Download". The tool fetches metadata first, then downloads the audio
- Preview the downloaded audio and adjust the following settings:
- Start position (sec): which point in the audio to begin playback from. Click "📍 Record start position" while the audio is playing to apply the current position
- ⚡ Speed: playback speed of the BGM itself (0.5x – 2.0x)
- 🔊 Volume: adjust from 0.0 to 1.0 — slider changes are reflected in the audio player in real time
- Run step 9 (Final Output) to generate the final output
Auto loop / cut: If the audio remaining from the start position (after speed adjustment) is shorter than the video, it loops automatically. If it's longer, it is cut at the point the video ends.
License detection: Metadata is analyzed at download time and a badge is displayed to indicate the likely copyright status:
| Badge | Condition |
|---|---|
| 🚫 Commercial track detected | artist / track metadata fields are present |
| 🚫 Official license notice found | Description contains "Licensed to YouTube by" |
| ✅ Likely royalty-free | Title or uploader name contains keywords like NCS, No Copyright, etc. |
| ✅ Creative Commons | License field contains a CC identifier |
| ⚠️ License unknown | None of the above conditions match |
In all cases, please verify usage rights yourself before publishing.
9. Final Output
Run the cell to generate the final output with all settings applied, including zoom effects and BGM. To adjust zoom and re-export, modify the zoom settings and re-run this cell.
If the resolution settings were changed after generating the preview, an error is raised here prompting you to regenerate the preview. This prevents mismatches between the preview and final output.
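A guard of this shape is enough to catch the mismatch (the key names here are hypothetical; the actual check may cover different settings):

```python
def check_preview_match(preview_settings, current_settings, keys=('width', 'fps')):
    """Fail fast if output-affecting settings changed since the preview."""
    changed = [k for k in keys
               if preview_settings.get(k) != current_settings.get(k)]
    if changed:
        raise RuntimeError(
            f"Settings changed since the preview ({', '.join(changed)}); "
            "please re-run the preview cell first."
        )
```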
Final output behavior:
- GIF: full palette generation using the selected dithering method
- MP4 (no BGM): re-encoded with the slow preset for maximum compression efficiency (no audio)
- MP4 (with BGM): audio with speed and volume applied is merged in. Automatically loops or cuts to match the video length
10. Select Save Destination
Select a destination using the radio buttons.
| Destination | Description |
|---|---|
| Download locally | Saved via the browser download dialog |
| Save to Google Drive | Auto-copied to the specified path (includes mount step) |
| Both | Executes both options simultaneously |
11. Save
Run this cell to write the file to the selected destination. Separating destination selection and the save action makes it easier to review your choice before committing.
Technical Notes
A few implementation details worth highlighting.
High-quality GIF conversion

`palettegen` / `paletteuse` are FFmpeg's dedicated two-pass GIF filters. `palettegen` analyzes the entire video to generate an optimal 256-color palette, and `paletteuse` applies that palette when rendering each frame. Compared to single-pass conversion, color fidelity improves significantly.

GIF output quality depends heavily on ffmpeg filter configuration. This tool uses a two-pass approach combining `palettegen` and `paletteuse`:

```
[0:v] fps=15,scale=960:-1:flags=lanczos,split [a][b];
[a] palettegen=max_colors=256:stats_mode=full [p];
[b][p] paletteuse=dither=floyd_steinberg:diff_mode=rectangle
```

The `lanczos` filter improves resize quality, and `floyd_steinberg` dithering produces smooth color gradients. Custom mode also supports `bayer` dithering for a different file size vs. quality tradeoff.
Automatic file size warning

`os.path.getsize()` is a function from Python's standard `os` library that returns the size of a given file in bytes. It requires no external dependencies, making it well-suited for a quick size check immediately after conversion.

If a GIF exceeds 15 MB, the notebook displays a warning automatically with guidance on how to reduce it. GitHub's GIF size limit is 10 MB, so catching this before you try to embed it in a README is useful.
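The check itself is a thin layer over `os.path.getsize()`; a sketch (the 15 MB threshold follows the text, the message wording is illustrative):

```python
import os

def warn_if_large(path, limit_mb=15):
    """Print a size warning after conversion; returns the size in MB."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > limit_mb:
        print(f'⚠️ {size_mb:.1f} MB: consider a smaller preset, fewer colors, '
              'or a higher playback speed')
    return size_mb
```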
Playback speed adjustment

`setpts` (Set Presentation Timestamps) is an FFmpeg video filter that changes playback speed by rewriting the timestamp of each frame. Dividing by a factor greater than 1 speeds up playback — for example, `PTS/1.5` produces 1.5x speed. When audio is present, the `atempo` filter must also be applied separately to match.

Speed is controlled via the `setpts` filter. Bumping a long recording to 1.5x or 2.0x noticeably reduces file size. Since speed is applied during preview generation, the final output can reuse the preview footage directly.

```python
speed_filter = f'setpts=PTS/{speed}'  # 1.5x speed → setpts=PTS/1.5
```
Merging multiple videos

The FFmpeg concat demuxer is the mechanism for joining multiple video files sequentially. Combined with the `-c copy` option, it copies video and audio streams without re-encoding, resulting in fast, lossless merges.

ffmpeg's concat demuxer handles merging with `-c copy`, skipping re-encoding entirely for fast results. The dropdown UI lets you specify merge order freely. Filenames containing special characters are escaped correctly.
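The concat demuxer reads its inputs from a list file; generating one with quote escaping might look like this (a sketch; the file name `concat.txt` is arbitrary):

```python
def write_concat_list(paths, list_path='concat.txt'):
    """Write an ffmpeg concat-demuxer list file.

    A single quote inside a quoted 'file' directive is escaped shell-style,
    so it survives paths containing apostrophes.
    """
    with open(list_path, 'w') as f:
        for p in paths:
            escaped = p.replace("'", "'\\''")
            f.write(f"file '{escaped}'\n")
    return list_path

# Then merge without re-encoding:
# ffmpeg -f concat -safe 0 -i concat.txt -c copy merged.mp4
```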
Separated preview and final output

FFmpeg's `-preset` option controls the tradeoff between encoding speed and compression efficiency. `fast` encodes quickly at the cost of slightly larger files, while `slow` takes more time to achieve the best compression ratio. Using the appropriate preset at each stage means fast iteration during review and high-quality output at the end.

The workflow is split into preview generation and final output. This lets you confirm exact timestamps on a full-quality preview before configuring zoom, and iterate on zoom settings without regenerating the base video. The final output step also checks that resolution settings haven't changed since the preview was generated, preventing silent configuration mismatches.
Zoom via FFmpeg's zoompan filter

`zoompan` is an FFmpeg video filter that produces dynamic pan-and-zoom effects by specifying zoom level (z), x position, and y position per frame using mathematical expressions. Built-in variables like `on` (frame number) and `iw`/`ih` (input width / height) can be used inside these expressions, enabling complex timeline-driven zoom behavior.

Zoom is implemented using ffmpeg's `zoompan` filter, which generates smooth, frame-accurate zoom animations. Each zoom event is translated into expressions that control zoom level, x position, and y position on a per-frame basis.

One constraint with `zoompan` is that the `z` variable cannot be referenced inside the `x` and `y` expressions. This was addressed by inlining `z_expr` directly into the coordinate expressions, which fixes a zoom-center drift issue present in earlier versions.

```
# Example: zoom to center (area 5) at 2.0x peak
# Timeline: Start=5.0, In=0.3, Out=0.3, End=6.6 (Hold=1.0 auto-calculated)
zoompan=z='if(gte(on,150),if(lt(on,159),(1+(2.0-1)*(on-150)/9),\
if(lt(on,189),2.0,\
if(lt(on,198),(2.0-(2.0-1)*(on-189)/9),1))),1)':
x='if(between(on,150,198),floor(max(0,min(iw-iw/(z_expr),480-iw/(2*(z_expr))))),0)':
y='if(between(on,150,198),floor(max(0,min(ih-ih/(z_expr),270-ih/(2*(z_expr))))),0)':
d=1:s=960x540:fps=15
```

The tool automatically:

- Converts timestamps to frame numbers based on the video's FPS
- Calculates zoom center coordinates from the 3×3 grid position
- Generates smooth interpolation curves for zoom-in and zoom-out transitions
- Sorts zoom events by start time and reports overlaps as errors before any output is generated
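The first of those steps, converting timestamps to frame numbers, reduces to a small helper (an illustrative sketch; it assumes a 30 fps source clip, which is what the frame boundaries 150–198 in the example expression correspond to):

```python
def to_frames(start, in_dur, out_dur, end, fps):
    """Map a zoom event's timestamps to zoompan frame numbers ('on')."""
    return (round(start * fps),
            round((start + in_dur) * fps),
            round((end - out_dur) * fps),
            round(end * fps))

# A 30 fps source with Start=5.0, In=0.3, Out=0.3, End=6.6 yields the
# boundaries used in the expression above: (150, 159, 189, 198)
```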
Start and end position recording buttons

`google.colab.kernel.invokeFunction` is a Google Colab-specific API that allows JavaScript running in cell output to call Python functions. Register a function with `colab_output.register_callback('name', fn)`, then invoke it from an HTML `onclick` handler using `google.colab.kernel.invokeFunction('name', [args], {})`.

A preview video is embedded directly in the zoom settings cell. Pausing it at any point and clicking a button immediately applies the current playback position to the selected event. Both "📍 Record start position" and "🏁 Record end position" use `google.colab.kernel.invokeFunction` to invoke Python callbacks from JavaScript.

```python
def _record_position(time):
    if _zoom_start_widgets and record_target_sel.value is not None:
        _zoom_start_widgets[record_target_sel.value].value = round(float(time), 2)

def _record_end_position(time):
    if _zoom_end_widgets and record_target_sel.value is not None:
        _zoom_end_widgets[record_target_sel.value].value = round(float(time), 2)

colab_output.register_callback('record_position', _record_position)
colab_output.register_callback('record_end_position', _record_end_position)
```
Collapsible event panels with highlight

ipywidgets is a library for rendering interactive UI widgets inside Jupyter notebooks and Google Colab. `VBox` and `HBox` handle layout composition, and `Layout` objects apply CSS-equivalent styling. Properties can be updated dynamically in Python, and the UI reflects changes immediately without any page reload.

To keep multiple zoom events manageable, each event is wrapped in a collapsible container. Adding a new event automatically collapses existing ones, and the currently targeted event is highlighted with a green border. The "Set as target event" button switches the recording target to any event at any time.
BGM synthesis: atempo filter and stream looping

`atempo` is an FFmpeg audio filter that changes playback speed without altering pitch (time-stretching). Its accepted range is 0.5x to 2.0x in a single pass; larger changes require chaining, e.g. `atempo=2.0,atempo=2.0`. `-stream_loop -1` repeats an input stream indefinitely, and combined with `-t` to set a duration limit, it handles "loop if too short, cut if too long" in a single command without any branching logic.

The BGM feature works in two stages. First, speed and volume adjustments are applied to the downloaded audio to produce an intermediate file. That file is then merged with the video.

```python
# Apply speed and volume, write intermediate file
_ffmpeg(
    '-ss', start_pos,
    '-i', bgm_raw_path,
    '-vn', '-af', f'atempo={speed},volume={vol}',
    '-c:a', 'aac', '-b:a', '128k', '-ac', '2',
    'bgm_segment.aac'
)

# -stream_loop -1 + -t {dur}: loop if short, cut if long — one command, no branching
_ffmpeg(
    '-i', 'preview.mp4',
    '-stream_loop', '-1', '-i', 'bgm_segment.aac',
    '-t', vid_dur,
    *VIDEO_ARGS, *AUDIO_ARGS,
    OUTPUT_NAME
)
```

Speed adjustment uses ffmpeg's `atempo` filter. Because `atempo` only accepts values between 0.5 and 2.0, the UI slider range is set to match. Looping and cutting are handled together using `-stream_loop -1` (infinite loop) and `-t {video duration}` (cut at that point), with no conditional logic needed.
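The loop-or-cut decision reduces to simple arithmetic; a sketch of the rule (a hypothetical helper, durations in seconds):

```python
def bgm_plan(audio_dur, start_pos, speed, video_dur):
    """Return 'loop' or 'cut' per the rule described above."""
    # atempo at speed s divides the remaining audio's duration by s
    effective = (audio_dur - start_pos) / speed
    return 'loop' if effective < video_dur else 'cut'
```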
Real-time slider preview

`Widget.observe()` is an event listener provided by ipywidgets that monitors property changes on a widget and fires a callback when they occur. Specifying `names='value'` restricts it to value changes only. Calling `display(Javascript(...))` inside a callback injects JavaScript into the Colab cell output, enabling direct DOM manipulation.

Moving the speed or volume slider is immediately reflected in the notebook's audio player. The `observe` callback fires `display(Javascript(...))` to update the player. A guard condition ensures JavaScript is only injected when the player actually exists, preventing unnecessary accumulation of cell output.

```python
def _on_bgm_speed_change(change):
    _update_duration_hint()
    if bgm_audio_ready:
        display(Javascript(
            f"var a=document.getElementById('bgm_audio_player');"
            f"if(a) a.playbackRate={change['new']};"
        ))

bgm_speed_slider.observe(_on_bgm_speed_change, names='value')
```

The audio player's initial values are applied using the `oncanplay` event, which fires once the audio is ready to play. An `_init` flag prevents the initialization block from running more than once.

```html
<audio id="bgm_audio_player" controls
  oncanplay="if(!this._init){
    this.volume={vol};
    this.playbackRate={speed};
    this._init=true;
  }">
```
License auto-detection via metadata

yt-dlp's `--dump-json` writes a video's metadata as JSON to standard output without downloading the actual file. It's fast and returns a rich set of fields including `title`, `uploader`, `artist`, `track`, `license`, and `description`.

Metadata fetched via yt-dlp's `--dump-json` is analyzed to assess copyright risk in order of specificity: presence of `artist`/`track` fields (set by YouTube for commercially distributed music), the string "Licensed to YouTube by" in the description, and keyword matching against the title and uploader name for royalty-free signals (NCS, nocopyright, etc.). Since metadata alone isn't a reliable indicator, the result is displayed as a reference badge rather than a definitive determination.
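As a sketch, that badge logic might be expressed like this (field names follow yt-dlp's JSON output; the exact keyword list here is illustrative, not the tool's):

```python
def classify_license(meta):
    """Heuristic copyright badge from yt-dlp metadata, most specific rule first."""
    description = meta.get('description') or ''
    title_uploader = f"{meta.get('title', '')} {meta.get('uploader', '')}".lower()
    if meta.get('artist') or meta.get('track'):
        return '🚫 Commercial track detected'
    if 'Licensed to YouTube by' in description:
        return '🚫 Official license notice found'
    if any(k in title_uploader for k in ('ncs', 'no copyright', 'nocopyright')):
        return '✅ Likely royalty-free'
    if 'creative commons' in (meta.get('license') or '').lower():
        return '✅ Creative Commons'
    return '⚠️ License unknown'
```

Whatever the badge says, the result is advisory only, as the section above stresses.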
Efficient preview workflow

FFmpeg's `-preset` option (libx264) offers a range from `ultrafast` to `veryslow`, trading encoding speed for compression efficiency. Faster presets use less CPU but produce somewhat larger files; slower presets spend more time to achieve higher compression. The difference in file size between `slow` and `fast` at the same CRF value is noticeable.

Preview generation uses the `fast` preset, keeping iteration quick. The final output uses the `slow` preset for maximum compression efficiency. When no zoom is configured, the final output re-encodes the preview footage directly, so it completes quickly.
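The per-stage preset choice can be captured in one small helper (a sketch; names and defaults are illustrative):

```python
def x264_args(stage, crf=23):
    """Encoder settings per stage: fast for preview iteration, slow for final."""
    preset = 'fast' if stage == 'preview' else 'slow'
    return ['-c:v', 'libx264', '-preset', preset, '-crf', str(crf)]
```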
Platform Size Limits
Limits vary by platform. Zenn's 3 MB cap is particularly tight — the SNS / Lightweight preset (640px / 10fps / 128 colors) combined with 1.5x playback speed is a realistic target for clearing it.
| Platform | Supported Formats | Size Limit |
|---|---|---|
| GitHub README | GIF / MP4 / MOV | 100 MB (video) / 10 MB (GIF) |
| Zenn | GIF | 3 MB |
| Qiita | GIF / MP4 | 100 MB |
| X (standard) | MP4 / MOV | 512 MB |
| Slack | GIF / MP4 / MOV and more | 1 GB |
Closing
Not wanting to pay a subscription for a conversion tool, and not wanting to install extra software locally — those two things together were enough motivation to build this myself.
Packaging it as a Google Colab notebook means it runs anywhere with a browser, with all settings handled through a GUI. This update adds BGM support: paste a YouTube URL, download the audio, and adjust start position, speed, and volume with sliders before merging it into the final MP4. Slider changes reflect in the audio player in real time, so you can hear the balance while you're setting it. Personally, being able to add a bit of music to a demo video without leaving the notebook has made the whole workflow feel a lot more self-contained.
If you've ever found yourself thinking "converting demo videos is such a chore," "I wish I could highlight that one interaction more clearly," or "I want to add some background music to make this demo more engaging," I hope this is useful.