Windows Control
Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.
Full desktop automation for Windows. Control mouse, keyboard, and screen like a human user.
All scripts are in skills/windows-control/scripts/
py screenshot.py > output.b64
Returns base64 PNG of entire screen.
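The captured output can be turned back into an image file with a few lines of Python. This is a minimal sketch, assuming output.b64 contains only the base64 text of the PNG (possibly with surrounding whitespace). The helper names decode_screenshot and save_screenshot are hypothetical and not part of the skill.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(b64_text: str) -> bytes:
    """Decode base64 text into PNG bytes, rejecting anything that is not a PNG."""
    data = base64.b64decode(b64_text.strip())
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


def save_screenshot(b64_path: str, png_path: str) -> int:
    """Read a .b64 file, write the decoded PNG, and return the byte count."""
    # Assumption: the file is plain ASCII text.
    with open(b64_path, "r", encoding="ascii") as f:
        data = decode_screenshot(f.read())
    with open(png_path, "wb") as f:
        f.write(data)
    return len(data)
```

If the redirect was run in Windows PowerShell, the file may be UTF-16 encoded, in which case the encoding argument in save_screenshot would need to change.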
py click.py 500 300           # Left click at (500, 300)
py click.py 500 300 right     # Right click
py click.py 500 300 left 2    # Double click
py type_text.py "Hello World"
Types text at current cursor position (10ms between keys).
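The per-key delay can be pictured as a simple loop. The sketch below is an illustration only, not the code inside type_text.py: send_key is a hypothetical callback standing in for whatever actually emits a keystroke, and the 0.01 s default mirrors the 10ms figure above.

```python
import time


def type_with_delay(text, send_key, delay=0.01, sleep=time.sleep):
    """Send each character through send_key, pausing `delay` seconds between keys."""
    for index, char in enumerate(text):
        send_key(char)
        if index < len(text) - 1:  # no pause needed after the last key
            sleep(delay)
```

At 10ms per gap, a 1,000-character string takes roughly 10 seconds, which is worth remembering when choosing timeouts for longer inputs.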
py key_press.py "enter"
py key_press.py "ctrl+s"
py key_press.py "alt+tab"
py key_press.py "ctrl+shift+esc"
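Combination strings like these follow a "modifier+modifier+key" pattern. The following is a rough sketch of how such a string could be split, assuming the modifier set shown; it is not the parser used by key_press.py, and parse_combo is a hypothetical name.

```python
MODIFIERS = {"ctrl", "alt", "shift", "win"}  # assumed set of modifier names


def parse_combo(combo: str):
    """Split a combo such as "ctrl+shift+esc" into (modifiers, key)."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if any(part == "" for part in parts):
        raise ValueError(f"malformed key combo: {combo!r}")
    *mods, key = parts  # the last part is the key, the rest are modifiers
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return mods, key
```

A lone "win" is treated as a plain key here, which matches its use as a single key press in the examples.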
py mouse_move.py 500 300
Moves mouse to coordinates (smooth 0.2s animation).
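A smooth move amounts to visiting a series of intermediate points over the animation time. The sketch below assumes plain linear interpolation and a 20ms step; the easing actually used by mouse_move.py is not documented here, and interpolate_path is a hypothetical helper.

```python
def interpolate_path(start, end, duration=0.2, step=0.02):
    """Return evenly spaced (x, y) points from start to end, ending exactly at end."""
    steps = max(1, round(duration / step))
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]
```

Moving the cursor to each point in turn, with a pause of `step` seconds between them, would reproduce the 0.2s glide.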
py scroll.py up 5      # Scroll up 5 notches
py scroll.py down 10   # Scroll down 10 notches
py focus_window.py "Chrome"         # Bring window to front
py minimize_window.py "Notepad"     # Minimize window
py maximize_window.py "VS Code"     # Maximize window
py close_window.py "Calculator"     # Close window
py get_active_window.py             # Get title of active window
# Click by text (No coordinates needed!)
py click_text.py "Save"                 # Click "Save" button anywhere
py click_text.py "Submit" "Chrome"      # Click "Submit" in Chrome only

# Drag and Drop
py drag.py 100 100 500 300              # Drag from (100,100) to (500,300)

# Robust Automation (Wait/Find)
py wait_for_text.py "Ready" "App" 30    # Wait up to 30s for text
py wait_for_window.py "Notepad" 10      # Wait for window to appear
py find_text.py "Login" "Chrome"        # Get coordinates of text
py list_windows.py                      # List all open windows
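The wait commands boil down to polling with a deadline. As a sketch of that pattern (not the scripts' actual code), the hypothetical wait_until below repeatedly calls a check function until it returns something truthy or the timeout passes. The clock and sleep parameters are injectable so the loop can be exercised without real delays.

```python
import time


def wait_until(check, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None  # timed out without a match
        sleep(interval)
```

A caller would pass a check that looks for the text or window in question, for example one that runs find_text.py and returns its output when non-empty.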
py read_window.py "Notepad"          # Read all text from Notepad
py read_window.py "Visual Studio"    # Read text from VS Code
py read_window.py "Chrome"           # Read text from browser
Uses Windows UI Automation to extract the actual text (not OCR). This is generally much faster and more accurate than reading text from screenshots.
py read_ui_elements.py "Chrome"                  # All interactive elements
py read_ui_elements.py "Chrome" --buttons-only   # Just buttons
py read_ui_elements.py "Chrome" --links-only     # Just links
py read_ui_elements.py "Chrome" --json           # JSON output
Returns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.
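Element coordinates from the --json output can feed straight into click.py. The JSON schema is not documented here, so the field names in this sketch (name, type, x, y, width, height) are purely assumed for illustration; inspect real output before relying on them. find_clickable is a hypothetical helper.

```python
import json


def find_clickable(json_text, name, element_type=None):
    """Return the (x, y) centre of the first element matching name, or None.

    Assumes a JSON array of objects with hypothetical keys:
    name, type, x, y, width, height.
    """
    for element in json.loads(json_text):
        if element.get("name", "").lower() != name.lower():
            continue
        if element_type and element.get("type") != element_type:
            continue
        centre_x = element["x"] + element["width"] // 2
        centre_y = element["y"] + element["height"] // 2
        return (centre_x, centre_y)
    return None
```

Clicking the centre of the bounding box rather than its corner makes it less likely that the click lands on a border.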
py read_webpage.py                      # Read active browser
py read_webpage.py "Chrome"             # Target Chrome specifically
py read_webpage.py "Chrome" --buttons   # Include buttons
py read_webpage.py "Chrome" --links     # Include links with coords
py read_webpage.py "Chrome" --full      # All elements (inputs, images)
py read_webpage.py "Chrome" --json      # JSON output
Enhanced browser content extraction with headings, text, buttons, and links.
# List all open dialogs
py handle_dialog.py list

# Read current dialog content
py handle_dialog.py read
py handle_dialog.py read --json

# Click button in dialog
py handle_dialog.py click "OK"
py handle_dialog.py click "Save"
py handle_dialog.py click "Yes"

# Type into dialog text field
py handle_dialog.py type "myfile.txt"
py handle_dialog.py type "C:\path\to\file" --field 0

# Dismiss dialog (auto-finds OK/Close/Cancel)
py handle_dialog.py dismiss

# Wait for dialog to appear
py handle_dialog.py wait --timeout 10
py handle_dialog.py wait "Save As" --timeout 5
Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.
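The dismiss command is described as auto-finding OK/Close/Cancel. One way such a choice could work is a fixed priority list, sketched below. The priority order is taken from that description, but the case-insensitive matching is an assumption, and pick_dismiss_button is a hypothetical helper rather than the script's own logic.

```python
DISMISS_ORDER = ("OK", "Close", "Cancel")  # order taken from the description above


def pick_dismiss_button(button_names, order=DISMISS_ORDER):
    """Return the first button from `order` present in the dialog, or None."""
    by_lower = {name.lower(): name for name in button_names}
    for candidate in order:
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match  # keep the dialog's own spelling of the label
    return None
```

A dialog with only Yes/No buttons yields None, in which case an explicit handle_dialog.py click "Yes" or "No" is the safer route.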
py click_element.py "Save"                     # Click "Save" anywhere
py click_element.py "OK" --window "Notepad"    # In specific window
py click_element.py "Submit" --type Button     # Only buttons
py click_element.py "File" --type MenuItem     # Menu items
py click_element.py --list                     # List clickable elements
py click_element.py --list --window "Chrome"   # List in specific window
Click buttons, links, menu items by name without needing coordinates.
py read_region.py 100 100 500 300 # Read text from coordinates
Note: Requires Tesseract OCR installation. Use read_window.py instead for better results.
# Press Windows key
py key_press.py "win"
# Type "notepad"
py type_text.py "notepad"
# Press Enter
py key_press.py "enter"
# Wait a moment, then type
py type_text.py "Hello from AI!"
# Save
py key_press.py "ctrl+s"
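The same Notepad sequence can be driven from a single Python script. This is a sketch under stated assumptions: the scripts directory is the one named in this document, the py launcher is on PATH, and a fixed pause stands in for "wait a moment". build_command and run_steps are hypothetical helpers, not part of the skill.

```python
import subprocess
import time

SCRIPTS_DIR = "skills/windows-control/scripts"  # location stated in this document


def build_command(script, *args):
    """Return the argv list for one script call, with every argument as a string."""
    return ["py", f"{SCRIPTS_DIR}/{script}", *[str(arg) for arg in args]]


def run_steps(steps, pause=1.0, runner=subprocess.run, sleep=time.sleep):
    """Run each (script, *args) step in order, pausing so the UI can catch up."""
    for script, *args in steps:
        runner(build_command(script, *args), check=True)  # stop on first failure
        sleep(pause)


NOTEPAD_STEPS = [
    ("key_press.py", "win"),
    ("type_text.py", "notepad"),
    ("key_press.py", "enter"),
    ("type_text.py", "Hello from AI!"),
    ("key_press.py", "ctrl+s"),
]
```

Calling run_steps(NOTEPAD_STEPS) on a Windows machine would replay the sequence. A fixed pause is fragile, so replacing it with wait_for_window.py "Notepad" 10 after the Enter step is likely more reliable.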
# Read current VS Code content
py read_window.py "Visual Studio Code"
# Click at specific location (e.g., file explorer)
py click.py 50 100
# Type filename
py type_text.py "test.js"
# Press Enter
py key_press.py "enter"
# Verify new file opened
py read_window.py "Visual Studio Code"
# Read current content
py read_window.py "Notepad"
# User types something...
# Read updated content (no screenshot needed!)
py read_window.py "Notepad"
Method 1: Windows UI Automation (BEST)
Method 2: Click by Name (NEW)
Method 3: Dialog Handling (NEW)
Method 4: Screenshot + Vision (Fallback)
Method 5: OCR (Optional)
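The ranking above suggests trying the preferred method first and falling back only when it fails. A minimal sketch of that pattern follows. Each reader is a caller-supplied function (for instance one wrapping read_window.py, another wrapping a screenshot plus vision step), and read_with_fallback is a hypothetical name.

```python
def read_with_fallback(readers):
    """Try each (name, reader) pair in order.

    Returns (name, text) for the first reader that yields non-empty text and
    raises RuntimeError if every reader fails or returns nothing.
    """
    errors = []
    for name, reader in readers:
        try:
            text = reader()
        except Exception as exc:  # e.g. a missing dependency such as Tesseract
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all methods failed: " + "; ".join(errors))
```

Returning the method name alongside the text lets the caller know whether the result came from UI Automation or from a less reliable fallback.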
Status: ✅ READY FOR USE (v2.0 - Dialog & UI Elements)
Created: 2026-02-01
Updated: 2026-02-02
No automatic installation available. Please visit the source repository for installation instructions.
© 2026 Torly.ai. All rights reserved.