Y-Detective aims to provide automated, scalable, fast, and parallelized object detection and image selection for a variety of scenarios, such as media production, customer support, and crime detection. By combining computer vision and natural language processing techniques, the program can understand both the contents of images and the user's requirements or descriptions expressed in words. The program consists of two main parts:

1. Detecting multiple objects in an image and automatically creating crops based on the user's requirements.
2. Selecting, from a large set of images (10 to 100), the ones that most closely match the user's description.
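The overview does not say how a description is matched against images. One common approach, assumed here and not confirmed by the source, is to encode the text and every image into a shared vector space with a joint image-text model, then rank the images by cosine similarity to the text. The sketch below covers only that ranking step and takes the embeddings as given; `cosine_similarity` and `select_top_k` are hypothetical names, not part of Y-Detective.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_k(
    text_embedding: Sequence[float],
    image_embeddings: Sequence[Sequence[float]],
    k: int = 3,
) -> List[int]:
    """Return indices of the k images whose embeddings best match the text embedding.

    Assumption: embeddings were produced by some joint image-text model
    (hypothetical here); this function only does the ranking.
    """
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first; ties are broken by the lower image index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

For example, with a text embedding `[1.0, 0.0]` and image embeddings `[[0.0, 1.0], [1.0, 0.1], [0.7, 0.7], [-1.0, 0.0]]`, `select_top_k(..., k=2)` returns `[1, 2]`. Scoring every image against the query is linear in the set size, which is cheap for sets of 10 to 100 images and trivially parallelizable.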