{"doi":"10.1109/tpami.2015.2389824","title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","abstract":"Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is \"artificial\" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, \"spatial pyramid pooling\", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.","journal":"IEEE Transactions on Pattern Analysis and Machine Intelligence","year":2015,"id":12200,"datarank":16.94904109297095,"base_score":9.331406899955626,"endowment":9.331406899955626,"self_citation_contribution":1.3997110349933441,"citation_network_contribution":15.549330057977608,"self_endowment_contribution":1.3997110349933441,"citer_contribution":15.549330057977608,"corpus_percentile":94.6,"corpus_rank":834,"citation_count":11286,"citer_count":198,"citers_with_citation_signal":198,"citers_with_endowment":198,"datacite_reuse_total":0,"is_dataset":false,"is_oa":false,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2015-09-01","authors":[{"id":96715,"name":"Xiangyu Zhang","orcid":"0000-0003-2138-4608","position":1,"is_corresponding":false},{"id":18664,"name":"Shaoqing Ren","orcid":null,"position":2,"is_corresponding":false},{"id":18665,"name":"Jian Sun","orcid":"0000-0001-6270-2698","position":3,"is_corresponding":false},{"id":11162,"name":"Kaiming He","orcid":"0000-0001-7318-9658","position":0,"is_corresponding":true}],"reference_count":40,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}