The Behance Artistic Media Dataset

What's Inside
  • Automatically labeled binary attribute scores for over 2.5 million images, covering 20 attributes per image
  • 393,000 crowdsourced binary attribute labels for individual images
  • Short crowdsourced image descriptions (captions) for 74,000 images
  • Image URLs for all images mentioned above
Collaborators
  • Michael Wilber, m...@cornell.edu
  • Chen Fang
  • Hailin Jin
  • Aaron Hertzmann
  • John Collomosse
  • Serge Belongie

What kinds of images does the Behance Artistic Media dataset contain?

Our dataset is built from Behance, a portfolio website for professional and commercial artists. Behance contains over ten million projects and 65 million images.

Artwork on Behance spans many fields, such as sculpture, painting, photography, graphic design, graffiti, illustration, and advertising. Graphic design and advertising make up roughly one third of Behance. Photography, drawings, and illustrations make up roughly another third. This artwork is posted by professional artists to show off samples of their best work.

The top-scoring images for each attribute give a preview of the results you can expect. At our quality threshold, you should expect 90% precision from these results.

What space do artistic images span?

We consider a subset of 1,000 images that score high on each attribute. We then remove the final layer of a pre-trained ResNet and use the truncated network to embed these images into a 512-dimensional feature space. Finally, we use t-SNE to project these features down to two dimensions.
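
The selection step can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the `(image_id, score)` input layout and the function names are hypothetical, and `k` is left as a parameter because the text does not say whether 1,000 is a per-attribute or a total count. Here we assume the top `k` images per attribute are pooled into one de-duplicated set.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical input: attribute name -> list of (image_id, classifier score) pairs.
Scores = Dict[str, List[Tuple[str, float]]]


def top_scoring_subset(scores: Scores, k: int = 1000) -> Dict[str, List[str]]:
    """Return the IDs of the k highest-scoring images for each attribute."""
    return {
        attribute: [image_id for image_id, _ in heapq.nlargest(k, pairs, key=lambda p: p[1])]
        for attribute, pairs in scores.items()
    }


def pooled_ids(subset: Dict[str, List[str]]) -> List[str]:
    """Merge the per-attribute lists into one de-duplicated list, keeping first-seen order."""
    return list(dict.fromkeys(image_id for ids in subset.values() for image_id in ids))
```

The pooled IDs would then go through a pre-trained ResNet with its final layer removed (with torchvision, one way is to replace the model's `fc` attribute with `torch.nn.Identity()`; ResNet-18 and ResNet-34 produce 512-dimensional features at that point), and the features would be projected to two dimensions with a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE(n_components=2)`.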

The resulting embedding is shown below. The images are typically arranged into discrete clusters that capture content and media. Cluster 1 shows watercolor images; Cluster 2 consists of oil paintings; Cluster 3 has vector art; Cluster 4 contains gloomy photographs of abandoned buildings or lonely landscapes; Cluster 5 shows various pen and pencil sketches.

Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize.

This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. We collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and commercial artists. We annotate Behance imagery with rich attribute labels for content, emotions, and artistic media. We believe our Behance Artistic Media dataset will be a good starting point for researchers wishing to study artistic imagery and relevant problems.

Labeling our dataset requires some human expertise, but collecting human labels for every image is too costly. To address this, we use a hybrid human-in-the-loop strategy to incrementally learn a binary classifier for each attribute. Our hybrid annotation strategy is based on the LSUN dataset annotation pipeline.

At each step, humans label the most informative samples in the dataset with a single binary attribute label. The resulting labels are added to the classifier's training set to improve its discrimination. The retrained classifier then re-ranks the remaining images, and the most informative ones are sent to the crowd for the next iteration. After four iterations, the final classifier re-scores the entire dataset, and images whose scores exceed a threshold are assumed to be positive. This threshold is chosen to meet precision and recall targets on a held-out validation set. The entire process is repeated for each attribute.
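
The per-attribute loop can be illustrated with a toy, self-contained sketch. It is not the BAM or LSUN code, and several choices are assumptions of ours rather than details from the pipeline: each image is reduced to one feature value, the "classifier" is a 1-D boundary placed between the labeled negatives and positives, "most informative" means closest to the current boundary (uncertainty sampling), a random seed batch is labeled before the first round, and `crowd_label` stands in for the human annotators. The final threshold is the one with the highest validation recall among those meeting both targets; the default recall target is a placeholder, since the page does not state it.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Labeled = List[Tuple[float, bool]]  # (feature value, human label)


def fit_boundary(labeled: Labeled) -> float:
    """Toy 1-D classifier: midway between the highest negative and the lowest positive."""
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    if not neg or not pos:
        return 0.5  # fallback until both classes have been labeled
    return (max(neg) + min(pos)) / 2


def annotate_attribute(
    pool: Sequence[float],
    crowd_label: Callable[[float], bool],
    rounds: int = 4,
    batch: int = 10,
    seed: int = 0,
) -> Tuple[float, Labeled]:
    """Run `rounds` iterations of human-in-the-loop labeling for one attribute."""
    unlabeled = list(pool)
    random.Random(seed).shuffle(unlabeled)
    # Assumed seed round: label a random batch before any classifier exists.
    labeled: Labeled = [(x, crowd_label(x)) for x in unlabeled[:batch]]
    unlabeled = unlabeled[batch:]
    boundary = fit_boundary(labeled)
    for _ in range(rounds):
        # Uncertainty sampling: images nearest the current boundary are queried first.
        unlabeled.sort(key=lambda x: abs(x - boundary))
        queried, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(x, crowd_label(x)) for x in queried]
        boundary = fit_boundary(labeled)  # retrain with the new labels
    return boundary, labeled


def choose_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    min_precision: float = 0.9,
    min_recall: float = 0.0,
) -> Optional[float]:
    """Highest-recall score threshold whose validation precision and recall meet the targets."""
    total_pos = sum(labels)
    best, best_recall = None, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        if precision >= min_precision and recall >= min_recall and recall > best_recall:
            best, best_recall = t, recall
    return best


def label_dataset(
    dataset: Sequence[float],
    boundary: float,
    validation: Sequence[Tuple[float, bool]],
    min_precision: float = 0.9,
    min_recall: float = 0.5,
) -> List[float]:
    """Re-score every image; those at or above the chosen threshold are assumed positive."""
    scores = [x - boundary for x, _ in validation]
    labels = [y for _, y in validation]
    t = choose_threshold(scores, labels, min_precision, min_recall)
    if t is None:
        return []  # the targets cannot be met on this validation set
    return [x for x in dataset if x - boundary >= t]
```

Because precision is not monotone in the threshold, `choose_threshold` checks every candidate score rather than stopping at the first one that fails the precision target.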

Quality guarantees

As a quality check, we tested whether the final label set meets our target of 90% precision. For each attribute, we showed annotators 100 images from the final automatically labeled positive set and 100 images from the final negative set, using the same interface that was used to collect the dataset. The mean precision across all attributes is 90.4%, where precision is the fraction of sampled positive-set images for which at least one annotator indicates that the image should be positive.
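
The precision measure defined above can be written out as a short sketch. The data layout (one list of yes/no answers per sampled positive-set image) and the function names are hypothetical; only the rule "at least one annotator says positive" and the averaging over attributes come from the text. The 100 sampled negatives do not enter this formula.

```python
from statistics import mean
from typing import Mapping, Sequence

# votes[i] holds every annotator's yes/no answer for the i-th sampled image
# from one attribute's automatically labeled positive set (hypothetical layout).
Votes = Sequence[Sequence[bool]]


def verified_precision(votes: Votes) -> float:
    """Fraction of sampled positives that at least one annotator marked positive."""
    confirmed = sum(1 for answers in votes if any(answers))
    return confirmed / len(votes)


def mean_precision(per_attribute: Mapping[str, Votes]) -> float:
    """Average the per-attribute precision over all attributes."""
    return mean(verified_precision(votes) for votes in per_attribute.values())
```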

These checks are in addition to our Amazon Mechanical Turk (MTurk) quality controls: we only use human labels on which two workers agree, and we only accept work from high-reputation workers who have completed 10,000 tasks with a 95% acceptance rate.
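
Both MTurk filters can be sketched as follows. The `Worker` record, its field names, and the `(image_id, worker_id, label)` response format are hypothetical; treating 10,000 tasks and 95% as minimums, and "two workers agree" as "at least two qualified workers give the same answer and none disagree", is our reading of the text rather than a documented rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Tuple


@dataclass
class Worker:
    worker_id: str
    tasks_completed: int    # hypothetical field
    acceptance_rate: float  # hypothetical field, in [0, 1]


def is_qualified(worker: Worker, min_tasks: int = 10_000, min_rate: float = 0.95) -> bool:
    """Reputation filter: enough completed tasks at a high enough acceptance rate."""
    return worker.tasks_completed >= min_tasks and worker.acceptance_rate >= min_rate


def agreed_labels(
    responses: Iterable[Tuple[str, str, bool]],
    workers: Mapping[str, Worker],
    min_votes: int = 2,
) -> Dict[str, bool]:
    """Keep an image's label only if at least `min_votes` qualified workers all agree."""
    answers: Dict[str, List[bool]] = defaultdict(list)
    for image_id, worker_id, label in responses:
        worker = workers.get(worker_id)
        if worker is not None and is_qualified(worker):
            answers[image_id].append(label)
    return {
        image_id: labels[0]
        for image_id, labels in answers.items()
        if len(labels) >= min_votes and len(set(labels)) == 1
    }
```

Images whose qualified workers disagree, or that have fewer than two qualified answers, are simply dropped rather than resolved by a majority vote.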

We are making all of our crowd annotations and captions available for download along with a subset of automatically-labeled images.

To download this dataset, first request an account. You can then sign in and download the dataset after completing some annotations for the next version.

To download the dataset, please help us annotate the next version! We are collecting a larger set of object categories and need your help to define them. It's also a fun way to get acclimated to the dataset.

Read the paper

  • Read the paper on arXiv
  • Read the ICCV version on the CV Foundation website

BibTeX

If you use our dataset, please cite the ICCV version:
@InProceedings{Wilber_2017_ICCV,
    author = {Wilber, Michael J. and Fang, Chen and Jin, Hailin and Hertzmann, Aaron and Collomosse, John and Belongie, Serge},
    title = {BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month = {Oct},
    year = {2017}
}