We introduce a simple yet expressive image generation method. On the one hand, it does not require the user to paint the masks or define a bounding box of the various objects, since the model does it by itself. On the other hand, it supports defining a coarse location and size of each object. Based on this, we offer a simple, interactive GUI, that allows a layman user to generate diverse images effortlessly. From a technical perspective, we introduce a dual embedding of layout and appearance. In this scheme, the location, size, and appearance of an object can change independently of each other. This way, the model is able to generate innumerable images per scene graph, to better express the intention of the user. In comparison to previous work, we also offer better quality and higher resolution outputs. This is due to a superior architecture, which is based on a novel set of discriminators. Those discriminators better constrain the shape of the generated mask, as well as capturing the appearance encoding in a counterfactual way. Our code is publicly available at https://www.github.com/ ashual/scene generation.