
Game Review: Mirages of Winter


STF News


Welcome to a new game review series for developers. We will be showcasing and analyzing different games published in the Samsung Galaxy Store, noting the best practices that successful developers have followed to publish and promote their games.

We begin with Mirages of Winter, which you can download to your mobile device here.

In the video, you can see more of the game's unique style and gameplay, and how the studio took advantage of its Galaxy Store listing to maximize engagement with players.

Mirari Games, the studio behind this title, delivered a premium game that creates an experience going beyond gaming. It is highly recommended to anyone who wants to escape to a magical land for a little while.

Let’s connect

We love to chat with mobile and wearable developers and to help you develop, publish, and market your apps, themes, and watch faces.

If you have a published game in Galaxy Store, we would love to help you get discovered.

Follow Up

This site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.

View the full blog at its source


  • Similar Topics

    • By STF News
      The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.
      In previous blogs, we have already explored two key Vulkan extension game changers that will be enabled by Android R. These are Descriptor Indexing and Buffer Device Address. In this blog, we explore the third and final game changer, which is 'Timeline Semaphores'.
      The introduction of timeline semaphores is a large improvement to the synchronization model of Vulkan and is a required feature in Vulkan 1.2. It solves some fundamental grievances with the existing synchronization APIs in Vulkan.
      The problems with VkFence and VkSemaphore
      Before timeline semaphores, Vulkan had two distinct synchronization objects for dealing with CPU <-> GPU synchronization and GPU queue <-> GPU queue synchronization.
      The VkFence object only deals with GPU -> CPU synchronization. Due to the explicit nature of Vulkan, you must keep track of when the GPU completes the work you submit to it.
      vkQueueSubmit(queue, …, fence);

      This is how we would use a fence: the fence passed to the submission can be waited on later. When the fence signals, we know it is safe to free resources, read back data written by the GPU, and so on. Overall, the VkFence interface was never a real problem in practice, except that it feels strange to have two entirely different API objects which essentially do the same thing.
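      To make the pattern concrete, here is a minimal sketch of the wait side, assuming `device`, `queue`, `submit_info`, and `fence` have been created elsewhere (standard core API usage, not code from the article's sample):

      vkQueueSubmit(queue, 1, &submit_info, fence);

      /* ... do other CPU work while the GPU runs ... */

      /* Block until the submission completes. */
      vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);

      /* The fence must be reset before it can be reused for another submission. */
      vkResetFences(device, 1, &fence);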
      VkSemaphore, on the other hand, has some quirks which make it difficult to use properly in sophisticated applications. VkSemaphore by default is a binary semaphore. The fundamental problem with binary semaphores is that we can only wait for a semaphore once. After we have waited for it, it automatically becomes unsignaled again. This binary nature is very annoying to deal with when we use multiple queues. For example, consider a scenario where we perform some work in the graphics queue and want to synchronize that work with two different compute queues. If we know this scenario is coming up, we have to allocate two VkSemaphore objects, signal both objects, and wait for each of them in the different compute queues. This works, but we might not know up front that this scenario will play out. Often, when we are dealing with multiple queues, we have to be somewhat conservative and signal semaphore objects we never end up waiting for. This leads to another problem …
      A signaled semaphore, which is never waited for, is basically a dead and useless semaphore and should be destroyed. We cannot reset a VkSemaphore object on the CPU, so we cannot ever signal it again if we want to recycle VkSemaphore objects. A workaround would be to wait for the semaphore on the GPU in a random queue just to unsignal it, but this feels like a gross hack. It could also potentially cause performance issues, as waiting for a semaphore is a full GPU memory barrier.
      Object bloat is another considerable pitfall of the existing APIs. For every synchronization point we need, we require a new object. All these objects must be managed, and their lifetimes must be considered. This creates a lot of annoying “bloat” for engines.
      The timeline – fixing object bloat – fixing multiple waits
      The first observation we can make of a Vulkan queue is that submissions should generally complete in-order. To signal a synchronization object in vkQueueSubmit, the GPU waits for all previously submitted work to the queue, which includes the signaling operation of previous synchronization objects. Rather than assigning one object per submission, we synchronize in terms of number of submissions. A plain uint64_t counter can be used for each queue. When a submission completes, the number is monotonically increased, usually by one each time. This counter is contained inside a single timeline semaphore object. Rather than waiting for a specific synchronization object which matches a particular submission, we could wait for a single object and specify “wait until graphics queue submission #157 completes.”
      We can wait for any value multiple times as we wish, so there is no binary semaphore problem. Essentially, for each VkQueue we can create a single timeline semaphore on startup and leave it alone (uint64_t will not overflow until the heat death of the sun, do not worry about it). This is extremely convenient and makes it so much easier to implement complicated dependency management schemes.
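      As a rough sketch of this bookkeeping (the names here are illustrative, not from the article's sample), each queue carries one semaphore plus a counter, and waiting for "submission #157" becomes a single call:

      typedef struct QueueTimeline {
          VkSemaphore timeline;       /* one timeline semaphore, created at startup */
          uint64_t submission_count;  /* value signaled by the most recent submit */
      } QueueTimeline;

      /* Block until the given submission number has completed on this queue. */
      static void wait_for_submission(VkDevice device, const QueueTimeline *q, uint64_t submission)
      {
          VkSemaphoreWaitInfoKHR info = { VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR };
          info.semaphoreCount = 1;
          info.pSemaphores = &q->timeline;
          info.pValues = &submission;
          vkWaitSemaphoresKHR(device, &info, UINT64_MAX);
      }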
      Unifying VkFence and VkSemaphore
      Timeline semaphores can be used very effectively on CPU as well:
      VkSemaphoreWaitInfoKHR info = { VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR };
      info.semaphoreCount = 1;
      info.pSemaphores = &semaphore;
      info.pValues = &value;
      vkWaitSemaphoresKHR(device, &info, timeout);

      This completely removes the need to use VkFence. Another advantage of this method is that multiple threads can wait for a timeline semaphore. With VkFence, only one thread could access a VkFence at any one time.
      A timeline semaphore can even be signaled from the CPU as well, although this feature feels somewhat niche. It allows use cases where you submit work to the GPU early, but then 'kick' the submission using vkSignalSemaphoreKHR. The accompanying sample demonstrates a particular scenario where this function might be useful:
      VkSemaphoreSignalInfoKHR info = { VK_STRUCTURE_TYPE_SEMAPHORE_SIGNAL_INFO_KHR };
      info.semaphore = semaphore;
      info.value = value;
      vkSignalSemaphoreKHR(device, &info);

      Creating a timeline semaphore
      When creating a semaphore, you can specify the type of semaphore and give it an initial value:
      VkSemaphoreCreateInfo info = { VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO };
      VkSemaphoreTypeCreateInfoKHR type_info = { VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO_KHR };
      type_info.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE_KHR;
      type_info.initialValue = 0;
      info.pNext = &type_info;
      vkCreateSemaphore(device, &info, NULL, &semaphore);
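      Note that the timelineSemaphore feature must also be enabled when creating the device before VK_SEMAPHORE_TYPE_TIMELINE_KHR can be used. A minimal sketch, assuming `device_info` is your VkDeviceCreateInfo:

      VkPhysicalDeviceTimelineSemaphoreFeaturesKHR timeline_features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_TIMELINE_SEMAPHORE_FEATURES_KHR };
      timeline_features.timelineSemaphore = VK_TRUE;
      device_info.pNext = &timeline_features;
      vkCreateDevice(physical_device, &device_info, NULL, &device);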
      Signaling and waiting on timeline semaphores
      When submitting work with vkQueueSubmit, you can chain another struct which provides counter values when using timeline semaphores, for example:
      VkSubmitInfo submit = { VK_STRUCTURE_TYPE_SUBMIT_INFO };
      submit.waitSemaphoreCount = 1;
      submit.pWaitSemaphores = &compute_queue_semaphore;
      submit.pWaitDstStageMask = &wait_stage;
      submit.commandBufferCount = 1;
      submit.pCommandBuffers = &cmd;
      submit.signalSemaphoreCount = 1;
      submit.pSignalSemaphores = &graphics_queue_semaphore;

      VkTimelineSemaphoreSubmitInfoKHR timeline = { VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR };
      timeline.waitSemaphoreValueCount = 1;
      timeline.pWaitSemaphoreValues = &wait_value;
      timeline.signalSemaphoreValueCount = 1;
      timeline.pSignalSemaphoreValues = &signal_value;
      submit.pNext = &timeline;

      signal_value++; // Generally, you bump the timeline value once per submission.
      vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

      Out of order signal and wait
      A strong requirement of Vulkan binary semaphores is that signals must be submitted before a wait on a semaphore can be submitted. This makes it easy to guarantee that deadlocks do not occur on the GPU, but it is also somewhat inflexible. In an application with many Vulkan queues and a task-based architecture, it is reasonable to submit work that is somewhat out of order. However, this still uses synchronization objects to ensure the right ordering when executing on the GPU. With timeline semaphores, the application can agree on the timeline values to use ahead of time, then go ahead and build commands and submit out of order. The driver is responsible for figuring out the submission order required to make it work. However, the application gets more ways to shoot itself in the foot with this approach. This is because it is possible to create a deadlock with multiple queues where queue A waits for queue B, and queue B waits for queue A at the same time.
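      A sketch of what this looks like in practice (submit_wait and submit_signal are hypothetical wrappers around the vkQueueSubmit pattern shown above, not real API calls):

      /* Both queues agree up front that value 1 marks the producer's work. */
      const uint64_t agreed_value = 1;

      /* Queue A: submitted FIRST, waiting on a value nothing has signaled yet.
         This would be illegal with binary semaphores. */
      submit_wait(queue_a, timeline, agreed_value);

      /* Queue B: submitted later, signals value 1 when its work completes.
         The driver figures out the required execution order. If queue B also
         waited on a value that only queue A signals, the GPU would deadlock. */
      submit_signal(queue_b, timeline, agreed_value);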
      Ease of porting
      It is no secret that timeline semaphores are inherited largely from D3D12’s fence objects. From a portability angle, timeline semaphores make it much easier to have compatibility across the APIs.
      Caveats
      As the specification stands right now, you cannot use timeline semaphores with swap chains. This is generally not a big problem, as synchronization with the swap chain tends to be an explicit operation that renderers need to take care of.
      Another potential caveat to consider is that the timeline semaphore might not have a direct kernel equivalent on current platforms, which means some extra emulation to handle it, especially the out-of-order submission feature. As the timeline synchronization model becomes the de-facto standard, I expect platforms to get more native support for it.
      Conclusion
      All three key Vulkan extension game changers improve the overall development and gaming experience by improving graphics and enabling new gaming use cases. We hope that we gave you enough samples to get you started as you try out these new Vulkan extensions to help bring your games to life.
      Follow Up
      Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games.
      The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
      View the full blog at its source
    • By STF News
      The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.
      Android R is enabling a host of useful Vulkan extensions for mobile, with three being key 'game changers'. These are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. You can expect to see these features across a variety of Android smartphones, such as the new Samsung Galaxy S21, and existing Samsung Galaxy S models like the Samsung Galaxy S20. The first blog explored the first game changer extension for Vulkan – ‘Descriptor Indexing'. This blog explores the second game changer extension – ‘Buffer Device Address.’
      VK_KHR_buffer_device_address
      VK_KHR_buffer_device_address is a monumental extension that adds a unique feature to Vulkan that none of the competing graphics APIs support.
      Pointer support is something that has always been limited in graphics APIs, for good reason. Pointers complicate a lot of things, especially for shader compilers. It is also near impossible to deal with plain pointers in legacy graphics APIs, which rely on implicit synchronization.
      There are two key aspects to buffer_device_address (BDA). First, it is possible to query a GPU virtual address from a VkBuffer. This is a plain uint64_t. This address can be written anywhere you like, in uniform buffers, push constants, or storage buffers, to name a few.
      The key aspect which makes this extension unique is that a SPIR-V shader can load an address from a buffer and treat it as a pointer to storage buffer memory immediately. Pointer casting, pointer arithmetic and all sorts of clever trickery can be done inside the shader. There are many use cases for this feature. Some are performance-related, and some are new use cases that have not been possible before.
      Getting the GPU virtual address (VA)
      There are some hoops to jump through here. First, when allocating VkDeviceMemory, we must flag that the memory supports BDA:
      VkMemoryAllocateInfo info = {…};
      VkMemoryAllocateFlagsInfo flags = {…};
      flags.flags = VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT_KHR;
      info.pNext = &flags;
      vkAllocateMemory(device, &info, NULL, &memory);

      Similarly, when creating a VkBuffer, we add the VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT_KHR usage flag. Once we have created a buffer, we can query the VA:
      VkBufferDeviceAddressInfoKHR info = {…};
      info.buffer = buffer;
      VkDeviceAddress va = vkGetBufferDeviceAddressKHR(device, &info);

      From here, this 64-bit value can be placed in a buffer. You can of course offset this VA. Alignment is never an issue, as shaders specify explicit alignment later.
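      As a brief sketch of that last step (standard core API usage; `ubo_memory` is assumed to be host-visible, host-coherent memory backing a uniform buffer):

      VkDeviceAddress va = vkGetBufferDeviceAddressKHR(device, &info);
      va += 256; /* offsetting the VA is fine */

      void *mapped = NULL;
      vkMapMemory(device, ubo_memory, 0, sizeof(va), 0, &mapped);
      memcpy(mapped, &va, sizeof(va)); /* the shader reads this value as a pointer */
      vkUnmapMemory(device, ubo_memory);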
      A note on debugging
      When using BDA, there are some extra features that drivers must support. Since a pointer does not necessarily exist when replaying an application capture in a debug tool, the driver must be able to guarantee that virtual addresses returned by the driver remain stable across runs. To that end, debug tools supply the expected VA and the driver allocates that VA range. Applications do not care that much about this, but it is important to note that even if you can use BDA, you might not be able to debug with it.
      typedef struct VkPhysicalDeviceBufferDeviceAddressFeatures {
          VkStructureType sType;
          void* pNext;
          VkBool32 bufferDeviceAddress;
          VkBool32 bufferDeviceAddressCaptureReplay;
          VkBool32 bufferDeviceAddressMultiDevice;
      } VkPhysicalDeviceBufferDeviceAddressFeatures;

      If bufferDeviceAddressCaptureReplay is supported, tools like RenderDoc can support BDA.
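      A minimal sketch of querying these bits with the core feature-query mechanism (not code from the article's sample):

      VkPhysicalDeviceBufferDeviceAddressFeatures bda_features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_BUFFER_DEVICE_ADDRESS_FEATURES };
      VkPhysicalDeviceFeatures2 features2 = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2 };
      features2.pNext = &bda_features;
      vkGetPhysicalDeviceFeatures2(physical_device, &features2);

      if (!bda_features.bufferDeviceAddressCaptureReplay) {
          /* BDA may still work, but capture/replay tools cannot debug it. */
      }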
      Using a pointer in a shader
      In Vulkan GLSL, there is the GL_EXT_buffer_reference extension which allows us to declare a pointer type. A pointer like this can be placed in a buffer, or we can convert to and from integers:
      #version 450
      #extension GL_EXT_buffer_reference : require
      #extension GL_EXT_buffer_reference_uvec2 : require

      layout(local_size_x = 64) in;

      // These define pointer types.
      layout(buffer_reference, std430, buffer_reference_align = 16) readonly buffer ReadVec4
      {
          vec4 values[];
      };

      layout(buffer_reference, std430, buffer_reference_align = 16) writeonly buffer WriteVec4
      {
          vec4 values[];
      };

      layout(buffer_reference, std430, buffer_reference_align = 4) readonly buffer UnalignedVec4
      {
          vec4 value;
      };

      layout(push_constant, std430) uniform Registers
      {
          ReadVec4 src;
          WriteVec4 dst;
      } registers;

      Placing raw pointers in push constants avoids all indirection for getting to a buffer. If the driver allows it, the pointers can be placed directly in GPU registers before the shader begins executing.
      Not all devices support 64-bit integers, but it is possible to cast uvec2 <-> pointer. Doing address computation like this is fine:

      uvec2 uadd_64_32(uvec2 addr, uint offset)
      {
          uint carry;
          addr.x = uaddCarry(addr.x, offset, carry);
          addr.y += carry;
          return addr;
      }

      void main()
      {
          uint index = gl_GlobalInvocationID.x;
          registers.dst.values[index] = registers.src.values[index];

          uvec2 addr = uvec2(registers.src);
          addr = uadd_64_32(addr, 20 * index);

          // Cast a uvec2 to an address and load a vec4 from it.
          // This address is aligned to 4 bytes.
          registers.dst.values[index + 1024] = UnalignedVec4(addr).value;
      }

      Pointer or offsets?
      Using raw pointers is not always the best idea. A natural use case you could consider for pointers is that you have tree structures or list structures in GPU memory. With pointers, you can jump around as much as you want, and even write new pointers to buffers. However, a pointer is 64-bit and a typical performance consideration is to use 32-bit offsets (or even 16-bit offsets) if possible. Using offsets is the way to go if you can guarantee that all buffers live inside a single VkBuffer. On the other hand, the pointer approach can access any VkBuffer at any time without having to use descriptors. Therein lies the key strength of BDA.
      Extreme hackery: physical pointer as specialization constants
      A black magic hack is to place a BDA inside a specialization constant. This is a life saver in certain situations where you are desperate to debug something without any available descriptor set. It allows for accessing a pointer without using any descriptors. Do note that this breaks all forms of pipeline caching and is only suitable for debug code. Do not ship this kind of code. Perform this dark sorcery at your own risk:
      #version 450
      #extension GL_EXT_buffer_reference : require
      #extension GL_EXT_buffer_reference_uvec2 : require

      layout(local_size_x = 64) in;

      layout(constant_id = 0) const uint DEBUG_ADDR_LO = 0;
      layout(constant_id = 1) const uint DEBUG_ADDR_HI = 0;

      layout(buffer_reference, std430, buffer_reference_align = 4) buffer DebugCounter
      {
          uint value;
      };

      void main()
      {
          DebugCounter counter = DebugCounter(uvec2(DEBUG_ADDR_LO, DEBUG_ADDR_HI));
          atomicAdd(counter.value, 1u);
      }

      Emitting SPIR-V with buffer_device_address
      In SPIR-V, there are some things to note. BDA is an especially useful feature for layering other APIs due to its extreme flexibility in how we access memory. Therefore, generating BDA code yourself is a reasonable use case to assume as well.
      Enable BDA in shaders:

      OpCapability PhysicalStorageBufferAddresses
      OpExtension "SPV_KHR_physical_storage_buffer"

      The memory model is PhysicalStorageBuffer64 and no longer Logical:

      OpMemoryModel PhysicalStorageBuffer64 GLSL450

      The buffer reference types are declared basically just like SSBOs:

      OpDecorate %_runtimearr_v4float ArrayStride 16
      OpMemberDecorate %ReadVec4 0 NonWritable
      OpMemberDecorate %ReadVec4 0 Offset 0
      OpDecorate %ReadVec4 Block
      OpDecorate %_runtimearr_v4float_0 ArrayStride 16
      OpMemberDecorate %WriteVec4 0 NonReadable
      OpMemberDecorate %WriteVec4 0 Offset 0
      OpDecorate %WriteVec4 Block
      OpMemberDecorate %UnalignedVec4 0 NonWritable
      OpMemberDecorate %UnalignedVec4 0 Offset 0
      OpDecorate %UnalignedVec4 Block

      Declare a pointer to the blocks. PhysicalStorageBuffer is the storage class to use:

      OpTypeForwardPointer %_ptr_PhysicalStorageBuffer_WriteVec4 PhysicalStorageBuffer
      %_ptr_PhysicalStorageBuffer_ReadVec4 = OpTypePointer PhysicalStorageBuffer %ReadVec4
      %_ptr_PhysicalStorageBuffer_WriteVec4 = OpTypePointer PhysicalStorageBuffer %WriteVec4
      %_ptr_PhysicalStorageBuffer_UnalignedVec4 = OpTypePointer PhysicalStorageBuffer %UnalignedVec4

      Load a physical pointer from a push constant block:

      %55 = OpAccessChain %_ptr_PushConstant__ptr_PhysicalStorageBuffer_WriteVec4 %registers %int_1
      %56 = OpLoad %_ptr_PhysicalStorageBuffer_WriteVec4 %55

      Access chain into it:

      %66 = OpAccessChain %_ptr_PhysicalStorageBuffer_v4float %56 %int_0 %40

      Aligned must be specified when dereferencing physical pointers. Pointers can have any arbitrary address and must be explicitly aligned, so the compiler knows what to do:

      OpStore %66 %65 Aligned 16

      For pointers, SPIR-V can bitcast between integers and pointers seamlessly, for example:

      %61 = OpLoad %_ptr_PhysicalStorageBuffer_ReadVec4 %60
      %70 = OpBitcast %v2uint %61
      // Do math on %70
      %86 = OpBitcast %_ptr_PhysicalStorageBuffer_UnalignedVec4 %some_address
      Conclusion
      We have already explored two key Vulkan extension game changers through this blog and the previous one. The third and final part of this game changer blog series will explore ‘Timeline Semaphores’ and how developers can use this new extension to improve the development experience and enhance their games.
      Follow Up
      Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games.
      The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
      View the full blog at its source
    • By STF News
      The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.
      As I mentioned previously, Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. These extensions will be available across various Android smartphones, including the new Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R.
      I have already discussed two of these extensions in previous blogs - Maintenance Extensions and Legacy Support Extensions. However, there are three further Vulkan extensions for Android that I believe are ‘game changers’. In the first of three blogs, I explore these individual game changer extensions – what they do, why they can be useful, and how to use them. The goal here is not to provide complete samples, but there should be enough to get you started. The first Vulkan extension is ‘Descriptor Indexing.’ Descriptor indexing may already be available on handsets prior to the Android R release. To check which Android devices support ‘Descriptor Indexing’, check here. You can also directly view the Khronos Group Vulkan Samples relevant to this blog here.
      VK_EXT_descriptor_indexing
      Introduction
      In recent years, we have seen graphics APIs greatly evolve in their resource binding flexibility. All modern graphics APIs now have some answer to how we can access large swathes of resources in a shader.
      Bindless
      A common buzzword that is thrown around in modern rendering tech is “bindless”. The core philosophy is that resources like textures and buffers are accessed through simple indices or pointers, and not singular “resource bindings”. To pass down resources to our shaders, we do not really bind them like in the graphics APIs of old. Simply write a descriptor to some memory and a shader can come in and read it later. This means the API machinery to drive this is kept to a minimum.
      This is a fundamental shift away from the older style where our rendering loop looked something like:
      render_scene() {
          foreach(drawable) {
              command_buffer->update_descriptors(drawable);
              command_buffer->draw();
          }
      }

      Now it looks more like:

      render_scene() {
          command_buffer->bind_large_descriptor_heap();
          large_descriptor_heap->write_global_descriptors(scene, lighting, shadowmaps);
          foreach(drawable) {
              offset = large_descriptor_heap->allocate_and_write_descriptors(drawable);
              command_buffer->push_descriptor_heap_offsets(offset);
              command_buffer->draw();
          }
      }

      Since we have free-form access to resources now, it is much simpler to take advantage of features like multi-draw or other GPU-driven approaches. We no longer require the CPU to rebind descriptor sets between draw calls like we used to.
      When we look at ray tracing, this style of design is going to be mandatory, since shooting a ray means we can hit anything, so all descriptors are potentially used. It is useful to start thinking about designing for this pattern going forward.
      The other side of the coin with this feature is that it is easier to shoot yourself in the foot. It is easy to access the wrong resource, but as I will get to later, there are tools available to help you along the way.
      VK_EXT_descriptor_indexing features
      This extension is a large one and landed in Vulkan 1.2 as a core feature. To enable bindless algorithms, there are two major features exposed by this extension.
      Non-uniform indexing of resources
      How resources are accessed has evolved quite a lot over the years. Hardware capabilities used to be quite limited, with a tiny bank of descriptors being visible to shaders at any one time. In more modern hardware however, shaders can access descriptors freely from memory and the limits are somewhat theoretical.
      Constant indexing
      Arrays of resources have been with us for a long time, but mostly as syntactic sugar, where we can only index into arrays with a constant index. This is equivalent to not using arrays at all from a compiler point of view.
      layout(set = 0, binding = 0) uniform sampler2D Textures[4];

      const int CONSTANT_VALUE = 2;
      color = texture(Textures[CONSTANT_VALUE], UV);

      HLSL in D3D11 has this restriction as well, but it has been more relaxed about it, since it only requires that the index is constant after optimization passes are run.
      Dynamic indexing
      As an optional feature, dynamic indexing allows applications to perform dynamic indexing into arrays of resources. This allows for a very restricted form of bindless. Outside of compute shaders, however, using this feature correctly is quite awkward, due to the requirement that the resource index be dynamically uniform.
      Dynamically uniform is a somewhat intricate subject, and the details are left to the accompanying sample in KhronosGroup/Vulkan-Samples.
      Non-uniform indexing
      Most hardware assumes that the resource index is dynamically uniform, as this has been the restriction in APIs for a long time. If you are not accessing resources with a dynamically uniform index, you must notify the compiler of your intent.
      The rationale here is that hardware is optimized for dynamically uniform (or subgroup uniform) indices, so there is often an internal loop emitted by either compiler or hardware to handle every unique index that is used. This means performance tends to depend a bit on how divergent resource indices are.
      #extension GL_EXT_nonuniform_qualifier : require

      layout(set = 0, binding = 0) uniform texture2D Tex[];
      layout(set = 1, binding = 0) uniform sampler Sampler;

      color = texture(nonuniformEXT(sampler2D(Tex[index], Sampler)), UV);

      In HLSL, there is a similar mechanism where you use NonUniformResourceIndex, for example:
      Texture2D<float4> Textures[] : register(t0, space0);
      SamplerState Samp : register(s0, space0);

      float4 color = Textures[NonUniformResourceIndex(index)].Sample(Samp, UV);

      All descriptor types can make use of this feature, not just textures, which is quite handy! The nonuniformEXT qualifier removes the requirement to use dynamically uniform indices. See the code sample for more detail.
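      On the API side, the relevant features must be enabled at device creation. A minimal sketch using the feature struct from VK_EXT_descriptor_indexing (assuming `device_info` is your VkDeviceCreateInfo; enable only the bits your shaders actually need):

      VkPhysicalDeviceDescriptorIndexingFeaturesEXT indexing_features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT };
      indexing_features.shaderSampledImageArrayNonUniformIndexing = VK_TRUE; /* nonuniformEXT */
      indexing_features.runtimeDescriptorArray = VK_TRUE;                   /* unsized arrays: Tex[] */
      indexing_features.descriptorBindingPartiallyBound = VK_TRUE;
      indexing_features.descriptorBindingSampledImageUpdateAfterBind = VK_TRUE;
      indexing_features.descriptorBindingVariableDescriptorCount = VK_TRUE;
      device_info.pNext = &indexing_features;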
      Update-after-bind
      A key component to make the bindless style work is that we do not have to … bind descriptor sets all the time. With the update-after-bind feature, we effectively block the driver from consuming descriptors at command recording time, which gives a lot of flexibility back to the application. The shader consumes descriptors as they are used and the application can freely update descriptors, even from multiple threads.
      To enable update-after-bind, we modify the VkDescriptorSetLayout by adding new binding flags. The way to do this is somewhat verbose, but at least update-after-bind is something that is generally used for just one or two descriptor set layouts throughout most applications:
      VkDescriptorSetLayoutCreateInfo info = { … };
      info.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT;

      const VkDescriptorBindingFlagsEXT flags =
          VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT_EXT |
          VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT |
          VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT |
          VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT_EXT;

      VkDescriptorSetLayoutBindingFlagsCreateInfoEXT binding_flags = { … };
      binding_flags.bindingCount = info.bindingCount;
      binding_flags.pBindingFlags = &flags;
      info.pNext = &binding_flags;

      For each pBinding entry, we have a corresponding flags field where we can specify various flags. The descriptor_indexing extension has very fine-grained support, but UPDATE_AFTER_BIND_BIT and VARIABLE_DESCRIPTOR_COUNT_BIT are the most interesting ones to discuss.
      VARIABLE_DESCRIPTOR_COUNT deserves special attention as it makes descriptor management far more flexible. Having to use a fixed array size can be somewhat awkward, since in a common usage pattern with a large descriptor heap, there is no natural upper limit to how many descriptors we want to use. We could settle for some arbitrarily high limit like 500k, but that means all descriptor sets we allocate have to be of that size and all pipelines have to be tied to that specific number. This is not necessarily what we want, and VARIABLE_DESCRIPTOR_COUNT allows us to allocate just the number of descriptors we need per descriptor set. This makes it far more practical to use multiple bindless descriptor sets.
      When allocating a descriptor set, we pass down the actual number of descriptors to allocate:
      VkDescriptorSetVariableDescriptorCountAllocateInfoEXT variable_info = { … };
      variable_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT;
      variable_info.descriptorSetCount = 1;
      variable_info.pDescriptorCounts = &NumDescriptorsStreaming;
      allocate_info.pNext = &variable_info;

      VK_CHECK(vkAllocateDescriptorSets(get_device().get_handle(), &allocate_info, &descriptors.descriptor_set_update_after_bind));

      GPU-assisted validation and debugging
      When we enter the world of descriptor indexing, there is a flipside where debugging and validation is much more difficult. The major benefit of the older binding models is that it is fairly easy for validation layers and debuggers to know what is going on. This is because the number of available resources to a shader is small and focused.
      With UPDATE_AFTER_BIND in particular, we do not know anything at draw time, which makes this awkward.
      It is possible to enable GPU assisted validation in the Khronos validation layers. This lets you catch issues like:
      "UNASSIGNED-Descriptor uninitialized: Validation Error: [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 67 is uninitialized__.  Command buffer (0x55625b184d60). Draw Index 0x4. Pipeline (0x520000000052). Shader Module (0x510000000051). Shader Instruction Index = 59.  Stage = Fragment.  Fragment coord (x,y) = (944.5, 0.5).  Unable to find SPIR-V OpLine for source information.  Build shader with debug info to get source information."
      Or:
      "UNASSIGNED-Descriptor uninitialized: Validation Error: [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 131 is uninitialized__.  Command buffer (0x55625b1893c0). Draw Index 0x4. Pipeline (0x520000000052). Shader Module (0x510000000051). Shader Instruction Index = 59.  Stage = Fragment.  Fragment coord (x,y) = (944.5, 0.5).  Unable to find SPIR-V OpLine for source information.  Build shader with debug info to get source information."
      RenderDoc supports debugging descriptor indexing through shader instrumentation, and this allows you to inspect which resources were accessed. When you have several thousand resources bound to a pipeline, this feature is critical to make any sense of the inputs.
      If we are using the update-after-bind style, we can inspect the exact resources we used.
      In a non-uniform indexing style, we can inspect all unique resources we used.
      Conclusion
      Descriptor indexing unlocks many design possibilities in your engine and is a real game changer for modern rendering techniques. Use with care, and make sure to take advantage of all debugging tools available to you. You need them.
      This blog has explored the first Vulkan extension game changer, with two more parts in this game changer blog series still to come. The next part will focus on ‘Buffer Device Address’ and how developers can use this new feature to enhance their games.
      Follow Up
      Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games. The original version of this article can be viewed at Arm Community.
      The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
      View the full blog at its source
    • By STF News
      Anti-aliasing is an important addition to any game, improving visual quality by smoothing out the jagged edges of a scene. MSAA (Multisample Anti-Aliasing) is one of the oldest methods to achieve this and is still the preferred solution for mobile. However, it is only suitable for forward rendering and, with mobile performance improving year over year, deferred rendering is becoming more common, necessitating the use of post-process AA. This leaves slim pickings, as such algorithms tend to be too expensive for mobile GPUs, with FXAA (Fast Approximate Anti-Aliasing) being the only ‘cheap’ option among them. FXAA may be performant enough, but it only has simple colour-discontinuity shape detection, leading to an often unwanted softening of the image. Its kernel is also limited in size, so it struggles to anti-alias longer edges effectively.

      Space Module scene with CMAA applied.

      Conservative Morphological Anti-Aliasing
      Conservative Morphological Anti-Aliasing (CMAA) is a post-process AA solution originally developed by Intel for their low-power integrated GPUs [1]. Its design goals are to be a better alternative to FXAA by:
      • Being minimally invasive, so it can be acceptable as a replacement in a wide range of applications, including worst case scenarios such as text, repeating patterns, certain geometries (power lines, mesh fences, foliage), and moving images.
      • Running efficiently on low-to-medium range GPU hardware, such as integrated GPUs (or, in our case, mobile GPUs).

      We have repurposed this desktop-developed algorithm and come up with a hybrid between the original 1.3 version and the updated 2.0 version [2] to make the best use of mobile hardware. A demo app was created using Khronos’ Vulkan Samples as a framework (this could also be done with GLES) to implement this experiment. The sample has a drop-down menu for easy switching between the different AA solutions and presents a frametime and bandwidth overlay.
      CMAA has four basic logical steps:
      1. Image analysis for colour discontinuities (afterwards stored in a local compressed ‘edge’ buffer). The method used is not unique to CMAA.
      2. Extracting locally dominant edges with a small kernel. (Unique variation of existing algorithms.)
      3. Handling of simple shapes.
      4. Handling of symmetrical long edge shapes. (Unique take on the original MLAA shape handling algorithm.)

      Pass 1

      Edge detection result captured in Renderdoc.

      A full screen edge detection pass is done in a fragment shader and the resulting colour discontinuity values are written into a colour attachment. Our implementation uses the pixels’ luminance values to find edge discontinuities for speed and simplicity. An edge exists if the contrast between neighbouring pixels is above an empirically determined threshold.
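      As a rough illustration of that test (CPU-side C for clarity; the real check runs in the fragment shader, and the 0.045 threshold is an assumed placeholder, not the demo's tuned value):

      #include <math.h>

      /* Rec. 601 luma from an RGB triplet. */
      static float luminance(const float rgb[3])
      {
          return 0.299f * rgb[0] + 0.587f * rgb[1] + 0.114f * rgb[2];
      }

      /* An edge exists if the luminance contrast between neighbouring
         pixels exceeds an empirically determined threshold. */
      static int is_edge(const float a[3], const float b[3])
      {
          const float threshold = 0.045f;
          return fabsf(luminance(a) - luminance(b)) > threshold;
      }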
      Pass 2

      Neighbouring edges considered for local contrast adaptation.

      A local contrast adaptation is performed for each detected edge by comparing the value from the previous pass against the values of its closest neighbours, using a threshold built from the average and the largest of those neighbouring values. Any edge that passes the threshold is written into an image as a confirmed edge.
      Pass 3
      This pass collects all the edges for each pixel from the previous pass and packs them into a new image for the final pass. It also performs the first part of edge blending: the detected edges are used to look for pixels with 2, 3, or 4 confirmed edges, and the colours from the adjacent pixels are blended in. This helps avoid the unnecessary blending of straight edges.
      Pass 4
      The final pass does long edge blending by identifying Z-shapes in the detected edges. For each detected Z-shape, the length of the edge is traced in both directions until it reaches the end or until it runs into a perpendicular edge. Pixel blending is then performed along the traced edges proportional to their distance from the centre.

      Before and after of Z-shape detection.

      Results


      Image comparison shows a typical scenario for AA. CMAA manages high quality anti-aliasing while retaining sharpness on straight edges.

      CMAA demonstrates itself to be a superior solution to aliasing than FXAA by avoiding the latter’s limitations. It maintains a crisper look to the overall image and won’t smudge thin lines, all while still providing effective anti-aliasing to curved edges. It also brings a significant performance advantage on Qualcomm devices and only a small penalty on Arm.

      Image comparison shows a weakness of FXAA where it smudges thin lined geometry into the background. CMAA shows no such weakness and retains the colour of the railing while adding anti-aliasing effectively.

      MSAA is still a clear winner and our recommended solution if your game allows for it to be resolved within a single render pass. For any case where that is impractical, CMAA is overall a better alternative than FXAA and should be strongly considered.

      Graph shows the increase in frametime for each AA method across a range of Samsung devices.

      References
      1. Filip Strugar and Leigh Davies: Conservative Morphological Anti-Aliasing (CMAA) – March 2014.
      2. Filip Strugar and Adam T. Lake: Conservative Morphological Anti-Aliasing 2.0 – April 2018.

      View the full blog at its source