Feature Level 9.3 appears to never have actually worked because shaders
are compiled as straight 4_0 instead of 4_0_level_9_3. That being the
case, baseline against 10_0 instead.
NV12 GPU copies to staging textures for CPU read take a ridiculously
long time on my integrated Intel GPU. Using R8/R8G8 instead seems to be
a huge speed-up.
Intel HD Graphics 530, D3D11 query timings, SetStablePowerState
NV12: ~3268 us (minimum of wild timings)
R8/R8G8: ~781 us (most frequently occurring timing)
This change only wraps the functionality. I have rough code to exercise
the query functionality, but that part is not really clean enough to
submit.
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed. Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
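A minimal sketch of how an oversized triangle can be generated from the vertex index alone, written in C for illustration rather than as the actual shader. The names `fs_vertex` and `fullscreen_tri_vertex` are hypothetical. The three vertices land at (-1,-1), (3,-1) and (-1,3) in clip space, so the visible [-1,1] square is fully covered and the excess is clipped. Whether V needs flipping is API-specific and is not handled here.

```c
#include <stdint.h>

/* Hypothetical output of a full-screen vertex stage. */
struct fs_vertex {
	float x, y; /* clip-space position */
	float u, v; /* texture coordinate */
};

/* id is the vertex index: 0, 1 or 2. No vertex buffer is needed. */
struct fs_vertex fullscreen_tri_vertex(uint32_t id)
{
	struct fs_vertex out;
	out.u = (float)((id << 1) & 2); /* 0, 2, 0 */
	out.v = (float)(id & 2);        /* 0, 0, 2 */
	out.x = out.u * 2.0f - 1.0f;    /* -1, 3, -1 */
	out.y = out.v * 2.0f - 1.0f;    /* -1, -1, 3 */
	return out;
}
```

Inside the visible square, U and V still interpolate across 0..1, so the pixel shader sees the same coordinates it would with a two-triangle quad.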
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.
To keep this change simple, the filter chain will continue to use
straight alpha. Otherwise, every source would need to be modified to
output premultiplied, and every filter modified for premultiplied input.
"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
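The conversion from premultiplied to straight alpha amounts to dividing color by alpha. A rough C sketch follows. The `rgba` struct and `alpha_divide` function are hypothetical names, and the zero-alpha guard is an assumption, not necessarily what the shader techniques do.

```c
/* Hypothetical color type; channels are in the 0..1 range. */
struct rgba {
	float r, g, b, a;
};

/* Premultiplied -> straight alpha: divide color by alpha.
 * Assumption: fully transparent pixels are passed through unchanged
 * to avoid dividing by zero. */
struct rgba alpha_divide(struct rgba c)
{
	if (c.a > 0.0f) {
		c.r /= c.a;
		c.g /= c.a;
		c.b /= c.a;
	}
	return c;
}
```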
One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.
For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.
This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
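To illustrate the difference, here is the alpha-channel arithmetic for both configurations, assuming the standard additive blend equation (result = src * SrcBlendAlpha + dst * DestBlendAlpha). The function names are made up for this sketch.

```c
/* SrcBlendAlpha = ONE, DestBlendAlpha = ONE: alpha simply accumulates. */
float blend_alpha_one_one(float src, float dst)
{
	return src + dst;
}

/* SrcBlendAlpha = ONE, DestBlendAlpha = INVSRCALPHA: the result stays
 * within 0..1 as long as both inputs are within 0..1. */
float blend_alpha_one_invsrc(float src, float dst)
{
	return src + dst * (1.0f - src);
}
```

With two layers at 0.75 alpha, the first form yields 1.5 (clamped on UNORM targets, kept as-is on FLOAT targets), while the second yields 0.9375.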
obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.
Fixes https://obsproject.com/mantis/view.php?id=1108
Add support for debug markers via D3DPERF API and KHR_debug. This makes
it easier to understand RenderDoc captures.
D3DPERF is preferred to ID3DUserDefinedAnnotation because it supports
colors. d3d9.lib is now linked in to support this.
This feature is disabled by default, and is controlled by
GS_USE_DEBUG_MARKERS.
From: obsproject/obs-studio#1799
Currently SrcBlendAlpha and DestBlendAlpha are both ONE, and can
combine together to form two. This is not a noticeable problem for
UNORM targets because the channels are clamped, but it will likely
become a problem if FLOAT targets are more widely used.
This change switches DestBlendAlpha to INVSRCALPHA, and starts
backgrounds as opaque black instead of transparent black. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
On some AMD systems, the test for bad NV12 output would come up negative
(output was good), but when recording/streaming, it would have bad
output. While the test worked correctly with NVIDIA systems, it did not
work in these cases. This was because we did not correctly reproduce
the exact conditions of green output because the textures used for the
test were not also keyed mutex shared textures. This fixes that issue
and ensures the test works correctly in those other niche cases.
Performs a test on a texture to determine if NV12 textures are
functioning correctly. Older NVIDIA drivers appear to have a bug where
they will output their UV channel data on to the Y channel when copying
from the GPU. This test on startup determines whether that bug is
occurring, and if so, disables NV12 texture support.
If a texture has to be rebuilt due to a driver reset and is a keyed
mutex shared texture, make sure to reacquire the shared handle and
acquire the lock.
When NV12 textures were added, driver crashes and their rebuild process
were not taken in to consideration. This fixes that by adding explicit
NV12 rebuild functions.
Gives the ability to retrieve param annotations, which are blocks
wrapped in <> following a parameter.
For example:
float slider < float max_value = 10.0; float min_value = 0.0; >;
These blocks are not for shading purposes but to help describe the
shader's gui as in the example above.
Adds graphics api functions for retrieving annotations:
size_t gs_param_get_num_annotations(const gs_eparam_t *param);
gs_eparam_t *gs_param_get_annotation_by_idx(const gs_eparam_t *param,
size_t annotation);
gs_eparam_t *gs_param_get_annotation_by_name(const gs_eparam_t *param,
const char *name);
This commit fixes a bug that occurs on Windows 8+ when two or more
"Display Capture" sources are active that are configured to capture the
same monitor. Only one display capture would show, while all subsequent
display captures would display nothing.
Closes jp9000/obs-studio#1142
(Note: This commit also modifies libobs-d3d11 and libobs-opengl)
Allows the ability to flush data directly without having to use the
buffer's internal data.
Allows the caller to manage his/her own vertex/index buffer data if
desired, working around the design flaw of having to rely on a
vertex/index buffer's internal data.
It looks like the link between the gs layer rgba enable flags and the
underlying D3D states never got fully implemented.
This change adds the missing piece, fixing an issue I had in a plugin
wherein I couldn't write a blended value to a RGBA render target without
also changing the alpha of the dest pixel. Debugging that led to the
missing gs_enable_color functionality.
Closes jp9000/obs-studio#1064
Prevents an issue where the output duplicator would cause the program to
crash if the graphics driver crashes and the graphics subsystem needs to
be rebuilt.
When rebuilding a texture, it sometimes has to fall back and create a
temporary texture, but it'll fail when trying to create a shader
resource for it. The suspicion is that the temporary texture is created
without the proper shader binding flag, so this fixes that possible
loophole.
There's no need to reset vertex buffers like this anymore. This would
unintentionally cause certain things (such as the freetype text source
on windows) to stop rendering properly.
Fixes a bug where loading vertex shaders could cause error messages
about mismatching vertex buffer data to appear because the vertex shader
would try to reload the previously used vertex buffer.
When rebuilding the graphics subsystem, it's possible a shared texture
may no longer be available. In this case, just soft fail and allow the
texture to be rebuilt rather than crash the entire program over it.
Due to an NVIDIA driver bug with the Windows 10 Anniversary Update,
there are an increasingly large number of reports of "Device Removed"
errors and TDRs. When this happens, OBS stops outputting all data
because all graphics functions are failing, and it appears to just
"freeze up" for users.
To temporarily alleviate this issue while waiting for it to be fixed,
the D3D subsystem can be rebuilt when that happens, and all assets can
be reloaded to ensure that it can continue functioning (with a minor
hiccup in playback).
To allow rebuilding the entire D3D subsystem, all objects that contain
D3D references must be part of a linked list (with a few exceptions) so
we can quickly traverse them all whenever needed, and all data for those
resources (static resources primarily, such as shaders, textures, index
buffers, vertex buffers) must be stored in RAM so they can be recreated
whenever needed.
Then if D3D reports a "device removed" or "device reset" error, all D3D
references must first be fully released with no stray references; the
linked list must be fully traversed until all references are released.
Then, the linked list must be traversed once again, and all those
D3D objects must be recreated with the same data and descriptors (which
are now saved in each object). Finally, all states need to be reset.
After that's complete, the device is able to continue functioning almost
as it was before, although the output to recording/stream may get a few
green frames due to texture data being reset.
This will temporarily alleviate the "Device Removed" issue while waiting
for a fix from NVIDIA.
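The release-then-rebuild traversal described above can be sketched in plain C. This is a simplified stand-in, not the actual implementation: `dev_obj`, `release_all` and `rebuild_all` are hypothetical names, and an integer takes the place of a real D3D reference.

```c
#include <stddef.h>

/* Hypothetical stand-in for any object that owns a device resource. */
struct dev_obj {
	struct dev_obj *next;
	int saved_desc; /* data kept in RAM so the resource can be recreated */
	int resource;   /* stand-in for a device reference; 0 means released */
};

/* Pass 1: drop every device reference before the device is recreated. */
void release_all(struct dev_obj *head)
{
	for (struct dev_obj *o = head; o; o = o->next)
		o->resource = 0;
}

/* Pass 2: recreate each resource from its saved descriptor.
 * Returns the number of objects rebuilt. */
size_t rebuild_all(struct dev_obj *head)
{
	size_t count = 0;
	for (struct dev_obj *o = head; o; o = o->next) {
		o->resource = o->saved_desc; /* stand-in for a create call */
		count++;
	}
	return count;
}
```

Keeping the two passes separate matters: the old device cannot be fully destroyed while any object still holds a reference to it.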
Instead of letting vertex buffer data be freed immediately, store it so
it can be used for rebuilding later. Also, separate the buffer building
to a function.
Unloads all device data and clears all device references. Probably not
necessary, but it's unknown how D3D11 handles this internally so
probably best to be safe.
When DirectX throws error 0x887A0005 (DXGI_ERROR_DEVICE_REMOVED),
Microsoft recommends calling ID3D11Device::GetDeviceRemovedReason to
retrieve a more precise error code.
Closes jp9000/obs-studio#652
This allows the ability to separate the blend states of color and alpha.
The default blend state has also changed so that alpha is always added
together to ensure that the destination image always gets an alpha value
that is actually usable after the operation (for render targets).
Old default state:
color source: GS_BLEND_SRCALPHA, color dest: GS_BLEND_INVSRCALPHA
alpha source: GS_BLEND_SRCALPHA, alpha dest: GS_BLEND_INVSRCALPHA
New default state:
color source: GS_BLEND_SRCALPHA, color dest: GS_BLEND_INVSRCALPHA
alpha source: GS_BLEND_ONE, alpha dest: GS_BLEND_ONE
Someone's going to yell at me about this, but fix vertical alignment for
certain member variables in the main header.
For future reference, if you must use vertical alignment, always give it
plenty of space for the type names to grow in case you need to
add/change variables in the future; don't just align to the 'longest'
value, give it an extra 8-16 spaces for potential future variables.
This is done to prevent having to make commits like this in the future
that sort of pollute the history.
When using an enumeration value with a switch, it needs to be filled out
with all possible values to prevent compiler warnings. This warning is
used to prevent the developer from unintentionally forgetting to add new
enum values to any switches the enum is used on later on. Sadly, only
good compilers actually have this warning (mingw).
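As a small illustration, GCC enables `-Wswitch` as part of `-Wall`, which warns when a switch over an enum type has no `default` label and misses an enumerator. The enum and function below are hypothetical and only show the shape of a fully covered switch.

```c
/* Hypothetical enum used only for this example. */
enum color_fmt {
	FMT_RGBA,
	FMT_BGRA,
	FMT_R8,
};

int bytes_per_pixel(enum color_fmt fmt)
{
	/* Every enumerator is listed and there is no default label, so
	 * adding a new value to color_fmt later triggers a warning here. */
	switch (fmt) {
	case FMT_RGBA:
		return 4;
	case FMT_BGRA:
		return 4;
	case FMT_R8:
		return 1;
	}
	return 0; /* only reached for values outside the enum */
}
```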
Microsoft's compiler doesn't seem to care about warning about things
like initializer list ordering. Mingw actually reports on this to
prevent potential confusion about ordering.
We have a sprintf_s function in mingw-w64, but it won't compile
with visual studio because it's the C11 specification (aka the correct
specification that's not made by morons). Microsoft's version differs
from the specification (and is made by morons), so fall back to sprintf
(note if you can't tell, this commit message was edited by Jim)
The gs_enum_adapters function is an optional implementation to allow
enumeration of available graphics adapters that can be used with the
program. The ID associated with the adapter can be an index or a hash
depending on the implementation.
This adds support for the windows 8+ output duplicator feature which
allows the efficient capturing of a specific monitor connected to the
currently used device.
I do not want the D3D11 library to depend on a specific compiler
version. This way, I do not have to distribute D3D Compiler libraries
with the program (proprietary binary blobs). Any particular version
works because the API for the D3DCompiler function appears to be the
same; the only things that change are other features and additions
mostly (at least as far as I can tell). Using any version available on
the system should be more than sufficient rather than depending on some
specific D3D compiler version.
If the user doesn't have it, a download of the latest D3D distributables
should be fine, though it should work with the ones that come with
windows 7+ as well.
This fixes a minor flaw with the API where data had to always be mutable
to be usable by the API.
Functions that do not modify the fundamental underlying data of a
structure should be marked as constant, both for safety and to signify
that the parameter is input only and will not be modified by the
function using it.
Typedef pointers are unsafe. If you do:
typedef struct bla *bla_t;
then you cannot use it as a constant, such as: const bla_t, because
that constant will be to the pointer itself rather than to the
underlying data. I admit this was a fundamental mistake that must
be corrected.
All typedefs that were pointer types will now have their pointers
removed from the type itself, and the pointers will be used when they
are actually used as variables/parameters/returns instead.
This does not break ABI though, which is pretty nice.
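A short C illustration of the problem, using the `bla` name from above. To show both styles side by side, the old pointer typedef is renamed to the hypothetical `bla_ptr_t` here.

```c
struct bla {
	int value;
};

typedef struct bla *bla_ptr_t; /* old style: pointer hidden in the typedef */
typedef struct bla bla_t;      /* new style: pointer written at the use site */

/* `const bla_ptr_t` means `struct bla *const`: only the pointer is
 * constant, so the underlying data can still be written. */
void old_style(const bla_ptr_t b)
{
	b->value = 1; /* compiles without complaint */
}

/* `const bla_t *` is a pointer to constant data. */
int new_style(const bla_t *b)
{
	/* b->value = 1; would be rejected by the compiler here */
	return b->value;
}
```

The ABI is unaffected because both forms pass the same pointer; only what the compiler lets the callee do with it changes.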
The alpha source and destination blend values were always being set to
one and zero, when they should have been set to the same as the color
values. This caused the alpha of the source texture to always overwrite
the alpha of the destination texture, rather than apply the blend
function upon it. Needless to say that it seriously screwed up the
render target if you rendered something with alpha on it.
Thanks to paibox for pointing this issue out and yelling at me to fix
it. I apologize for not getting to this sooner.
Changed API functions:
libobs: obs_reset_video
Before, video initialization returned a boolean, but "failed" is too
little information. If it fails due to lack of device capabilities or
bad video device parameters, the front-end needs to know that.
The OBS Basic UI has also been updated to reflect this API change.
There's no need to find DirectX because with VS2013 and mingw it's
already available by default. Older visual studio versions that didn't
come with DirectX by default are no longer supported anyway.
(Also mingw doesn't currently work at all due to lack of proper headers,
but once they do it'll be available in the same way. I think.)
NOTE: In texture_setimage, I had to move variables to the top of the
scope because microsoft's C compiler will give the legacy C90 error of:
'illegal use of this type as an expression'.
To sum it up, microsoft's C compiler is still utter garbage.
...I'm actually concerned that I went a bit overkill trying to prevent
backwards compatibility issues with this abstraction design, because
this is a large number of files that have to be modified just to add a
single graphics subsystem export. Someone's going to strangle me, and
when you know that someone might strangle you, that means that you did
something wrong. We'll have to look in to simplifying this in the
future without killing backward compatibility safety.
These functions were mostly related to being able to set true fullscreen
mode -- however, this has no place for our purposes, and these functions
were just sitting empty and unused, so they should be removed.
Besides, fullscreen mode only applies to the windows operating system.
This variable is currently somewhat pointless, I was originally going to
use it to tell the graphics subsystem to completely rebuild the internal
vertex buffers, but it would be bad/inefficient to allow that
functionality.
I had forgotten how constants worked when compiled; constants are
uploaded as constant registers. When constants are used with shaders,
multiple constants are often packed in to a single register when
possible to reduce constant register count.
For example, one 'float' constant and one 'float3' constant will be
packed in to a single register (c0.x for constant 1, c0.yzw for constant
2), but two 'float' constants and one 'float3' constant must inhabit two
registers (c0.xy for constants 1 and 2, c1.xyz for constant 3), because
the 'float3' must start on a new register boundary (every 16 bytes).
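The packing rule can be sketched as a small C routine. This is an illustration only: `slot` and `pack_constants` are hypothetical names, and the sketch assumes each constant is a scalar or vector of 1 to 4 floats (matrices and arrays are not covered).

```c
#include <stddef.h>

/* Where a constant ends up: register index and first component (0 = x). */
struct slot {
	unsigned reg;
	unsigned comp;
};

/* sizes[i] is the number of float components (1..4) of constant i.
 * A constant may not straddle a 16-byte (4-float) register boundary.
 * Returns the number of registers used. */
size_t pack_constants(const unsigned *sizes, size_t count, struct slot *out)
{
	unsigned offset = 0; /* running offset in float components */

	for (size_t i = 0; i < count; i++) {
		unsigned used = offset % 4;
		if (used + sizes[i] > 4)
			offset += 4 - used; /* skip to the next register */
		out[i].reg = offset / 4;
		out[i].comp = offset % 4;
		offset += sizes[i];
	}
	return (offset + 3) / 4;
}
```

For sizes {1, 3} this yields one register (c0.x and c0.yzw). For sizes {1, 1, 3} it yields two, with the last constant starting at c1.x.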
I had first instinctively thought it was just a simple case of
alignment like it is on the CPU, but then I realized that it didn't
sound right, so I went back and did some more tests and then ultimately
remembered how constants actually are uploaded.
- Add dummy GL texture support to allow libobs texture references to be
created for GL without
- Add a texture_getobj function to allow the retrieval of the
context-specific object, such as the D3D texture pointer, or the
OpenGL texture object handle.
- Also cleaned up the export stuff. I realized it was all totally
superfluous. Kind of a dumb moment, but nice to clean it up
regardless.
- Implement OBS encoder interface. It was previously incomplete, but
now is reaching some level of completion, though probably should
still be considered preliminary.
I had originally implemented it so that encoders only have a 'reset'
function to reset their parameters, but I felt that having both a
'start' and 'stop' function would be useful.
Encoders are now assigned to a specific video/audio media output each
rather than implicitly assigned to the main obs video/audio
contexts. This allows separate encoder contexts that aren't
necessarily assigned to the main video/audio context (which is useful
for things such as recording specific sources). Will probably have
to do this for regular obs outputs as well.
When creating an encoder, you must now explicitly state whether that
encoder is an audio or video encoder.
Audio and video can optionally be automatically converted depending
on what the encoder specifies.
When something 'attaches' to an encoder, the first attachment starts
the encoder, and the encoder automatically attaches to the media
output context associated with it. Subsequent attachments won't have
the same effect, they will just start receiving the same encoder data
when the next keyframe plays (along with SEI if any). When detaching
from the encoder, the last detachment will fully stop the encoder and
detach the encoder from the media output context associated with the
encoder.
SEI must actually be exported separately; because new encoder
attachments may not always be at the beginning of the stream, the
first keyframe they get must have that SEI data in it. If the
encoder has SEI data, it needs only add one small function to simply
query that SEI data, and then that data will be handled automatically
by libobs for all subsequent encoder attachments.
- Implement x264 encoder plugin, move x264 files to separate plugin to
separate necessary dependencies.
- Change video/audio frame output structures to not use const
qualifiers to prevent issues with non-const function usage elsewhere.
This was an issue when writing the x264 encoder, as the x264 encoder
expects non-const frame data.
Change stagesurf_map to return a non-const data type to prevent this
as well.
- Change full range parameter of video scaler to be an enum rather than
boolean
- Implement windows monitor capture (code is so much cleaner than in
OBS1). Will implement duplication capture later
- Add GDI texture support to d3d11 graphics library
- Fix precision issue with sleep timing: you have to call
timeBeginPeriod, otherwise windows sleep will be totally erratic.
- Add WASAPI audio capture for windows, input and output
- Check for null pointer in os_dlopen
- Add exception-safe 'WinHandle' and 'CoTaskMemPtr' helper classes that
will automatically call CloseHandle on handles and call CoTaskMemFree
on certain types of memory returned from windows functions
- Changed the wide <-> MBS/UTF8 conversion functions so that you use
buffers (like these functions are *supposed* to behave), and changed
the ones that allocate to a different naming scheme to be safe
LOG_ERROR should be used in places where something, though recoverable
(or at least something that can be handled safely), was unexpected, and
may affect the user/application.
LOG_WARNING should be used in places where it's not entirely unexpected,
is recoverable, and doesn't really affect the user/application.
- Changed glMapBuffer to glMapBufferRange to allow invalidation. Using
just glMapBuffer alone was causing some unacceptable stalls.
- Changed dynamic buffers from GL_DYNAMIC_WRITE to GL_STREAM_WRITE
because I had misunderstood the OpenGL specification
- Added _OPENGL and _D3D11 builtin preprocessor macros to effects to
allow special processing if needed
- Added fmod support to shaders (NOTE: D3D and GL do not function
identically with negative numbers when using this. Positive numbers
however function identically)
- Created a planar conversion shader that converts from packed YUV to
planar 420 right on the GPU without any CPU processing. Reduces
required GPU download size to approximately 37.5% of its normal size
as well. GPU usage down by 10 entire percentage points despite the
extra required pass.
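The 37.5% figure for the planar conversion can be checked with a quick calculation, assuming the packed source is 4 bytes per pixel and the planar 4:2:0 output is one full-resolution plane plus two quarter-resolution planes at 1 byte per sample. The function names are hypothetical and even dimensions are assumed.

```c
#include <stdint.h>

/* Download size of a packed 4-bytes-per-pixel frame. */
uint64_t packed_size(uint32_t width, uint32_t height)
{
	return (uint64_t)width * height * 4;
}

/* Download size of a planar 4:2:0 frame: a full-size Y plane plus
 * half-width, half-height U and V planes. Assumes even dimensions. */
uint64_t planar420_size(uint32_t width, uint32_t height)
{
	uint64_t luma = (uint64_t)width * height;
	uint64_t chroma = (uint64_t)(width / 2) * (height / 2);
	return luma + 2 * chroma;
}
```

At 1280x720 this gives 1,382,400 bytes instead of 3,686,400, which is 1.5 / 4 = 37.5%.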
There were a *lot* of warnings, managed to remove most of them.
Also, put warning flags before C_FLAGS and CXX_FLAGS, rather than after,
as -Wall -Wextra was overwriting flags that came before it.
- Fill in the rest of the FFmpeg test output code for testing so it
actually properly outputs data.
- Improve the main video subsystem to be a bit more optimal and
automatically output I420 or NV12 if needed.
- Fix audio subsystem insertion and byte calculation. Now it will
seamlessly insert new audio data in to the audio stream based upon
its timestamp value. (Be extremely cautious when using floating
point calculations for important things like this, and always round
your values and check your values)
- Use 32 byte alignment in case of future optimizations and export a
function to get the current alignment.
- Make os_sleepto_ns return true if slept, false if the time has
already been passed before the call.
- Fix sinewave output so that it actually properly calculates a middle
C sinewave.
- Change the use of row_bytes to linesize (also makes it a bit more
consistent with FFmpeg's naming as well)
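On the audio timestamp note above: a sketch of why rounding matters when converting a timestamp difference to a frame offset. This is not the actual libobs code, and the function names are hypothetical. It uses integer arithmetic to sidestep floating point entirely, and is only meant for short deltas because the intermediate product can overflow for very large values.

```c
#include <stdint.h>

/* Truncating conversion: a delta that is a hair short of one frame
 * (for example due to a truncated timestamp) becomes zero frames. */
uint64_t ns_to_frames_trunc(uint64_t delta_ns, uint32_t sample_rate)
{
	return delta_ns * sample_rate / 1000000000ULL;
}

/* Rounding conversion: add half the divisor before dividing. */
uint64_t ns_to_frames_round(uint64_t delta_ns, uint32_t sample_rate)
{
	return (delta_ns * sample_rate + 500000000ULL) / 1000000000ULL;
}
```

At 48000 Hz one frame lasts about 20833.33 ns, so a delta of 20833 ns truncates to 0 frames but rounds to 1.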