Commit graph

104 commits

Author SHA1 Message Date
jpark37 af01e044a2 libobs: Fix Lanczos calculations
- Fix: Ensure (1, 1) coordinate gets clamped.
- Fix: Increase weight precision by premultiplying UV in VS.
- Cleanup: Group coordinates 012/345 instead of 024/135.
- Cleanup: Remove unnecessary branches.

NVIDIA RTX 2080 Ti, Intel GPA, SetStablePowerState

256x224 -> 1323x1080: 123 us -> 123 us
2019-08-25 10:00:23 -07:00
jpark37 3485c4cdac libobs: Simplify bicubic weight calculations
Also increase weight precision by premultiplying UV in VS.

Intel HD Graphics 530, Intel GPA, SetStablePowerState

256x224 -> 1323x1080: 1221 us -> 1020 us
2019-08-25 10:00:10 -07:00
jpark37 9f5d218e16 libobs: Remove unnecessary divides from Lanczos 2019-08-14 21:36:23 -07:00
jpark37 93f1ab789d libobs: Fix dark lines using Lanczos
When texel samples are not exactly on texel centers, weight calculations
will involve a divide by a number very close to zero, resulting in
precision issues. Restore normalization of weights to compensate.
2019-08-14 21:00:09 -07:00
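Editorial sketch for 93f1ab789d: the snippet below illustrates, in plain Python rather than shader code, why dividing Lanczos weights by their sum compensates for small errors in the individual weights. The function names, the kernel radius of 3, and the tap layout are assumptions made for illustration; they are not taken from the libobs shader.

```python
import math

def lanczos_weight(x, a=3):
    """Lanczos kernel sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0  # limit of the kernel at zero, avoids 0/0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def raw_weights(frac, a=3):
    """Unnormalized weights for the 2*a taps around fractional offset frac."""
    return [lanczos_weight(frac - k, a) for k in range(-a + 1, a + 1)]

def normalized_weights(frac, a=3):
    """Same weights, rescaled so they sum to exactly one."""
    raw = raw_weights(frac, a)
    total = sum(raw)
    return [w / total for w in raw]
```

The raw weights do not sum to one in general (at `frac = 0.5` the sum is slightly below 1), so any error in a single weight shifts overall brightness unless the result is renormalized.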
jpark37 3d6f5c8ad6 libobs: Add YUV alpha formats
This will allow YUV alpha formats to be converted to RGBA on the GPU.
2019-08-11 11:26:22 -07:00
Jim 31a902b3af
Merge pull request #2018 from jpark37/yuv-simplify2
libobs: Separate textures for YUV input
2019-08-10 22:41:28 -07:00
Jim ecfcb64056
Merge pull request #1994 from jpark37/faster-lanczos
libobs: Optimize lanczos shader, remove scaling
2019-08-10 03:02:26 -07:00
jpark37 bdd8d64053 libobs: Separate textures for YUV input
The shaders to unpack YUV information from the same texture were rather
complicated. Breaking them up into separate textures makes the shaders
much simpler, and we can remove the PRECISION_OFFSET hack.

Performance also gets a nice boost on Intel for planar textures.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080

UYVY: 473 us -> 457 us
YUY2: 492 us -> 422 us
YVYU: 491 us -> 441 us
I420: 1637 us -> 505 us
I422: 1644 us -> 482 us
I444: 1653 us -> 504 us
NV12: 1656 us -> 369 us
Y800 (limited): 270 us -> 277 us
Y800 (full): 263 us -> 289 us
RGB (limited): 341 us -> 411 us
BGR3 (limited): 512 us -> 509 us
BGR3 (full): 527 us -> 534 us
2019-08-09 21:14:29 -07:00
jpark37 9aacc99b3e libobs: Separate textures for YUV output, fix chroma
The shaders to pack YUV information into the same texture were rather
complicated and suffering precision issues. Breaking them up into
separate textures makes the shaders much simpler and avoids having to
compute large integer offsets. Unfortunately, the code to handle
multiple textures is not as pleasant, but at least the NV12 rendering
path is no longer separate.

In addition, write chroma samples to "standard" offsets. For I444,
there's no difference, but I420/NV12 formats now have chroma shifted to
the left as 4:2:0 is shown in the H.264 specification.

Intel GPA, SetStablePowerState, Intel HD Graphics 530

Expect speed increase:
I420: 844 us -> 493 us (254 us + 190 us + 274 us)
I444: 837 us -> 747 us (258 us + 276 us + 272 us)
NV12: 450 us -> 368 us (319 us + 168 us)

Expect no change:
NV12 (HW): 580 us (481 us + 166 us) -> 588 us (468 us + 247 us)
RGB: 359 us -> 387 us

Fixes https://obsproject.com/mantis/view.php?id=624
Fixes https://obsproject.com/mantis/view.php?id=1512
2019-07-26 23:21:41 -07:00
jpark37 f27ece50c9 libobs: Optimize lanczos shader, remove scaling
Use bilinear filtering to reduce 36 taps to 25 for the regular path.
This works because the middle weights are always between 0 and 1,
allowing texture coordinates to be placed strategically to sample
correct ratios. I'm not sure about the undistort path, so I've left that
alone.

Also remove scaling added in #526, after which weight normalization is
unnecessary. If we want to use or invent an algorithm with alternate
downscaling properties, that's fine, but I don't think we should change
Lanczos scaling to mean something it's not. The scale implementation was
also seen not working when applied directly to scene items because of
assumptions made about the projection matrix.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, D3D11
644x478 -> 1323x1080: 3890 us -> 3401 us
1920x1080 -> 1280x720: 2555 us -> 2261 us
2019-07-26 20:45:33 -07:00
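Editorial sketch for f27ece50c9: the tap reduction relies on a hardware bilinear fetch returning a blend of two neighboring texels. If two adjacent taps have weights of the same sign, one fetch placed between them can stand in for both. The helper names below are hypothetical and the code is a one-dimensional illustration only, not the shader itself.

```python
def merge_taps(w0, w1):
    """Return (offset, weight) so that one bilinear fetch replaces two taps.

    A bilinear fetch at `offset` texels past texel 0 yields
    (1 - offset) * t0 + offset * t1. Scaling that by w0 + w1 reproduces
    w0 * t0 + w1 * t1 when offset = w1 / (w0 + w1), which lies in [0, 1]
    as long as both weights share a sign.
    """
    total = w0 + w1
    return w1 / total, total

def bilinear(t0, t1, offset):
    """What the sampler returns between two neighboring texels."""
    return (1.0 - offset) * t0 + offset * t1
```

Merging only the middle pair turns 6 taps per axis into 5, which is consistent with the 36 to 25 reduction quoted in the commit.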
Jim 62c7e00d16
Merge pull request #1993 from jpark37/faster-bicubic
Optimize bicubic shader
2019-07-26 00:36:19 -07:00
jpark37 2721ac4a85 libobs: Optimize bicubic shader
Use bilinear filtering to reduce 16 taps to 9 for the regular path. This
works because the middle weights are always between 0 and 1, allowing
texture coordinates to be placed strategically to sample correct ratios.
I'm not sure about the undistort path, so I've left that alone.

Also remove weight normalization. I'm not seeing that make even a small
difference.

Intel HD Graphics 530, D3D11
644x478 -> 1323x1080: 1790 us -> 1279 us
1920x1080 -> 1280x720: 1301 us -> 918 us

References:
https://entropymine.com/imageworsener/bicubic/
http://vec3.ca/bicubic-filtering-in-fewer-taps/
http://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch24.html
2019-07-25 22:21:11 -07:00
James Park 37f663a789 libobs: obs-ffmpeg: win-dshow: Planar 4:2:2 video
This format has been seen when using FFmpeg MJPEG decompression.
2019-07-25 20:11:37 -07:00
jpark37 2656bf0a90 libobs: Rework RGB to YUV conversion
RGB to YUV conversion was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12

1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us

1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us

1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us

1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
2019-07-22 01:12:35 -07:00
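Editorial sketch for 2656bf0a90: once the scale shaders write RGBA, the pack step computes luma and chroma from RGB in separate passes. The per-pixel math for such passes might look like the following. The BT.709 full-range coefficients and the function names are assumptions chosen for illustration; the actual shaders may use a different color space and range.

```python
# BT.709 luma coefficients (assumed here; other color spaces use other values).
KR, KG, KB = 0.2126, 0.7152, 0.0722

def rgb_to_y(r, g, b):
    """Full-range luma from normalized RGB, as an 'RGBA -> Y' pass would write."""
    return KR * r + KG * g + KB * b

def rgb_to_uv(r, g, b):
    """Full-range chroma (Cb, Cr) centered on 0.5, as an 'RGBA -> UV' pass would write."""
    y = rgb_to_y(r, g, b)
    u = (b - y) / (2.0 * (1.0 - KB)) + 0.5
    v = (r - y) / (2.0 * (1.0 - KR)) + 0.5
    return u, v
```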
jpark37 85cc7c84bc libobs: obs-filters: Area upscale shader
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.

Also add "base_dimension" uniform to avoid unnecessary division.

Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
2019-07-17 21:11:18 -07:00
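Editorial sketch for 85cc7c84bc: when upscaling, a target pixel's footprint in texel space is at most one texel wide, so it touches one or two texels per axis. A single bilinear fetch can then reproduce the coverage weights. The one-dimensional function below is a hypothetical illustration of that idea, not the shader's actual math.

```python
import math

def upscale_sample_pos(i, src, dst):
    """Texel-space coordinate to fetch for target pixel i (1D, src <= dst)."""
    scale = src / dst            # footprint width in texels, at most 1
    left = i * scale
    right = (i + 1) * scale
    first = math.floor(left)
    boundary = first + 1
    if right <= boundary:
        # Footprint lies inside one texel: sample its center, no filtering.
        return first + 0.5
    # Footprint straddles two texels: shift from the first center toward the
    # second by the second texel's share of the footprint.
    w_second = (right - boundary) / scale
    return first + 0.5 + w_second
```

For `src=2, dst=3`, the middle target pixel covers both texels equally, so the fetch lands halfway between their centers.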
James Park aa22b61e3e libobs: Full-screen triangle format conversions
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.

A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.

Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.

This can be applied to more situations, but start small first.

Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
2019-06-18 22:29:07 -07:00
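Editorial sketch for aa22b61e3e: a full-screen pass without a vertex buffer typically derives the three vertices of an oversized triangle from the vertex index alone. The Python below mirrors a common form of that bit arithmetic. The function name is hypothetical, and the vertical UV orientation is an assumption, since it differs between graphics APIs (the commit mentions handling upside-down UVs).

```python
def fullscreen_triangle_vertex(vertex_id):
    """Clip-space position and UV for vertex 0, 1, or 2 of an oversized triangle."""
    u = float((vertex_id << 1) & 2)  # 0, 2, 0
    v = float(vertex_id & 2)         # 0, 0, 2
    x = u * 2.0 - 1.0                # -1, 3, -1
    y = v * 2.0 - 1.0                # -1, -1, 3
    return (x, y), (u, v)
```

The visible region is the clip-space square from (-1, -1) to (1, 1), where the interpolated UV runs from 0 to 1; everything beyond it is clipped.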
Jim ab70bff4b3
Merge pull request #1913 from jpark37/area-shader-optimization
libobs: Area-resampling shader optimizations
2019-06-17 20:40:25 -07:00
Jim fafda14963
Merge pull request #1906 from jpark37/bgr-three
libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support
2019-06-15 16:40:44 -07:00
Jim dd607b422f
Merge pull request #1881 from jpark37/lowres-fair-sampling
libobs: Improve low-resolution bilinear sampling
2019-06-15 16:03:02 -07:00
James Park 9f66b90d99 libobs: Area-resampling shader optimizations
Switch for loop to do/while because we know the condition is always
true for the first loop.

Replace int math with float math to play nicely with more GPUs.

Add variables imagesize/targetsize to avoid redundant reciprocals.

Intel GPA results: 1166 -> 836 us
2019-06-03 23:11:23 -07:00
James Park 614025742b libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support
Someone mentioned this format preserves the most quality for a
particular capture card using V4L2.
2019-05-30 06:05:53 -07:00
James Park 0c5cb83bf4 libobs: Remove saturate from RGB -> YUV conversion
Incoming texture is UNORM, so the value must already be saturated.
2019-05-18 22:10:42 -07:00
James Park fede4fb784 libobs: Improve low-resolution bilinear sampling
The issue with the current bilinear_lowres_scale effect is that it
samples adjacent texels, disregarding the texel-to-pixel ratio. If the
ratio is large, this can lead to aliasing. This change provides a fair
set of texture samples across the entire pixel.

The 8-sample pattern used here comes from Direct3D.
2019-05-13 23:54:14 -07:00
James Park ba21fb947e libobs: Fix various alpha issues
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.

To keep this change simple, the filter chain will continue to use
straight alpha. Otherwise, every source would need to be modified to
output premultiplied, and every filter modified for premultiplied input.

"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.

One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.

For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.

This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.

obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.

Fixes https://obsproject.com/mantis/view.php?id=1108
2019-05-08 20:26:52 -07:00
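Editorial sketch for ba21fb947e: the alpha-channel overflow and its fix can be checked with simple arithmetic. The functions below model the blend equations named in the commit (ONE/ONE versus ONE/INVSRCALPHA for the alpha channel) plus the premultiplied "over" operator. The function names are hypothetical, and this is a numeric illustration, not the graphics state code.

```python
def blend_alpha_one_one(src_a, dst_a):
    """SrcBlendAlpha = ONE, DestBlendAlpha = ONE: alphas simply add."""
    return src_a + dst_a

def blend_alpha_one_invsrc(src_a, dst_a):
    """SrcBlendAlpha = ONE, DestBlendAlpha = INVSRCALPHA: result stays <= 1."""
    return src_a + dst_a * (1.0 - src_a)

def blend_premultiplied(src, dst):
    """'Over' for premultiplied RGBA tuples: out = src + dst * (1 - src.a)."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))

def alpha_divide(rgba):
    """Premultiplied to straight alpha, the conversion 'DrawAlphaDivide' is named for."""
    r, g, b, a = rgba
    if a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)
```

Stacking two layers with alpha 0.75 gives 1.5 under ONE/ONE but 0.9375 under ONE/INVSRCALPHA.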
James Park a86710ec5b libobs: Support limited color range for RGB/Y800 sources
libobs: Add support for limited to full color range conversions when
using RGB or Y800 formats, and move RGB conversion for Y800 formats to
the GPU.

decklink: Stop hiding color space/range properties for RGB formats, and
remove "YUV" from "YUV Color Space" and "YUV Color Range".

win-dshow: Remove "YUV" from "YUV Color Space" and "YUV Color Range".

UI: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
2019-04-25 15:13:05 -07:00
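Editorial sketch for a86710ec5b: a limited-to-full range conversion for 8-bit RGB or luma maps the code range 16..235 onto 0..255. On normalized values that is a subtract and a scale. The function name and the final clamp are assumptions for illustration; the libobs shader may arrange the math differently.

```python
def limited_to_full(v):
    """Expand a normalized limited-range value (16/255..235/255) to 0..1."""
    out = (v - 16.0 / 255.0) * (255.0 / 219.0)
    # Clamp values that fall outside the nominal limited range.
    return min(max(out, 0.0), 1.0)
```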
James Park f66625bf1e libobs: Fix shader for GLSL
vec4 to vec3 truncation fix.
2019-04-14 14:15:48 -07:00
James Park 69c215345a libobs: Simplify YUV conversion
Currently several shaders need "DrawMatrix" techniques to support the
possibility that the input texture is a "YUV" format. Also, "DrawMatrix"
is overloaded for translation in both directions when it is written for
RGB to "YUV" only.

A cleaner solution is to handle "YUV" to RGB up-front as part of format
conversion, and ensure only RGB inputs reach the other shaders. This is
necessary to someday perform correct scale filtering without the cost of
redundant "YUV" conversions per texture tap.

A necessary prerequisite for this is to add conversion support for
VIDEO_FORMAT_I444, and that is now in place. There was already a hack in
place to cover VIDEO_FORMAT_Y800. All other "YUV" formats already have
conversion functions.

"DrawMatrix" has been removed from shaders that only supported "YUV" to
RGB conversions. It still exists in shaders that perform RGB to "YUV"
conversions, and the implementations have been sanitized accordingly.
2019-04-11 23:00:03 -07:00
James Park c4819678c9 libobs: Fix and simplify Area scale filter
It appears there's a projection flip that is applied in some situations,
like the preview pane in studio mode, and the shader math fails when
it's active causing the output color to be zero. This fixes the math for
GLSL (with a tiny redundancy penalty to HLSL), and cleans up some
unnecessary code along the way.

Use abs() to avoid zero area in case the OpenGL projection flip is
active. Also simplify the math, and remove the unnecessary sampler
state.
2019-04-04 08:39:54 -07:00
James Park 746820e35a libobs: Fix Area scale filter for GLSL
Remove const qualifiers because they are syntax errors for GLSL when
used like in C.
2019-03-23 13:16:50 -07:00
James Park 7d811499e0 Add "Area" scale filter
This new scale filter computes pixels by weighing the coverage area of
source pixels over the target pixel. This algorithm works well for both
upsampling and downsampling, but was mainly designed to upscale
high-quality low-resolution sources like RGB/HDMI retro consoles. I've
heard of people using odd workarounds like scaling up to very high
resolutions before scaling back down to preserve pixel sharpness. This
algorithm addresses that use case in a much more direct fashion.

The Area scale filter does a better job of preserving the thickness of
thin features than the Point filter.

The Area scale filter does not look at source pixels that lie outside
of the target pixel, leading to a much sharper image than Bilinear,
Bicubic, and Lanczos filters.

This filter should interpolate pixels in linear space, but OBS is not
equipped to do that at the moment.

libobs: Add GPU effect, and wire up scene serialization.

obs-filters: Add Area as an option for scale_filter.

UI: Add Area as an option for both scene items, and canvas downscaling.
2019-03-06 20:53:15 -08:00
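Editorial sketch for 7d811499e0: coverage-weighted resampling can be written out for a single row. Each target pixel averages the source pixels under its footprint, weighted by how much of the footprint each one covers. The function name is hypothetical, and the code works in one dimension on plain numbers; the actual effect operates on 2D textures on the GPU.

```python
import math

def area_resample_row(row, dst_len):
    """Resample a 1D list of values by coverage-weighted averaging."""
    src_len = len(row)
    scale = src_len / dst_len  # footprint width of one target pixel, in source pixels
    out = []
    for i in range(dst_len):
        left, right = i * scale, (i + 1) * scale
        acc = 0.0
        x = math.floor(left)
        while x < right and x < src_len:
            # Length of the intersection of [left, right] with source pixel x.
            overlap = min(right, x + 1) - max(left, x)
            acc += row[x] * overlap
            x += 1
        out.append(acc / scale)
    return out
```

Source pixels entirely outside the footprint contribute nothing, which is why a hard edge such as `[0, 1]` stays hard when upscaled by an integer factor.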
VodBox f095cb2d0e UI: Add scene item canvas overflow to preview 2019-02-08 20:38:53 +13:00
jp9000 28d0cc8b97 libobs: Use NV12 textures when available 2019-02-07 17:00:46 -08:00
Palana db1da73647 libobs: Fix I420 shader for (width/2)%4 == 2 resolutions
For those resolutions the last two chroma samples of every other
line would be overwritten by the last chroma samples of the previous
line (depending on sampler used), producing artifacts on the left
edge of the resulting image (e.g. any color present on the right
edge of the image would "bleed" to every other line on
the left edge)
2017-09-13 16:39:36 +02:00
jp9000 7c6c7bc4c0 libobs: Add random shader
Strangely, to the "Solid" effect file.
2017-05-06 11:29:24 -07:00
jp9000 f9b5da513a libobs: Fix tex.Load lookup (needs int3, not int2)
libobs' shader language is basically HLSL, and tex.Load uses an int3 for
2D textures, with texture mipmap index for the last component.  This bug
bypassed testing because the front-end automatically switches to OpenGL
if D3D11 initialization fails, and when converted to GLSL, works fine
because texelFetch only requires two components.  This also means
there's a bug in GLSL shader conversion code, because it's essentially
ignoring the third component when it shouldn't be.
2017-05-06 10:39:42 -07:00
jp9000 e7f754df97 libobs: Use tex.Load for reverse NV12/I420 funcs
Eventually, most things should be replaced with Load where applicable
(though in some cases sub-pixel sampling is desired).

This commit also fixes a bug where NV12 async sources wouldn't render
correctly.
2017-05-06 01:24:45 -07:00
Take Vos ab3531caa9 libobs: Add optional ultrawide -> wide scaling techniques
This algorithm reduces scaling distortion on the center of the image
when scaling from ultrawide to wide.

(Jim: edited effect files to prevent an impact in performance for
standard scaling.  Now effectively generates an extra pixel shader, and
the extra code is only applied to the DrawUndistort technique, while the
original Draw technique is unaffected due to the compiler automatically
removing unused code branches via the hard-coded boolean value)

From jp9000/obs-studio#762
2017-01-30 05:59:17 -08:00
jp9000 84ce1076f1 libobs: Fix field order of retro/linear 2x shaders
The field orders of retro 2x and linear 2x deinterlace shaders were
inverted.  Note that yadif 2x does not act the same in this regard, its
field ordering is correct due to how it operates.
2016-04-24 01:21:30 -07:00
jp9000 8a9f1bc7c1 libobs: Fix discard/retro deinterlace equations 2016-04-20 20:13:49 -07:00
jp9000 96d848f3d2 libobs: Add premultiplied alpha base effect 2016-03-26 21:41:49 -07:00
sam8641 a7ce53367c libobs: Fix lanczos scaling quality issue
Closes jp9000/obs-studio#526
2016-03-24 12:35:24 -07:00
jp9000 07c644c581 libobs: Add deinterlacing API functions
Adds deinterlacing API functions.  Both standard and 2x variants are
supported.  Deinterlacing is set via obs_source_set_deinterlace_mode and
obs_source_set_deinterlace_field_order.

This was implemented into the core itself because deinterlacing should
happen before effect filters are processed, but after async filters are
processed.  If this were added as a filter, there is the possibility
that a different filter is processed before deinterlacing, which could
mess with the result.  It was also a bit easier to implement this way
due to the fact that deinterlacing may need to have access to the
previous async frame.

Effects were split into separate files to reduce load time (especially
for yadif shaders which take a significant amount of time to compile).
2016-03-21 21:22:32 -07:00
jp9000 9e15e3d8fd libobs: Remove need for DrawMatrix technique in effects
(Note: This commit also modifies obs-filters and text-freetype2)

This simplifies writing of effects.  DrawMatrix is no longer necessary
because there are no sources that require drawing with a color matrix
other than async sources, and async sources are automatically processed
and don't defer their initial render stage to filters.
2016-03-21 21:22:26 -07:00
jp9000 7bc8dc3471 libobs: Add Planar444 conversion to effect 2015-04-16 22:43:46 -07:00
jp9000 6e572d849f libobs: Don't use 'output' as a keyword in shader
The bilinear lowres scale effect was using 'output' for a variable,
which is apparently a reserved keyword in GLSL on Macs.  This slipped
by me due to the fact that this didn't occur with OpenGL on my Windows
machine.
2015-04-10 09:58:04 -07:00
jp9000 65517ea4cf libobs: Add low resolution bilinear scale effect
This effect preserves detail of images that are scaled below half size
by sampling 9 pixels.
2015-04-10 07:27:24 -07:00
jp9000 9b238ef71e libobs: Add obs_get_opaque_effect function
This returns a common effect useful for rendering an image with the
alpha channel overridden to 1.0.
2015-03-22 19:18:04 -07:00
jp9000 2fa37a1f2e libobs-opengl: Fix render targets being flipped
When render targets are used, they output to the render target inverted
due to the way that OpenGL works.  This fixes that issue by inverting
the projection matrix so that it renders the image upside down and
inverting the front face from counterclockwise to clockwise.
2015-03-22 18:38:45 -07:00
Palana 1a53c8ca66 Rename parameters to avoid GLSL keyword conflicts
Refer to https://www.opengl.org/registry/doc/GLSLangSpec.4.10.6.clean.pdf
for a list of current (reserved) keywords.

In the future the shader compiler in libobs-opengl should probably take
care of avoiding those name conflicts (bonus points for transparently
remapping the names of effect parameters)
2015-01-08 01:42:22 +01:00
jp9000 817a724dea libobs: Add NV12_Reverse shader 2014-12-21 10:14:18 -08:00
jp9000 c88220552f (API Change) libobs: Add bicubic/lanczos scaling
This adds bicubic and lanczos scaling capability to libobs to improve
scaling quality and sharpness when the output resolution has to be
scaled relative to the base resolution.  Bilinear is also available,
although bilinear has rather poor quality and causes scaling to appear
blurry.

If the output resolution is close to the base resolution, then bilinear
is used instead as an optimization, as there's no need to use these
shaders if scaling is not in use.

The Bicubic and Lanczos effects are also exposed via exported function
to allow the ability to use those shaders in plugin modules if desired.

The API change adds a variable 'scale_type' to the obs_video_info
structure that allows the user interface to choose what type of scaling
filter should be used.
2014-12-15 01:55:12 -08:00
jp9000 ca8a9fb5a7 libobs: Fix conversion shader D3D display bug
Just for a quick background: D3D's fmod intrinsic is very imprecise.
Naturally floating points aren't precise at all, and when the numbers
you're dealing with become very large, it can often be off by 0.1 or
more.

However, apparently 0.1 isn't enough of an offset to ensure a proper
value when using the fmod intrinsic and then flooring the value.  0.2
seems to fix the issue and make the image display properly.
2014-12-09 14:21:01 -08:00
Palana 0f15cc143e Add obs_get_default_rect_effect
This provides a default effect for users of GL_TEXTURE_RECTANGLE/textures
that return true for gs_texture_is_rect
2014-10-03 20:18:01 +02:00
BtbN 38c2fc87aa Move all data into the subdir it belongs to
Completely removes the build dir in favor of cmake based build layouting
2014-07-19 01:38:41 +02:00